Saturday, February 23, 2019

Guessing Pareto Solutions

One of the challenges of multiple-criteria decision-making (MCDM) is that, in the absence of a definitive weighting or prioritization of criteria, you cannot talk meaningfully about a "best" solution. (Optimization geeks such as myself tend to find that a major turn-off.) Instead, it is common to focus on Pareto efficient solutions. We can say that one solution A dominates another solution B if A does at least as well as B on all criteria and better than B on at least one criterion. A solution is Pareto efficient if no solution dominates it.

Recently I was chatting with a woman about a multiple-criteria model she had. The variables were all binary (yes-no decisions) and were otherwise unconstrained. Normally my hackles would go up at an unconstrained decision problem, but in this case all constraints were "soft". For example, rather than specify a fixed budget, she would make cost one of the criteria (less being better) and let the decision maker screen out Pareto efficient solutions with unrealistic budgets.

I won't name the researcher or go into more details, since her work is currently under review (as of this writing), but I will look at a similar problem that is only slightly simpler. Namely, I will add the requirement that the criteria are additive, meaning that the value of any criterion for any "yes" does not depend on which other decisions wound up "yes" rather than "no". So, in mathematical terms, we have $N$ binary decisions ($x\in\{0,1\}^N$) and $M$ criteria. Let $c_{ij}$ denote the value of criterion $j$ when $x_i=1$. In what follows, I will want to be consistent in maximizing (rather than minimizing) criterion values, so I'll assume that (a) a "no" decision has value 0 for every criterion and (b) criteria are normalized so that $c_{ij}\in [0,1]$ for all $i$ and for all $j$ such that more is better, while $c_{ij}\in [-1,0]$ for all $i$ and for all $j$ such that (before scaling) less is better. For example, if cost is a criterion (and if you do not work for the Pentagon), the scaled values of cost will run from -1 for the most expensive choice encountered to 0 for any choice that is free.

The solution space contains $2^N$ candidate solutions, which means that enumerating them is out of the question for nontrivial values of $N$ (and, trust me, hers are nontrivial ... very, very nontrivial). In any case, the size of the Pareto frontier when $N$ is nontrivial will quickly outpace any decision maker's capacity (or patience). So it seems reasonable to sample the Pareto frontier and present the solution maker with a reasonably (but not excessively) large set of Pareto efficient solutions that hopefully is representative of the overall Pareto frontier.

There are methods for doing this, and the researcher pointed me toward one: the NSGA-II genetic algorithm [1]. Like any genetic algorithm, NSGA-II runs until you stop it via some limit (run time, number of generations, some measure of stagnation, ...). As I understand it, the final population of solutions are not guaranteed to be Pareto efficient, but the population converges in some sense to the Pareto frontier. That would likely be sufficient in practice, and the researcher had good luck with it initially, but ran into scaling problems as $N$ grew.

Looking at the problem reminded me of a sometimes troubling dichotomy faced by academics (in at least some disciplines, including operations research). On the one hand, real-world decision makers with real-world problems are often quite happy to get solutions that are obtained quickly and cheaply and are easily explained or understood, even if the technology is "primitive". Academics, on the other hand, usually need to publish their research. In the domain of the problem, low-tech algorithms may be perfectly fine (and easier to understand than more esoteric algorithms), but for a professor in IE or OR looking to publish in a journal in his or her field, "simple" usually means "too unsophisticated to publish". Thus we tend at times to "over-engineer" solutions.

Anyway, taking off my academic hat for a moment, I got to wondering whether a no-brainer random search approach would work for a problem such as this one. My premise was the following. Suppose I randomly select weights $\alpha_1,\dots,\alpha_M \in (0,1]$ and use them to form a single weighted criterion function $f_\alpha:\{0,1\}^N \rightarrow \Re$ as follows:$$f_\alpha (x)=\sum_{i=1}^N w_i(\alpha) x_i$$where$$w_i(\alpha)=\sum_{j=1}^M \alpha_j c_{ij}.$$Note that the weights $\alpha_i$ need to be strictly positive, not just nonnegative. Maximizing $f_\alpha$ over the solution space is trivial: set $x_i=1$ if $w_i(\alpha) > 0$ and $x_i=0$ if $w_i(\alpha) < 0$. (You can set $x_i$ to anything you like if $w_i(\alpha) = 0$, but that should have probability 0 of occurring given the random generation of $\alpha$.) Constructing the coefficients of $f_\alpha$ is $O(MN)$ and optimizing $f_\alpha$ is $O(N)$, so each solution $x$ produced this way takes $O(MN)$ time. In practice, $M$ is likely not to get too big, and very likely does not grow as $N$ does, so functionally this is really $O(N)$ in complexity.

It's trivial to show that, so long as none of the composite weights $w_i(\alpha)$ is 0, each $x$ found by optimizing some $f_\alpha$ must be Pareto efficient. So we can generate a sample of Pareto efficient solutions fairly efficiently ... maybe. I did deal one card off the bottom of the deck. Solutions will tend to repeat, so we will need to do some bookkeeping to ensure that, in the final results, duplicates are weeded out. That means comparing each new solution to its predecessors. Each comparison is $O(N)$, and if we assume that we only retain a "reasonable" number of alternatives (a number that does not grow with $N$) in order to avoid causing the decision maker's brain to explode, then time spent weeding duplicates should be $O(N)$. Overall, the load is $O(NR)$, where $R$ is the number of replications performed.

That left me with a few questions, the main one being this: for plausible sample sizes, how well does the set of solutions obtained represent the Pareto frontier? Is it "biased" in the sense that solutions cluster on one part of the frontier while missing other parts?

I'll show some experimental results in my next post, and let you decide.

[1] K. Deb, A. Pratap, S. Agarwal & T. Meyarivan (2002). A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6, 182-197.

I'm not sure about the best way to present a Pareto frontier myself. I did some pairwise plots of criteria, but plotting all $\binom{M}{2}$ pairwise plots might cause catatonia in the decision makers. I haven't tried this myself, but one possibility is to present selected pairwise plots ("selected" meaning comparisons in which the DM expresses an interest), and link them so that mousing over a point in one plot highlights the same solution in the other plots. It might or might not be effective as a decision aid, and in any case it's pretty decent eye candy. :-)