Monday, September 23, 2013

INFORMS 2013: Just Around the Corner

We're within about two weeks of the 2013 INFORMS Annual Meeting in beautiful Minneapolis, Minnesota. (Thank heavens for Google Maps. The atlas I had as a kid growing up in New York just said "Here Be Dragons" somewhere in the blank space west of Detroit.) I'll be a guest blogger during the meeting (October 6 through October 9), so I may be even less productive than usual on this blog. If you are curious, my first (pre-)conference post is already up (unless someone gets embarrassed and takes it down).

Thursday, September 5, 2013

A Minimum Variance Convex Combination

Someone asked a question on Quora that reduces to the following: how to solve the quadratic program (QP)\begin{gather*}\mathrm{minimize}\,\,f(x)=\frac{1}{2}x'Qx\\\mathrm{s.t.}\,\,\sum_{i=1}^n x_i=1\\x\in \Re^n_+.\end{gather*}The constraints make $x$ the weights in a convex combination of something. My impression of the original question was that $Q$ would be the covariance or correlation matrix of a set of $n$ random variables, in which case (a) $Q$ would be symmetric and at least positive semidefinite (positive definite if the variables are not multicollinear) and (b) the objective function would represent the variance (scaled in the case of that $Q$ is a correlation matrix) of the convex combination of those variables using weights $x$.

Assuming (as I will below) that $Q$ is symmetric and at least positive semidefinite, this problem is easily solved by a variety of QP solvers, and my first recommendation was to link to a suitable library. (The questioner intends to solve the problem within a C++ program.) The questioner, however, prefers to "roll his own" code: he will be solving a large number of small problems, and apparently believes the overhead getting in and out of libraries might be excessive. Personally, I'm a big fan of code libraries. I make every attempt not to reinvent the wheel, partly to save time and partly because my wheels somehow never come out completely round. To each his own, however. I suggested a few standard algorithms, including Rosen's gradient project method (for which, curiously, I cannot find a suitable link) and the generalized reduced gradient (GRG) algorithm.

While toddling over to a local coffee shop, though, a very simple (naive?) algorithm occurred to me. I do not have a formal proof of convergence, but it is easy to show that it cannot terminate at a suboptimal solution, so the only questions are (a) how slow it is compared to more sophisticated algorithms and (b) whether it is susceptible to jamming (zigzagging) (another term for which I cannot find a suitable link). The algorithm is predicated on the following observations:
  • if $x$ is feasible and I shift weight from one component of $x$ to another, moving from $x$ to $\hat{x}=x+\lambda d$ where\[d_k=\begin{cases}+1 & k=i\\-1 & k=j\\0 & i\neq k\neq j\end{cases},\]then$\sum_k \hat{x}_k=1$ and feasibility boils down to $\hat{x}_j\ge 0$, or equivalently $\lambda \le x_j$.
  • For $d$ defined as above, if we let $h(\lambda)=f(x+\lambda d)$, then \begin{gather*}h'(\lambda)=\nabla f(x+\lambda d)d \\ = (x+\lambda d)'Qd \\ = x'Qd + \left(d'Qd\right)\lambda \\ = \nabla f(x)d + \left(d'Qd\right)\lambda \\ = (g_i - g_j) + \left(Q_{ii} + Q_{jj} - 2Q_{ij}\right)\lambda,\end{gather*}where $g=Qx$ is the (transposed) gradient of $f$ at $x$.
Armed with that, the naive algorithm can be expressed in words as follows. Start with an interior feasible point (such as the center of the unit simplex). At each point, compute the gradient ($g$) and locate the smallest component among all variables (index $i$) and largest component among all variables that are not zero (index $j$). Clearly $g_i \le g_j$; if $g_i = g_j$, stop (you are unable to find an improving direction). Assuming $g_i < g_j$, set up the direction vector $d$ to shift weight from $x_j$ to $x_i$ in equal measure. The largest feasible step length is $x_j$; the step length at which $h$ turns from decreasing to increasing is $$\frac{(g_j - g_i)}{Q_{ii} + Q_{jj} - 2Q_{ij}}.$$The denominator of that fraction is strictly positive if $Q$ is positive definite. (If $Q$ is only positive semidefinite, there is a danger that it might be 0.) Pick whichever step length is smaller, take that size step in direction $d$, and iterate. Written in a mathematics/pseudocode mishmash:

$t\leftarrow 0$, $x^{(t)}\leftarrow\left(\frac{1}{n},\dots,\frac{1}{n}\right)$ // initialization
while (not converged) {
  $g^{(t)}\leftarrow Qx^{(t)}$ // gradient at $x^{(t)}$
  $i\leftarrow \mathrm{arg\,min}_k\left\{ g^{(t)}_k\right\}$ // index of smallest (most negative) gradient component
  $j\leftarrow \mathrm{arg\,max}_k\left\{ g^{(t)}_k | x^{(t)}_k > 0\right\}$ // index of largest gradient component with positive $x$
  $d^{(t)}_k\leftarrow \begin{cases} +1 & k=i\\ -1 & k=j\\ 0 & i\neq k\neq j \end{cases}$ // search direction
  $a^{(t)}\leftarrow g^{(t)}_i - g^{(t)}_j$
  $b^{(t)}\leftarrow Q_{ii} + Q_{jj} - 2Q_{ij}$ // derivative along search path is a + b*step
  if ($a^{(t)} = 0$) {
    STOP // solution is optimal
  } else if ($a^{(t)} + b^{(t)}x^{(t)}_j \le 0$) {
    $\lambda^{(t)}\leftarrow x^{(t)}_j$ // the objective decays along this direction all the way to the boundary
  } else {
    $\lambda^{(t)}\leftarrow -a^{(t)}/b^{(t)}$ // move until the directional derivative becomes zero
  $t\leftarrow t+1$
  $x^{(t)}\leftarrow x^{(t)} + \lambda d^{(t)}$ // take a step

I coded a couple of small tests in Java, using both CPLEX and the algorithm above. The CPLEX solver executed considerably faster than my scraggly, non-optimized Java code. No shock there. Interestingly, though, overall execution time (including printing and overhead for function calls) was very close, with my code about 1 ms. slower as measured by the system clock. Note that, at each iteration, I only have to do one matrix-vector multiplication (to get the gradient $g$, after which the remaining computations for that iteration are pretty trivial. Still, I'm surprised my algorithm came that close in terms of time. As far as accuracy goes, the solutions on test problems matched to within convergence limits.

Update: A couple of additional remarks are in order.
  • If $Q$ is only positive semidefinite and, at some iteration $t$, $b^{(t)}=0$, then one of two things happens. Either $a^{(t)}=0$ and we stop, or else $h'<0$ all the way to the boundary, and we take $\lambda^{(t)}=x^{(t)}_j$.
  • Other than at the outset, a full matrix multiplication is not required to update the gradient vector. It is easy to verify that $g^{(t+1)} = g^{(t)} + \lambda^{(t)}\left(Q_i - Q_j\right)$ where $i$ and $j$ are the indices selected at iteration $t$. (Sorry, I absolutely refuse to put a $(t)$ superscript on a subscript -- this is math, not particle physics.) So gradient updates amount to something like $n$ multiplications and $2n$ additions or subtractions. (If the iteration count gets high enough, you might want to do an occasional full $g=Qx$ multiplication just to clear out some of the rounding error, akin to occasionally refactoring an LP basis.)
I got curious enough to tweak my Java code and run a set of simulations. I generated covariance matrices randomly. In particular, the matrices were dense and scaled so that variances of the underlying random variables were typically around 2.5. My results might not apply if variances are larger or more spread out, or if there are lots of zero covariances. (Then again, those things might make no difference.) At any rate, I ran 100 replications with $n=10$, contrasting my results with those of CPLEX 12.5.1. I terminated iteration of my method when the step size fell below $10^{-9}$. What I got was the following:
  • mean execution times (including setup and extracting solution) of 0.13 ms. for my method v. 14.78 ms. for CPLEX;
  • mean objective ratio (my result/CPLEX result) of 0.9999999971843175 (so we're pretty much tied for optimality);
  • mean $L_2$ norm of the difference in weight vectors = $2.8\times 10^{-7}$.

Wednesday, September 4, 2013

"New" OR Educational Resource

Okay, it's not really new -- the author, Francisco Yuraszeck (professor at the Universidad Santa Maria in Chile) tells me he started it more than two years ago -- but it's new to me. Gestión de Operaciones  ("Operations Management") describes itself as a "blog on Management and Operations Research with tutorials and solved exercises". (At least that is what it says if Google Translate can be trusted; I had one Spanish class, in the summer before ninth grade, which barely allows me to order an occasional cerveza ... in a Castilian "lisp".) Professor Yuraszeck tells me he started it as an aid to OR students in all Spanish-speaking countries.

I have not actually poked around the examples and exercises, but they look pretty good on cursory inspection. The site is tied to a YouTube channel with OR videos (in Spanish), and there is at least one "sketch book" available for download. Almost all the content is free, the one current exception being an introductory ebook on linear programming available for a dollar or two (the minimal royalties for which help offset site costs).

Overall, the site seems to be very well done: organized well, easy to navigate, visually appealing, with lots of pertinent content. It would be nice to see the blog gain visibility.

Tuesday, September 3, 2013

There Was a World (and Music!) Before Punk Rock

This is in response to Laura McLay's latest blog post, "famous punk songs about climate change and optimization". Like so many young whippersnappers (and the former Soviet Union), she seems to think everything was invented by her generation. I'm here to refute that. (You'll probably want to read her post before this one, for context.)

Long before "punk rock" meant anything beyond the Jets and the Sharks:
I've probably just scratched the surface here. Heck, maybe it's true that "Everything Old Is New Again" (Peter Allen, cowritten with Carole Bayer Sager, 1974).

Update: Since Marc-André Carle brought up geometry, I feel obligated to point out that geometry begins with basic shapes:

STEM Crisis: Fact or Fiction?

After reading assorted articles about a looming crisis in the supply of qualified STEM (Science, Technology, Engineering, Mathematics) graduates, today LinkedIn pointed me to an article on the IEEE Spectrum site titled "The STEM Crisis Is a Myth". It seems to be cogent and fairly well researched, but I'm not sure I entirely buy the author's analysis. I will certainly stipulate this: determining whether there is a looming STEM crisis, its probable extent (if it does occur), and what to do about it are complex questions, involving fuzzy definitions, data that can be parsed in a variety of ways, and at times some rather heated opinions that can get in the way of careful analysis.

I don't have a position to defend in this debate, and I certainly don't have any answers. I do have some questions/concerns about some of the definitions, how some of the data is interpreted, and how some realities are perhaps ignored (or abstracted away) during the analysis ... and that's without getting into the questions of what constitutes a STEM degree or a "STEM job". In short, I'm happy to muddy the waters further. Here, in no particular order, are some thoughts:

Not all STEM graduates get "STEM jobs" ... nor need they

Do the debaters consider jobs in banking and finance to be "STEM jobs"? My guess is that the answer in most cases is no; and yet the banking and financial services industries employ significant numbers of mathematics and statistics graduates. Actuarial positions may be classified as "STEM jobs", but what about people who work designing portfolios or analyzing market trends? Wall Street is a leading employer of Ph. D. graduates in particle physics (see, for instance, this New York Times article), largely because the physicists (claim to) understand the Ito calculus, used to describe physical processes such as Brownian motion but also used in derivative pricing.

My personal opinion, formed well before the 2008 market crash, can be summed up as follows: Handing my retirement portfolio to people who think they can measure particles (tachyons) that exist only at speeds greater than the speed of light ... what could possibly go wrong with that? I cringed when I learned that my alma mater, Princeton University, has a department titled Operations Research and Financial Engineering -- and trust me, it's not the OR part at which I cringe. Personal prejudices aside, though, it seems reasonable to me that a portion of STEM graduates will legitimately be desired for (and hired into) jobs that are probably not considered "STEM jobs", siphoning a portion of our university output away from the jobs specifically targeting holders of STEM degrees.

People have changes of heart.

Years ago, I had a student in an MBA course (now a lifelong friend) who had a bachelors degree in a scientific field (chemistry or something related). She had worked for the Michigan Department of Natural Resources as a lab chemist before deciding that it was not her passion, and that she would rather work in the business sector. Years later, I had another student with a science degree (chemistry? biochemistry? chemical engineering?) who had worked in the pharmaceutical industry before joining the MBA program. After graduating, he worked in management jobs that leveraged his scientific background but would not be classified as STEM jobs. Another MBA student had a degree in nuclear engineering and had served as a propulsion engineer aboard a US Navy submarine.

In fact, a fairly sizable portion of our MBA program consisted of people with undergraduate STEM degrees (and, often related work experience) who had decided to go over to the Dark Side and join the ranks of management. In comparing supply of and demand for STEM degrees, we need to allow for the fact that some STEM degree holders will naturally choose, for reasons other than a lack of job opportunities, to pursue non-STEM careers.

There is more to employability than having the right degree.

An administrator once made the mistake of inviting me to participate in an orientation session for new MBA students, designed to foster camaraderie and esprit de corps. During the session, I remember saying the following: "Look at the person on your left. Now look at the person on your right. Statistically, one in three people is an asshole ... so if it's not one of them, it's you." (I made the statistic up, based on my experience growing up in New York.) The point I was making then was that candidates would be required to work in teams, just as in industry, and it was naive to assume that it would always be easy to get along with all your teammates.

Sadly, though, it is also true that some people just cannot coexist at all with other workers. Some are chronically absent or late. Some need to be nagged, wheedled or hand-held just to get things done. Some are larcenous. I see no inherent reason why a STEM degree would preclude any of those possibilities (and in fact I've met walking, talking counterexamples to that conjecture). Those people will often "fail" a job interview, no matter how solid their technical credentials, or they will be involuntarily separated from an employer in a way that follows them through subsequent interviews. Thus we have to allow for some slice, hopefully small, of the STEM degree holders to be unsuitable for employment in STEM jobs (unless the recruiters are desperate).

Educational standards in the US ain't what they used to be.

During my graduate studies in mathematics at Michigan State University, I was a teaching assistant. Like most TAs, I began teaching recitation sections of a large service course, whose title was (approximately) "Introduction to College Algebra". The course was taught in a large lecture by a faculty member. The primary roles of the TAs handling the recitation sections were to answer questions and grade homework and exams.

A year or so after I arrived, we were joined by a new graduate student with a bachelor's degree in mathematics from a "prestigious" university. (I shall not name the university, so as to avoid embarrassing the fine folks in Ann Arbor.) He too had a teaching assistantship. Fall quarter of his first year, he was assigned to teach a couple of sections of the introductory algebra course. Winter quarter he was pulled out of the classroom, and his sole duty as TA was to work out the answers to the problems in the textbook ... because the Powers That Be had discovered that he could not answer simple algebra questions in class (and not, apparently, due to stage fright). Had he chosen to work in industry, rather than going straight to graduate school, and had the recruiters done their jobs well, he might have contributed to the statistics representing STEM degree holders not working in STEM jobs.

Employers frequently complain (particularly when lobbying for more H1B visas) that they cannot find a sufficient number of STEM degree holders with the "right skill set". We can argue about who bears responsibility for a genuine lack of the right skills: universities with outmoded curricula; employers unwilling to pay for training; or degree holders unwilling to upgrade their skills on their own time. We can also speculate (and many people do -- see the comments on the IEEE Spectrum post) on how often the "right skill set" translates to "willing to work cheap". We also need to accept that in some cases the "right skill set" translates to "actually knows the subject matter for their awarded degree", and that this is not a given.