Showing posts with label mathematics.

Friday, July 15, 2022

Left Matrix Inverses in R

The following question popped up on OR Stack Exchange: given an $m\times n$ matrix $A$ with $m > n,$ how can one find all left inverses of $A$ in R? The author mentions that dimensions in their case would be around 200x10.

A left inverse is a matrix $B\in \mathbb{R}^{n\times m}$ such that $B A = I.$ In order for a left inverse to exist, $A$ must have full column rank. If it does not, then $Ax = 0$ for some nonzero $x\in \mathbb{R}^n,$ and if $B$ were a left inverse we would have $$x = Ix = (BA)x = B(Ax) = B\cdot 0 = 0,$$ a contradiction.

Henceforth we assume that $A$ has full column rank, in which case the left null space $\mathcal{N} \subseteq \mathbb{R}^m$ will have dimension $m-n.$ Now suppose that $C\in \mathbb{R}^{n\times m}$ has the property that every row of $C$ belongs to $\mathcal{N}.$ Then $CA = 0$ and so $(B+C)A = BA+0 = I,$ making $B+C$ another left inverse of $A.$ That means $A$ has an uncountably infinite number of left inverses. Conversely, if $DA=I$ then the rows of $D-B$ belong to $\mathcal{N}.$ So the set of left inverses of $A$ can be fully characterized by any individual left inverse and a basis for the left null space of $A.$

Getting this information in R is remarkably easy. There are multiple ways to compute a left inverse, but if you have the pracma package installed, the function pracma::pinv() computes the Moore-Penrose pseudoinverse, which is a left inverse of $A.$ To get a basis for the left null space of $A,$ we apply the function pracma::nullspace() to $A^T$ (which computes a basis for the right null space of the transpose of $A$) and then transpose the result.

I have a small R notebook that demonstrates this on a random 200x10 matrix.
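The notebook aside, here is a minimal sketch of the computation (assuming the pracma package is installed; the 200x10 dimensions are the ones from the question):

library(pracma)

set.seed(123)                             # for reproducibility
A <- matrix(rnorm(200 * 10), nrow = 200)  # random 200x10 matrix (full column rank almost surely)

B <- pinv(A)                              # Moore-Penrose pseudoinverse: one left inverse (10 x 200)
max(abs(B %*% A - diag(10)))              # should be numerically zero

C <- t(nullspace(t(A)))                   # rows form a basis for the left null space of A (190 x 200)
max(abs(C %*% A))                         # should be numerically zero

# Every left inverse of A has the form B + W %*% C for some 10 x 190 matrix W.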

Monday, April 12, 2021

A Math Puzzle as a Network

There is a standard type of math puzzle that has been around at least since I was a child. The details vary, but the concept is consistent. You are typically given a few initially empty containers of various (integer) capacities, an essentially infinite reservoir of something that goes in the containers, and a goal (integer) for how much of that something you want to end up with. You have to figure out how to reach the goal without having any measuring instruments, meaning that your operations are limited to emptying a container into the reservoir, filling a container from the reservoir, or moving content from one container to another until you empty the source or fill the destination, whichever happens first. (All this is done under the assumption of no spillage, meaning the originator of the puzzle did not have me in mind.) I think I've seen a variant that involves cutting things, where your ability to measure where to cut is limited to stacking pieces you already have as a guide to the piece you want to cut.

A question popped up on Mathematics Stack Exchange about how to solve one of these puzzles using dynamic programming (DP) with backward recursion. The problem at hand involves two jugs, of capacities seven and three liters respectively, and a lake, with the desired end state being possession of exactly five liters of water. The obvious (at least to me) state space for DP would be the volume of water in each jug, resulting in 32 possible states ($\lbrace 0,\dots,7\rbrace \times \lbrace 0,\dots,3 \rbrace$). Assuming the objective function is to reach the state $(5,0)$ with a minimal number of operations, the problem can be cast just as easily as a shortest path problem on a digraph, in which each node is a possible state of the system, each arc has weight 1, and arcs fall into one of the categories mentioned in the previous paragraph.

I was looking for an excuse to try out the igraph package for R, and this was it. In my R notebook, a node label "5|2" would indicate the state where the larger jug contains five liters and the smaller jug contains two. Arcs are labeled with one of the following: "EL" (empty the larger jug); "FL" (fill the larger jug); "ES" (empty the smaller jug); "FS" (fill the smaller jug); "PLS" (pour the larger jug into the smaller jug); or "PSL" (pour the smaller jug into the larger jug).
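For illustration, here is a rough sketch (my own reconstruction, not the notebook itself) of how the digraph can be built and searched with igraph:

library(igraph)

cl <- 7                                       # capacity of the larger jug
cs <- 3                                       # capacity of the smaller jug
lab <- function(l, s) paste(l, s, sep = "|")  # node label "larger|smaller"

edges <- matrix(character(0), ncol = 2)
for (l in 0:cl) {
  for (s in 0:cs) {
    from <- lab(l, s)
    moves <- list(
      EL  = c(0, s),                                    # empty the larger jug
      FL  = c(cl, s),                                   # fill the larger jug
      ES  = c(l, 0),                                    # empty the smaller jug
      FS  = c(l, cs),                                   # fill the smaller jug
      PLS = c(l - min(l, cs - s), s + min(l, cs - s)),  # pour larger into smaller
      PSL = c(l + min(s, cl - l), s - min(s, cl - l))   # pour smaller into larger
    )
    for (m in names(moves)) {
      to <- lab(moves[[m]][1], moves[[m]][2])
      if (to != from) edges <- rbind(edges, c(from, to))
    }
  }
}

g <- graph_from_edgelist(edges, directed = TRUE)  # every arc implicitly has weight 1
p <- shortest_paths(g, from = lab(0, 0), to = lab(5, 0))
states <- p$vpath[[1]]
print(states)                                     # sequence of states from "0|0" to "5|0"
length(states) - 1                                # number of operations required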

Assuming I did not screw up the digraph setup, a total of nine operations are required to get the job done. If you are interested, you can see my code (and the solution) in this R notebook.

Tuesday, October 8, 2019

Numerical Inaccuracy (Linear Algebra Edition)

"There are three kinds of lies: lies, damned lies, and statistics." The origin of the statement is lost in time, but the sentiment lives on. I'm tempted to give it a 21st century update. "There are three kinds of lies: lies, damned lies, and floating point arithmetic."

I am currently working on a research project in which the following problem arises. We start with an $m \times n$ matrix $A$ with $m < n$, and a set of indices $I\subseteq \{1,\dots,n\}$. I need to confirm that $x_i = 0$ for all $i\in I$ and for all $x\in \mathcal{N}$, where $\mathcal{N} = \{x\in \mathbb{R}^n : Ax = 0\}$. There are various names for $\mathcal{N}$; I'll go with the null space of $A$.

How do you test whether a particular component of every vector in an uncountably infinite set is zero? Since $\mathcal{N}$ is a vector space, we can compute a basis for it and just check the basis vectors. If $x_i =0$ for every basis vector $x$ of $\mathcal{N}$, it will be true for every vector in $\mathcal{N}$.
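Expressed in R (just to show the idea; the actual project code, described below, is in Java), the check would be something like the following, assuming the matrix A and the index vector I are defined, pracma is installed, and the null space is nontrivial:

library(pracma)

N <- nullspace(A)                     # columns of N form a basis for the null space of A
tol <- 1e-9                           # an (arbitrary) tolerance for treating a value as zero
all(abs(N[I, , drop = FALSE]) < tol)  # TRUE if x_i is numerically zero for every basis vector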

This is going on in a Java program, and I have tried multiple linear algebra packages. Initially I had the idea to use a QR decomposition of $A$ to get a basis for the null space, but that requires (at least as far as I know) an upper triangular $R$ component, and neither of the two most promising packages I tried delivered one. (I'm pretty sure the Javadocs for one or both said $R$ would be upper triangular ... but it wasn't.) So I switched to singular value decomposition (SVD) using the EJML library. SVD is generally slower than QR decomposition, but tends to be more accurate.

I got an answer, but a check performed by the Java program warned me that some of the $x_i$ values that were supposed to be zero were in the vicinity of $10^{-6}$, which is a bit dicey to shrug off as rounding error. (The basis vectors are all normalized, by the way.) So I decided to port $A$ and some other stuff over to R and do some testing there ... which is when the adventure began.

The dimensions of this $A$ matrix were $m=401$ and $n=411$. EJML gave me a basis of 17 vectors, indicating that $A$ had nullity 17 and thus rank 394. (The nullity of matrix $A$ is the dimension of $\mathcal{N}$, and is in turn the difference between $n$ and the rank of $A$.) I tested this by doing QR and SVD decompositions of $A$ (using the qr() and svd() functions from the R base package) and the Rank() function from the R pracma package. The QR and SVD results indicated that $A$ had rank 393 and nullity 18. Only pracma::Rank() agreed with EJML ... and that came with the warning "Rank calculation may be problematic". (As if the rest of this stuff isn't.)
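For the record, the R cross-checks look roughly like this (a sketch; the threshold for counting a singular value as nonzero is an arbitrary choice):

qr(A)$rank                 # rank estimate from the base QR decomposition
sum(svd(A)$d > 1e-10)      # rank estimate from the SVD: count singular values above a threshold
pracma::Rank(A)            # rank estimate from pracma (the one that issued the warning)
ncol(A) - pracma::Rank(A)  # the corresponding nullity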

So EJML might have shorted me one dimension in the null space ... or maybe not. The smallest singular values of $A$ from the SVD were as follows: 9.16e-05, 5.37e-06, 1.09e-10, 1.13e-15, 7.56e-16, 6.20e-16, 4.87e-16, 3.79e-16, 3.37e-16, 3.16e-16. The last seven (those of order $10^{-15}$ or smaller) were pretty clearly zero, and the one before them (1.09e-10) is maybe zero, maybe not. The SVD of a $401\times 411$ matrix only yields 401 singular values, so the ten additional null space dimensions guaranteed by $n-m=10$ never show up as zeros; adding those in, we're looking at a nullity of 18 if you treat the borderline value as zero and 17 if you don't.

The 17 basis vectors EJML gave me were all at least close to zero in all components $i\in I$: rechecked in R, almost all of the designated components were $10^{-7}$ or smaller. The pracma::nullspace() function gave me 18 basis vectors (despite pracma::Rank() coming up with a nullity of 17), and some of the corresponding $x_i$ were pretty big (0.29, for example) -- way too big to be zero. So my condition is (probably) satisfied if I believe EJML and (definitely) not satisfied if I believe the R computations.

Hoping for some clarity, I calculated (in R) $Ax$ for each basis vector $x$ from either EJML or pracma::nullspace(). The products with the EJML basis vectors were all zero vectors (meaning no components larger than about $10^{-15}$ in magnitude). For the pracma basis vectors, the products had components around $10^{-9}$ in many cases. So much for clarity. Do we treat $10^{-9}$ as approximately zero and believe pracma, or do we call it nonzero and believe EJML?
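(The residual check itself is a one-liner in R, given a matrix N whose columns are the basis vectors from one source or the other:

max(abs(A %*% N))   # worst-case component of Ax over all basis vectors x

run once with the EJML basis and once with the pracma basis.)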

I'm not new to the limitations of double-precision arithmetic. My first course in grad school was numerical analysis, and I've had a fair share of adventures with MIP models in CPLEX where the numerics were, shall we say, dicey. Still, it's rather frustrating to have to try multiple packages and then have to judge (meaning guess) which one is lying to me the least.

Tuesday, July 31, 2018

NP Confusion

I just finished reading a somewhat provocative article on the CIO website, titled "10 reasons to ignore computer science degrees" (when hiring programmers). While I'm not in the business of hiring coders (although I recently was hired as a "student programmer" on a grant -- the Universe has a sense of humor), I find myself suspecting that the author is right about a few points, overstates a few others, and makes a few that are valid for some university CS programs but not for all (or possibly most). At any rate, that's not why I mention it here. What particularly caught my eye was the following paragraph:
It’s rare for a CS major to graduate without getting a healthy dose of NP-completeness and Turing machines, two beautiful areas of theory that would be enjoyable if they didn’t end up creating bad instincts. One biologist asked me to solve a problem in DNA sequence matching and I came back to him with the claim that it was NP-complete, a class of problems that can take a very long time to solve. He didn’t care. He needed to solve it anyway. And it turns out that most NP-complete problems are pretty easy to solve most of the time. There are just a few pathological instances that gum up our algorithms. But theoreticians are obsessed with the thin set that confound the simple algorithms, despite being rarely observed in everyday life.
Any of my three regular readers will know that I periodically obsess/carp about NP-fixation, so I'm sympathetic to the tenor of this. At the same time, I have a somewhat mixed reaction to it.
  • "... NP-complete, a class of problems that can take a very long time to solve." This is certainly factually correct, and the author thankfully said "can" rather than "will". One thing that concerns me in general, though, is that not everyone grasps that problems in class P, for which polynomial time algorithms are known, can also take a very long time to solve. One reason, of course, is that "polynomial time" means run time is a polynomial function of problem size, and big instances will take longer. Another is that $p(n)=n^{1000}$ is a polynomial ... just not one you want to see as a (possibly attainable) bound for solution time. There's a third factor, though, that I think many people miss: the size of the coefficients (including a constant term, if any) in the polynomial bound for run time. I was recently reading a description of the default sorting algorithm in a common programming language. It might have been the one used in the Java collections package, but don't quote me on that. At any rate, they actually use two different sorting algorithms, one for small collections (I think the size cutoff might have been around 47) and the other for larger collections. The second algorithm has better computational complexity, but each step requires a bit more work and/or the setup is slower, so for small collections the nominally more complex algorithm is actually faster.
  • "He didn’t care. He needed to solve it anyway." I love this. It's certainly true that users can ask coders (and modelers) for the impossible, and then get snippy when they can't have it, but I do think that mathematicians (and, apparently, computer scientists) can get a bit too locked into theory. <major digression> As a grad student in math, I took a course or two in ordinary differential equations (ODEs), where I got a taste for the differences between mathematicians and engineers. Hand a mathematician an ODE, and he first tries to prove that it has a solution, then tries to characterize conditions under which the solution is unique, then worries about stability of the solution under changes in initial conditions or small perturbations in the coefficients, etc., ad nauseum. An engineer, faced with the same equation, tries to solve it. If she finds the solution, then obviously one exists. Depending on the nature of the underlying problem, she may or may not care about the existence of multiple solutions, and probably is not too concerned about stability given changes in the parameters (and maybe not concerned about changes in the initial conditions, if she is facing one specific set of initial conditions). If she can't solve the ODE, it won't do her much good to know whether a solution exists or not.</major digression> At any rate, when it comes to optimization problems, I'm a big believer in trying a few things before surrendering (and trying a few optimization approaches before saying "oh, it's NP-hard, we'll have to use my favorite metaheuristic").
  • "And it turns out that most NP-complete problems are pretty easy to solve most of the time. There are just a few pathological instances that gum up our algorithms." I find this part a bit misleading. Yes, some NP-complete problems can seem easier to solve than others, but the fundamental issue with NP-completeness or NP-hardness is problem dimension. Small instances of problem X are typically easier to solve than larger instances of problem X (although occasionally the Universe will mess with you on this, just to keep you on your toes). Small instances of problem X are likely easier to solve than large instances of problem Y, even if Y seems the "easier" problem class. Secondarily, the state of algorithm development plays a role. Some NP-complete problem classes have received more study than others, and so we have better tools for them. Bill Cook has a TSP application for the iPhone that can solve what I (a child of the first mainframe era) would consider to be insanely big instances of the traveling salesman problem in minutes. So, bottom line, I don't think a "few pathological instances" are responsible for "gum[ming] up our algorithms". Some people have problem instances of a dimension that is easily, or at least fairly easily, handled. Others may have instances (with genuine real-world application) that are too big for our current hardware and software to handle. That's also true of problems in class P. It's just that nobody ever throws their hands up in the air and quits without trying because a problem belongs to class P.
In the end, though, the article got me to wondering two things: how often are problems left unsolved (or solved heuristically, with acceptance of a suboptimal final solution), due to fear of NP-completeness; and (assuming that's an actual concern), would we be better off if we never taught students (other than those in doctoral programs destined to be professors) about P v. NP, so that the applications programmers and OR analysts would tackle the problems unafraid?

Tuesday, July 19, 2016

Finding the Kernel of a Matrix

I'm working on an optimization problem (coding in Java) in which, should various celestial bodies align the wrong way, I may need to compute the rank of a real matrix $A$ and, if it's less than full rank, a basis for its kernel. (Actually, I could get by with just one nonzero vector in the kernel, but I'm greedy.)

So I spent a couple of days doing Google searches to see which open-source linear algebra libraries do what. My ideal library would be easy to install (no compiling from source), would support sparse matrices (which mine will be), would make it easy to find the kernel (which turned out not to be a given), and would run fast (without using my computer's GPU, which some of the libraries do). My search led me to install and test several libraries, only to discover that some did not support sparse matrices and some did certain things in rather peculiar ways, designed to make it hard if not impossible to drill down to the basis of the kernel. (One particularly annoying library threw an exception because the matrix I had just constructed, in the immediate preceding line using a sparse matrix constructor, was not considered sparse.) I found something that works, and I thought I'd document it here. If I find something better in the future, I'll post that as well.

Before proceeding, let me point out the Java Matrix Benchmark, which provides useful benchmarking information on (and links to) a number of linear algebra packages.

What works for me, at least for now, involves the Apache Commons Mathematics library. This is one of the more commonly used (no pun intended) mathematics libraries in Java-land. Commons Math supports sparse matrices, at least for storage. (I'm not sure if computational operations, other than add/subtract/multiply, exploit sparsity.) It also does both QR and SVD decompositions. A number of responses on Q&A sites suggested using either QR (faster) or SVD (more numerically stable) decomposition of the matrix $A$ to get to its kernel. I opted for a QR decomposition. As I found out the hard way, though, not all QR decompositions are created equal.

Cutting to the chase scene, the key is to do a "rank-revealing" QR decomposition, which means using the RRQRDecomposition class, not the QRDecomposition class. What you decompose is actually $A^T$, the transpose of $A$. So if $A$ is an $m \times n$ matrix, the decomposition looks like$$A^T P = Q R,$$where
  • $P$ is an $m \times m$ pivot matrix,
  • $Q$ is an $n \times n$ orthonormal matrix (i.e., $Q^T Q = I$), and
  • $R$ is an $n \times m$ upper triangular matrix.
If $A$ has rank $r$, the last $n - r$ columns of $Q$ provide a basis for the kernel of $A$. (If $r = n$, $A$ is full column rank and the kernel is just $\{0\}$.)
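For comparison, the same idea takes a few lines in R, where the base qr() function does a column-pivoted (rank-revealing) decomposition. A sketch, assuming A is already defined:

decomp <- qr(t(A))                   # pivoted QR decomposition of the transpose of A
r <- decomp$rank                     # numerical rank estimate
Q <- qr.Q(decomp, complete = TRUE)   # the full n x n orthogonal factor
n <- ncol(A)
if (r < n) {
  K <- Q[, (r + 1):n, drop = FALSE]  # columns of K form a basis for the kernel of A
  max(abs(A %*% K))                  # sanity check: should be numerically zero
}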

I wrote a little test program (one short Java file) to make sure I was doing things correctly. It generates a random matrix, decomposes it, and confirms that the last however many columns of $Q$ really belong to the kernel of the matrix. If you want to see things in action, you can get the code from the blog's GitLab repository. You'll need to have a recent version of the Commons Math library (I used 3.6.1) on your class path. There are various parameters you can play with: a random seed; the dimensions of $A$; how dense $A$ should be; a rounding tolerance (how close to 0 counts as 0); and a flag which, if set true, tells the matrix generator to replace one column of $A$ with a random linear combination of the others (just to ensure that $A$ does not have full column rank).

Friday, February 20, 2015

The Geometry of a Linear Program

I frequently see questions on forums, in blog comments, or in my in-box that suggest the person asking the question is either unfamiliar with the geometry of a linear program, unsure of the significance of it, or just not connecting their question to the geometry. This post is just the starting point for addressing some of those questions. (I dislike reading or writing long posts, so I'm "chunking" my answer, and this is the first chunk.)

Ideally, the feasible region of a linear program (henceforth "LP") is a convex polytope, a geometric object that is convex, closed, bounded, and has a finite number of flat sides (intersecting at vertices or extreme points). Figure 1 shows a two dimensional example. The shaded area is the feasible region; the dots are the vertices.

Figure 1: Polytope

Being closed and bounded (and therefore, in finite dimensions, compact) ensures, via the extreme value theorem, that any continuous objective function attains its maximum and minimum over the feasible region at one or more points. Of course, the objective function of a linear or quadratic program is continuous. For a linear objective function, we can be more specific: the optimum will be attained at one or more extreme points. So we only need to check the vertices, and this in essence is what the famed simplex algorithm does.

To see why that is important, consider the following candidate for world's most trivial LP:\[ \begin{array}{lrcl} \textrm{minimize} & x\\ \textrm{subject to} & x & \ge & 0. \end{array} \] Its solution is, shockingly, $x=0$. Now suppose we try to use a strict inequality in the lone constraint:\[ \begin{array}{lrcl} \textrm{minimize} & x\\ \textrm{subject to} & x & \gt & 0. \end{array} \]The lower bound of the objective function is again 0, but it is never attained, since $x=0$ is not a feasible solution. This is a consequence of the feasible region not being closed. Technically, while the objective function ($f(x)=x$) has an infimum over the feasible region, it does not have a minimum. While trivial, this is a useful example of why we never use strict inequalities (>, <) in a mathematical program.

One step more general than a polytope is a polyhedron, which retains all the characteristics of a polytope except boundedness. Figure 2 illustrates a polyhedron.

Figure 2: Polyhedron
Again, we have a finite number of flat sides (the dark line segments) intersecting in vertices (the dots), and again the region is convex and closed. The arrows indicate recession directions, directions in which we can continue forever without leaving the polyhedron. (Any direction between those two is also a recession direction.) The presence of recession directions makes the question of whether the (continuous) objective function attains a maximum or minimum a bit trickier (a small example follows the list):
  • if the objective function degrades (gets bigger if minimizing, smaller if maximizing) along every recession direction, the optimal value of the objective function will be attained at one or more extreme points;
  • if the objective function improves (gets smaller if minimizing, bigger if maximizing) along any recession direction, the problem is unbounded (the objective function does not have a maximum or minimum, whichever one you were hunting), which likely means you omitted or screwed up a constraint; and
  • if the objective function is constant along at least one recession direction and does not improve along any of them, the objective function achieves its optimum at one or more extreme points and along every ray starting from one of those extreme points and pointing in a recession direction where the objective stays constant.
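To make those three cases concrete (this is my own toy example, not one of the figures): take the polyhedron $\{(x,y) : x \ge 0, y \ge 0\}$, whose recession directions are all the nonzero directions with nonnegative components. Minimizing $x+y$ degrades along every recession direction, and the minimum is attained at the vertex $(0,0)$. Minimizing $-x$ improves along the direction $(1,0)$, so the problem is unbounded. Minimizing $y$ is constant along $(1,0)$ and does not improve along any recession direction, so the minimum is attained at $(0,0)$ and along the entire ray $\{(t,0) : t \ge 0\}$.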
Finally, a word about the role of convexity. Consider the feasible region in Figure 3, which has all the features of a polytope (closed, bounded, finitely many flat sides) except convexity.

Figure 3: Nonconvex polytope
Suppose that we want to find the point furthest to the right (maximize $x$, assuming that the horizontal axis represents $x$ and the vertical axis represents a second variable $y$). The optimal solution is clearly at point C. Suppose further that we currently find ourselves at point E, which is a local optimum but not a global one. For any algorithm to get from E to C, it has to move in a direction that decreases $x$ (making the objective function worse than at E, at least in the short run). That is a consequence of the fact that any line segment from E in the direction of C initially leaves the feasible region, which in turn follows from the lack of convexity. Most optimization algorithms (including the simplex algorithm) are not "smart" enough to do this; they tend to be myopic, moving only in directions that immediately improve the objective function. LPs are as tractable as they are in large part because their feasible regions are guaranteed to be convex.

Saturday, January 31, 2015

Making Math Fun

I was a math major from undergrad to doctorate, so obviously I think math is fun. Equally obvious to me (especially after teaching a variety of mathematical topics to college students) is that not everyone shares that opinion, which is too bad. Recently, though, I came across an organization devoted to making math fun for small children, and I wanted to share the link: Bedtime Math.

Bedtime Math provides a variety of resources for introducing children to math in a fun way. The organization's name comes from the notion that a vignette with a math problem at its core makes a good bedtime story. They also encourage development of after-school math clubs, where kids can do fun activities with a mathematical theme.

If you are a parent of small children, and you want your kids to learn to enjoy mathematics, this seems to be a good starting point. There are books, videos and a daily math problem via email. There's even (shock!) an app for that.

Thursday, August 1, 2013

Numerical Analysis: Size Matters

A pure mathematician, an applied mathematician and an accountant all apply for the same job on Wall Street, and during their interviews they are all asked the same question: "What is the value of $(1-1/N)\times N + 1 - N$?" The pure mathematician replies "Zero, of course!" The applied mathematician asks "How large is $N$?" The accountant asks "What would you like it to be?" The accountant gets the job ...

... but the applied mathematician is on the right track, especially in an era where much research and virtually all application of mathematics is done on digital computers. I've been meaning to do some posts on numerical analysis for a while now, motivated in part by a large number of questions appearing on optimization support forums that indicate the topic is foreign to a sizable number of people using mathematical programming software. (The poster child for questions of this type: "Why did solver XXX say that the optimal value of my variable is 0.999999823 when it is a 0-1 variable??")

I was a math major end to end. In my first quarter of graduate school, I took a course on numerical analysis -- I can't recall now why, as I was a "pure" mathematician in those days -- and it was a real eye-opener. Identities that hold in symbolic mathematics (including "chalkboard arithmetic") don't necessarily survive the conversion to a digital medium. I have no intention of trying to compress that first course in numerical analysis into a blog post, or even a series of posts, but perhaps an example or two will at least get the attention of some people who previously had not heard of it. The numerical analysis course turned out to be one of the most valuable math courses I took (at any level), and I invariably recommend it whenever someone asks "What courses should I take to prepare for a career in Operations Research?"

I tested the formula in the above joke in a small Java program, using double-precision arithmetic, and the reported answer was zero no matter how large a value of $N$ I tried. This might or might not be a consequence of compiler optimizations -- the compiler doing the arithmetic in a different order than I specified it. So I'll fall back on an example from that graduate course: computing a partial sum of the (divergent) harmonic series $\sum_{n=1}^\infty 1/n$.

One of the first things anyone learns about math, at an early age, is that addition is commutative and associative ... except, as it turns out, when a computer is involved. Here is a small Java program that computes $\sum_{n=1}^N 1/n$ twice, in one case summing in ascending order of $n$ ($1 + 1/2 + 1/3 + \dots + 1/N$) and in the other case summing in descending order of $n$ ($1/N + 1/(N-1) + \dots + 1$). I start with $N=1000$ and double $N$ each time through the main loop. To save myself repetition, let me stipulate that henceforth "large" and "small" refer to the magnitudes of numbers, without regard for their sign. If computer addition were truly commutative and associative, the difference between summing in ascending and descending order would be zero.

for (int limit = 1000; limit <= 1024000; limit *= 2) {
  double up = 0;
  double down = 0;
  // sum 1/n in ascending order of n: 1 + 1/2 + ... + 1/limit
  for (int n = 1; n <= limit; n++) {
    up += 1.0/n;
  }
  // sum 1/n in descending order of n: 1/limit + ... + 1/2 + 1
  for (int n = limit; n >= 1; n--) {
    down += 1.0/n;
  }
  // in exact arithmetic this difference would be zero
  double diff = up - down;
  System.out.println("Limit = " + limit + ", up = " + up + ", down = "
                     + down + ", diff = " + diff);
}

Here is the output it generated:

N          1 + ... + 1/N          1/N + ... + 1          Difference
1000        7.485470860550343      7.485470860550341      2.6645352591003757E-15
2000        8.178368103610284      8.178368103610278      5.329070518200751E-15
4000        8.871390299795198      8.871390299795223     -2.4868995751603507E-14
8000        9.564474984261391      9.564474984261425     -3.375077994860476E-14
16000      10.257590915797937     10.257590915797914      2.3092638912203256E-14
32000      10.95072247160203      10.950722471602038     -8.881784197001252E-15
64000      11.64386183972301      11.643861839723002      8.881784197001252E-15
128000     12.337005114048072     12.337005114048194     -1.2256862191861728E-13
256000     13.030150341487087     13.030150341486952      1.3500311979441904E-13
512000     13.72329654548546      13.723296545485356      1.0480505352461478E-13
1024000    14.41644323776362      14.416443237764337     -7.176481631177012E-13

The first takeaway is that we're not seeing a difference of zero, ever. (Actually, for $N$ up to about 40, the reported difference is zero.) The differences are small, but not zero. There is no obvious pattern to their sign, nor is their size changing monotonically, but they do seem generally to be growing with $N$, which among other things means that if we have a really large value of $N$, we could get a fairly seriously inaccurate answer (at least in absolute, rather than relative, terms).

Why does this happen? The short answer is that computers store non-integral numbers only to a specific finite precision. That creates "truncation errors" in the representations of the numbers (0.1 as stored in a computer is not exactly 1/10) and "rounding errors" when arithmetic is performed. In particular, adding or subtracting a large number and a small number can produce rounding errors. In the extreme, the small number is smaller than the precision at which the large number is stored, and so the sum or difference of the two is identical to the large number as far as the computer is concerned. That sort of error is more likely to occur when $n$ is ascending (we are adding progressively smaller terms $1/n$ to progressively larger partial sums) than when $n$ is descending (at the outset, we add small numbers to small partial sums, and eventually larger numbers to larger sums); so I am inclined to trust the second set of partial sums more than I do the first. Neither, though, is exactly correct.

You might not be too excited about the differences in the table -- even at $N=1,024,000$ the difference is in the thirteenth decimal place -- but consider the following coding error probably made by every rookie programmer to tackle a mathematical computation: testing whether the difference of two floating point (non-integer) numbers equals zero when they should be testing whether it is close enough to zero. Many a program has entered an indefinite loop thanks to a test for equality and a smidgen of rounding error.
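A two-line illustration of the trap in R (the same issue arises in any language that uses IEEE double precision):

0.1 + 0.2 == 0.3                  # FALSE: the stored sum is not exactly 0.3
abs((0.1 + 0.2) - 0.3) < 1e-12    # TRUE: compare against a tolerance instead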

Update: John D. Cook has a related post in a different context (probability and statistics).

Saturday, November 17, 2012

NP-dopey

I have to make an up-front disclaimer: I have a hard time stifling yawns when discussions of which problems are in NP, NP-complete, NP-hard or NP-anything come up. In fact, I have a tempered enthusiasm for computational complexity in general. The mathematics is perfectly solid, and it's of theoretical interest to some people, but I find it hard to connect much of it to the practice of mathematical optimization.

A couple of things prompt this post. First, Jean Francois Puget wrote a very nice blog post about what it means for a problem to be in class NP versus class P (and what the distinction is between an optimization problem and decision problem, as it relates to complexity classification). More importantly, from my perspective, he tied the discussion to customer concerns ("Why does it take so long to solve my problem?" "Can I solve larger instances in reasonable time?"). Part of what I'm going to say here was originally contained in a comment I posted to his blog, but apparently commenting on that blog is NP-difficult; my comment shows up blank. So I'll express myself here, where I control what actually gets approved.

The second trigger was an incident at the recent INFORMS 2012 conference. During one of the coffee breaks, I had a discussion with a few colleagues about publishing optimization papers, during which at least a couple of us agreed that authors have an unfortunate tendency to say "Our problem is NP-something, so we have to use a heuristic [and here's the wonderful heuristic we concocted]." Well, the Traveling Salesman Problem is NP-hard. Heck, I think it was voted the poster boy for NP-hard problems. Nonetheless, give me a network with three nodes, and I will give you an exact solution to it. Catch me after a cup of coffee and before too many beers, and I can even do a five node problem without resorting to a heuristic. Sure enough, not long after the coffee break I was in a session where a presenter said that his problem was NP-hard, and therefore he and his coauthor were forced to use a heuristic. (The presenter was a student, so it's not too late to redeem his soul.)

The following are my personal opinions and, based on conversations I've had, not widely endorsed within the academic community:
  1. Small instances of NP-hard problems can be solved exactly. Large instances are intractable. The same is true for polynomial problems (class P). The definition of "small" is "I can solve this easily". The definition of "large" is "this bugger takes too long to solve".
  2. There was excitement in the academic community when it was discovered that linear programs (or at least those satisfying some conditions I never bothered to memorize) can be solved in polynomial time. As George Dantzig pointed out (and if anyone has the reference for this, please post it in a comment!), a polynomial can be a big number. If I tell you that a particular problem class has an algorithm that is polynomial with complexity $O(n^{2593})$, where $n$ is, say, the number of nodes in a network, does that make it sound easy to you? It sure doesn't to me.
  3. Complexity results are asymptotic. To me, saying that problem A is in class P and problem B is in class NP means that as instances of the two problems grow larger and larger, I'll probably hit the barrier for what I can solve (with a contemporary computing platform, in acceptable time) sooner with problem B than with problem A. The thing is, I don't usually have problems where the instances grow and grow without bound. I have small instances that I use for testing and debugging, and I have the actual instances that need to be solved. If I can solve the instances I need to solve, I really don't care if the algorithm is polynomial time or not.
  4. Implicit in my previous statements were some things that I think get lost sometimes when people are throwing NP-ness around. A problem class belongs to P or NP. An individual instance of that problem does not. Also, computational complexity needs to be viewed at the algorithm level. Problems in class NP that are not known to be in P are those for which nobody knows a polynomial-time algorithm. Problems in class P are those for which we do know polynomial-time algorithms. That doesn't mean you're necessarily using one. The simplex algorithm has nonpolynomial (worst case) complexity, but it is applied to linear programs -- which, as mentioned above, are class P -- quite commonly.
  5. Computational complexity of an algorithm is a worst-case, asymptotic bound. I've already taken exception with the asymptotic part, since I don't typically worry about how computation time changes as my problems morph from huge to disgustingly huge. As far as the worst-case aspect, I don't typically expect to be solving the worst case. (Some days it seems like I am, but I'm probably just being grumpy.) I'd be more interested in knowing what complexity is on "average" problems, but even that might be deceptive - there might be something specific to my versions of the problem that would favor an algorithm that doesn't have the best "average" run times.
  6. In complexity measurement, there are constants that typically get overlooked. Suppose we have a class of problems where we can represent the size of an instance by a single dimension $n$. Suppose further that we have two algorithms, one (A) with complexity $O(n^3)$ and the other (B) with complexity $O(n)$. It's pretty clear that we want to use the linear time algorithm (B) rather than the cubic time algorithm (A), correct? Well, what the complexity stuff means is that (worst case) the time/work for A is bounded by $M_A n^3$ while the time/work for B is bounded by $M_B n$, where $M_A$ and $M_B$ are constants (that we typically do not know). Regardless of the values of those constants, \[\lim_{n\rightarrow \infty} \frac{M_B n}{M_A n^3}=0,\]so for large enough $n$, B is the winner. If $M_A \ll M_B$, though, then for small $n$, A is the faster algorithm. This isn't just a hypothetical. Take sorting a list. Tree sort ($O(n\log(n))$) is asymptotically faster than bubble sort ($O(n^2)$), but tree sorts involve a bunch of overhead, so bubble sorts really are faster on short lists. If you are going to sort a large number of short lists, and you implement tree sort (because, hey, it's got better complexity), you'll pay for it. (A small numerical illustration follows this list.)
  7. The question of whether P = NP remains undecided, as far as I know. (There have been a few allegations of solutions in the last few years, but I don't think any have survived close scrutiny.) Like pretty much everyone, I believe they're not equal ... but if it turns out that P really does equal NP, won't all the people wringing their hands over this or that being NP-hard feel silly?
  8. For authors of academic papers, it's fine to include a statement (and, if it's not obvious, a proof) that your problem (class) is NP-whatever. That's another tidbit added to the body of human knowledge. Just don't think it's license to launch into building some funky new heuristic without first testing standard or obvious solution approaches (assuming such approaches exist).
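To put some made-up numbers behind the point about constants in item 6: suppose algorithm A needs about $0.001 n^3$ operations while algorithm B needs about $1000 n$.

n <- c(10, 100, 1000, 10000)
data.frame(n = n, A = 0.001 * n^3, B = 1000 * n)
# For n = 10 or 100, A does far less work; the two cross over around n = 1000,
# and only beyond that does B's better complexity pay off.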
So, am I out to thoroughly diss computational complexity? No. First, it gives us an indication of whether a problem class is likely to be difficult to solve. I don't know a better way to classify problems into scary-looking versus not scary-looking. I just wish people would not read more into the classification than it warrants.

Second, implementing algorithms usually involves more algorithms. Solving a MIP by branch and bound? How are you going to solve the node problems (meaning what algorithm will you use)? Computing shortest paths through a network as a step in some larger algorithm? Will you use Dijkstra, Floyd-Warshall or something else? How will you sort lists of things? What data structures (which contain algorithms for insertion and retrieval) will you use? When I'm coding an algorithm, I don't have time to test all possible combinations of all possible component algorithms; so I frequently grab the option that I think has best complexity because, frankly, I don't have anything better to go on.

So, bottom line, many instances of problems in class NP can be solved, and large enough instances of class P problems cannot be solved. To paraphrase a former boss (discussing hard work), I'm not afraid of NP-hard problems. I can lie down right next to them and sleep peacefully.

Other voices (in no particular order):

    Sunday, September 23, 2012

    Separable Benders Decomposition

    A reader (I have readers??) asked me a question about Benders decomposition of a mixed-integer program (MIP) when the linear program (LP) subproblem is decomposable: how do we glue together a mix of optimality and feasibility cuts from the subproblems? The short answer is that we don't.

    Rather than repeat a lot of definitions, I'll just refer you to a previous post for some background (if you need it; chances are you don't, because if you're not familiar with Benders, you've already tuned out). I will repeat the formulation of a generic decomposable MIP, just to establish notation:\[ \begin{array}{lrclcc} \textrm{minimize} & c_{1}'x & + & c_{2}'y\\ \textrm{subject to} & Ax & & & \ge & a\\ & & & By & \ge & b\\ & Dx & + & Ey & \ge & d\\ & x & \in & \mathbb{Z}^{m}\\ & y & \in & \mathbb{R}^{n} \end{array} \]The MIP decomposes into a master MIP (containing \(x\) and a real variable \(z\) that serves as a surrogate for \(c_{2}'y\)) and a subproblem (LP) containing \(y\).

    Now suppose that \(B\) and \(E\) are both block-diagonal, with the same block structure and with \(K\) blocks. We can decompose the subproblem into \(K\) smaller LPs. In the master problem, we replace \(z\) with \(z_1+\dots+z_K\), with \(z_k\) the surrogate for the objective contribution \(c_{2k}'y_k \) from the \(k\)-th subproblem.

    When we obtain an incumbent solution \((\hat{x},\hat{z}_1,\dots,\hat{z}_K)\) in the master, we pass \(\hat{x}\) to each subproblem and solve. Some subproblems may be infeasible (generating feasibility cuts); others may have optimal solutions with \(c_{2k}'y_k\gt\hat{z}_k \) (in which case we generate an optimality cut); and some may have optimal solutions with \(c_{2k}'y_k=\hat{z}_k \), requiring no action. If any subproblem is infeasible, then \(\hat{x}\) is infeasible in the original problem regardless of what goes on in other subproblems, and the feasibility cut from the subproblem is a valid feasibility cut in the master. Similarly, if \(\hat{z}_k\) underestimates the objective value in subproblem \(k\), then the optimality cut generated there is valid in the master regardless of what happens in other subproblems. So each feasibility/optimality cut generated in a subproblem can be added to the master problem without modification. There is no need to combine cuts (and, in fact, combining cuts might weaken them).
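    (For concreteness, with dual multipliers of my own naming rather than notation from the earlier post: let \(B_k, b_k, D_k, E_k, d_k\) denote the \(k\)-th blocks of the corresponding matrices and vectors, and let \(u_k \ge 0\) and \(v_k \ge 0\) be the dual vectors for the constraints \(B_k y_k \ge b_k\) and \(E_k y_k \ge d_k - D_k \hat{x}\) in subproblem \(k\). Then an optimality cut takes the form \(z_k \ge u_k'b_k + v_k'(d_k - D_k x)\) with \((u_k, v_k)\) a dual optimal solution, while a feasibility cut takes the form \(0 \ge u_k'b_k + v_k'(d_k - D_k x)\) with \((u_k, v_k)\) an extreme ray of the dual cone.)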

    I'll close with two comments. First, although Benders is usually taught as "solve master, solve subproblem, add one cut, repeat", adding multiple cuts during one iteration is certainly legitimate and possibly beneficial (assuming, as in this case, that you discover multiple valid cuts). The cuts need not all be a single type (optimality or feasibility); a mix is perfectly fine. Second, while it is fine to solve every subproblem at every iteration, you may also opt to solve the subproblems in some order (perhaps cyclical?), abort the cycle the first time a subproblem generates a cut, and add just that one cut. This speeds up each cycle but delays finding cuts. I really do not know which way is better, but I suspect that optimality cuts generated from a value of \(\hat{x}\) that is found to be infeasible in another subproblem may not be all that useful, and multiple feasibility cuts lopping off the same \(\hat{x}\) might be a bit redundant. So I would be inclined to abort the cycle the first time I got a feasibility cut, and add just that cut, but add all the optimality cuts I found if no feasibility cut was generated.

    Sunday, July 29, 2012

    Is Mathematics Only for the Mathematically Inclined?

    A colleague just sent me a link to an opinion piece in the New York Times Sunday Review with the somewhat provocative title "Is Algebra Necessary?" The author (Andrew Hacker, an emeritus professor of political science), while acknowledging the profound role of mathematics in society, questions whether we should require algebra (or, as I read the piece, anything beyond basic arithmetic and some basic level of statistical numeracy) of all school children. He quotes a professor of educational psychology (John P. Smith III) from Michigan State University asserting that “mathematical reasoning in workplaces differs markedly from the algorithms taught in school.” (For what it's worth, MSU has a nationally renowned College of Education ... but I have a hard time trusting anyone who looks that much like a young Geraldo Rivera.)

    I don't want to get bogged down in the details of mathematics education in the U.S., and I certainly agree that the curriculum content (as well as the delivery) needs to be updated periodically. The mathematics of computing was not on anyone's radar when I was an urchin, and it's extremely important today. I'll also stipulate that not every adult in the U.S. (or elsewhere for that matter) needs to be proficient in algebra or trigonometry, let alone calculus. That said, my immediate reaction was that I thought I saw two flaws, one perhaps minor but the other crucial, in the author's argument.

    I'll address the minor flaw first. Professor Hacker says (near the middle of the second page):
    Of course, people should learn basic numerical skills: decimals, ratios and estimating, sharpened by a good grounding in arithmetic.
    By "estimating" I assume he means computing (or intuiting) an approximate value (order of magnitude, interval, ...) for some uncertain quantity (variable). That implies the ability to determine a formula for the variable in terms of other parameters of the problem. For example, if I have a recipe (for a meal, or for a medication I must administer) that calls for a water-based solution with 40% active ingredient by volume, and I need five fluid ounces of the solution, about how much active ingredient do I need? That sounds to me like solving a simple linear equation, which I believe falls into the category of "algebra". (As a sidebar, in the event that this is a medication to be administered to me, and a nurse or pharmacist is doing the calculation, I would really appreciate it if they could do better than just an "estimate".) What if I know that I have five fluid ounces of active ingredient (which must all be used) and require a 40% solution? Can I estimate the amount of water required without knowing some basic algebra?

    The major flaw with Professor Hacker's argument is that if we reduce K-12 mathematics education to the bare minimum, as he suggests, it will shut the doors to many college majors before students reach college. When I was in school, separate math classes did not begin until junior high school, so through grade 6 I received the same mathematics instruction everyone else did ... which in Professor Hacker's universe would be thin gruel indeed. I assume the system still operates approximately that way.  The mathematically inclined public school student (which described me at that age) will hopefully be able to go beyond the basic curriculum ... somehow ... at least if his or her parents can afford private tutoring. The generic college-bound student, however, will matriculate with the ability to figure out the tip on a restaurant check, and not much else.

    I'm reminded of an incident from graduate school. Back then, registration was done manually in a large gym called "The Pit", and our math department used graduate students to relieve the secretaries of some of the duty (doling out punch cards to students seeking classes). One summer, I was working The Pit with a fellow doctoral student, "Mean Dave" Green. (Don't ask about the nickname, just trust me: he earned it.) An undergrad came up and requested Algebra I. Dave looked at him and asked "Didn't you take that from me fall quarter?" The kid indeed had (and failed the class). Dave asked why he hadn't repeated it immediately, in winter quarter, while the memory was fresh.  The kid had - and failed again. As he did spring quarter. So here he was asking for it for the fourth time. I innocently inquired what his major was. His answer: electrical engineering. The requirements for EE then were: algebra I and II; calculus I through IV; linear algebra; and ordinary differential equations.

    Now picture that kid's children (or, um, grandchildren? sigh) reaching college today with Professor Hacker's suggested mathematics curriculum under their belts. Electrical engineering? Not going to happen; the math prerequisites alone would add two years to their degree, charitably assuming that a university was willing and able to staff that many remedial math classes, and charitably assuming that someone 18 years old with a carefully honed disdain or fear (or both) of mathematics is still able to learn math. Other engineering majors? Not likely. (If, somehow, Professor Hacker's position prevails and universities continue to pump out civil engineers nonetheless, remind me to stay off bridges and out of tunnels.)

    Computer science uses enough mathematics that I suspect it would be tough to handle. Some business majors might still be open (personnel management?), but the more lucrative ones (finance and accounting in particular) might be off the table for those who had not gotten private tutoring. Physical sciences? So long.

    Before I added "emeritus" to my title, I sometimes found myself answering questions from students (or parents) about how much math to take during freshman and sophomore years of college. My response was typically some variation of "as much as you can stand". Most college students do not commit to a major until their junior year. At that point, if you have racked up more math than your major requires, you can usually count the rest as elective credits; but if you are short of math, it can freeze you out of some majors. I've seen a similar phenomenon with graduate school: students who did a non-technical undergraduate major, taking the minimum amount of mathematics they could (sometimes none), and then decided they wanted to go to graduate school in a program with stiff mathematics requirements. Extrapolating that to cover students whose public school mathematics training stopped at counting and basic calculator punching frightens me.

    We can legitimately debate what the basic K-12 mathematics curriculum -- the foundation for both attending college and going into a trade -- should be; but if we are going to reduce it as much as Professor Hacker suggests, we'd better come up with a genetic test that can identify a fetus's future trade or college major by birth.

    Update: To my utter lack of surprise, other people are weighing in on this. I'm not actively looking for all the commentary (and I'm sure it will be voluminous), but as I come across posts I'll link them here.

    Thursday, June 7, 2012

    Sage Shutdown

    I'm starting to learn how to use the Sage mathematics system, an incredibly comprehensive compilation of open-source mathematics software integrated with a unified scripting language (a dialect of Python) and a nifty browser-based notebook interface. So far, so good, with one minor but annoying exception. I invariably use the notebook interface, which launches a local web server (in Python, of course). When I exit all open notebooks, click the "sign out" link and zap the browser window(s), a bunch of processes spawned by Sage (including the server) remain running in the background.

    You wouldn't know it to see my house or office, but I'm a bit of a "neat freak" when it comes to processes running on my PC. I'd like a clean shut down, but strangely enough there is no link in the browser interface that will shut down all Sage-related processes. One of those processes is the Sage "cleaner daemon", which theoretically watches for shutdown and tidies up ... except it doesn't, and there seems to be no way for me to tell it that I want Sage to shut down.

    There is an answer, posted by William Stein (originator of the Sage project) himself. It's just not entirely satisfying. Following his advice, I created a two line shell script that looks up the PID of the cleaner process and kills it, which triggers a cascade that kills all other Sage-related processes. It works as advertised. I parked it in my $HOME/.sage directory (so that it will survive any upgrades to Sage) and added it to Mint's menu. Presumably the same thing, with minor modifications, works on Windows and OS/X.

    All this strikes me as a bit user-unfriendly, though. It cannot be that hard to make the same thing available from the notebook interface, at least if you are logged in as the administrator. (In a multi-user environment, which does not apply to me, you would not want a generic user to have the ability to kill the server.) I'm not the only one wishing for this, either.

    Saturday, July 16, 2011

    Facts, Beliefs ... and Budgets

    Melissa Moore (@mooremm), Executive Director of INFORMS, recently tweeted the following question: "What OR tools would you use if you were asked to help solve the US Federal budget/debt impasse?"  My initial reaction (after verifying that a flammenwerfer is not considered an OR tool) was that OR and analytics would be of no use in the budget debate (debacle?).  OR and analytics rely on facts and logic; it is unclear that either side of the debate is interested in facts or willing to be constrained by logic.

    The question did set me to thinking about the difference between facts and beliefs.  I have a hard time sorting out when demagogues, whether politicians or media bloviators, are espousing positions they actually believe and when they are simply pandering for ratings/votes.  (My cynicism is hard won: I grew up in New York, went to school in New Jersey, and cast my first vote to reelect Richard M. Nixon.  It's been downhill from there.)  For the sake of argument, let's stipulate that both sides are acting on beliefs they truly hold.  When I was younger it seemed to me that, however venal either side's motives might be, both the left and the right were capable of negotiating based on some common understanding of governance and the political, social and economic realities of the country they governed.  It's hard to trade horses, though, when one side can't tell a horse from a zebra and the other can't tell a horse from a camel. Today, one party thinks that the answer to any question that does not contain the phrase "gay marriage" is "cut taxes".  The other side thinks that the answer to any question that does not contain the phrase "gay marriage" is "tax the rich".  That the proposed solution might not work is simply inconceivable (as is the possibility that the other side's solution might work).

    The somewhat unnerving truth, however, is that everything we think we know as a fact (raw data aside) is ultimately a belief.  My training is in mathematics. Casual users of mathematics, and even forgetful mathematicians, tend to think that what has been "proved" (i.e., a theorem) is definitively true. In reality, theorems are merely statements that must follow logically from a set of axioms (beliefs). The system of logic we accept is itself a matter of belief, but in the interest of avoiding a painful flashback to an undergraduate formal logic course I'll drop that line of thought right now. As in mathematics, so too in the physical sciences: theory arises from a mix of assumptions and empirical evidence; when new evidence clashes with the theory, modifications are made; and when the modifications become untenable, some assumption is altered or deleted and the theory is rebuilt.  (Remember when the speed of light was a constant?)

    So if mathematics and physical sciences are built on leaps of faith, we really cannot fault elected representatives (and economists) for doing the same.  What we perhaps can demand, though, is that these beliefs at least be acknowledged as beliefs (not "proven facts"), and that decision makers attempt to examine the likely impact of any of those beliefs turning out false. As a parallel (pun deliberate), consider Euclid's Elements, written ca. 300BC, in which Euclid developed many theorems of what we now refer to as "Euclidean" geometry based on five postulates.  The postulates appear self-evident, and mathematicians over the centuries tried unsuccessfully to derive one from the others (turning the derived one into a theorem). In the 19th century, Nikolai Lobachevsky famously replaced Euclid's fifth postulate with a negation of it, perhaps hoping to prove the fifth postulate from the others by contradiction. Rather than finding a contradiction, he invented hyperbolic geometry, which is not only consistent as a mathematical system but has actually found use (those bleeping physicists again).

    So, back to the original question: can OR bring any useful tools to bear on the budget debate? With enough time and effort, and exploiting the systems perspective that underlies OR, perhaps we could diagram out the interplay of all the assumptions being made (consciously or unconsciously) by each side; and perhaps, using simulation models based on those assumptions and calibrated to historical data, we could explore the consequences of each side's preferred solution (or, for that matter, any compromise solution) should any specific assumption not hold up. It would be a massive undertaking, and I am not confident it would be productive in the end. Zealously held beliefs will not yield easily to "what if" analyses.

    Tuesday, March 1, 2011

    Math and Science Can Be Sexy

    I just tripped over the following facts (one of which I already knew):
    • Hedy Lamarr, vintage Hollywood hottie, co-invented frequency-hopping spread spectrum (with composer George Antheil), a technique that underlies much of today's wireless communication, including cell phones.
    • Danica McKellar, teenage TV star (and successful actress since), is co-author of the Chayes–McKellar–Winn theorem (as an undergraduate math major).  (This one I knew.)
    • Mayim Bialik, who as a teen had the title role in the TV series "Blossom" and now plays a neurobiologist on "The Big Bang Theory", actually has a Ph.D. in neuroscience.
    Now we just need some sexy male mathematicians and scientists (besides me, that is) to balance the list.

    Sunday, January 9, 2011

    Physiology and Mathematics Education

    As a math lifer, I'm concerned about what more and more people are calling a crisis in STEM (Science, Technology, Engineering and Mathematics) education in the United States. It seems as if every few days we read about the U. S. lagging behind other countries in math or science test scores. We continue to lead the world in innovation for the moment, but it's hard to picture how we can sustain that lead without producing (or importing) more scientists and engineers (and, dare I say, mathematicians).  Equally importantly, in the age of the "information economy", if average U. S. citizens (not the ones in the right tail on math scores) come up short in STEM ability, where are they going to find jobs that pay respectable salaries?  (This is very important to me:  I'm on the cusp of retirement, and I'm counting on them both to pay down the burgeoning national debt and to sustain my lifestyle once my paychecks stop.)

    The press (and, thus, the populace) are gradually waking up to the looming crisis, as evidenced by government panels, legislation, academic workshops and periodic episodes of public hand-wringing.  Naturally, this spawns a wave of finger-pointing as we seek (a) one single explanation for what is undoubtedly a phenomenon with multiple roots and (b) someone (other than ourselves) whom we can blame.  A backgrounder from the Heritage Foundation aptly points out that if the K-12 pipeline does not produce students well grounded in science and math, colleges and universities will be hard-pressed to find students willing and able to major in those subjects (and, by extension, graduate program enrollments will also suffer).

    Now please indulge me in an apparent digression that will actually tie back (I hope).  I recently found myself engaged in a conversation about math education, math ability and its connection (if any) to gender.  In this conversation, I recalled a colleague (a professor of economics) complaining to me that his daughter had essentially been told outright by a high school teacher that, being female, she should have diminished expectations for learning upper level mathematics (I suspect this meant calculus, but the conversation was long ago and the details elude me).  My colleague was justifiably apoplectic.

    I also recalled some misadventures from my graduate student days, when I taught a math class for elementary education majors.  On paper, the course dealt with how to teach math to K-6 students.  In practice, it meant (gulp) teaching K-6 math to college students majoring in elementary education.  Lest you think I exaggerate, let me share an anecdote.  My then-girlfriend also taught the course, in the summer, when the students were elementary school teachers returning to pick up additional credits.  One student asked if she could bring her eight-year-old son to class to avoid daycare hassles, a request my girlfriend was happy to accommodate.  On the day of the first exam, she saw him sitting with nothing to occupy him, so she gave him a spare copy of the exam, figuring he could color on the back.  Instead, he flipped it over, did the exam -- and received the highest score in the class!

    So, on the one hand, we have people telling children that they are doomed to be weak at math (and science?) because they lack a Y chromosome.  I suspect similar comments are made based on race or other factors.  On the other hand, we have teachers (at least in elementary school) who themselves are weak in math (and science?) and are inclined to pass their fear of the subject on to their students.  (If the teacher finds something difficult, he or she is likely to communicate to the pupil that the pupil should not worry about finding it challenging.)  On the gripping hand (and this is purely my conjecture), I suspect we also discourage students at all levels from taking STEM courses by rewarding them for weak work in non-STEM courses.  I think it's harder to give inflated grades in STEM subjects because they tend to have definitive correct and incorrect answers.  (At least it's harder for me to give them.)  At the same time, it's easy for a student to be discouraged by having to work hard for middling grades in STEM subjects when they can get better grades with less effort in other classes.

    What ties this to the STEM crisis, for me, is a recent article in Newsweek about physiological triggers for improvement (or degradation) in the human brain.  Specifically, according to author Sharon Begley,
    Finally, being told that you belong to a group that does very well on a test tends to let you do better than if you’re told you belong to a group that does poorly; the latter floods you with cortisol, while the former gives you the wherewithal and dopamine surge to keep plugging away.
    So the effect of communicating diminished expectations to students may be more than psychological; it may trigger physiological changes that create a self-fulfilling prophecy ... and, in doing so, deepen the STEM crisis.

    Thursday, December 23, 2010

    Interesting Essay on Mathematics and Education

    Many thanks to Larry at IEORTools for pointing out an essay by Dr. Robert H. Lewis of Fordham University entitled "Mathematics: The Most Misunderstood Subject".  It's a worthwhile read for any math geek (and, face it, if you're reading OR blogs you're probably a math geek). Besides some insightful comments about the state of mathematical education, it's a nice reminder that math is not just about answers.

    Tuesday, December 21, 2010

    New Toy for the Holiday: Google N-gram Viewer

    Staring at one of the graphs in a recent post on the bit-player blog ("Googling the lexicon"), I was struck by a downward trend in the frequency of the phrase "catastrophe theory".  That led me to do a few searches of my own on Google's n-gram viewer, which is one of those insidiously fascinating Internet gadgets that suck away so much productivity. (Who am I trying to kid?  It's Christmas break; how productive was I going to be anyway??) Being a closeted mathematician, I first did a comparison of three recent "fads" (perhaps an unfair characterization) in math:  catastrophe theory, chaos theory and fractal geometry.


    Then I took a shot with a few terms from optimization: genetic algorithms, goal programming, integer programming and linear programming.
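    For the curious, here is a minimal sketch of how one might pull the same data programmatically in R rather than through the web viewer. It assumes the third-party ngramr package is available (I used the viewer itself for the graphs here, not this package); its ngram() function queries the Google Books n-gram data and returns a data frame, and the Year/Phrase/Frequency column names below are the ones ngramr documents.

        # Minimal sketch, assuming the ngramr package is installed: install.packages("ngramr")
        library(ngramr)
        library(ggplot2)

        # Frequencies for the three mathematical "fads", 1950 onward.
        fads <- ngram(c("catastrophe theory", "chaos theory", "fractal geometry"),
                      year_start = 1950)

        # Frequencies for the optimization terms.
        opt <- ngram(c("genetic algorithms", "goal programming",
                       "integer programming", "linear programming"),
                     year_start = 1950)

        # Plot each set; Year, Phrase and Frequency are the columns ngram() returns.
        ggplot(fads, aes(x = Year, y = Frequency, colour = Phrase)) + geom_line()
        ggplot(opt, aes(x = Year, y = Frequency, colour = Phrase)) + geom_line()

    The resulting charts should roughly match the viewer's output, up to differences in corpus version and smoothing.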



    I find it a bit surprising that chaos theory gets so much more play than fractal geometry (which has some fairly practical applications, as Mike Trick noted in his eulogy for Benoit Mandelbrot). Two other things (one I find serendipitous, one less so) struck me about the graphs.

    The serendipitous note is that references to catastrophe theory are tailing off.  I recall sitting at a counter in a chain restaurant during my graduate student days, scarfing down breakfast while trying to chew my way through a journal article on catastrophe theory (which was rather new at that time).  Differential equations (especially nonlinear PDEs) were not my strong suit, to put it charitably, and the article was heavy going.  A few years later social scientists heard about catastrophe theory.  As I recall, the canonical image for it was a trajectory for a PDE solution on a manifold that "fell off a cliff" (picture your favorite waterfall here).  Social scientists are in the main clueless about differential equations (let alone PDEs on manifolds), but waterfalls they get.  The next thing I knew, all sorts of political science and maybe anthropology papers (not sure about the latter) were saying "catastrophe theory this" and "catastrophe theory that", and I'm pretty sure they were abusing it rather roundly.  So I interpret the decline in references as at least in part indicating it has reverted to its natural habitat, among a fairly small group of mathematicians who actually grok the math.

    The less serendipitous note is that all the graphs are showing downward trends.  Since the vertical axis is in percentage terms, I cannot attribute this to a reduction in the number of published books, nor blame it on the Blackberry/iPod/whatever technical revolution.  My fear is that interest in mathematics in general may be on the decline.  Perhaps what we need now is for the next installment of the "Harry Potter" series to have Harry defeating the Dark Lord by invoking not incantations but mathematical models (but not catastrophe theory, please).