Yesterday I had a rather rude reminder (actually, two) of something I've known for a while. I was running a Java program that uses CPLEX to solve an integer programming model. The symptoms were as follows: shortly after the IP solver run started, I ran out of RAM, the operating system started paging memory to the hard drive, and the resulting hard drive thrashing made the system extremely sluggish (to put it charitably). Long after I regained enough control to kill the program, there was a significant amount of disk activity (and concomitant noise) as the system gradually came back to its senses.
How did this happen? My system has four cores, which means CPLEX defaults to running four parallel threads when solving models. What's not always obvious is that each thread gets a separate copy of the model. (I'm not sure if it is the entire model after presolving or just most of it, but it's definitely a large chunk.) In my particular case, the model begins with an unpredictably large number of constraints, and when that number got big, the model got big -- not big enough to be a problem if I used a single thread, but too big to get away with four copies of it ... or, as it turned out, three copies. (The second thrashing event was when I tried to run it with three threads.)
Parallel threading is great, but there are two caveats associated with it. First, performance is a sublinear function of the number of threads, meaning that doubling the number of threads will not cut run time in half, tripling the number of threads will not cut run time to a third the single-thread time, and so on. Second, if you are dealing with a largish model, you might want to try running a little while with a single thread to see how much memory it eats, and then decide how many copies your system can handle comfortably. That's an upper bound on how many threads you should use.