OR in an OB World: Threads and Memory

Friday, January 25, 2019

Threads and Memory

Yesterday I had a rather rude reminder (actually, two) of something I've known for a while. I was running a Java program that uses CPLEX to solve an integer programming model. The symptoms were as follows: shortly after the IP solver run started, I ran out of RAM, the operating system started paging memory to the hard drive, and the resulting hard drive thrashing made the system extremely sluggish (to put it charitably). Long after I regained enough control to kill the program, there was a significant amount of disk activity (and concomitant noise) as the system gradually came back to its senses.

How did this happen? My system has four cores, which means CPLEX defaults to running four parallel threads when solving models. What's not always obvious is that each thread gets a separate copy of the model. (I'm not sure if it is the entire model after presolving or just most of it, but it's definitely a large chunk.) In my particular case, the model begins with an unpredictably large number of constraints, and when that number got big, the model got big -- not big enough to be a problem if I used a single thread, but too big to get away with four copies of it ... or, as it turned out, three copies. (The second thrashing event was when I tried to run it with three threads.)

Parallel threading is great, but there are two caveats associated with it. First, speedup is typically a sublinear function of the number of threads, meaning that doubling the number of threads will not cut run time in half, tripling the number of threads will not cut run time to a third of the single-thread time, and so on. Second, if you are dealing with a largish model, you might want to try running it for a little while with a single thread to see how much memory it eats, and then decide how many copies your system can handle comfortably. That number of copies is an upper bound on how many threads you should use.
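The arithmetic behind that upper bound can be sketched in a few lines of Java. This is only a rough illustration, not anything CPLEX provides: the class name `ThreadCap`, the method `maxThreads`, and the memory figures in `main` are all hypothetical, and the sketch assumes that the memory observed in a single-thread run is a reasonable stand-in for the size of each per-thread copy.

```java
public final class ThreadCap {

    /**
     * Returns the largest thread count whose model copies fit in the memory
     * budget, clamped to the range [1, cores].
     *
     * perCopyBytes: memory observed for one copy (e.g., a single-thread run)
     * budgetBytes:  memory we are willing to let the solver use
     * cores:        number of cores, the usual default thread count
     */
    static int maxThreads(long perCopyBytes, long budgetBytes, int cores) {
        if (perCopyBytes <= 0 || cores <= 0) {
            throw new IllegalArgumentException(
                "perCopyBytes and cores must be positive");
        }
        long fit = budgetBytes / perCopyBytes;      // copies that fit in the budget
        long capped = Math.min(fit, (long) cores);  // never exceed the core count
        return (int) Math.max(1L, capped);          // always allow one thread
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long gib = 1024L * 1024L * 1024L;

        // Hypothetical figures: 5 GiB seen in a single-thread trial run,
        // and 12 GiB of RAM we are prepared to give to the solver.
        long perCopy = 5 * gib;
        long budget = 12 * gib;

        int threads = maxThreads(perCopy, budget, cores);
        System.out.println("Use at most " + threads + " thread(s)");
    }
}
```

With these made-up numbers on a four-core machine, the cap works out to two threads. The result could then be handed to the solver's thread-count setting (in the CPLEX Java API that would be the `Threads` parameter, set through `setParam`). Note that the method returns 1 even when a single copy exceeds the budget, so a result of 1 does not by itself mean the model will fit.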

2 comments:

Richard Oberdieck, January 28, 2019 at 3:06 AM
Thanks for sharing this; it is really interesting. Computational tests seem to confirm that, with the current software setup, there is definitely an upper bound on the number of threads that gives a speedup (it seems to be 10-20 at the very most).

My question, though, is whether you think we could fundamentally change this by changing the architecture of the algorithms. That is, do all the threads need to have a local copy of the model? Or is this just the way it is going to be?
