Thursday, October 16, 2014

Java Gotchas

I was writing what should have been (and, in the end, was) a very simple Java program yesterday. It wound up taking considerably longer than it should have, due to my tripping over two "gotchas", which I will document here for my own benefit (the next time I trip over them).

Issue 1: Importing from the default package


The program I was writing was actually a homework assignment for a MOOC. It involved a single Java class, and I was writing it in the NetBeans IDE (which will prove germane). I had earlier taken another MOOC (Algorithms, Part I, taught by Robert Sedgewick and Kevin Wayne of my alma mater, Princeton University -- highly recommended!) and downloaded a pair of Java libraries associated with their book (Algorithms, 4th ed.). The libraries are licensed under the GPLv3, and I wanted to use a convenience class from one in the new program. So I added it to the library list in my NetBeans project and serenely tried to import the class I wanted. And failed. Miserably.

After various unproductive dinking around (and swearing), I realized that the problem was that my new Java class was in a named package but the class I was importing was in the library's default package. Using named packages is the norm in Java -- NetBeans will issue a warning if you try to park a class in the default package -- but for reasons related to grading in their MOOC, Sedgewick and Wayne had put all their classes in the default package (and required classes written by students for homework to be in the default package). It turns out that Java will not let a class in any other package import from the default package. Classes in the default package can import from each other and from classes in named packages. I guess this is a security matter. Sedgewick and Wayne provide alternative versions of their libraries using named packages, and point out this very issue in the "Q & A" section of their download page, which I had read and immediately forgotten. Hence this note to myself (since I will inevitably forget again, in some other context).

Issue 2: "Could not find or load main class ..."


Once I resolved the aforementioned problem, I thought I was good to go -- no compilation errors -- and so I tried to run my program (emphasis on the word "tried"). NetBeans announce that it could not find or load my main class. The message is a trifle unhelpful, as it does not specify whether the issue is that the main class could not be found or that it could not be loaded.

Since I selected the main class from NetBeans's run configuration menu, it seemed apparent to me that the class could be found, and for the life of me I had no idea why it could not be loaded. So I was off on a Google search that found a fair number of (somewhat dated) pages with people asking for help and specifying the same error message. Unfortunately, none of the specifics were germane to me, and I did not find any responses that helped.

Eventually I correctly guessed this issue, and I'll record it here in case anyone else is searching for that error message. The problem in my case was the path to the NetBeans project. Recall that the program I was writing was for another MOOC. I had named the parent folder with the full name of the MOOC, including spaces and a colon (which I escaped). My operating system (Linux Mint) has no problem with the presence of the colon, but apparently NetBeans (or Ant, which NetBeans uses to process build scripts) did. As a veteran of DOS and survivor of early versions of Windows, I cut my teeth on 8.3 file names (and directory names), so I suppose I should know better than to get creative with folder names. Renaming the parent folder without any punctuation marks solved that problem.

Not an issue: Regex


This last item is not a problem, it's just a neat web site I found (in an answer on Quora) whose link I wish to preserve. Online regex tester and debugger provides a web application that lets you test regular expressions to see if they do what you think/want them to do. Very useful -- which I'd seen it sooner!

Sunday, October 12, 2014

The Reciprocal Normal Distribution

A recent question on OR-Exchange dealt with the reciprocal normal distribution. Specifically, if $k$ is a constant and $X$ is a Gaussian random variable, the distribution of $Y=k/X$ is reciprocal normal. The poster had questions about approximating the distribution of $Y$ with a Gaussian (normal) distribution.

This gave me a reason (excuse?) to tackle something on my to-do list: learning to use Shiny to create an interactive document containing statistical analysis (or at least statistical mumbo-jumbo). I won't repeat the full discussion here, but instead will link the Shiny document I created. It lets you tweak settings for an example of a reciprocal normal variable and judge for yourself how well various normal approximations fit. I'll just make a few short observations here:
  • No way does $Y$ actually have a normal distribution.
  • Dividing by $X$ suggests that you probably should be using a distribution with finite tails (e.g., a truncated normal distribution) for $X$. In particular, the original question had $X$ being speed of something, $k$ being (fixed) distance to travel and $Y$ being travel time. Unless the driver is fond of randomly jamming the gear shift into reverse, chances are $X$ should be nonnegative; and unless this vehicle wants to break all laws of physics, $X$ probably should have a finite upper bound (check local posted speed limits for suggestions). That said, I yield to the tendency of academics to prefer tractible/well-known approximations (e.g., normal) over realistic ones.
  • The coefficient of variation of $X$ will be a key factor in determining whether approximating the distribution of $Y$ with a normal distribution is "good enough for government work". The smaller the coefficient of variation, the less likely it is that $X$ wanders near zero, where bad things happen. In particular, the less likely it is that $X$ gets anywhere near zero, the less skewness $Y$ suffers.
  • There is no one obvious way to pick parameters (mean and standard deviation) for a normal approximation to $Y$. I've suggested a few in the Shiny application, and you can try them to see their effect.
I'd also like to give a shout-out to the tools I used to generate the interactive document, and to the folks at RStudio.com for providing free hosting at ShinyApps.io. The tool chain was:
  • R (version 3.1.1) to do the computations;
  • R Studio as the IDE for development (highly recommended);
  • R Markdown as the "language" for the document;
  • Shiny to handle the interactive parts;
  • various R packages/tools to generate the final product.
It's obvious that a lot of loving effort (and probably no small amount of swearing) has gone into the development of all those tools.

Thursday, October 2, 2014

Multiple Children - Again!

I thought I had exhausted this topic, but no such luck ...

As noted in a previous post ("When the OctoMom Solves MILPs"), CPLEX (and I believe most other integer programming solvers) are have a design limitation of at most two child nodes per parent node in the search tree. Personally, I don't consider that limitation a problem, but occasionally someone comes along wanting more children. In the aforementioned post, I described a way to work around the limitation by creating nodes that combined two or more of the intended children and then splitting those nodes. In a follow-up post ("Birthing a Litter in CPLEX"), I proved a test case (assigning students to teams so that average GPAs for all teams are comparable) and supplied some Java code. I'll repeat the key illustrations here for context.

The search tree our hypothetical user wants contains a child under the root for each choice for the number of teams to create.

What can actually be accomplished, using the workaround, looks like the following.

Note the presence of three "meta-nodes" (with team ranges marked alongside them in blue) in addition to the nodes from the first diagram.

A point was recently raised in a discussion about this on OR-Exchange: those extra meta-nodes require the solution of additional node LPs. My first reaction is that the extra pivots are not likely to be a problem, for two reasons. First, the default for CPLEX (and, again, I suspect most solvers) is to use dual simplex pivots to restore feasibility when solving the child node's LP (which differs from the parent's LP just by the cut(s) added to create the child node). Second, I suspect that some of the same pivots would occur if you were to solve the leaf node LPs directly. If a substantial portion of the pivots that solve the "6-7 teams" node would occur in both the "6 teams" node and the "7 teams" node, were those nodes spawned directly from the root, then solving the meta-node LP might actually be more efficient (by virtue of doing those pivots once rather than twice).

All that said, I got curious whether I could con (er, "induce") CPLEX into creating meta-nodes that were direct clones of their parents. The goal was a tree looking as follows, where the nodes marked with an asterisk (*) are direct clones of (identical to) their parents.

The answer, for the moment at least, is "yes". I have placed sample Java code in a Git repository open to the public for download. (There's also an issue tracker in the unlikely event that you should find a bug.) See the side panel for license information. It's based on the code from the second post mentioned above, and as such is slightly over-engineered (I did not take the time to eliminate features from the previous code that are not needed here).

The key lines, appearing in the branch callback, are the ones that create a child that is a direct clone of the parent:

// second child clones the parent with adjusted node data
info = new BranchInfo(min + 1, max);
id = makeBranch(new IloRange[0], obj, info);

The info variable holds "node data" that is attached to the clone node; variable id contains the node ID assigned by CPLEX to the clone node. Fair warning: the code relies on an "undocumented feature" (programmer oversight?) of the Java API, which means it could cease to work in some future version. The trick is to pass the makeBranch() method an empty (length 0) array of "cuts" to add. Note that passing null, rather than an empty array, results in an uncaught exception.

I have no idea whether other APIs have the same "feature".