Saturday, June 27, 2015

The rJava Nightmare

I like R. I like Java. I hate the rJava package, or more precisely I hate installing or updating it. Something (often multiple somethings) always goes wrong. I forget that for some reason I need to invoke root privileges when installing it. It needs a C++ library that I could swear I have, except I don't have the developer version of that library installed. Whatever. See this post from 2011 for a previous misadventure.

Today's battle was fought after a "successful" installation on my laptop. The good news: the package was installed, and RStudio knew it was installed. The bad news: neither RStudio, nor R run in a terminal, would let me load it. The key nugget in the error message: cannot open shared object file: No such file or directory
Now was definitely installed (as part of Oracle Java 8), so this was a matter of somehow breaking the news to R and/or RStudio. Search for "rJava" and "libjvm" on Google and you'll find a lot of hits, because I'm apparently not the first person to trip over this. Most solutions involving adding the requisite directory (which for me is ${JAVA_HOME}/jre/lib/amd64/server) to the LD_LIBRARY_PATH environment variable in your .profile file. I tried both that and .bashrc, with the result that R in a terminal agreed to load rJava but R running in RStudio still did not. Some of the posts I found said that this was expected if you made the tweeked .bashrc, but tweaking .profile instead was supposed to fix things. Except it didn't.

Fortunately, my laptop and desktop both run the same OS (Linux Mint), pretty much the same version, and rJava was working fine on the desktop. So the trick was to figure out how they differed. The route to does not appear in LD_LIBRARY_PATH on my PC, and is not mentioned in either of those profile files. It turns out that the path is set in the profile for R (/usr/lib/R/etc/ldpaths, to be specific), in the definition of a new variable R_JAVA_LD_LIBRARY_PATH. On the laptop, that was set to ${JAVA_HOME}/lib/amd64/server; on the PC, it was ${JAVA_HOME}/jre/lib/amd64/server. The latter path is correct on both the PC and the laptop. So I edited ldpaths on the laptop (as root), inserted the missing piece of the path, and lo and behold both R in a terminal and R inside RStudio could execute "library(rJava)" without bellyaching!

Note that, per my 2011 post, I had run sudo R CMD javareconf (more than once), with the correct value of the JAVA_HOME environment variable in place. Why that did not fix the path in ldpaths, and why the path was wrong in the first place, remain mysteries to me.

Update (12/30/16): I ran into the same problem, with a somewhat different cause. Updated details are in this post.

Thursday, June 25, 2015

Selecting the Least Qualifying Index

Something along the following lines cropped up recently, regarding a discrete optimization model. Suppose that we have a collection of binary variables $x_i \in B, \, i \in 1,\dots,N$ in an optimization model, where $B=\{0, 1\}$. The values of the $x_i$ will of course be dictated by the combination of the constraints and objective function. Now suppose further that we want to identify the smallest index $i$ for which $x_i = 1$ for use within the model (hence before solving the model). I'll consider two possibilities:
  1. we want to store that minimal index in an integer variable (which I will call $z$); or
  2. we want to create an alternative set of binary variables ($y_i \in B, \, i \in 1,\dots,N$) where \[ y_{i}=1\iff i=\text{argmin}\left\{ j:x_{j}=1\right\}. \]
I'll focus on the second case, which is a stepping stone to the first case. Once we have $y$ defined as in the second case, $z = \sum_{i=1}^n i y_i$ solves the first case.

To handle the second case, we can add the following constraints: \begin{alignat*}{2}
y_{i} & \le x_{i} & \; & \forall i\\
y_{i} & \ge x_{i}-\sum_{j<i}x_{j} &  & \forall i\\
iy_{i} & \le i-\sum_{j<i}x_{j} &  & \forall i>1
\end{alignat*}Left to the reader as an exercise: to confirm that this works (or, if you find an error, to kindly post it in a comment).

Friday, June 19, 2015

Alternative Versions of R

Fair warning: most of this post is specific to Linux users, and in fact to users of Debian-based distributions (e.g., Debian, Ubuntu or Mint). The first section, however, may be of interest to R users on any platform.

An alternative to "official" R

By "official" R, I mean the version of R issued by the R Foundation. Revolution Analytics has been a source of information, courses and a beefed-up commercial version of R (Revolution R Enterprise). They have also produced an open-source version, Revolution R Open. It is freely available from their Managed R Archive Network (MRAN). I'll leave the description of its features to them, and just mention the one that caught my eye: coupled with the Intel Math Kernel Library, it provides out-of-the-box multithreading, which should speed up long computational runs.

Revolution Analytics is now a Microsoft property, and I can't being to describe the cognitive dissonance caused by downloading open-source software from Microsoft. Nonetheless, I decided to install RROpen. One catch is that it requires reinstalling pretty much every R package I use, which can be a bit time consuming. The remainder of this post is about the other catch: how to get RROpen and "official" R (henceforth "R" without any qualifications) to coexist peacefully, on a Debian-based distribution (Mint in my case).

What's the problem?

RRO and R install in different directories, so why is anything special necessary to let them coexist? I found out the hard way. After installing RRO (on a system that already had R), I found that the command R in a terminal launched /usr/bin/R, which RRO had replaced with its R binary, making itself the default choice. That was fine for me. This morning, though, the system package manager notified me that the r-base package had been updated. I authorized the update, which among other things resulted in <sigh>/usr/bin/R being replaced by the "official" R version</sigh>.

That can be fixed manually from a terminal (as root), but it appeared that I was doomed to perpetually mediating between the two versions whenever the r-base package updated. This could be why Rich Gillin ("owner" of the Google+ Statistics and R community) recommended uninstalling R before installing RRO.

What's the solution?

Debian-based distributions of Linux use the Debian alternatives system to manage contention between different programs that have the same use. In essence, the system puts a link in a directory on the system command path that points not to a specific executable but to a directory containing choices for what program should answer to that name. Applied to our case, /usr/bin/R becomes a symbolic link to /etc/alternatives/R, which in turn is a symbolic link to the version of R you want to execute when you type R at a command prompt (or when, say, the RStudio IDE launches). In my case, that link points to /usr/lib64/RRO-3.2.0/R-3.2.0/lib/R/bin/R. (How would like to have to type that every time you wanted to run RRO?)

To manage alternatives, you can either use the update-alternatives command in a terminal or the Galternatives GUI for it. On my system, though, Galternatives does not entirely work. It lets you add alternatives once a base configuration is set, but it will not let you create (or delete) the initial configuration. So I went with the command line.

Assuming you have both RRO and R installed, here are the commands to set them up as alternatives:

sudo rm /usr/bin/R
sudo update-alternatives --install /usr/bin/R R /usr/lib64/RRO-3.2.0/R-3.2.0/lib/R/bin/R 200
sudo update-alternatives --install /usr/bin/R R /usr/lib/R/bin/R 100

The anatomy of the middle line is as follows:
Run this stuff as root.
Run the configuration update tool.
I want to install a new alternative for a program. Other choices include --configure to alter the default choice, --list to show alternatives for a program, and --display to show a list including priorities.
Where the link to the various alternatives will go. This should be somewhere on your command path. A program (such as RStudio) trying to execute R with no path, or using the default path for R, will look here and be forwarded to the chosen version. Similarly, if you just type R in a terminal, this is the program that will run.
The name used in the alternatives directory for the symbolic link to the actual executable.
The executable to be run.
A priority value. The highest priority alternative becomes the default choice, unless you manually override it.
To confirm that it worked, try running update-alternatives --display R in a terminal (no need to sudo this one). It should list your two choices for R, with priorities, and indicate which is the default choice. For me, that is RRO. If you want "official" R to be the default choice, switch the priorities.

The first time I tried this (with /usr/bin/R being the RRO executable), I omitted the first line, and update-alternatives informed me that it had not created /usr/bin/R (presumably because it was reluctant to overwrite what was already there, despite running with root privileges). So I suggest deleting the current entry before setting up the alternatives. Also, as an aside, after running the first two lines, I was able to us Galternatives to enter the third one, and I can use Galternatives to change which is the default choice for R.

I tried reinstalling the r-base package after doing this, and the default R choice remained RRO, so I think this will survive upgrades to either system. We'll see.