
Monday, April 14, 2025

Retaining Libraries During R Upgrades

Today I was able to upgrade R to version 4.5, and was reminded in the process of a tedious "feature" of the upgrade.

R libraries are organized into two distinct groups. If you use RStudio, look at the "Packages" tab. You will see the heading "User Library" followed by whatever packages you have installed manually. Scroll down and you will get to another heading, "System Library", followed by another group of packages. These are the packages that were automatically installed when you installed R itself.
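If you prefer the console to the Packages tab, you can see roughly the same split programmatically. A quick check (base R only; the priority field is what distinguishes the two groups):

rownames(installed.packages(priority = "NA"))    # no priority: manually installed ("User Library")
rownames(installed.packages(priority = "high"))  # base + recommended ("System Library")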

The upgrade from R 4.4 to R 4.5 was very easy, since it comes as a system package, at least on Ubuntu and Linux Mint (and presumably other Linux distributions). I'm not sure about Windows, macOS etc. The Mint update manager offered me updates to several system packages (r-base, r-base-core, r-base-html and r-recommended) among the morning's gaggle of updates. I just installed those and R 4.4.3 was replaced by R 4.5.0. That part could not be easier.

After the updates were done, I opened RStudio and looked to see if any packages there needed updates. The "System Library" group was there, and none of them needed updates (no shock since they had just been installed during the upgrade). The "User Library" did not exist. I should have known this was coming based on previous R upgrades, but I forgot.
You can of course reinstall all your previously installed libraries manually (if you can remember which ones they were), or you can just wait until something doesn't work due to a missing library and install it then. I prefer to reinstall them all at once, and I most definitely do not have the list memorized. The fix is easy if you know how to do it (and remember that you have to do it). 

The first step is to open a file manager and navigate to the directory where the libraries for your previous R version are stored. They will still be there. If you do not know where they are hiding, you can run the command .libPaths() in the R console to get a list of the directories in which R looks for libraries. One of them will contain the R version number. (It is consistently the first entry in the list when I do this, but I do not know whether that will always be true.) In my case, the entry is "/home/paul/R/x86_64-pc-linux-gnu-library/4.5", which means I want to open "/home/paul/R/x86_64-pc-linux-gnu-library" in the file manager. There I find two directories: one for the previous version ("/home/paul/R/x86_64-pc-linux-gnu-library/4.4") with lots of subdirectories, and one for the new version ("/home/paul/R/x86_64-pc-linux-gnu-library/4.5") that is empty. All it takes is copying or moving the contents of the older directory into the newer one. Once you have confirmed that the new R version can see the libraries (for instance, by observing that the "User Library" section has returned to the "Packages" tab in RStudio), you can delete the folder for the older version.
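If you would rather script the move, the same thing can be done from the R console. This is a minimal sketch assuming the layout described above; adjust both paths to match your machine:

old.lib <- "/home/paul/R/x86_64-pc-linux-gnu-library/4.4"  # previous version's user library
new.lib <- .libPaths()[1]                                  # new version's (empty) user library
file.copy(list.files(old.lib, full.names = TRUE), new.lib, recursive = TRUE)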

With that done, you will want to check for updates to the "User Library" packages. Several that I had installed needed updates today after moving to R 4.5. Updating them is done in the usual way.
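From the console, one command covers it. Setting checkBuilt = TRUE also catches packages that have no newer version but were built under the old R:

update.packages(ask = FALSE, checkBuilt = TRUE)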

I wonder whether either R or RStudio has an "inherit libraries from previous version" function stashed away that would automate this. If so, I haven't found it.

Sunday, November 3, 2024

Xpress and RStudio

The following is probably specific to Linux systems. I recently installed the FICO Xpress optimizer, which comes with an R library that provides an API for R code. FICO requires a license file (or a license server -- I went with a static file since I'm a single user), and the installer adds an assortment of environment variables to the bash shell, including one pointing to the license file. So far, so good.

Xpress comes with example files, including example R scripts. So I cranked up RStudio, opened the simplest example ("first_lp_problem.R", which is just what it sounds like) and executed it line by line. The problem setup lines worked fine, but the first Xpress API call died with an error message saying it couldn't find the license file in directory "." (i.e., the current working directory). The same thing happened when I tried to source the file in the RStudio console.

To make a long story somewhat shorter, after assorted failed attempts to sort things out it occurred to me to run R in a terminal and source the example file there. That ran smoothly. So the problem was with RStudio, not with R. Specifically, it turns out that RStudio (at least when launched from the desktop rather than from a terminal) does not pick up environment variables set in bash startup files.

After assorted failed attempts at a fix (and pretty much wearing out Google), I found the following solution. In my home directory ("/home/paul", a.k.a. "~") I created a text file named ".Renviron". In it, I put the line "XPAUTH_PATH=/home/paul/.../xpauth.xpr", where "..." is a bunch of path info you don't need to know and "xpauth.xpr" is the name of the license file. (If you already have a ".Renviron" file, you can just add this line to it.) The example script now runs fine in RStudio. Note that there is a gaggle of other bash environment variables created by Xpress, presumably none of which are known to RStudio, but apparently the license file path is the only one needed by the API (at least so far). If I trip over any other omissions later on, presumably I can add them to ".Renviron".
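To confirm that R is actually seeing the entry, you can check from the console (readRenviron() reloads the file in the current session, so you do not even need to restart):

readRenviron("~/.Renviron")   # re-read the file in the running session
Sys.getenv("XPAUTH_PATH")     # should print the license file path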


Monday, April 8, 2024

File Access in RStudio

I've been spending a fair bit of time in RStudio Desktop recently, much of it related to my work with INFORMS Pro Bono Analytics. I really like RStudio as a development environment for R code, including Shiny apps. It does, however, come with the occasional quirk. One of those has to do with how RStudio accesses the file system.

I tripped over this a couple of times recently when I wanted to open an R file that I had dropped in the /tmp directory on my Linux Mint system. The Files tab in RStudio appeared to be limited to the directory tree under my home directory; there was no way to browse to system directories like /tmp. A related restriction applies to setting the default working directory (Tools > Global Options... > General > Basic > R Sessions): RStudio does not let you type in a directory name (perhaps a defense against typos?), and the Browse... button will not leave your home tree.

Initially I decided this was not important enough to worry about, but then I saw a post on the Posit Community forum by someone who was stuck trying to work from home due to a related issue. So I did a little experimentation and found a workaround, at least for the first problem (accessing files in places like /tmp). If I run setwd("/tmp") in the Console tab (which sets the working directory for the current R session), then click the More menu in the Files tab and select Go To Working Directory, the Files tab now browses /tmp, and I can navigate up to the system root directory and then down to anywhere within reason.
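For reference, the console half of that workaround is just the following, followed by More > Go To Working Directory in the Files tab:

setwd("/tmp")   # point the current session at /tmp
getwd()         # confirm the change
## [1] "/tmp"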

Changing the default starting directory is not something I actually care to do, but I'll document it here in case a reader might wish to do so. You can go to the IDE configuration directory (~/.config/rstudio on Linux and OS X, %appdata%\RStudio on Windows), open the rstudio-prefs.json file in a text editor, and change the value of the "initial_working_directory" entry to whatever starting directory you want. Save it, (re)start RStudio Desktop, and hopefully you begin in the right place.
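For illustration, the relevant fragment of rstudio-prefs.json would look something like this (every other entry omitted, with "/tmp" standing in for whatever directory you prefer):

{
    "initial_working_directory": "/tmp"
}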


Saturday, October 28, 2023

The Trouble with Tibbles

Apologies to Star Trek fans for the title pun, but I couldn't resist. (If you're not a Trekkie, see here for clarification.)

I've been working on an interactive R application (using Shiny, although that's probably irrelevant here). There are places where the code needs to loop through a data frame, looking for consecutive rows where either three text fields match or two match and one does not. Matching rows are copied into new data frames. The data I'm testing the code on has a bit over 9,000 rows, and this process can take upwards of nine seconds -- not an eternity, but a bit annoying when you are sitting there waiting for it to hatch.

I decided to use the profiler in RStudio to see where time was being eaten up. Almost all the nine seconds was blamed on two steps. The biggest time suck was doing case-insensitive string comparisons on the three fields, which did not come as a big surprise. I went into the profiling process thinking the other big time suck would be adding a data frame to a growing list of data frames, but that was actually quite fast. To my surprise, the number two consumer of time was "df <- temp[n, ]", which grabs row n from data frame "temp" and turns it into a temporary data frame named "df". How could such a simple operation take so long?

I had a hunch that turned out to be correct. Somewhere earlier in the code, my main data frame (from which "temp" was extracted) became a tibble. Tibbles are modernized, souped-up versions of data frames, with extra features but also occasional extra overhead/annoyances. One might call them the backbone of the Tidyverse. The Tidyverse has its adherents and its detractors, and I don't want to get into the middle of that. I'm mostly happy to work with the Tidyverse, but in this case using tibbles became a bit of a problem.

So I tweaked the line of code that creates "temp" to "temp <- ... %>% as.data.frame()", where ... was the existing code. Lo and behold, the time spent on "df <- temp[n, ]" dropped to a little over half a second. Somewhat surprisingly, the time spent on the string comparisons dropped even more. So the overall processing time fell from around 9 seconds to under 1.3 seconds, speeding up the code by a factor of almost 7.
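The effect is easy to reproduce with synthetic data. Here is a minimal sketch (made-up data, not my actual app; your timings will obviously vary):

library(tibble)
df <- data.frame(a = runif(9000), b = sample(letters, 9000, replace = TRUE))
tb <- as_tibble(df)
system.time(for (n in 1:9000) x <- tb[n, ])  # row extraction from a tibble
system.time(for (n in 1:9000) x <- df[n, ])  # row extraction from a plain data frame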

I'll need to keep this in mind if I bump into any other places where my code seems unusually slow.


Friday, July 6, 2018

Mint 19 Upgrade: Adventures #1-3

I use my laptop as the "canary in the coal mine" when it comes to operating system upgrades, since there's nothing awesomely important on it. So today I tried upgrading from Linux Mint 18.3 to 19.0. Note that I used the upgrade path, rather than downloading the installer, burning it to a bootable disk, then installing from there. In hindsight, that might have been the faster approach. The upgrade took over an hour, and that's before any debugging.

The case of the not-so-missing library file


I hit the first of what will no doubt be several adventures when I reinstalled RStudio Desktop and discovered it would not run. Despite the installer saying that all dependencies were satisfied, when I tried to run it from a command line I was told that a library file (libGL.so.1) could not be found.

I'll skip over another hour or so of pointless flailing and cut to the chase scene. It turns out that libGL.so.1 actually was installed on my laptop, as part of the libgl1-mesa-glx package. It was hiding in plain sight in /usr/lib/x86_64-linux-gnu/mesa/. Somehow, that folder had not made it onto the system library path. (I have no idea why.) So I ran the command

sudo ldconfig /usr/lib/x86_64-linux-gnu/mesa

and that fixed the problem.

Editor? We don't need no stinkin' editor


Next up, I couldn't find a text editor! Note that LibreOffice was installed, and was the default program to open text (.txt) files. Huh?? Poking around, I found nano, but xed (the default text editor in Mint 18) and gedit (the previous default editor) were not installed (even though xed was present before the upgrade).

Fixing this was at least (to quote a math prof I had in grad school) "tedious but brutally straightforward". In the software manager, I installed xed ... and xreader, also MIA. For whatever reason, the other X-Apps (xviewer, xplayer and pix) were already installed (as they all should have been).

The mystery of the launcher that wouldn't launch


Mint has a utility (mintsources) that lets you manage the sources (repositories, PPAs etc.) that you use. There is an entry for it in the main menu, but clicking that entry failed to launch the source manager. On the other hand, running the command ("pkexec mintsources") from a terminal worked just fine.

I found the original desktop file at /usr/share/applications/mintsources.desktop (owned by root, with read and write permissions but not execute permission). After a bunch of messing around, I edited the menu entry through the menu editor (by right-clicking the menu entry and selecting "Edit properties"), changing "pkexec mintsources" to "gksudo mintsources". That created another version at ~/.local/share/applications/mintsources.desktop. After right-clicking the main menu button and clicking "Reload plugins", the modified entry worked. I have no idea why "gksudo" works from the menu when "pkexec mintsources" does not, even though the latter runs fine from a terminal. I tried editing back to "pkexec", just in case the mere act of editing was what did the trick, but no joy there. So I edited back to "gksudo", which seems to be working ... for now ... until the gremlins return from their dinner break.

Update: No sooner did I publish this than I found another instance of the same problem. The driver manager would not launch from the main menu. I edited "pkexec" to "gksudo" for that one, and again it worked. I guess "pkexec" is somehow incompatible with the Mint menu (at least on my laptop).

I'll close for now with a link to "Solutions for 24 bugs in Linux Mint 19".




Friday, May 6, 2016

Accessing R Objects By Name

At a recent R user group meeting, the discussion at one point focused on two of the possibly lesser known (or lesser appreciated?) functions in the base package: get and assign. The former takes a string argument and fetches the object whose name is contained in the string. The latter does the opposite, assigning a value to a variable whose name is given as a string.
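In their simplest form:

x <- 42
get("x")            # fetch the object whose name is "x"
## [1] 42
assign("y", 2 * x)  # create a variable named "y" holding 84
y
## [1] 84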

I’ve actually used get once or twice, though not often. As an example of a use case, suppose that you create a Shiny interactive application that lets a user select one of the 104 data sets in the “datasets” package that comes with R, and then torture it in some way. In the user interface, you might give the user a list of names (or descriptions) of the data sets in a select list. Let’s say that the user chooses Fisher’s iris data set, and the name “iris” is returned in the variable input$ds. In the server code, you want to create a variable (we’ll call it df) containing the data frame to be analyzed. You can do that with df <- get(input$ds). After executing that line, df will contain the Fisher iris data.
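For the curious, here is a minimal sketch of that pattern as a self-contained (and admittedly pointless) app, with summary() standing in for the torture:

library(shiny)
ui <- fluidPage(
  selectInput("ds", "Data set", choices = ls("package:datasets")),
  verbatimTextOutput("summary")
)
server <- function(input, output) {
  output$summary <- renderPrint({
    df <- get(input$ds)  # fetch the selected data set by name
    summary(df)
  })
}
shinyApp(ui, server)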

I have a harder time finding a use for assign, but one of the moderators at the meeting said he uses it (and get) regularly to create new names for objects. As an example (mine, not his, so I’ll take the blame for it), you might decide for some reason that you want the 104 data frames in the “datasets” package renamed as “foo1” through “foo104”. His approach would be to use assign to do this. Given the same task, my first impulse would be to create a list associating “foo1” with the first data frame and so on.

We got into a discussion of whether a dynamically expanding list would be slower than using assign. I decided to do a little experiment, documented below. Spoiler alert: For 100 or so objects, there’s not a measurable difference.

The experiment


As noted above, the “datasets” package, which ships with R and is automatically put in the search path by default (i.e., you don’t have to load it explicitly with library or require), contains 104 data frames. I’ll assign each of them a new name, “foo1” through “foo104”, and then access them, first with assign/get and then with a list.

Setup


The first step is to load the list of data set names into variable data.sets and then create the new names (in foonames).

data.sets <- ls("package:datasets")              # list of data set names
foonames <- paste0("foo", seq_along(data.sets))  # creates "foo1" ...

Next, I’ll count them (and put the result in count, for use later in loop constructs).

count <- length(data.sets)                       # count = 104

 

Timing an empty loop


To make sure the timing measurements that follow do not get too polluted by the time R needs to execute a simple “for” loop 104 times, I’ll separately time an empty loop first. The identity function is about as close as I could find to a “no-op” function in R. The last column in the output (“elapsed”), measured in seconds, is what is of interest.

system.time(for (i in 1:count) identity(i))
##    user  system elapsed 
##       0       0       0

So, on my fairly modest desktop PC, the looping itself consumes negligible time.

Using get and assign


I’ll now store the 104 data frames in a list, using the get function, and then use assign to give each of them its new “foo” name. The reason for doing this in two stages, rather than combining get and assign in a single line, is that I want to separate the timing of assign from the timing of get.

temp <- as.list(1:count)                                          # create a list of length 104
system.time(for (i in 1:count) (temp[[i]] <- get(data.sets[i])))  # put each data frame in the list
##    user  system elapsed 
##    0.02    0.00    0.02

Note that data.sets[i] is a string containing the name of the i-th data frame. According to the output, 104 calls to get took negligible time.

Next, I’ll assign each data frame to the corresponding “foo” variable.

system.time(for (i in 1:count) assign(foonames[i], temp[[i]]))    # assign each data frame a new name
##    user  system elapsed 
##   0.001   0.000   0.001

Once again, the time is negligible.

Last, I’ll fetch each data frame using its new name.

system.time(for (i in 1:count) get(foonames[i]))                  # fetch each data frame by its "foo" name
##    user  system elapsed
##       0       0       0

This too takes negligible time. (Methinks a pattern is emerging.)

One more thing: just to make sure the code does what I claim, let’s print the fifth data frame and “foo5” and see if they match.

data.sets[[5]]    # get the name of the fifth data frame in the list
## [1] "anscombe"
anscombe          # list that data frame explicitly
##    x1 x2 x3 x4    y1   y2    y3    y4
## 1  10 10 10  8  8.04 9.14  7.46  6.58
## 2   8  8  8  8  6.95 8.14  6.77  5.76
## 3  13 13 13  8  7.58 8.74 12.74  7.71
## 4   9  9  9  8  8.81 8.77  7.11  8.84
## 5  11 11 11  8  8.33 9.26  7.81  8.47
## 6  14 14 14  8  9.96 8.10  8.84  7.04
## 7   6  6  6  8  7.24 6.13  6.08  5.25
## 8   4  4  4 19  4.26 3.10  5.39 12.50
## 9  12 12 12  8 10.84 9.13  8.15  5.56
## 10  7  7  7  8  4.82 7.26  6.42  7.91
## 11  5  5  5  8  5.68 4.74  5.73  6.89
get("foo5")       # access foo5 using the get method
##    x1 x2 x3 x4    y1   y2    y3    y4
## 1  10 10 10  8  8.04 9.14  7.46  6.58
## 2   8  8  8  8  6.95 8.14  6.77  5.76
## 3  13 13 13  8  7.58 8.74 12.74  7.71
## 4   9  9  9  8  8.81 8.77  7.11  8.84
## 5  11 11 11  8  8.33 9.26  7.81  8.47
## 6  14 14 14  8  9.96 8.10  8.84  7.04
## 7   6  6  6  8  7.24 6.13  6.08  5.25
## 8   4  4  4 19  4.26 3.10  5.39 12.50
## 9  12 12 12  8 10.84 9.13  8.15  5.56
## 10  7  7  7  8  4.82 7.26  6.42  7.91
## 11  5  5  5  8  5.68 4.74  5.73  6.89
foo5              # access foo5 by name
##    x1 x2 x3 x4    y1   y2    y3    y4
## 1  10 10 10  8  8.04 9.14  7.46  6.58
## 2   8  8  8  8  6.95 8.14  6.77  5.76
## 3  13 13 13  8  7.58 8.74 12.74  7.71
## 4   9  9  9  8  8.81 8.77  7.11  8.84
## 5  11 11 11  8  8.33 9.26  7.81  8.47
## 6  14 14 14  8  9.96 8.10  8.84  7.04
## 7   6  6  6  8  7.24 6.13  6.08  5.25
## 8   4  4  4 19  4.26 3.10  5.39 12.50
## 9  12 12 12  8 10.84 9.13  8.15  5.56
## 10  7  7  7  8  4.82 7.26  6.42  7.91
## 11  5  5  5  8  5.68 4.74  5.73  6.89

Everything matches, so the code is behaving as expected.

Using a list


Before trying the list method, I will clear memory and recreate the list of foo names, to avoid any contamination from the results of the previous steps.

rm(list = ls())                                  # empty the environment
data.sets <- ls("package:datasets")              # list of data set names
foonames <- paste0("foo", seq_along(data.sets))  # creates "foo1" ...
count <- length(data.sets)                       # count = 104

My preferred approach starts by creating a list of entries with the form fooname = data frame:

my.list <- list()                                                              # create an empty list
system.time(for (i in 1:count) my.list[[foonames[i]]] <- get(data.sets[[i]]))  # fill the list with data frames
##    user  system elapsed 
##   0.001   0.000   0.001

Again, negligible time was consumed making the list. Next, I’ll time accessing the 104 data frames.

system.time(for (i in 1:count) my.list[[foonames[i]]])  # access each data frame from the list
##    user  system elapsed 
##       0       0       0

Finally, I’ll look at “foo5” to make sure it is what we expect (the Anscombe data set).

my.list[["foo5"]]  # is this Anscombe?
##    x1 x2 x3 x4    y1   y2    y3    y4
## 1  10 10 10  8  8.04 9.14  7.46  6.58
## 2   8  8  8  8  6.95 8.14  6.77  5.76
## 3  13 13 13  8  7.58 8.74 12.74  7.71
## 4   9  9  9  8  8.81 8.77  7.11  8.84
## 5  11 11 11  8  8.33 9.26  7.81  8.47
## 6  14 14 14  8  9.96 8.10  8.84  7.04
## 7   6  6  6  8  7.24 6.13  6.08  5.25
## 8   4  4  4 19  4.26 3.10  5.39 12.50
## 9  12 12 12  8 10.84 9.13  8.15  5.56
## 10  7  7  7  8  4.82 7.26  6.42  7.91
## 11  5  5  5  8  5.68 4.74  5.73  6.89

Yes, it is.

Conclusion


For something on the order of 100 objects, using get and assign and using an ordinary list seem to work equally well, with no meaningful difference in execution time.

This may not generalize to really large numbers of objects, but individually naming (and accessing by name) tens or hundreds of thousands of objects (or more) strikes me as a fairly low likelihood scenario.

Source code


Reproducible research seems to be a popular topic these days. I generated this post (give or take a little massaging of HTML to fit the blog style) in RStudio using an R Markdown file, which you can download here. If you “knit” it in RStudio on your own machine, you will get timing results for your setup.

Tuesday, December 16, 2014

RStudio Git Support

One of the assignments in the R Programming MOOC (offered by Johns Hopkins University on Coursera) requires the student to set up and utilize a (free) Git version control repository on GitHub. I use Git (on other sites) for other things, so I thought this would be no big deal. I created an account on GitHub, created a repository for my assignment, cloned it to my PC, and set about coding things. As a development IDE, I'm using the excellent (and free) RStudio, which I was happy to discover has built-in support for Git. All went well until I committed some changes and tried to push them up to the GitHub repo, at which point RStudio balked with the following error message(s):
error: unable to read askpass response from 'rpostback-askpass'
fatal: could not read Username for 'https://github.com': No such device or address
I searched high and low in RStudio but could not find any place to enter credentials for the remote repository. No worries, thought I; I'll just add my public SSH key on GitHub and use the private key on the PC, which works for me when I'm using the NetBeans IDE with BitBucket. Alas, no joy.

According to the error messages, the immediate issue seems to be not my password (I don't think the challenge got that far) but my user name. Git has a global value for my user name recorded on my PC, but it's not the same as my user name on GitHub. I was able to set a "local" user name, matching the one I have on GitHub, by opening a terminal in my R project directory and entering the commands
git config user.name <my GitHub name, in quotes>
git config user.email <my email address>
That's a bit more arcane than what I would expect a beginner to know, but so be it. I thought that would fix the problem. It did not; the error message remained unchanged. I suspect that the issue is that Git now has two names for me (global and local-to-the-R-project). If I run
git config -l
in the project directory, I see the following:
user.name=<my global user name>
user.email=<my global email address>

...
user.name=<my GitHub user name>
user.email=<my GitHub email address, same as the global one>
With two user names to choose from, perhaps RStudio is grabbing the global one? Or perhaps I'm barking up an entirely incorrect tree trying to find the source of the error.
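If the user name really is the culprit, one workaround I have seen suggested (though I have not tried it myself) is to embed the GitHub user name directly in the remote's URL, so that git never needs to ask for it:

git remote set-url origin https://<my GitHub name>@github.com/<my GitHub name>/<repo name>.git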

At any rate, I can't seem to push updates from the PC to GitHub using RStudio. Not to worry, though. There are other options. You can do it from the command line (if you are a command-line user of Git, which for the most part I'm not). You can also use a separate Git client program, which is what I did. My Git GUI of choice is SmartGit, from which it is no chore to push (or pull) updates.