At a recent R user group meeting, the discussion at one point focused on two of the possibly lesser known (or lesser appreciated?) functions in the base package: get and assign. The former takes a string argument and fetches the object whose name is contained in the string. The latter does the opposite, assigning a value to a variable whose name is given as a string argument.
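A quick illustration of the pair (the variable name here is made up purely for the example):

```r
# assign() binds a value to a name given as a string;
# get() fetches the object bound to that name.
assign("myVar", 1:5)   # equivalent to myVar <- 1:5
get("myVar")           # returns 1 2 3 4 5
```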
I’ve actually used get once or twice, though not often. As an example of a use case, suppose that you create a Shiny interactive application that lets a user select one of the 104 data sets in the “datasets” package that comes with R, and then torture it in some way. In the user interface, you might give the user a list of names (or descriptions) of the data sets in a select list. Let’s say that the user chooses Fisher’s iris data set, and the name “iris” is returned in the variable input$ds. In the server code, you want to create a variable (we’ll call it df) containing the data frame to be analyzed. You can do that with df <- get(input$ds). After executing that line, df will contain the Fisher iris data.
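A bare-bones sketch of that kind of app (the input name ds and the use of ls("package:datasets") to populate the select list are just illustrative choices) might look like this:

```r
library(shiny)

# Minimal sketch: let the user pick a data set by name and summarize it.
ui <- fluidPage(
  selectInput("ds", "Data set:", choices = ls("package:datasets")),
  verbatimTextOutput("summary")
)

server <- function(input, output) {
  output$summary <- renderPrint({
    df <- get(input$ds)  # fetch the object whose name the user selected
    summary(df)
  })
}

shinyApp(ui, server)
```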
I have a harder time finding a use for assign, but one of the moderators at the meeting said he uses it (and get) regularly to create new names for objects. As an example (mine, not his, so I’ll take the blame for it), you might decide for some reason that you want the 104 data frames in the “datasets” package renamed as “foo1” through “foo104”. His approach would be to use assign to do this. Given the same task, my first impulse would be to create a list associating “foo1” with the first data frame and so on.
We got into a discussion of whether a dynamically expanding list would be slower than using assign. I decided to do a little experiment, documented below.
Spoiler alert: For 100 or so objects, there’s not a measurable difference.
The experiment
As noted above, the “datasets” package, which ships with R and is automatically put in the search path by default (i.e., you don’t have to load it explicitly with library or require), contains 104 data frames. I’ll assign each of them a new name, “foo1” through “foo104”, and then access them, first with assign/get and then with a list.
Setup
The first step is to load the list of data set names into variable data.sets and then create the new names (in foonames).
Next, I’ll count them (and put the result in count, for use later in loop constructs).
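A sketch of that setup (I assume ls("package:datasets") is a reasonable way to grab the names; it may sweep in a few objects that are not strictly data frames, but it serves for the experiment):

```r
# Names of the objects shipped with the "datasets" package.
data.sets <- ls("package:datasets")
# New names "foo1", "foo2", ... to be attached to them.
foonames <- paste0("foo", seq_along(data.sets))
# Number of data sets, for use in the loops below.
count <- length(data.sets)
count
```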
Timing an empty loop
To make sure the timing measurements that follow do not get too polluted by the time R needs to execute a simple “for” loop 104 times, I’ll separately time an empty loop first. The identity function is about as close as I could find to a “no-op” function in R. The last column in the output (“elapsed”), measured in seconds, is what is of interest.
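Something along these lines (a sketch, assuming system.time does the timing):

```r
# Time an empty loop: identity() is essentially a no-op.
system.time(
  for (i in 1:count) {
    identity(i)
  }
)
```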
So, on my fairly modest desktop PC, the looping itself consumes negligible time.
Using get and assign
I’ll now store the 104 data frames in a list, using the get function, and then assign new names to the list entries using the list. The reason for doing this, rather than just using get and assign together in a single line, is that I want to separate the timing of assign and the timing of get.
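A sketch of that first step (the list name temp is my choice):

```r
# Fetch each data set by its name (a string) and stash it in a list.
temp <- vector("list", count)
system.time(
  for (i in 1:count) {
    temp[[i]] <- get(data.sets[i])
  }
)
```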
Note that data.sets[i] is a string containing the name of the i-th data frame. According to the output, 104 calls to get took negligible time.
Next, I’ll assign each data frame to the corresponding “foo” variable.
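A sketch of the assignment step:

```r
# Bind each stored data set to its new "foo" name.
system.time(
  for (i in 1:count) {
    assign(foonames[i], temp[[i]])
  }
)
```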
Once again, the time is negligible.
Last, I’ll fetch each data frame using its new name.
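A sketch of the retrieval step (junk is just a throwaway variable):

```r
# Fetch each data set back via its new name.
system.time(
  for (i in 1:count) {
    junk <- get(foonames[i])
  }
)
```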
This too takes negligible time. (Methinks a pattern is emerging.)
One more thing: just to make sure the code does what I claim, let’s print the fifth data frame and “foo5” and see if they match.
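For instance, something like this (a sketch; head keeps the output short):

```r
# Compare the fifth data set with its renamed copy.
head(get(data.sets[5]))
head(foo5)
identical(get(data.sets[5]), foo5)
```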
Everything matches, so the code is behaving as expected.
Using a list
Before trying the list method, I will clear memory and recreate the list of foo names to avoid any corruption by results of the previous steps.
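A sketch of that housekeeping step:

```r
# Clear the workspace and rebuild the names and the count.
rm(list = ls())
data.sets <- ls("package:datasets")
foonames <- paste0("foo", seq_along(data.sets))
count <- length(data.sets)
```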
My preferred approach starts by creating a list of entries with the form fooname = data frame.
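Here is a sketch, with foo as my (arbitrary) name for the list, letting it grow dynamically:

```r
# Build a named list, one entry per data set, growing it as we go.
foo <- list()
system.time(
  for (i in 1:count) {
    foo[[foonames[i]]] <- get(data.sets[i])
  }
)
```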
Again, negligible time was consumed making the list. Next, I’ll time accessing the 104 data frames.
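Something like this (again a sketch, with junk as a throwaway variable):

```r
# Access each data set through the list, by its new name.
system.time(
  for (i in 1:count) {
    junk <- foo[[foonames[i]]]
  }
)
```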
Finally, I’ll look at “foo5” to make sure it is what we expect (the Anscombe data set).
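For example:

```r
# Peek at the fifth entry of the list.
head(foo[["foo5"]])
```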
Yes, it is.
Conclusion
For something on the order of 100 objects, using get and assign and using an ordinary list seem to work equally well, with no meaningful difference in execution time.
This may not generalize to really large numbers of objects, but individually naming (and accessing by name) tens or hundreds of thousands of objects (or more) strikes me as a fairly low likelihood scenario.
Source code
Reproducible research seems to be a popular topic these days. I generated this post (give or take a little massaging of HTML to fit the blog style) in RStudio using an R Markdown file, which you can download here. If you “knit” it in RStudio on your own machine, you will get timing results for your setup.