get and assign. The former takes a string argument and fetches the object whose name is contained in the string. The latter does the opposite, assigning an existing object to a variable whose name is a string argument.
I’ve actually used get once or twice, though not often. As an example of a use case, suppose that you create a Shiny interactive application that lets a user select one of the 104 data sets in the “datasets” package that comes with R, and then torture it in some way. In the user interface, you might give the user a list of names (or descriptions) of the data sets in a select list. Let’s say that the user chooses Fisher’s iris data set, and the name “iris” is returned in the variable input$ds. In the server code, you want to create a variable (we’ll call it df) containing the data frame to be analyzed. You can do that with df <- get(input$ds). After executing that line, df will contain the Fisher iris data.
I have a harder time finding a use for assign, but one of the moderators at the meeting said he uses it (and get) regularly to create new names for objects. As an example (mine, not his, so I’ll take the blame for it), you might decide for some reason that you want the 104 data sets in the “datasets” package renamed as “foo1” through “foo104”. His approach would be to use assign to do this. Given the same task, my first impulse would be to create a list associating “foo1” with the first data set, and so on.
We got into a discussion of whether a dynamically expanding list would be slower than using assign. I decided to do a little experiment, documented below. Spoiler alert: for 100 or so objects, there’s no measurable difference.
The experiment
As noted above, the “datasets” package, which ships with R and is placed on the search path by default (i.e., you don’t have to load it explicitly with library or require), contains 104 data sets (strictly speaking, not all of them are data frames; many are time series, vectors, matrices, or tables). I’ll assign each of them a new name, “foo1” through “foo104”, and then access them, first with assign/get and then with a list.
Setup
The first step is to load the list of data set names into the variable data.sets and then create the new names (in foonames).
data.sets <- ls("package:datasets") # list of data set names
foonames <- paste0("foo", seq_along(data.sets)) # creates "foo1" ...
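Since not every object in the package is a data frame, it can be worth checking what data.sets actually names. The sketch below assumes, as above, that datasets is attached (the default). It uses mget, the vectorized relative of get, to fetch every object in one call as a named list, then tallies their classes. The names all.data and data.classes are illustrative, not from the original code.

```r
# Names of every object in the attached datasets package
data.sets <- ls("package:datasets")

# mget() is the vectorized relative of get(): given a character vector of
# names, it returns a named list of the corresponding objects
# (all.data and data.classes are illustrative names)
all.data <- mget(data.sets, envir = as.environment("package:datasets"))

# Primary class of each object: "data.frame", "ts", "matrix", "table", ...
data.classes <- vapply(all.data, function(x) class(x)[1], character(1))

# How many objects of each class the package contains
table(data.classes)
```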
Next, I’ll count them (and put the result in count, for use later in loop constructs).
count <- length(data.sets) # count = 104
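A side note on the loops that follow: they are written as 1:count, which is fine here because count is 104. If a count could ever be zero, though, 1:count counts down and yields c(1, 0), whereas seq_len(count) (or seq_along(data.sets)) gives an empty sequence. A quick illustration, using a throwaway variable n (a hypothetical name) so it does not overwrite count:

```r
n <- 0  # hypothetical stand-in for count

# 1:0 counts down, producing two iterations where none were intended
length(1:n)          # 2
# seq_len(0) is an empty sequence
length(seq_len(n))   # 0

# A for loop over seq_len(n) therefore never executes when n is 0
iterations <- 0
for (i in seq_len(n)) iterations <- iterations + 1
iterations           # 0
```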
Timing an empty loop
To make sure the timing measurements that follow do not get too polluted by the time R needs to execute a simple “for” loop 104 times, I’ll separately time an empty loop first. The identity function is about as close as I could find to a “no-op” function in R. The last column in the output (“elapsed”), measured in seconds, is what is of interest.
system.time(for (i in 1:count) identity(i))
## user system elapsed
## 0 0 0
So, on my fairly modest desktop PC, the looping itself consumes negligible time.
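A zero reading mostly reflects the resolution of system.time, which on typical platforms is about a millisecond. If you want an actual number for the loop overhead, one option is to repeat the loop many times inside a single system.time call and divide. In the sketch below, the repetition count reps is an arbitrary choice, and the result will vary by machine.

```r
data.sets <- ls("package:datasets")
count <- length(data.sets)

reps <- 1000  # arbitrary repetition count; raise it if the total still reads 0

# Time 'reps' passes of the same empty loop used above
timing <- system.time(
  for (r in seq_len(reps)) for (i in seq_len(count)) identity(i)
)

# Approximate elapsed seconds for one pass of the loop
timing[["elapsed"]] / reps
```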
Using get and assign
I’ll now store the 104 data frames in a list, using the get function, and then use assign to give each list entry a new “foo” name as a standalone variable. The reason for doing this, rather than just using get and assign together in a single line, is that I want to time assign and get separately.
temp <- as.list(1:count) # create a list of length 104
system.time(for (i in 1:count) (temp[[i]] <- get(data.sets[i]))) # put each data frame in the list
## user system elapsed
## 0.02 0.00 0.02
Note that data.sets[i] is a string containing the name of the i-th data frame. According to the output, 104 calls to get took negligible time.
Next, I’ll assign each data frame to the corresponding “foo” variable.
system.time(for (i in 1:count) assign(foonames[i], temp[[i]])) # assign each data frame a new name
## user system elapsed
## 0.001 0.000 0.001
Once again, the time is negligible.
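The loop above writes 104 new variables into the global environment. A variation worth considering (a sketch, not necessarily how the moderator does it) is to give assign and get an explicit envir argument, so the renamed objects live in their own environment and can be pulled out together with mget. The environment name foo.env is made up for the example.

```r
data.sets <- ls("package:datasets")
foonames <- paste0("foo", seq_along(data.sets))

# A separate environment for the renamed objects (foo.env is an illustrative name)
foo.env <- new.env()

# assign() and get() both accept envir, so the foo objects stay out of
# the global environment
for (i in seq_along(data.sets)) {
  assign(foonames[i],
         get(data.sets[i], envir = as.environment("package:datasets")),
         envir = foo.env)
}

# Fetch one object back by its new name
head(get("foo5", envir = foo.env), 3)

# mget() pulls all of them out as a named list in a single call
foo.list <- mget(foonames, envir = foo.env)
length(foo.list) == length(data.sets)
```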
Last, I’ll fetch each data frame using its new name.
system.time(for (i in 1:count) get(foonames[i])) # fetch each data frame by its "foo" name
## user system elapsed
## 0 0 0
This too takes negligible time. (Methinks a pattern is emerging.)
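One practical wrinkle with get: it stops with an error if no object has the requested name, which matters when the name comes from outside your code (the Shiny select list, for instance). Base R's exists and get0 handle that case. In the sketch below, safe_fetch and demo.env are hypothetical names chosen for illustration.

```r
# Hypothetical helper: look up a name without erroring when it is missing
safe_fetch <- function(name, envir = globalenv()) {
  # inherits = FALSE restricts the search to 'envir' itself
  get0(name, envir = envir, inherits = FALSE, ifnotfound = NULL)
}

# Hypothetical environment holding a single "foo" object
demo.env <- new.env()
assign("foo5", head(datasets::anscombe), envir = demo.env)

safe_fetch("foo5", demo.env)    # the stored data frame
safe_fetch("foo999", demo.env)  # NULL instead of an error

# exists() answers the yes/no question on its own
exists("foo999", envir = demo.env, inherits = FALSE)  # FALSE
```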
One more thing: just to make sure the code does what I claim, let’s print the fifth data frame and “foo5” and see if they match.
data.sets[[5]] # get the name of the fifth data set in the vector
## [1] "anscombe"
anscombe # print that data frame directly
## x1 x2 x3 x4 y1 y2 y3 y4
## 1 10 10 10 8 8.04 9.14 7.46 6.58
## 2 8 8 8 8 6.95 8.14 6.77 5.76
## 3 13 13 13 8 7.58 8.74 12.74 7.71
## 4 9 9 9 8 8.81 8.77 7.11 8.84
## 5 11 11 11 8 8.33 9.26 7.81 8.47
## 6 14 14 14 8 9.96 8.10 8.84 7.04
## 7 6 6 6 8 7.24 6.13 6.08 5.25
## 8 4 4 4 19 4.26 3.10 5.39 12.50
## 9 12 12 12 8 10.84 9.13 8.15 5.56
## 10 7 7 7 8 4.82 7.26 6.42 7.91
## 11 5 5 5 8 5.68 4.74 5.73 6.89
get("foo5") # access foo5 using the get function
## x1 x2 x3 x4 y1 y2 y3 y4
## 1 10 10 10 8 8.04 9.14 7.46 6.58
## 2 8 8 8 8 6.95 8.14 6.77 5.76
## 3 13 13 13 8 7.58 8.74 12.74 7.71
## 4 9 9 9 8 8.81 8.77 7.11 8.84
## 5 11 11 11 8 8.33 9.26 7.81 8.47
## 6 14 14 14 8 9.96 8.10 8.84 7.04
## 7 6 6 6 8 7.24 6.13 6.08 5.25
## 8 4 4 4 19 4.26 3.10 5.39 12.50
## 9 12 12 12 8 10.84 9.13 8.15 5.56
## 10 7 7 7 8 4.82 7.26 6.42 7.91
## 11 5 5 5 8 5.68 4.74 5.73 6.89
foo5 # access foo5 by name
## x1 x2 x3 x4 y1 y2 y3 y4
## 1 10 10 10 8 8.04 9.14 7.46 6.58
## 2 8 8 8 8 6.95 8.14 6.77 5.76
## 3 13 13 13 8 7.58 8.74 12.74 7.71
## 4 9 9 9 8 8.81 8.77 7.11 8.84
## 5 11 11 11 8 8.33 9.26 7.81 8.47
## 6 14 14 14 8 9.96 8.10 8.84 7.04
## 7 6 6 6 8 7.24 6.13 6.08 5.25
## 8 4 4 4 19 4.26 3.10 5.39 12.50
## 9 12 12 12 8 10.84 9.13 8.15 5.56
## 10 7 7 7 8 4.82 7.26 6.42 7.91
## 11 5 5 5 8 5.68 4.74 5.73 6.89
Everything matches, so the code is behaving as expected.
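Printing works for spot-checking one data set; identical can check all of them at once. The sketch below recreates the foo objects in the global environment exactly as the loop above does (so it runs on its own), then compares every renamed copy with the original. The vector name matches is illustrative.

```r
data.sets <- ls("package:datasets")
foonames <- paste0("foo", seq_along(data.sets))

# Recreate foo1, foo2, ... in the global environment, as the loop above does
for (i in seq_along(data.sets)) {
  assign(foonames[i], get(data.sets[i]), envir = globalenv())
}

# One TRUE/FALSE per data set: does the renamed copy match the original?
# ('matches' is an illustrative name)
matches <- vapply(seq_along(data.sets), function(i) {
  identical(get(foonames[i], envir = globalenv()), get(data.sets[i]))
}, logical(1))

all(matches)
```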
Using a list
Before trying the list method, I will clear memory and recreate the list of foo names to avoid any corruption by results of the previous steps.
rm(list = ls()) # empty the environment
data.sets <- ls("package:datasets") # list of data set names
foonames <- paste0("foo", seq_along(data.sets)) # creates "foo1" ...
count <- length(data.sets) # count = 104
My preferred approach starts by creating a list of entries with the form fooname = data frame:
my.list <- list() # create an empty list
system.time(for (i in 1:count) my.list[[foonames[i]]] <- get(data.sets[[i]])) # fill the list with data frames
## user system elapsed
## 0.001 0.000 0.001
Again, negligible time was consumed making the list. Next, I’ll time accessing the 104 data frames.
system.time(for (i in 1:count) my.list[[foonames[i]]]) # access each data frame from the list
## user system elapsed
## 0 0 0
Finally, I’ll look at “foo5” to make sure it is what we expect (the Anscombe data set).
my.list[["foo5"]] # is this Anscombe?
## x1 x2 x3 x4 y1 y2 y3 y4
## 1 10 10 10 8 8.04 9.14 7.46 6.58
## 2 8 8 8 8 6.95 8.14 6.77 5.76
## 3 13 13 13 8 7.58 8.74 12.74 7.71
## 4 9 9 9 8 8.81 8.77 7.11 8.84
## 5 11 11 11 8 8.33 9.26 7.81 8.47
## 6 14 14 14 8 9.96 8.10 8.84 7.04
## 7 6 6 6 8 7.24 6.13 6.08 5.25
## 8 4 4 4 19 4.26 3.10 5.39 12.50
## 9 12 12 12 8 10.84 9.13 8.15 5.56
## 10 7 7 7 8 4.82 7.26 6.42 7.91
## 11 5 5 5 8 5.68 4.74 5.73 6.89
Yes, it is.
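Growing my.list one entry at a time is perfectly adequate at this size. For comparison, the same named list can be built in one step with mget and setNames; the sketch below builds it both ways and checks that the results agree. The name my.list2 is hypothetical, used only for the comparison.

```r
data.sets <- ls("package:datasets")
foonames <- paste0("foo", seq_along(data.sets))

# Loop version, as in the post: grow the list one named entry at a time
my.list <- list()
for (i in seq_along(data.sets)) my.list[[foonames[i]]] <- get(data.sets[[i]])

# One-step version: fetch everything, then relabel with the foo names
# (my.list2 is a hypothetical name for the comparison)
my.list2 <- setNames(
  mget(data.sets, envir = as.environment("package:datasets")),
  foonames
)

identical(my.list, my.list2)
```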
Conclusion
For something on the order of 100 objects, using get and assign and using an ordinary list seem to work equally well, with no meaningful difference in execution time.
This may not generalize to really large numbers of objects, but individually naming (and accessing by name) tens or hundreds of thousands of objects (or more) strikes me as a fairly low-likelihood scenario.
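For anyone curious about the large-scale case, here is a rough sketch that scales the comparison up to n objects, storing small integers instead of data sets. One thing I would expect to matter as n grows: environments look up names in a hash table by default, whereas extracting a list element by name scans the names vector, so list lookups (and by-name appends) should eventually fall behind. Treat that as a hypothesis to test, not a result. The value n = 5000 and the object names are arbitrary choices.

```r
n <- 5000  # arbitrary size; increase it to probe scaling
obj.names <- paste0("obj", seq_len(n))  # illustrative names

# assign/get version: one variable per object in a hashed environment
obj.env <- new.env(hash = TRUE)
t.assign <- system.time(
  for (i in seq_len(n)) assign(obj.names[i], i, envir = obj.env)
)
t.get <- system.time(
  for (i in seq_len(n)) get(obj.names[i], envir = obj.env)
)

# List version: grow a named list one entry at a time, then index by name
obj.list <- list()
t.build <- system.time(
  for (i in seq_len(n)) obj.list[[obj.names[i]]] <- i
)
t.index <- system.time(
  for (i in seq_len(n)) obj.list[[obj.names[i]]]
)

# Elapsed seconds per phase; expect these to vary by machine and R version
c(assign = t.assign[["elapsed"]], get = t.get[["elapsed"]],
  build.list = t.build[["elapsed"]], index.list = t.index[["elapsed"]])
```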
Source code
Reproducible research seems to be a popular topic these days. I generated this post (give or take a little massaging of HTML to fit the blog style) in RStudio using an R Markdown file, which you can download here. If you “knit” it in RStudio on your own machine, you will get timing results for your setup.
I have used assign() to give objects descriptive names, add those names to a character vector, then for-loop through the character vector to retrieve the objects in later uses. Granted, this was before I had a better understanding of how to create lists, but I think it's still a useful technique.
Ralph
Looping is worth some thought. One can apply a number of looping functions, including a for-loop, to a list, but I suspect in some cases looping over vectors is easier to get right than looping over lists.
Agreed, looping through lists is a black art.