Tuesday, October 6, 2015

Producing Reproducible R Code

A tip in the Google+ Statistics and R community led me to the reprex package for R. Quoting the author (Professor Jennifer Bryan, University of British Columbia), the purpose of reprex is to
[r]ender reproducible example code to Markdown suitable for use in code-oriented websites, such as StackOverflow.com or GitHub.
Much has been written about the virtues of, and need for, reproducible research. Another key need for reproducibility, one at which this package aims, is when posting questions about code or bug reports. Viewers of those posts need to know exactly what you did and exactly what resulted. The readme text on the package's GitHub home page gives a work flow description and some prescriptive advice, which I think is well worth reading.

I'm all for complete and cogent bug reports/code questions and reproducible research, but I was interested in reprex for another reason: formatting R code for blog posts (such as this one). To date I've been using a third-party web site (the Pretty R syntax highlighter) to generate HTML from R code, and I've been quite happy with the results. A simpler process would be nice, though. Additionally, while the aforementioned site works great with the code, I'm sometimes not sure how I should format the output.

So I decided to take reprex for a test drive using code from an old post here (Tabulating Prediction Intervals in R). I used just the code from the first part of the post (definition of the model.ctable() function and one invocation of it), a total of 17 lines of source code (including Roxygen comments for the function) leading to a single output table. Using RStudio, my work flow was as follows.
  1. Open a new script file and type/paste the code into it.
  2. Source the file to confirm it works as expected.
  3. Copy the code to the clipboard.
  4. In the console window, run the following two lines.
    library(reprex)
    reprex()
    This runs the code in the clipboard, so be careful not to do anything to modify the clipboard contents between the previous step and this one.
  5. Examine the results in the viewer pane (which automatically opens) to confirm that they are as expected.
  6. Open a new R Markdown file, delete the boilerplate RStudio inserts, and paste the contents of the clipboard into it. Along with displaying results in the viewer, the reprex() function also places the R Markdown code for it in the clipboard. Again, be careful not to modify the clipboard contents between step 4 and this one.
  7. Click the "Knit HTML" button and provide a destination file for the HTML output. This opens an HTML source file in RStudio.
  8. Copy the contents of the body tag (excluding the opening and closing body tags and ignoring the pile of stuff in the header) and paste into an HTML document. (Depending on the width of the output, you might want to surround it with a scrolling DIV tag, or hack the CSS you just pasted in to make it scrollable and/or give it a border.)
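
Steps 3 through 5 depend on the clipboard staying untouched. A minimal sketch of a scripted alternative follows, assuming a recent reprex release in which reprex() accepts the `input` and `venue` arguments (the version used for this post may differ). The temporary script and its three lines of code are hypothetical stand-ins for the file created in step 1.

```r
# Sketch only: render a reprex from a file rather than from the clipboard.
# Assumes a recent reprex release; the temp script is a hypothetical stand-in.
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- rnorm(20)",
  "y <- rnorm(20)",
  "summary(lm(y ~ x))$coefficients"
), script)

# Only attempt the render in an interactive session with reprex installed.
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  # `input` takes a file path; venue = "html" requests an HTML fragment
  # instead of GitHub-flavored Markdown.
  out <- reprex::reprex(input = script, venue = "html")
  cat(out, sep = "\n")
}
```

If the `venue = "html"` option behaves as documented, the returned fragment could be pasted into a post directly, which would stand in for steps 6 through 8.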
For this post, I added the following properties to the CSS .main-container style defined by reprex:

  overflow: scroll;
  border-style: groove;
  border-width: 5px;
  padding: 10px;

That created a border and a bit of padding, and told the browser to add scroll bars if needed. Here is how my example turned out:


Summarize a fitted linear model, displaying both coefficient significance and confidence intervals.
@param model an instance of class lm @param level the confidence level (default 0.95)
@return a matrix combining the coefficient summary and confidence intervals
model.ctable <- function(model, level = 0.95) {
  cbind(summary(model)$coefficients, confint(model, level = level))
}
x <- rnorm(20)
y <- rnorm(20)
z <- 6 + 3 * x - 5 * y + rnorm(20)
m <- lm(z ~ x + y)
model.ctable(m, level = 0.9)
#>              Estimate Std. Error   t value     Pr(>|t|)       5 %
#> (Intercept)  6.271961  0.2462757  25.46724 5.584261e-15  5.843539
#> x            2.974000  0.2571237  11.56642 1.763158e-09  2.526706
#> y           -4.951286  0.3260552 -15.18542 2.547338e-11 -5.518494
#>                  95 %
#> (Intercept)  6.700384
#> x            3.421294
#> y           -4.384079


You can see the comments, the code and, at the end, the output (formatted as R comments). It's not perfect. In particular, it would be nice if the Roxygen comments looked like comments and not like text. There's also no syntax highlighting (which is to be expected in an R Markdown document). Still, it's not bad for a blog post, and it confirms the package works (and is easy to use).
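
For comparison, here is a sketch of roughly what the source script looks like before rendering, assuming the Roxygen lines carry the usual `#'` prefix. reprex hands the script to knitr's spin mechanism, which treats `#'` lines as prose, and that would explain why they come out as text rather than as comments. The set.seed() call and the `tab` variable are not in the original code; they are added here so that repeated runs give the same numbers and the result can be inspected.

```r
#' Summarize a fitted linear model, displaying both coefficient
#' significance and confidence intervals.
#'
#' @param model an instance of class lm
#' @param level the confidence level (default 0.95)
#' @return a matrix combining the coefficient summary and confidence intervals
model.ctable <- function(model, level = 0.95) {
  cbind(summary(model)$coefficients, confint(model, level = level))
}

set.seed(42)  # assumption: not in the original; fixes the random draws
x <- rnorm(20)
y <- rnorm(20)
z <- 6 + 3 * x - 5 * y + rnorm(20)
m <- lm(z ~ x + y)

# Four summary columns plus the two confidence-limit columns.
tab <- model.ctable(m, level = 0.9)
print(tab)
```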

I'll close by pointing out that I'm going "off label" by using the package this way. In particular, I'm getting no value from one of the prime virtues of R Markdown: the ability to embed code in a text document such that the code can be easily read but can also be executed by "compiling" the document (not true of an HTML document like this post). For posting code to a forum, though, this looks like a definite keeper.

2 comments:

  1. Dear Prof. Rubin,

    Just wanted to let you know about two nice approaches to producing and displaying elegant and functional source code segments for blogging. I hope that the following information will be helpful.

    The first approach is based on your current blogging platform (Blogger) and, since I don't use Blogger, I don't have much to say about it other than to point to the right information. I'm talking about an open source software project SyntaxHighlighter (http://alexgorbatchev.com/SyntaxHighlighter). Since it is developed in JavaScript, the software is very flexible and integrates well with a wide range of blogging, wiki, CMS and other platforms (http://alexgorbatchev.com/SyntaxHighlighter/integration.html -- in section B, relevant to Blogger and valid/safe links are #2, #4 and #5 -- #1 is spam, others are irrelevant).

    However, if you ever consider moving to the much more flexible WordPress blogging platform (which I use for my personal site), I can recommend the second approach, which is very nice and plays well with the R language. I'm talking about the Enlighter WordPress plugin (https://wordpress.org/plugins/enlighter), based on the EnlighterJS library (which, perhaps, might be used for the Blogger platform as well). The plugin is very configurable and supports a wide range of programming languages. It supports R in the generic mode, as can be seen on this page: http://enlighterjs.andidittrich.de/Language.Generic.html (try the theme selector to see the built-in styling).

    Best regards,
    Aleksandr Blekh

    Replies
    1. Aleksandr: Thanks for the pointers! I'm the poster child for inertia, so a move to WordPress is not in the foreseeable future. The SyntaxHighlighter project looks interesting, although I can't find a "brush" for R. Also, it requires a number of script tags to be added to the header. I'm not averse to doing that, but it would create a bit of a problem. My blog is echoed on a "metablog", Spartan Ideas (http://spartanideas.msu.edu/), hosted at Michigan State University. I pass them HTML for the body of my posts, but I'm not sure how they'd feel about hacking their headers. If I can find an R brush, I may check with the student programmer who handles the code conversion.
