Monday, November 7, 2016

Interactive R Plots with GGPlot2 and Plotly

I refactored a recent Shiny project, using Hadley Wickham's ggplot2 library to produce high quality plots. One particular feature the project requires is the ability to hover over a plot and get information about the nearest point (generally referred to as "hover text" or a "tool tip"). There are multiple ways to turn static ggplots into interactive plots, and I've experimented with a few. I ended up using the R plotly library, which worked well. (The free community version was sufficient for my purposes.) The Plotly site has a short introduction to facilitate getting started. I did bump into a few "wrinkles", and thought I would record how I handled them.

Before getting into details, I'll just mention the obvious: besides installing the libraries, you need to load them in your application with library(ggplot2) and library(plotly). Also, I'll point out the default behavior for the plotly library with regard to hover text: hovering shows a pop-up with the coordinates of the point over which (or close to which) you are hovering. You can optionally add text of your choice to the pop-up.

Partly as a consequence of how variables were named in the data frames being passed to the ggplot() function, I sometimes encountered difficulty getting the coordinates properly labeled. For instance, I might want the coordinates labeled "MPG" and "Residual" but I'd end up with "x" and "y". Eventually I decided the simplest solution was to suppress the display of the coordinates (displaying just text) and then put everything I wanted (coordinate values, coordinate labels, observation labels) into a single, formatted hover text entry. The details of how I did that are the gist of this post.


In the user interface file, displaying a plot is extremely simple. If the plot is going to be passed in output$myplot, the code to display it is just plotlyOutput("myplot"). In some cases, to avoid overly large plots or plots with unappealing aspect ratios, I would display them with plotlyOutput("myplot", height = "auto").


In the server file, assume that the plot in question (an instance of class "ggplot")  is in a variable p. The code to convert it for display via Plotly with default hover information is just output$myplot <- ggplotly(p). To suppress the default information and just show my custom text, it's output$myplot <- ggplotly(p, tooltip = "text").

When generating a plot with custom text, the text is provided as an "aesthetic", an argument to the aes() function either in the call to ggplot() or in the call to a "geom" being plotted. One of the more complicated instances in my application involves plotting a histogram of residuals from a regression model with a smoothed density curve superimposed. The code for that (in which the residuals are supplied in a data frame df containing a single column named "r") is as follows:

df %>%
   ggplot(mapping = aes(x = r)) +
   labs(x = "Standardized Residual", y = "Density") +
   geom_histogram(aes(y = ..density..,
                      text = sprintf("Std.Res.: %.2f<br>Count: %d", x, ..count..)
   ) +
   geom_density(color = "Navy",
                aes(text = sprintf("Std.Res: %.2f<br>Density: %.2f", x, ..density..)
   ) + 
   geom_vline(xintercept = 0, color = "DarkGreen")

Note the calls to sprintf() to format the tool tips (with different information for histogram bars and the density curve). Also note the use of the HTML break tag (<br>)  to separate lines within the tool tip.

The inevitable edge case

As von Moltke (the Elder) noted, no battle plan survives contact with the enemy ... and no coding scheme handles all cases. One of my plots, a QQ plot of residuals, tripped me up. I wanted to identify each point with its row name. No matter where I put aes(text = ...), it failed to work. So I resorted to an alternative approach.

Assume that the QQ plot (an instance of class "ggplot") is in variable p, the source data is in data frame df, and the residuals are in data frame rf, with the graph ending up in output$qplot. The code in the user interface is unchanged. The server code looks like the following:

output$qPlot <-
    z <- plotly_build(p)
    ix <- sort(rf, index.return = TRUE)$ix
    z$x$data[[1]]$text <- rownames(df)[ix]

Some explanation is clearly in order. The plotly_build() function turns a plot generated by ggplot() into a list of instructions to the plotly.js Javascript library. That in turn is formatted by the renderPlotly() function into something plotlyOutput() can digest. Within the list plotly_build() created (stored in local variable z), z$x$data[[1]]$text turns out to be a vector of hover text entries, one per point. I replace those with the row names from the data set. (I could use sprintf() to include the coordinates of the points, but for this particular application that wasn't particularly useful.)

Finally, I have to adjust for the fact that, in a QQ plot, observations are plotted in ascending order, not in their original order.  To match the correct row name to each observation, I sort the residuals, store the sort order index vector in ix, and then use that to reorder the row names.

No comments:

Post a Comment

Due to intermittent spamming, comments are being moderated. If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on Operations Research Stack Exchange.