Monday, October 30, 2023

Publishing a Shiny Application

I've twice encountered problems while trying to deploy Shiny applications on the hosting platform, and since the fix was the same in both cases, I think a pattern is emerging. In both cases, the issue has to do with use of the includeMarkdown function to display help text written in Markdown and stored in a file within the application. The includeMarkdown function is provided by the htmltools library, which is apparently loaded automatically by the shiny library. So my code explicitly specifies library(shiny) but does not load htmltools.

In both problem cases, the code ran fine on my PC but not on the server. Why? A little "fine print" in the documentation of the includeMarkdown function mentions that it requires the markdown package. I have that installed on my PC, and apparently it gets loaded automatically. The server deployment system, though, apparently does not realize the need for it. So on the server my code loads the shiny library explicitly, and that causes the server to include the htmltools library but not the markdown library.

The solution is trivial: just add library(markdown) to the source code. The hard part is remembering to do it, given that it is unnecessary on my PC.
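A minimal sketch of what the fixed app looks like. The file name "help.md" and the layout are placeholders, not my actual application; the point is the explicit library(markdown) call, which the deployment system's dependency scanner can see:

```r
library(shiny)
library(markdown)  # explicit load so the server knows to install/attach it

# includeMarkdown() comes from htmltools, which shiny pulls in automatically,
# but it needs the markdown package at render time.
ui <- fluidPage(
  includeMarkdown("help.md")  # placeholder file name
)

server <- function(input, output, session) {}

shinyApp(ui, server)
```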

Saturday, October 28, 2023

The Trouble with Tibbles

Apologies to Star Trek fans for the title pun, but I couldn't resist. (If you're not a Trekkie, see here for clarification.)

I've been working on an interactive R application (using Shiny, although that's probably irrelevant here). There are places where the code needs to loop through a data frame, looking for consecutive rows where either three text fields match or two match and one does not. Matching rows are copied into new data frames. The data I'm testing the code on has a bit over 9,000 rows, and this process can take upwards of nine seconds -- not an eternity, but a bit annoying when you are sitting there waiting for it to hatch.

I decided to use the profiler in RStudio to see where time was being eaten up. Almost all the nine seconds was blamed on two steps. The biggest time suck was doing case-insensitive string comparisons on the three fields, which did not come as a big surprise. I went into the profiling process thinking the other big time suck would be adding a data frame to a growing list of data frames, but that was actually quite fast. To my surprise, the number two consumer of time was "df <- temp[n, ]", which grabs row n from data frame "temp" and turns it into a temporary data frame named "df". How could such a simple operation take so long?
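For context, here is a simplified, hypothetical reconstruction of the kind of loop being profiled (the real code also handles the two-of-three-match case; "temp" and the column names key1/key2/key3 are placeholders). Wrapping the loop in profvis::profvis() is one way to get the line-level timings RStudio shows:

```r
library(profvis)

# Placeholder data standing in for the real ~9,000-row "temp"
temp <- data.frame(
  key1 = sample(c("Alpha", "alpha", "Beta"), 9000, replace = TRUE),
  key2 = sample(c("X", "x", "Y"), 9000, replace = TRUE),
  key3 = sample(c("One", "one", "Two"), 9000, replace = TRUE),
  stringsAsFactors = FALSE
)

profvis({
  groups <- list()          # growing list of data frames (this part was fast)
  current <- temp[1, ]
  for (n in 2:nrow(temp)) {
    df <- temp[n, ]         # the surprisingly expensive single-row extraction
    same <-                 # case-insensitive comparison of the three fields
      tolower(df$key1) == tolower(current$key1[1]) &&
      tolower(df$key2) == tolower(current$key2[1]) &&
      tolower(df$key3) == tolower(current$key3[1])
    if (same) {
      current <- rbind(current, df)
    } else {
      groups[[length(groups) + 1]] <- current
      current <- df
    }
  }
  groups[[length(groups) + 1]] <- current
})
```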

I had a hunch that turned out to be correct. Somewhere earlier in the code, my main data frame (from which "temp" was extracted) became a tibble. Tibbles are modernized, souped-up versions of data frames, with extra features but also occasional extra overhead/annoyances. One might call them the backbone of the Tidyverse. The Tidyverse has its adherents and its detractors, and I don't want to get into the middle of that. I'm mostly happy to work with the Tidyverse, but in this case using tibbles became a bit of a problem.

So I tweaked the line of code that creates "temp" to "temp <- ... %>% as.data.frame()", where ... was the existing code. Lo and behold, the time spent on "df <- temp[n, ]" dropped to a little over half a second. Somewhat surprisingly, the time spent on the string comparisons dropped even more. So the overall processing time fell from around 9 seconds to under 1.3 seconds, speeding up the code by a factor of almost 7.
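The overhead is easy to see in isolation. This little sketch (column names are arbitrary) times repeated single-row extraction on a plain data frame versus the same data as a tibble; on my understanding, the tibble's richer [ method does extra work on every call, which adds up over thousands of iterations:

```r
library(tibble)

df <- data.frame(a = runif(9000), b = runif(9000), c = runif(9000))
tb <- as_tibble(df)

# Same operation, two containers; the tibble version is noticeably slower
system.time(for (n in seq_len(nrow(df))) x <- df[n, ])  # plain data frame
system.time(for (n in seq_len(nrow(tb))) x <- tb[n, ])  # tibble
```

The actual ratio will vary by machine and R version, so I wouldn't read too much into any one run, but the gap is what made as.data.frame() worth trying.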

I'll need to keep this in mind if I bump into any other places where my code seems unusually slow.