So I wanted a solution that was moderately robust with respect to extra spaces, capitalization, and abbreviation. A Google search turned up several solutions involving string manipulation, none of which entirely appealed to me. So I rolled my own, which I'm posting here. As usual, the code is licensed under a Creative Commons license (see the right-hand margin for details).
A few notes about the code:
- I used the lubridate package to provide a function (month()) for extracting the month index from a date object. I know that some people dislike loading packages they don't absolutely need (memory consumption, namespace clashes, ...). I find the lubridate::month() function pleasantly robust, but if you want to avoid loading lubridate, I suggest you try one of the other methods posted on the Web.
- My code loads the magrittr package so that I can "pipeline" commands. If you load a package (such as dplyr) that in turn loads magrittr, you're covered. If you prefer the pipeR package, a minimal amount of tweaking should produce a version that works with pipeR. If you just want to avoid loading anything, the same logic will work; you just need to change the piping into nested function calls.
- I make no claim that this is the most efficient, most robust or most elegant solution. It just seems to work for me.
#
# Load libraries.
#
library(lubridate)
library(magrittr)

#
# Function monthIndex converts English-language string
# representations of a month name to the equivalent
# cardinal value (1 for January, ..., 12 for December).
#
# Argument:
#   x  a character vector, or object that can be
#      coerced to a character vector
#
# Value:
#   a numeric vector of the same length as x,
#   containing the ordinals of the months named
#   in x (NA if the entry in x cannot be deciphered)
monthIndex <- function(x) {
  x %>%
    # strip any periods
    gsub("\\.", "", .) %>%
    # turn it into a full date string
    paste0(" 1, 2001") %>%
    # turn the full string into a date
    as.Date("%t%B %d, %Y") %>%
    # extract the month as an integer
    month
}

#
# Unit test.
#
x <- c("Sep", "May", " July ", "huh?", "august", "dec ", "Oct. ")
monthIndex(x)
# 9 5 7 NA 8 12 10
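For anyone who would rather not load any packages, here is a rough sketch of the same logic written as nested function calls in base R. The name monthIndexBase is made up for illustration, and swapping lubridate::month() for format(d, "%m") is my substitution, not part of the code above. Like the original, it assumes the session's time locale uses English month names.

```r
# Hypothetical base-R variant of monthIndex (the name monthIndexBase is
# illustrative only). Assumes an English-language (or "C") time locale.
monthIndexBase <- function(x) {
  # strip any periods, append a day and year, then parse as a date
  d <- as.Date(
    paste0(gsub(".", "", as.character(x), fixed = TRUE), " 1, 2001"),
    format = "%t%B %d, %Y"
  )
  # extract the two-digit month and convert it to an integer;
  # entries that could not be parsed come through as NA
  as.integer(format(d, "%m"))
}

x <- c("Sep", "May", " July ", "huh?", "august", "dec ", "Oct. ")
monthIndexBase(x)
```

One difference worth noting: this version returns an integer vector, so code that tests for a specific numeric type may need adjusting.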