Mark has a data frame with one row for each response to any of a set of questions, and three columns: respondent ID; question number; response. Here's a chunk of R code to create a small demo data frame along those lines:
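A data frame with the same values as the output below can be built like this. The values are read off that output, so the original snippet may have been written differently:

```r
# Demo data reconstructed from the printed output:
# one row per (respondent, question) answer, deliberately not in sorted order
d <- data.frame(ID       = c(1, 2, 1, 2, 2, 1),
                Question = c(1, 3, 2, 1, 2, 3),
                Answer   = c(11, 23, 12, 21, 22, 13))
print(d)
```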
The output is:
  ID Question Answer
1  1        1     11
2  2        3     23
3  1        2     12
4  2        1     21
5  2        2     22
6  1        3     13
Here is code to rearrange it:
# sort the data by ID, then by Question
d <- d[do.call(order, d), ]
# extract a list of unique IDs and Question numbers
id <- unique(d[, "ID"])
q <- unique(d[, "Question"])
# rearrange the answers into the desired matrix layout
# (assumes exactly one row per ID/Question pair, i.e. no skipped questions)
m <- matrix(d[, "Answer"], nrow = length(id), ncol = length(q), byrow = TRUE)
# add the ids and make a new data frame
m <- cbind(id, m)
dd <- data.frame(m)
names(dd) <- c("ID", paste("Q", q, sep = ""))
print(dd)
The output of the last line (the rejiggered data frame) is:
  ID Q1 Q2 Q3
1  1 11 12 13
2  2 21 22 23
That's one way to do it.

My recommendation instead would be to use the reshape or reshape2 packages. Here's the solution with reshape:
> cast(d, ID ~ Question)
ID 1 2 3
1 1 11 12 13
2 2 21 22 23
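With reshape2 the equivalent call is dcast(d, ID ~ Question, value.var = "Answer"). For comparison, base R's stats::reshape() (an unrelated function that happens to share the package's name) can do the same without installing anything. A minimal sketch, assuming the demo data from the top of the post:

```r
# Same demo data as above
d <- data.frame(ID       = c(1, 2, 1, 2, 2, 1),
                Question = c(1, 3, 2, 1, 2, 3),
                Answer   = c(11, 23, 12, 21, 22, 13))

# Sort first: reshape() orders the new columns by first appearance of each Question value
d <- d[order(d$ID, d$Question), ]

# Long -> wide: one row per ID, one Answer.<question> column per question
wide <- reshape(d, idvar = "ID", timevar = "Question", direction = "wide")
print(wide)
```

The columns come out named Answer.1, Answer.2, Answer.3; rename them with names() if you want the Q1, Q2, Q3 style used earlier.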
I think your solution might actually work better with large datasets. I am using a relatively large dataset, and reshape bloats memory use to three times the size of my data; it freezes my computer.
The only issue is that when a cell has no value, the regular for-loop approach and reshape both give NA, but I suspect your code will break.
@Siah: What breaks with missing values? I changed my little example so that one of the responses was NA and the code still worked. If either the respondent ID or the question number in the original data frame is missing, bad things will happen, but I think that's true regardless of the script (it means you have an answer but you're not sure from whom or to what question).
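The case Siah seems to describe is slightly different from an NA answer: the row for a skipped question is absent altogether. Then matrix(..., byrow = TRUE) gets too few values, warns, and recycles them into the wrong cells. One way around it, sketched below with a hypothetical helper long_to_wide(), is to start from an all-NA matrix and drop each answer into its (respondent, question) cell by index. It builds only the one output matrix, which may be lighter on memory, though that is untested on large data:

```r
# Demo data with one row missing: respondent 2 never answered question 3
d <- data.frame(ID       = c(1, 2, 1, 2, 1),
                Question = c(1, 1, 2, 2, 3),
                Answer   = c(11, 21, 12, 22, 13))

# Hypothetical helper: long (ID, Question, Answer) -> wide, one row per ID
long_to_wide <- function(d) {
  id <- sort(unique(d$ID))
  q  <- sort(unique(d$Question))
  # All-NA to start, so any (ID, Question) pair without a row stays NA
  m <- matrix(NA_real_, nrow = length(id), ncol = length(q))
  # match() maps each ID / Question value to its row / column position
  m[cbind(match(d$ID, id), match(d$Question, q))] <- d$Answer
  dd <- data.frame(ID = id, m)
  names(dd) <- c("ID", paste0("Q", q))
  dd
}

wide <- long_to_wide(d)
print(wide)
```

Unlike cast(), this does no aggregation: if the same ID/Question pair appears twice, the later row silently overwrites the earlier one.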
For really big data sets, I might be tempted to stuff the data into SQLite or MySQL and then query out what I wanted.
@Harlan: Cool! I wasn't aware of the reshape package.