Plyr | Recology

R ecology workshop

After my presentation yesterday to a group of grad students on R resources, I did a presentation today on intro to R data manipulation, visualizations, and analyses/visualizations of biparite networks and community level analyses (diversity, rarefaction, ordination, etc.). As I said yesterday I’ve been playing with two ways to make reproducible presentations in R: RStudio’s presentations built in to RStudio IDE, and Slidify. Yesterday I went with RStudio’s product - today I used Slidify. See the Markdown file for the presentation here. ...

Weecology can has new mammal dataset

So the Weecology folks have published a large dataset on mammal communities in a data paper in Ecology. I know nothing about mammal communities, but that doesn’t mean one can’t play with the data… Their dataset consists of five csv files: communities, references, sites, species, and trapping data Where are these sites, and by the way, do they vary much in altitude? Let’s zoom in on just the states ...

I Work For The Internet !

UPDATE: code and figure updated at 647 AM CST on 19 Dec ‘11. Also, see Jarrett Byrnes (improved) fork of my gist here. The site I WORK FOR THE INTERNET is collecting pictures and first names (last name initials only) to show collective support against SOPA (the Stop Online Piracy Act). Please stop by their site and add your name/picture. I used the #rstats package twitteR, created by Jeff Gentry, to search for tweets from people signing this site with their picture, then plotted using ggplot2, and also used Hadley’s lubridate to round timestamps on tweets to be able to bin tweets in to time slots for plotting. ...

ggplot2 talk by Hadley Whickam at Google

plyr's idata.frame VS. data.frame

I had seen the function idata.frame in plyr before, but not really tested it. From the plyr documentation: “An immutable data frame works like an ordinary data frame, except that when you subset it, it returns a reference to the original data frame, not a a copy. This makes subsetting substantially faster and has a big impact when you are working with large datasets with many groups.” For example, although baseball is a data.frame, its immutable counterpart is a reference to it: ...

Troubling news for the teaching of evolution

[UPDATE: i remade the maps in green, hope that helps…] A recent survey reported in Science (“Defeating Creationism in the Courtroom, but not in the Classroom”) found that biology teachers in high school do not often accept the basis of their discipline, as do teachers in other disciplines, and thus may not teach evolution appropriately. Read more here: New York Times. I took a little time to play with the data provided online along with the Science article. The data is available on the Science website along with the article, and the dataset I read into R is unchanged from the original. The states abbreviations file is here (as a .xls). Here goes: ...

Good riddance to Excel pivot tables

Excel pivot tables have been how I have reorganized data…up until now. These are just a couple of examples why R is superior to Excel for reorganizing data: UPDATE: I fixed the code to use ‘dcast’ instead of ‘cast’. And library(ggplot2) instead of library(plyr) [plyr is called along with ggplot2]. Thanks Bob! Also, see another post on this topic here. library(reshape2) library(ggplot2) dataset <- data.frame(var1 = rep(c("a","b","c","d","e","f"), each = 4), var2 = rep(c("level1","level1","level2","level2"), 6), var3 = rep(c("h","m"), 12), meas = rep(1:12)) Created by Pretty R at inside-R.org # simply pivot table dcast(dataset, var1 ~ var2 + var3) Using meas as value column. Use the value argument to cast to override this choice var1 level1_h level1_m level2_h level2_m 1 a 1 2 3 4 2 b 5 6 7 8 3 c 9 10 11 12 4 d 1 2 3 4 5 e 5 6 7 8 6 f 9 10 11 12 # mean by var1 and var2 dcast(dataset, var1 ~ var2, mean) Using meas as value column. Use the value argument to cast to override this choice var1 level1 level2 1 a 1.5 3.5 2 b 5.5 7.5 3 c 9.5 11.5 4 d 1.5 3.5 5 e 5.5 7.5 6 f 9.5 11.5 # mean by var1 and var3 dcast(dataset, var1 ~ var3, mean) Using meas as value column. Use the value argument to cast to override this choice var1 h m 1 a 2 3 2 b 6 7 3 c 10 11 4 d 2 3 5 e 6 7 6 f 10 11 # mean by var1, var2 and var3 (version 1) dcast(dataset, var1 ~ var2 + var3, mean) Using meas as value column. Use the value argument to cast to override this choice var1 level1_h level1_m level2_h level2_m 1 a 1 2 3 4 2 b 5 6 7 8 3 c 9 10 11 12 4 d 1 2 3 4 5 e 5 6 7 8 6 f 9 10 11 12 # mean by var1, var2 and var3 (version 2) dcast(dataset, var1 + var2 ~ var3, mean) Using meas as value column. Use the value argument to cast to override this choice var1 var2 h m 1 a level1 1 2 2 a level2 3 4 3 b level1 5 6 4 b level2 7 8 5 c level1 9 10 6 c level2 11 12 7 d level1 1 2 8 d level2 3 4 9 e level1 5 6 10 e level2 7 8 11 f level1 9 10 12 f level2 11 12 # use package plyr to create flexible data frames... dataset_plyr <- ddply(dataset, .(var1, var2), summarise, mean = mean(meas), se = sd(meas), CV = sd(meas)/mean(meas) ) > dataset_plyr var1 var2 mean se CV 1 a level1 1.5 0.7071068 0.47140452 2 a level2 3.5 0.7071068 0.20203051 3 b level1 5.5 0.7071068 0.12856487 4 b level2 7.5 0.7071068 0.09428090 5 c level1 9.5 0.7071068 0.07443229 6 c level2 11.5 0.7071068 0.06148755 7 d level1 1.5 0.7071068 0.47140452 8 d level2 3.5 0.7071068 0.20203051 9 e level1 5.5 0.7071068 0.12856487 10 e level2 7.5 0.7071068 0.09428090 11 f level1 9.5 0.7071068 0.07443229 12 f level2 11.5 0.7071068 0.06148755 # ...to use for plotting qplot(var1, mean, colour = var2, size = CV, data = dataset_plyr, geom = "point")