Phenotypic selection analysis in R

I have up to recently always done my phenotypic selection analyses in SAS. I finally got some code I think works to do everything SAS would do. Feedback much appreciated! ########################Selection analyses############################# install.packages(c("car","reshape","ggplot2")) require(car) require(reshape) require(ggplot2) # Create data set dat <- data.frame(plant = seq(1,100,1), trait1 = rep(c(0.1,0.15,0.2,0.21,0.25,0.3,0.5,0.6,0.8,0.9,1,3,4,10,11,12,13,14,15,16), each = 5), trait2 = runif(100), fitness = rep(c(1,5,10,20,50), each = 20)) # Make relative fitness column dat_ <- cbind(dat, dat$fitness/mean(dat$fitness)) names(dat_)[5] <- "relfitness" # Standardize traits dat_ <- cbind(dat_[,-c(2:3)], rescaler(dat_[,c(2:3)],"sd")) ####Selection differentials and correlations among traits, cor.prob uses function in functions.R file ############################################################################ ####### Function for calculating correlation matrix, corrs below diagonal, ####### and P-values above diagonal ############################################################################ cor.prob <- function(X, dfr = nrow(X) - 2) { R <- cor(X) above <- row(R) < col(R) r2 <- R[above]^2 Fstat <- r2 * dfr / (1 - r2) R[above] <- 1 - pf(Fstat, 1, dfr) R } # Get selection differentials and correlations among traits in one data frame dat_seldiffs <- cov(dat_[,c(3:5)]) # calculates sel'n differentials using cov dat_selcorrs <- cor.prob(dat_[,c(3:5)]) # use P-values above diagonal for significance of sel'n differentials in dat_seldiffs dat_seldiffs_selcorrs <- data.frame(dat_seldiffs, dat_selcorrs) # combine the two ########################################################################## ####Selection gradients dat_selngrad <- lm(relfitness ~ trait1 * trait2, data = dat_) summary(dat_selngrad) # where "Estimate" is our sel'n gradient ####Check assumptions shapiro.test(dat_selngrad$residuals) # normality, bummer, non-normal hist(dat_selngrad$residuals) # plot residuals vif(dat_selngrad) # check variance inflation factors (need package car), everything looks fine plot(dat_selngrad) # cycle through diagnostic plots ############################################################################ # Plot data ggplot(dat_, aes(trait1, relfitness)) + geom_point() + geom_smooth(method = "lm") + labs(x="Trait 1",y="Relative fitness") ggsave("myplot.jpeg") Plot of relative fitness vs. trait 1 standardized ...

February 24, 2011 · 2 min · Scott Chamberlain

Phylogenetic analysis with the phangorn package: an example

The phangorn package is a relatively new package in R for the analysis and comparison of phylogenies. See here for the Bioinformatics paper and here for the package. Here is an example of using phangorn from getting sequences to making phylogenies and visualizing them: Getting sequences from Genbank Multiple alignment Maximum likelihood tree reconstruction Visualizing trees Visualizing trees and traits Make fake traits: Visualize them on trees: ...

February 21, 2011 · 1 min · Scott Chamberlain

Farmer's markets data

I combined USDA data on farmer’s markets in the US with population census data to get an idea of the disparity in farmers markets by state, and then also expressed per capita. Download USDA data here. The formatted file I used below is here (in excel format, although I read into R as csv file). The census data is read from url as below. California has a ton of absolute number of farmer’s markets, but Vermont takes the cake by far with number of markets per capita. Iowa comes in a distant second behind Vermont in markets per capita. ...

February 16, 2011 · 2 min · Scott Chamberlain

Troubling news for the teaching of evolution

[UPDATE: i remade the maps in green, hope that helps…] A recent survey reported in Science (“Defeating Creationism in the Courtroom, but not in the Classroom”) found that biology teachers in high school do not often accept the basis of their discipline, as do teachers in other disciplines, and thus may not teach evolution appropriately. Read more here: New York Times. I took a little time to play with the data provided online along with the Science article. The data is available on the Science website along with the article, and the dataset I read into R is unchanged from the original. The states abbreviations file is here (as a .xls). Here goes: ...

February 9, 2011 · 3 min · Scott Chamberlain

Plants are less sex deprived when next to closely related neighbors

A new early online paper in American Journal of Botany by Risa Sargent and colleagues suggests that plants are less sex deprived (pollen limited) in vernal pools that have more closely related plant species. Vernal pools are (at least in my experience) small (to quite large) depressions that fill up with water with winter rains, and dry out completely in the summer. Vernal pool adapted plants flower in rings down the pool as the water dries up. Aquatic invertebrates and some herps can last through the summer by burrowing in the soil. ...

February 1, 2011 · 2 min · Scott Chamberlain

Good riddance to Excel pivot tables

Excel pivot tables have been how I have reorganized data…up until now. These are just a couple of examples why R is superior to Excel for reorganizing data: UPDATE: I fixed the code to use ‘dcast’ instead of ‘cast’. And library(ggplot2) instead of library(plyr) [plyr is called along with ggplot2]. Thanks Bob! Also, see another post on this topic here. library(reshape2) library(ggplot2) dataset <- data.frame(var1 = rep(c("a","b","c","d","e","f"), each = 4), var2 = rep(c("level1","level1","level2","level2"), 6), var3 = rep(c("h","m"), 12), meas = rep(1:12)) Created by Pretty R at inside-R.org # simply pivot table dcast(dataset, var1 ~ var2 + var3) Using meas as value column. Use the value argument to cast to override this choice var1 level1_h level1_m level2_h level2_m 1 a 1 2 3 4 2 b 5 6 7 8 3 c 9 10 11 12 4 d 1 2 3 4 5 e 5 6 7 8 6 f 9 10 11 12 # mean by var1 and var2 dcast(dataset, var1 ~ var2, mean) Using meas as value column. Use the value argument to cast to override this choice var1 level1 level2 1 a 1.5 3.5 2 b 5.5 7.5 3 c 9.5 11.5 4 d 1.5 3.5 5 e 5.5 7.5 6 f 9.5 11.5 # mean by var1 and var3 dcast(dataset, var1 ~ var3, mean) Using meas as value column. Use the value argument to cast to override this choice var1 h m 1 a 2 3 2 b 6 7 3 c 10 11 4 d 2 3 5 e 6 7 6 f 10 11 # mean by var1, var2 and var3 (version 1) dcast(dataset, var1 ~ var2 + var3, mean) Using meas as value column. Use the value argument to cast to override this choice var1 level1_h level1_m level2_h level2_m 1 a 1 2 3 4 2 b 5 6 7 8 3 c 9 10 11 12 4 d 1 2 3 4 5 e 5 6 7 8 6 f 9 10 11 12 # mean by var1, var2 and var3 (version 2) dcast(dataset, var1 + var2 ~ var3, mean) Using meas as value column. Use the value argument to cast to override this choice var1 var2 h m 1 a level1 1 2 2 a level2 3 4 3 b level1 5 6 4 b level2 7 8 5 c level1 9 10 6 c level2 11 12 7 d level1 1 2 8 d level2 3 4 9 e level1 5 6 10 e level2 7 8 11 f level1 9 10 12 f level2 11 12 # use package plyr to create flexible data frames... dataset_plyr <- ddply(dataset, .(var1, var2), summarise, mean = mean(meas), se = sd(meas), CV = sd(meas)/mean(meas) ) > dataset_plyr var1 var2 mean se CV 1 a level1 1.5 0.7071068 0.47140452 2 a level2 3.5 0.7071068 0.20203051 3 b level1 5.5 0.7071068 0.12856487 4 b level2 7.5 0.7071068 0.09428090 5 c level1 9.5 0.7071068 0.07443229 6 c level2 11.5 0.7071068 0.06148755 7 d level1 1.5 0.7071068 0.47140452 8 d level2 3.5 0.7071068 0.20203051 9 e level1 5.5 0.7071068 0.12856487 10 e level2 7.5 0.7071068 0.09428090 11 f level1 9.5 0.7071068 0.07443229 12 f level2 11.5 0.7071068 0.06148755 # ...to use for plotting qplot(var1, mean, colour = var2, size = CV, data = dataset_plyr, geom = "point") ...

January 30, 2011 · 3 min · Scott Chamberlain

R and Google Visualization API: Fish harvests

I recently gathered fish harvest data from the U.S. National Oceanic and Atmospheric Administarion (NOAA), which I downloaded from Infochimps. The data is fish harvest by weight and value, by species for 21 years, from 1985 to 2005. Here is a link to a google document of the data I used below. I had to do some minor pocessing in Excel first; thus the link to this data. https://spreadsheets.google.com/ccc?key=0Aq6aW8n11tS_dFRySXQzYkppLXFaU2F5aC04d19ZS0E&amp;hl=en Get the original data from Infochimps here http://infochimps.com/datasets/domestic-fish-and-shellfish-catch-value-and-price-by-species-198 ...

January 17, 2011 · 1 min · Scott Chamberlain

R and Google Visualization API: Wikispeedia

Wikispeedia is a website trying to gather all speed limit signs on Earth. I recently created a Google Visualization for some of their data, specifically on speed limit signs that change speed throughout the day. Check it out here. Here is how to see and comment on what they are doing: website, and Google groups.

January 17, 2011 · 1 min · Scott Chamberlain

Bipartite networks and R

Earlier, I posted about generating networks from abundance distributions that you specify. If this post was interesting, check out Jeff Kilpatrick’s website, where he provides code he produced in R and Octave to compare real bipartite networks to ones generated based on ecological variables measured in the field (in our case it was abundance, body size, and nectar production). We used that code for a paper we published. Code was modified from code produced by Diego P. Vazquez. ...

January 14, 2011 · 1 min · Scott Chamberlain

Just for fun: Recovery.gov data snooping

Okay, so this isn’t ecology related at all, but I like exploring data sets. So here goes… Propublica has some awesome data sets available at their website: http://www.propublica.org/tools/ I played around with their data set on Recovery.gov (see hyperlink below in code). Here’s some figures: Mean award amount, ranked by mean amount, and also categorized by number of grants received (“nfund”) by state (by size and color of point). Yes, there are 56 “states”, which includes things like Northern Marian Islands (MP). Notice that California got the largest number of awards, but the mean award size was relatively small. ...

January 11, 2011 · 2 min · Scott Chamberlain