My take on an R introduction talk

UPDATE: I put an R tutorial in a GitHub gist (link below). Here is a short intro R talk I gave today… for what it’s worth. The slides are titled “R Introduction.” Here’s the tutorial in a GitHub gist: https://gist.github.com/1208321

September 9, 2011 · 1 min · Scott Chamberlain

iEvoBio 2011 Synopsis

We just wrapped up the 2011 iEvoBio meeting. It was awesome! If you didn’t go this year or last year, definitely think about going next year. Here is a list of the cool projects that were discussed at the meeting (apologies if I left some out):

1. Vistrails: workflow tool, awesome project by Claudio Silva
2. Commplish: meant to be used via its APIs, not through the web UI
3. Phylopic: a database of life-form silhouettes, including an API for remote access, sweet!
4. Gloome
5. MappingLife: awesome geographic/etc. data visualization interface on the web
6. SuiteSMA: visualizing multiple alignments
7. treeBASE: R interface to TreeBASE, by Carl Boettiger
8. VertNet: database for vertebrate natural history collections
9. RevBayes: revamp of MrBayes, with GUI, etc.
10. Phenoscape Knowledge Base
11. Peter Midford lightning talk: matching taxonomic and genetic data
12. BiSciCol: biological science collections tracker
13. Ontogrator
14. TNRS: taxonomic name resolution service
15. Barcode of Life data systems, and remote access
16. Moorea Biocode Project
17. Microbial LTER’s data
18. BirdVis: interactive bird data visualization (Claudio Silva in collaboration with Cornell Lab of Ornithology)
19. Crowdlabs: I think the site is down right now; another project by Claudio Silva
20. Phycas: Bayesian phylogenetics; can you just call this from R?
21. RIP MrBayes!!!! Replaced by RevBayes (see 9 above)

Slides of presentations will be at Slideshare (not all presentations are up yet).

A birds of a feather group I was involved in proposed an idea (TOL-o-matic) like Phylomatic, but of broader scope, for easy access and submission of trees, and perhaps even social (think of just pushing a ‘SHARE’ button within PAUP, RevBayes, or other phylogenetics software)!

Synopses of Birds of a Feather discussion groups: http://piratepad.net/iEvoBio11-BoF-reportouts

June 22, 2011 · 2 min · Scott Chamberlain

How to fit power laws

A new paper out in Ecology by Xiao and colleagues (in press, here) compares log-transformation with non-linear regression for analyzing power laws. They suggest that the error distribution should determine which method performs better. When errors are additive, homoscedastic, and normally distributed, they propose using non-linear regression. When errors are multiplicative, heteroscedastic, and lognormally distributed, they suggest using linear regression on log-transformed data. The two methods make different assumptions about the error structure, so both cannot be correct for a single dataset. ...
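To make the two approaches concrete, here is a minimal R sketch (my own illustration, not code from the paper). It simulates power-law data y = a * x^b under each error structure, then fits the multiplicative-error data with `lm()` on the log-log scale and the additive-error data with `nls()` on the original scale. The true parameter values, sample size, and noise levels are arbitrary assumptions chosen for the demo.

```r
# Simulated power-law data y = a * x^b under two error structures.
# a_true, b_true, n, and the noise SDs are arbitrary illustration values.
set.seed(42)
a_true <- 2
b_true <- 0.75
n <- 500
x <- runif(n, 10, 100)

# Multiplicative, lognormal error: spread grows with the mean
y_mult <- a_true * x^b_true * exp(rnorm(n, mean = 0, sd = 0.2))

# Additive, normal, homoscedastic error
y_add <- a_true * x^b_true + rnorm(n, mean = 0, sd = 1)

# Method 1: linear regression on log-transformed data,
# log(y) = log(a) + b * log(x)
fit_log <- lm(log(y_mult) ~ log(x))
a_log <- exp(unname(coef(fit_log)[1]))
b_log <- unname(coef(fit_log)[2])

# Method 2: non-linear least squares on the original scale.
# Starting values come from a quick log-log fit to the same data.
start_fit <- lm(log(y_add) ~ log(x))
fit_nls <- nls(y_add ~ a * x^b,
               start = list(a = exp(unname(coef(start_fit)[1])),
                            b = unname(coef(start_fit)[2])))
a_nls <- unname(coef(fit_nls)["a"])
b_nls <- unname(coef(fit_nls)["b"])

round(c(a_log = a_log, b_log = b_log, a_nls = a_nls, b_nls = b_nls), 3)
```

Each method recovers the parameters well here because each is matched to the error structure it assumes; swapping the datasets between the two fits is an easy way to see how a mismatched error model behaves.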

June 7, 2011 · 1 min · Scott Chamberlain

RHIPE package in R for interfacing between Hadoop and R

RHIPE: An Interface Between Hadoop and R, presented by Saptarshi Guha. This review of methods for interfacing with Hadoop also suggests that R’s RHIPE is quite nice.

May 4, 2011 · 1 min · Scott Chamberlain

Phylometa from R: Randomization via Tip Shuffle

UPDATE: I now format code with GitHub gists, so I replaced the old prettyR code (sorry, guys). The GitHub way is much easier and prettier, and I hope readers like the change. I wrote earlier about some code for running Phylometa (software for phylogenetic meta-analysis) from R. I have been concerned about what exactly the right penalty is for including phylogeny in a meta-analysis. For example, Phylometa calculates AIC from Q, and Q increases with tree size. ...
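The excerpt stops before the code, so here is a minimal base-R sketch of the general tip-shuffle randomization idea (not the post’s actual code): permute the tip labels while keeping topology and branch lengths fixed, recompute a statistic on each shuffled tree, and compare the observed value with that null distribution. It assumes an ape-style `phylo` object, which stores tip names in `tree$tip.label`; `stat_fun` is a hypothetical stand-in for a wrapper that runs Phylometa and returns Q, and the helper names are illustrative only.

```r
# Tip-shuffle randomization: permute tip labels while keeping the tree's
# topology and branch lengths fixed. Assumes an ape-style "phylo" object,
# which stores tip names in tree$tip.label.
shuffle_tips <- function(tree) {
  tree$tip.label <- sample(tree$tip.label)
  tree
}

# Null distribution of a tree-based statistic. `stat_fun` is a hypothetical
# placeholder, e.g. a wrapper that writes the tree to a file, runs Phylometa,
# and parses Q from its output.
tip_shuffle_null <- function(tree, stat_fun, n_rand = 999) {
  replicate(n_rand, stat_fun(shuffle_tips(tree)))
}

# One-sided randomization p-value: share of values (the null plus the
# observed value itself) that are at least as large as the observed statistic
rand_p_value <- function(observed, null) {
  (sum(null >= observed) + 1) / (length(null) + 1)
}
```

With `obs_q <- stat_fun(tree)` and `null_q <- tip_shuffle_null(tree, stat_fun)`, `rand_p_value(obs_q, null_q)` indicates how unusual the observed Q is relative to trees with randomly assigned tips, which is one way to gauge how much of Q reflects the phylogeny itself rather than tree size.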

April 16, 2011 · 2 min · Scott Chamberlain

Adjust branch lengths with node ages: comparison of two methods

Here is an approach for comparing two methods of adjusting branch lengths on trees: bladj in the program Phylocom, and a function written by Gene Hunt at the Smithsonian. The code and example files (tree and node ages) are at https://gist.github.com/938313 and Phylocom is available at http://www.phylodiversity.net/phylocom/ for download. Gene Hunt’s method has many options you can adjust, including setting tip ages (not available in bladj), setting node ages, and imposing a minimum branch length. You will notice that Gene’s method may not be appropriate if you only have extant taxa. ...
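As I understand the Phylocom documentation, bladj keeps nodes with known ages fixed and spaces undated nodes evenly between dated ones. The sketch below illustrates only that idea, simplified to a single root-to-tip path; it is not bladj’s implementation or Gene Hunt’s function, it ignores the rest of the tree structure and any minimum branch length, and the helper names are hypothetical.

```r
# Simplified sketch of "evenly spaced undated nodes" along one root-to-tip
# path. `ages` lists node ages from the root down to the tip; NA marks a
# node without a known age. Known ages stay fixed; NAs are filled by linear
# interpolation on node position, so undated nodes end up evenly spaced in
# time between their nearest dated neighbours. Both ends of the path should
# be dated (approx() returns NA outside the dated range).
interpolate_path_ages <- function(ages) {
  known <- which(!is.na(ages))
  if (length(known) < 2) stop("need at least two dated nodes on the path")
  approx(x = known, y = ages[known], xout = seq_along(ages))$y
}

# Branch lengths along the path are differences between consecutive ages
path_branch_lengths <- function(ages) {
  -diff(interpolate_path_ages(ages))
}

# Root dated at 50 time units, tip at age 0, three undated nodes in between
path <- c(50, NA, NA, NA, 0)
interpolate_path_ages(path)  # returns 50, 37.5, 25, 12.5, 0
path_branch_lengths(path)    # returns 12.5 for each of the four branches
```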

April 10, 2011 · 2 min · Scott Chamberlain

cloudnumbers.com

UPDATE: I guess it still isn’t actually available. Bummer… Has anyone used cloudnumbers.com? http://www.cloudnumbers.com/ They provide cloud computing and have built-in applications, including R. How well does it work? Does it increase processing speed? At the least, I guess it may free up RAM and processor time on your own machine.

March 11, 2011 · 1 min · Scott Chamberlain

Five ways to visualize your pairwise comparisons

UPDATE: At the bottom are two additional methods, and I have made some additions (underlined) to the original five methods. Thanks for all the feedback…
- Also, another post here about ordered-categorical data.
- Also #2, a method combining splom (from lattice) and the hexbin package here, for larger datasets. ...
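The five methods themselves are not in this excerpt, so as one generic starting point (my sketch, not necessarily one of the five) here is a base-R scatterplot matrix: `pairs()` with `panel.smooth` below the diagonal and a hypothetical `panel_cor()` helper printing Pearson’s r above it. The `iris` measurements are just a stand-in dataset.

```r
# Format a correlation coefficient to two decimals
cor_label <- function(x, y) {
  formatC(cor(x, y, use = "pairwise.complete.obs"), format = "f", digits = 2)
}

# Upper-panel function (hypothetical helper): print r in the middle of the panel
panel_cor <- function(x, y, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))   # restore the panel's coordinate system afterwards
  par(usr = c(0, 1, 0, 1))  # draw in a unit square
  text(0.5, 0.5, cor_label(x, y), cex = 1.5)
}

# Scatterplot matrix: points plus a lowess smoother below the diagonal,
# correlation coefficients above it
pairs(iris[, 1:4],
      lower.panel = panel.smooth,
      upper.panel = panel_cor)
```

Putting the numbers in one triangle and the raw points in the other shows both the strength of each pairwise association and whether it is actually linear.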

March 5, 2011 · 3 min · Scott Chamberlain

Phenotypic selection analysis in R

Until recently, I always did my phenotypic selection analyses in SAS. I finally have some R code that I think does everything SAS would do. Feedback much appreciated!

######################## Selection analyses ########################
install.packages(c("car", "ggplot2"))  # run once if not already installed
library(car)      # vif()
library(ggplot2)  # plotting

# Create data set
set.seed(1)  # make runif() reproducible
dat <- data.frame(plant = seq(1, 100, 1),
                  trait1 = rep(c(0.1, 0.15, 0.2, 0.21, 0.25, 0.3, 0.5, 0.6, 0.8, 0.9,
                                 1, 3, 4, 10, 11, 12, 13, 14, 15, 16), each = 5),
                  trait2 = runif(100),
                  fitness = rep(c(1, 5, 10, 20, 50), each = 20))

# Make relative fitness column (absolute fitness / mean fitness)
dat_ <- cbind(dat, relfitness = dat$fitness / mean(dat$fitness))

# Standardize traits to mean 0 and SD 1 (base R scale(), same as reshape's rescaler(x, "sd"))
dat_ <- cbind(dat_[, -c(2:3)], as.data.frame(scale(dat_[, 2:3])))
# Columns are now: plant, fitness, relfitness, trait1, trait2

#### Selection differentials and correlations among traits
# cor.prob(): correlation matrix with correlations below the diagonal
# and P-values above the diagonal
cor.prob <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X)
  above <- row(R) < col(R)
  r2 <- R[above]^2
  Fstat <- r2 * dfr / (1 - r2)
  R[above] <- 1 - pf(Fstat, 1, dfr)
  R
}

# Get selection differentials and correlations among traits in one data frame
dat_seldiffs <- cov(dat_[, 3:5])       # selection differentials: cov(relative fitness, standardized trait)
dat_selcorrs <- cor.prob(dat_[, 3:5])  # P-values above the diagonal test the differentials in dat_seldiffs
dat_seldiffs_selcorrs <- data.frame(dat_seldiffs, dat_selcorrs)  # combine the two

#### Selection gradients
dat_selngrad <- lm(relfitness ~ trait1 * trait2, data = dat_)
summary(dat_selngrad)  # the "Estimate" column holds the selection gradients

#### Check assumptions
shapiro.test(residuals(dat_selngrad))  # normality; bummer, non-normal
hist(residuals(dat_selngrad))          # plot residuals
vif(dat_selngrad)   # variance inflation factors (package car); everything looks fine
plot(dat_selngrad)  # cycle through diagnostic plots

#### Plot data
ggplot(dat_, aes(trait1, relfitness)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Trait 1", y = "Relative fitness")
ggsave("myplot.jpeg")

[Plot: relative fitness vs. standardized trait 1] ...
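One thing to watch: in the model above, `relfitness ~ trait1 * trait2`, the main-effect estimates are conditional on the interaction term. In the usual Lande and Arnold setup, linear gradients come from a first-order model, and quadratic and correlational gradients come from a second-order model, with the coefficients on the squared terms doubled. Here is a minimal sketch of that convention, assuming a data frame with columns `relfitness`, `trait1`, and `trait2` and standardized traits (as `dat_` has); `selection_gradients()` is a hypothetical helper, not part of any package.

```r
# Lande & Arnold-style selection gradients (hypothetical helper).
# Assumes columns relfitness, trait1, trait2, with traits standardized.
selection_gradients <- function(d) {
  # Linear gradients (beta): first-order model only
  linear <- lm(relfitness ~ trait1 + trait2, data = d)
  # Quadratic and correlational gradients (gamma): full second-order model
  quad <- lm(relfitness ~ trait1 + trait2 + I(trait1^2) + I(trait2^2) +
               trait1:trait2, data = d)
  cq <- coef(quad)
  list(
    beta = coef(linear)[c("trait1", "trait2")],
    # lm() estimates gamma_ii / 2 for the squared terms, so double them
    gamma_ii = 2 * cq[c("I(trait1^2)", "I(trait2^2)")],
    # the cross-product coefficient is the correlational gradient as-is
    gamma_12 = cq["trait1:trait2"]
  )
}
```

Calling `selection_gradients(dat_)` on the data above returns the gradients as a list; standard errors and P-values would still come from `summary()` of the two fitted models.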

February 24, 2011 · 2 min · Scott Chamberlain