plyr's idata.frame VS. data.frame

I had seen the function idata.frame in plyr before, but had not really tested it. From the plyr documentation: “An immutable data frame works like an ordinary data frame, except that when you subset it, it returns a reference to the original data frame, not a copy. This makes subsetting substantially faster and has a big impact when you are working with large datasets with many groups.” For example, although baseball is a data.frame, its immutable counterpart is a reference to it: ...
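A minimal sketch of that comparison, assuming plyr is installed and using its bundled baseball data. The grouping variable `id` and the per-group function `nrow` are illustrative choices, not necessarily the ones used in the full post. Timings depend on the machine, so none are shown.

```r
library(plyr)

# baseball ships with plyr; idata.frame() wraps it as an immutable data frame
ibaseball <- idata.frame(baseball)

# Same split-apply step on both versions: count rows per player id
n_plain     <- dlply(baseball,  "id", nrow)
n_immutable <- dlply(ibaseball, "id", nrow)

# Time the two versions; the immutable one avoids copying each subset
system.time(dlply(baseball,  "id", nrow))
system.time(dlply(ibaseball, "id", nrow))
```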

May 13, 2011 · 4 min · Scott Chamberlain

google reader

I just realized that the gist code blocks don’t show up in Google Reader, so you have to click the link to my blog to see the gists. Apologies for that! -S

May 12, 2011 · 1 min · Scott Chamberlain

Comparison of functions for comparative phylogenetics

With all the packages (and beta stage groups of functions) for comparative phylogenetics in R (tested here: picante, geiger, ape, motmot, Liam Revell’s functions), I was simply interested in which functions to use in cases where multiple functions exist to do the same thing. I only show default settings, so perhaps these functions would differ under different parameter settings. [I am using a Mac 2.4 GHz i5, 4GB RAM] Get motmot here: https://r-forge.r-project.org/R/?group_id=782 ...

May 11, 2011 · 2 min · Scott Chamberlain

RHIPE package in R for interfacing between Hadoop and R

RHIPE: An Interface Between Hadoop and R, presented by Saptarshi Guha. And this review of methods for interfacing with Hadoop suggests R’s RHIPE is quite nice.

May 4, 2011 · 1 min · Scott Chamberlain

Treebase trees from R

UPDATE: See Carl Boettiger’s functions/package at Github for searching Treebase here. Treebase is a great resource for phylogenetic trees, and has a nice interface for searching for certain types of trees. However, if you want to simply download a lot of trees for analyses (like that in Davies et al.), then you want to be able to access trees in bulk (I believe Treebase folks are working on an API though). I wrote some simple code for extracting trees from Treebase.org. It reads an xml file of URLs for each tree (in this case consensus trees), parses the xml, makes a vector of URLs, reads the nexus files with error checking, removes trees that gave errors, then makes a simple plot looking at metrics of the trees. ...
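The "read with error checking, then remove failures" step can be sketched roughly as follows. This is not the code from the post: `read_all_safely`, `toy_reader`, and the input names are hypothetical, and the toy reader only stands in for a real nexus reader such as `ape::read.nexus` so that the sketch runs without network access.

```r
# Apply `reader` to each source, returning NULL wherever it throws an error
read_all_safely <- function(sources, reader) {
  lapply(sources, function(src) {
    tryCatch(reader(src), error = function(e) NULL)
  })
}

# Toy reader (hypothetical): fails on anything that is not a .nex name
toy_reader <- function(src) {
  if (!grepl("\\.nex$", src)) stop("not a nexus file: ", src)
  paste("tree from", src)
}

sources <- c("a.nex", "broken.txt", "c.nex")  # hypothetical inputs
results <- read_all_safely(sources, toy_reader)

# Remove the entries that gave errors
ok    <- !vapply(results, is.null, logical(1))
trees <- results[ok]
length(trees)
```

Keeping the logical vector `ok` around makes it easy to see which sources failed, rather than silently dropping them.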

May 3, 2011 · 1 min · Scott Chamberlain

Processing nested lists

So perhaps you have all figured this out already, but I was excited to finally figure out how to neatly get all the data frames, lists, vectors, etc. out of a nested list. It is as easy as nesting calls to the apply family of functions, in the case below using plyr’s apply-like functions. Take this example:

    # Nested lists code, an example
    library(plyr)  # provides laply()

    # Make a nested list
    mylist <- list()
    mylist_ <- list()
    for (i in 1:5) {
      for (j in 1:5) {
        mylist[[j]] <- i * j
      }
      mylist_[[i]] <- mylist
    }

    # return values from first part of list
    laply(mylist_[[1]], identity)
    [1] 1 2 3 4 5

    # return all values
    laply(mylist_, function(x) laply(x, identity))
         1  2  3  4  5
    [1,] 1  2  3  4  5
    [2,] 2  4  6  8 10
    [3,] 3  6  9 12 15
    [4,] 4  8 12 16 20
    [5,] 5 10 15 20 25

    # perform some function, in this case sqrt of each value
    laply(mylist_, function(x) laply(x, function(x) sqrt(x)))
                1        2        3        4        5
    [1,] 1.000000 1.414214 1.732051 2.000000 2.236068
    [2,] 1.414214 2.000000 2.449490 2.828427 3.162278
    [3,] 1.732051 2.449490 3.000000 3.464102 3.872983
    [4,] 2.000000 2.828427 3.464102 4.000000 4.472136
    [5,] 2.236068 3.162278 3.872983 4.472136 5.000000
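For comparison, the same extraction can be sketched in base R with nested `sapply()` calls. This is an alternative to the plyr approach in the post, not part of it; the one difference to watch is that `sapply()` binds results as columns, so a `t()` is needed to get one sub-list per row.

```r
# Rebuild the nested list from the example: element [[i]][[j]] holds i * j
mylist_ <- lapply(1:5, function(i) lapply(1:5, function(j) i * j))

# The inner sapply() flattens each sub-list into a vector; the outer sapply()
# binds those vectors as columns, so t() puts one sub-list per row
m <- t(sapply(mylist_, function(x) sapply(x, identity)))

# Apply a function to every value, here sqrt
sqrt(m)
```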

April 28, 2011 · 1 min · Scott Chamberlain

Running Phylip's contrast application for trait pairs from R

Here is some code to run Phylip’s contrast application from R and get the output within R, where you can easily manipulate it yourself. Importantly, the code is written for trait pairs only, as the regular expressions in the code grab data from contrast results specifically for the case where only two traits are input. You could easily change the code to do N traits. Note that the p-value calculated for the chi-square statistic is not output from contrast, but is calculated within the function ‘PhylipWithinSpContr’. In the code below there are two functions that make a lot of busy work easier: ‘WritePhylip’ and ‘PhylipWithinSpContr’. The first function is nice because the formatting required for data input to Phylip programs is so, well, awkward - and this function does it for you. The second function runs contrast and retrieves the output data. The example data set I produce in the code below has multiple individuals per species, so that contrasts are calculated taking into account within species variation. Get Phylip’s contrast documentation here. ...
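Getting a p-value from a chi-square statistic in R might look like the sketch below. This is not the body of ‘PhylipWithinSpContr’: the helper name `chisq_p_value` is hypothetical, and the statistic and degrees of freedom are made-up values, since the appropriate degrees of freedom depend on the contrast analysis and are not given in this summary.

```r
# Upper-tail p-value for a chi-square statistic with `df` degrees of freedom
chisq_p_value <- function(statistic, df) {
  pchisq(statistic, df = df, lower.tail = FALSE)
}

# Hypothetical values, not taken from any contrast run
chisq_p_value(statistic = 5.2, df = 1)
```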

April 26, 2011 · 2 min · Scott Chamberlain

Phylometa from R: Randomization via Tip Shuffle

UPDATE: I am now using code formatting from gist.github, so I replaced the old prettyR code (sorry guys). The github way is much easier and prettier. I hope readers like the change. I wrote earlier about some code I wrote for running Phylometa (software to do phylogenetic meta-analysis) from R. I have been concerned about what exactly is the right penalty for including phylogeny in a meta-analysis. E.g.: AIC is calculated from Q in Phylometa, and Q increases with tree size. ...
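A rough sketch of a tip shuffle, assuming it means permuting the tip labels while leaving topology and branch lengths untouched. It uses ape rather than the post's own code; the toy tree, the helper name `shuffle_tips`, and the number of replicates are all illustrative.

```r
library(ape)

set.seed(1)  # make the permutations reproducible

# Toy four-tip tree (hypothetical); any "phylo" object would do
tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

# Permute tip labels; topology and branch lengths are unchanged
shuffle_tips <- function(phy) {
  phy$tip.label <- sample(phy$tip.label)
  phy
}

# A set of randomized trees to build a null distribution from
shuffled <- replicate(100, shuffle_tips(tree), simplify = FALSE)
```

Each shuffled tree could then be written out and passed to the analysis in place of the observed tree, giving a null distribution for whatever statistic is of interest.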

April 16, 2011 · 2 min · Scott Chamberlain

RStudio Beta 2 is Out!

RStudio Beta 2 (v0.93) « RStudio Blog. A new beta version of RStudio is out!

April 11, 2011 · 1 min · Scott Chamberlain

Adjust branch lengths with node ages: comparison of two methods

Here is an approach for comparing two methods of adjusting branch lengths on trees: bladj in the program Phylocom and a function written by Gene Hunt at the Smithsonian. Get the code and example files (tree and node ages) at https://gist.github.com/938313. Get phylocom at http://www.phylodiversity.net/phylocom/. Gene Hunt’s method has many options you can mess with, including setting tip ages (not available in bladj), setting node ages, and imposing a minimum branch length. You will notice that Gene’s method may not be appropriate if you only have extant taxa. ...

April 10, 2011 · 2 min · Scott Chamberlain