Processing nested lists

So perhaps you have all figured this out already, but I was excited to finally figure out how to neatly get all the data frames, lists, vectors, etc. out of a nested list. It is as easy as nesting calls to the apply family of functions, in the case below using plyr's apply-like functions. Take this example:

```r
# Nested lists code, an example
library(plyr)  # for laply

# Make a nested list
mylist <- list()
mylist_ <- list()
for(i in 1:5) {
  for(j in 1:5) {
    mylist[[j]] <- i * j
  }
  mylist_[[i]] <- mylist
}

# return values from first part of list
laply(mylist_[[1]], identity)
# [1] 1 2 3 4 5

# return all values
laply(mylist_, function(x) laply(x, identity))
#      1  2  3  4  5
# [1,] 1  2  3  4  5
# [2,] 2  4  6  8 10
# [3,] 3  6  9 12 15
# [4,] 4  8 12 16 20
# [5,] 5 10 15 20 25

# perform some function, in this case sqrt of each value
laply(mylist_, function(x) laply(x, function(x) sqrt(x)))
#             1        2        3        4        5
# [1,] 1.000000 1.414214 1.732051 2.000000 2.236068
# [2,] 1.414214 2.000000 2.449490 2.828427 3.162278
# [3,] 1.732051 2.449490 3.000000 3.464102 3.872983
# [4,] 2.000000 2.828427 3.464102 4.000000 4.472136
# [5,] 2.236068 3.162278 3.872983 4.472136 5.000000
```

April 28, 2011 · 1 min · Scott Chamberlain

Running Phylip's contrast application for trait pairs from R

Here is some code to run Phylip's contrast application from R and get the output back into R so you can easily manipulate it yourself. Importantly, the code is written specifically for trait pairs: the regular-expression work in the code grabs data from contrast results only when two traits are input. You could easily change the code to handle N traits. Note that the p-value calculated for the chi-square statistic is not output by contrast, but is calculated within the function 'PhylipWithinSpContr'. In the code below there are two functions that take care of a lot of busy work: 'WritePhylip' and 'PhylipWithinSpContr'. The first function is nice because the formatting required for data input to Phylip programs is so, well, awkward, and this function does it for you. The second function runs contrast and retrieves the output data. The example data set I produce in the code below has multiple individuals per species, so contrasts are calculated taking into account within-species variation. Get Phylip's contrast documentation here. ...
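Since contrast reports the chi-square statistic but not its p-value, here is a minimal sketch of how that p-value can be computed in R; the variable names chi_sq and df are placeholders, not objects from the post's functions:

```r
# hypothetical values standing in for the chi-square statistic and
# degrees of freedom parsed from contrast's output
chi_sq <- 12.3
df <- 4

# upper-tail probability of the chi-square distribution = the p-value
pchisq(chi_sq, df = df, lower.tail = FALSE)
```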

April 26, 2011 · 2 min · Scott Chamberlain

Phylometa from R: Randomization via Tip Shuffle

UPDATE: I am now using code formatting from GitHub gists, so I replaced the old prettyR code (sorry guys). The GitHub way is much easier and prettier. I hope readers like the change. I wrote earlier about some code I wrote for running Phylometa (software for phylogenetic meta-analysis) from R. I have been concerned about what exactly is the right penalty for including phylogeny in a meta-analysis. For example, AIC is calculated from Q in Phylometa, and Q increases with tree size. ...
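The randomization in the title works by shuffling which data belong to which tips. As a rough illustration of the idea (not the post's actual code, and the tree here is made up), one shuffle might look like this:

```r
library(ape)

# hypothetical tree standing in for the meta-analysis phylogeny
tree <- rtree(10)

# one tip-shuffle randomization: permute the tip labels so the effect-size
# data are randomly reassigned to tips; the phylogenetic meta-analysis
# would then be re-run on this shuffled tree
shuffled <- tree
shuffled$tip.label <- sample(tree$tip.label)
```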

April 16, 2011 · 2 min · Scott Chamberlain

Adjust branch lengths with node ages: comparison of two methods

Here is an approach for comparing two methods of adjusting branch lengths on trees: bladj in the program Phylocom, and a function written by Gene Hunt at the Smithsonian. Get the code and example files (tree and node ages) at https://gist.github.com/938313 Get Phylocom at http://www.phylodiversity.net/phylocom/ Gene Hunt's method has many options you can mess with, including setting tip ages (not available in bladj), setting node ages, and imposing a minimum branch length. You will notice that Gene's method may not be appropriate if you only have extant taxa. ...
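Once both methods have written out their adjusted trees, a quick way to compare them is to plot the node-age distributions against each other. This is just a sketch; the file names are hypothetical, not the ones from the gist:

```r
library(ape)

# read the trees produced by the two methods (hypothetical file names)
t_bladj <- read.tree("tree_bladj.txt")
t_hunt  <- read.tree("tree_hunt.txt")

# compare the distributions of node ages; points off the 1:1 line show
# where the two methods disagree
plot(sort(branching.times(t_bladj)), sort(branching.times(t_hunt)),
     xlab = "Node ages (bladj)", ylab = "Node ages (Gene Hunt's function)")
abline(0, 1, lty = 2)
```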

April 10, 2011 · 2 min · Scott Chamberlain

Phylometa from R - UPDATE

A while back I posted some messy code to run Phylometa from R, especially useful for processing the output from Phylometa, which is not easily done otherwise. The code is still quite messy, but it should work now; I have run it with tens of different data sets and phylogenies. I fixed errors that occurred when parentheses came up against numbers in the output, among other things. You can use the code for up to 4 levels of your grouping variable. In addition, there are some lines of code to plot the effect sizes with confidence intervals, comparing random- and fixed-effects models and phylogenetic and traditional models. ...
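For the effect-size plot mentioned at the end, here is a generic sketch with made-up numbers; res stands in for a data frame parsed from Phylometa output, and none of these names come from the post's code:

```r
library(ggplot2)

# made-up effect sizes and confidence limits for two models
res <- data.frame(
  group  = rep(c("group1", "group2"), each = 2),
  model  = rep(c("traditional", "phylogenetic"), times = 2),
  effect = c(0.25, 0.18, -0.10, -0.05),
  lower  = c(0.10, 0.02, -0.30, -0.22),
  upper  = c(0.40, 0.34,  0.10,  0.12)
)

# effect sizes with confidence intervals, dodged so the two models sit
# side by side within each group
ggplot(res, aes(x = group, y = effect, colour = model)) +
  geom_pointrange(aes(ymin = lower, ymax = upper),
                  position = position_dodge(width = 0.4)) +
  geom_hline(yintercept = 0, linetype = "dashed")
```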

April 1, 2011 · 2 min · Scott Chamberlain

basic ggplot2 network graphs ver2

I posted last week a simple function to plot networks using the ggplot2 package. Here is version 2. I still need to work on figuring out efficient vertex placement. Changes in version 2:

- You have one of three options: use an igraph object, a matrix, or a data frame (matrices will be converted to data frames within the function).
- If you have data on food webs similar to the Takapoto dataset provided in the NetIndices package, you can set trophic = "TRUE", and gggraph will use the function TrophInd to assign trophic levels (the y-axis value) to each vertex/node. You have to provide additional information along with this option, such as what the imports and exports are; see the NetIndices documentation.
- I added some simple error checking.
- If using method = "df" and trophic = "FALSE", x-axis placement of vertices is now done using the function degreex (see inside the function), which sorts vertices according to their degree, so the least connected species are on the left of the graph. Note that species with the same degree are not stacked on the y-axis because, e.g., two vertices of degree 5 would get x = 3 and then x = 4.

```r
# gggraph, version 2
require(bipartite)
require(igraph)
require(ggplot2)

# g           = an igraph graph object, a matrix, or data frame
# vplace      = type of vertex placement assignment, one of rnorm, runif, etc.
# method      = one of 'df' for data frame, 'mat' for matrix, or 'igraph' for an
#               igraph graph object
# trophic     = TRUE or FALSE for using the NetIndices function TrophInd to
#               determine trophic level (y value in graph)
# trophinames = columns in matrix or data frame to use for calculating trophic level
# import      = columns (named or referenced by column number) to use for the
#               import argument of TrophInd
# export      = columns (named or referenced by column number) to use for the
#               export argument of TrophInd
# dead        = columns (named or referenced by column number) to use for the
#               dead argument of TrophInd
gggraph <- function(g, vplace = rnorm, method, trophic = "FALSE", trophinames,
                    import, export) {
    degreex <- function(x) {
        degreecol <- apply(x, 2, function(y) length(y[y > 0]))
        degreerow <- apply(x, 1, function(y) length(y[y > 0]))
        degrees <- sort(c(degreecol, degreerow))
        df <- data.frame(degrees, x = seq(1, length(degrees), 1))
        df$value <- rownames(df)
        df
    }
    if (!require(igraph)) stop("must first install 'igraph' package.")
    if (!require(ggplot2)) stop("must first install 'ggplot2' package.")
    if (method == "df") {
        if (class(g) == "matrix") { g <- as.data.frame(g) }
        if (class(g) != "data.frame") stop("object must be of class 'data.frame.'")
        if (trophic == "FALSE") {
            # data preparation from adjacency matrix
            temp <- data.frame(expand.grid(dimnames(g))[1:2], as.vector(as.matrix(g)))
            temp <- temp[(temp[, 3] > 0) & !is.na(temp[, 3]), ]
            temp <- temp[sort.list(temp[, 1]), ]
            g_df <- data.frame(rows = temp[, 1], cols = temp[, 2], freqint = temp[, 3])
            g_df$id <- 1:length(g_df[, 1])
            g_df <- data.frame(id = g_df[, 4], rows = g_df[, 1], cols = g_df[, 2],
                               freqint = g_df[, 3])
            g_df_ <- melt(g_df, id = c(1, 4))
            xy_s <- data.frame(degreex(g), y = rnorm(length(unique(g_df_$value))))
            g_df_2 <- merge(g_df_, xy_s, by = "value")
        } else if (trophic == "TRUE") {
            if (!require(NetIndices)) stop("must first install 'NetIndices' package.")
            # data preparation from adjacency matrix
            temp <- data.frame(expand.grid(dimnames(g[-trophinames, -trophinames]))[1:2],
                               as.vector(as.matrix(g[-trophinames, -trophinames])))
            temp <- temp[(temp[, 3] > 0) & !is.na(temp[, 3]), ]
            temp <- temp[sort.list(temp[, 1]), ]
            g_df <- data.frame(rows = temp[, 1], cols = temp[, 2], freqint = temp[, 3])
            g_df$id <- 1:length(g_df[, 1])
            g_df <- data.frame(id = g_df[, 4], rows = g_df[, 1], cols = g_df[, 2],
                               freqint = g_df[, 3])
            g_df_ <- melt(g_df, id = c(1, 4))
            xy_s <- data.frame(value = unique(g_df_$value),
                               x = rnorm(length(unique(g_df_$value))),
                               y = TrophInd(g, Import = import, Export = export)[, 1])
            g_df_2 <- merge(g_df_, xy_s, by = "value")
        }
        # plotting
        p <- ggplot(g_df_2, aes(x, y)) +
            geom_point(size = 5) +
            geom_line(aes(size = freqint, group = id)) +
            geom_text(size = 3, hjust = 1.5, aes(label = value)) +
            theme_bw() +
            opts(panel.grid.major = theme_blank(), panel.grid.minor = theme_blank(),
                 axis.text.x = theme_blank(), axis.text.y = theme_blank(),
                 axis.title.x = theme_blank(), axis.title.y = theme_blank(),
                 axis.ticks = theme_blank(), panel.border = theme_blank(),
                 legend.position = "none")
        p  # return graph
    } else if (method == "igraph") {
        if (class(g) != "igraph") stop("object must be of class 'igraph.'")
        # data preparation from igraph object
        g_ <- get.edgelist(g)
        g_df <- as.data.frame(g_)
        g_df$id <- 1:length(g_df[, 1])
        g_df <- melt(g_df, id = 3)
        xy_s <- data.frame(value = unique(g_df$value),
                           x = vplace(length(unique(g_df$value))),
                           y = vplace(length(unique(g_df$value))))
        g_df2 <- merge(g_df, xy_s, by = "value")
        # plotting
        p <- ggplot(g_df2, aes(x, y)) +
            geom_point(size = 2) +
            geom_line(size = 0.3, aes(group = id, linetype = id)) +
            geom_text(size = 3, hjust = 1.5, aes(label = value)) +
            theme_bw() +
            opts(panel.grid.major = theme_blank(), panel.grid.minor = theme_blank(),
                 axis.text.x = theme_blank(), axis.text.y = theme_blank(),
                 axis.title.x = theme_blank(), axis.title.y = theme_blank(),
                 axis.ticks = theme_blank(), panel.border = theme_blank(),
                 legend.position = "none")
        p  # return graph
    } else stop(paste("do not recognize method = \"", method,
                      "\";\nmethods are \"df\" and \"igraph\"", sep = ""))
}

# Example
library(NetIndices)
data(Takapoto)
gggraph(Takapoto, vplace = rnorm, method = "df", trophic = "TRUE",
        trophinames = c(8:10), import = "CO2",
        export = c("CO2", "Sedimentation", "Grazing"))
```
...

March 23, 2011 · 5 min · Scott Chamberlain

basic ggplot2 network graphs

I have been looking around on the web and have not found anything yet related to using ggplot2 for making graphs/networks. I put together a few functions to make very simple graphs. The bipartite function especially is not ideal, as of course we only want to allow connections between unlike nodes, not all nodes. These functions do not, obviously, take full advantage of the power of ggplot2, but it’s a start. ...
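As a rough sketch of the general idea behind these functions (this is not the code from the post, and the object names are made up), a graph can be drawn with ggplot2 by placing vertices at coordinates and drawing each edge as a segment:

```r
library(ggplot2)

# a tiny made-up edge list and random vertex coordinates
nodes <- data.frame(name = c("a", "b", "c", "d"), x = runif(4), y = runif(4))
edges <- data.frame(from = c("a", "a", "b", "c"), to = c("b", "c", "d", "d"))

# attach the coordinates of both endpoints to each edge
edges <- merge(edges, setNames(nodes, c("from", "x", "y")), by = "from")
edges <- merge(edges, setNames(nodes, c("to", "xend", "yend")), by = "to")

# edges as segments, vertices as points, labels as text
ggplot() +
  geom_segment(data = edges, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_point(data = nodes, aes(x, y), size = 4) +
  geom_text(data = nodes, aes(x, y, label = name), vjust = -1) +
  theme_bw()
```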

March 17, 2011 · 2 min · Scott Chamberlain

Five ways to visualize your pairwise comparisons

UPDATE: At the bottom are two additional methods, and some additions (underlined) to the original 5 methods. Thanks for all the feedback!

- Also: another post here about ordered-categorical data.
- Also #2: a method combining splom and hexbin here, for larger datasets.

...
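For readers who just want the simplest starting point, here is a generic sketch of a scatterplot matrix on a built-in dataset (not the data or code from the post):

```r
# scatterplot matrix of all pairwise comparisons among four variables,
# coloured by group, using the built-in iris data
pairs(iris[, 1:4], col = iris$Species)
```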

March 5, 2011 · 3 min · Scott Chamberlain

Phenotypic selection analysis in R

Until recently I have always done my phenotypic selection analyses in SAS. I finally have some code that I think does everything SAS would do. Feedback much appreciated!

```r
######################## Selection analyses #############################
install.packages(c("car", "reshape", "ggplot2"))
require(car)
require(reshape)
require(ggplot2)

# Create data set
dat <- data.frame(
  plant = seq(1, 100, 1),
  trait1 = rep(c(0.1, 0.15, 0.2, 0.21, 0.25, 0.3, 0.5, 0.6, 0.8, 0.9,
                 1, 3, 4, 10, 11, 12, 13, 14, 15, 16), each = 5),
  trait2 = runif(100),
  fitness = rep(c(1, 5, 10, 20, 50), each = 20))

# Make relative fitness column
dat_ <- cbind(dat, dat$fitness / mean(dat$fitness))
names(dat_)[5] <- "relfitness"

# Standardize traits
dat_ <- cbind(dat_[, -c(2:3)], rescaler(dat_[, c(2:3)], "sd"))

############################################################################
#### Selection differentials and correlations among traits
#### Function for calculating a correlation matrix, with correlations below
#### the diagonal and P-values above the diagonal
############################################################################
cor.prob <- function(X, dfr = nrow(X) - 2) {
    R <- cor(X)
    above <- row(R) < col(R)
    r2 <- R[above]^2
    Fstat <- r2 * dfr / (1 - r2)
    R[above] <- 1 - pf(Fstat, 1, dfr)
    R
}

# Get selection differentials and correlations among traits in one data frame
dat_seldiffs <- cov(dat_[, c(3:5)])       # calculates sel'n differentials using cov
dat_selcorrs <- cor.prob(dat_[, c(3:5)])  # P-values above diagonal give significance of sel'n differentials in dat_seldiffs
dat_seldiffs_selcorrs <- data.frame(dat_seldiffs, dat_selcorrs)  # combine the two

##########################################################################
#### Selection gradients
dat_selngrad <- lm(relfitness ~ trait1 * trait2, data = dat_)
summary(dat_selngrad)  # where "Estimate" is our sel'n gradient

#### Check assumptions
shapiro.test(dat_selngrad$residuals)  # normality; bummer, non-normal
hist(dat_selngrad$residuals)          # plot residuals
vif(dat_selngrad)                     # variance inflation factors (package car); everything looks fine
plot(dat_selngrad)                    # cycle through diagnostic plots

############################################################################
# Plot data
ggplot(dat_, aes(trait1, relfitness)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Trait 1", y = "Relative fitness")
ggsave("myplot.jpeg")
```

[Figure: relative fitness vs. standardized trait 1]
...

February 24, 2011 · 2 min · Scott Chamberlain

Phylogenetic analysis with the phangorn package: an example

The phangorn package is a relatively new package in R for the analysis and comparison of phylogenies. See here for the Bioinformatics paper and here for the package. Here is an example of using phangorn, from getting sequences to making phylogenies and visualizing them:

- Getting sequences from GenBank
- Multiple alignment
- Maximum likelihood tree reconstruction
- Visualizing trees
- Visualizing trees and traits: make fake traits, then visualize them on trees

...
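A rough sketch of what such a workflow can look like (this is not the post's code; the GenBank accession numbers are just illustrative, and ape's clustal() wrapper assumes Clustal is installed locally):

```r
library(ape)       # read.GenBank, clustal
library(phangorn)  # phyDat, dist.ml, NJ, pml, optim.pml

# fetch a few sequences from GenBank (illustrative accession numbers)
seqs <- read.GenBank(c("U15717", "U15718", "U15719", "U15720"))

# multiple alignment via ape's wrapper around the external Clustal program
aln <- clustal(seqs)

# convert to phangorn's phyDat format
dat <- as.phyDat(aln)

# neighbor-joining starting tree from ML distances, then maximum likelihood
tree0 <- NJ(dist.ml(dat))
fit <- pml(tree0, data = dat, k = 4)
fit <- optim.pml(fit, model = "GTR", optGamma = TRUE)

# visualize the resulting tree
plot(fit$tree)
```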

February 21, 2011 · 1 min · Scott Chamberlain