Open Science Challenge

Science is becoming more open in many areas: publishing, data sharing, lab notebooks, and software. There are many benefits to open science. For example, sharing research data alongside your publications leads to an increased citation rate (Piwowar et al. 2007). In addition, data is becoming easier to share and reuse thanks to efforts like FigShare and Dryad. If you don’t understand the problem we currently face due to a lack of open science, watch this video: ...

January 8, 2013 · 3 min · Scott Chamberlain

Is invasive?

The Global Invasive Species Database (GISD) (see their website for more information) has data on the invasiveness status of many species. From taxize you can now query the GISD database. Introducing the function gisd_isinvasive, contributed to taxize by Ignasi Bartomeus, a postdoc at the Swedish University of Agricultural Sciences. With simplify=TRUE, gisd_isinvasive returns one of two outputs: “Invasive” or “Not in GISD”. With simplify=FALSE, you instead get a verbose description for invasive species rather than just “Invasive” (species absent from the database still return “Not in GISD”). ...

December 13, 2012 · 3 min · Scott Chamberlain

Shiny apps are awesome

RStudio has a new product called Shiny that, quoting from their website, “makes it super simple for R users like you to turn analyses into interactive web applications that anyone can use”. See their website for more information. A Shiny app basically consists of two files: a ui.r file and a server.r file. The ui.r file, as its name suggests, provides the user interface, and the server.r file provides the server logic. Below is what it looks like in the wild (in a browser). ...

December 10, 2012 · 3 min · Scott Chamberlain

One R package for all your taxonomic needs

UPDATE: there were some errors in the tests for taxize, so the binaries aren’t available yet. You can install from source, though; see below. Getting taxonomic information for the set of species you are studying can be a pain in the ass. You have to manually type, or paste in, your species one by one. Or, if you are lucky, there is a web service to which you can upload a list of species. Encyclopedia of Life (EOL) has a service where you can do this here. But is this reproducible? No. ...

December 6, 2012 · 10 min · Scott Chamberlain

Displaying Your Data in Google Earth Using R2G2

Have you ever wanted to easily visualize your ecology data in Google Earth? R2G2 is a new R package, available on CRAN and formally described in this Molecular Ecology Resources article, that provides a user-friendly bridge between R and the Google Earth interface. Here, we provide a brief introduction to the package, including a short tutorial, and then encourage you to try it out with your own data! Nils Arrigo, with some help from Loren Albert, Mike Barker, and Pascal Mickelson (one of the contributors to Recology), has created a set of R tools to generate KML files for viewing data with geographic components. Instead of just telling you what the tools can do, though, we will show you a couple of examples using publicly available data. Note: a number of individual files are linked throughout the tutorial below, but in case you would rather download all the tutorial files in one go, have at it (tutorial zip file). ...
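For a rough idea of what such a KML file contains, here is a minimal Python sketch (not R2G2’s API, which is R) that writes point records as KML Placemarks using only the standard library. The `points_to_kml` function and the sample sites are hypothetical illustrations, and the sketch only covers simple points.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# KML 2.2 namespace understood by Google Earth.
KML_NS = "http://www.opengis.net/kml/2.2"


def points_to_kml(points: Iterable[Tuple[str, float, float]]) -> str:
    """Build a minimal KML document with one Placemark per (name, lat, lon).

    Hypothetical helper for illustration; not part of R2G2.
    """
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = name
        point = ET.SubElement(placemark, "Point")
        # KML lists longitude first, then latitude, then (optional) altitude.
        ET.SubElement(point, "coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical sample sites; replace with your own records.
sites = [("Site A", 37.8, -122.5), ("Site B", 44.0, -110.7)]
print(points_to_kml(sites))
```

Saving the printed output to a `.kml` file (optionally preceded by an XML declaration) should let Google Earth open it; note the longitude-first coordinate order, which is an easy thing to get backwards.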

October 24, 2012 · 6 min · Pascal Mickelson

Getting taxonomic names downstream

It can be a pain in the ass to get taxonomic names. For example, I sometimes need to get all the Class names for a set of species. This is a relatively easy problem using the ITIS API (example below). The much harder problem is getting all the taxonomic names downstream. ITIS doesn’t provide an API method for this. Well, they do (getHierarchyDownFromTSN), but it only returns direct children (e.g., the genera within a tribe, but not all the species within each genus). ...
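The workaround is to keep asking for direct children until you reach the rank you want. Here is a minimal Python sketch of that idea (the post itself uses R). The `mock_children` function and its toy taxonomy are hypothetical stand-ins for a call to ITIS getHierarchyDownFromTSN; a real version would query the web service and parse its response.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy taxonomy standing in for a "direct children only" service
# such as ITIS getHierarchyDownFromTSN. Keys are taxon IDs; values list
# (child_id, child_name, child_rank) for the DIRECT children only.
MOCK_CHILDREN: Dict[int, List[Tuple[int, str, str]]] = {
    1: [(2, "GenusA", "Genus"), (3, "GenusB", "Genus")],  # 1 = a tribe
    2: [(4, "GenusA alpha", "Species"), (5, "GenusA beta", "Species")],
    3: [(6, "GenusB gamma", "Species")],
}


def mock_children(taxon_id: int) -> List[Tuple[int, str, str]]:
    """Return only the direct children of a taxon (empty list if none)."""
    return MOCK_CHILDREN.get(taxon_id, [])


def names_downstream(
    taxon_id: int,
    target_rank: str,
    children: Callable[[int], List[Tuple[int, str, str]]] = mock_children,
) -> List[str]:
    """Collect every descendant name at target_rank by repeatedly requesting
    direct children, descending only through taxa above the target rank."""
    found: List[str] = []
    stack = [taxon_id]  # explicit stack instead of recursion
    while stack:
        current = stack.pop()
        for child_id, name, rank in children(current):
            if rank == target_rank:
                found.append(name)  # reached the wanted rank; stop here
            else:
                stack.append(child_id)  # keep descending
    return found


print(sorted(names_downstream(1, "Species")))
```

Each taxon above the target rank costs one service call, so for large groups a real implementation would want to cache responses and be polite to the API.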

October 16, 2012 · 3 min · Scott Chamberlain

Exploring phylogenetic tree balance metrics

I need to simulate balanced and unbalanced phylogenetic trees for some research I am doing. To do this, I use rejection sampling: simulate a tree -> measure tree shape -> reject if not balanced or unbalanced enough. But what is enough? We need to define a cutoff value that determines our sets of balanced and unbalanced trees. Calculate shape metrics: a function to calculate tree shape metrics, and a custom theme for plotting phylogenies. ...
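The rejection-sampling loop described above can be sketched in a few lines. This is a minimal Python illustration (the post works in R), and several choices here are assumptions rather than the post’s method: it uses the Colless index as the balance metric, simulates tree shapes under the Yule-Harding model (where the root’s left subtree receives k of n tips with probability 1/(n-1)), and the cutoffs `BALANCED_MAX` and `UNBALANCED_MIN` are hypothetical values chosen only for the example.

```python
import random
from typing import Callable, List, Optional, Tuple

# A tree shape is either a tip (None) or an internal node with two subtrees.
Tree = Optional[Tuple["Tree", "Tree"]]


def yule_tree(n_tips: int, rng: random.Random) -> Tree:
    """Simulate a tree shape under the Yule-Harding model: the left subtree
    of the root gets k tips, k uniform on 1..n_tips-1, recursively."""
    if n_tips == 1:
        return None
    k = rng.randint(1, n_tips - 1)  # inclusive on both ends
    return (yule_tree(k, rng), yule_tree(n_tips - k, rng))


def tips_and_colless(tree: Tree) -> Tuple[int, int]:
    """Return (number of tips, Colless index), where the Colless index is the
    sum over internal nodes of |tips(left) - tips(right)|."""
    if tree is None:
        return 1, 0
    left_tips, left_c = tips_and_colless(tree[0])
    right_tips, right_c = tips_and_colless(tree[1])
    return left_tips + right_tips, left_c + right_c + abs(left_tips - right_tips)


def colless_index(tree: Tree) -> int:
    return tips_and_colless(tree)[1]


def rejection_sample(
    n_tips: int,
    n_trees: int,
    accept: Callable[[int], bool],
    rng: random.Random,
    max_draws: int = 100_000,
) -> List[Tree]:
    """Simulate trees, keep those whose Colless index passes `accept`,
    and give up after max_draws simulations (cutoff too strict)."""
    kept: List[Tree] = []
    draws = 0
    while len(kept) < n_trees:
        if draws >= max_draws:
            raise RuntimeError("cutoff too strict: not enough trees accepted")
        draws += 1
        tree = yule_tree(n_tips, rng)
        if accept(colless_index(tree)):
            kept.append(tree)
    return kept


rng = random.Random(42)
BALANCED_MAX = 4      # hypothetical cutoff for "balanced" (8 tips)
UNBALANCED_MIN = 15   # hypothetical cutoff for "unbalanced" (max is 21)
balanced = rejection_sample(8, 10, lambda c: c <= BALANCED_MAX, rng)
unbalanced = rejection_sample(8, 10, lambda c: c >= UNBALANCED_MIN, rng)
print([colless_index(t) for t in balanced])
print([colless_index(t) for t in unbalanced])
```

The `max_draws` guard matters because a very strict cutoff can make acceptance so rare that the loop would otherwise run almost forever; how strict is “enough” is exactly the question the post explores.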

October 10, 2012 · 4 min · Scott Chamberlain

GBIF biodiversity data from R - more functions

UPDATE: In response to Jarrett’s query, I laid out a separate use case in which you may want to query by taxonomic ranks higher than species; see below. In addition, I added examples of querying by location in reply to comments by seminym. We have been working on an R package to get GBIF data from R, with the stable version available through CRAN and the development version available on GitHub at https://github.com/rgbif ...

October 8, 2012 · 5 min · Scott Chamberlain

Vertnet - getting vertebrate museum record data and a quick map

We (rOpenSci) started a repo to wrap the API for VertNet, an open access online database of vertebrate specimen records across many collection holders. Find the open source code here; please contribute if you are so inclined. We had a great Google Summer of Code student, Vijay Barve, contributing to the repo this summer, so it is getting close to being CRAN-able. Most of the functions in the repo get you the raw data, but there were no functions to visualize the data. Since many of the data records include latitude and longitude, maps are a natural visualization to use. ...

September 19, 2012 · 2 min · Scott Chamberlain

Getting data from figures in published papers

The problem: There are a lot of figures in published papers in the scholarly literature, like the one below, from Attwood et al. (2012). At some point, a scientist wants to ask a question that can be answered by synthesizing data collected from the published literature. This often requires something like the following workflow:

1. Search for relevant papers (e.g., via Google Scholar).
2. Collect the papers.
3. Decide which are appropriate for inclusion.
4. Collect data from the figures using native desktop software, such as GraphClick or ImageJ.
5. Proof the data.
6. Analyze the data and publish the paper.

This workflow needs revamping, particularly the data-collection step. This data remains private, moving from one closed source (the original publication) to another (a personal computer). We can surely do better. ...

September 18, 2012 · 5 min · Scott Chamberlain