R resources

I’m doing a presentation today to grad students on R resources. I have been writing HTML presentations recently, but some great tools are now available to convert text that is easy to read and write into presentations. RStudio has something called R presentations, which is basically Markdown. This tool is built into RStudio. See some docs here. A cool feature of RStudio’s presentations is that the preview of the presentation live updates on each save - nice. Another option is the slidify package, made by Ramnath Vaidyanathan. The canonical url for slidify is here. Slidify gives you more options and flexibility than RStudio presentations. For this presentation I went with RStudio’s product. See the Markdown file for the presentation here. ...

July 30, 2013 · 1 min · Scott Chamberlain

R to GeoJSON

UPDATE: As you can see in Patrick’s comment below, you can convert to GeoJSON format files with rgdal as an alternative to calling the Ogre web API described below. See here for example code for converting to GeoJSON with rgdal. GitHub recently introduced the ability to render GeoJSON files on their site as maps here, and recently introduced here support for TopoJSON (an extension of GeoJSON that can be up to 80% smaller than GeoJSON), support for other file extensions (.topojson and .json), and the ability to embed the maps on other sites (so awesome). The underlying maps used on GitHub are from OpenStreetMap. ...

June 30, 2013 · 3 min · Scott Chamberlain

Stashing and playing with raw data locally from the web

It is getting easier to get data directly into R from the web. Often R packages that retrieve data from the web return useful R data structures to users like a data.frame. This is a good thing of course to make things user friendly. However, what if you want to drill down into the data that’s returned from a query to a database in R? What if you want to get that nice data.frame in R, but you think you may want to look at the raw data later? The raw data from web queries are often JSON or XML data. This type of data, especially JSON, can be easily stored in schemaless so-called NoSQL databases, and queried later. ...

June 17, 2013 · 7 min · Scott Chamberlain

Fylopic, an R wrapper to Phylopic

What is PhyloPic? PhyloPic is an awesome new service - I’ll let the creator, Mike Keesey, explain what it is (paraphrasing here): PhyloPic stores silhouette images of organisms, and each image is associated with taxonomic names, and stores the taxonomy of all taxa, allowing searching by taxonomic names. Anyone can submit silhouettes to PhyloPic. What is a silhouette? It’s like this: by Gareth Monger What makes PhyloPic not just awesome, but super awesome? All or most images are licensed under Creative Commons licenses. This means you can use the silhouettes without having to ask or pay - just attribute. ...

June 1, 2013 · 3 min · Scott Chamberlain

BISON USGS species occurrence data

The USGS recently released a way to search for and get species occurrence records for the USA. The service is called BISON (Biodiversity Information Serving Our Nation). The service has a web interface for human interaction in a browser, and two APIs (application programming interfaces) to allow machines to interact with their database. One of the APIs allows you to search and retrieve data, and the other gives back maps as either a heatmap or a species occurrence map. The latter is more appropriate for working in a browser, so I’ll leave that to the web app folks. ...

May 27, 2013 · 4 min · Scott Chamberlain

Scholarly metadata in R

Scholarly metadata - the meta-information surrounding articles - can be super useful. Although metadata does not contain the full content of articles, it contains a lot of useful information, including title, authors, abstract, URL to the article, etc. One of the largest sources of metadata is provided via the Open Archives Initiative Protocol for Metadata Harvesting, or OAI-PMH. Many publishers provide their metadata through their own endpoint, and implement the standard OAI-PMH methods: GetRecord, Identify, ListIdentifiers, ListMetadataFormats, ListRecords, and ListSets. Many providers use OAI-PMH, including DataCite, Dryad, and PubMed. ...

March 16, 2013 · 6 min · Scott Chamberlain

Visualizing rOpenSci collaboration

We (rOpenSci) have been writing code for R packages for a couple years, so it is time to take a look back at the data. What data, you ask? The commits data from GitHub ~ data that records who did what and when. Using the GitHub commits API we can gather data on who committed code to a GitHub repository, and when they did it. Then we can visualize this historical record. ...

March 8, 2013 · 3 min · Scott Chamberlain

Getting a simple tree via NCBI

I was just at the Phylotastic hackathon in Tucson, AZ at the iPlant facilities at the UofA. A problem that needs to be solved is getting the increasingly vast phylogenetic information to folks not comfortable building their own phylogenies. Phylomatic has made this super easy for people that want plant phylogenies (at least 250 or so papers have used and cited Phylomatic) - however, there are few options for those that want phylogenies for other taxonomic groups. ...

February 14, 2013 · 2 min · Scott Chamberlain

Waiting for an API request to complete

Dealing with API tokens in R In my previous post I showed an example of calling the Phylotastic taxonomic name resolution API Taxosaurus here. When you query their API they give you a token which you use later to retrieve the result (see examples on their page above). However, you don’t know when the query will be done, so how do we know when to send the query to retrieve the data? ...

January 26, 2013 · 2 min · Scott Chamberlain

Resolving species names when you have a lot of them

taxize use case: Resolving species names when you have a lot of them Species names can be a pain in the ass, especially if you are an ecologist. We ecologists aren’t trained in taxonomy, yet we often end up with huge species lists. Of course we want to correct any spelling errors in the names, get the newest names for our species, resolve any synonyms, etc. We are building tools into our R package taxize that will let you check your species names to make sure they are correct. ...

January 25, 2013 · 5 min · Scott Chamberlain