rgauges - fun with hourly web site analytics

Gaug.es is a really nice looking analytics platform and an alternative to Google Analytics. It is a paid service, but not that expensive really. We’ve made an R package to interact with the Gaug.es API called rgauges. Find it on GitHub and on CRAN. Although working with the Gaug.es API is nice and easy, they don’t keep hourly visit stats or provide them via the API, so you have to continually collect them yourself if you want them. That’s what I have done for my own website. ...

January 17, 2014 · 5 min · Scott Chamberlain

govdat - SunlightLabs and New York Times Congress data via R

I started an R package a while back, and a few people have shown interest, so I thought it was time to revisit the code. govdat is an interface to various APIs for government data: currently the Sunlight Labs APIs, and the New York Times Congress API. Returned objects from functions are simple lists. In future versions of govdat, I may change how data is returned. The following are examples (also the package vignette) of using the Sunlight Labs API. I will add examples of using the New York Times Congress API once their site is up again; I’m doing this on 2013-08-28, just after the takedown of their site. ...

August 28, 2013 · 6 min · Scott Chamberlain

Working with climate data from the web in R

I recently attended ScienceOnline Climate, a conference in Washington, D.C. at AAAS. You may have heard of the ScienceOnline annual meeting in North Carolina - this was one of their topical meetings focused on Climate Change. I moderated a session on working with data from the web in R, focusing on climate data. Search Twitter for #scioClimate for tweets from the conference, and #sciordata for tweets from the session I ran. The following is an abbreviated demo of what I did in the workshop showing some of what you can do with climate data in R using our packages. ...

August 17, 2013 · 5 min · Scott Chamberlain

R to GeoJSON

UPDATE: As you can see in Patrick’s comment below, you can convert to GeoJSON format files with rgdal as an alternative to calling the Ogre web API described below. See here for example code for converting to GeoJSON with rgdal. GitHub recently introduced the ability to render GeoJSON files on their site as maps here, and recently introduced here support for TopoJSON, an extension of GeoJSON that can be up to 80% smaller than GeoJSON, support for other file extensions (.topojson and .json), and the ability to embed the maps on other sites (so awesome). The underlying maps used on GitHub are from OpenStreetMap. ...

June 30, 2013 · 3 min · Scott Chamberlain

Stashing and playing with raw data locally from the web

It is getting easier to get data directly into R from the web. Often R packages that retrieve data from the web return useful R data structures to users like a data.frame. This is a good thing of course to make things user friendly. However, what if you want to drill down into the data that’s returned from a query to a database in R? What if you want to get that nice data.frame in R, but you think you may want to look at the raw data later? The raw data from web queries are often JSON or XML data. This type of data, especially JSON, can be easily stored in schemaless so-called NoSQL databases, and queried later. ...

June 17, 2013 · 7 min · Scott Chamberlain

Fylopic, an R wrapper to Phylopic

What is PhyloPic? PhyloPic is an awesome new service - I’ll let the creator, Mike Keesey, explain what it is (paraphrasing here): PhyloPic stores silhouette images of organisms, and each image is associated with taxonomic names, and stores the taxonomy of all taxa, allowing searching by taxonomic names. Anyone can submit silhouettes to PhyloPic. What is a silhouette? It’s like this: by Gareth Monger What makes PhyloPic not just awesome, but super awesome? All or most images are licensed under Creative Commons licenses. This means you can use the silhouettes without having to ask or pay - just attribute. ...

June 1, 2013 · 3 min · Scott Chamberlain

BISON USGS species occurrence data

The USGS recently released a way to search for and get species occurrence records for the USA. The service is called BISON (Biodiversity Information Serving Our Nation). The service has a web interface for human interaction in a browser, and two APIs (application programming interfaces) to allow machines to interact with their database. One of the APIs allows you to search and retrieve data, and the other gives back maps as either a heatmap or a species occurrence map. The latter is more appropriate for working in a browser, so I’ll leave that to the web app folks. ...

May 27, 2013 · 4 min · Scott Chamberlain

Scholarly metadata in R

Scholarly metadata - the meta-information surrounding articles - can be super useful. Although metadata does not contain the full content of articles, it contains a lot of useful information, including title, authors, abstract, URL to the article, etc. One of the largest sources of metadata is provided via the Open Archives Initiative Protocol for Metadata Harvesting or OAI-PMH. Many publishers provide their metadata through their own endpoint, and implement the standard OAI-PMH methods: GetRecord, Identify, ListIdentifiers, ListMetadataFormats, ListRecords, and ListSets. Many providers use OAI-PMH, including DataCite, Dryad, and PubMed. ...

March 16, 2013 · 6 min · Scott Chamberlain
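
As a rough illustration of the OAI-PMH methods listed in the excerpt above, each request is an ordinary HTTP query whose `verb` parameter names one of the six methods. The sketch below is in Python rather than R, only builds the request URL, and makes no network call. The base URL `https://example.org/oai` and the helper name `build_oai_url` are placeholders invented for this example, not a real provider endpoint or part of any package.

```python
from urllib.parse import urlencode

# The six standard OAI-PMH methods named in the excerpt.
OAI_VERBS = {
    "GetRecord",
    "Identify",
    "ListIdentifiers",
    "ListMetadataFormats",
    "ListRecords",
    "ListSets",
}


def build_oai_url(base_url, verb, **params):
    """Return an OAI-PMH request URL for the given verb and extra parameters.

    `base_url` is whatever endpoint a provider publishes; the one used
    below is a placeholder for illustration only.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


# Hypothetical endpoint; substitute a real provider's base URL.
url = build_oai_url("https://example.org/oai", "ListRecords", metadataPrefix="oai_dc")
print(url)
```

Rejecting unknown verbs up front is a design choice for this sketch; a real client would also need to handle the XML response and resumption tokens.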

Visualizing rOpenSci collaboration

We (rOpenSci) have been writing code for R packages for a couple years, so it is time to take a look back at the data. What data, you ask? The commits data from GitHub ~ data that records who did what and when. Using the GitHub commits API we can gather data on who committed code to a GitHub repository, and when they did it. Then we can visualize this historical record. ...

March 8, 2013 · 3 min · Scott Chamberlain

Waiting for an API request to complete

Dealing with API tokens in R In my previous post I showed an example of calling the Phylotastic taxonomic name resolution API Taxosaurus here. When you query their API they give you a token which you use later to retrieve the result (see examples on their page above). However, you don’t know when the query will be done, so how do we know when to send the query to retrieve the data? ...

January 26, 2013 · 2 min · Scott Chamberlain
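
One common answer to the question raised in the excerpt above is to poll: ask for the result, and if it is not ready, wait a little and ask again, giving up after a fixed number of tries. The sketch below is a generic illustration in Python, not the approach from the post itself (the excerpt is truncated before it explains one). The names `poll_until_ready` and `make_fake_service` are invented here, and the fake service stands in for the remote API so the example runs without a network connection.

```python
import time


def poll_until_ready(fetch, token, interval=1.0, max_tries=10, sleep=time.sleep):
    """Call fetch(token) until it returns a result or max_tries is reached.

    `fetch` is assumed to return None while the job is still running.
    """
    for _ in range(max_tries):
        result = fetch(token)
        if result is not None:
            return result
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"no result for token {token!r} after {max_tries} tries")


def make_fake_service(ready_after):
    """Hypothetical stand-in for a remote API: ready on the Nth call."""
    calls = {"n": 0}

    def fetch(token):
        calls["n"] += 1
        if calls["n"] >= ready_after:
            return {"token": token, "status": "done"}
        return None

    return fetch


fetch = make_fake_service(ready_after=3)
result = poll_until_ready(fetch, "abc123", interval=0.0)
print(result)
```

Capping the number of tries keeps a stalled job from blocking the caller forever; a longer or growing interval would be kinder to a real service.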