pygbif - GBIF client for Python

I maintain an R client for the GBIF API, rgbif. I've been working on it for a few years, and recently I've been thinking that there should be a nice low-level client for Python as well. I didn't find one searching GitHub, etc., so I recently started working on one: pygbif. It's up on PyPI. There's not much in pygbif yet; I wanted to get something up early so that users can help make the library useful to people sooner....

November 12, 2015 · 2 min · Scott Chamberlain
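pygbif's own functions aren't shown in the excerpt above, so the following is not its API. It is a minimal standard-library sketch of the kind of low-level request such a client wraps, assuming GBIF's public v1 REST base URL (https://api.gbif.org/v1) and its `species/match` and `occurrence/search` paths. The helper names `build_url` and `gbif_get` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of GBIF's public v1 REST API.
GBIF_API = "https://api.gbif.org/v1"


def build_url(path: str, **params) -> str:
    """Join an API path and optional query parameters into a request URL."""
    query = urlencode(params)
    return f"{GBIF_API}/{path}" + (f"?{query}" if query else "")


def gbif_get(path: str, **params) -> dict:
    """Send a GET request and decode the JSON body (requires network access)."""
    with urlopen(build_url(path, **params), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Match a scientific name against the GBIF backbone taxonomy.
    print(build_url("species/match", name="Puma concolor"))
```

A thin layer like this keeps the client "low level": each function maps onto one API route, and higher-level conveniences can be built on top later.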

noaa - Integrated Surface Database data

I’ve recently made some improvements to the functions that work with ISD (Integrated Surface Database) data. The isd() function now caches more intelligently. We now cache using .rds files via saveRDS/readRDS, whereas we used to use .csv files, which take up much more disk space and forced us to worry about data types changing when reading data back into an R session. The downside is that you can’t open a cached file directly in your favorite spreadsheet viewer, but you can still do that manually after reading the data into R....

October 21, 2015 · 4 min · Scott Chamberlain
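The caching trade-off described in the ISD entry (binary files that preserve types versus human-readable CSV) can be sketched in Python, with `pickle` standing in for R's saveRDS/readRDS. This is an analogy, not rnoaa's code; the `cached` and `csv_roundtrip` helpers and the sample station row are hypothetical.

```python
import csv
import os
import pickle
import tempfile
from datetime import date


def cached(cache_dir, key, fetch):
    """Return data for `key`, calling fetch() only on a cache miss.

    Results are stored with pickle, a binary format that (like .rds)
    restores Python types exactly when read back.
    """
    path = os.path.join(cache_dir, f"{key}.pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    data = fetch()
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data


def csv_roundtrip(rows, path):
    """Write rows to CSV and read them back; every value comes back as a str."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Hypothetical station record, for illustration only.
    rows = [{"station": "720110", "date": date(2015, 1, 1), "temp": -3.5}]
    with tempfile.TemporaryDirectory() as tmp:
        print(cached(tmp, "demo", lambda: rows))                   # types intact
        print(csv_roundtrip(rows, os.path.join(tmp, "demo.csv")))  # all strings
```

The CSV path loses the `date` and `float` types, which is exactly the "worry about data types" problem a binary cache avoids, at the cost of not being openable in a spreadsheet.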

Metrics for open source projects

Measuring use of open source software isn’t always straightforward. The problem is especially acute for software targeted largely at academia, where usage is measured not just by software downloads but also by citations. Citations are a well-known pain point because the citation graph is locked behind iron doors (e.g., Scopus, Google Scholar). New ventures aim to open up citation data, but it’s an immense amount of work, so progress is slow....

October 19, 2015 · 5 min · Scott Chamberlain

analogsea - an R client for the Digital Ocean API

analogsea is now on CRAN. We started developing the package back in May 2014, but are only now getting the first version on CRAN. It’s a collaboration with Hadley Wickham and Winston Chang. Most of the analogsea package is for interacting with the Digital Ocean API, including managing domains, managing SSH keys, getting actions, managing images, and managing droplets (servers). A number of convenience functions are included for doing tasks (e.g., resizing a droplet) that aren’t supported by Digital Ocean’s API out of the box (i....

October 2, 2015 · 2 min · Scott Chamberlain

oai - an OAI-PMH client

oai is a general-purpose client to work with any ‘OAI-PMH’ service. The ‘OAI-PMH’ protocol is described at https://www.openarchives.org/OAI/openarchivesprotocol.html. The main functions follow the OAI-PMH verbs: GetRecord, Identify, ListIdentifiers, ListMetadataFormats, ListRecords, and ListSets. The repo is at https://github.com/sckott/oai. I will be using this in a number of packages I maintain that use OAI-PMH data services. If you try it, let me know what you think. This package is heading to rOpenSci soon: https://github....

September 11, 2015 · 3 min · Scott Chamberlain
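As an illustration of how an OAI-PMH request maps onto the verbs listed in the oai entry, here is a minimal Python sketch (not the oai package's R code). It assumes the OAI-PMH 2.0 argument rules: every request carries a `verb` parameter; `ListRecords` and `ListIdentifiers` require `metadataPrefix`; `GetRecord` requires both `identifier` and `metadataPrefix`; and a `resumptionToken` replaces the other arguments when paging the list verbs. The repository URL and the trimmed Identify response are made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Required arguments for each OAI-PMH 2.0 verb.
REQUIRED_ARGS = {
    "GetRecord": {"identifier", "metadataPrefix"},
    "Identify": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListMetadataFormats": set(),
    "ListRecords": {"metadataPrefix"},
    "ListSets": set(),
}
# Verbs whose results are paged; a resumptionToken stands in for their other arguments.
PAGED_VERBS = {"ListIdentifiers", "ListRecords", "ListSets"}


def oai_url(base_url, verb, **args):
    """Build an OAI-PMH GET request URL, checking the verb's required arguments."""
    if verb not in REQUIRED_ARGS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if not (verb in PAGED_VERBS and "resumptionToken" in args):
        missing = REQUIRED_ARGS[verb] - set(args)
        if missing:
            raise ValueError(f"{verb} requires: {', '.join(sorted(missing))}")
    return base_url + "?" + urlencode({"verb": verb, **args})


def repository_name(identify_xml):
    """Extract repositoryName from an Identify response."""
    root = ET.fromstring(identify_xml)
    node = root.find("oai:Identify/oai:repositoryName", OAI_NS)
    return None if node is None else node.text


# Trimmed, made-up Identify response (a real one has more required fields).
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2015-09-11T00:00:00Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""

if __name__ == "__main__":
    print(oai_url("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc"))
    print(repository_name(SAMPLE_IDENTIFY))
```

Because every OAI-PMH service speaks the same six verbs, one function per verb is enough for a general-purpose client; only the base URL changes between repositories.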

fulltext - a package to help you mine text

Finally, we got fulltext up on CRAN; our first commit was in May last year. fulltext is a package to facilitate text mining. It focuses on open access journals. This package makes it easier to search for articles, download those articles in full text if available, convert PDF format to plain text, and extract text chunks for visualization/analysis. We are planning to add bits for analysis in future versions. We’ve been working on this package for a while now....

August 7, 2015 · 10 min · Scott Chamberlain

rnoaa - Weather data in R

NOAA provides a lot of weather data, across many different websites under different project names. The R package rnoaa accesses many of these, including: NOAA NCDC climate data, using the NCDC API version 2; GHCND FTP data; ISD FTP data; severe weather data (docs are at https://www.ncdc.noaa.gov/swdiws/); sea ice data; NOAA buoy data; tornado data from the NOAA Storm Prediction Center; HOMR (Historical Observing Metadata Repository) from NOAA NCDC; and storm data from the International Best Track Archive for Climate Stewardship (IBTrACS). rnoaa used to provide access to ERDDAP servers, but a separate package, rerddap, now focuses on just those data sources....

July 7, 2015 · 12 min · Scott Chamberlain

rerddap - General purpose R client for ERDDAP servers

ERDDAP is a data server that gives you a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and make graphs and maps. Besides its own RESTful interface, much of which is designed based on OPeNDAP, ERDDAP can act as an OPeNDAP server and as a WMS server for gridded data. ERDDAP is a powerful tool: in a world of heterogeneous data, it’s often hard to combine data and serve it through the same interface, with tools for querying/filtering/subsetting the data....

June 24, 2015 · 8 min · Scott Chamberlain
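As a rough illustration of the RESTful interface mentioned in the rerddap entry, here is a Python sketch that builds a tabledap request URL of the form `{server}/tabledap/{datasetID}.{fileType}?{variables}&{constraints}`. This is not rerddap's implementation: the server URL and dataset ID are placeholders, and percent-encoding each constraint in full (so `>=` becomes `%3E%3D`) is an assumption about a safe encoding, chosen because the server decodes percent-escapes.

```python
from urllib.parse import quote


def tabledap_url(server, dataset_id, variables, constraints=(), file_type="csv"):
    """Build a tabledap URL: {server}/tabledap/{id}.{type}?{vars}&{constraints}.

    Variable names are joined with commas; each constraint, such as
    'time>=2015-01-01', is percent-encoded in full.
    """
    query = [quote(",".join(variables), safe=",")]
    query += [quote(c, safe="") for c in constraints]
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{file_type}?" + "&".join(query)


if __name__ == "__main__":
    # Placeholder server and dataset ID, not a real dataset.
    print(tabledap_url("https://example.org/erddap", "exampleDataset",
                       ["longitude", "latitude", "time"], ["time>=2015-01-01"]))
```

Swapping `file_type` (e.g. `"csv"` for another supported extension) is how the same subset request returns different file formats, which is much of ERDDAP's appeal.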

iDigBio - a new data source in spocc

iDigBio, or Integrated Digitized Biocollections, collects and provides access to species occurrence data and associated metadata (e.g., images of specimens, when provided). They collect data from many different providers. They have a nice web interface for searching; check out idigbio.org/portal/search. spocc is a package we’ve been working on at rOpenSci for a while now; it is a one-stop shop for retrieving species occurrence data. As new sources of species occurrence data come to our attention, and are available via a RESTful API, we incorporate them into spocc....

June 8, 2015 · 3 min · Scott Chamberlain
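For a sense of what "available via a RESTful API" means for iDigBio, here is a Python sketch that builds a records query, assuming the v2 search endpoint `https://search.idigbio.org/v2/search/records` and its JSON-valued `rq` parameter. This is not how spocc does it internally; `idigbio_records_url` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode

# Assumption: iDigBio search API v2 records endpoint.
IDIGBIO_RECORDS = "https://search.idigbio.org/v2/search/records"


def idigbio_records_url(query: dict) -> str:
    """Encode a record query (field name -> value) as the JSON 'rq' parameter."""
    return IDIGBIO_RECORDS + "?" + urlencode({"rq": json.dumps(query)})


if __name__ == "__main__":
    print(idigbio_records_url({"genus": "acer"}))
```

Passing the whole query as one JSON document is what lets a wrapper like spocc translate its own arguments into a single request parameter.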

lawn - a new package to do geospatial analysis

lawn is an R wrapper for the JavaScript library turf.js for advanced geospatial analysis. In addition, we have a few functions to interface with the geojson-random JavaScript library. lawn includes traditional spatial operations, helper functions for creating GeoJSON data, and data classification and statistics tools. There is an additional helper function (see view()) in this package to help visualize data with interactive maps via the leaflet package (https://github.com/rstudio/leaflet). Note that leaflet is not required to install lawn; it’s in Suggests, not Imports or Depends....

May 18, 2015 · 5 min · Scott Chamberlain
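To make "helper functions for creating GeoJSON data" and "traditional spatial operations" in the lawn entry concrete, here is a small Python sketch that is independent of lawn and turf.js: a GeoJSON Point Feature builder (coordinates ordered longitude, latitude, as GeoJSON requires) and a haversine great-circle distance. The helper names are hypothetical, and the 6371 km Earth radius is an assumption; turf.js may use a slightly different constant, so results can differ in the last digits.

```python
import json
import math

# Assumption: mean Earth radius in kilometres.
EARTH_RADIUS_KM = 6371.0


def point(lon, lat, properties=None):
    """Build a GeoJSON Point Feature; coordinates are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties or {},
    }


def distance_km(a, b):
    """Haversine great-circle distance in km between two Point Features."""
    lon1, lat1 = map(math.radians, a["geometry"]["coordinates"])
    lon2, lat2 = map(math.radians, b["geometry"]["coordinates"])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


if __name__ == "__main__":
    a, b = point(0.0, 0.0), point(0.0, 1.0, {"name": "one degree north"})
    print(json.dumps(b))
    print(round(distance_km(a, b), 2))  # roughly one degree of latitude
```

Keeping features as plain GeoJSON dictionaries means they serialize directly with `json.dumps`, the same interchange format that turf.js and leaflet consume.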