Playing with Ruby Patterns in R

I was returning to a long-term project I’ve been working on - a package for caching HTTP requests in R called vcr, a port of the Ruby gem vcr - when you do that thing you do when you are porting a library from one language to another. I stumbled upon some methods/functions I wasn’t familiar with. For example, I had never seen take_while before. It iterates over an array, returning the elements of the array that evaluate to true (for those new to Ruby, they use true instead of TRUE as we do in R) when passed through the function given....

January 25, 2018 · 4 min · Scott Chamberlain

habanero update: Crossref data from Python

I wrote about Crossref clients nearly two years ago on this blog: Crossref programmatic clients. Since it’s been a while, it seems worth talking again about the many ways to work programmatically with Crossref data - and focusing in on the Python client habanero, since it has some recent updates. The 3 clients work with the main Crossref API, which lets you do things like search for works by title, author, etc....

October 23, 2017 · 3 min · Scott Chamberlain

cranchecks: an API for CRAN check results

If you maintain an R package, or even use R packages, you may have looked at CRAN check results. These are essentially the results of running R CMD check on a package. CRAN runs these for each package on each of a few different operating systems (debian, fedora, solaris, windows, osx) and different R versions (devel, release and patched). src: https://github.com/ropensci/cchecksapi base api url: https://cranchecks.info CRAN maintainers look at these, and eventually will email maintainers if checks are bad enough....

September 27, 2017 · 3 min · Scott Chamberlain

hoardr: simple file caching

hoardr is a client for caching files and managing those files. You can definitely achieve the same tasks without a separate package, and there are a number of packages for caching various objects in R already. However, I didn’t think there was a tool that did everything I needed. The use cases I typically need hoardr for are when dealing with large files, either text (e.g., csv) or binary (e.g., shp) files that would be nice to not make the user of packages I maintain download again if they already have the file....

August 15, 2017 · 4 min · Scott Chamberlain

CascadiaRConf

Save the date for CascadiaRConf! Website: cascadiarconf.com. Twitter: @cascadiarconf. There’s not a lot of info available yet - but so far: When: 3 June, 2017. Where: OHSU Collaborative Life Science Building (more details soon on what rooms, etc.). Agenda: No details yet - but likely to be a series of workshops as well as a single-track set of talks. We’ll be accepting talk submissions soonish. Tickets: We aren’t out to make money - tickets will be cheap and probably free for students....

March 23, 2017 · 1 min

USDA plants database API in R

The USDA maintains a database of plant information, some of it trait data, some of it life history. Check it out at https://plants.usda.gov/java/. They’ve been talking about releasing an API for a long time, but have not done so. Thus, since at least some version of their data is on the public web, I’ve created a RESTful API for the data: source code: https://github.com/sckott/usdaplantsapi/ base URL: https://plantsdb.xyz Check out the API, and open issues for bugs/feature requests in the GitHub repo....

October 19, 2016 · 8 min

nonoyes - text analysis of Reply All podcast transcripts

Reply All is a great podcast. I’ve been wanting to learn some text analysis tools, and transcripts from the podcast are on their site. I took some approaches outlined in this vignette from the tidytext package, and used the tokenizers package and some of the tidyverse. Code is on GitHub at sckott/nonoyes. Also check out the html version...

August 25, 2016 · 4 min

Marine Regions data in R

UPDATE: pkg API has changed - updated the post below to work with the current CRAN version, submitted 2016-08-02. I was at a hackathon focused on Ocean Biogeographic Information System (OBIS) data back in November last year in Belgium. One project idea was to make it easier to get at data based on one or more marine regions. I was told that Marineregions.org is often used as a source of shape files for different regions, which are then used in other work....

June 9, 2016 · 6 min

atomize - make new packages from other packages

We (rOpenSci) just held our 3rd annual rOpenSci unconference (https://unconf16.ropensci.org/) in San Francisco. There were a lot of ideas, and lots of awesome projects from awesome people came out of the 2 day event. One weird idea I had comes from looking at the Node world, where there are lots of tiny packages, instead of the often larger packages we have in the R world. One reason for tiny in Node is that of course you want a library to be tiny if running in the browser for faster load times (esp....

April 7, 2016 · 2 min

scrubr - clean species occurrence records

scrubr is an R library for cleaning species occurrence records. It’s general purpose, and has the following approach: We think using a piping workflow (%>%) makes code easier to build up, and easier to understand. However, you don’t have to use pipes in this package. All inputs and outputs are data.frames, which makes the above point easier. Records trimmed off due to various filters are retained as attributes, so can still be accessed for later inspection, but don’t get in the way of the data....

March 4, 2016 · 11 min · Scott Chamberlain