Notes on porting Ruby to R

In doing a number of ports of Ruby gems to R (vcr, webmockr), I’ve noticed a few differences between the languages that are fun to dive into, at least for me. Monkey patching: Ruby has a nice thing where you can “monkey patch” classes/methods/etc. in other Ruby libraries. For example, let’s say you have Ruby gems foo and bar. If foo has a method hello, you can override the hello method in foo with one from bar. AFAICT this is acceptable in gems on Rubygems.org and in general in the community. ...
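A minimal sketch of the override described in this excerpt. The module `Foo` and the class `Greeter` are hypothetical stand-ins for code shipped by gems foo and bar; they are not taken from the post or from any real gem.

```ruby
# Hypothetical code shipped by gem "foo".
module Foo
  class Greeter
    def hello
      "hello from foo"
    end
  end
end

# Hypothetical code shipped by gem "bar": it reopens Foo::Greeter
# and replaces hello. Ruby classes are open, so the last
# definition loaded wins.
module Foo
  class Greeter
    def hello
      "hello from bar"
    end
  end
end

greeting = Foo::Greeter.new.hello
puts greeting
```

Because the second definition replaces the first for every caller, the result depends on the order in which the two libraries are loaded.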

February 19, 2019 · 4 min · Scott Chamberlain

Playing with Ruby Patterns in R

I was returning to a long-term project I’ve been working on (vcr, an R package for caching HTTP requests and a port of the Ruby gem vcr), doing that thing you do when you are porting a library from one language to another, when I stumbled upon some methods/functions I wasn’t familiar with. For example, take_while I had never seen before. It iterates over an array, returning the leading elements of the array that evaluate to true (for those new to Ruby, they use true instead of TRUE as we do in R) when passed through the function given, and stopping at the first element that does not. R has lists and vectors - R’s lists are the most similar to Ruby arrays because both can have mixed objects in them (e.g., a string and an integer) while still retaining those objects as is. ...
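A short sketch of how take_while behaves, using made-up values rather than anything from vcr. It also contrasts take_while with select, since take_while stops at the first element that fails the test, while select checks every element.

```ruby
# Illustrative values, not taken from vcr.
values = [1, 2, 3, 10, 2, 1]

# take_while keeps leading elements while the block is truthy,
# and stops at the first element that fails the test.
leading = values.take_while { |x| x < 5 }

# select, by contrast, keeps every element that passes the test.
matching = values.select { |x| x < 5 }

# Ruby arrays can hold mixed objects, much like R lists.
mixed = ["a", "b", 1, "c"]
leading_strings = mixed.take_while { |x| x.is_a?(String) }

p leading          # [1, 2, 3]
p matching         # [1, 2, 3, 2, 1]
p leading_strings  # ["a", "b"]
```

The trailing 2 and 1 in values pass the test but are not returned by take_while, because iteration has already stopped at 10.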

January 25, 2018 · 4 min · Scott Chamberlain

Web APIs with Sinatra, Mongo, Docker, and Caddy

The problem: the R community has a package distribution thing called CRAN, just like Ruby has Rubygems, Python has PyPI, etc. The CRAN maintainers run checks on every package on CRAN, on multiple versions of R and on many operating systems. They report those results on a page associated with the package, like this one. You might be thinking: okay, but we have Travis-CI and friends, so who cares about that? Well, it’s these checks that CRAN runs that will determine if your package on CRAN leads to emails to you asking for changes, and possibly the package being taken down if, e.g., they email and you don’t respond for a period of time. ...

November 14, 2017 · 8 min · Scott Chamberlain

habanero update: Crossref data from Python

I wrote about Crossref clients nearly two years ago on this blog: Crossref programmatic clients. Since it’s been a while, it seems worth talking again about the many ways to work programmatically with Crossref data - and focus in on the Python client habanero since it has some recent updates. The 3 clients work with the main Crossref API, which lets you do things like search for works by title, author, etc. (e.g., books, articles), search for publishing members, for funders, for journals, for DOI prefixes, and for licenses. It’s a powerful API with basically no rate limits, so you can work through lots of data quickly. ...

October 23, 2017 · 3 min · Scott Chamberlain

gbifrb: Ruby client for the GBIF API

gbifrb is a new Ruby client for the GBIF API.

docs: https://www.rubydoc.info/gems/gbifrb/
rubygems: https://rubygems.org/gems/gbifrb
code: https://github.com/sckott/gbifrb

I maintain (w/ help) two other GBIF API clients:
Python: pygbif
R: rgbif

API

Here are the gbifrb methods in relation to GBIF API routes:

registry
/node - Gbif::Registry.nodes
/network - Gbif::Registry.networks
/installations - Gbif::Registry.installations
/organizations - Gbif::Registry.organizations
/dataset_metrics - Gbif::Registry.dataset_metrics
/datasets - Gbif::Registry.datasets
/dataset_suggest - Gbif::Registry.dataset_suggest
/dataset_search - Gbif::Registry.dataset_search

species
/species/match - Gbif::Species.name_backbone
/species/suggest - Gbif::Species.name_suggest
/species/search - Gbif::Species.name_lookup
/species - Gbif::Species.name_usage

occurrences ...

September 7, 2017 · 1 min · Scott Chamberlain

gbids - GenBank IDs API is back up!

GBIDS API is back. Back in March this year I wrote a post about a new API for working with GenBank IDs. I had to take the API down because it was too expensive to keep up. Expensive because the dump of data is very large (3.8 GB compressed), and I need disk space on the server to uncompress that to, I think, about 18 GB, then load it into MySQL, which is another maybe 30 GB or so. Anyway, it’s not expensive because of high traffic - although I wish that was the case - but because of needing lots of disk space. ...

September 1, 2016 · 3 min

GenBank IDs API - get, match, swap id types

Accession numbers and GI identifiers are the two types of identifiers for entries in GenBank (see this page for why there are two types of identifiers). Actually, recent news from NCBI is that GI identifiers will be phased out by September this year, which affects what I’ll talk about below. There are a lot of sequences in GenBank. Sometimes you have identifiers and you want to check if they exist in GenBank, or want to get one type from another (accession from GI, or vice versa; although the GI phase-out will make this use case no longer needed), or just get a bunch of identifiers, perhaps for software testing purposes. ...

March 29, 2016 · 3 min

heythere - a robot to automate GitHub issue comments

GitHub issues are great for humans to correspond over software, or any other project. At rOpenSci we use an issue-based software review system (ropensci/onboarding). Software authors and reviewers go back and forth on the software, making a better product in the end. We have a relatively small number of pieces of software under review at any one time compared to, e.g., scientific journals - however, even with the small number, we as organizers, as well as authors and reviewers, can forget things. For example: ...

March 24, 2016 · 4 min

Crossref programmatic clients

I gave two talks recently at the annual Crossref meeting, one of which was a somewhat technical overview of programmatic clients for Crossref APIs. Check out the talk here. I talked about the motivation for working with Crossref data by writing code/etc. rather than going the GUI route, then went over the various clients, with brief examples. We (rOpenSci) have been working on the R client rcrossref for a while now, but I’m also working on the Python and Ruby clients for Crossref. In addition, the Ruby client has a CLI inside. The JavaScript client is worked on independently by ScienceAI. ...

November 30, 2015 · 3 min · Scott Chamberlain

icanhaz altmetrics

The Lagotto application is a Rails app that collects article-level metrics data for research objects and serves it up via a RESTful API. So far, this application has only been applied to scholarly articles, but will see action on datasets soon. Martin Fenner has led the development of Lagotto. He recently set up a discussion site if you want to chat about it. The application has a nice GUI, and a quite nice RESTful API. ...

December 8, 2014 · 3 min · Scott Chamberlain