habanero update: Crossref data from Python

I wrote about Crossref clients nearly two years ago on this blog: Crossref programmatic clients. Since it's been a while, it seems worth talking again about the many ways to work programmatically with Crossref data, with a focus on the Python client habanero, since it has some recent updates. The three clients work with the main Crossref API, which lets you search for works (e.g., books, articles) by title, author, etc., and search for publishing members, funders, journals, DOI prefixes, and licenses. It's a powerful API with basically no rate limits, so you can work through lots of data quickly. ...
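habanero wraps the Crossref REST API (its README shows calls like `Crossref().works(query="ecology")`). To show roughly what such a client does under the hood, here is a minimal sketch that only builds request URLs with the Python standard library. It does not use habanero and sends no requests, and the `crossref_url` helper is hypothetical.

```python
from urllib.parse import urlencode

# Public Crossref REST API base URL.
CROSSREF_BASE = "https://api.crossref.org"

# Route names for the resource types mentioned above; check the Crossref
# API docs for which parameters (and path IDs) each route accepts.
ROUTES = {"works", "members", "funders", "journals", "prefixes", "licenses"}


def crossref_url(route, **params):
    """Build a Crossref REST API URL for one of the routes above.

    Extra keyword arguments become query-string parameters; `query` and
    `rows` are valid on list routes such as /works.
    """
    if route not in ROUTES:
        raise ValueError(f"unknown Crossref route: {route!r}")
    url = f"{CROSSREF_BASE}/{route}"
    return f"{url}?{urlencode(params)}" if params else url


print(crossref_url("works", query="ecology", rows=5))
# https://api.crossref.org/works?query=ecology&rows=5
```

A real client adds the HTTP request, paging, and JSON parsing on top of this; the URL shape is the part all three clients share.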

October 23, 2017 · 3 min · Scott Chamberlain

cranchecks: an API for CRAN check results

If you maintain an R package, or even just use R packages, you may have looked at CRAN check results. These are essentially the results of running R CMD check on a package. CRAN runs these checks for each package on several operating systems (Debian, Fedora, Solaris, Windows, macOS) and R versions (devel, release, and patched).
source: https://github.com/ropensci/cchecksapi
base API URL: https://cranchecks.info
CRAN maintainers look at these results and will eventually email package maintainers if checks are bad enough. ...
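Because each package gets one result per check flavor (an OS plus an R version), a natural first step with these results is to tally statuses and find the worst one. The sketch below assumes a simple list-of-dicts shape; the flavor names and statuses are hypothetical sample data, not a response from cranchecks.info, and `summarize` is an illustrative helper. The status labels (OK, NOTE, WARN, ERROR, FAIL) are the ones CRAN uses.

```python
from collections import Counter

# Hypothetical check results for one package, one entry per check flavor.
# Made up for illustration; not fetched from cranchecks.info.
results = [
    {"flavor": "r-devel-linux-x86_64-debian-clang", "status": "OK"},
    {"flavor": "r-devel-linux-x86_64-fedora-gcc", "status": "NOTE"},
    {"flavor": "r-release-windows-ix86+x86_64", "status": "OK"},
    {"flavor": "r-patched-solaris-x86", "status": "ERROR"},
    {"flavor": "r-release-osx-x86_64", "status": "WARN"},
]

# Higher number = more serious problem.
SEVERITY = {"OK": 0, "NOTE": 1, "WARN": 2, "ERROR": 3, "FAIL": 3}


def summarize(checks):
    """Return (status counts, worst status seen, or None if no checks)."""
    counts = Counter(c["status"] for c in checks)
    worst = max(counts, key=lambda s: SEVERITY.get(s, 0), default=None)
    return counts, worst


counts, worst = summarize(results)
print(dict(counts), worst)
```

Ranking by severity rather than by count means a single ERROR outweighs many OKs, which matches how a maintainer would triage the results.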

September 27, 2017 · 3 min · Scott Chamberlain

gbifrb: Ruby client for the GBIF API

gbifrb is a new Ruby client for the GBIF API.
docs: https://www.rubydoc.info/gems/gbifrb/
rubygems: https://rubygems.org/gems/gbifrb
code: https://github.com/sckott/gbifrb
I maintain (with help) two other GBIF API clients:
Python: pygbif
R: rgbif
API
Here are the gbifrb methods in relation to GBIF API routes:
registry
/node - Gbif::Registry.nodes
/network - Gbif::Registry.networks
/installations - Gbif::Registry.installations
/organizations - Gbif::Registry.organizations
/dataset_metrics - Gbif::Registry.dataset_metrics
/datasets - Gbif::Registry.datasets
/dataset_suggest - Gbif::Registry.dataset_suggest
/dataset_search - Gbif::Registry.dataset_search
species
/species/match - Gbif::Species.name_backbone
/species/suggest - Gbif::Species.name_suggest
/species/search - Gbif::Species.name_lookup
/species - Gbif::Species.name_usage
occurrences ...
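The route-to-method listing maps naturally onto a lookup table. As a sketch in Python (it does not use gbifrb, pygbif, or any network access), the hypothetical `gbif_url` helper below keeps only the species routes from that listing and builds request URLs against the GBIF API v1 base URL; `name` is the parameter the /species/match route takes.

```python
from urllib.parse import urlencode

# GBIF API version 1 base URL.
GBIF_BASE = "https://api.gbif.org/v1"

# Species routes and the gbifrb methods they correspond to, per the listing.
ROUTE_TO_METHOD = {
    "/species/match": "Gbif::Species.name_backbone",
    "/species/suggest": "Gbif::Species.name_suggest",
    "/species/search": "Gbif::Species.name_lookup",
    "/species": "Gbif::Species.name_usage",
}


def gbif_url(route, **params):
    """Build a GBIF API URL for a known species route."""
    if route not in ROUTE_TO_METHOD:
        raise ValueError(f"route not in table: {route!r}")
    url = GBIF_BASE + route
    return f"{url}?{urlencode(params)}" if params else url


print(gbif_url("/species/match", name="Puma concolor"))
print(ROUTE_TO_METHOD["/species/match"])
```

Keeping the table as data makes it easy to check that every API route has a client method, which is the kind of correspondence the listing documents.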

September 7, 2017 · 1 min · Scott Chamberlain

hoardr: simple file caching

hoardr is a client for caching files and managing those files. You can definitely achieve the same tasks without a separate package, and there are a number of packages for caching various objects in R already. However, I didn't think there was a tool that did everything I needed. I typically need hoardr when dealing with large files, either text (e.g., csv) or binary (e.g., shp), that users of packages I maintain shouldn't have to download again if they already have them. This makes life easier for the server hosting the files and makes work faster for the users of my packages. ...
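The core idea (download once, reuse the local copy afterward) fits in a few lines. This is a minimal Python sketch of that idea, not hoardr's API: `cached_fetch` and its `fetch` argument are hypothetical names, and the fetch function is passed in so the example runs without network access.

```python
import os
import tempfile


def cached_fetch(url, cache_dir, fetch):
    """Return a local path for `url`, calling `fetch` only on a cache miss.

    `fetch` is any callable that takes a URL and returns bytes.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Use the last path component of the URL as the cached file name.
    name = url.rstrip("/").rsplit("/", 1)[-1]
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        data = fetch(url)  # only reached when the file isn't cached yet
        with open(path, "wb") as f:
            f.write(data)
    return path


calls = []


def fake_fetch(url):
    """Stand-in for a real download; records each call."""
    calls.append(url)
    return b"a,b\n1,2\n"


cache = tempfile.mkdtemp()
p1 = cached_fetch("https://example.com/data/file.csv", cache, fake_fetch)
p2 = cached_fetch("https://example.com/data/file.csv", cache, fake_fetch)
print(p1 == p2, len(calls))  # True 1
```

Keying on the file name alone means two URLs ending in the same name would collide; a fuller tool would hash the URL or namespace the cache directory per package.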

August 15, 2017 · 4 min · Scott Chamberlain

Tooling for R package development

There are a lot of ways to make R packages. Many blog posts have covered making R packages, but for the most part they've covered only how their authors make packages: the required files for a package, what to put in DESCRIPTION, etc. But what about the tooling? I'm not going to talk about the code, etc., but rather the different ways to approach it. The blog posts/etc. on making R packages: ...

June 18, 2017 · 7 min

Reading in May

Reading right now or just finished:
The Nine, Jeffrey Toobin (https://www.jeffreytoobin.com/books/the-nine-tr). Just finished reading this. Synopsis: fucking hell, Scalia and Thomas are awful; the Warren court was awesome; RBG 4 life.
Evolutionary Biology of Parasites, Peter W. Price (https://press.princeton.edu/titles/645.html). In progress. I got this book from my undergrad advisor around 2001 or so and figured I'd give it a read. Synopsis: parasites are awesome.
Bike Snob: Systematically & Mercilessly Realigning the World of Cycling, BikeSnobNYC (Eben Weiss), illustrated by Christopher Koelle (https://www.goodreads.com/book/show/7549138-bike-snob). In progress. Got it from my dad, thx dad. Synopsis: funny.
The Genius of Birds, Jennifer Ackerman (https://www.jenniferackermanauthor.com/genius-ofbirds). In progress. Synopsis: birds are smart.

May 16, 2017 · 1 min

CascadiaRConf

Save the date for CascadiaRConf!
Website: cascadiarconf.com
Twitter: @cascadiarconf
There's not a lot of info available yet, but so far:
When: June 3, 2017
Where: OHSU Collaborative Life Sciences Building (more details soon on rooms, etc.)
Agenda: No details yet, but likely a series of workshops as well as a single-track set of talks. We'll be accepting talk submissions soonish.
Tickets: We aren't out to make money; tickets will be cheap and probably free for students. ...

March 23, 2017 · 1 min

USDA plants database API in R

The USDA maintains a database of plant information, some of it trait data, some of it life history. Check it out: https://plants.usda.gov/java/
They've been talking about releasing an API for a long time but have not done so. Since at least some version of their data is on the public web, I've created a RESTful API for it:
source code: https://github.com/sckott/usdaplantsapi/
base URL: https://plantsdb.xyz
Check out the API, and open issues for bugs/feature requests in the GitHub repo. ...
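A RESTful API like this is queried by appending a route and query parameters to the base URL. The sketch below builds such a URL in Python. Only the base URL comes from the post; the `/search` route and the `Genus`, `fields`, and `limit` parameter names are assumptions for illustration, so check the repo's README for the real ones. `plants_search_url` is a hypothetical helper and sends no request.

```python
from urllib.parse import urlencode

# Base URL from the post.
PLANTS_BASE = "https://plantsdb.xyz"


def plants_search_url(limit=None, fields=None, **filters):
    """Build a search URL for the plants API.

    ASSUMPTION: the "/search" route and the "fields"/"limit" parameters
    are illustrative; they are not confirmed by the post.
    """
    params = dict(filters)  # e.g. Genus="Pinus" (column filter, assumed)
    if fields:
        params["fields"] = ",".join(fields)
    if limit is not None:
        params["limit"] = limit
    base = f"{PLANTS_BASE}/search"
    return f"{base}?{urlencode(params)}" if params else base


print(plants_search_url(Genus="Pinus", fields=["Genus", "Species"], limit=2))
```

Note that `urlencode` percent-encodes the comma in the field list (`%2C`); servers decode this transparently.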

October 19, 2016 · 8 min

gbids - GenBank IDs API is back up!

GBIDS API is back
Back in March this year I wrote a post about a new API for working with GenBank IDs. I had to take the API down because it was too expensive to keep up. It was expensive because the data dump is very large (3.8 GB compressed), and I need disk space on the server to uncompress it to, I think, about 18 GB, then load it into MySQL, which takes maybe another 30 GB or so. Anyway, it's not expensive because of high traffic (although I wish that were the case) but because of needing lots of disk space. ...
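The disk-space problem is easy to see by adding up the approximate sizes given above. This worked estimate assumes the worst case, where the compressed dump, the uncompressed file, and the MySQL tables all sit on disk at once during the load.

```python
# Approximate sizes from the post, in GB.
sizes_gb = {
    "compressed dump": 3.8,
    "uncompressed dump": 18.0,
    "MySQL tables": 30.0,
}

# Worst case: all three copies coexist while the data is being loaded.
peak_gb = sum(sizes_gb.values())
print(f"peak disk needed: ~{peak_gb:.1f} GB")  # ~51.8 GB
```

If the dump files can be deleted after loading, steady-state usage drops to roughly the MySQL share, but the server still needs the peak amount available while rebuilding.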

September 1, 2016 · 3 min

nonoyes - text analysis of Reply All podcast transcripts

Contents: Setup, URLs, Episode names, Transcripts, Summary word usage, Sentiment, Most common positive and negative words
Reply All is a great podcast. I've been wanting to learn some text analysis tools, and transcripts from the podcast are on their site. I took some approaches outlined in this vignette for the tidytext package, and used the tokenizers package and some of the tidyverse.
Code on GitHub at sckott/nonoyes
Also check out the HTML version
Setup
Load deps
library("httr")
library("xml2")
library("stringi")
library("dplyr")
library("ggplot2")
library("tokenizers")
library("tidytext")
library("tidyr")
source helper functions ...
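The post's pipeline (tokenize transcripts, count words, then score sentiment against a lexicon) is done in R with tokenizers and tidytext. As a rough Python analogue of the same technique, here is a sketch with a tiny hypothetical lexicon and a made-up transcript line; a real analysis would use a full lexicon such as Bing or AFINN, as the tidytext approach does.

```python
import re
from collections import Counter

# Tiny hypothetical sentiment lexicon, for illustration only.
POSITIVE = {"great", "good", "love", "fun"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text, stop_words=frozenset()):
    """Count tokens, optionally dropping stop words."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


def sentiment(text):
    """Return (positive count, negative count, net score)."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos, neg, pos - neg


# Made-up transcript line, not from the podcast.
transcript = "That was a great story. No, no, yes. A bad call, but a fun one."
print(word_counts(transcript).most_common(2))
print(sentiment(transcript))  # (2, 1, 1)
```

Counting matches per token, rather than per sentence, mirrors the one-token-per-row approach tidytext uses before joining against a sentiment lexicon.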

August 25, 2016 · 4 min