R | Recology

Balancing user friendliness and code fragility

I occasionally think about these various topics and ping back and forth between them, thinking I’ve got to make a package more user friendly, then back to thinking oh, I really should make this package easier to maintain, but what if that makes it less user friendly? I’ve wanted to get these thoughts written down for a while now, so here goes. User friendliness and code fragility: It’s an unassailable good to make your code more user friendly. There’s no point in making your package harder to use unless you really don’t want people using it. ...

Exploring specimen collections data in Butte County, California

Why Butte County? I went to college at California State University, Chico - in Butte County, CA. I did a BA degree in Biology there. It was a great program as it was heavily focused on natural history - with classes on herps, birds, insects, fish, etc. Specimen collections data Specimen collections data are increasingly being digitized, and often accessed via largeish platforms like GBIF and iDigBio. Here I’ll explore Butte County data found with iDigBio with the spocc R package. You could also use the ridigbio package to go directly to iDigBio data. ...

Exploring git commits with git2r

In rOpenSci - as in presumably most open source projects - we want the entire project to be sustainable, but also each individual software project to be sustainable. A big part of each software project (aka R package in this case) being sustainable is the people making it, particularly: how many contributors a project has, and how contributions are spread across contributors. There are discussions going on about how to increase contributors to any given project. But the first thing to do is an assessment of where you’re at. One way to do that is visualization. ...

My Sublime Text workflow/setup

Sublime Text is pretty great. Let’s start at the beginning. Why would my primary editing tool not be vim? My background is as a biologist, spending way too many years in grad school. My first programming language was R back in 2006; my first text editor about the same year was Notepad++; my first interaction with the cli was probably a year later or so (but that was on Windows). After using Notepad++ for a few years, I stumbled upon Sublime Text via advice from a friend. I used it for a few years without paying (which you can still do), and after that realized it was worth paying for. They now have an easy to use Discourse forum too. ...

Playing with Ruby Patterns in R

I was returning to a long-term project I’ve been working on - a package for caching HTTP requests in R called vcr, a port of the Ruby gem vcr - and doing that thing you do when you are porting a library from one language to another. I stumbled upon some methods/functions I wasn’t familiar with. For example, take_while I had never seen before. It iterates over an array, returning elements from the start of the array for as long as they evaluate to true (for those new to Ruby, they use true instead of TRUE as we do in R) when passed through the function given, and stopping at the first one that doesn’t. R has lists and vectors - R’s lists are the most similar to Ruby arrays because both can have mixed objects in them (e.g., a string and an integer) while still retaining those objects as is. ...
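As a rough illustration of the take_while behavior described above, here is a minimal sketch in Python rather than Ruby or R. It leans on the standard library's itertools.takewhile; the helper name take_while and the sample values are made up for this example and are not taken from the vcr package. It also contrasts the result with a plain filter, which keeps every matching element instead of stopping at the first failure.

```python
from itertools import takewhile


def take_while(items, predicate):
    """Return the leading items for which predicate is true.

    Stops at the first item that fails the predicate, even if
    later items would pass it again.
    """
    return list(takewhile(predicate, items))


# Hypothetical sample data, chosen so the two approaches differ.
values = [1, 2, 3, 10, 2, 1]

# take_while stops as soon as it reaches 10.
leading_small = take_while(values, lambda x: x < 5)

# A plain filter keeps every element below 5, wherever it appears.
all_small = [x for x in values if x < 5]

print(leading_small)
print(all_small)
```

The difference only shows up when a passing element follows a failing one, which is why the sample list ends with small values again.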

habanero update: Crossref data from Python

I wrote about Crossref clients nearly two years ago on this blog: Crossref programmatic clients. Since it’s been a while, it seems worth talking again about the many ways to work programmatically with Crossref data - and focus in on the Python client habanero since it has some recent updates. The 3 clients work with the main Crossref API, which lets you do things like search for works by title, author, etc. (e.g., books, articles), search for publishing members, for funders, for journals, for DOI prefixes, and for licenses. It’s a powerful API with basically no rate limits, so you can work through lots of data quickly. ...

cranchecks: an API for CRAN check results

If you maintain an R package, or even use R packages, you may have looked at CRAN check results. These are essentially the results of running R CMD check on a package. They do these for each package for each of a few different operating systems (debian, fedora, solaris, windows, osx) and different R versions (devel, release and patched).
src: https://github.com/ropensci/cchecksapi
base api url: https://cranchecks.info
CRAN maintainers look at these, and eventually will email maintainers if checks are bad enough. ...

hoardr: simple file caching

hoardr is a client for caching files and managing those files. You can definitely achieve the same tasks without a separate package, and there are a number of packages for caching various objects in R already. However, I didn’t think there was a tool that did everything I needed. The use cases I typically need hoardr for are when dealing with large files, either text (e.g., csv) or binary (e.g., shp) files that would be nice to not make the user of packages I maintain download again if they already have the file. This makes life easier for the server that’s serving the files and makes work faster for the user of my package. ...
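The general idea of "don't download again if the file is already there" can be sketched in a few lines. This is a generic illustration in Python, not hoardr's API: the function cached_file, the fake_download stand-in, and the file name plants.csv are all hypothetical, and the "download" is simulated so the example runs without a network connection.

```python
import tempfile
from pathlib import Path


def cached_file(cache_dir, filename, fetch):
    """Return the path to filename inside cache_dir.

    fetch is only called (and its bytes written to disk) when the
    file is not already present in the cache directory.
    """
    path = Path(cache_dir) / filename
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch())
    return path


# Stand-in for an expensive download; records how often it runs.
calls = []


def fake_download():
    calls.append(1)
    return b"id,name\n1,Quercus lobata\n"


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_file(cache_dir, "plants.csv", fake_download)
    second = cached_file(cache_dir, "plants.csv", fake_download)
    # Both calls return the same path, but the download ran only once.
    print(first == second, len(calls))
```

A real caching tool would also need to decide where the cache lives and how to list or delete cached files; this sketch leaves all of that out.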

CascadiaRConf

Save the date for CascadiaRConf! Website: cascadiarconf.com Twitter: @cascadiarconf There’s not a lot of info available yet - but so far:
When: 3 June, 2017
Where: OHSU Collaborative Life Science Building (more details soon on what rooms, etc.)
Agenda: No details yet - but likely to be a series of workshops as well as single track set of talks. We’ll be accepting talk submissions soonish.
Tickets: We aren’t out to make money - tickets will be cheap and probably free for students. ...

USDA plants database API in R

The USDA maintains a database of plant information, some of it trait data, some of it life history. Check it out at https://plants.usda.gov/java/ They’ve been talking about releasing an API for a long time, but have not done so. Thus, since at least some version of their data is on the public web, I’ve created a RESTful API for the data:
source code: https://github.com/sckott/usdaplantsapi/
base URL: https://plantsdb.xyz
Check out the API, and open issues for bugs/feature requests in the github repo. ...