taxize workflows

A missed chat on the rOpenSci website the other day asked: “Hi there, I am trying to use the taxize package and have a .csv file of species names to run through taxize, updating them. What would be the code I would need to run to achieve this?” One way to answer this is to walk through the basic approach: import the data, do stuff to the data, then recombine the data. There are many ways to do this, but I’ll go over a few of them. ...
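As a rough sketch of that import, process, recombine pattern, here is a base-R version. It deliberately does not call taxize: update_name() is a hypothetical stand-in (it only tidies whitespace and capitalization) for whichever taxize lookup you actually use, and the inline CSV stands in for your own species file.

```r
# Hypothetical stand-in for a real taxize lookup; it only normalizes
# whitespace and capitalization so the example runs offline.
update_name <- function(x) {
  x <- tolower(trimws(x))
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

# 1. Import: in practice read.csv("species.csv"); inline text keeps this self-contained
species <- read.csv(text = "id,name
1,poa annua
2, Helianthus annuus
3,ABIES MAGNIFICA", stringsAsFactors = FALSE)

# 2. Do stuff: run each unique name through the lookup only once
lookup <- data.frame(name = unique(species$name), stringsAsFactors = FALSE)
lookup$updated_name <- vapply(lookup$name, update_name, character(1),
                              USE.NAMES = FALSE)

# 3. Recombine: join the results back onto the original rows, restore row order
result <- merge(species, lookup, by = "name", sort = FALSE)
result <- result[order(result$id), c("id", "name", "updated_name")]
print(result)
```

Looking up each unique name once and merging back keeps the number of web requests down when the same species appears on many rows.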

December 2, 2014 · 5 min · Scott Chamberlain

1000 commits to taxize

Just today we’ve hit 1000 commits on taxize! taxize is an R client to search across lots of taxonomic databases on the web. In honor of the 1000-commit milestone, here are some stats on the project. Before that, a note that lots of people have contributed to taxize. It’s a big group effort: Eduard Szöcs, Zachary Foster, Carl Boettiger, Karthik Ram, Jari Oksanen, Francis Michonneau, Oliver Keyes, David LeBauer, Ben Marwick, and Anirvan Chatterjee. In addition, we’ve had lots of feedback from users, including feature requests and bug reports, making taxize a lot better. ...

November 28, 2014 · 3 min · Scott Chamberlain

Intro to alpha ckanr - R client for CKAN RESTful API

Recently I needed to create a client for scraping museum metadata to help out some folks who use that kind of data. It’s called musemeta. One of the data sources in that package uses CKAN, the open source data portal software, so we can interact with the CKAN API to get data. Because CKAN is open source, many groups run the same API infrastructure, so I thought: why not have a general-purpose R client for it? There are already clients for Python, PHP, Ruby, etc. ...
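CKAN exposes its API as "action" endpoints under /api/3/action/, which return JSON with success and result fields. A small base-R helper that builds those URLs is sketched below. ckan_action_url() is hypothetical (not a ckanr function), and demo.ckan.org is used only as an example base URL.

```r
# Hypothetical helper (not part of ckanr): build a CKAN action API URL
# of the form <base>/api/3/action/<action>?key=value&...
ckan_action_url <- function(base_url, action, params = list()) {
  url <- paste0(sub("/+$", "", base_url), "/api/3/action/", action)
  if (length(params) > 0) {
    # Percent-encode each value, including reserved characters
    values <- vapply(as.character(params), URLencode, character(1),
                     reserved = TRUE, USE.NAMES = FALSE)
    url <- paste0(url, "?", paste0(names(params), "=", values, collapse = "&"))
  }
  url
}

u <- ckan_action_url("http://demo.ckan.org/", "package_search",
                     list(q = "museum", rows = 5))
u
# Fetching needs network access, e.g.:
# res <- jsonlite::fromJSON(u); res$success; res$result
```

Keeping URL construction separate from the HTTP call makes the request logic easy to check offline.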

November 26, 2014 · 8 min · Scott Chamberlain

Fun with the GitHub API

Recently I’ve had fun playing with the GitHub API, and here are some notes to self about this fun having.

Setup

Get/load packages (RCurl is also needed, for the base64 decoding in parse_file()):

    install.packages(c("devtools", "jsonlite", "httr", "yaml", "RCurl"))
    library("devtools")
    library("httr")
    library("yaml")

Define a vector of package names:

    pkgs <- c("alm", "bmc", "bold", "clifro", "ecoengine", "elastic",
              "fulltext", "geonames", "gistr", "RNeXML", "rnoaa", "rnpn",
              "traits", "rplos", "rsnps", "rWBclimate", "solr", "spocc",
              "taxize", "togeojson", "treeBASE")
    pkgs <- sort(pkgs)

Define functions:

    # Reuse a cached OAuth token if there is one; otherwise get one and cache it
    github_auth <- function(appname = getOption("gh_appname"),
                            key = getOption("gh_id"),
                            secret = getOption("gh_secret")) {
      if (is.null(getOption("gh_token"))) {
        myapp <- oauth_app(appname, key, secret)
        token <- oauth2.0_token(oauth_endpoints("github"), myapp)
        options(gh_token = token)
      } else {
        token <- getOption("gh_token")
      }
      token
    }

    make_url <- function(x, y, z) {
      sprintf("https://api.github.com/repos/%s/%s/%s", x, y, z)
    }

    # Check the HTTP status and content type, then parse the JSON body
    process_result <- function(x) {
      stop_for_status(x)
      if (x$headers$`content-type` != "application/json; charset=utf-8")
        stop("content type mismatch")
      tmp <- content(x, as = "text")
      jsonlite::fromJSON(tmp, flatten = TRUE)
    }

    # File contents come back base64-encoded, split across lines;
    # decode them and strip all whitespace from each resulting line
    parse_file <- function(x) {
      decoded <- vapply(strsplit(x, "\n")[[1]], RCurl::base64Decode,
                        character(1), USE.NAMES = FALSE)
      tmp <- gsub("\n\\s+", "\n", paste(decoded, collapse = " "))
      lines <- readLines(textConnection(tmp))
      vapply(lines, gsub, character(1), pattern = "\\s", replacement = "",
             USE.NAMES = FALSE)
    }

    # Fetch a file from a repo; NA if the request does not return 200
    request <- function(owner = "ropensci", repo, file = "DESCRIPTION", ...) {
      req <- GET(make_url(owner, repo, paste0("contents/", file)),
                 config(token = github_auth()), ...)
      if (req$status_code != 200) {
        NA
      } else {
        cts <- process_result(req)$content
        parse_file(cts)
      }
    }

    has_term <- function(what, ...) any(grepl(what, request(...)))
    has_file <- function(what, ...) !all(is.na(request(file = what, ...)))

Do stuff

Does a package depend on a particular package? E.g., look for httr in the DESCRIPTION file (which is the default file name in request() above) ...
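A grepl() over the whole decoded file also matches httr inside a longer package name or in the Description text. A slightly stricter sketch parses the DESCRIPTION with base R's read.dcf() and looks only at the dependency fields. depends_on() is a hypothetical helper, and desc_text is made-up example content; it assumes you have the decoded DESCRIPTION text before parse_file() strips its whitespace, since read.dcf() needs the continuation-line indentation.

```r
# Hypothetical helper: does a DESCRIPTION list `pkg` in its dependency fields?
depends_on <- function(desc_text, pkg,
                       fields = c("Depends", "Imports", "LinkingTo")) {
  con <- textConnection(strsplit(desc_text, "\n", fixed = TRUE)[[1]])
  on.exit(close(con))
  dcf <- read.dcf(con)
  present <- intersect(fields, colnames(dcf))
  if (length(present) == 0) return(FALSE)
  deps <- unlist(strsplit(dcf[1, present], ","))
  # Drop version constraints such as "(>= 0.5)" and surrounding whitespace
  deps <- trimws(sub("\\(.*\\)", "", deps))
  pkg %in% deps
}

# Made-up DESCRIPTION content for illustration
desc_text <- "Package: examplepkg
Version: 0.1
Imports:
    httr (>= 0.5),
    jsonlite
Suggests: testthat"

depends_on(desc_text, "httr")      # TRUE
depends_on(desc_text, "testthat")  # FALSE: only listed in Suggests
```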

November 26, 2014 · 3 min · Scott Chamberlain

sofa - reboot

I’ve reworked sofa recently after someone reported a bug in the package. Since the last post on this package on 2013-06-21, there have been a bunch of changes:

- Removed the sofa_ prefix from all functions, as it wasn’t really necessary.
- Replaced rjson/RJSONIO with jsonlite for JSON I/O.
- New functions: revisions() to get the revision numbers for a document, and uuids() to get any number of UUIDs, e.g., if you want to use UUIDs as document IDs.
- Most functions that deal with documents are prefixed with doc_.
- Functions that deal with databases are prefixed with db_.
- Simplified all code, reducing duplication.
- All functions take cushion as the first parameter, for consistency’s sake.
- Changed the cushion() function so that you can only register one cushion per call, and the function now takes a parameter for each element: name (name of the cushion, whatever you want), user (user name, if applicable), pwd (password, if applicable), type (one of localhost, cloudant, or iriscouch), and port (if applicable).
- Changed the package license from CC0 to MIT.

There’s still more to do, but I’m pretty happy with the recent changes, and I hope at least some find the package useful. Also, I would love people to try it out, as all bugs are shallow and all that… ...
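uuids() fetches UUIDs from the CouchDB server (presumably via CouchDB’s /_uuids endpoint). If you would rather mint document IDs locally, without a round trip, a base-R sketch of random version-4 UUIDs might look like this. uuid_v4() is hypothetical and not part of sofa; it draws bytes with R’s ordinary RNG via sample(), so it is not suitable where IDs must be unpredictable.

```r
# Hypothetical helper, not part of sofa: generate n random (version 4) UUIDs.
uuid_v4 <- function(n = 1) {
  vapply(seq_len(n), function(i) {
    bytes <- sample(0:255, 16, replace = TRUE)
    bytes[7] <- bitwOr(bitwAnd(bytes[7], 0x0f), 0x40)  # version nibble = 4
    bytes[9] <- bitwOr(bitwAnd(bytes[9], 0x3f), 0x80)  # RFC 4122 variant bits
    h <- sprintf("%02x", bytes)
    # Group as 8-4-4-4-12 hex digits
    paste0(paste(h[1:4], collapse = ""), "-",
           paste(h[5:6], collapse = ""), "-",
           paste(h[7:8], collapse = ""), "-",
           paste(h[9:10], collapse = ""), "-",
           paste(h[11:16], collapse = ""))
  }, character(1))
}

uuid_v4(3)
```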

November 18, 2014 · 5 min · Scott Chamberlain

rsunlight - R client for Sunlight Labs APIs

My last blog post on this package was so long ago that the package wrapped both New York Times APIs and Sunlight Labs APIs and was called govdat. I split that package into rsunlight for Sunlight Labs APIs and rtimes for some New York Times APIs. rtimes is in development on GitHub. We’ve updated the package to include four sets of functions, one set for each of four Sunlight Labs APIs (with a separate prefix for each API): ...

August 11, 2014 · 6 min · Scott Chamberlain

analogsea - v0.1 notes

My last blog post introduced the R package I’m working on, analogsea, an R client for the Digital Ocean API. Things have changed a bit, including filling out more functions for all API endpoints and incorporating feedback from Hadley and Karthik. The package is at v0.1 now, so I thought I’d say a few things about how it works. Note that Digital Ocean’s v2 API is in beta now, so the current version of analogsea (v0.1) works with their v1 API. The v2 branch of analogsea is being developed for their v2 API. ...

June 18, 2014 · 5 min · Scott Chamberlain

analogsea - an R client for the Digital Ocean API

I think this package name is my best yet. Maybe it doesn’t make sense though? At least it did at the time… Anyway, the main motivation for this package was to be able to automate spinning up Linux boxes to do cloud R/RStudio work. Of course if you are a command line native this is all easy for you, but if you are afraid of the command line and/or just don’t want to deal with it, this tool will hopefully help. ...

May 28, 2014 · 5 min · Scott Chamberlain

cowsay - ascii messages and warnings for R

The history

Cowsay is a terminal program that generates ascii pictures of a cow saying what you tell the cow to say in a bubble. See the Wikipedia page for more information: https://en.wikipedia.org/wiki/Cowsay

Install cowsay to use in your terminal (on OSX):

    brew update
    brew install cowsay

Type cowsay hello world!, and you get:

     ______________
    < hello world! >
     --------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

Optionally, you can install fortune to get pseudorandom messages from a database of quotations. On OSX do brew install fortune, then you can pipe a fortune quote to cowsay: ...
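To show how little is involved in the one-line case, here is a minimal base-R re-creation of that bubble and cow. moo() is hypothetical and is not how the cowsay R package (whose entry point is say()) is implemented; it handles a single line of text only, with no wrapping.

```r
# Hypothetical one-line re-creation of the terminal cowsay output.
moo <- function(msg) {
  n <- nchar(msg)
  cow <- c(
    "        \\   ^__^",
    "         \\  (oo)\\_______",
    "            (__)\\       )\\/\\",
    "                ||----w |",
    "                ||     ||"
  )
  # Bubble borders are two characters wider than the message
  c(paste0(" ", strrep("_", n + 2)),
    paste0("< ", msg, " >"),
    paste0(" ", strrep("-", n + 2)),
    cow)
}

cat(moo("hello world!"), sep = "\n")
```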

February 20, 2014 · 4 min · Scott Chamberlain

cites - citation stuff from the command line

I’ve been learning Ruby, and decided to scratch an itch: getting citations for papers to put in a BibTeX file or my Zotero library. This usually requires two parts: 1) searching for an article with keywords, and then 2) getting the citation once the paper is found. Since I am lazy, I would prefer to do this from the command line instead of opening up a browser. Thus => cites. (Note: I’m sure someone has created something better; the point is I’m learnin’ me some Ruby.) cites does two things: ...

January 18, 2014 · 5 min · Scott Chamberlain