Publications by author country

I just missed another chat on the rOpenSci website: I want to know the number of publications by people from a certain country, but I don’t know how to achieve this… Fun! Let’s do that. It’s a bit complicated because there is no field for the geography of the authors. But there are affiliation fields, from which we can collect the data we need.

Installation

You’ll need the GitHub version for the country names data, or just use the CRAN version and get country names elsewhere. ...
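The idea of pulling countries out of affiliation fields can be sketched in base R. This is only an illustration under assumptions: the affiliation strings and the short country table below are made up, and `count_by_country()` is a hypothetical helper, not a function from any package mentioned in the post.

```r
# Hypothetical affiliation strings, one per publication (made up for illustration)
affiliations <- c(
  "Department of Biology, Simon Fraser University, Burnaby, Canada",
  "School of Life Sciences, University of Sussex, Brighton, United Kingdom",
  "Institute of Zoology, Chinese Academy of Sciences, Beijing, China",
  "Department of Botany, University of British Columbia, Vancouver, Canada"
)

# A tiny stand-in for a real country-name table
countries <- c("Canada", "China", "United Kingdom", "United States")

# For each country, count the affiliations that mention it
# (case-insensitive, fixed-string match; each affiliation counts at most once)
count_by_country <- function(affil, country_names) {
  vapply(country_names, function(cn) {
    sum(grepl(tolower(cn), tolower(affil), fixed = TRUE))
  }, integer(1))
}

counts <- count_by_country(affiliations, countries)
counts
```

Plain substring matching is crude: it will miss abbreviations such as "UK" and can over-match names that are contained in other words, so a real analysis would likely need a richer country table and some manual checking.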

December 3, 2014 · 4 min · Scott Chamberlain

http codes

Recently I noticed a little Python library called httpcode that does a simple thing: it gives information on HTTP codes in the CLI. I thought this could be useful for R, so I made an R version.

Installation

    devtools::install_github("sckott/httpcode")
    library("httpcode")

Search by http code

    http_code(100)
    #> <Status code: 100>
    #> Message: Continue
    #> Explanation: Request received, please continue

    http_code(400)
    #> <Status code: 400>
    #> Message: Bad Request
    #> Explanation: Bad request syntax or unsupported method

    http_code(503)
    #> <Status code: 503>
    #> Message: Service Unavailable
    #> Explanation: The server cannot process the request due to a high load

    http_code(999)
    #> Error: No description found for code: 999

Fuzzy code search

    http_code('1xx')
    #> [[1]]
    #> <Status code: 100>
    #> Message: Continue
    #> Explanation: Request received, please continue
    #>
    #> [[2]]
    #> <Status code: 101>
    #> Message: Switching Protocols
    #> Explanation: Switching to new protocol; obey Upgrade header
    #>
    #> [[3]]
    #> <Status code: 102>
    #> Message: Processing
    #> Explanation: WebDAV; RFC 2518

    http_code('3xx')
    #> [[1]]
    #> <Status code: 300>
    #> Message: Multiple Choices
    #> Explanation: Object has several resources -- see URI list
    #>
    #> [[2]]
    #> <Status code: 301>
    #> Message: Moved Permanently
    #> Explanation: Object moved permanently -- see URI list
    #>
    #> [[3]]
    #> <Status code: 302>
    #> Message: Found
    #> Explanation: Object moved temporarily -- see URI list
    #>
    #> [[4]]
    #> <Status code: 303>
    #> Message: See Other
    #> Explanation: Object moved -- see Method and URL list
    #>
    #> [[5]]
    #> <Status code: 304>
    #> Message: Not Modified
    #> Explanation: Document has not changed since given time
    #>
    #> [[6]]
    #> <Status code: 305>
    #> Message: Use Proxy
    #> Explanation: You must use proxy specified in Location to access this resource.
    #>
    #> [[7]]
    #> <Status code: 306>
    #> Message: Switch Proxy
    #> Explanation: Subsequent requests should use the specified proxy
    #>
    #> [[8]]
    #> <Status code: 307>
    #> Message: Temporary Redirect
    #> Explanation: Object moved temporarily -- see URI list
    #>
    #> [[9]]
    #> <Status code: 308>
    #> Message: Permanent Redirect
    #> Explanation: Object moved permanently

    http_code('30[12]')
    #> [[1]]
    #> <Status code: 301>
    #> Message: Moved Permanently
    #> Explanation: Object moved permanently -- see URI list
    #>
    #> [[2]]
    #> <Status code: 302>
    #> Message: Found
    #> Explanation: Object moved temporarily -- see URI list

    http_code('30[34]')
    #> [[1]]
    #> <Status code: 303>
    #> Message: See Other
    #> Explanation: Object moved -- see Method and URL list
    #>
    #> [[2]]
    #> <Status code: 304>
    #> Message: Not Modified
    #> Explanation: Document has not changed since given time

Search by text message

    http_search("request")
    #> [[1]]
    #> <Status code: 100>
    #> Message: Continue
    #> Explanation: Request received, please continue
    #>
    #> [[2]]
    #> <Status code: 200>
    #> Message: OK
    #> Explanation: Request fulfilled, document follows
    #>
    #> [[3]]
    #> <Status code: 202>
    #> Message: Accepted
    #> Explanation: Request accepted, processing continues off-line
    #>
    #> [[4]]
    #> <Status code: 203>
    #> Message: Non-Authoritative Information
    #> Explanation: Request fulfilled from cache
    #>
    #> [[5]]
    #> <Status code: 204>
    #> Message: No Content
    #> Explanation: Request fulfilled, nothing follows
    #>
    #> [[6]]
    #> <Status code: 306>
    #> Message: Switch Proxy
    #> Explanation: Subsequent requests should use the specified proxy
    #>
    #> [[7]]
    #> <Status code: 400>
    #> Message: Bad Request
    #> Explanation: Bad request syntax or unsupported method
    #>
    #> [[8]]
    #> <Status code: 403>
    #> Message: Forbidden
    #> Explanation: Request forbidden -- authorization will not help
    #>
    #> [[9]]
    #> <Status code: 408>
    #> Message: Request Timeout
    #> Explanation: Request timed out; try again later.
    #>
    #> [[10]]
    #> <Status code: 409>
    #> Message: Conflict
    #> Explanation: Request conflict.
    #>
    #> [[11]]
    #> <Status code: 413>
    #> Message: Request Entity Too Large
    #> Explanation: Entity is too large.
    #>
    #> [[12]]
    #> <Status code: 414>
    #> Message: Request-URI Too Long
    #> Explanation: URI is too long.
    #>
    #> [[13]]
    #> <Status code: 416>
    #> Message: Requested Range Not Satisfiable
    #> Explanation: Cannot satisfy request range.
    #>
    #> [[14]]
    #> <Status code: 503>
    #> Message: Service Unavailable
    #> Explanation: The server cannot process the request due to a high load
    #>
    #> [[15]]
    #> <Status code: 505>
    #> Message: HTTP Version Not Supported
    #> Explanation: Cannot fulfill request.

    http_search("forbidden")
    #> [[1]]
    #> <Status code: 403>
    #> Message: Forbidden
    #> Explanation: Request forbidden -- authorization will not help

    http_search("too")
    #> [[1]]
    #> <Status code: 413>
    #> Message: Request Entity Too Large
    #> Explanation: Entity is too large.
    #>
    #> [[2]]
    #> <Status code: 414>
    #> Message: Request-URI Too Long
    #> Explanation: URI is too long.

    http_search("birds")
    #> Error: No status code found for search: : birds
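Patterns like '1xx' and '30[12]' behave much like regular expressions over the set of known codes. Here is a minimal base R sketch of how such matching could work. `fuzzy_match()` and the small `codes` vector are hypothetical and are not taken from the httpcode source, so the package may well do this differently.

```r
# A small, made-up subset of status codes to match against
codes <- c("100", "101", "102", "200", "301", "302", "404")

# Hypothetical helper: treat each "x" as "any digit", then anchor the
# pattern so that it has to match the whole code
fuzzy_match <- function(pattern, codes) {
  regex <- paste0("^", gsub("x", "[0-9]", pattern, fixed = TRUE), "$")
  codes[grepl(regex, codes)]
}

one_hundreds <- fuzzy_match("1xx", codes)
redirects <- fuzzy_match("30[12]", codes)
one_hundreds
redirects
```

Anchoring with `^` and `$` keeps a pattern such as "30[12]" from matching longer strings that merely contain a valid code.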

December 2, 2014 · 4 min · Scott Chamberlain

taxize workflows

A missed chat on the rOpenSci website the other day asked: Hi there, i am trying to use the taxize package and have a .csv file of species names to run through taxize updating them. What would be the code i would need to run to achieve this? One way to answer this is to talk about the basic approach to importing data, doing stuff to the data, then recombining data. There are many ways to do this, but I’ll go over a few of them. ...
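The import, do stuff, recombine pattern can be sketched in base R as follows. To keep the example runnable offline, `resolve_names()` is a hypothetical stand-in with a hard-coded lookup. In a real workflow a taxize function such as `gnr_resolve()` could take its place, with its output reshaped to fit. The species names and heights are made up.

```r
# Write a small example CSV so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(species = c("Helianthus annus", "Pinus contorta", "Poa anua"),
             height_cm = c(150, 2000, 20)),
  csv_path, row.names = FALSE
)

# 1. Import the data
dat <- read.csv(csv_path, stringsAsFactors = FALSE)

# 2. Do stuff: hypothetical stand-in for a name-resolution service.
#    Returns one row per submitted name with the corrected name alongside.
resolve_names <- function(x) {
  fixes <- c("Helianthus annus" = "Helianthus annuus", "Poa anua" = "Poa annua")
  matched <- ifelse(x %in% names(fixes), fixes[x], x)
  data.frame(species = x, matched_name = unname(matched),
             stringsAsFactors = FALSE)
}
resolved <- resolve_names(unique(dat$species))

# 3. Recombine: join the corrected names back onto the original rows
out <- merge(dat, resolved, by = "species", all.x = TRUE)
out
```

Resolving only the unique names and merging afterwards keeps the number of lookups down when the same species appears on many rows.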

December 2, 2014 · 5 min · Scott Chamberlain

1000 commits to taxize

Just today we’ve hit 1000 commits on taxize! taxize is an R client to search across lots of taxonomic databases on the web. In honor of the 1000 commit milestone, here are some stats on the project. Before that: lots of people have contributed to taxize, and it’s a big group effort. Contributors: Eduard Szöcs, Zachary Foster, Carl Boettiger, Karthik Ram, Jari Oksanen, Francis Michonneau, Oliver Keyes, David LeBauer, Ben Marwick, and Anirvan Chatterjee. In addition, we’ve had lots of feedback from users, including feature requests and bug reports, making taxize a lot better. ...

November 28, 2014 · 3 min · Scott Chamberlain

Intro to alpha ckanr - R client for CKAN RESTful API

Recently I needed to create a client for scraping museum metadata to help out some folks that use that kind of data. It’s called musemeta. One of the data sources in that package uses the open source data portal software CKAN, and so we can interact with the CKAN API to get data. Since many groups can use the CKAN API and related infrastructure because it’s open source, I thought why not have a general purpose R client for this, since there are other clients for Python, PHP, Ruby, etc. ...
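For a sense of what such a client wraps, here is a rough base R sketch that builds a request URL for CKAN's Action API, whose endpoints live under `/api/3/action/`. `ckan_action_url()` is a hypothetical helper and is not part of ckanr, and the base URL is a placeholder rather than a real CKAN instance.

```r
# Hypothetical helper: build a URL for a CKAN Action API call
ckan_action_url <- function(base_url, action, query = list()) {
  # Drop any trailing slash before appending the API path
  url <- sprintf("%s/api/3/action/%s", sub("/+$", "", base_url), action)
  if (length(query) > 0) {
    # Percent-encode each value and join the pairs as key=value&key=value
    values <- vapply(query, function(v) {
      utils::URLencode(as.character(v), reserved = TRUE)
    }, character(1))
    url <- paste0(url, "?", paste(paste0(names(query), "=", values), collapse = "&"))
  }
  url
}

# Placeholder host; swap in a real CKAN instance to try it
url <- ckan_action_url("https://ckan.example.org/", "package_search",
                       list(q = "bird survey", rows = 5))
url

# With network access and jsonlite installed, the JSON response could be
# read like this (CKAN wraps the payload in a `result` element):
# res <- jsonlite::fromJSON(url)
# res$result
```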

November 26, 2014 · 8 min · Scott Chamberlain

Fun with the GitHub API

Recently I’ve had fun playing with the GitHub API, and here are some notes to self about this fun having.

Setup

Get/load packages

    # RCurl is included because parse_file() below calls RCurl::base64Decode
    install.packages(c('devtools','jsonlite','httr','yaml','RCurl'))
    library("devtools")
    library("httr")
    library("yaml")

Define a vector of package names

    pkgs <- c("alm", "bmc", "bold", "clifro", "ecoengine", "elastic", "fulltext",
              "geonames", "gistr", "RNeXML", "rnoaa", "rnpn", "traits", "rplos",
              "rsnps", "rWBclimate", "solr", "spocc", "taxize", "togeojson", "treeBASE")
    pkgs <- sort(pkgs)

Define functions

    # Get an OAuth token, caching it in options() after the first call
    github_auth <- function(appname = getOption("gh_appname"), key = getOption("gh_id"),
                            secret = getOption("gh_secret")) {
      if (is.null(getOption("gh_token"))) {
        myapp <- oauth_app(appname, key, secret)
        token <- oauth2.0_token(oauth_endpoints("github"), myapp)
        options(gh_token = token)
      } else {
        token <- getOption("gh_token")
      }
      return(token)
    }

    # Build a repos API URL from owner, repo, and the rest of the path
    make_url <- function(x, y, z) {
      sprintf("https://api.github.com/repos/%s/%s/%s", x, y, z)
    }

    # Check the response status and content type, then parse the JSON
    process_result <- function(x) {
      stop_for_status(x)
      if (!x$headers$`content-type` == "application/json; charset=utf-8")
        stop("content type mismatch")
      tmp <- content(x, as = "text")
      jsonlite::fromJSON(tmp, flatten = TRUE)
    }

    # Decode base64 file contents and return the lines with whitespace removed
    parse_file <- function(x) {
      tmp <- gsub("\n\\s+", "\n",
                  paste(vapply(strsplit(x, "\n")[[1]], RCurl::base64Decode,
                               character(1), USE.NAMES = FALSE), collapse = " "))
      lines <- readLines(textConnection(tmp))
      vapply(lines, gsub, character(1), pattern = "\\s", replacement = "", USE.NAMES = FALSE)
    }

    # Fetch a file from a repo; returns NA if the request does not succeed
    request <- function(owner = "ropensci", repo, file="DESCRIPTION", ...) {
      req <- GET(make_url(owner, repo, paste0("contents/", file)),
                 config = c(token = github_auth(), ...))
      if(req$status_code != 200) {
        NA
      } else {
        cts <- process_result(req)$content
        parse_file(cts)
      }
    }

    has_term <- function(what, ...) any(grepl(what, request(...)))
    has_file <- function(what, ...) if(all(is.na(request(file = what, ...)))) FALSE else TRUE

Do stuff

Does a package depend on a particular package? e.g., look for httr in the DESCRIPTION file (which is the default file name in request() above) ...

November 26, 2014 · 3 min · Scott Chamberlain

sofa - reboot

I’ve reworked sofa recently after someone reported a bug in the package. Since the last post on this package on 2013-06-21, there’s a bunch of changes:

- Removed the sofa_ prefix from all functions as it wasn’t really necessary.
- Replaced rjson/RJSONIO with jsonlite for JSON I/O.
- New functions: revisions() - to get the revision numbers for a document. uuids() - get any number of UUIDs - e.g., if you want to set document IDs with UUIDs.
- Most functions that deal with documents are prefixed with doc_
- Functions that deal with databases are prefixed with db_
- Simplified all code, reducing duplication.
- All functions take cushion as the first parameter, for consistency’s sake.
- Changed cushion() function so that you can only register one cushion with each function call, and the function takes parameters for each element now: name (name of the cushion, whatever you want), user (user name, if applicable), pwd (password, if applicable), type (one of localhost, cloudant, or iriscouch), and port (if applicable).
- Changed package license from CC0 to MIT.

There’s still more to do, but I’m pretty happy with the recent changes, and I hope at least some find the package useful. Also, would love people to try it out as all bugs are shallow and all that… ...

November 18, 2014 · 5 min · Scott Chamberlain

Conditionality meta-analysis data

The paper

One paper from my graduate work asked most generally ~ “How much does the variation in magnitudes and signs of species interaction outcomes vary?”. More specifically, we wanted to know if variation differed among species interaction classes (mutualism, competition, predation), and among various “gradients” (space, time, etc.). To answer this question, we used a meta-analysis approach (rather than e.g., a field experiment). We published the paper recently. p.s. I really really wish we would have put it in an open access journal… ...

October 6, 2014 · 4 min · Scott Chamberlain

rsunlight - R client for Sunlight Labs APIs

My last blog post on this package was so long ago that the package wrapped both New York Times APIs and Sunlight Labs APIs and was called govdat. I split that package up into rsunlight for Sunlight Labs APIs and rtimes for some New York Times APIs. rtimes is in development on GitHub. We’ve updated the package to include four sets of functions, one set for each of four Sunlight Labs APIs (with a separate prefix for each API): ...

August 11, 2014 · 6 min · Scott Chamberlain

analogsea - v0.1 notes

My last blog post introduced the R package I’m working on, analogsea, an R client for the Digital Ocean API. Things have changed a bit, including filling out more functions for all API endpoints, and incorporating feedback from Hadley and Karthik. The package is at v0.1 now, so I thought I’d say a few things about how it works. Note that Digital Ocean’s v2 API is in beta stage now, so the current version of analogsea at v0.1 works with their v1 API. The v2 branch of analogsea is being developed for their v2 API. ...

June 18, 2014 · 5 min · Scott Chamberlain