the new way - httsnap

Inspired by httpie, a Python command line client that works as a sort of drop-in replacement for curl, I am playing around with something similar-ish in R, at least in spirit. I started a little R pkg called httsnap with the following ideas:

- The web is increasingly a JSON world, so set content-type and accept headers to application/json by default
- The workflow follows logically, or at least should, from "hey, I got this URL", to "I need to add some options", to "execute request"
- Whenever possible, transform output to data.frames, facilitating downstream manipulation via dplyr, etc.
- Do GET requests by default. Specify a different type if you don’t want GET. Some functionality does GET by default, though in some cases you need to specify GET
- You can use non-standard evaluation to easily pass in query parameters without worrying about &’s, URL escaping, etc. (see Query())
- Same for body params (see Body())

Install

Install and load httsnap ...
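To illustrate the query-parameter idea above, here is a minimal base R sketch of the escaping and joining that a helper has to take care of when building a query string. Note that build_query() is a hypothetical function made up for this illustration and is not part of httsnap, and example.org is a placeholder URL.

```r
# Hypothetical helper (NOT part of httsnap): turn named arguments into an
# escaped query string, so the caller never types "&" or "%20" by hand.
build_query <- function(...) {
  params <- list(...)
  # Percent-encode keys and values, including reserved characters such as "&"
  keys <- vapply(names(params), URLencode, character(1), reserved = TRUE)
  vals <- vapply(
    params,
    function(v) URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  # Join each key=value pair with "&"
  paste(paste(keys, vals, sep = "="), collapse = "&")
}

# Placeholder URL, for illustration only
url <- paste0("http://example.org/search", "?", build_query(q = "solr & r", rows = 10))
print(url)
```

The point of the sketch is only that the caller passes plain names and values, and the escaping happens in one place.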

April 29, 2015 · 4 min · Scott Chamberlain

Faster solr with csv

With the help of user input, I’ve tweaked solr just a bit to make things faster using default settings. I imagine the main interface for people using the solr R client is solr_search(), which used to have wt=json by default. Changing this to wt=csv gives better performance. And it sorta makes sense to use csv: the point of using an R client is probably to get data into a data.frame eventually, so csv (already in tabular format) is a natural choice if it’s faster too. ...
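A rough sketch of why csv is a natural fit, using only base R: a csv body reads straight into a data.frame, while parsed JSON documents arrive as nested lists that have to be bound together row by row. The response bodies and field names below are invented for illustration and are not actual Solr output.

```r
# Made-up csv response body (assumption: two fields, id and title)
csv_body <- "id,title\n1,First doc\n2,Second doc\n"

# csv is already tabular, so one call gives a data.frame
df_csv <- read.csv(text = csv_body, stringsAsFactors = FALSE)

# The same records as they might look after JSON parsing: a list of lists
docs <- list(
  list(id = 1L, title = "First doc"),
  list(id = 2L, title = "Second doc")
)

# Each document has to be converted and the rows bound together
df_json <- do.call(
  rbind,
  lapply(docs, function(d) as.data.frame(d, stringsAsFactors = FALSE))
)

print(df_csv)
print(df_json)
```

Both routes end with the same two-row table; the csv route simply skips the per-document conversion step.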

March 20, 2015 · 3 min · Scott Chamberlain

PUT dataframes on your couch

It would be nice to easily push each row or column of a data.frame into CouchDB instead of having to convert them to JSON yourself and then push them into Couch. I recently added the ability to push data.frames into Couch using the normal PUT /{db} method, and added support for the Couch bulk API.

Install

install.packages("devtools")
devtools::install_github("sckott/sofa")
library("sofa")

PUT /db

You can write directly from a data.frame, either by rows or columns. First, rows: ...

March 12, 2015 · 3 min · Scott Chamberlain

csl - an R client for Citation Style Language data

CSL (Citation Style Language) is used quite widely now to specify citations in a standard fashion. csl is an R client for exploring CSL styles, and is inspired by the Ruby gem csl. For example, CSL data is returned by the PLOS Lagotto article level metric API (follow https://alm.plos.org/api/v5/articles?ids=10.1371%252Fjournal.pone.0025110&info=detail&source_id=crossref). Let me know if you have any feedback at the repo https://github.com/ropensci/csl

Install

install.packages("devtools")
devtools::install_github("ropensci/csl")
library("csl")

Load CSL style from a URL

You can load CSL styles from either a URL or a local file on your machine. First, from a URL. In this case from the Zotero style repository, for the American Journal of Political Science. ...

March 11, 2015 · 3 min · Scott Chamberlain

Elasticsearch backup and restore

setup backup

curl -XPUT 'http://localhost:9200/_snapshot/my_backup/' -d '{
  "type": "fs",
  "settings": {
    "location": "/Users/sacmac/esbackups/my_backup",
    "compress": true
  }
}'

create backup

http PUT "localhost:9200/_snapshot/my_backup/snapshot_2?wait_for_completion=true"

get info on snapshot

http "localhost:9200/_snapshot/my_backup/snapshot_2"

restore

curl -XPOST "localhost:9200/_snapshot/my_backup/snapshot_2/_restore"

partial restore, including various options that can be used

curl -XPOST "localhost:9200/_snapshot/my_backup/snapshot_2/_restore" -d '{
  "indices": "index_1,index_2",
  "ignore_unavailable": "true",
  "include_global_state": false,
  "rename_pattern": "index_(.+)",
  "rename_replacement": "restored_index_$1"
}'

February 26, 2015 · 1 min · Scott Chamberlain

note to self, secure elasticsearch

Recently I spun up a box on a cloud hosting provider, planning to make tens of thousands of queries to an Elasticsearch instance on the same box. I could have done this on my own machine, but didn’t want to take up compute resources. I installed R and Elasticsearch on the box, then went about doing my thang. A day later, when things were still running, the hosting provider sent me a message saying that my box had apparently been serving up a DDoS attack. ...

February 26, 2015 · 2 min · Scott Chamberlain

httping - ping and time http requests

I’ve been working on a little thing called httping - a small R package that started as a pkg to ping URLs and time requests. It’s a port of the Ruby gem httping. The httr package is in Depends in this package, so its functions can be called directly, without having to load httr explicitly yourself. In addition to timing requests, I’ve been tinkering with how to make HTTP requests, with curl option functions accepting and returning the same object so they can be chained together, and then that object passed to an HTTP verb like GET. Maybe this is a bad idea, but maybe not. ...

January 30, 2015 · 5 min · Scott Chamberlain

elastic - Elasticsearch from R

We’ve (ropensci) been working on an R client for interacting with Elasticsearch for a while now; the first commit was in November 2013. Elasticsearch is a document database built on the JVM. elastic interacts with the Elasticsearch HTTP API, and includes functions for setting connection details to Elasticsearch instances, loading bulk data, and searching for documents with both HTTP query variables and JSON based body requests. In addition, elastic provides functions for interacting with APIs for indices, documents, nodes, clusters, an interface to the cat API, and more. ...

January 29, 2015 · 10 min · Scott Chamberlain

binomen - taxonomic classes and parsing

I maintain, along with other awesome people, the taxize R package - a taxonomic toolbelt for R, for interacting with taxonomic data sources on the web. Taxonomy data is not standardized, but there are a lot of common elements, a finite list of taxonomic ranks, and a finite number of major taxonomic data sets. Thus, I’ve been interested in attempting to define a pseudo standard for expressing taxonomic data in R. The conversation started a while back in a GitHub issue, and hasn’t moved very far. ...

January 19, 2015 · 3 min · Scott Chamberlain

discgolf - Discourse from R

Discourse is a great discussion forum application. It’s another thing from Jeff Atwood, the co-founder of Stackoverflow/Stackexchange. The installation is especially easy with their dockerized installation setup on DigitalOcean (instructions here: https://www.digitalocean.com/community/tutorials/how-to-install-discourse-on-ubuntu-14-04). In rOpenSci, we’ve been using a Google Groups mailing list, which is sufficient I guess, but doesn’t support Markdown, and we all know Google can kill products any day, so it makes sense to use something over which we have more control. We’ve set up our own Discourse installation to have rOpenSci discussions - find it at discuss.ropensci.org. Check it out if you want to discuss anything rOpenSci related, or general open science, open source software, etc. You can log in with email, Mozilla Persona, Twitter, or GitHub. ...

January 15, 2015 · 4 min · Scott Chamberlain