note to self, secure elasticsearch

Recently I spun up a box on a cloud hosting provider, planning to make tens of thousands of queries to an Elasticsearch instance on the same box. I could have done this on my own machine, but didn’t want to take up compute resources. I installed R and Elasticsearch on the box, then went about doing my thang. A day later, when things were still running, the hosting provider sent me a message saying that my box had apparently been serving up a DDoS attack. ...

February 26, 2015 · 2 min · Scott Chamberlain

httping - ping and time http requests

I’ve been working on a little thing called httping - a small R package that started as a pkg to ping URLs and time requests. It’s a port of the Ruby gem httping. The httr package is in Depends in this package, so its functions can be called directly, without having to load httr explicitly yourself. In addition to timing requests, I’ve been tinkering with how to make HTTP requests, with curl options accepting and returning the same object so they can be chained together, and then that object passed to an HTTP verb like GET. Maybe this is a bad idea, but maybe not. ...
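The chaining idea in this excerpt can be sketched roughly as follows. This is a Python illustration of the general pattern only, not httping's actual R API: `RequestConfig`, `with_option`, `timeout`, `verbose`, `describe_get`, and `timed` are all hypothetical names, and `describe_get` is a stand-in for a real HTTP verb so the sketch runs without network access.

```python
import time
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class RequestConfig:
    """Hypothetical container for a URL plus request options."""
    url: str
    options: dict = field(default_factory=dict)

    def with_option(self, name, value):
        # Return a new config of the same type, so calls can be chained.
        return replace(self, options={**self.options, name: value})

    def timeout(self, seconds):
        return self.with_option("timeout", seconds)

    def verbose(self, enabled=True):
        return self.with_option("verbose", enabled)


def describe_get(config):
    """Stand-in for an HTTP verb: reports what would be sent (assumption: no real request)."""
    return {"method": "GET", "url": config.url, **config.options}


def timed(verb, config):
    """Call a verb with a config and return its result plus elapsed seconds."""
    start = time.perf_counter()
    result = verb(config)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Options are chained first, then the finished object is handed to the verb.
request = RequestConfig("https://example.org").timeout(5).verbose()
summary, seconds = timed(describe_get, request)
```

Each option method returns a fresh object rather than mutating the old one, which is one way to make chained steps safe to reuse; whether httping takes that approach is not stated in the excerpt.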

January 30, 2015 · 5 min · Scott Chamberlain

elastic - Elasticsearch from R

We’ve (ropensci) been working on an R client for interacting with Elasticsearch for a while now; the first commit was in November 2013. Elasticsearch is a document database built on the JVM. elastic interacts with the Elasticsearch HTTP API, and includes functions for setting connection details to Elasticsearch instances, loading bulk data, and searching for documents with both HTTP query variables and JSON based body requests. In addition, elastic provides functions for interacting with APIs for indices, documents, nodes, clusters, an interface to the cat API, and more. ...

January 29, 2015 · 10 min · Scott Chamberlain

binomen - taxonomic classes and parsing

I maintain, along with other awesome people, the taxize R package - a taxonomic toolbelt for R, for interacting with taxonomic data sources on the web. Taxonomy data is not standardized, but there are a lot of common elements, and there is a finite list of taxonomic ranks, and finite number of major taxonomic data sets. Thus, I’ve been interested in attempting to define a pseudo standard for expressing taxonomic data in R. The conversation started a while back in a GitHub issue, and hasn’t moved very far. ...

January 19, 2015 · 3 min · Scott Chamberlain

discgolf - Discourse from R

Discourse is a great discussion forum application. It’s another thing from Jeff Atwood, the co-founder of Stackoverflow/Stackexchange. The installation is especially easy with their dockerized installation setup on DigitalOcean ([instructions here](https://www.digitalocean.com/community/tutorials/how-to-install-discourse-on-ubuntu-14-04)). In rOpenSci, we’ve been using a Google Groups mailing list, which is sufficient I guess, but doesn’t support Markdown, and we all know Google can kill products any day, so it makes sense to use something over which we have more control. We’ve set up our own Discourse installation to have rOpenSci discussions - find it at discuss.ropensci.org. Check it out if you want to discuss anything rOpenSci related, or general open science, open source software, etc. You can log in with email, Mozilla Persona, Twitter, or GitHub. ...

January 15, 2015 · 4 min · Scott Chamberlain

R I/O for geojson and topojson

At rOpenSci we’ve been working on an R package (geojsonio) to convert R data in various formats to geoJSON and topoJSON, and vice versa. We hope to do this one job very well, and handle all reasonable use cases. Functions in this package are organized first around what you’re working with or want to get, geojson or topojson, then convert to or read from various formats:

- geojson_list()/topojson_list() - convert to geojson/topojson as R list format
- geojson_json()/topojson_json() - convert to geojson/topojson as json
- geojson_read()/topojson_read() - read a geojson/topojson file from file path or URL
- geojson_write()/topojson_write() - write a geojson/topojson file locally

Each of the above functions has methods for various objects/classes, including numeric, data.frame, list, SpatialPolygons, SpatialPolygonsDataFrame, SpatialLines, SpatialLinesDataFrame, SpatialPoints, SpatialPointsDataFrame. ...
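To give a feel for the kind of conversion described here, the following is a minimal Python sketch that turns a list of records with latitude and longitude into a GeoJSON FeatureCollection, first as a nested structure and then as a JSON string. It is not geojsonio's implementation (that package is written in R); the function `points_to_feature_collection`, its `lat_key`/`lon_key` parameters, and the sample records are all assumptions made for illustration.

```python
import json


def points_to_feature_collection(rows, lat_key="lat", lon_key="lon"):
    """Convert dicts holding lat/lon values into a GeoJSON FeatureCollection (as a dict)."""
    features = []
    for row in rows:
        # Everything that is not a coordinate becomes a feature property.
        properties = {k: v for k, v in row.items() if k not in (lat_key, lon_key)}
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered longitude first, then latitude.
            "geometry": {"type": "Point", "coordinates": [row[lon_key], row[lat_key]]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample records, used only to exercise the function.
cities = [
    {"name": "Portland", "lat": 45.52, "lon": -122.68},
    {"name": "San Francisco", "lat": 37.77, "lon": -122.42},
]

as_list = points_to_feature_collection(cities)  # nested list/dict form
as_json = json.dumps(as_list)                   # JSON string form
```

The two results loosely mirror the split between a "list" form and a "json" form mentioned in the excerpt, though the R functions handle many more input classes than this sketch does.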

January 6, 2015 · 5 min · Scott Chamberlain

gistr - R client for GitHub gists

GitHub has this site https://gist.github.com/ in which we can share code, text, images, maps, plots, etc super easily, without having to open up a repo, etc. GitHub gists are a great way to throw up an example use case to show someone, or show code that’s throwing errors to a support person, etc. In addition, there’s API access, which means we can interact with Gists not just from their web interface, but from the command line, or any programming language. There are clients for Node.js, Ruby, Python, and on and on. But AFAIK there wasn’t one for R. Along with Ramnath and others, we’ve been working on an R client for gists. v0.1 is now on CRAN. Below is an overview. ...

January 5, 2015 · 7 min · Scott Chamberlain

pytaxize - low level ITIS functions

I’ve been working on a Python port of the R package taxize that I maintain. It’s still early days for this Python library, and I’d love to know what people think. For example, I’m giving back Pandas DataFrames from most functions. Does this make sense?

Installation

    sudo pip install git+git://github.com/sckott/pytaxize.git#egg=pytaxize

Or git clone the repo down, and

    python setup.py build && python setup.py install

Load library

    import pytaxize

ITIS ping

    pytaxize.itis_ping()

    'This is the ITIS Web Service, providing access to the data behind www.itis.gov. The database contains 665,266 scientific names (501,207 of them valid/accepted) and 122,735 common names.'

Get hierarchy down from tsn

    pytaxize.gethierarchydownfromtsn(tsn = 161030)

          tsn     rankName  taxonName       parentName    parentTsn
    0     161048  Class     Sarcopterygii   Osteichthyes  161030
    1     161061  Class     Actinopterygii  Osteichthyes  161030

Get hierarchy up from tsn

    pytaxize.gethierarchyupfromtsn(tsn = 37906)

          author              parentName  parentTsn  rankName  taxonName  tsn
    0     Gaertn. ex Schreb.  Asteraceae  35420      Genus     Liatris    37906

Get rank names

    pytaxize.getranknames()

          kingdomname  rankid  rankname
    0     Bacteria     10      Kingdom
    1     Bacteria     20      Subkingdom
    2     Bacteria     30      Phylum
    3     Bacteria     40      Subphylum
    4     Bacteria     50      Superclass
    5     Bacteria     60      Class
    6     Bacteria     70      Subclass
    7     Bacteria     80      Infraclass
    8     Bacteria     90      Superorder
    9     Bacteria     100     Order
    10    Bacteria     110     Suborder
    11    Bacteria     120     Infraorder
    12    Bacteria     130     Superfamily
    13    Bacteria     140     Family
    14    Bacteria     150     Subfamily
    15    Bacteria     160     Tribe
    16    Bacteria     170     Subtribe
    17    Bacteria     180     Genus
    18    Bacteria     190     Subgenus
    19    Bacteria     220     Species
    20    Bacteria     230     Subspecies
    21    Protozoa     10      Kingdom
    22    Protozoa     20      Subkingdom
    23    Protozoa     25      Infrakingdom
    24    Protozoa     30      Phylum
    25    Protozoa     40      Subphylum
    26    Protozoa     45      Infraphylum
    27    Protozoa     47      Parvphylum
    28    Protozoa     50      Superclass
    29    Protozoa     60      Class
    ..    ...          ...     ...
    150   Chromista    190     Subgenus
    151   Chromista    200     Section
    152   Chromista    210     Subsection
    153   Chromista    220     Species
    154   Chromista    230     Subspecies
    155   Chromista    240     Variety
    156   Chromista    250     Subvariety
    157   Chromista    260     Form
    158   Chromista    270     Subform
    159   Archaea      10      Kingdom
    160   Archaea      20      Subkingdom
    161   Archaea      30      Phylum
    162   Archaea      40      Subphylum
    163   Archaea      50      Superclass
    164   Archaea      60      Class
    165   Archaea      70      Subclass
    166   Archaea      80      Infraclass
    167   Archaea      90      Superorder
    168   Archaea      100     Order
    169   Archaea      110     Suborder
    170   Archaea      120     Infraorder
    171   Archaea      130     Superfamily
    172   Archaea      140     Family
    173   Archaea      150     Subfamily
    174   Archaea      160     Tribe
    175   Archaea      170     Subtribe
    176   Archaea      180     Genus
    177   Archaea      190     Subgenus
    178   Archaea      220     Species
    179   Archaea      230     Subspecies

Search by scientific name

    pytaxize.searchbyscientificname(x="Tardigrada")

          combinedname          tsn
    0     Rotaria tardigrada    58274
    1     Notommata tardigrada  58898
    2     Pilargis tardigrada   65562
    3     Tardigrada            155166
    4     Heterotardigrada      155167
    5     Arthrotardigrada      155168
    6     Mesotardigrada        155358
    7     Eutardigrada          155362
    8     Scytodes tardigrada   866744

Get accepted names from tsn

    pytaxize.getacceptednamesfromtsn('208527')

If accepted, returns the same id ...

December 26, 2014 · 3 min · Scott Chamberlain

Museum metadata - the Asian Art Museum of San Francisco

I was in San Francisco last week for an altmetrics conference at PLOS. While there, I visited the Asian Art Museum, just the Roads of Arabia exhibition. It was a great exhibit. While I was looking at the pieces, I read many labels, and thought, “hey, what if someone wants this metadata?” Since we have an R package in development for scraping museum metadata (called musemeta), I just started some scraping code for this museum. Unfortunately, I don’t think the pieces from the Roads of Arabia exhibit are on their site, so no metadata to get. But they do have their main collection searchable online at https://www.asianart.org/collections/collection. Examples follow. ...

December 10, 2014 · 5 min · Scott Chamberlain

icanhaz altmetrics

The Lagotto application is a Rails app that collects article level metrics data for research objects and serves it up via a RESTful API. So far, this application has only been applied to scholarly articles, but will see action on datasets soon. Martin Fenner has led the development of Lagotto. He recently set up a discussion site if you want to chat about it. The application has a nice GUI, and a quite nice RESTful API. ...

December 8, 2014 · 3 min · Scott Chamberlain