CascadiaRConf

Save the date for CascadiaRConf!

Website: cascadiarconf.com
Twitter: @cascadiarconf

There’s not a lot of info available yet, but so far:

When: 3 June, 2017
Where: OHSU Collaborative Life Science Building (more details soon on which rooms, etc.)
Agenda: No details yet, but likely a series of workshops as well as a single-track set of talks. We’ll be accepting talk submissions soonish.
Tickets: We aren’t out to make money. Tickets will be cheap and probably free for students. ...

March 23, 2017 · 1 min

USDA plants database API in R

The USDA maintains a database of plant information, some of it trait data, some of it life history. Check it out at https://plants.usda.gov/java/. They’ve been talking about releasing an API for a long time, but have not done so. Thus, since at least some version of their data is on the public web, I’ve created a RESTful API for the data:

- source code: https://github.com/sckott/usdaplantsapi/
- base URL: https://plantsdb.xyz

Check out the API, and open issues for bugs/feature requests in the GitHub repo. ...

October 19, 2016 · 8 min

gbids - GenBank IDs API is back up!

GBIDS API is back

Back in March this year I wrote a post about a new API for working with GenBank IDs. I had to take the API down because it was too expensive to keep up. It was expensive because the data dump is very large (3.8 GB compressed), and I need disk space on the server to uncompress that to, I think, about 18 GB, then load it into MySQL, which takes another 30 GB or so. Anyway, it’s not expensive because of high traffic (although I wish that were the case) but because it needs lots of disk space. ...

September 1, 2016 · 3 min

nonoyes - text analysis of Reply All podcast transcripts

Contents: Setup, URLs, Episode names, Transcripts, Summary word usage, Sentiment, Most common positive and negative words

Reply All is a great podcast. I’ve been wanting to learn some text analysis tools, and transcripts from the podcast are on their site. I took some approaches outlined in the tidytext package in this vignette, and used the tokenizers package and some of the tidyverse.

Code is on GitHub at sckott/nonoyes.

Also check out the html version.

Setup

Load deps

    library("httr")
    library("xml2")
    library("stringi")
    library("dplyr")
    library("ggplot2")
    library("tokenizers")
    library("tidytext")
    library("tidyr")

source helper functions ...

August 25, 2016 · 4 min

video editing notes

This is how I edit videos of talks when I need to combine slides and video.

I’m on a Mac.

- import to iMovie (using v10 something)
- drop movie into editing section
- split pdf slides into individual files: pdfseparate foobar.pdf %d.pdf
- convert individual pdf slides into png: sips -s format png --out "${pdf%%.*}.png" "$pdf"
- import png’s into iMovie
- for each image, drop into editing area where you want it
- when focused on the png of the slide:
  - select crop, then choose fit, say okay
  - select “add as overlay” (leftmost symbol), then choose picture in picture
  - then choose swap
  - then move inset to where you want it
  - say okay
- rinse and repeat for all slides
- export via File option
- share to youtube

e.g. of the result
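The two command-line steps in these notes (splitting the deck, then converting each page to PNG) could be chained roughly as below. This is only a sketch: the function names `png_name` and `slides_to_png` are made up for illustration, it assumes `pdfseparate` (from poppler) and `sips` (macOS only) are installed, and it assumes the deck's own file name does not start with a digit, so the `[0-9]*.pdf` glob only picks up the pages written by `pdfseparate`.

```shell
# Derive the PNG name for a page file, e.g. 3.pdf -> 3.png.
# Uses the same "${pdf%%.*}.png" expansion as the notes above.
png_name() {
  printf '%s\n' "${1%%.*}.png"
}

# Hypothetical helper: split a slide deck into 1.pdf, 2.pdf, ...
# in the current directory, then convert each page to PNG.
slides_to_png() {
  deck="$1"
  # One PDF per slide; %d is replaced with the page number.
  pdfseparate "$deck" %d.pdf || return 1
  for pdf in [0-9]*.pdf; do
    # Skip the unexpanded pattern if no page files were produced.
    [ -e "$pdf" ] || continue
    sips -s format png --out "$(png_name "$pdf")" "$pdf"
  done
}
```

With the functions loaded in a shell, `slides_to_png foobar.pdf` would run both steps in the current directory; the resulting png files can then be imported into iMovie as described.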

August 12, 2016 · 1 min

Marine Regions data in R

UPDATE: pkg API has changed - updated the post below to work with the current CRAN version, submitted 2016-08-02 I was at a hackathon focused on Ocean Biogeographic Information System (OBIS) data back in November last year in Belgium. One project idea was to make it easier to get at data based on one or more marine regions. I was told that Marineregions.org is often used for shape files to get different regions to then do other work with. ...

June 9, 2016 · 6 min

atomize - make new packages from other packages

We (rOpenSci) just held our 3rd annual rOpenSci unconference (https://unconf16.ropensci.org/) in San Francisco. There were a lot of ideas, and lots of awesome projects from awesome people came out of the 2-day event. One weird idea I had comes from looking at the Node world, where there are lots of tiny packages, instead of the often larger packages we have in the R world. One reason packages are tiny in Node is that, of course, you want a library to be tiny if it runs in the browser, for faster load times (esp. on mobile). ...

April 7, 2016 · 2 min

GenBank IDs API - get, match, swap id types

GenBank IDs, accession numbers and GI identifiers, are the two types of identifiers for entries in GenBank (see this page for why there are two types of identifiers). Actually, recent news from NCBI is that GI identifiers will be phased out by September this year, which affects what I’ll talk about below. There are a lot of sequences in GenBank. Sometimes you have identifiers and you want to check if they exist in GenBank, or want to get one type from another (accession from GI, or vice versa, although the GI phase-out will make this use case unnecessary), or perhaps just get a bunch of identifiers for software testing purposes. ...

March 29, 2016 · 3 min

heythere - a robot to automate GitHub issue comments

GitHub issues are great for humans to correspond over software, or any other project. At rOpenSci we use an issue-based software review system (ropensci/onboarding). Software authors and reviewers go back and forth on the software, making a better product in the end. We have a relatively small number of pieces of software under review at any one time compared to, e.g., scientific journals. However, even with the small number, we as organizers, as well as authors and reviewers, can forget things. For example: ...

March 24, 2016 · 4 min

scrubr - clean species occurrence records

scrubr is an R library for cleaning species occurrence records. It’s general purpose, and has the following approach:

- We think using a piping workflow (%>%) makes code easier to build up, and easier to understand. However, you don’t have to use pipes in this package.
- All inputs and outputs are data.frame’s, which makes the above point easier.
- Records trimmed off due to various filters are retained as attributes, so they can still be accessed for later inspection, but don’t get in the way of the data.frame that gets modified for downstream use.
- User interface vs. speed: This is the kind of package that surely can get faster. However, we’re focusing on the UI first, and will make speed improvements down the road.
- Since occurrence record datasets should all have columns with lat/long information, we automatically look for those columns for you. If identified, we use them, but you can supply lat/long column names manually as well.

We have many packages that fetch species occurrence records from GBIF, iNaturalist, VertNet, iDigBio, Ecoengine, and more. scrubr fills a crucial missing niche, as likely all uses of occurrence data require cleaning of some kind. When using GBIF data via rgbif, that package has some utilities for cleaning data based on the issues returned with GBIF data; scrubr is a companion to do the rest of the cleaning. ...

March 4, 2016 · 11 min · Scott Chamberlain