stories behind archived packages

Update on 2021-02-09: I’ve archived 8 more packages. Post below updated. Code is often arranged in packages for any given language. Packages are often cataloged in a package registry of some kind: NPM for node, crates.io for Rust, etc. For R, that registry is either CRAN or Bioconductor (for the most part). CRAN has the concept of an archived package. That is, the namespace for a package (foo) is still in the registry (and cannot be used again), but the package is archived - it no longer gets updated, and checks, I think, are no longer performed....

September 10, 2020 · 8 min · Scott Chamberlain

taxizedb: an update

taxizedb arose from pain in using taxize when dealing with large amounts of data in a single request or doing a lot of requests of any data size. taxize works with remote data sources on the web, so there’s a number of issues that can slow the response down: internet speed, server response speed (was a response already cached or not; or do they even use caching), etc. The idea with taxizedb was to allow users to do the same things as taxize allows, but much faster by accessing the entire database for a data source on their own computer....

August 17, 2020 · 4 min · Scott Chamberlain

Faster solr with csv

With the help of user input, I’ve tweaked solr just a bit to make things faster using default settings. I imagine the main interface for people using the solr R client is via solr_search(), which used to have wt=json by default. Changing this to wt=csv gives better performance. And it sorta makes sense to use csv, as the point of using an R client is probably to get data eventually into a data....

March 20, 2015 · 3 min · Scott Chamberlain

Intro to alpha ckanr - R client for CKAN RESTful API

Recently I had need to create a client for scraping museum metadata to help out some folks that use that kind of data. It’s called musemeta. One of the data sources in that package uses the open source data portal software CKAN, and so we can interact with the CKAN API to get data. Since many groups can use CKAN API/etc infrastructure because it’s open source, I thought why not have a general purpose R client for this, since there are other clients for Python, PHP, Ruby, etc....

November 26, 2014 · 8 min · Scott Chamberlain

sofa - reboot

I’ve reworked sofa recently after someone reported a bug in the package. Since the last post on this package on 2013-06-21, there’s a bunch of changes: Removed the sofa_ prefix from all functions as it wasn’t really necessary. Replaced rjson/RJSONIO with jsonlite for JSON I/O. New functions: revisions() - to get the revision numbers for a document; uuids() - get any number of UUIDs, e.g., if you want to set document IDs with UUIDs. Most functions that deal with documents are prefixed with doc_. Functions that deal with databases are prefixed with db_. Simplified all code, reducing duplication. All functions take cushion as the first parameter, for consistency’s sake....

November 18, 2014 · 5 min · Scott Chamberlain

Stashing and playing with raw data locally from the web

It is getting easier to get data directly into R from the web. Often R packages that retrieve data from the web return useful R data structures to users like a data.frame. This is a good thing of course to make things user friendly. However, what if you want to drill down into the data that’s returned from a query to a database in R? What if you want to get that nice data....

June 17, 2013 · 7 min · Scott Chamberlain