stories behind archived packages

Update on 2021-02-09: I’ve archived 8 more packages; the post below has been updated. Code is often arranged in packages for any given language. Packages are often cataloged in a package registry of some kind: NPM for node, crates.io for Rust, etc. For R, that registry is (for the most part) either CRAN or Bioconductor. CRAN has the concept of an archived package. That is, the namespace for a package (foo) stays in the registry (and cannot be used again), but the package is archived: it no longer gets updated, and I think checks are no longer run. ...

September 10, 2020 · 8 min · Scott Chamberlain

taxizedb: an update

taxizedb arose from the pain of using taxize with large amounts of data in a single request, or with a lot of requests of any data size. taxize works with remote data sources on the web, so there are a number of issues that can slow the response down: internet speed, server response speed (was the response already cached, and do they even use caching?), etc. The idea with taxizedb is to let users do the same things taxize allows, but much faster, by querying the entire database for a data source on their own computer. Previous versions of taxizedb used a variety of databases (MySQL/MariaDB, PostgreSQL, SQLite), so the technical barrier to entry was pretty high. In the newest version just released, we’ve drastically simplified the database situation, among other things. ...
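
For a concrete sense of the workflow, here’s a minimal sketch assuming the current taxizedb API (db_download_ncbi(), src_ncbi(), name2taxid()); see the package docs for the exact function names and data sources available.

```r
library(taxizedb)

# one-time download of the NCBI taxonomy as a local SQLite database
db_path <- db_download_ncbi()

# connect to the local copy; later queries never touch the network
src <- src_ncbi(db_path)

# fast lookups against the local database
name2taxid("Poa annua")
classification("Poa annua", db = "ncbi")
```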

August 17, 2020 · 4 min · Scott Chamberlain

Faster solr with csv

With the help of user input, I’ve tweaked solr just a bit to make things faster using default settings. I imagine the main interface for people using the solr R client is solr_search(), which used to have wt=json by default. Changing this to wt=csv gives better performance. And it sorta makes sense to use csv: the point of using an R client is probably to get data into a data.frame eventually, so going with csv (already a tabular format) makes sense if it’s faster too. ...
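
For illustration, a rough sketch of the difference, assuming solr_search() accepts base and wt arguments as it did around that time (the PLOS search endpoint is used only as an example server):

```r
library(solr)

base <- "http://api.plos.org/search"

# old default: results come back as JSON and get parsed into a data.frame
out_json <- solr_search(q = "ecology", fl = "id,title", rows = 50,
                        base = base, wt = "json")

# wt = "csv": the server returns tabular data directly, which is
# typically faster since the end goal is a data.frame anyway
out_csv <- solr_search(q = "ecology", fl = "id,title", rows = 50,
                       base = base, wt = "csv")
```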

March 20, 2015 · 3 min · Scott Chamberlain

Intro to alpha ckanr - R client for CKAN RESTful API

Recently I needed to create a client for scraping museum metadata to help out some folks that use that kind of data. It’s called musemeta. One of the data sources in that package uses the open source data portal software CKAN, so we can interact with the CKAN API to get data. Since CKAN is open source and many groups can run its API and infrastructure, I thought: why not have a general purpose R client for it, since there are already clients for Python, PHP, Ruby, etc. ...
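
As a taste of what that looks like, here’s a minimal sketch against a public CKAN instance, assuming the current ckanr API (ckanr_setup(), package_search()); the alpha-version function names may have differed.

```r
library(ckanr)

# point the client at a CKAN instance (the demo server is just an example)
ckanr_setup(url = "https://demo.ckan.org")

# search datasets ("packages" in CKAN terms)
res <- package_search(q = "water", rows = 5)
res$count
sapply(res$results, function(x) x$title)
```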

November 26, 2014 · 8 min · Scott Chamberlain

sofa - reboot

I’ve reworked sofa recently after someone reported a bug in the package. Since the last post on this package on 2013-06-21, there have been a bunch of changes:

- Removed the sofa_ prefix from all functions, as it wasn’t really necessary.
- Replaced rjson/RJSONIO with jsonlite for JSON I/O.
- New functions: revisions() - get the revision numbers for a document; uuids() - get any number of UUIDs, e.g., if you want to set document IDs with UUIDs.
- Most functions that deal with documents are prefixed with doc_.
- Functions that deal with databases are prefixed with db_.
- Simplified all code, reducing duplication.
- All functions take cushion as the first parameter, for consistency’s sake.
- Changed the cushion() function so that you can only register one cushion per function call; it now takes parameters for each element: name (name of the cushion, whatever you want), user (user name, if applicable), pwd (password, if applicable), type (one of localhost, cloudant, or iriscouch), and port (if applicable).
- Changed the package license from CC0 to MIT.

There’s still more to do, but I’m pretty happy with the recent changes, and I hope at least some find the package useful. Also, I would love people to try it out - all bugs are shallow and all that… ...
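
Putting the new conventions together, a quick sketch: cushion() and its parameters are as described above, while db_create() and doc_create() are assumed names following the db_/doc_ prefix convention.

```r
library(sofa)

# register a connection ("cushion") to a local CouchDB server
cushion(name = "local", type = "localhost", port = 5984)

# database-level functions carry the db_ prefix
db_create(cushion = "local", dbname = "mydb")

# document-level functions carry the doc_ prefix
doc_create(cushion = "local", dbname = "mydb",
           doc = '{"species": "Poa annua", "count": 12}')
```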

November 18, 2014 · 5 min · Scott Chamberlain

Stashing and playing with raw data locally from the web

It is getting easier to get data directly into R from the web. R packages that retrieve data from the web often return useful R data structures to users, like a data.frame. This is a good thing, of course, for user friendliness. However, what if you want to drill down into the data that’s returned from a query to a database in R? What if you want that nice data.frame in R, but think you may want to look at the raw data later? The raw data from web queries is often JSON or XML. This type of data, especially JSON, can easily be stored in schemaless so-called NoSQL databases and queried later. ...
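
One way the stash-then-parse pattern can look (a sketch, not the exact code from the post): grab the raw JSON with httr, store it untouched in CouchDB via sofa using the conventions described above, and only parse it when needed.

```r
library(httr)
library(sofa)

# raw JSON from a web API, kept as a plain string
resp <- GET("https://api.gbif.org/v1/species/search?q=Poa&limit=5")
raw_json <- content(resp, as = "text", encoding = "UTF-8")

# stash the untouched payload in a local CouchDB database
cushion(name = "local", type = "localhost", port = 5984)
db_create(cushion = "local", dbname = "rawdata")
doc_create(cushion = "local", dbname = "rawdata", doc = raw_json)

# parse into R structures only when you actually need them
parsed <- jsonlite::fromJSON(raw_json)
```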

June 17, 2013 · 7 min · Scott Chamberlain