stories behind archived packages

Update on 2021-02-09: I’ve archived 8 more packages. Post below updated. Code is often arranged in packages for any given language. Packages are often cataloged in a package registry of some kind: NPM for node, crates.io for Rust, etc. For R, that registry is either CRAN or Bioconductor (for the most part). CRAN has the concept of an archived package. That is, the namespace for a package (foo) is still in the registry (and cannot be used again), but the package is archived: it no longer gets updated, and I think checks are no longer performed. ...

September 10, 2020 · 8 min · Scott Chamberlain

taxizedb: an update

taxizedb arose from the pain of using taxize when dealing with large amounts of data in a single request, or when making a lot of requests of any data size. taxize works with remote data sources on the web, so there are a number of issues that can slow the response down: internet speed, server response speed (was a response already cached or not; or do they even use caching), etc. The idea with taxizedb was to allow users to do the same things as taxize allows, but much faster by accessing the entire database for a data source on their own computer. The previous versions of taxizedb used a variety of different databases (MySQL/MariaDB, PostgreSQL, SQLite), so the technical barrier to entry was pretty high. In the newest version just released, we’ve drastically simplified the database situation, among other things. ...

August 17, 2020 · 4 min · Scott Chamberlain

binomen - Tools for slicing and dicing taxonomic names

The first version of binomen is now up on CRAN. It provides various taxonomic classes for defining a single taxon, multiple taxa, and a taxonomic data.frame. It is designed as a companion to taxize, where you can get taxonomic data on taxonomic names from the web. The classes (S3) are taxon, taxonref, taxonrefs, binomial, and grouping (i.e., classification; a different term is used to avoid conflict with classification in taxize). For example, the binomial class is defined by a genus, epithet, authority, and optional full species name and canonical version. ...

December 8, 2015 · 5 min · Scott Chamberlain

binomen - taxonomic classes and parsing

I maintain, along with other awesome people, the taxize R package - a taxonomic toolbelt for R, for interacting with taxonomic data sources on the web. Taxonomy data is not standardized, but there are a lot of common elements, and there is a finite list of taxonomic ranks, and finite number of major taxonomic data sets. Thus, I’ve been interested in attempting to define a pseudo standard for expressing taxonomic data in R. The conversation started a while back in a GitHub issue, and hasn’t moved very far. ...

January 19, 2015 · 3 min · Scott Chamberlain

pytaxize - low level ITIS functions

I’ve been working on a Python port of the R package taxize that I maintain. It’s still early days with this Python library, and I’d love to know what people think. For example, I’m giving back Pandas DataFrames from most functions. Does this make sense?

Installation

    sudo pip install git+git://github.com/sckott/pytaxize.git#egg=pytaxize

Or git clone the repo down, and

    python setup.py build && python setup.py install

Load library

    import pytaxize

ITIS ping

    pytaxize.itis_ping()

    'This is the ITIS Web Service, providing access to the data behind www.itis.gov. The database contains 665,266 scientific names (501,207 of them valid/accepted) and 122,735 common names.'

Get hierarchy down from tsn

    pytaxize.gethierarchydownfromtsn(tsn = 161030)

          tsn rankName       taxonName    parentName parentTsn
    0  161048    Class   Sarcopterygii  Osteichthyes    161030
    1  161061    Class  Actinopterygii  Osteichthyes    161030

Get hierarchy up from tsn

    pytaxize.gethierarchyupfromtsn(tsn = 37906)

                   author  parentName parentTsn rankName taxonName    tsn
    0  Gaertn. ex Schreb.  Asteraceae     35420    Genus   Liatris  37906

Get rank names

    pytaxize.getranknames()

        kingdomname rankid      rankname
    0      Bacteria     10       Kingdom
    1      Bacteria     20    Subkingdom
    2      Bacteria     30        Phylum
    3      Bacteria     40     Subphylum
    4      Bacteria     50    Superclass
    5      Bacteria     60         Class
    6      Bacteria     70      Subclass
    7      Bacteria     80    Infraclass
    8      Bacteria     90    Superorder
    9      Bacteria    100         Order
    10     Bacteria    110      Suborder
    11     Bacteria    120    Infraorder
    12     Bacteria    130   Superfamily
    13     Bacteria    140        Family
    14     Bacteria    150     Subfamily
    15     Bacteria    160         Tribe
    16     Bacteria    170      Subtribe
    17     Bacteria    180         Genus
    18     Bacteria    190      Subgenus
    19     Bacteria    220       Species
    20     Bacteria    230    Subspecies
    21     Protozoa     10       Kingdom
    22     Protozoa     20    Subkingdom
    23     Protozoa     25  Infrakingdom
    24     Protozoa     30        Phylum
    25     Protozoa     40     Subphylum
    26     Protozoa     45   Infraphylum
    27     Protozoa     47    Parvphylum
    28     Protozoa     50    Superclass
    29     Protozoa     60         Class
    ..          ...    ...           ...
    150   Chromista    190      Subgenus
    151   Chromista    200       Section
    152   Chromista    210    Subsection
    153   Chromista    220       Species
    154   Chromista    230    Subspecies
    155   Chromista    240       Variety
    156   Chromista    250    Subvariety
    157   Chromista    260          Form
    158   Chromista    270       Subform
    159     Archaea     10       Kingdom
    160     Archaea     20    Subkingdom
    161     Archaea     30        Phylum
    162     Archaea     40     Subphylum
    163     Archaea     50    Superclass
    164     Archaea     60         Class
    165     Archaea     70      Subclass
    166     Archaea     80    Infraclass
    167     Archaea     90    Superorder
    168     Archaea    100         Order
    169     Archaea    110      Suborder
    170     Archaea    120    Infraorder
    171     Archaea    130   Superfamily
    172     Archaea    140        Family
    173     Archaea    150     Subfamily
    174     Archaea    160         Tribe
    175     Archaea    170      Subtribe
    176     Archaea    180         Genus
    177     Archaea    190      Subgenus
    178     Archaea    220       Species
    179     Archaea    230    Subspecies

Search by scientific name

    pytaxize.searchbyscientificname(x="Tardigrada")

              combinedname     tsn
    0   Rotaria tardigrada   58274
    1 Notommata tardigrada   58898
    2  Pilargis tardigrada   65562
    3           Tardigrada  155166
    4     Heterotardigrada  155167
    5     Arthrotardigrada  155168
    6       Mesotardigrada  155358
    7         Eutardigrada  155362
    8  Scytodes tardigrada  866744

Get accepted names from tsn

    pytaxize.getacceptednamesfromtsn('208527')

If accepted, returns the same id ...

December 26, 2014 · 3 min · Scott Chamberlain

taxize workflows

A missed chat on the rOpenSci website the other day asked: "Hi there, i am trying to use the taxize package and have a .csv file of species names to run through taxize updating them. What would be the code i would need to run to achieve this?" One way to answer this is to talk about the basic approach to importing data, doing stuff to the data, then recombining data. There are many ways to do this, but I’ll go over a few of them. ...
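As a rough illustration of the import, process, recombine pattern described in this excerpt, here is a minimal Python sketch (the post itself is about R and taxize, so this is only an analogy). The `clean_name` function is a hypothetical placeholder that just tidies whitespace and capitalisation; it is not a taxize or pytaxize call, and a real workflow would swap in an actual name-resolution lookup. The column name `species` and the sample data are also assumptions.

```python
import csv
import io


def clean_name(name):
    """Hypothetical stand-in for a name lookup: tidy spacing and case only."""
    parts = name.split()
    if not parts:
        return ""
    # Capitalise the genus, lower-case the remaining words.
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def update_names(csv_text, column="species"):
    """Read CSV text, process one column, and keep the result beside the original row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        # Recombine: the new value is stored alongside the untouched columns.
        row["updated_name"] = clean_name(row[column])
        rows.append(row)
    return rows


# Made-up sample data; a real file would be opened with open(path, newline="").
sample = "species,count\nhelianthus  annuus,3\nQUERCUS robur,5\n"
rows = update_names(sample)
for row in rows:
    print(row["species"], "->", row["updated_name"])
```

Keeping the original column next to the updated one makes it easy to check afterwards which names were changed.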

December 2, 2014 · 5 min · Scott Chamberlain

Taxonomy data from the web in three languages

Eduard Szöcs and I started developing a taxonomic toolbelt for the R language a while back, which lets you interact with a multitude of taxonomic databases on the web. We have a paper in F1000Research if you want to find out more (see here). I thought it would be fun to rewrite some of taxize in other languages to learn more languages. Ruby and Python made the most sense to try. I did try others (Julia, Node), but gave up on those for now. The goal here isn’t to port taxize to Python and Ruby right now - it’s for me to learn myself some coding. ...

September 27, 2013 · 2 min · Scott Chamberlain

One R package for all your taxonomic needs

UPDATE: there were some errors in the tests for taxize, so the binaries aren’t available yet. You can install from source though, see below. Getting taxonomic information for the set of species you are studying can be a pain in the ass. You have to manually type, or paste in, your species one-by-one. Or, if you are lucky, there is a web service in which you can upload a list of species. Encyclopedia of Life (EOL) has a service where you can do this here. But is this reproducible? No. ...

December 6, 2012 · 10 min · Scott Chamberlain

Getting taxonomic names downstream

It can be a pain in the ass to get taxonomic names. For example, I sometimes need to get all the Class names for a set of species. This is a relatively easy problem using the ITIS API (example below). The much harder problem is getting all the taxonomic names downstream. ITIS doesn’t provide an API method for this - well, they do (getHierarchyDownFromTSN), but it only provides direct children (e.g., the genera within a tribe - but it won’t give all the species within each genus). ...
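One way to get from "direct children only" to "everything downstream" is to call the children lookup repeatedly until the target rank is reached. The Python sketch below shows that idea under stated assumptions: `direct_children` and the toy hierarchy are hypothetical stand-ins for a service that returns only one level at a time, and the taxon names are invented, not real ITIS output. It is not necessarily how the post solves the problem.

```python
from collections import deque

# Hypothetical toy hierarchy standing in for a remote service that only
# returns direct children (names are illustrative, not real ITIS data).
TOY_CHILDREN = {
    "TribeA": [("GenusA", "Genus"), ("GenusB", "Genus")],
    "GenusA": [("GenusA alpha", "Species"), ("GenusA beta", "Species")],
    "GenusB": [("GenusB gamma", "Species")],
}


def direct_children(name):
    """Return (child_name, rank) pairs exactly one level below `name`."""
    return TOY_CHILDREN.get(name, [])


def downstream(name, target_rank, get_children=direct_children):
    """Collect all descendants of `name` at `target_rank`, breadth-first."""
    found = []
    queue = deque([name])
    while queue:
        current = queue.popleft()
        for child, rank in get_children(current):
            if rank == target_rank:
                found.append(child)  # reached the rank we want; stop descending
            else:
                queue.append(child)  # not there yet; look below this child later
    return found


species = downstream("TribeA", "Species")
print(species)
```

Against a real web service each `get_children` call would be a network request, so the number of requests grows with the size of the subtree.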

October 16, 2012 · 3 min · Scott Chamberlain