Working at Fred Hutchinson Cancer Center

Soooo, my last job at Deck was amazing. I loved it. I was doing data engineer stuff there, mostly maintaining infrastructure for data pipelines. Everyone was great and the mission was amazing: helping Democrats win. Yet the company was shut down about a month ago, sending me on another job search, the 3rd since early/mid 2021. I’m super thrilled to have landed a job (Software and Reproducibility Software Developer) at the Fred Hutch Data Science Lab (DASL), headed up by Jeff Leek, working with Sean Kross, Amy Paguirigan, and Monica Gerber, among many other amazing folks. ...

October 6, 2023 · 1 min · Scott Chamberlain

Python, ast, and redbaron

I recently had a use case at work where I wanted to check that file paths given in a Python script actually existed. These paths were in various GitHub repositories, so all I had to do was pull out the paths and check if they exist on GitHub. There were a few catches though. First, I couldn’t simply get any string out of each Python script - they needed to be strings specified by a specific function parameter, and match a regex (e.g., start with ‘abc’). ...

April 18, 2023 · 4 min · Scott Chamberlain
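
As a rough illustration of the idea in the "Python, ast, and redbaron" excerpt above, here is a minimal sketch that uses only the standard library's ast module. The function name find_path_args, the keyword name path, and the sample script are hypothetical, chosen for illustration; the post's own code (and its use of redbaron) may look quite different, and the step of checking the paths against GitHub is left out.

```python
import ast
import re


def find_path_args(source, param="path", pattern=r"^abc"):
    """Collect string literals passed as keyword `param` that match `pattern`.

    `param` and `pattern` are illustrative defaults, not taken from the post.
    """
    regex = re.compile(pattern)
    found = []
    # Walk every node in the parsed script and keep only function calls.
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            value = kw.value
            # Keep only literal strings given to the keyword we care about.
            if (
                kw.arg == param
                and isinstance(value, ast.Constant)
                and isinstance(value.value, str)
                and regex.search(value.value)
            ):
                found.append(value.value)
    return found


# Hypothetical script: only the first call has a matching `path` argument.
script = '''
load(path="abc/data/file1.csv")
load(path="xyz/other.csv")
print("abc/not/an/argument")
save(dest="abc/out.csv")
'''

paths = find_path_args(script)
print(paths)
```

Parsing the script rather than searching it with a regex alone means strings that merely look like paths, but are not passed to the chosen parameter, are skipped.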

List comprehension vs. filter vs. key lookup

I was working on a work task last week, and needed to filter out one instance of a class from a list of class instances. No matter how you do this, speed doesn’t matter too much if you’re doing this operation once or a few times. However, this operation needs to be done about 100K times each time the script runs - so speed definitely does matter in this case. ...

April 18, 2022 · 3 min · Scott Chamberlain
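
The three approaches named in the title "List comprehension vs. filter vs. key lookup" can be sketched roughly as follows. The Item class, its fields, and the list size are made up for illustration and are not taken from the post; the sketch also assumes the goal is to pick one instance out of the list by an attribute value.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical class standing in for the post's class instances.
    name: str
    value: int


items = [Item(f"item{i}", i) for i in range(1000)]
target = "item750"

# 1. List comprehension: scans the whole list, then takes the first match.
by_comprehension = [x for x in items if x.name == target][0]

# 2. filter(): lazy, so next() stops scanning at the first match.
by_filter = next(filter(lambda x: x.name == target, items))

# 3. Key lookup: build a dict once, then each lookup is a hash lookup.
index = {x.name: x for x in items}
by_lookup = index[target]

print(by_comprehension, by_filter, by_lookup)
```

The dict costs one pass over the list to build, so it tends to pay off only when the same list is searched many times, as in the roughly 100K lookups mentioned above. The standard library's timeit module is one way to measure the difference on real data.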

Notes on Python

It’s been interesting switching jobs with respect to programming languages. I used to write 95% R - now I write 95% Python. I have been using Python for many years, but not seriously, and not while getting paid for it. I’ve learned a lot in the first 6 months. Some Python things learned. Functions and methods: I used to think functions and methods were the same thing. But during the last 6 months I learned that functions and methods are not the same. Well, they’re not that different. A function outside a class is just called a function while a function inside a class is called a method. They could be exactly the same and do the same thing, but one is outside a class and the other inside a class. ...

February 7, 2022 · 2 min · Scott Chamberlain
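
The function/method distinction described in the "Notes on Python" excerpt above can be shown with a small sketch. The names double and Doubler are invented for illustration.

```python
def double(x):
    """A plain function: defined outside any class."""
    return x * 2


class Doubler:
    def double(self, x):
        """A method: same body, but defined inside a class, so it takes `self`."""
        return x * 2


d = Doubler()

plain = double(4)               # call the function directly
method = d.double(4)            # call the method on an instance
unbound = Doubler.double(d, 4)  # same method, passing the instance explicitly

# Python reports the two kinds of callable differently.
function_kind = type(double).__name__
method_kind = type(d.double).__name__

print(plain, method, unbound, function_kind, method_kind)
```

Both callables do the same work; the practical difference is that the method receives the instance as its first argument, which lets it read or change that instance's state.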

habanero update: Crossref data from Python

I wrote about Crossref clients nearly two years ago on this blog: Crossref programmatic clients. Since it’s been a while, it seems worth talking again about the many ways to work programmatically with Crossref data - and focus in on the Python client habanero since it has some recent updates. The 3 clients work with the main Crossref API, which lets you do things like search for works by title, author, etc. (e.g., books, articles), search for publishing members, for funders, for journals, for DOI prefixes, and for licenses. It’s a powerful API with basically no rate limits, so you can work through lots of data quickly. ...

October 23, 2017 · 3 min · Scott Chamberlain

Crossref programmatic clients

I gave two talks recently at the annual Crossref meeting, one of which was a somewhat technical overview of programmatic clients for Crossref APIs. Check out the talk here. I talked about the motivation for working with Crossref data by writing code/etc. rather than going the GUI route, then went over the various clients, with brief examples. We (rOpenSci) have been working on the R client rcrossref for a while now, but I’m also working on the Python and Ruby clients for Crossref. In addition, the Ruby client has a CLI client inside. The Javascript client is worked on independently by ScienceAI. ...

November 30, 2015 · 3 min · Scott Chamberlain

pygbif - GBIF client for Python

I maintain an R client for the GBIF API, at rgbif. Been working on it for a few years, and recently been thinking that there should be a nice low level client for Python as well. I didn’t see one searching GitHub, etc., so I started working on one recently: pygbif. It’s up on PyPI. There’s not much in pygbif yet - I wanted to get something up to start getting some users to more quickly make the library useful to people. ...

November 12, 2015 · 2 min · Scott Chamberlain

pytaxize - low level ITIS functions

I’ve been working on a Python port of the R package taxize that I maintain. It’s still early days with this Python library, and I’d love to know what people think. For example, I’m giving back Pandas DataFrames from most functions. Does this make sense?

Installation

    sudo pip install git+git://github.com/sckott/pytaxize.git#egg=pytaxize

Or git clone the repo down, and

    python setup.py build && python setup.py install

Load library

    import pytaxize

ITIS ping

    pytaxize.itis_ping()

    'This is the ITIS Web Service, providing access to the data behind www.itis.gov. The database contains 665,266 scientific names (501,207 of them valid/accepted) and 122,735 common names.'

Get hierarchy down from tsn

    pytaxize.gethierarchydownfromtsn(tsn = 161030)

       tsn     rankName  taxonName       parentName    parentTsn
    0  161048  Class     Sarcopterygii   Osteichthyes  161030
    1  161061  Class     Actinopterygii  Osteichthyes  161030

Get hierarchy up from tsn

    pytaxize.gethierarchyupfromtsn(tsn = 37906)

       author              parentName  parentTsn  rankName  taxonName  tsn
    0  Gaertn. ex Schreb.  Asteraceae  35420      Genus     Liatris    37906

Get rank names

    pytaxize.getranknames()

         kingdomname  rankid  rankname
    0    Bacteria     10      Kingdom
    1    Bacteria     20      Subkingdom
    2    Bacteria     30      Phylum
    3    Bacteria     40      Subphylum
    4    Bacteria     50      Superclass
    5    Bacteria     60      Class
    6    Bacteria     70      Subclass
    7    Bacteria     80      Infraclass
    8    Bacteria     90      Superorder
    9    Bacteria     100     Order
    10   Bacteria     110     Suborder
    11   Bacteria     120     Infraorder
    12   Bacteria     130     Superfamily
    13   Bacteria     140     Family
    14   Bacteria     150     Subfamily
    15   Bacteria     160     Tribe
    16   Bacteria     170     Subtribe
    17   Bacteria     180     Genus
    18   Bacteria     190     Subgenus
    19   Bacteria     220     Species
    20   Bacteria     230     Subspecies
    21   Protozoa     10      Kingdom
    22   Protozoa     20      Subkingdom
    23   Protozoa     25      Infrakingdom
    24   Protozoa     30      Phylum
    25   Protozoa     40      Subphylum
    26   Protozoa     45      Infraphylum
    27   Protozoa     47      Parvphylum
    28   Protozoa     50      Superclass
    29   Protozoa     60      Class
    ..   ...          ...     ...
    150  Chromista    190     Subgenus
    151  Chromista    200     Section
    152  Chromista    210     Subsection
    153  Chromista    220     Species
    154  Chromista    230     Subspecies
    155  Chromista    240     Variety
    156  Chromista    250     Subvariety
    157  Chromista    260     Form
    158  Chromista    270     Subform
    159  Archaea      10      Kingdom
    160  Archaea      20      Subkingdom
    161  Archaea      30      Phylum
    162  Archaea      40      Subphylum
    163  Archaea      50      Superclass
    164  Archaea      60      Class
    165  Archaea      70      Subclass
    166  Archaea      80      Infraclass
    167  Archaea      90      Superorder
    168  Archaea      100     Order
    169  Archaea      110     Suborder
    170  Archaea      120     Infraorder
    171  Archaea      130     Superfamily
    172  Archaea      140     Family
    173  Archaea      150     Subfamily
    174  Archaea      160     Tribe
    175  Archaea      170     Subtribe
    176  Archaea      180     Genus
    177  Archaea      190     Subgenus
    178  Archaea      220     Species
    179  Archaea      230     Subspecies

Search by scientific name

    pytaxize.searchbyscientificname(x="Tardigrada")

       combinedname          tsn
    0  Rotaria tardigrada    58274
    1  Notommata tardigrada  58898
    2  Pilargis tardigrada   65562
    3  Tardigrada            155166
    4  Heterotardigrada      155167
    5  Arthrotardigrada      155168
    6  Mesotardigrada        155358
    7  Eutardigrada          155362
    8  Scytodes tardigrada   866744

Get accepted names from tsn

    pytaxize.getacceptednamesfromtsn('208527')

If accepted, returns the same id ...

December 26, 2014 · 3 min · Scott Chamberlain

icanhaz altmetrics

The Lagotto application is a Rails app that collects article-level metrics data for research objects and serves it up via a RESTful API. So far, this application has only been applied to scholarly articles, but will see action on datasets soon. Martin Fenner has led the development of Lagotto. He recently set up a discussion site if you want to chat about it. The application has a nice GUI, and a quite nice RESTful API. ...

December 8, 2014 · 3 min · Scott Chamberlain

Taxonomy data from the web in three languages

Eduard Szöcs and I started developing a taxonomic toolbelt for the R language a while back, which lets you interact with a multitude of taxonomic databases on the web. We have a paper in F1000Research if you want to find out more (see here). I thought it would be fun to rewrite some of taxize in other languages to learn more languages. Ruby and Python made the most sense to try. I did try others (Julia, Node), but gave up on those for now. The goal here isn’t to port taxize to Python and Ruby right now - it’s for me to learn myself some coding. ...

September 27, 2013 · 2 min · Scott Chamberlain