Working at Fred Hutchinson Cancer Center

Soooo, my last job at Deck was amazing. I loved it. I was doing data engineer stuff there, mostly maintaining infrastructure for data pipelines. Everyone was great and the mission was amazing: helping Democrats win. Yet the company was shut down about a month ago, sending me on another job search, the 3rd since early/mid 2021. I’m super thrilled to have landed a job (Software and Reproducibility Software Developer) at the Fred Hutch Data Science Lab (DASL), headed up by Jeff Leek, working with Sean Kross, Amy Paguirigan, and Monica Gerber, among many other amazing folks....

October 6, 2023 · 1 min · Scott Chamberlain

Python, ast, and redbaron

I recently had a use case at work where I wanted to check that file paths given in a Python script actually existed. These paths were in various GitHub repositories, so all I had to do was pull out the paths and check if they exist on GitHub. There were a few catches though. First, I couldn’t simply get any string out of each Python script - they needed to be strings specified by a specific function parameter, and match a regex (e....
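
A minimal sketch of that kind of extraction, using only the standard-library `ast` module (not redbaron). The parameter name `path`, the regex, and the helper `find_path_strings` are all hypothetical stand-ins, since the excerpt does not show the real ones; checking the paths against GitHub is left out.

```python
import ast
import re

# Hypothetical values: the excerpt names neither the parameter nor the regex.
PARAM_NAME = "path"
PATH_PATTERN = re.compile(r"^[\w./-]+\.(csv|json|txt)$")


def find_path_strings(source, param_name=PARAM_NAME, pattern=PATH_PATTERN):
    """Collect string literals passed as `param_name=...` in any call
    in `source`, keeping only those that match `pattern`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            value = kw.value
            if (
                kw.arg == param_name
                and isinstance(value, ast.Constant)
                and isinstance(value.value, str)
                and pattern.match(value.value)
            ):
                found.append(value.value)
    return found


example = '''
load(path="data/input.csv")
load(path="not a path")
other(name="data/skip.csv")
'''

print(find_path_strings(example))
```

Only the first call contributes a result here: the second string fails the regex, and the third is passed under a different parameter name.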

April 18, 2023 · 4 min · Scott Chamberlain

List comprehension vs. filter vs. key lookup

I was working on a work task last week, and needed to filter out one instance of a class from a list of class instances. No matter how you do this, speed doesn’t matter too much if you’re doing this operation once or a few times. However, this operation needs to be done about 100K times each time the script runs - so speed definitely does matter in this case....
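
A rough sketch of the three approaches named in the title, reading "filter out" as picking the one matching instance. The `Item` class, its `id` field, and the sample data are hypothetical; the excerpt does not show the real class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:  # hypothetical stand-in for the real class
    id: int
    name: str


items = [Item(i, f"item-{i}") for i in range(5)]
target_id = 3

# 1. List comprehension: scans the whole list on every call.
found_comp = [x for x in items if x.id == target_id][0]

# 2. filter(): lazy, so next() stops at the first match.
found_filter = next(filter(lambda x: x.id == target_id, items))

# 3. Key lookup: build a dict once, then each lookup is a hash lookup.
by_id = {x.id: x for x in items}
found_lookup = by_id[target_id]

print(found_comp == found_filter == found_lookup)
```

The dict costs one pass over the list to build, so it only pays off when the same list is queried many times, which is the situation described above. Wrapping each approach in `timeit.timeit` is one way to compare them on real data.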

April 18, 2022 · 3 min · Scott Chamberlain

Notes on Python

It’s been interesting switching jobs with respect to programming languages. I used to write 95% R - now I write 95% Python. I have been using Python for many years, but not seriously, and not for pay. I’ve learned a lot in the first 6 months. Some Python things learned: Functions and methods. I used to think functions and methods were the same thing. But during the last 6 months I learned that functions and methods are not the same....
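
One way to see the difference, as a small sketch with made-up names (`greet`, `Greeter`): a function is a standalone callable, while a method is a function looked up through an instance, which binds that instance as the first argument.

```python
def greet(name):
    """A plain function: every argument is passed explicitly."""
    return f"Hello, {name}"


class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        """Defined like a function, but called through an instance."""
        return f"Hello, {self.name}"


g = Greeter("Ada")

print(type(greet).__name__)          # function
print(type(g.greet).__name__)        # method (bound to g)
print(type(Greeter.greet).__name__)  # function (accessed via the class)

# The bound method fills in `self`; via the class you pass it yourself.
print(g.greet() == Greeter.greet(g) == greet("Ada"))
```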

February 7, 2022 · 2 min · Scott Chamberlain

habanero update: Crossref data from Python

I wrote about Crossref clients nearly two years ago on this blog: Crossref programmatic clients. Since it’s been a while, it seems worth talking again about the many ways to work programmatically with Crossref data - and focus in on the Python client habanero since it has some recent updates. The 3 clients work with the main Crossref API, which lets you do things like search for works by title, author, etc....

October 23, 2017 · 3 min · Scott Chamberlain

Crossref programmatic clients

I gave two talks recently at the annual Crossref meeting, one of which was a somewhat technical overview of programmatic clients for Crossref APIs. Check out the talk here. I talked about the motivation for working with Crossref data by writing code/etc. rather than going the GUI route, then went over the various clients, with brief examples. We (rOpenSci) have been working on the R client rcrossref for a while now, but I’m also working on the Python and Ruby clients for Crossref....

November 30, 2015 · 3 min · Scott Chamberlain

pygbif - GBIF client for Python

I maintain an R client for the GBIF API, at rgbif. Been working on it for a few years, and recently been thinking that there should be a nice low level client for Python as well. I didn’t see one searching GitHub, etc. so I started working on one recently: pygbif. It’s up on PyPI. There’s not much in pygbif yet - I wanted to get something up to start getting some users to more quickly make the library useful to people....

November 12, 2015 · 2 min · Scott Chamberlain

pytaxize - low level ITIS functions

I’ve been working on a Python port of the R package taxize that I maintain. It’s still early days with this Python library, I’d love to know what people think. For example, I’m giving back Pandas DataFrames from most functions. Does this make sense? Installation: sudo pip install git+git://github.com/sckott/pytaxize.git#egg=pytaxize Or git clone the repo down, and python setup.py build && python setup.py install. Load library: import pytaxize. ITIS ping: pytaxize.itis_ping() 'This is the ITIS Web Service, providing access to the data behind www....

December 26, 2014 · 3 min · Scott Chamberlain

icanhaz altmetrics

The Lagotto application is a Rails app that collects article-level metrics data for research objects and serves it up via a RESTful API. So far, this application has only been applied to scholarly articles, but will see action on datasets soon. Martin Fenner has led the development of Lagotto. He recently set up a discussion site if you want to chat about it. The application has a nice GUI, and a quite nice RESTful API....

December 8, 2014 · 3 min · Scott Chamberlain

Taxonomy data from the web in three languages

Eduard Szöcs and I started developing a taxonomic toolbelt for the R language a while back, which lets you interact with a multitude of taxonomic databases on the web. We have a paper in F1000Research if you want to find out more (see here). I thought it would be fun to rewrite some of taxize in other languages to learn more languages. Ruby and Python made the most sense to try....

September 27, 2013 · 2 min · Scott Chamberlain