Notes on Python

It’s been interesting switching jobs with respect to programming languages. I used to write 95% R - now I write 95% Python. I have been using Python for many years, but not seriously or getting paid either. I’ve learned alot in the first 6 months. Some Python things learned: Functions and methods I used to think functions and methods were the same thing. But during the last 6 months I learned that functions and methods are not the same. Well, they’re not that different. A function outside a class is just called a function while a function inside a class is called a method. They could be exactly the same and do the same thing, but one is outside a class and the other inside a class. ...

February 7, 2022 · 2 min · Scott Chamberlain

Mocking HTTP redirects

You’ve experienced an HTTP redirect (or URL redirect, or URL forwarding) even if you haven’t noticed. We all use browsers (I assume, since you are reading this), either on a phone or laptop/desktop computer. Browsers don’t show all the HTTP requests going on in the background, some of which are redirects. Redirection is used for various reasons, including to prevent broken links when web pages are moved, for privacy protection, to allow multiple domains to refer to a single web page, and more. ...

November 27, 2021 · 4 min · Scott Chamberlain

API client design: how to deal with lots of parameters?

In February this year I wroute about how many parameters functions should have, looking at some other languages, with a detailed look at R. On a related topic … As I work on many R packages that are API clients for various web services, I began wondering: What is the best way to deal with API routes that have a lot of parameters? The general programming wisdom I’ve seen is that a function should have no more than 3-4 parameters (e.g., this long SO thread, or this one). So should one do anything different from a normal function when that function is connecting to a web API route with a lot of parameters? I’ve not found very much spilled ink on this exact topic, but I’ll discuss what I have found. ...

December 21, 2020 · 8 min · Scott Chamberlain

stories behind archived packages

Update on 2021-02-09: I’ve archived 8 more packages. Post below updated Code is often arranged in packages for any given language. Packages are often cataloged in a package registry of some kind: NPM for node, crates.io for Rust, etc. For R, that registry is either CRAN or Bioconductor (for the most part). CRAN has the concept of an archived package. That is, the namespace for a package (foo) is still in the registry (and can not be used again), but the package is archived - no longer gets updated and checks I think are no longer performed. ...

September 10, 2020 · 8 min · Scott Chamberlain

taxizedb: an update

taxizedb arose from pain in using taxize when dealing with large amounts of data in a single request or doing a lot of requests of any data size. taxize works with remote data sources on the web, so there’s a number of issues that can slow the response down: internet speed, server response speed (was a response already cached or not; or do they even use caching), etc. The idea with taxizedb was to allow users to do the same things as taxize allows, but much faster by accessing the entire database for a data source on their own computer. The previous versions of taxizedb used a variety of different databases (MySQL/MariaDB, PostgreSQL, SQLite), so the technical barrier to entry was pretty high. In the newest version just released, we’ve drastically simplified the database situation, among other things. ...

August 17, 2020 · 4 min · Scott Chamberlain

how many parameters?

Functions can have no parameters, or have a lot of parameters, or somewhere in between. How many parameters is too many? Does it even matter how many parameters there are in a function? There’s AFAIK no “correct” answer to this question. And surely the “best practice” varies among programming languages. What do folks say about this and what should we be doing in R? From other languages Many of the blog posts and SO posts on this topic cite the book Clean Code by “Uncle Bob”. I’ve not read the book, but it sounds worth a read. ...

February 10, 2020 · 5 min · Scott Chamberlain

finding truffles

The bad thing about making software is that you can sometimes make it easier for someone to shoot themselves in the foot. The good thing about software is that you can make more software to help them not shoot a foot off. The R package vcr, an R port of the Ruby library of the same name, records and plays back HTTP requests. Some HTTP requests can have secrets (e.g., passwords, API keys, etc.) in their requests and/or responses. These secrets can then accidentally end up on the Internet, where bad people may find them. These secrets are sometimes called “truffles”. ...

January 30, 2020 · 3 min · Scott Chamberlain

text mining, apis, and parsing api logs

Acquiring full text articles fulltext is an R package I maintain to obtain full text versions of research articles for text mining. It’s a hard problem, with a spaghetti web of code. One of the hard problems is figuring out what the URL is for the full text version of an article. Publishers do not have consistent URL patterns through time, and so you can not set rules once and never revisit them. ...

March 21, 2019 · 7 min · Scott Chamberlain

Exceptions in control flow in R

I was listening to a Bike Shed podcast episode 189, “It’s Gonna Work, Definitely, No Problems Whatsoever”, and starting at 27:44 there was a conversation about exception handling. Specifically it was about exception handling in control flow when doing web API requests. This topic piqued my interest straight away as I do a lot of API stuff (making and wrapping). The part of the conversation that I want to address is their conclusion that exceptions in control flow are an anti-pattern. Seems this is a general pattern in programming languages, e.g., this SO thread. But on the contrary there are some languages in which exceptions in control flow are considered normal behavior; e.g., Python (this, this). ...

March 4, 2019 · 9 min · Scott Chamberlain

Notes on porting Ruby to R

In doing a number of ports of Ruby gems to R (vcr, webmockr), I’ve noticed a few differences between the languages that are fun to dive into, at least for me. monkey patching Ruby has a nice thing where you can “monkey patch” classes/methods/etc. in other Ruby libraries. For example, lets say you have Ruby gems foo and bar. If foo has a method hello, you can override the hello method in foo with one from bar. AFAICT this is acceptable in gems on Rubygems.org and in general in the community. ...

February 19, 2019 · 4 min · Scott Chamberlain