List comprehension vs. filter vs. key lookup

I was working on a work task last week, and needed to filter out one instance of a class from a list of class instances. No matter how you do this speed doesn’t matter too much if you’re doing this operation once or a few times. However, I this operation needs to be done about 100K times each time the script runs - so speed definitely does matter in this case....

April 18, 2022 · 3 min · Scott Chamberlain

Notes on Python

It’s been interesting switching jobs with respect to programming languages. I used to write 95% R - now I write 95% Python. I have been using Python for many years, but not seriously or getting paid either. I’ve learned alot in the first 6 months. Some Python things learned: Functions and methods I used to think functions and methods were the same thing. But during the last 6 months I learned that functions and methods are not the same....

February 7, 2022 · 2 min · Scott Chamberlain

Mocking HTTP redirects

You’ve experienced an HTTP redirect (or URL redirect, or URL forwarding) even if you haven’t noticed. We all use browsers (I assume, since you are reading this), either on a phone or laptop/desktop computer. Browsers don’t show all the HTTP requests going on in the background, some of which are redirects. Redirection is used for various reasons, including to prevent broken links when web pages are moved, for privacy protection, to allow multiple domains to refer to a single web page, and more....

November 27, 2021 · 4 min · Scott Chamberlain

API client design: how to deal with lots of parameters?

In February this year I wroute about how many parameters functions should have, looking at some other languages, with a detailed look at R. On a related topic … As I work on many R packages that are API clients for various web services, I began wondering: What is the best way to deal with API routes that have a lot of parameters? The general programming wisdom I’ve seen is that a function should have no more than 3-4 parameters (e....

December 21, 2020 · 8 min · Scott Chamberlain

stories behind archived packages

Update on 2021-02-09: I’ve archived 8 more packages. Post below updated Code is often arranged in packages for any given language. Packages are often cataloged in a package registry of some kind: NPM for node, crates.io for Rust, etc. For R, that registry is either CRAN or Bioconductor (for the most part). CRAN has the concept of an archived package. That is, the namespace for a package (foo) is still in the registry (and can not be used again), but the package is archived - no longer gets updated and checks I think are no longer performed....

September 10, 2020 · 8 min · Scott Chamberlain

taxizedb: an update

taxizedb arose from pain in using taxize when dealing with large amounts of data in a single request or doing a lot of requests of any data size. taxize works with remote data sources on the web, so there’s a number of issues that can slow the response down: internet speed, server response speed (was a response already cached or not; or do they even use caching), etc. The idea with taxizedb was to allow users to do the same things as taxize allows, but much faster by accessing the entire database for a data source on their own computer....

August 17, 2020 · 4 min · Scott Chamberlain

how many parameters?

Functions can have no parameters, or have a lot of parameters, or somewhere in between. How many parameters is too many? Does it even matter how many parameters there are in a function? There’s AFAIK no “correct” answer to this question. And surely the “best practice” varies among programming languages. What do folks say about this and what should we be doing in R? From other languages Many of the blog posts and SO posts on this topic cite the book Clean Code by “Uncle Bob”....

February 10, 2020 · 5 min · Scott Chamberlain

finding truffles

The bad thing about making software is that you can sometimes make it easier for someone to shoot themselves in the foot. The good thing about software is that you can make more software to help them not shoot a foot off. The R package vcr, an R port of the Ruby library of the same name, records and plays back HTTP requests. Some HTTP requests can have secrets (e.g., passwords, API keys, etc....

January 30, 2020 · 3 min · Scott Chamberlain

text mining, apis, and parsing api logs

Acquiring full text articles fulltext is an R package I maintain to obtain full text versions of research articles for text mining. It’s a hard problem, with a spaghetti web of code. One of the hard problems is figuring out what the URL is for the full text version of an article. Publishers do not have consistent URL patterns through time, and so you can not set rules once and never revisit them....

March 21, 2019 · 7 min · Scott Chamberlain

Exceptions in control flow in R

I was listening to a Bike Shed podcast episode 189, “It’s Gonna Work, Definitely, No Problems Whatsoever”, and starting at 27:44 there was a conversation about exception handling. Specifically it was about exception handling in control flow when doing web API requests. This topic piqued my interest straight away as I do a lot of API stuff (making and wrapping). The part of the conversation that I want to address is their conclusion that exceptions in control flow are an anti-pattern....

March 4, 2019 · 9 min · Scott Chamberlain