Python, ast, and redbaron

I recently had a use case at work where I wanted to check that file paths given in a Python script actually existed. These paths were in various GitHub repositories, so all I had to do was pull out the paths and check if they exist on GitHub. There were a few catches though. First, I couldn’t simply get any string out of each Python script - they needed to be strings specified by a specific function parameter, and match a regex (e.g., start with ‘abc’). ...
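
As a rough illustration of the kind of extraction described in this excerpt, here is a minimal sketch using only the standard library `ast` module (not redbaron). The function name `read_file`, the keyword `path`, and the `abc` prefix are hypothetical stand-ins, not the names from the actual scripts.

```python
import ast
import re

# Hypothetical names for illustration only: we look for calls to
# `read_file(...)` and keep string literals passed as `path=...`
# that start with "abc".
TARGET_FUNC = "read_file"
TARGET_KWARG = "path"
PATTERN = re.compile(r"^abc")


def extract_paths(source: str) -> list[str]:
    """Collect string literals passed as TARGET_KWARG to TARGET_FUNC."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Handle both `read_file(...)` and `module.read_file(...)`
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != TARGET_FUNC:
            continue
        for kw in node.keywords:
            if (
                kw.arg == TARGET_KWARG
                and isinstance(kw.value, ast.Constant)
                and isinstance(kw.value.value, str)
                and PATTERN.search(kw.value.value)
            ):
                found.append(kw.value.value)
    return found


example = '''
read_file(path="abc/data/file.csv")
read_file(path="xyz/other.csv")
print("abc/not/a/path")
utils.read_file(path="abc/more.txt", mode="r")
'''

print(extract_paths(example))
```

Only the two `abc/...` strings passed as `path` are kept; the string given to `print` is ignored because it is not an argument of the target function.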

April 18, 2023 · 4 min · Scott Chamberlain

CRAN Checks API and Badges

TL;DR

- In 6 months (end of November 2022) the CRAN Checks API https://cranchecks.info/ will be gone
- You can still get badges at https://badges.cranchecks.info
- You can use the new badges like: [![cran checks](https://badges.cranchecks.info/worst/dplyr.svg)](https://cran.r-project.org/web/checks/check_results_dplyr.html)
- Find more details at https://github.com/sckott/cchecksbadges

Sunsetting the CRAN Checks API: If you contribute an R package to CRAN, you may use badges from the CRAN Checks API at https://cranchecks.info/. The CRAN Checks API has been operating since about September 2017 (I think). The API has a number of routes, but really people only use the badges. ...

June 2, 2022 · 2 min · Scott Chamberlain

List comprehension vs. filter vs. key lookup

I was working on a task at work last week, and needed to filter out one instance of a class from a list of class instances. No matter how you do this, speed doesn’t matter too much if you’re doing the operation once or a few times. However, this operation needs to be done about 100K times each time the script runs - so speed definitely does matter in this case. ...
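
To make the three approaches in the title concrete, here is a minimal sketch. The `Item` class, its fields, and the list size are made up for illustration; the actual classes and any timings from the post are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Item:  # hypothetical class, standing in for the real one
    name: str
    value: int


items = [Item(f"item{i}", i) for i in range(1000)]
target = "item500"

# 1. List comprehension: scans the whole list, then takes the first match
match_lc = [x for x in items if x.name == target][0]

# 2. filter: lazy, so next() stops at the first match
match_filter = next(filter(lambda x: x.name == target, items))

# 3. Key lookup: build a dict once, then each lookup is a hash lookup
by_name = {x.name: x for x in items}
match_key = by_name[target]

print(match_lc is match_filter is match_key)
```

The dictionary costs one pass over the list to build, so it should only pay off when the same list is searched many times, as with the roughly 100K lookups per run mentioned above.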

April 18, 2022 · 3 min · Scott Chamberlain

Notes on Python

It’s been interesting switching jobs with respect to programming languages. I used to write 95% R - now I write 95% Python. I have been using Python for many years, but not seriously, and not for pay. I’ve learned a lot in the first 6 months. Some Python things learned: Functions and methods: I used to think functions and methods were the same thing. But during the last 6 months I learned that functions and methods are not the same. Well, they’re not that different. A function outside a class is just called a function while a function inside a class is called a method. They could be exactly the same and do the same thing, but one is outside a class and the other inside a class. ...
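
A small sketch of the function/method distinction described above. The names `greet` and `Greeter` are invented for this example.

```python
import types


def greet(name):
    """A plain function, defined outside any class."""
    return f"Hello, {name}"


class Greeter:
    def greet(self, name):
        """A method: the same logic, defined inside a class."""
        return f"Hello, {name}"


g = Greeter()

# Both do the same thing
same_result = greet("Ada") == g.greet("Ada")

# Python does tell them apart: accessing the method through an instance
# gives a bound method, which passes the instance as `self`.
is_function = isinstance(greet, types.FunctionType)
is_bound_method = isinstance(g.greet, types.MethodType)

# Accessed through the class itself, it is still an ordinary function
is_plain_on_class = isinstance(Greeter.greet, types.FunctionType)

print(same_result, is_function, is_bound_method, is_plain_on_class)
```

The practical difference is the implicit first argument: `g.greet("Ada")` is equivalent to `Greeter.greet(g, "Ada")`.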

February 7, 2022 · 2 min · Scott Chamberlain

Mocking HTTP redirects

You’ve experienced an HTTP redirect (or URL redirect, or URL forwarding) even if you haven’t noticed. We all use browsers (I assume, since you are reading this), either on a phone or laptop/desktop computer. Browsers don’t show all the HTTP requests going on in the background, some of which are redirects. Redirection is used for various reasons, including to prevent broken links when web pages are moved, for privacy protection, to allow multiple domains to refer to a single web page, and more. ...

November 27, 2021 · 4 min · Scott Chamberlain

API client design: how to deal with lots of parameters?

In February this year I wrote about how many parameters functions should have, looking at some other languages, with a detailed look at R. On a related topic … As I work on many R packages that are API clients for various web services, I began wondering: What is the best way to deal with API routes that have a lot of parameters? The general programming wisdom I’ve seen is that a function should have no more than 3-4 parameters (e.g., this long SO thread, or this one). So should one do anything different from a normal function when that function is connecting to a web API route with a lot of parameters? I’ve not found very much spilled ink on this exact topic, but I’ll discuss what I have found. ...

December 21, 2020 · 8 min · Scott Chamberlain

stories behind archived packages

Update on 2021-02-09: I’ve archived 8 more packages. Post below updated. Code is often arranged in packages for any given language. Packages are often cataloged in a package registry of some kind: NPM for node, crates.io for Rust, etc. For R, that registry is either CRAN or Bioconductor (for the most part). CRAN has the concept of an archived package. That is, the namespace for a package (foo) is still in the registry (and cannot be used again), but the package is archived - it no longer gets updated and, I think, checks are no longer performed. ...

September 10, 2020 · 8 min · Scott Chamberlain

taxizedb: an update

taxizedb arose from pain in using taxize when dealing with large amounts of data in a single request or doing a lot of requests of any data size. taxize works with remote data sources on the web, so there are a number of issues that can slow the response down: internet speed, server response speed (was a response already cached or not; or do they even use caching), etc. The idea with taxizedb was to allow users to do the same things as taxize allows, but much faster by accessing the entire database for a data source on their own computer. The previous versions of taxizedb used a variety of different databases (MySQL/MariaDB, PostgreSQL, SQLite), so the technical barrier to entry was pretty high. In the newest version just released, we’ve drastically simplified the database situation, among other things. ...

August 17, 2020 · 4 min · Scott Chamberlain

how many parameters?

Functions can have no parameters, or have a lot of parameters, or somewhere in between. How many parameters is too many? Does it even matter how many parameters there are in a function? There’s AFAIK no “correct” answer to this question. And surely the “best practice” varies among programming languages. What do folks say about this and what should we be doing in R? From other languages: Many of the blog posts and SO posts on this topic cite the book Clean Code by “Uncle Bob”. I’ve not read the book, but it sounds worth a read. ...

February 10, 2020 · 5 min · Scott Chamberlain

finding truffles

The bad thing about making software is that you can sometimes make it easier for someone to shoot themselves in the foot. The good thing about software is that you can make more software to help them not shoot a foot off. The R package vcr, an R port of the Ruby library of the same name, records and plays back HTTP requests. Some HTTP requests can have secrets (e.g., passwords, API keys, etc.) in their requests and/or responses. These secrets can then accidentally end up on the Internet, where bad people may find them. These secrets are sometimes called “truffles”. ...

January 30, 2020 · 3 min · Scott Chamberlain