Recology


USDA plants database API in R

The USDA maintains a database of plant information, some of it trait data, some of it life history. Check it out at http://plants.usda.gov/java/

They’ve been talking about releasing an API for a long time, but have not done so.

Thus, since at least some version of their data is available on the public web, I’ve created a RESTful API for the data at https://plantsdb.xyz.

Check out the API, and open issues for bugs/feature requests in the github repo.

The following is an example using it from R, but you can use it from anywhere, the command line, Ruby, Python, a browser, whatevs.
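For example, the same search route used below works from the command line:

curl 'https://plantsdb.xyz/search?limit=1' | jq .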

Here, we’ll use request, a higher level http client for R that I’ve been working on. A small quirk with request is that when piping, you have to assign the output of the request to an object before you can do any further manipulation. But that’s probably good for avoiding too long pipe chains.
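That is, something like this (a sketch, using the root object defined below):

res <- root %>% api_path(search)  # assign the result first
res$count                         # then manipulate it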

Note that I’ve set tibble.max_extra_cols = 15 so as not to print the many columns that are returned, for blog post brevity. When you run the examples below you’ll get more columns.
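That option is set like so:

options(tibble.max_extra_cols = 15)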

Install from CRAN

install.packages("request")

There is a small improvement in the dev version of request that makes any data.frame’s into tibble’s (which the examples below use). To get that, install from GitHub:

devtools::install_github("sckott/request")
library('request')
library('tibble')

Heartbeat

The simplest call to the API is to the /heartbeat route, which just lists the available routes.

Set the base url we’ll use throughout the work below

root <- api("https://plantsdb.xyz")
root %>% api_path(heartbeat)
#> $routes
#> [1] "/search (HEAD, GET)" "/heartbeat"

Okay, so there are just two routes, /search and /heartbeat.

The search route supports the following parameters (a combined example follows the list):

  • fields, e.g., fields='Genus,Species' (default: all fields returned)
  • limit, e.g., limit=10 (default: 10)
  • offset, e.g., offset=1 (default: 0)
  • search on any fields in the output, e.g., Genus=Pinus or Species=annua
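Combining several of these in one call might look like this (a sketch; each parameter is also demonstrated on its own below):

root %>%
  api_path(search) %>%
  api_query(Genus = Pinus, fields = "Genus,Species", limit = 3, offset = 5)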

Let’s first not pass any params

root %>% api_path(search)
#> $count
#> [1] 92171
#> 
#> $returned
#> [1] 10
#> 
#> $citation
#> [1] "USDA, NRCS. 2016. The PLANTS Database (http://plants.usda.gov, 12 July 2016). National Plant Data Team, Greensboro, NC 27401-4901 USA."
#> 
#> $terms
#> [1] "Our plant information, including the distribution maps, lists, and text, is not copyrighted and is free for any use."
#> 
#> $data
#> # A tibble: 10 × 134
#>       id Symbol Accepted_Symbol_x Synonym_Symbol_x
#> *  <int>  <chr>             <chr>            <chr>
#> 1      1   ABAB              ABAB                 
#> 2      2  ABAB2             ABPR3            ABAB2
#> 3      3  ABAB3              ABTH            ABAB3
#> 4      4 ABAB70            ABAB70                 
#> 5      5   ABAC             ABUMB             ABAC
#> 6      6   ABAL              ABAL                 
#> 7      7  ABAL2             ABUMU            ABAL2
#> 8      8  ABAL3             ABAL3                 
#> 9      9   ABAM              ABAM                 
#> 10    10  ABAM2             ABAM2                 
#> # ... with 130 more variables: Scientific_Name_x <chr>,
#> #   Hybrid_Genus_Indicator <chr>, Hybrid_Species_Indicator <chr>,
#> #   Species <chr>, Subspecies_Prefix <chr>,
#> #   Hybrid_Subspecies_Indicator <chr>, Subspecies <chr>,
#> #   Variety_Prefix <chr>, Hybrid_Variety_Indicator <chr>, Variety <chr>,
#> #   Subvariety_Prefix <chr>, Subvariety <chr>, Forma_Prefix <chr>,
#> #   Forma <chr>, Genera_Binomial_Author <chr>, ...
#> 
#> $error
#> NULL

You get slots:

  • count: number of results found
  • returned: number of results returned
  • citation: suggested citation, from USDA
  • terms: terms of use, from USDA
  • data: the results
  • error: if an error occurred, you’ll see the message here

Note that if any data.frame’s are found, we make them into tibble’s, nicely formatted data.frame’s that make it easy to deal with large data.
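Since data comes back as a tibble, you can manipulate it directly (a sketch, using dplyr):

res <- root %>% api_path(search)
res$data %>% dplyr::select(Symbol, Genus, Species)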

Pagination

Limit the number of results

root %>%
  api_path(search) %>%
  api_query(limit = 5)
#> $count
#> [1] 92171
#> 
#> $returned
#> [1] 5
#> 
#> $citation
#> [1] "USDA, NRCS. 2016. The PLANTS Database (http://plants.usda.gov, 12 July 2016). National Plant Data Team, Greensboro, NC 27401-4901 USA."
#> 
#> $terms
#> [1] "Our plant information, including the distribution maps, lists, and text, is not copyrighted and is free for any use."
#> 
#> $data
#> # A tibble: 5 × 134
#>      id Symbol Accepted_Symbol_x Synonym_Symbol_x
#> * <int>  <chr>             <chr>            <chr>
#> 1     1   ABAB              ABAB                 
#> 2     2  ABAB2             ABPR3            ABAB2
#> 3     3  ABAB3              ABTH            ABAB3
#> 4     4 ABAB70            ABAB70                 
#> 5     5   ABAC             ABUMB             ABAC
#> # ... with 130 more variables: Scientific_Name_x <chr>,
#> #   Hybrid_Genus_Indicator <chr>, Hybrid_Species_Indicator <chr>,
#> #   Species <chr>, Subspecies_Prefix <chr>,
#> #   Hybrid_Subspecies_Indicator <chr>, Subspecies <chr>,
#> #   Variety_Prefix <chr>, Hybrid_Variety_Indicator <chr>, Variety <chr>,
#> #   Subvariety_Prefix <chr>, Subvariety <chr>, Forma_Prefix <chr>,
#> #   Forma <chr>, Genera_Binomial_Author <chr>, ...
#> 
#> $error
#> NULL

Change the record to start at

root %>%
  api_path(search) %>%
  api_query(limit = 5, offset = 10)
#> $count
#> [1] 92161
#> 
#> $returned
#> [1] 5
#> 
#> $citation
#> [1] "USDA, NRCS. 2016. The PLANTS Database (http://plants.usda.gov, 12 July 2016). National Plant Data Team, Greensboro, NC 27401-4901 USA."
#> 
#> $terms
#> [1] "Our plant information, including the distribution maps, lists, and text, is not copyrighted and is free for any use."
#> 
#> $data
#> # A tibble: 5 × 134
#>      id Symbol Accepted_Symbol_x Synonym_Symbol_x
#> * <int>  <chr>             <chr>            <chr>
#> 1    11  ABAM3             ABAM3                 
#> 2    12  ABAM4              NAAM            ABAM4
#> 3    13  ABAM5              ABAB            ABAM5
#> 4    14   ABAN              ABAN                 
#> 5    15  ABANA              ABAN            ABANA
#> # ... with 130 more variables: Scientific_Name_x <chr>,
#> #   Hybrid_Genus_Indicator <chr>, Hybrid_Species_Indicator <chr>,
#> #   Species <chr>, Subspecies_Prefix <chr>,
#> #   Hybrid_Subspecies_Indicator <chr>, Subspecies <chr>,
#> #   Variety_Prefix <chr>, Hybrid_Variety_Indicator <chr>, Variety <chr>,
#> #   Subvariety_Prefix <chr>, Subvariety <chr>, Forma_Prefix <chr>,
#> #   Forma <chr>, Genera_Binomial_Author <chr>, ...
#> 
#> $error
#> NULL
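To page through more than the default 10 records, you could bump offset in a loop and combine the pages (a sketch, gathering 20 records in pages of 5):

out <- list()
for (i in seq(0, 15, by = 5)) {
  res <- root %>%
    api_path(search) %>%
    api_query(limit = 5, offset = i)
  out[[length(out) + 1]] <- res$data
}
alldat <- dplyr::bind_rows(out)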

Return fields

You can say which fields you want returned, useful when you just want a subset of fields.

root %>%
  api_path(search) %>%
  api_query(fields = 'Genus,Species,Symbol')
#> $count
#> [1] 92171
#> 
#> $returned
#> [1] 10
#> 
#> $citation
#> [1] "USDA, NRCS. 2016. The PLANTS Database (http://plants.usda.gov, 12 July 2016). National Plant Data Team, Greensboro, NC 27401-4901 USA."
#> 
#> $terms
#> [1] "Our plant information, including the distribution maps, lists, and text, is not copyrighted and is free for any use."
#> 
#> $data
#> # A tibble: 10 × 3
#>    Symbol     Species       Genus
#> *   <chr>       <chr>       <chr>
#> 1    ABAB abutiloides    Abutilon
#> 2   ABAB2       abrus       Abrus
#> 3   ABAB3    abutilon    Abutilon
#> 4  ABAB70    abietina Abietinella
#> 5    ABAC   acutalata     Abronia
#> 6    ABAL      alpina     Abronia
#> 7   ABAL2        alba     Abronia
#> 8   ABAL3        alba       Abies
#> 9    ABAM    amabilis       Abies
#> 10  ABAM2     ameliae     Abronia
#> 
#> $error
#> NULL

Query

You can query on individual fields. Note that api_query() uses non-standard evaluation, so values like Pinus go unquoted.

root %>%
  api_path(search) %>%
  api_query(Genus = Pinus, fields = "Genus,Species")
#> $count
#> [1] 185
#> 
#> $returned
#> [1] 10
#> 
#> $citation
#> [1] "USDA, NRCS. 2016. The PLANTS Database (http://plants.usda.gov, 12 July 2016). National Plant Data Team, Greensboro, NC 27401-4901 USA."
#> 
#> $terms
#> [1] "Our plant information, including the distribution maps, lists, and text, is not copyrighted and is free for any use."
#> 
#> $data
#> # A tibble: 10 × 2
#>       Species Genus
#> *       <chr> <chr>
#> 1  albicaulis Pinus
#> 2    apacheca Pinus
#> 3    aristata Pinus
#> 4   arizonica Pinus
#> 5    armandii Pinus
#> 6   arizonica Pinus
#> 7    aristata Pinus
#> 8   arizonica Pinus
#> 9   arizonica Pinus
#> 10  attenuata Pinus
#> 
#> $error
#> NULL

Another query example

root %>%
  api_path(search) %>%
  api_query(Species = annua, fields = "Genus,Species")
#> $count
#> [1] 30
#> 
#> $returned
#> [1] 10
#> 
#> $citation
#> [1] "USDA, NRCS. 2016. The PLANTS Database (http://plants.usda.gov, 12 July 2016). National Plant Data Team, Greensboro, NC 27401-4901 USA."
#> 
#> $terms
#> [1] "Our plant information, including the distribution maps, lists, and text, is not copyrighted and is free for any use."
#> 
#> $data
#> # A tibble: 10 × 2
#>    Species         Genus
#> *    <chr>         <chr>
#> 1    annua        Adonis
#> 2    annua     Artemisia
#> 3    annua   Bulbostylis
#> 4    annua    Castilleja
#> 5    annua   Craniolaria
#> 6    annua Dimorphotheca
#> 7    annua       Drosera
#> 8    annua    Eleocharis
#> 9    annua  Fimbristylis
#> 10   annua    Heliomeris
#> 
#> $error
#> NULL

And one more example. Here we’re interested in finding taxa that are perennials

root %>%
  api_path(search) %>%
  api_query(Duration = Perennial, fields = "Genus,Species,Symbol,Duration")
#> $count
#> [1] 25296
#> 
#> $returned
#> [1] 10
#> 
#> $citation
#> [1] "USDA, NRCS. 2016. The PLANTS Database (http://plants.usda.gov, 12 July 2016). National Plant Data Team, Greensboro, NC 27401-4901 USA."
#> 
#> $terms
#> [1] "Our plant information, including the distribution maps, lists, and text, is not copyrighted and is free for any use."
#> 
#> $data
#> # A tibble: 10 × 4
#>    Symbol     Species  Duration    Genus
#> *   <chr>       <chr>     <chr>    <chr>
#> 1    ABAB abutiloides Perennial Abutilon
#> 2    ABAL      alpina Perennial  Abronia
#> 3   ABAL3        alba Perennial    Abies
#> 4    ABAM    amabilis Perennial    Abies
#> 5   ABAM2     ameliae Perennial  Abronia
#> 6   ABAM3   ammophila Perennial  Abronia
#> 7    ABAR   argillosa Perennial  Abronia
#> 8    ABAU     auritum Perennial Abutilon
#> 9    ABBA    balsamea Perennial    Abies
#> 10  ABBAB    balsamea Perennial    Abies
#> 
#> $error
#> NULL

gbids - GenBank IDs API is back up!


Back in March this year I wrote a post about a new API for working with GenBank IDs.

I had to take the API down because it was too expensive to keep up. Expensive because the dump of the data is very large (3.8 GB compressed), and I need disk space on the server to uncompress that to, I think, about 18 GB, then load it into MySQL, which takes maybe another 30 GB or so. Anyway, it’s not expensive because of high traffic - although I wish that were the case - but because of needing lots of disk space.

I was fortunate to recently receive some Amazon Cloud Credits for Research. The credits expire in 1 year. With these credits, I’ve put the GBIDS API back up. In the next year I’m hoping to gain user traction suggesting that it’s useful to enough people to keep maintaining - in which case I’ll seek ways to fund it.

But that means I need people to use it! So please do give it a try. Let me know what could be better; what could be faster; what API routes/features/etc. you’d like to see.

Plans

Plans for the future of the GBIDS API:

  • Auto-update the Genbank data. This is quite complicated since the dump is so large. I can either keep an EC2-attached disk large enough to do the dump download/expansion/load/etc., or spin up a new instance each Sunday when they do their data release, do the SQL load, make a dump, then shuttle the SQL dump to the running instance and load in the new data from the dump. I haven’t got this bit running yet, so data is from Aug 7, 2016.
  • Add taxonomic IDs. Genbank also dumps their taxonomic IDs. I think it should be possible to get taxonomic IDs into the API, so that users can map accession numbers to taxon IDs and vice versa.
  • Performance: as anyone would want, I want to continually improve performance. I’ll watch out for things I can do, but also let me know what seems too slow.

Try it

Get 5 accession numbers

curl 'https://gbids.xyz/acc?limit=5' | jq .
#> {
#>   "matched": 692006925,
#>   "returned": 5,
#>   "data": [
#>     "A00002",
#>     "A00003",
#>     "X17276",
#>     "X60065",
#>     "CAA42669"
#>   ],
#>   "error": null
#> }
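The same call works from R too, e.g. with the request package (a sketch):

library('request')
gb <- api('https://gbids.xyz')
res <- gb %>% api_path(acc) %>% api_query(limit = 5)
res$data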

Request to match accession identifiers, some of which exist, while some do not

curl 'https://gbids.xyz/acc/AACY024124486,AACY024124483,asdfd,asdf,AACY024124476' | jq .
#> {
#>   "matched": 3,
#>   "returned": 5,
#>   "data": {
#>     "AACY024124486": true,
#>     "AACY024124483": true,
#>     "asdfd": false,
#>     "asdf": false,
#>     "AACY024124476": true
#>   },
#>   "error": null
#> }
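Or from R, for example with jsonlite:

ids <- 'AACY024124486,AACY024124483,asdfd,asdf,AACY024124476'
jsonlite::fromJSON(paste0('https://gbids.xyz/acc/', ids))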

There are many more examples in the API docs

nonoyes - text analysis of Reply All podcast transcripts

Reply All is a great podcast. I’ve been wanting to learn some text analysis tools, and transcripts from the podcast are on their site.

I took some approaches outlined in this vignette from the tidytext package, and used the tokenizers package and some of the tidyverse.

Code on github at sckott/nonoyes

Also check out the html version

Setup

Load deps

library("httr")
library("xml2")
library("stringi")
library("dplyr")
library("ggplot2")
library("tokenizers")
library("tidytext")
library("tidyr")

Source helper functions (funs.R, in the sckott/nonoyes repo, defines the helpers used below: get_urls(), transcript_fetch(), transcript_parse(), count_word(), strextract(), try_tokenize(), and sent_cont_plot())

source("funs.R")

Set the base url

ra_base <- "https://gimletmedia.com/show/reply-all/episodes"

URLs

Make all urls for each page of episodes

urls <- c(ra_base, file.path(ra_base, "page", 2:8))

Get urls for each episode

res <- lapply(urls, get_urls)

Remove those that are rebroadcasts, updates, or revisited

res <- grep("rebroadcast|update|revisited", unlist(res), value = TRUE, invert = TRUE)

Episode names

Give numbers to some episodes that don’t have them

epnames <- sub("/$", "", sub("https://gimletmedia.com/episode/", "", res))
epnames <- sub("the-anxiety-box", "8-the-anxiety-box", epnames)
epnames <- sub("french-connection", "10-french-connection", epnames)
epnames <- sub("ive-killed-people-and-i-have-hostages", "15-ive-killed-people-and-i-have-hostages", epnames)
epnames <- sub("6-this-proves-everything", "75-this-proves-everything", epnames)
epnames <- sub("zardulu", "56-zardulu", epnames)

Transcripts

Get transcripts

txts <- lapply(res, transcript_fetch, sleep = 1)

Parse transcripts

txtsp <- lapply(txts, transcript_parse)

Summary word usage

Summarise data for each transcript

dat <- stats::setNames(lapply(txtsp, function(m) {
  bind_rows(lapply(m, function(v) {
    tmp <- unname(vapply(v, nchar, 1))
    data_frame(
      n = length(tmp),
      mean = mean(tmp),
      n_laugh = count_word(v, "laugh"),
      n_groan = count_word(v, "groan")
    )
  }), .id = "name")
}), epnames)

Bind data together into a single data.frame, then filter and summarise

data <- bind_rows(dat, .id = "episode") %>%
  filter(!is.na(episode)) %>%
  filter(grepl("^PJ$|^ALEX GOLDMAN$", name)) %>%
  mutate(ep_no = as.numeric(strextract(episode, "^[0-9]+"))) %>%
  group_by(ep_no) %>%
  mutate(nrow = NROW(ep_no)) %>%
  ungroup() %>%
  filter(nrow == 2)
data
#> # A tibble: 114 × 8
#>                 episode         name     n      mean n_laugh n_groan ep_no
#>                   <chr>        <chr> <int>     <dbl>   <int>   <int> <dbl>
#> 1            73-sandbox           PJ    89 130.65169       9       0    73
#> 2            73-sandbox ALEX GOLDMAN    25  44.00000       1       1    73
#> 3       72-dead-is-paul           PJ   137  67.77372      17       0    72
#> 4       72-dead-is-paul ALEX GOLDMAN    90  61.82222       8       0    72
#> 5  71-the-picture-taker           PJ    74  77.70270       3       0    71
#> 6  71-the-picture-taker ALEX GOLDMAN    93 105.94624       6       0    71
#> 7        69-disappeared           PJ    72  76.50000       2       0    69
#> 8        69-disappeared ALEX GOLDMAN    50 135.90000       5       0    69
#> 9      68-vampire-rules           PJ   142  88.00704       6       0    68
#> 10     68-vampire-rules ALEX GOLDMAN   117  73.16239      13       0    68
#> # ... with 104 more rows, and 1 more variables: nrow <int>

Number of words - seems PJ talks more, but I didn’t do a quantitative comparison

ggplot(data, aes(ep_no, n, colour = name)) +
  geom_point(size = 3, alpha = 0.5) +
  geom_line(aes(group = ep_no), colour = "black") +
  scale_color_discrete(labels = c('Alex', 'PJ'))

Laughs per episode - take home: PJ laughs a lot

ggplot(data, aes(ep_no, n_laugh, colour = name)) +
  geom_point(size = 3, alpha = 0.5) +
  geom_line(aes(group = ep_no), colour = "black") +
  scale_color_discrete(labels = c('Alex', 'PJ'))

Sentiment

Drop episodes with no parsed transcript

zero <- which(vapply(txtsp, length, 1) == 0)
txtsp_ <- Filter(function(x) length(x) != 0, txtsp)

Tokenize words, and create a data_frame

wordz <- stats::setNames(
  lapply(txtsp_, function(z) {
    bind_rows(
      if (is.null(try_tokenize(z$`ALEX GOLDMAN`))) {
        data_frame()
      } else {
        data_frame(
          name = "Alex",
          word = try_tokenize(z$`ALEX GOLDMAN`)
        )
      },
      if (is.null(try_tokenize(z$PJ))) {
        data_frame()
      } else {
        data_frame(
          name = "PJ",
          word = try_tokenize(z$PJ)
        )
      }
    )
  }), epnames[-zero])

Combine into a single data_frame

(wordz_df <- bind_rows(wordz, .id = "episode"))
#> # A tibble: 104,713 × 3
#>       episode  name      word
#>         <chr> <chr>     <chr>
#> 1  73-sandbox  Alex      alex
#> 2  73-sandbox  Alex   goldman
#> 3  73-sandbox  Alex         i
#> 4  73-sandbox  Alex generally
#> 5  73-sandbox  Alex     don’t
#> 6  73-sandbox  Alex      alex
#> 7  73-sandbox  Alex    really
#> 8  73-sandbox  Alex      alex
#> 9  73-sandbox  Alex    groans
#> 10 73-sandbox  Alex        so
#> # ... with 104,703 more rows

Calculate sentiment using tidytext

bing <- sentiments %>%
  filter(lexicon == "bing") %>%
  select(-score)
sent <- wordz_df %>%
  inner_join(bing) %>%
  count(name, episode, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>%
  ungroup() %>%
  filter(!is.na(episode)) %>%
  complete(episode, name) %>%
  mutate(ep_no = as.numeric(strextract(episode, "^[0-9]+")))
sent
#> # A tibble: 148 × 6
#>                                        episode  name negative positive
#>                                          <chr> <chr>    <dbl>    <dbl>
#> 1  1-an-app-sends-a-stranger-to-say-i-love-you  Alex       19       30
#> 2  1-an-app-sends-a-stranger-to-say-i-love-you    PJ       14       14
#> 3                         10-french-connection  Alex       15       32
#> 4                         10-french-connection    PJ       16       36
#> 5     11-did-errol-morris-brother-invent-email  Alex       NA       NA
#> 6     11-did-errol-morris-brother-invent-email    PJ       25       30
#> 7                           12-backend-trouble  Alex       20       15
#> 8                           12-backend-trouble    PJ       40       59
#> 9                              13-love-is-lies  Alex       NA       NA
#> 10                             13-love-is-lies    PJ       45       64
#> # ... with 138 more rows, and 2 more variables: sentiment <dbl>,
#> #   ep_no <dbl>

Plot names separately

ggplot(sent, aes(ep_no, sentiment, fill = name)) +
  geom_bar(stat = "identity") +
  facet_wrap(~name, ncol = 2, scales = "free_x")

Compare for each episode

ggplot(sent, aes(ep_no, sentiment, fill = name)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.5), width = 0.6)

Most common positive and negative words

Clearly, the word like is rarely used as a positive word meaning, e.g., that they like something, but rather in the colloquial like, totally sense. So it’s removed.
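One way to drop it before plotting (a sketch; the sent_cont_plot() helper in funs.R presumably does something like this):

wordz_df <- wordz_df %>% filter(word != 'like')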

Alex

sent_cont_plot(wordz_df, "Alex")

PJ

sent_cont_plot(wordz_df, "PJ")