scrubr is an R library for cleaning species occurrence records. It’s general purpose, and has the following approach:
- We think a piping workflow (%>%) makes code easier to build up and easier to understand. However, you don’t have to use pipes in this package.
- All inputs and outputs are data.frames, which makes the above point easier.
- Records trimmed off by the various filters are retained as attributes, so they can still be accessed for later inspection but don’t get in the way of the data.frame that gets modified for downstream use.
- User interface vs. speed: this is the kind of package that surely can get faster. However, we’re focusing on the UI first and will make speed improvements down the road.
- Since occurrence record datasets should all have columns with lat/long information, we automatically look for those columns for you. If identified, we use them, but you can supply lat/long column names manually as well.

We have many packages that fetch species occurrence records from GBIF, iNaturalist, VertNet, iDigBio, Ecoengine, and more. scrubr fills a crucial missing niche, as likely all uses of occurrence data require cleaning of some kind. When using GBIF data via rgbif, that package has some utilities for cleaning data based on the issues returned with GBIF data; scrubr is a companion to do the rest of the cleaning.
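
To make the design points above concrete, here is a minimal base R sketch of two of the ideas: looking up lat/long columns automatically, and keeping removed records as an attribute on the returned data.frame. It is an illustration only, not scrubr's implementation. The function names `guess_latlon()` and `drop_impossible()`, the attribute name `dropped_impossible`, the list of candidate column names, and the example data are all assumptions made for this sketch.

```r
# Hypothetical helper (not part of scrubr): guess which columns hold
# latitude and longitude from a small set of common names.
guess_latlon <- function(df) {
  nms <- tolower(names(df))
  lat <- names(df)[nms %in% c("latitude", "lat", "decimallatitude")]
  lon <- names(df)[nms %in% c("longitude", "lon", "long", "lng", "decimallongitude")]
  if (length(lat) != 1 || length(lon) != 1) {
    stop("could not identify lat/long columns; supply them manually")
  }
  list(lat = lat, lon = lon)
}

# Hypothetical filter (not part of scrubr): drop records whose coordinates
# are missing or out of range, and keep the dropped rows as an attribute.
drop_impossible <- function(df, lat = NULL, lon = NULL) {
  # Fall back to automatic lookup when column names are not supplied
  if (is.null(lat) || is.null(lon)) {
    cols <- guess_latlon(df)
    lat <- cols$lat
    lon <- cols$lon
  }
  ok <- !is.na(df[[lat]]) & !is.na(df[[lon]]) &
    abs(df[[lat]]) <= 90 & abs(df[[lon]]) <= 180
  kept <- df[ok, , drop = FALSE]
  # Removed rows stay available for inspection without cluttering the result
  attr(kept, "dropped_impossible") <- df[!ok, , drop = FALSE]
  kept
}

# Made-up example data; the second record has a latitude above 90
occ <- data.frame(
  name = c("Ursus americanus", "Ursus americanus", "Puma concolor"),
  latitude = c(45.2, 120.5, -33.1),
  longitude = c(-122.6, -100.3, 150.9)
)

clean <- drop_impossible(occ)
nrow(clean)                        # rows kept for downstream use
attr(clean, "dropped_impossible")  # rows removed by the filter
```

Because the sketch takes a data.frame and returns a data.frame, a function like this can be chained with other filters, with or without pipes.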
...