Measuring use of open source software isn’t always straightforward. The problem is especially acute for software targeted largely at academia, where usage is not measured just by software downloads, but also by citations.

Citations are a well-known pain point because the citation graph is held privately behind iron doors (e.g., Scopus, Google Scholar). New ventures aim to open up citation data, but of course it’s an immense amount of work, and so does not come quickly.

The following is a laundry list of metrics on software of which I am aware, and some of which I use in our rOpenSci twice monthly updates.

I primarily develop software for the R language, so some of the metrics are specific to R, but many are not. In addition, we (rOpenSci) don’t develop web apps, which may bring in an additional set of metrics not covered below.

I organize by source instead of type of data because some sources give multiple kinds of data - I note what kinds of data they give with labels.

CRAN downloads

downloads
  • Link: https://github.com/metacran/cranlogs.app
  • This is a REST API for CRAN downloads from the RStudio CRAN CDN. Note, however, that the RStudio CDN is only one of many mirrors. Users can install packages from the other mirrors, and those downloads are not included in this count. Still, a significant portion of downloads probably comes from the RStudio CDN.
  • Other programming languages have similar support, e.g., Ruby and Node.
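As a rough illustration of querying such a download API, here is a minimal Python sketch. It assumes the public cranlogs instance is served at `https://cranlogs.r-pkg.org` and exposes a `/downloads/total/{period}/{package}` endpoint that returns a JSON list with a `downloads` field per record; neither detail is stated above, so check the cranlogs README before relying on them. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL of the public cranlogs instance.
CRANLOGS_BASE = "https://cranlogs.r-pkg.org"


def total_downloads_url(package, period="last-month"):
    """Build the URL for a package's total download count over a period.

    `period` is assumed to be a keyword such as "last-week" or a
    "YYYY-MM-DD:YYYY-MM-DD" date range, so the colon is left unescaped.
    """
    return "{}/downloads/total/{}/{}".format(
        CRANLOGS_BASE, quote(period, safe=":"), quote(package, safe="")
    )


def parse_total(payload):
    """Sum the "downloads" field over the records in a decoded response.

    Assumes the response is a list of records, one per package.
    """
    return sum(int(record["downloads"]) for record in payload)


def fetch_total_downloads(package, period="last-month"):
    """Query the API (needs network access) and return the download count."""
    with urlopen(total_downloads_url(package, period), timeout=30) as response:
        return parse_total(json.load(response))
```

Calling `fetch_total_downloads("ggplot2", "last-week")` would then return a single integer. As noted above, that number only reflects the RStudio CDN, so it is best read as a lower bound on installs.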

Lagotto

citations github social-media

Depsy

citations github

  • Link: https://depsy.org
  • This is a nascent venture by the ImpactStory team that seeks to uncover the impact of research software. As far as I can tell, they’ll collect usage via software downloads and citations in the literature.

Web Site Analytics

page-views

  • If you happen to have a website for your project, collecting analytics is a way to gauge views of the landing page, and any help/tutorial pages you may have. An easy way to do this is to deploy a basic site on the gh-pages branch of your GitHub repo, and use the easily integrated Google Analytics.
  • Whatever analytics you use, in my experience this mostly brings up links from Google searches and blog posts that may mention your project.
  • Google Analytics beacon (for README views): https://github.com/igrigorik/ga-beacon. I haven’t tried this yet, but it seems promising.

Automated tracking: SSNMP

citations github

  • Link: https://scisoft-net-map.isri.cmu.edu
  • Scientific Software Network Map Project
  • This is a cool NSF-funded project by Chris Bogart that tracks software usage via GitHub and citations in the literature.

Google Scholar

citations

  • Link: https://scholar.google.com/
  • Searching Google Scholar for software citations manually is fine at a small scale, but at a larger scale scraping is best. However, you’re not legally supposed to do this, and Google will shut you down.
  • You could try using Google Scholar alerts as well, especially if new citations of your work are infrequent.
  • If you have institutional access to Scopus/Web of Science, you could search those, but I don’t push this as an option since it’s available to so few.

GitHub

github

Other

  • Support forums: Whether you use UserVoice, Discourse, Google Groups, Gitter, etc., depending on your viewpoint, these interactions could be counted as metrics of software usage.
  • Emails: I personally get a lot of emails asking for help with software I maintain. I imagine this is true for most software developers. Counting these could be another metric of software usage, although I never have counted mine.
  • Social media: See Lagotto above, which tracks some social media outlets.
  • Code coverage: There are many options now for code coverage, integrated with each Travis-CI build. A good option is CodeCov. CodeCov gives percentage test coverage, which one could use as one measure of code quality.
  • Reviews: There isn’t a lot of code review going on that I’m aware of. Even if there were, I suppose this would just be a logical TRUE/FALSE.
  • Cash money y’all: Grants/consulting income/etc. could be counted as a metric.
  • Users: If you require users to create an account or similar before getting your software, you have a sense of number of users and perhaps their demographics.

Promising

Some software metrics things on the horizon that look interesting:

Missed?

I’m sure I missed things. Let me know.