I gave two talks recently at the annual Crossref meeting, one of which was a somewhat technical overview of programmatic clients for Crossref APIs. Check out the talk here. I talked about the motivation for working with Crossref data by writing code rather than going the GUI route, then went over the various clients with brief examples.

We (rOpenSci) have been working on the R client, rcrossref, for a while now, and I’m also working on the Python (habanero) and Ruby (serrano) clients for Crossref. In addition, the Ruby client includes a command-line interface (CLI). The Javascript client is developed independently by ScienceAI.

The R, Ruby, and Python clients are usable but not yet feature complete, and would benefit from lots of users surfacing bugs and highlighting nice-to-have features.

The main Crossref API used in all the clients is documented at api.crossref.org.

I’ve tried to make the APIs similar-ish across clients. Functions in each client map to the routes of the main Crossref search API:

  • /works
  • /members
  • /funders
  • /journals
  • /types
  • /licenses
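Each of these functions ultimately sends an HTTP GET request to the matching route. As a rough illustration, here is a minimal standard-library sketch of how such a request URL could be assembled. The helper name `route_url` and the underscore-to-hyphen translation of filter names are assumptions for illustration, not the clients' actual internals. The REST API itself spells filters with hyphens (e.g. `has-orcid`) and takes the page size as `rows`, which is what the `limit` argument in the examples below appears to map onto.

```python
from urllib.parse import urlencode

BASE = "https://api.crossref.org"
# The routes listed above, each a top-level path on the API
ROUTES = {"works", "members", "funders", "journals", "types", "licenses"}


def _fmt(value):
    # Booleans become true/false; everything else is passed through as text
    return str(value).lower() if isinstance(value, bool) else str(value)


def route_url(route, filters=None, rows=None):
    """Build a GET URL for one Crossref route (hypothetical helper)."""
    if route not in ROUTES:
        raise ValueError(f"unknown route: {route}")
    params = {}
    if filters:
        # The API expects filter=name:value,name:value with hyphenated names,
        # so Python-friendly names like has_orcid are translated here
        params["filter"] = ",".join(
            f"{name.replace('_', '-')}:{_fmt(value)}" for name, value in filters.items()
        )
    if rows is not None:
        params["rows"] = rows
    # Keep ':' and ',' readable instead of percent-encoding them
    query = urlencode(params, safe=":,")
    return f"{BASE}/{route}" + (f"?{query}" if query else "")


print(route_url("works", filters={"has_orcid": True}, rows=10))
# https://api.crossref.org/works?filter=has-orcid:true&rows=10
```

Only booleans are lowercased, so case-sensitive filter values such as ISSNs ending in `X` pass through untouched.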

Other methods in all three clients:

  • Get DOI minting agency
    • Uses api.crossref.org API
  • Get random DOIs
    • Uses api.crossref.org API
  • Content negotiation
  • Get full text
    • separate, dedicated clients in each language will focus on this use case
  • Get citation count
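The two helpers noted above as using the api.crossref.org API correspond, as far as I can tell, to two request shapes: `/works/{doi}/agency` for the DOI minting agency, and the `sample` query parameter on `/works` for random DOIs (each returned item carries its DOI in a `DOI` field). The sketch below only builds those URLs. The function names are hypothetical, and the cap of 100 on `sample` reflects the API documentation as I understand it, so treat it as an assumption worth re-checking.

```python
from urllib.parse import quote

BASE = "https://api.crossref.org"


def agency_url(doi):
    """URL asking which registration agency minted a DOI (hypothetical helper)."""
    # DOIs contain '/', which is kept as-is in the path
    return f"{BASE}/works/{quote(doi, safe='/')}/agency"


def sample_url(n=10):
    """URL returning n randomly sampled works (hypothetical helper)."""
    # Assumed API limit: at most 100 sampled items per request
    if not 1 <= n <= 100:
        raise ValueError("sample size must be between 1 and 100")
    return f"{BASE}/works?sample={n}"


print(agency_url("10.1126/science.169.3946.635"))
# https://api.crossref.org/works/10.1126/science.169.3946.635/agency
print(sample_url(5))
# https://api.crossref.org/works?sample=5
```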

The following shows how to install each client, followed by examples from each for a few use cases.

Installation

Python

pip install habanero

Ruby

gem install serrano

R

Inside R:

install.packages("rcrossref")

Javascript

npm install crossref

I won’t do any examples with the js library, as I don’t maintain it.

Use case: get ORCID IDs for authors

Python

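from habanero import Crossref
cr = Crossref()
# works that have at least one author with an ORCID
res = cr.works(filter = {'has_orcid': True}, limit = 10)
# pull the ORCID (if any) of every author on every returned work
orcids = [z.get('ORCID') for x in res['message']['items'] for z in x.get('author', [])]
# drop authors without an ORCID
[o for o in orcids if o is not None]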
[u'https://orcid.org/0000-0003-4087-8021',
 u'https://orcid.org/0000-0002-2076-5452',
 u'https://orcid.org/0000-0003-4087-8021',
 u'https://orcid.org/0000-0002-2076-5452',
 u'https://orcid.org/0000-0003-1710-1580',
 u'https://orcid.org/0000-0003-1710-1580',
 u'https://orcid.org/0000-0003-4637-238X',
 u'https://orcid.org/0000-0003-4637-238X',
 u'https://orcid.org/0000-0003-4637-238X',
 u'https://orcid.org/0000-0003-4637-238X',
 u'https://orcid.org/0000-0003-4637-238X',
 u'https://orcid.org/0000-0003-2510-4271']
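Several IDs repeat in the output above because the same author appears on more than one of the returned works. If you only want distinct IDs, a small post-processing step on the parsed JSON response (a plain dict) is enough. In this sketch the helper `unique_orcids` and the `toy` response are made up for illustration; the response is trimmed to just the fields the function reads.

```python
def unique_orcids(response):
    """Distinct author ORCIDs from a parsed /works response, in first-seen order."""
    orcids = (
        author.get("ORCID")
        # not every work lists authors, and not every author has an ORCID
        for item in response["message"]["items"]
        for author in item.get("author", [])
    )
    # dict.fromkeys drops repeats while keeping insertion order (Python 3.7+)
    return list(dict.fromkeys(o for o in orcids if o))


# Hypothetical, heavily trimmed response for illustration only
toy = {"message": {"items": [
    {"author": [{"ORCID": "https://orcid.org/0000-0003-4087-8021"},
                {"ORCID": "https://orcid.org/0000-0002-2076-5452"}]},
    {"author": [{"ORCID": "https://orcid.org/0000-0003-4087-8021"},
                {"family": "Doe"}]},
    {"title": ["a work with no author list"]},
]}}

print(unique_orcids(toy))
# ['https://orcid.org/0000-0003-4087-8021', 'https://orcid.org/0000-0002-2076-5452']
```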

Ruby

require 'serrano'
res = Serrano.works(filter: {'has_orcid': true}, limit: 10)
res2 = res['message']['items'].collect { |x| x['author'].collect { |z| z['ORCID'] } }
res2.flatten.compact
=> ["https://orcid.org/0000-0003-4087-8021",
 "https://orcid.org/0000-0002-2076-5452",
 "https://orcid.org/0000-0003-4087-8021",
 "https://orcid.org/0000-0002-2076-5452",
 "https://orcid.org/0000-0003-1710-1580",
 "https://orcid.org/0000-0003-1710-1580",
 "https://orcid.org/0000-0003-4637-238X",
 "https://orcid.org/0000-0003-4637-238X",
 "https://orcid.org/0000-0003-4637-238X",
 "https://orcid.org/0000-0003-4637-238X",
 "https://orcid.org/0000-0003-4637-238X",
 "https://orcid.org/0000-0003-2510-4271"]

R

library("rcrossref")
res <- cr_works(filter=c(has_orcid=TRUE), limit = 10)
orcids <- unlist(lapply(res$data$author, function(z) z$ORCID))
Filter(function(x) !is.na(x), orcids)
 [1] "https://orcid.org/0000-0003-4087-8021"
 [2] "https://orcid.org/0000-0002-2076-5452"
 [3] "https://orcid.org/0000-0003-4087-8021"
 [4] "https://orcid.org/0000-0002-2076-5452"
 [5] "https://orcid.org/0000-0003-1710-1580"
 [6] "https://orcid.org/0000-0003-1710-1580"
 [7] "https://orcid.org/0000-0003-4637-238X"
 [8] "https://orcid.org/0000-0003-4637-238X"
 [9] "https://orcid.org/0000-0003-4637-238X"
[10] "https://orcid.org/0000-0003-4637-238X"
[11] "https://orcid.org/0000-0003-4637-238X"
[12] "https://orcid.org/0000-0003-2510-4271"

CLI

serrano works --filter=has_orcid:true --json --limit=12 | jq '.message.items[].author[].ORCID | select(. != null)'
"https://orcid.org/0000-0003-4087-8021"
"https://orcid.org/0000-0002-2076-5452"
"https://orcid.org/0000-0003-4087-8021"
"https://orcid.org/0000-0002-2076-5452"
"https://orcid.org/0000-0003-1710-1580"
"https://orcid.org/0000-0003-1710-1580"
"https://orcid.org/0000-0003-4637-238X"
"https://orcid.org/0000-0003-4637-238X"
"https://orcid.org/0000-0003-4637-238X"
"https://orcid.org/0000-0003-4637-238X"
"https://orcid.org/0000-0003-4637-238X"
"https://orcid.org/0000-0003-2510-4271"
"https://orcid.org/0000-0001-9408-8207"
"https://orcid.org/0000-0002-2076-5452"

Use case: content negotiation

Python

from habanero import cn
cn.content_negotiation(ids = '10.1126/science.169.3946.635', format = "text")
u'Frank, H. S. (1970). The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance. Science, 169(3946), 635\xe2\x80\x93641. doi:10.1126/science.169.3946.635\n'

Ruby

require 'serrano'
Serrano.content_negotiation(ids: '10.1126/science.169.3946.635', format: "text")
=> ["Frank, H. S. (1970). The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance. Science, 169(3946), 635\xE2\x80\x93641. doi:10.1126/science.169.3946.635\n"]

R

library("rcrossref")
cr_cn(dois = "10.1126/science.169.3946.635", format = "text")
[1] "Frank, H. S. (1970). The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance. Science, 169(3946), 635–641. doi:10.1126/science.169.3946.635"

CLI

serrano contneg 10.1890/13-0590.1 --format=text
Murtaugh, P. A. (2014).  In defense of P values . Ecology, 95(3), 611–617. doi:10.1890/13-0590.1
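As I understand it, content negotiation amounts to requesting the DOI from doi.org with an HTTP `Accept` header naming the format you want. The sketch below builds such a request with Python's standard library without sending it. The format keys and the `cn_request` helper are hypothetical; the MIME types are the usual ones for DOI content negotiation.

```python
from urllib.request import Request

# Hypothetical format names mapped to DOI content-negotiation MIME types (a subset)
ACCEPT = {
    "text": "text/x-bibliography",
    "bibtex": "application/x-bibtex",
    "ris": "application/x-research-info-systems",
    "citeproc-json": "application/vnd.citationstyles.csl+json",
}


def cn_request(doi, fmt="text", style=None):
    """Build (but do not send) a content-negotiation request for a DOI."""
    accept = ACCEPT[fmt]
    if style and fmt == "text":
        # Formatted citations can name a citation style, e.g. "apa"
        accept += f"; style={style}"
    return Request(f"https://doi.org/{doi}", headers={"Accept": accept})


req = cn_request("10.1890/13-0590.1", style="apa")
print(req.full_url)              # https://doi.org/10.1890/13-0590.1
print(req.get_header("Accept"))  # text/x-bibliography; style=apa
```

To actually fetch the citation, pass the request to `urllib.request.urlopen` and decode the body as UTF-8. Decoded that way, the `\xe2\x80\x93` byte sequence seen in some outputs above is simply an en dash.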

More

There are definitely issues with data in the Crossref search API, some of which I cover in my talks. However, it is still the best place to go for scholarly metadata.

Let us know of other use cases; there are others not covered here for brevity’s sake.

There are lots of examples in the docs for each client. If you can think of any documentation improvements, please file an issue.

If you find any bugs, please do file an issue.