So I want to mine some #altmetrics data for some research I’m thinking about doing. The steps would be:

  • Get journal titles for ecology and evolution journals.
  • Get DOIs for all papers in all the above journal titles.
  • Get altmetrics data on each DOI.
  • Do some fancy analyses.
  • Make some pretty figs.
  • Write up results.

It’s early days, so I’m just working on the first step. However, getting a list of journals in ecology and evolution turns out to be frustratingly hard if you (1) are trying to avoid Thomson Reuters, and (2) want a machine interface to do it (read: API).

Unfortunately, Mendeley’s API does not have methods for getting a list of journals by field, or at least I don’t know how to do it using their API. No worries though - Crossref comes to save the day. Here’s my attempt at this using the Crossref OAI-PMH.

I wrote a little while loop to get journal titles from the Crossref OAI-PMH. This takes a while to run, but at least it works on my machine - hopefully yours too!

library(XML)
library(RCurl)

token <- "characters"  # sentinel for the first request; afterwards holds the resumptionToken
nameslist <- list()  # empty list to put journal titles into, one element per page
while (is.character(token)) {
    baseurl <- "http://oai.crossref.org/OAIHandler?verb=ListSets"
    if (token == "characters") {
        tok2 <- NULL  # first request: no resumptionToken yet
    } else {
        tok2 <- paste("&resumptionToken=", token, sep = "")
    }
    query <- paste(baseurl, tok2, sep = "")
    crsets <- xmlToList(xmlParse(getURL(query)))
    setnames <- as.character(sapply(crsets[[4]], function(x) x[["setName"]]))
    nameslist[[token]] <- setnames
    # if no resumptionToken can be found, assume there are no more pages
    nexttoken <- try(crsets[[2]]$.attrs[["resumptionToken"]], silent = TRUE)
    if (inherits(nexttoken, "try-error")) {
        break  # no more data; leave the loop without raising an error
    }
    token <- nexttoken
}

Yay! Hopefully it worked if you tried it. Let’s see how long the list of journal titles is.

sapply(nameslist, length)  # length of each list
                          characters c65ebc3f-b540-4672-9c00-f3135bf849e3 
                               10001                                10001 
6f61b343-a8f4-48f1-8297-c6f6909ca7f7 
                                6864 
allnames <- do.call(c, nameslist)  # combine into one character vector
length(allnames)
[1] 26866
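
If you want to try the paging pattern from the loop above without hitting Crossref, here is a minimal offline sketch. Everything in it is made up for illustration: `fake_pages` and `fetch_page()` are hypothetical stand-ins for the real request, and the titles are placeholders. It also uses `unlist()` instead of `do.call(c, ...)` to flatten the pages, which gives the same kind of flat vector here.

```r
# Hypothetical stand-in for the Crossref request: three fake "pages",
# each holding some titles and the token of the next page (NULL on the last).
fake_pages <- list(
    start = list(titles = c("Journal A", "Journal B"), nexttoken = "tok1"),
    tok1  = list(titles = c("Journal C", "Journal D"), nexttoken = "tok2"),
    tok2  = list(titles = "Journal E", nexttoken = NULL)
)
fetch_page <- function(token) fake_pages[[token]]  # hypothetical helper

token <- "start"
pages <- list()
while (is.character(token)) {
    page <- fetch_page(token)
    pages[[token]] <- page$titles  # store this page under its token
    token <- page$nexttoken        # NULL after the last page, which ends the loop
}

sapply(pages, length)                          # number of titles per page
alltitles <- unlist(pages, use.names = FALSE)  # one flat character vector
length(alltitles)
```

The loop ends as soon as the "next token" is no longer a character string, so no error is needed to stop it.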

Now, let’s use some regex to pull out the journal titles that are likely ecology and evolutionary biology journals. The ^ symbol says “the string must start here”. The \\s means whitespace. The [] lets you specify a set of characters you are looking for, e.g., [Ee] means capital E OR lowercase e. I threw in titles that had the words systematic and naturalist too. I also tried to trim any whitespace using the stringr package.

library(stringr)

ecotitles <- as.character(allnames[str_detect(allnames, "^[Ee]cology|\\s[Ee]cology")])
evotitles <- as.character(allnames[str_detect(allnames, "^[Ee]volution|\\s[Ee]volution")])
systtitles <- as.character(allnames[str_detect(allnames, "^[Ss]ystematic|\\s[Ss]ystematic")])
naturalist <- as.character(allnames[str_detect(allnames, "[Nn]aturalist")])

ecoevotitles <- unique(c(ecotitles, evotitles, systtitles, naturalist))  # combine into one vector, dropping duplicates
ecoevotitles <- str_trim(ecoevotitles, side = "both")  # trim whitespace, if any
length(ecoevotitles)
[1] 188

# Just the first ten titles
ecoevotitles[1:10]
 [1] "Microbial Ecology in Health and Disease"           
 [2] "Population Ecology"                                
 [3] "Researches on Population Ecology"                  
 [4] "Behavioral Ecology and Sociobiology"               
 [5] "Microbial Ecology"                                 
 [6] "Biochemical Systematics and Ecology"               
 [7] "FEMS Microbiology Ecology"                         
 [8] "Journal of Experimental Marine Biology and Ecology"
 [9] "Applied Soil Ecology"                              
[10] "Forest Ecology and Management"                     
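
To see what the ecology pattern above does and doesn't catch, here is a small sketch on made-up titles (they are not from the Crossref data). It uses base R's `grepl()` with `perl = TRUE` rather than `str_detect()`, only so that it runs without stringr installed; the pattern itself is the same one used above.

```r
# Made-up titles, only for illustration
titles <- c("Ecology Letters", "Molecular Ecology", "Journal of ecology",
            "Agroecology Today", "Geology")

# "ecology" at the start of the string, or right after whitespace
pattern <- "^[Ee]cology|\\s[Ee]cology"

hits <- grepl(pattern, titles, perl = TRUE)
titles[hits]
```

The first three titles match. "Agroecology Today" does not, because its "ecology" is neither at the start of the string nor preceded by whitespace, so a pattern like this can miss compound words.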

Get the .Rmd file used to create this post at my GitHub account.

Written in Markdown, with help from knitr, and nice knitr highlighting/etc. in RStudio.