So I want to mine some #altmetrics data for some research I’m thinking about doing. The steps would be:
1. Get journal titles for ecology and evolution journals.
2. Get DOIs for all papers in all the above journal titles.
3. Get altmetrics data on each DOI.
4. Do some fancy analyses.
5. Make some pretty figs.
6. Write up results.
It’s early days, so I’m just working on the first step. However, getting a list of journals in ecology and evolution is frustratingly hard if you (1) are trying to avoid Thomson Reuters, and (2) want a machine interface way to do it (read: API).
Unfortunately, Mendeley’s API does not have methods for getting a list of journals by field, or at least I don’t know how to do it using their API. No worries though - Crossref comes to save the day. Here’s my attempt at this using the Crossref OAI-PMH.
I wrote a little while loop to get journal titles from the Crossref OAI-PMH. This takes a while to run, but at least it works on my machine - hopefully yours too!
library(XML)
library(RCurl)

token <- "characters"  # define an iterator, also used for getting the resumptionToken
nameslist <- list()    # define empty list to put journal titles into
baseurl <- "http://oai.crossref.org/OAIHandler?verb=ListSets"

while (is.character(token)) {
  # the first request has no token; later requests pass the resumptionToken
  if (token == "characters") {
    tok2 <- NULL
  } else {
    tok2 <- paste("&resumptionToken=", token, sep = "")
  }
  query <- paste(baseurl, tok2, sep = "")
  crsets <- xmlToList(xmlParse(getURL(query)))
  names <- as.character(sapply(crsets[[4]], function(x) x[["setName"]]))
  nameslist[[token]] <- names
  # leave the loop once the response no longer carries a resumptionToken
  nexttoken <- try(crsets[[2]]$.attrs[["resumptionToken"]], silent = TRUE)
  if (inherits(nexttoken, "try-error")) {
    break
  }
  token <- nexttoken
}
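The loop above follows the usual paging pattern: ask for a page, keep the results, then ask again with whatever token came back, until no token comes back. Here is a minimal offline sketch of just that control flow. Everything in it is made up for illustration: `fake_pages`, `fetch_page`, the token values, and the journal names are hypothetical stand-ins for the real HTTP request and the real Crossref response.

```r
# Hypothetical stand-in for the server: each page holds some set names
# plus the token for the next page (NULL on the last page).
fake_pages <- list(
  start = list(names = c("Journal A", "Journal B"), token = "t1"),
  t1    = list(names = c("Journal C"),              token = "t2"),
  t2    = list(names = c("Journal D", "Journal E"), token = NULL)
)

# Hypothetical stand-in for one request; NULL token means "first page".
fetch_page <- function(token = NULL) {
  key <- if (is.null(token)) "start" else token
  fake_pages[[key]]
}

collected <- list()
token <- NULL
repeat {
  page <- fetch_page(token)
  collected[[length(collected) + 1]] <- page$names
  token <- page$token
  if (is.null(token)) break  # no token returned, so this was the last page
}

all_names <- unlist(collected)
length(all_names)
```

Using `repeat` with an explicit `break` keeps the "are we done?" check in one place, so there is no need for a placeholder starting value for the token.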
Yay! Hopefully it worked if you tried it. Let’s see how long the list of journal titles is.
allnames <- do.call(c, nameslist)  # combine into a single character vector
length(allnames)
[1] 26866
Now, let’s use some regex to pull out the journal titles that are likely ecology and evolutionary biology journals. The ^ symbol says “the string must start here”. The \\s means whitespace (the backslash is doubled because the pattern sits inside an R string). The [] lets you specify a set of letters you are looking for, e.g., [Ee] means capital E or lowercase e. I threw in titles that had the words systematic and naturalist too. I also trimmed any leading or trailing whitespace using the stringr package.
library(stringr)

ecotitles <- as.character(allnames[str_detect(allnames, "^[Ee]cology|\\s[Ee]cology")])
evotitles <- as.character(allnames[str_detect(allnames, "^[Ee]volution|\\s[Ee]volution")])
systtitles <- as.character(allnames[str_detect(allnames, "^[Ss]ystematic|\\s[Ss]ystematic")])
naturalist <- as.character(allnames[str_detect(allnames, "[Nn]aturalist")])
ecoevotitles <- unique(c(ecotitles, evotitles, systtitles, naturalist))  # combine and drop duplicates
ecoevotitles <- str_trim(ecoevotitles, side = "both")  # trim whitespace, if any
length(ecoevotitles)
[1] 188
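As a quick sanity check on how the "start of string or after whitespace" idea behaves, here is a small sketch run on a hand-typed vector of titles rather than the Crossref list. The `toy` vector is an assumption for illustration (one entry, "Paleoecology Today", is invented to show a non-match), and I use base R's `grepl` and `trimws` here instead of stringr so it runs with no extra packages.

```r
# Hand-typed titles for illustration only; not taken from the Crossref data
toy <- c("Ecology Letters", "Journal of Ecology", "Paleoecology Today",
         "Evolution", "The American Naturalist", "Systematic Biology",
         " Journal of Systematic Palaeontology ")

# The word must start the string (^) or follow a whitespace character (\\s)
eco_hits  <- grepl("^[Ee]cology|\\s[Ee]cology", toy)
syst_hits <- grepl("^[Ss]ystematic|\\s[Ss]ystematic", toy)

toy[eco_hits]           # "Paleoecology Today" is not picked up
trimws(toy[syst_hits])  # trimws() strips leading and trailing whitespace
```

The point of anchoring on `^` or `\\s` is that the keyword only counts when it begins a word, so "ecology" buried inside a longer word is skipped.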
# Just the first ten titles
ecoevotitles[1:10]
[1] "Microbial Ecology in Health and Disease"
[2] "Population Ecology"
[3] "Researches on Population Ecology"
[4] "Behavioral Ecology and Sociobiology"
[5] "Microbial Ecology"
[6] "Biochemical Systematics and Ecology"
[7] "FEMS Microbiology Ecology"
[8] "Journal of Experimental Marine Biology and Ecology"
[9] "Applied Soil Ecology"
[10] "Forest Ecology and Management"