Recology

R/etc.

Museum metadata - the Asian Art Museum of San Francisco

I was in San Francisco last week for an altmetrics conference at PLOS. While there, I visited the Asian Art Museum, just the Roads of Arabia exhibition.

It was a great exhibit. While I was looking at the pieces, I read many labels, and thought, "hey, what if someone wants this metadata"?

Since we have an R package in development for scraping museum metadata (called musemeta), I just started some scraping code for this museum. Unfortunately, I don't think the pieces from the Roads of Arabia exhibit are on their site, so no metadata to get. But they do have their main collection searchable online at http://www.asianart.org/collections/collection. Examples follow.

Installation

install.packages("devtools")
devtools::install_github("ropensci/musemeta")
library("musemeta")

Get metadata for a single object

You have to get the ID for the piece from their website, e.g., 11462 from the url http://searchcollection.asianart.org/view/objects/asitem/nid/11462. Once you have an ID you can pass it in ot the aam() function.

(out <- aam(11462))
#> <AAM metadata> Molded plaque (tsha tsha)
#>   Object id: 1992.96
#>   Object name: Votive plaque
#>   Date: approx. 1992
#>   Artist: 
#>   Medium: Plaster mixed with resin and pigment
#>   Credit line: Gift of Robert Tevis
#>   On display?: no
#>   Collection: Decorative Arts
#>   Department: Himalayan Art
#>   Dimensions: 
#>   Label: Molded plaques (tsha tshas) are small sacred images, flat or
#>           three-dimensional, shaped out of clay in metal molds. The
#>           images are usually unbaked, and sometimes seeds, paper, or
#>           human ashes were mixed with the clay. Making tsha tshas is
#>           a meritorious act, and monasteries give them away to
#>           pilgrims. Some Tibetans carry tsha tshas inside the amulet
#>           boxes they wear or stuff them into larger images as part of
#>           the consecration of those images. In Bhutan tsha tshas are
#>           found in mani walls (a wall of stones carved with prayers)
#>           or piled up in caves.The practice of making such plaques
#>           began in India, and from there it spread to other countries
#>           in Asia with the introduction of Buddhism. Authentic tsha
#>           tshas are cast from clay. Modern examples , such as those
#>           made for the tourist trade in Tibet, are made of plaster
#>           and cast from ancient (1100-1200) molds and hand colored to
#>           give them the appearance of age.

The output is printed for clarity, but you can dig into each element, like

out$label
#> [1] "Molded plaques (tsha tshas) are small sacred images, flat or three-dimensional, shaped out of clay in metal molds. The images are usually unbaked, and sometimes seeds, paper, or human ashes were mixed with the clay. Making tsha tshas is a meritorious act, and monasteries give them away to pilgrims. Some Tibetans carry tsha tshas inside the amulet boxes they wear or stuff them into larger images as part of the consecration of those images. In Bhutan tsha tshas are found in mani walls (a wall of stones carved with prayers) or piled up in caves.The practice of making such plaques began in India, and from there it spread to other countries in Asia with the introduction of Buddhism. Authentic tsha tshas are cast from clay. Modern examples , such as those made for the tourist trade in Tibet, are made of plaster and cast from ancient (1100-1200) molds and hand colored to give them the appearance of age."

Get metadata for many objects

The aam() function is not vectorized, but you can easily get data for many IDs via lapply type functions, etc.

lapply(c(17150,17140,17144), aam)
#> [[1]]
#> <AAM metadata> Boys sumo wrestling
#>   Object id: 2005.100.35
#>   Object name: Woodblock print
#>   Date: approx. 1769
#>   Artist: Suzuki HarunobuJapanese, 1724 - 1770
#>   Medium: Ink and colors on paper
#>   Credit line: Gift of the Grabhorn Ukiyo-e Collection
#>   On display?: no
#>   Collection: Prints And Drawings
#>   Department: Japanese Art
#>   Dimensions: H. 12 5/8 in x W. 5 3/4 in, H. 32.1 cm x W. 14.6 cm
#>   Label: 40 —é木–Ø春t信M 相'Š撲–o—Vび‚ÑSuzuki Harunobu, 1725?–1770Boys sumo wrestling ( Sumō
#>           ?)c. 1769Woodblock print ( nishiki-e) Hosoban
#> 
#> [[2]]
#> <AAM metadata> Autumn Moon of Matsukaze
#>   Object id: 2005.100.25
#>   Object name: Woodblock print
#>   Date: 1768-1769
#>   Artist: Suzuki HarunobuJapanese, 1724 - 1770
#>   Medium: Ink and colors on paper
#>   Credit line: Gift of the Grabhorn Ukiyo-e Collection
#>   On display?: no
#>   Collection: Prints And Drawings
#>   Department: Japanese Art
#>   Dimensions: H. 12 1/2 in x W. 5 3/4 in, H. 31.7 cm x W. 14.6 cm
#>   Label: 30 —é木–Ø春t信M 『w•—流—¬æ…八"ª景Œi』x 「u松¼•—の‚Ì秋H月ŒŽ」vSuzuki Harunobu, 1725?–1770"Autumn Moon of
#>           Matsukaze" (Matsukaze no shū ?)From Fashionable Eight Views
#>           of Noh Chants (Fū ?ū ?1768–1769Woodblock print
#>           (nishiki-e)Hosoban
#> 
#> [[3]]
#> <AAM metadata> Hunting for fireflies
#>   Object id: 2005.100.29
#>   Object name: Woodblock print
#>   Date: 1767-1768
#>   Artist: Suzuki HarunobuJapanese, 1724 - 1770
#>   Medium: Ink and colors on paper
#>   Credit line: Gift of the Grabhorn Ukiyo-e Collection
#>   On display?: no
#>   Collection: Prints And Drawings
#>   Department: Japanese Art
#>   Dimensions: H. 10 1/2 in x W. 8 in, H. 26.7 cm x W. 20.3 cm
#>   Label: 34 —é木–Ø春t信M Œu狩Žëり‚èSuzuki Harunobu, 1725?–1770Hunting for
#>           fireflies1767–1768Woodblock print ( nishiki-e) Chū ?

No search, boo

Note that there is no search functionality yet for this source. Maybe someone can add that via pull requests :)

Like the others

The others sources in musemeta mostly work the same way as the above.

icanhaz altmetrics

The Lagotto application is a Rails app that collects and serves up via RESTful API article level metrics data for research objects. So far, this application has only been applied to scholarly articles, but will see action on datasets soon.

Martin Fenner has lead the development of Lagotto. He recently set up a discussion site if you want to chat about it.

The application has a nice GUI interface, and a quite nice RESTful API.

Lagotto is open source! Because of this, and the quality of the software, other publishers have started using it to gather and deliver publicly article level metrics data, including:

The PLOS instance at http://alm.plos.org/ is always the most up to date with the Lagotto software, but Crossref has the largest number of articles.

I've been working on three clients for the Lagotto REST API, including for a while now on R, recently on Python, and just last week on Ruby.

Please do try the clients, report bugs, request features - you know the open source drill...

I'd say the R client is the most mature, while Python is less so, end the Ruby gem the least mature.

Installation

R

install.packages("devtools")
devtools::install_github("ropensci/alm")

Python

git clone https://github.com/cameronneylon/pyalm.git
cd pyalm
git checkout scott
python setup.py install

Ruby

gem install httparty json rake
git clone https://github.com/sckott/alm.git
cd alm
make # which runs build and install tasks

If you don't have make, then just run gem build alm.gemspec and gem install alm-0.1.0.gem seperately.

Example

In this example, we'll get altmetrics data for two DOIs: 10.1371/journal.pone.0029797, and 10.1371/journal.pone.0029798 (click on links to go to paper).

R

library('alm')
ids <- c("10.1371/journal.pone.0029797","10.1371/journal.pone.0029798")
alm_ids(ids, info="summary")
#> $meta
#>   total total_pages page error
#> 1     2           1    1    NA
#> 
#> $data
#> $data$`10.1371/journal.pone.0029798`
#> $data$`10.1371/journal.pone.0029798`$info
#>                            doi
#> 1 10.1371/journal.pone.0029798
#>                                                                                     title
#> 1 Mitochondrial Electron Transport Is the Cellular Target of the Oncology Drug Elesclomol
#>                                                                canonical_url
#> 1 http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0029798
#>       pmid   pmcid                        mendeley_uuid
#> 1 22253786 3256171 b08cc99e-b526-3f0c-adaa-d5ee6d0d978a
#>            update_date     issued
#> 1 2014-12-09T02:52:47Z 2012-01-11
#> 
#> $data$`10.1371/journal.pone.0029798`$signposts
#>                            doi viewed saved discussed cited
#> 1 10.1371/journal.pone.0029798   4346    14         2    26
#> 
#> 
#> $data$`10.1371/journal.pone.0029797`
#> $data$`10.1371/journal.pone.0029797`$info
#>                            doi
#> 1 10.1371/journal.pone.0029797
#>                                                                             title
#> 1 Ecological Guild Evolution and the Discovery of the World's Smallest Vertebrate
#>                                                                canonical_url
#> 1 http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0029797
#>       pmid   pmcid                        mendeley_uuid
#> 1 22253785 3256195 897fbbd6-5a23-3552-8077-97251b82c1e1
#>            update_date     issued
#> 1 2014-12-09T02:52:46Z 2012-01-11
#> 
#> $data$`10.1371/journal.pone.0029797`$signposts
#>                            doi viewed saved discussed cited
#> 1 10.1371/journal.pone.0029797  34282    81       244     8

Python

import pyalm
ids = ["10.1371/journal.pone.0029797","10.1371/journal.pone.0029798"]
pyalm.get_alm(ids, info="summary")

#> {'articles': [<ArticleALM Mitochondrial Electron Transport Is the Cellular Target of the Oncology Drug Elesclomol,
#> DOI 10.1371/journal.pone.0029798>,
#>   <ArticleALM Ecological Guild Evolution and the Discovery of the World's Smallest Vertebrate,
#>         DOI 10.1371/journal.pone.0029797>],
#>  'meta': {u'error': None, u'page': 1, u'total': 2, u'total_pages': 1}}

Ruby

require 'alm'
Alm.alm(ids: ["10.1371/journal.pone.0029797","10.1371/journal.pone.0029798"], key: ENV['PLOS_API_KEY'])

#> => {"total"=>2,
#>  "total_pages"=>1,
#>  "page"=>1,
#>  "error"=>nil,
#>  "data"=>
#>   [{"doi"=>"10.1371/journal.pone.0029798",
#>     "title"=>"Mitochondrial Electron Transport Is the Cellular Target of the Oncology Drug Elesclomol",
#>     "issued"=>{"date-parts"=>[[2012, 1, 11]]},
#>     "canonical_url"=>"http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0029798",
#>     "pmid"=>"22253786",
#>     "pmcid"=>"3256171",
#>     "mendeley_uuid"=>"b08cc99e-b526-3f0c-adaa-d5ee6d0d978a",
#>     "viewed"=>4346,
#>     "saved"=>14,
#>     "discussed"=>2,
#>     "cited"=>26,
#>     "update_date"=>"2014-12-09T02:52:47Z"},
#>    {"doi"=>"10.1371/journal.pone.0029797",
#>     "title"=>"Ecological Guild Evolution and the Discovery of the World's Smallest Vertebrate",
#>     "issued"=>{"date-parts"=>[[2012, 1, 11]]},
#>     "canonical_url"=>"http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0029797",
#>     "pmid"=>"22253785",
#>     "pmcid"=>"3256195",
#>     "mendeley_uuid"=>"897fbbd6-5a23-3552-8077-97251b82c1e1",
#>     "viewed"=>34282,
#>     "saved"=>81,
#>     "discussed"=>244,
#>     "cited"=>8,
#>     "update_date"=>"2014-12-09T02:52:46Z"}]}

Dealing with multi handle errors

At rOpenSci we occasssionally hear from our users that they run into an error like:

Error in function (type, msg, asError = TRUE)  : 
  easy handled already used in multi handle

This error occurs in the httr package that we use to do http requests to sources of data on the web. It happens when e.g., you make a lot of requests to a resource, then it gets interrupted somehow - then you make another call, and you get the error above. Let's try it with the an version of httr (v0.5):

library("httr")
# run, then esc to cause multi handle error
replicate(50, GET("http://google.com/"))
# then retry single call, which trows multi handle error
GET("http://google.com/")
#> Error in function (type, msg, asError = TRUE)  : 
#>   easy handled already used in multi handle

There are any number of reasons why your session may get interrupted, including an internet outage, the web service you are requesesting data from times out, etc. There hasn't been a straight-forward way to handle this, until recently.

In httr version 0.6, there are two new functions handle_find() and handle_reset() to help deal with this error.

First, install newest httr from Github

install.packages("devtools")
devtools::install_github("hadley/httr")
library("httr")

Make a bunch of requests to google, interrupting part way through

replicate(50, HEAD("http://google.com/"))

Then retry single call, which trows multi handle error

HEAD("http://google.com/")
#> Error in function (type, msg, asError = TRUE)  : 
#>   easy handled already used in multi handle

Find handle

handle_find("http://google.com/")
#> Host: http://google.com/ <0x10f3d1600>

Reset handle

handle_reset("http://google.com/")

Try call again, this time it should work

HEAD("http://google.com/")
#> Response [http://www.google.com/]
#>   Date: 2014-12-08 13:37
#>   Status: 200
#>   Content-Type: text/html; charset=ISO-8859-1
#> <EMPTY BODY>

Usage in ropensci packages

We have more work to do yet to integrate this into our packages. It's great you can reset a handle as above, but to reset the handle you need to search for the URL used in the request, which our users would have to dig into the code for the function they are using. That is easy-ish to do, but perhaps not everyone knows they can get to the code easily. So, we may try seting a parameter in functions that would let reset the handle to clear this error.

Note

Note that Hadley is planning on eliminating RCurl dependency (https://github.com/hadley/httr/issues/172), so there may be a different solution in the future.