Getting data from figures in published papers

The problem: There are a lot of figures in published papers in the scholarly literature, like the below, from (Attwood et. al. 2012)): At some point, a scientist wants to ask a question for which they can synthesize the knowledge on that question by collecting data from the published literature. This often requires something like the following workflow: Search for relevant papers (e.g., via Google Scholar). Collect the papers. Decide which are appropriate for inclusion. Collect data from the figures using software on a native application. Examples include GraphClick and ImageJ. Proof data. Analyze data & publish paper. This workflow needs revamping, particularly in step number 3 - collecting the data. This data remains private, moving from one closed source (original publication) to another (personal computer). We can surely do better. ...

September 18, 2012 · 5 min · Scott Chamberlain

Scholarly metadata from R

Metadata! Metadata is very cool. It’s super hot right now - everybody is talking about it. Okay, maybe not everyone, but it’s an important part of archiving scholarly work. We are working on a repo on GitHub rmetadata to be a one stop shop for querying metadata from around the web. Various repos on GitHub we have started - rpmc, rdatacite, rdryad, rpensoft, rhindawi - will at least in part be folded into rmetadata. As a start we are writing functions to hit any metadata services that use the OAI-PMH: “Open Archives Initiative Protocol for Metadata Harvesting” framework. OAI-PMH has six methods (or verbs as they are called) for data harvesting that are the same across different metadata providers: ...

September 17, 2012 · 6 min · Scott Chamberlain