Inspired by httpie
, a Python command line client as a sort of drop in replacement for curl
, I am playing around with something similar-ish in R, at least in spirit. I started a little R pkg called httsnap
with the following ideas:
- The web is increasingly a JSON world, so set
content-type
andaccept
headers toapplications/json
by default - The workflow follows logically, or at least should, from, hey, I got this url, to i need to add some options, to execute request
- Whenever possible, transform output to data.frame’s - facilitating downstream manipulation via
dplyr
, etc. - Do
GET
requests by default. Specify a different type if you don’t wantGET
. Some functionality does GET by default, though in some cases you need to specify GET - You can use non-standard evaluation to easily pass in query parameters without worrying about
&
’s, URL escaping, etc. (seeQuery()
) - Same for body params (see
Body()
)
Install
Install and load httsnap
devtools::install_github("sckott/httsnap")
library("httsnap")
library("dplyr")
Functions so far
Get
- GET requestQuery
- add query parametersAuthenticate
- add authentication detailsProgress
- add progress barTimeout
- add a timeoutUser_agent
- add a user agentVerbose
- give verbose outputBody
- add a bodyh
- add headers by key-value pair
These are named to avoid conflict with httr
Intro
A simple GET
request
"https://httpbin.org/get" %>%
Get()
#> $args
#> named list()
#>
#> $headers
#> $headers$Accept
#> [1] "application/json, text/xml, application/xml, */*"
#>
#> $headers$`Accept-Encoding`
#> [1] "gzip"
#>
#> $headers$Host
#> [1] "httpbin.org"
#>
#> $headers$`User-Agent`
#> [1] "curl/7.37.1 Rcurl/1.95.4.1 httr/0.6.1 httsnap/0.0.2.99"
#>
#>
#> $origin
#> [1] "24.21.209.71"
#>
#> $url
#> [1] "https://httpbin.org/get"
You’ll notice that Get()
doesn’t just get the response, but also checks for whether it was a good response (the HTTP status code), and extracts the data.
Or you can just pass the URL into the function itself
Get("https://httpbin.org/get")
#> $args
#> named list()
#>
#> $headers
#> $headers$Accept
#> [1] "application/json, text/xml, application/xml, */*"
#>
#> $headers$`Accept-Encoding`
#> [1] "gzip"
#>
#> $headers$Host
#> [1] "httpbin.org"
#>
#> $headers$`User-Agent`
#> [1] "curl/7.37.1 Rcurl/1.95.4.1 httr/0.6.1 httsnap/0.0.2.99"
#>
#>
#> $origin
#> [1] "24.21.209.71"
#>
#> $url
#> [1] "https://httpbin.org/get"
You can buid up options by calling functions via pipes, and see what the options look like
"https://httpbin.org/get" %>%
Progress() %>%
Verbose()
#> <http request>
#> url: https://httpbin.org/get
#> config:
#> Config:
#> List of 4
#> $ noprogress :FALSE
#> $ progressfunction:function (...)
#> $ debugfunction :function (...)
#> $ verbose :TRUE
Then execute the GET request when you’re ready
"https://httpbin.org/get" %>%
Progress() %>%
Verbose() %>%
Get()
#> $args
#> named list()
#>
#> $headers
#> $headers$Accept
#> [1] "application/json, text/xml, application/xml, */*"
#>
#> $headers$`Accept-Encoding`
#> [1] "gzip"
#>
#> $headers$Host
#> [1] "httpbin.org"
#>
#> $headers$`User-Agent`
#> [1] "curl/7.37.1 Rcurl/1.95.4.1 httr/0.6.1 httsnap/0.0.2.99"
#>
#>
#> $origin
#> [1] "24.21.209.71"
#>
#> $url
#> [1] "https://httpbin.org/get"
Example 1
Get scholarly article metadata from the Crossref API
"https://api.crossref.org/works" %>%
Query(query = "ecology") %>%
.$message %>%
.$items %>%
select(DOI, title, publisher)
#> DOI title
#> 1 10.4996/fireecology Fire Ecology
#> 2 10.5402/ecology ISRN Ecology
#> 3 10.1155/8641 ISRN Ecology
#> 4 10.1111/(issn)1526-100x Restoration Ecology
#> 5 10.1007/248.1432-184x Microbial Ecology
#> 6 10.1007/10144.1438-390x Population Ecology
#> 7 10.1007/10452.1573-5125 Aquatic Ecology
#> 8 10.1007/10682.1573-8477 Evolutionary Ecology
#> 9 10.1007/10745.1572-9915 Human Ecology
#> 10 10.1007/10980.1572-9761 Landscape Ecology
#> 11 10.1007/11258.1573-5052 Plant Ecology
#> 12 10.1007/12080.1874-1746 Theoretical Ecology
#> 13 10.1111/(issn)1442-9993 Austral Ecology
#> 14 10.1111/(issn)1439-0485 Marine Ecology
#> 15 10.1111/(issn)1365-2435 Functional Ecology
#> 16 10.1111/(issn)1365-294x Molecular Ecology
#> 17 10.1111/(issn)1461-0248 Ecology Letters
#> 18 10.1002/9780470979365.ch7 Behavioural Ecology
#> 19 10.1111/fec.2007.21.issue-5
#> 20 10.1111/rec.0.0.issue-0
#> publisher
#> 1 Association for Fire Ecology
#> 2 Hindawi Publishing Corporation
#> 3 Hindawi Publishing Corporation
#> 4 Wiley-Blackwell
#> 5 Springer Science + Business Media
#> 6 Springer Science + Business Media
#> 7 Springer Science + Business Media
#> 8 Springer Science + Business Media
#> 9 Springer Science + Business Media
#> 10 Springer Science + Business Media
#> 11 Springer Science + Business Media
#> 12 Springer Science + Business Media
#> 13 Wiley-Blackwell
#> 14 Wiley-Blackwell
#> 15 Wiley-Blackwell
#> 16 Wiley-Blackwell
#> 17 Wiley-Blackwell
#> 18 Wiley-Blackwell
#> 19 Wiley-Blackwell
#> 20 Wiley-Blackwell
Example 2
Get Public Library of Science article metadata via their API, make a histogram of number of tweets for each article
"https://api.plos.org/search" %>%
Query(q = "*:*", wt = "json", rows = 100,
fl = "id,journal,alm_twitterCount",
fq = 'alm_twitterCount:[100 TO 10000]') %>%
.$response %>%
.$docs %>%
.$alm_twitterCount %>%
hist()
Notes
Okay, so this isn’t drastically different from what httr
already does, but its early days.