Recology

R/etc.

analogsea - v0.1 notes

My last blog post introduced the R package I'm working on analogsea, an R client for the Digital Ocean API.

Things have changed a bit, including fillig out more functions for all API endpoints, and incorparting feedback from Hadley and Karthik. The package is as v0.1 now, so I thought I'd say a few things about how it works.

Note that Digital Ocean's v2 API is in beta stage now, so the current version of analogsea at v0.1 works with their v1 API. The v2 branch of analogsea is being developed for their v2 API.

If you sign up for an account with Digital Ocean use this referral link: https://www.digitalocean.com/?refcode=0740f5169634 so I can earn some credits. thx :)

First, installation

Note: I did try to submit to CRAN, but Ripley complained about the package name so I'd rather not waste my time esp since people using this likely will already know about install_github().

devtools::install_github("sckott/analogsea")

Load the library

library("analogsea")
## Loading required package: magrittr

Authenticate has changed a bit. Whereas auth details were stored as environment variables before, I'm just using R's options. do_auth() will ask for your Digital Ocean details. You can enter them each R session, or store them in your .Rprofile file. After successful authentication, each function simply looks for your auth details with getOption(). You don't have to use this function first, though if you don't your first call to another function will ask for auth details.

do_auth()

sizes, images, and keys functions have changed a bit, by default outputting a data.frame now.

List available regions

regions()
##   id            name slug
## 1  3 San Francisco 1 sfo1
## 2  4      New York 2 nyc2
## 3  5     Amsterdam 2 ams2
## 4  6     Singapore 1 sgp1

List available sizes

sizes()
##   id  name  slug memory cpu disk cost_per_hour cost_per_month
## 1 66 512MB 512mb    512   1   20       0.00744            5.0
## 2 63   1GB   1gb   1024   1   30       0.01488           10.0
## 3 62   2GB   2gb   2048   2   40       0.02976           20.0
## 4 64   4GB   4gb   4096   2   60       0.05952           40.0
## 5 65   8GB   8gb   8192   4   80       0.11905           80.0
## 6 61  16GB  16gb  16384   8  160       0.23810          160.0
## 7 60  32GB  32gb  32768  12  320       0.47619          320.0
## 8 70  48GB  48gb  49152  16  480       0.71429          480.0
## 9 69  64GB  64gb  65536  20  640       0.95238          640.0

List available images

head(images())
##        id                  name             slug distribution public sfo1
## 1 3209452 rstudioserverssh_snap             <NA>       Ubuntu  FALSE    1
## 2    1601        CentOS 5.8 x64   centos-5-8-x64       CentOS   TRUE    1
## 3    1602        CentOS 5.8 x32   centos-5-8-x32       CentOS   TRUE    1
## 4   12573        Debian 6.0 x64   debian-6-0-x64       Debian   TRUE    1
## 5   12575        Debian 6.0 x32   debian-6-0-x32       Debian   TRUE    1
## 6   14097      Ubuntu 10.04 x64 ubuntu-10-04-x64       Ubuntu   TRUE    1
##   nyc1 ams1 nyc2 ams2 sgp1
## 1   NA   NA   NA   NA   NA
## 2    1    1    1    1    1
## 3    1    1    1    1    1
## 4    1    1    1    1    1
## 5    1    1    1    1    1
## 6    1    1    1    1    1

List ssh keys

keys()
## $ssh_keys
## $ssh_keys[[1]]
## $ssh_keys[[1]]$id
## [1] 89103
##
## $ssh_keys[[1]]$name
## [1] "Scott Chamberlain"

One change that's of interest is that most of the various droplets_*() functions take in the outputs of other droplets_*() functions. This means that we can pipe outputs of one droplets_*() function to another, including non-droplet_* functions (see examples).

Let's create a droplet:

(res <- droplets_new(name="foo", size_slug = '512mb', image_slug = 'ubuntu-14-04-x64', region_slug = 'sfo1', ssh_key_ids = 89103))
$droplet
$droplet$id
[1] 1880805

$droplet$name
[1] "foo"

$droplet$image_id
[1] 3240036

$droplet$size_id
[1] 66

$droplet$event_id
[1] 26711810

List my droplets

This function used to be do_droplets_get()

droplets()
## $droplet_ids
## [1] 1880805
##
## $droplets
## $droplets[[1]]
## $droplets[[1]]$id
## [1] 1880805
##
## $droplets[[1]]$name
## [1] "foo"
##
## $droplets[[1]]$image_id
## [1] 3240036
##
## $droplets[[1]]$size_id
## [1] 66
##
## $droplets[[1]]$region_id
## [1] 3
##
## $droplets[[1]]$backups_active
## [1] FALSE
##
## $droplets[[1]]$ip_address
## [1] "162.243.152.56"
##
## $droplets[[1]]$private_ip_address
## NULL
##
## $droplets[[1]]$locked
## [1] FALSE
##
## $droplets[[1]]$status
## [1] "active"
##
## $droplets[[1]]$created_at
## [1] "2014-06-18T14:15:35Z"
##
##
##
## $event_id
## NULL

As mentioned above we can now pipe output of droplet*() functions to other droplet*() functions.

Here, pipe output of lising droplets droplets() to the events() function

droplets() %>% events()
## Error: No event id found

In this case there were no event ids to get event data on.

Here, we'll get details for the droplet we just created, then pipe that to droplets_power_off()

droplets(1880805) %>% droplets_power_off
## $droplet_ids
## [1] 1880805
##
## $droplets
## $droplets$droplet_ids
## [1] 1880805
##
## $droplets$droplets
## $droplets$droplets$id
## [1] 1880805
##
## $droplets$droplets$name
## [1] "foo"
##
## $droplets$droplets$image_id
## [1] 3240036
##
## $droplets$droplets$size_id
## [1] 66
##
## $droplets$droplets$region_id
## [1] 3
##
## $droplets$droplets$backups_active
## [1] FALSE
##
## $droplets$droplets$ip_address
## [1] "162.243.152.56"
##
## $droplets$droplets$private_ip_address
## NULL
##
## $droplets$droplets$locked
## [1] FALSE
##
## $droplets$droplets$status
## [1] "active"
##
## $droplets$droplets$created_at
## [1] "2014-06-18T14:15:35Z"
##
## $droplets$droplets$backups
## list()
##
## $droplets$droplets$snapshots
## list()
##
##
## $droplets$event_id
## NULL
##
##
## $event_id
## [1] 26714109

Then pipe it again to droplets_power_on()

droplets(1880805) %>%
  droplets_power_on
## $droplet_ids
## [1] 1880805
##
## $droplets
## $droplets$droplet_ids
## [1] 1880805
##
## $droplets$droplets
## $droplets$droplets$id
## [1] 1880805
##
## $droplets$droplets$name
## [1] "foo"
##
## $droplets$droplets$image_id
## [1] 3240036
##
## $droplets$droplets$size_id
## [1] 66
##
## $droplets$droplets$region_id
## [1] 3
##
## $droplets$droplets$backups_active
## [1] FALSE
##
## $droplets$droplets$ip_address
## [1] "162.243.152.56"
##
## $droplets$droplets$private_ip_address
## NULL
##
## $droplets$droplets$locked
## [1] FALSE
##
## $droplets$droplets$status
## [1] "off"
##
## $droplets$droplets$created_at
## [1] "2014-06-18T14:15:35Z"
##
## $droplets$droplets$backups
## list()
##
## $droplets$droplets$snapshots
## list()
##
##
## $droplets$event_id
## NULL
##
##
## $event_id
## [1] 26714152
Sys.sleep(6)
droplets(1880805)$droplets$status
## [1] "off"

Why not use more pipes?

droplets(1880805) %>%
  droplets_power_off %>%
  droplets_power_on %>%
  events

Last time I talked about installing R, RStudio, etc. on a droplet. I'm still working out bugs in that stuff, but do test out so it can get better faster. See do_install().

analogsea - an R client for the Digital Ocean API

I think this package name is my best yet. Maybe it doesn't make sense though? At least it did at the time...

Anyway, the main motivation for this package was to be able to automate spinning up Linux boxes to do cloud R/RStudio work. Of course if you are a command line native this is all easy for you, but if you are afraid of the command line and/or just don't want to deal with it, this tool will hopefully help.

Most of the functions in this package wrap the Digital Ocean API. So you can do things like create a new droplet, get information on your droplets, destroy droplets, get information on available images, make snapshots, etc. Basically everything you can do from their website you can do here. Note that all functions are prefixed with do_ (for Digital Ocean).

The droplet creation part is what we can leverage to spin up a cloud machine to then install R on, and optionally RStudio server, and even RStudio Shiny server. This allows you to stay within R entirely, not having to go to ssh into the Linux machine itself or go to the Digital Ocean website (after initial setup of course).

If you try this, I recommend using this on R on the command line as you can more easily kill the R session if something goes wrong, and quickly open a new tab/window to ssh into the Linux machine if you want.

First, installation

devtools::install_github("sckott/analogsea")

Load the library

library("analogsea")

Firt, authenticate. This will ask for your Digital Ocean details. You can enter them each R session, or store them in your .Renviron file. After successful authentication, each function simply looks for your auth details with Sys.getenv().

do_auth()

List available regions

sapply(do_regions()$regions, "[[", "name")
## [1] "San Francisco 1" "New York 2"      "Amsterdam 2"     "Singapore 1"

List available images

sapply(do_images()$images, "[[", "name")
##  [1] "rstudioserverssh_snap"                          
##  [2] "CentOS 5.8 x64"                                 
##  [3] "CentOS 5.8 x32"                                 
##  [4] "Debian 6.0 x64"                                 
##  [5] "Debian 6.0 x32"                                 
##  [6] "Ubuntu 10.04 x64"                               
##  [7] "Ubuntu 10.04 x32"                               
##  [8] "Arch Linux 2013.05 x64"                         
##  [9] "Arch Linux 2013.05 x32"                         
## [10] "CentOS 6.4 x32"                                 
## [11] "CentOS 6.4 x64"                                 
## [12] "Ubuntu 12.04.4 x32"                             
## [13] "Ubuntu 12.04.4 x64"                             
## [14] "Ubuntu 13.10 x32"                               
## [15] "Ubuntu 13.10 x64"                               
## [16] "Fedora 19 x32"                                  
## [17] "Fedora 19 x64"                                  
## [18] "MEAN on Ubuntu 12.04.4"                         
## [19] "Ghost 0.4.2 on Ubuntu 12.04"                    
## [20] "Wordpress on Ubuntu 13.10"                      
## [21] "Ruby on Rails on Ubuntu 12.10 (Nginx + Unicorn)"
## [22] "Redmine on Ubuntu 12.04"                        
## [23] "Ubuntu 14.04 x32"                               
## [24] "Ubuntu 14.04 x64"                               
## [25] "Fedora 20 x32"                                  
## [26] "Fedora 20 x64"                                  
## [27] "Dokku v0.2.3 on Ubuntu 14.04"                   
## [28] "Debian 7.0 x64"                                 
## [29] "Debian 7.0 x32"                                 
## [30] "CentOS 6.5 x64"                                 
## [31] "CentOS 6.5 x32"                                 
## [32] "Docker 0.11.1 on Ubuntu 13.10 x64"              
## [33] "Django on Ubuntu 14.04"                         
## [34] "LAMP on Ubuntu 14.04"                           
## [35] "node-v0.10.28 on Ubuntu 14.04"                  
## [36] "GitLab 6.9.0 CE"

List available sizes

do.call(rbind, do_sizes()$sizes)
##       id name    slug    memory cpu disk cost_per_hour cost_per_month
##  [1,] 66 "512MB" "512mb" 512    1   20   0.00744       "5.0"         
##  [2,] 63 "1GB"   "1gb"   1024   1   30   0.01488       "10.0"        
##  [3,] 62 "2GB"   "2gb"   2048   2   40   0.02976       "20.0"        
##  [4,] 64 "4GB"   "4gb"   4096   2   60   0.05952       "40.0"        
##  [5,] 65 "8GB"   "8gb"   8192   4   80   0.1191        "80.0"        
##  [6,] 61 "16GB"  "16gb"  16384  8   160  0.2381        "160.0"       
##  [7,] 60 "32GB"  "32gb"  32768  12  320  0.4762        "320.0"       
##  [8,] 70 "48GB"  "48gb"  49152  16  480  0.7143        "480.0"       
##  [9,] 69 "64GB"  "64gb"  65536  20  640  0.9524        "640.0"

Let's create a droplet:

(res <- do_droplets_new(name="foo", size_slug = '512mb', image_slug = 'ubuntu-14-04-x64', region_slug = 'sfo1', ssh_key_ids = 89103))
$status
[1] "OK"

$droplet
$droplet$id
[1] 1733336

$droplet$name
[1] "foo"

$droplet$image_id
[1] 3240036

$droplet$size_id
[1] 66

$droplet$event_id
[1] 25278892


attr(,"class")
[1] "dodroplet"

List my droplets

do_droplets_get()
## $status
## [1] "OK"
## 
## $droplets
## $droplets[[1]]
## $droplets[[1]]$id
## [1] 1733336
## 
## $droplets[[1]]$name
## [1] "foo"
## 
## $droplets[[1]]$image_id
## [1] 3240036
## 
## $droplets[[1]]$size_id
## [1] 66
## 
## $droplets[[1]]$region_id
## [1] 3
## 
## $droplets[[1]]$backups_active
## [1] FALSE
## 
## $droplets[[1]]$ip_address
## [1] "107.170.211.252"
## 
## $droplets[[1]]$private_ip_address
## NULL
## 
## $droplets[[1]]$locked
## [1] FALSE
## 
## $droplets[[1]]$status
## [1] "active"
## 
## $droplets[[1]]$created_at
## [1] "2014-05-28T05:59:22Z"

Cool, we have a new Linux box with 512 mb RAM, running Ubuntu 14.04 in the SF region. Notice that I'm using my SSH key here. If you don't use your SSH key, Digital Ocean will email you a password, which you then use. We just have to wait a bit (sometimes 20 seconds, sometimes a few minutes) for it to spin up.

Now we can install stuff. Here, I'll install R, and RStudio Server. This step prints out the progress as you would see if you did ssh into the box itself outside of R. The RStudio Server instance will pop up in your default browser when this operation is done.

do_install(res$droplet$id, what='rstudio', usr='hey', pwd='there')

You can install some things like the libcurl and libxml libraries too with the deps parameter.

When you're done, you can destroy your droplet from R too

do_droplets_destroy(res$droplet$id)
## $status
## [1] "OK"
## 
## $event_id
## [1] 25279124

Let me know if you have any thoughts :)

Logistic plot reboot

Someone asked about plotting something like this today

I wrote a few functions previously to do something like this. However, since then ggplot2 has changed, and one of the functions no longer works.

Hence, I fixed opts() to theme(), theme_blank() to element_blank(), and panel.background = element_blank() to plot.background = element_blank() to get the histograms to show up with the line plot and not cover it.

The new functions:

loghistplot  <- function(data) {
  names(data) <- c('x','y') # rename columns

  # get min and max axis values
  min_x <- min(data$x)
  max_x <- max(data$x)
  min_y <- min(data$y)
  max_y <- max(data$y)

  # get bin numbers
  bin_no <- max(hist(data$x, plot = FALSE)$counts) + 5

  # create plots
  a <- ggplot(data, aes(x = x, y = y)) +
    theme_bw(base_size=16) +
    geom_smooth(method = "glm", family = "binomial", se = TRUE,
                colour='black', size=1.5, alpha = 0.3) +
    scale_x_continuous(limits=c(min_x,max_x)) +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor=element_blank(),
          panel.background = element_blank(),
          plot.background = element_blank()) +
    labs(y = "Probability\n", x = "\nYour X Variable")

  theme_loghist <- list(
    theme(panel.grid.major = element_blank(),
          panel.grid.minor=element_blank(),
          axis.text.y = element_blank(),
          axis.text.x = element_blank(),
          axis.ticks = element_blank(),
          panel.border = element_blank(),
          panel.background = element_blank(),
          plot.background = element_blank())
  )

  b <-
  ggplot(data[data$y == unique(data$y)[1], ], aes(x = x)) +
    theme_bw(base_size=16) +
    geom_histogram(fill = "grey") +
    scale_y_continuous(limits=c(0,bin_no)) +
    scale_x_continuous(limits=c(min_x,max_x)) +
    theme_loghist +
    labs(y='\n', x='\n')

  c <- ggplot(data[data$y == unique(data$y)[2], ], aes(x = x)) +
    theme_bw(base_size=16) +
    geom_histogram(fill = "grey") +
    scale_y_continuous(trans='reverse', limits=c(bin_no,0)) +
    scale_x_continuous(limits=c(min_x,max_x)) +
    theme_loghist +
    labs(y='\n', x='\n')

  grid.newpage()
  pushViewport(viewport(layout = grid.layout(1,1)))

  vpa_ <- viewport(width = 1, height = 1, x = 0.5, y = 0.5)
  vpb_ <- viewport(width = 1, height = 1, x = 0.5, y = 0.5)
  vpc_ <- viewport(width = 1, height = 1, x = 0.5, y = 0.5)

  print(b, vp = vpb_)
  print(c, vp = vpc_)
  print(a, vp = vpa_)
}
logpointplot  <- function(data) {
  names(data) <- c('x','y') # rename columns

  # get min and max axis values
  min_x <- min(data$x)
  max_x <- max(data$x)
  min_y <- min(data$y)
  max_y <- max(data$y)

  # create plots
  ggplot(data, aes(x = x, y = y)) +
    theme_bw(base_size=16) +
    geom_point(size = 3, alpha = 0.5, position = position_jitter(w=0, h=0.02)) +
    geom_smooth(method = "glm", family = "binomial", se = TRUE,
                colour='black', size=1.5, alpha = 0.3) +
    scale_x_continuous(limits=c(min_x,max_x)) +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor=element_blank(),
          panel.background = element_blank()) +
    labs(y = "Probability\n", x = "\nYour X Variable")

}

Install ggplot2 and gridExtra if you don't have them:

install.packages(c("ggplot2","gridExtra"), repos = "http://cran.rstudio.com")

And their use:

Logistic histogram plots

loghistplot(data=mtcars[,c("mpg","vs")])

plot of chunk unnamed-chunk-5

loghistplot(movies[,c("rating","Action")])

plot of chunk unnamed-chunk-6

Logistic point plots

loghistplot(data=mtcars[,c("mpg","vs")])

plot of chunk unnamed-chunk-7

loghistplot(movies[,c("rating","Action")])

plot of chunk unnamed-chunk-8