I tested the two services with a very simple script. The script creates a data frame of 10,000 numbers drawn with rnorm and assigns each one to one of two levels of a factor (a or b). I then take the mean for each level with the aggregate function.
In CRdata you need to add some extra code to format the output for a browser window. For example, the output object (here, out) needs to have ‘<crdata_object>’ on both sides of it so it can be rendered in a browser. The same goes for anything else you would normally print to a console. You don’t need this extra code in Cloudnumbers.
dat <- data.frame(n = rnorm(10000), p = rep(c('a','b'), each=5000))
out <- aggregate(n ~ p, data = dat, mean)
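For anyone who wants to try the same script locally before uploading it to either service, here is a minimal sketch. The seed, the print call, and the tapply cross-check are my additions for illustration and are not part of what I ran on CRdata or Cloudnumbers. Note that in R 4.0.0 and later the p column is a character vector rather than a factor by default; aggregate groups on it the same way.

```r
# Assumption: a fixed seed is added here so repeated runs give the same numbers.
set.seed(42)

# 10,000 standard-normal draws, the first 5,000 labelled 'a' and the rest 'b'.
dat <- data.frame(n = rnorm(10000), p = rep(c("a", "b"), each = 5000))

# Mean of n within each level of p; returns a data frame with columns p and n.
out <- aggregate(n ~ p, data = dat, FUN = mean)
print(out)

# Cross-check (not in the original script): tapply should give the same group means.
check <- tapply(dat$n, dat$p, mean)
stopifnot(isTRUE(all.equal(out$n, as.numeric(check))))
```

Both group means should land close to zero, since rnorm draws from a standard normal distribution by default.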
Here is a screenshot of the output from CRdata with the simple script above.
This simple script took about 20 seconds from starting the job to finishing. However, it seems like the only output option is HTML. Can this be right? If so, that is a poor choice for the sole output format.
In Cloudnumbers you have to start a workspace and upload your R code file.
Then, start a session…
choose your software platform…
choose packages (one at a time, very slow)…
then choose number of clusters, etc.
Then finally start the job.
Then it initializes, and finally you can open the console.
From here it is like running R as you normally would, except on the web.
Who wins (at least for our very minimal example above)?
- Speed of entire process (not just running code): CRdata
- Ease of use: CRdata
- Cost: CRdata (free only)
- Least annoying: Cloudnumbers (you don't have to add in extra code to run your own code)
- Open source: CRdata (you can use publicly available code on the site)
- Long-term use: Cloudnumbers (more powerful, flexible, etc.)