Some Comments on Cloud Computing
================================
by Harry Mangalam v1.0, Sept 15 2011

// Convert this file to HTML & move to its final dest with the command:
// export fileroot="/home/hjm/nacs/Cloud_Comments"; asciidoc -a toc -a numbered ${fileroot}.txt; scp ${fileroot}.[ht]* moo:~/public_html

Are clouds anything more than 'clusters wrapped in magical thinking'?
Is cloud computing more than the attempt to monetize a particular
organization's investment in sysadmin? Does it help anyone complete
their work more effectively, more cheaply? Is the technology churn
worth it? Would it not be more effective to standardize/stabilize on
your own, well-understood technology?

Complexity / Predictability
---------------------------

- Moving software to a cloud increases the number of moving pieces and
changes the ownership and responsibility of those pieces. Increasing
software complexity is strongly correlated with more bugs in the
system, especially when geographically dispersed networks connect all
that juicy goodness.

- Cloud computing works well for orgs that have large computational
needs sporadically (or regularly, with a long periodicity) and would
rather buy cloud expertise than hardware (eg: architectural firms that
need to run a huge render job every few weeks). This allows a fairly
large activation cost to be amortized over many runs.

- Administrative computing maps better onto clouds than does research
computing for a variety of reasons, mostly having to do with the
amount of data to compute and the relative stability of the codes.
Once a business process is set up, it tends to concretize (for better
or worse). But note that administrative computing is often quite
regular, so the dynamic range of its needs can be addressed locally.

- I would argue that using clouds does not /decrease/ the amount of
expertise you require to run your codes, but /changes/ the allocation
of what that knowledge is. The delta in that knowledge (as for any
such case) will cost nontrivial $ and gaining that knowledge will take
time. "Good judgement comes from experience; experience comes from bad
judgement."

- Relatedly, the supposedly reduced System Administrator (SA) cost of
running your own cluster (HW and SW maintenance) is replaced by the
arguably more expensive, more sophisticated SA costs of running your
codes in the cloud. The argument that SA costs decrease with the cloud
seems to be an overstatement at best. Your 6-year-old can probably
instantiate a cloud cluster on EC2 with a single click, but she might
take more time to debug a driver-level MPI problem in a Molecular
Dynamics cloud code.

- Clouds will obviously scale better if the computation is done with
Open Source code. If it's a licensed application (commercial CFD or
MATLAB, for example), the per-node license fees may swamp the actual
computational cost. Currently most licenses must be maintained by the
customer, not by the Cloud vendor, so if you only want to run a huge
MATLAB job once a year, you still have to pay for MATLAB for the
entire year. This may change as Cloud vendors exert more influence on
app makers.

- If your problem is not time-critical and you have large codes that
can be run at low priority and checkpointed so as to consume only
low-cost cycles, clouds would be a good solution. But these seem to be
a minority of cases.

- Clouds provide a contractual, hard cost for your computing. This may
be worthwhile, even tho you're paying someone else's profit margin. On
the other hand, if you have better things to do with your money than
tying up $ in infrastructure, or if your cost projections for your own
resources are too uncertain, this may be a good tradeoff (a
back-of-the-envelope sketch of the comparison follows this list).
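To put numbers on that tradeoff, here is a minimal sketch of the
break-even arithmetic, in Python. Every figure in it is an invented
placeholder, not a quote from any vendor; substitute your own
hardware, overhead, and per-core-hour prices.

[source,python]
---------------------------------------------------------------------
#!/usr/bin/env python
# Toy break-even calculation: owned cluster vs. rented cloud cycles.
# ALL figures are invented placeholders, NOT real quotes.

cluster_capital  = 120000.0  # purchase price of a small cluster ($)
cluster_lifetime = 3.0       # amortization period (years)
annual_overhead  = 40000.0   # SA time, power, cooling, space ($/yr)
cluster_cores    = 256       # usable cores
utilization      = 0.40      # fraction of the year the cluster is busy

cloud_core_hour  = 0.10      # rented cost per core-hour ($)

hours_per_year = 365 * 24
# core-hours of real work the owned cluster delivers per year
used_core_hours = cluster_cores * hours_per_year * utilization

owned_cost_per_year = cluster_capital / cluster_lifetime + annual_overhead
owned_per_core_hour = owned_cost_per_year / used_core_hours

print("owned: $%.3f per used core-hour" % owned_per_core_hour)
print("cloud: $%.3f per core-hour" % cloud_core_hour)

# utilization at which owning and renting cost the same
breakeven = owned_cost_per_year / (cloud_core_hour * cluster_cores *
                                   hours_per_year)
print("break-even utilization: %.0f%%" % (breakeven * 100))
---------------------------------------------------------------------

With these made-up numbers, owning wins only if you can keep the
cluster more than about a third busy; the structure of the comparison,
not the figures, is the point.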
Data Security
-------------

- Clouds introduce geographical distribution and uncertainty, as well
as the usual data exposure when your data moves across public
networks. This can often spread data across multiple legal boundaries.
While service agreements and data-level encryption can address this,
it can be a serious issue with some types of data. For example,
Amazon's S3 storage is not HIPAA compliant by itself, but it is being
used to store encrypted HIPAA data. In general, cloud computation
makes more sense where data security and/or integrity doesn't matter.

- Commercial cloud infrastructures often come with some level of
backup and data integrity. This obviates what can be a complex and
expensive part of maintaining your own. But you'd better be sure what
this entails and what kinds of minimum data integrity it ensures.

- Recent 'cloudbursts' from http://goo.gl/dKcZ8[Amazon],
http://goo.gl/bt2Pq[Sony], and http://goo.gl/UmWY1[Microsoft], and
more recently http://goo.gl/BAl9o[Google and Microsoft again], show
that even vendors with extraordinary talents in maintaining cloud
infrastructure can make mistakes that can wipe out their
infrastructure, their customer base, or both, for days or permanently.

- Educause has a paper that was mentioned in this list:
http://goo.gl/HUuia[If It's in the Cloud, Get It on Paper: Cloud
Computing Contract Issues]. It has a number of links that are worth
reading, tho IMHO most of them view clouds thru remarkably rosy
glasses. This impression may be colored by my own experiences as an
SA, which by definition means living in the world of IICGWIWGW (If It
Can Go Wrong, It Will Go Wrong).

HPC and Research Computing
--------------------------

- GPU clouds should work well on large GPU codes, since GPU clusters
run as fast in the cloud as on your desktop.

- Large runs using Amazon's virtual HPC clusters (which /are/ HPC
clusters) should work well, but the cost may overwhelm the
convenience, depending on the length of the run.

- Many modern research algorithms are very memory-intensive and
require 100s of GB of RAM for calculations. Some of these can be
broken down for http://en.wikipedia.org/wiki/MapReduce[MapReduce]
approaches, but not easily (a toy sketch of the pattern follows this
section). Since no Amazon HPC resources offer such hardware, these
problems are still restricted to local solutions.

- Codes where there is little data IO and lots of computation, such as
molecular dynamics, Monte Carlo sampling, etc., map well onto clouds
(see GPU computing above). Depending on the app, sometimes huge output
(MD trajectory files) can be generated, so that the storage costs
overwhelm the CPU savings.

- The virtualization of research computing does have some advantages
if the hardware is advanced enough to support the algorithms. It could
provide a standardized way to access large resources, essentially
replacing the nation's Supercomputing Centers with centralized access
to commodity supercomputing. Unlike the National Supercomputing
Centers, it would be self-supporting (or go broke) and would therefore
make the true cost of large computations apparent.

- So far, most granting agencies are not set up to fund calculations
via the Cloud, so there's little reason for PIs to move to Cloud
calculations, beyond evaluating it as a research problem.
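For those who haven't run into it, here is a minimal, single-machine
sketch of the MapReduce pattern mentioned above, using the canonical
word-count example. It is purely illustrative: a real framework such
as Hadoop distributes the map, shuffle, and reduce phases across many
nodes, which is exactly the part this toy version compresses into
local function calls.

[source,python]
---------------------------------------------------------------------
#!/usr/bin/env python
# Toy, single-machine illustration of the MapReduce pattern: word count.
# A real framework (eg: Hadoop) runs map_phase() on many nodes at once
# and performs the shuffle over the network.
from collections import defaultdict

def map_phase(chunk):
    """Emit (key, 1) pairs for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group values by key (the step a framework does over the network)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return (key, sum(values))

# 'chunks' stand in for pieces of a dataset too big for one node's RAM
chunks = ["the cloud is a cluster",
          "a cluster is not always a cloud"]

pairs = []
for chunk in chunks:              # map phase: independent per chunk
    pairs.extend(map_phase(chunk))

grouped = shuffle(pairs)          # shuffle: group (key, value) by key

results = [reduce_phase(k, v) for k, v in grouped.items()]
print(sorted(results))
# -> [('a', 3), ('always', 1), ('cloud', 2), ('cluster', 2),
#     ('is', 2), ('not', 1), ('the', 1)]
---------------------------------------------------------------------

The catch noted above stands: if an algorithm's working set genuinely
needs 100s of GB of shared RAM, recasting it into independent
map/reduce chunks like this is exactly the hard part.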
References for Research Computing
---------------------------------

- http://www.pc2.de/uploads/tx_sibibtex/providing_ssaas.pdf[Providing Scientific Software as a Service in Consideration of Service Level Agreements]

- http://onlinelibrary.wiley.com/doi/10.1002/spe.1055/pdf[Virtualized HPC: a contradiction in terms?]

- http://www.ucgrid.org/cloud2011/UCCloudSummit2011.html[UC Cloud Summit 2011 - videos and slide decks on cloud usage at UC] (thanks to Prakashan for getting the vids online)