Some Comments on Cloud Computing
================================
by Harry Mangalam v1.0, Sept 15 2011

// Convert this file to HTML & move to its final dest with the command:
// export fileroot="/home/hjm/nacs/Cloud_Comments"; asciidoc -a toc -a numbered ${fileroot}.txt; scp ${fileroot}.[ht]* moo:~/public_html

Are clouds anything more than 'clusters wrapped in magical thinking'?
Is cloud computing more than the attempt to monetize a particular
organization's investment in sysadmin? Does it help anyone complete
their work more effectively, more cheaply? Is the technology churn
worth it? Would it not be more effective to standardize/stabilize on
your own, well-understood technology?

Complexity / Predictability
---------------------------

- Moving software to a cloud increases the number of moving pieces and
changes the ownership and responsibility of those pieces. Increasing
software complexity is strongly correlated with more bugs in the
system, especially when geographically dispersed networks connect all
that juicy goodness.

- Cloud computing works well for orgs that have large computational
needs sporadically (or regularly, with a long periodicity) and would
rather buy cloud expertise than hardware (eg: architectural firms that
need to run a huge render job every few weeks). This allows a fairly
large activation cost to be amortized over many runs.

- Administrative computing maps better onto clouds than does research
computing for a variety of reasons, mostly having to do with the
amount of data to compute and the relative stability of the codes.
Once a business process is set up, it tends to concretize (for better
or worse). But note that administrative computing is often quite
regular, so the dynamic range of its needs can be addressed locally.

- I would argue that using clouds does not /decrease/ the amount of
expertise you require to run your codes, but /changes/ the allocation
of what that knowledge is. The delta in that knowledge (as for any
such case) will cost nontrivial $ and gaining that knowledge will take
time. "Good judgement comes from experience; experience comes from bad
judgement."

- Relatedly, the supposedly reduced System Administrator (SA) cost of
running your own cluster (HW and SW maintenance) is replaced by the
arguably more expensive, more sophisticated SA costs of running your
codes in the cloud. The argument that SA costs decrease with the cloud
seems to be an overstatement at best. Your 6-year-old can probably
instantiate a cloud cluster on EC2 with a single click, but she might
take more time to debug a driver-level MPI problem in a Molecular
Dynamics cloud code.

- Clouds will obviously scale better if the computation is done with
Open Source code. If it's a licensed application (commercial CFD or
MATLAB, for example), the per-node license fees may swamp the actual
computational cost. Currently most licenses must be maintained by the
customer, not by the Cloud vendor, so if you only want to run a huge
MATLAB job once a year, you still have to pay for MATLAB for the
entire year. This may change as Cloud vendors exert more influence on
app makers.

- If your problem is not time-critical and you have large codes that
can be run at low priority and checkpointed so as to consume only
low-cost cycles, clouds would be a good solution. But these seem to be
a minority of cases.

- Clouds provide a contractual, hard cost for your computing. This may
be worthwhile, even tho you're paying someone else's profit margin. On
the other hand, if you have better things to do with your money than
tying up $ in infrastructure, or if your cost projections for your own
resources are too uncertain, this may be a good tradeoff (a
back-of-the-envelope sketch of the comparison follows this list).
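To put numbers on that tradeoff, here is a minimal sketch of the
break-even arithmetic, in Python. Every figure in it is an invented
placeholder, not a quote from any vendor; substitute your own
hardware, overhead, and per-core-hour prices.

[source,python]
---------------------------------------------------------------------
#!/usr/bin/env python
# Toy break-even calculation: owned cluster vs. rented cloud cycles.
# ALL figures are invented placeholders, NOT real quotes.

cluster_capital  = 120000.0  # purchase price of a small cluster ($)
cluster_lifetime = 3.0       # amortization period (years)
annual_overhead  = 40000.0   # SA time, power, cooling, space ($/yr)
cluster_cores    = 256       # usable cores
utilization      = 0.40      # fraction of the year the cluster is busy

cloud_core_hour  = 0.10      # rented cost per core-hour ($)

hours_per_year = 365 * 24
# core-hours of real work the owned cluster delivers per year
used_core_hours = cluster_cores * hours_per_year * utilization

owned_cost_per_year = cluster_capital / cluster_lifetime + annual_overhead
owned_per_core_hour = owned_cost_per_year / used_core_hours

print("owned: $%.3f per used core-hour" % owned_per_core_hour)
print("cloud: $%.3f per core-hour" % cloud_core_hour)

# utilization at which owning and renting cost the same
breakeven = owned_cost_per_year / (cloud_core_hour * cluster_cores *
                                   hours_per_year)
print("break-even utilization: %.0f%%" % (breakeven * 100))
---------------------------------------------------------------------

With these made-up numbers, owning wins only if you can keep the
cluster more than about a third busy; the structure of the comparison,
not the figures, is the point.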
Data Security
-------------

- Clouds introduce geographical distribution and uncertainty, as well
as the usual data exposure when your data moves across public
networks. This can often spread data across multiple legal boundaries.
While service agreements and data-level encryption can address this,
it can be a serious issue with some types of data. For example,
Amazon's S3 storage is not HIPAA compliant by itself, but it is being
used to store encrypted HIPAA data. In general, cloud computation
makes more sense where data security and/or integrity doesn't matter.

- Commercial cloud infrastructures often come with some level of
backup and data integrity. This obviates what can be a complex and
expensive part of maintaining your own. But you'd better be sure what
this entails and what kinds of minimum data integrity it ensures.

- Recent 'cloudbursts' from http://goo.gl/dKcZ8[Amazon],
http://goo.gl/bt2Pq[Sony], and http://goo.gl/UmWY1[Microsoft], and
more recently http://goo.gl/BAl9o[Google and Microsoft again], show
that even vendors with extraordinary talents in maintaining cloud
infrastructure can make mistakes that can wipe out their
infrastructure, their customer base, or both, for days or permanently.

- Educause has a paper that was mentioned in this list:
http://goo.gl/HUuia[If It's in the Cloud, Get It on Paper: Cloud
Computing Contract Issues]. It has a number of links that are worth
reading, tho IMHO most of them view clouds thru remarkably rosy
glasses. This impression may be colored by my own experiences as an
SA, which by definition means living in the world of IICGWIWGW (If It
Can Go Wrong, It Will Go Wrong).

HPC and Research Computing
--------------------------

- GPU clouds should work well on large GPU codes, since GPU clusters
run as fast in the cloud as on your desktop.

- Large runs using Amazon's virtual HPC clusters (which /are/ HPC
clusters) should work well, but the cost may overwhelm the
convenience, depending on the length of the run.

- Many modern research algorithms are very memory-intensive and
require 100s of GB of RAM for calculations. Some of these can be
broken down for http://en.wikipedia.org/wiki/MapReduce[MapReduce]
approaches, but not easily (a toy sketch of the pattern follows this
section). Since no Amazon HPC resources offer such hardware, these
problems are still restricted to local solutions.

- Codes where there is little data IO and lots of computation, such as
molecular dynamics, Monte Carlo sampling, etc., map well onto clouds
(see GPU computing above). Depending on the app, sometimes huge output
(MD trajectory files) can be generated, so that the storage costs
overwhelm the CPU savings.

- The virtualization of research computing does have some advantages
if the hardware is advanced enough to support the algorithms. It could
provide a standardized way to access large resources, essentially
replacing the nation's Supercomputing Centers with centralized access
to commodity supercomputing. Unlike the National Supercomputing
Centers, it would be self-supporting (or go broke) and would therefore
make the true cost of large computations apparent.

- So far, most granting agencies are not set up to fund calculations
via the Cloud, so there's little reason for PIs to move to Cloud
calculations, beyond evaluating it as a research problem.
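For those who haven't run into it, here is a minimal, single-machine
sketch of the MapReduce pattern mentioned above, using the canonical
word-count example. It is purely illustrative: a real framework such
as Hadoop distributes the map, shuffle, and reduce phases across many
nodes, which is exactly the part this toy version compresses into
local function calls.

[source,python]
---------------------------------------------------------------------
#!/usr/bin/env python
# Toy, single-machine illustration of the MapReduce pattern: word count.
# A real framework (eg: Hadoop) runs map_phase() on many nodes at once
# and performs the shuffle over the network.
from collections import defaultdict

def map_phase(chunk):
    """Emit (key, 1) pairs for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group values by key (the step a framework does over the network)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return (key, sum(values))

# 'chunks' stand in for pieces of a dataset too big for one node's RAM
chunks = ["the cloud is a cluster",
          "a cluster is not always a cloud"]

pairs = []
for chunk in chunks:              # map phase: independent per chunk
    pairs.extend(map_phase(chunk))

grouped = shuffle(pairs)          # shuffle: group (key, value) by key

results = [reduce_phase(k, v) for k, v in grouped.items()]
print(sorted(results))
# -> [('a', 3), ('always', 1), ('cloud', 2), ('cluster', 2),
#     ('is', 2), ('not', 1), ('the', 1)]
---------------------------------------------------------------------

The catch noted above stands: if an algorithm's working set genuinely
needs 100s of GB of shared RAM, recasting it into independent
map/reduce chunks like this is exactly the hard part.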
References for Research Computing
---------------------------------

- http://www.pc2.de/uploads/tx_sibibtex/providing_ssaas.pdf[Providing Scientific Software as a Service in Consideration of Service Level Agreements]

- http://onlinelibrary.wiley.com/doi/10.1002/spe.1055/pdf[Virtualized HPC: a contradiction in terms?]

- http://www.ucgrid.org/cloud2011/UCCloudSummit2011.html[UC Cloud Summit 2011 - videos and slide decks on cloud usage at UC] (thanks to Prakashan for getting the vids online)