1. Introduction and Constraints

UCI, like many UC campuses, is facing the dual squeeze of decreasing IT budgets and increasing licensing fees for our institutional Backup Systems. We are also facing growing requirements from clients as more data is gathered or generated, analyzed, and archived. In view of these pressures, the Office of Information Technology (OIT) is evaluating what our requirements are and which Backup solutions can address our needs more economically. We are evaluating both Proprietary and Open Source Software (OSS) approaches, and it may be that the optimal solution is a combination of the two.

Any Backup approach is guided by at least 2 issues:
  1. The value of the data (or the cost of replacing it).

  2. The cost of backing it up.

Much of the most valuable institutional data is stored on high-cost, high-reliability, highly-secured central servers. This makes backup fairly easy: most such devices have inherent or included data redundancy or data protection, which makes decisions about what backup system to use much easier (in some cases there is no choice at all, since the proprietary nature of the storage allows only the vendor’s implementation).

The situation is somewhat different for a university, which brings in much of its support dollars in the form of overhead on grants. Such faculty-initiated grants brought in ~$328M in external funding last year for UC Irvine alone. Those grants were composed on, and still largely reside on, personal Laptops and Desktops scattered around the campus. The vast majority of these machines are backed up sporadically, if at all. I’ve had personal experience in trying to rescue at least 5 grants that were lost to disk crashes or accidental deletion days before the submission date. That is valuable data, and it would be exceptionally useful to be able to provide backup services to such users, even disregarding the somewhat less critical primary data from their labs. At UCI, this includes about 400 faculty who write for external grants on a regular basis. Any Backup system should minimally cover these people, and therefore the scalability and Mac/Windows client compatibility of any such system is quite important.

Below is a summary of our current backup systems, EMC Networker and EMC Retrospect.

2. Networker Client Load

We currently use Networker to back up ~230 clients (mostly other servers) to 3 backup servers, with storage requirements described graphically in the plot UCI_backup_clients_plot.jpg

and statistically here:

Sum       109326.2 ........ total GB
Number    227 ............. # clients
Mean      481.61 .......... GB / client
Median    155.9 ...........     "
Min       0.2 ............. GB on smallest client
Max       4651.2 .......... GB on largest client
Range     4651 ............ diff between 2 above
Variance  523643.27 ....... among all clients
Std_Dev   723.63 ..........        "
SEM       48.02 ...........        "
Skew      2.43 ............        "
Std_Skew  14.99 ...........        "
Kurtosis  6.97 ............        "
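
As a quick sanity check, the derived figures in the table can be recomputed from the others. This is only a sketch using the numbers as printed above (the per-client data is not reproduced here, so the median, skew, and kurtosis cannot be rechecked); the variable names are ours.

```python
import math

# Figures copied from the summary table above.
total_gb = 109326.2
n_clients = 227
variance = 523643.27
min_gb, max_gb = 0.2, 4651.2

# Derived quantities, recomputed from the values above.
mean_gb = total_gb / n_clients           # GB per client
std_dev = math.sqrt(variance)            # standard deviation
sem = std_dev / math.sqrt(n_clients)     # standard error of the mean
value_range = max_gb - min_gb            # largest minus smallest client

print(f"mean    {mean_gb:.2f} GB/client")
print(f"std dev {std_dev:.2f}")
print(f"SEM     {sem:.2f}")
print(f"range   {value_range:.1f}")
```

The recomputed values agree with the table to within rounding. Note that the mean is roughly 3x the median, which together with the positive skew indicates that a small number of very large clients account for much of the total.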

We currently use about 1/4 of an FTE to administer the Networker system after setup. The IAT recharge rates for this service are listed here, but in summary, it’s $20/mo/system plus $0.39/GB for storage and a $40 charge for file restores.
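
To make the recharge structure concrete, here is a minimal sketch of the charge for one client. It ASSUMES that the per-GB storage rate is billed monthly, like the per-system fee; the summary above does not state the billing period for storage, so treat the result as illustrative only. The function and parameter names are hypothetical.

```python
def monthly_networker_charge(storage_gb, restores=0, systems=1,
                             per_system=20.0, per_gb=0.39, per_restore=40.0):
    """Estimate one month's recharge in dollars.

    ASSUMPTION: per_gb is charged per month; the source only says
    "$0.39/GB for storage" without giving the period.
    """
    return systems * per_system + storage_gb * per_gb + restores * per_restore


if __name__ == "__main__":
    # A hypothetical client storing 100 GB, with and without one restore.
    print(f"{monthly_networker_charge(100):.2f}")
    print(f"{monthly_networker_charge(100, restores=1):.2f}")
```

Under that assumption the storage term dominates quickly: beyond roughly 50 GB it exceeds the flat per-system fee.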

3. Retrospect Client Load

Retrospect is a disk-only system.

We currently pay $620/yr for this license and have 59 clients (18 Macs, the rest Windows).

We currently charge $87.50/user/year with a 50GB limit on storage, with self-service file restores.

4. Open Source Backup Software Evaluation

There are perhaps 50 OSS Backup systems, but most of them are too limited in their features or maturity to be considered. Only 3 seem to rise to the level of possibility, tho for different things:

Amanda, Bacula, and BackupPC share these characteristics:

4.1. Amanda/Zmanda

4.2. Bacula

4.3. BackupPC

4.4. Some Feature Comparisons

Nice Table of Feature Comparisons among OSS Backup packages and proprietary Backup packages

4.5. Crude Measures of popularity

Via Google-linking (a very crude measure; very sensitive to key words)

And for some comparison:

Google Trends indicates that the Search Volume Index is decreasing rapidly for Veritas and holding constant for EMC Networker, Bacula, and BackupPC; Bacula is ~3x the value of BackupPC, which is itself 2x EMC Networker. Veritas has decreased to just above BackupPC. Zmanda only started in 2006 and does not have much of an index built up, but it shows significant spikes in News Reference Volume. "amanda backup" has been decreasing since 2004 and remains about 1/2 of BackupPC and 1/6 of Bacula.

5. Future Projects

I would like to see the automatic backup of all of our faculty’s Desktops/Laptops (~1000) to shield against catastrophic loss of recent data, but I’m not sanguine about the chances for this due to funding issues, unless we go with a pure OSS solution, which would be considerably better than nothing at all.

5.1. Details

At 10GB per faculty, this would mean a storage server of ~20TB (now a medium-sized file server), and at 1% of the data changing per day, on the order of 100GB a day would have to be transferred for incremental backups. At 7 MB/s (a decent transfer rate over a 100Mb/s link), transmission time is only ~4 hrs, easily done in a night, in parallel sessions. Measured on an Opteron backup server (doing server-side compression & file deduping via hardlinking), it takes about 25% of a CPU to handle 1 backup session, so a 4-core machine could theoretically handle ~16 simultaneous backups, if the bandwidth can supply it with enough data. Our test backup server is currently single-homed, but has 5 interfaces, so it could easily be multi-homed. If we use OSS Backup software, it will cost ~$10K for hardware to provide 1000 very valuable PCs or laptops with at least protection against catastrophic loss.
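
The arithmetic above can be written out as a small back-of-envelope calculation, which makes it easy to vary the inputs. This is a sketch only: the defaults are the round figures quoted in the text, decimal units (1 TB = 1000 GB, 1 GB = 1000 MB) are an assumption, and the function and key names are hypothetical.

```python
def estimate_backup_load(clients=1000, gb_per_client=10.0, daily_change=0.01,
                         rate_mb_per_s=7.0, cpu_per_session=0.25, cores=4):
    """Back-of-envelope sizing for nightly incremental backups.

    Defaults are the round figures from the text. Decimal units are
    assumed (1 TB = 1000 GB, 1 GB = 1000 MB).
    """
    primary_gb = clients * gb_per_client          # total data on the clients
    daily_gb = primary_gb * daily_change          # data changed per day
    # Time to move one day's changes at the quoted rate, as one aggregate stream.
    transfer_hours = daily_gb * 1000 / rate_mb_per_s / 3600
    # Sessions the CPUs could sustain, ignoring network and disk limits.
    max_sessions = int(cores / cpu_per_session)
    return {
        "primary_tb": primary_gb / 1000,
        "daily_incremental_gb": daily_gb,
        "transfer_hours_at_rate": transfer_hours,
        "max_cpu_bound_sessions": max_sessions,
    }


if __name__ == "__main__":
    for name, value in estimate_backup_load().items():
        print(f"{name}: {value:.1f}")
```

With the default inputs the primary data comes to 10TB; the ~20TB server mentioned above presumably leaves room for backup history and growth, which this sketch does not model. Doubling `gb_per_client` doubles both the daily transfer volume and the transfer time, so the nightly window, rather than the CPU, is likely to be the first limit reached on a single 100Mb/s interface.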

6. Considerations for any such decision

Please feel free to expand on (or critique) these points.

6.1. Clients

6.2. Server & Admin

6.3. Backup Protocol


+ contributed by Scott Talkovic

= contributed by Scott Beardsley

7. Feedback and Suggestions

We are still in a preliminary mode for this evaluation, but if you have suggestions, queries, or would like to be notified of the final result and sent any documentation of the process, please let me know.

7.1. Contributed Suggestions

8. Resources

8.1. Books

The O’Reilly book Backup & Recovery: Inexpensive Backup Solutions for Open Systems is a good overview of some important considerations for a backup system. UC people can read it in its entirety here via O’Reilly Safari.

Note that O’Reilly itself uses a combination of commercial and Open Source tools.

8.2. Web sites

Curtis Preston, the author of the above book, runs a backup-related site called BackupCentral, which is a very good info clearing house / blog on all things backup.

The slightly irritating, but fairly entertaining Curtis Preston in a 44m video about various backup and dedupe schemes.

SearchDataBackup is another backup-related site.

8.3. Whitepapers

Of variable quality, some sponsored by vendors.

Backup on a Budget (PDF) - by the ubiquitous Curtis Preston. A reiteration of many of his points, especially that for many organizations backup is not rocket science, that people are the most expensive thing you pay for, and that backup to clouds may be useful (but mostly on rare occasions).

8.4. Useful(?) individual pages

Comments about IBM’s Tivoli Storage Manager

8.4.1. fwbackups

Via TechRepublic’s 10 outstanding Linux backup utilities. Very short list, with some interesting choices. fwbackups is a very slick little program written in Python/GTK that can work on all platforms (but GTK on Windows requires the whole GTK lib). It’s not an enterprise system, but if you have a personal Linux box to back up, it’s pretty straightforward, and it can use rsync/ssh to encrypt over the wire. However, it does require your own shell login and dedicated dir space on a server.

Boxbackup is not in the running for an enterprise backup system, but is interesting for near-realtime backup for a small group of clients.

9. Latest version

The latest version of this document should always be here.