1. Introduction

This document describes:

This project was undertaken as a test case to try out the Turbogears web development platform in preparation for a larger project. It turned out to be a fairly smooth project, but it's a little schizophrenic as BackupPC is Perl-based and Turbogears is Python-based. Nevertheless, I believe it has some value. Below I describe what the web interface does, why it does it, and what you should be aware of.

2. What is it for?

BackupPC is an Open Source, disk-based backup service written in Perl that runs on a *nix server and uses some neat features to back up specified directories on groups of computers running a variety of Operating Systems. The web interface I'm describing currently works with MS Win2K, WinXP, and WinVista (the latter with minor client-side changes). Those are the most popular Desktop OSs in our environment. BackupPC also works with MacOSX and Linux, and while those systems are even easier to support, they are less popular at UCI. Therefore, the web interface was initially targeted at Windows, but a new interface that includes MacOSX and Linux is underway, and the mechanism for installing the client software on both of those platforms is well-characterized and considerably simpler than on Windows.

BackupPC is useful for short-term backups - up to months - to prevent catastrophic loss of data if the client disk crashes or if the PC walks away (increasingly likely as more people use laptops). In our environment, it is especially targeted at researchers whose entire intellectual corpus may be on their hard disk - their publication record, email, lab & personnel records, scientific results, presentations, and especially their in-progress grants. The loss of this data would be quite serious.

After personally using BackupPC for a while, I very much like the architecture and performance (even with multi-user servers), was very impressed by the protocol options, and REALLY liked that clients could restore their backups by themselves, but was concerned that it was too complicated for a mere human to install on a Windows machine. Hence this Web interface.

3. How does it work?

3.1. The short version

3.1.1. For Windows (2k, XP, Vista)

3.1.2. For MacOSX & Linux (very soon)

* Because of requested per-client configuration changes, non-standard firewall configurations, and occasional missing DLLs, about 1/5 of the installs done so far have required intervention or correction. As we install on more clients, we will get a better idea of what the usual configuration state is. Also, some email systems strip zip files from email attachments due to security concerns. This transfer can be rewritten to email only the URL and the login info needed to download the same zipfile from the server.

3.2. The long version

3.2.1. For Windows (2k, XP, Vista)

Windows is the most troublesome platform, but the following also describes the approach for MacOSX and Linux clients.

3.2.2. For Linux Client

We assume that each Linux client has a recent, working rsync executable. If it doesn't, the installation script will tell you to install it, after which you can rerun the script.

For each Linux client, the server needs to generate 3 files:

The backuppc client runs under a regular user UID so it can only read things that it should - sensitive system files cannot be read (altho sensitive user files can be if they are not explicitly excluded). If the backup needs exceptions, they should be made by manipulating the appropriate group in /etc/group.

The rsyncd.* files above should have the following permissions:

-rw-r--r-- 1 root root  987 2008-06-18 08:23 rsyncd.conf
-rw-r--r-- 1 root root  357 2008-06-18 08:23 rsyncd.exclude
-rw------- 1 root root   52 2008-10-15 15:07 rsyncd.secrets <== NOTE!!
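
The permission requirement on rsyncd.secrets can be checked mechanically. The following is a minimal sketch in Python, not part of BackupPC or of the installation script; the function names secrets_mode_ok and lock_down are made up for illustration. It assumes a POSIX client and treats any group or world permission bit as a failure, matching the -rw------- mode shown above.

```python
import os
import stat


def secrets_mode_ok(path):
    """Return True if no group or world permission bits are set on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def lock_down(path):
    """Restrict path to owner read/write only (mode 0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

An installer could call secrets_mode_ok("/etc/rsyncd.secrets") after writing the file and call lock_down() if it returns False.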

The rsyncd.conf file defines how the different backup modules or shares are backed up.

Here's an rsyncd.conf file that will back up an entire /home dir on Linux or MacOSX:


# GLOBAL OPTIONS
log file=/var/log/rsyncd
pid file=/var/run/rsyncd.pid
# the following line 'auth users' is not a real login user.  This is the ID that the
# rsync server and client have agreed to use to allow rsyncs back and forth.
auth users = user_from_form

# the 'secrets file' contains the CLEARTEXT passwords of the 'auth users' group
# so it MUST NOT be readable by everyone.
secrets file = /etc/rsyncd.secrets
# the 'dont compress' line contains wildcard patterns for files that should not be compressed
dont compress = *.gz *.tgz *.zip *.z *.rpm *.deb *.iso *.bz2 *.tbz *.exe

# MODULE OPTIONS

[home]
        comment = The complete /home dir (all users)
# Note that everything in /home will be read by the uid defined
# below. If that user doesn't have sufficient privs, the backup will just
# skip the file/dir. Adjust required privs to the user by adding that user
# to an appropriate group in '/etc/group'
        path = /home
        use chroot = no
        max connections=1
        lock file = /var/lock/rsyncd

# the default for read only is yes...
        read only = yes
        list = yes

# the uid/gid should be that of a regular user so we don't accidentally read private data
        uid = backuppc
        gid = backuppc

# this sets a universal exclude, but each Module (or share) can have its own exclude file.
        exclude from = /etc/rsyncd.exclude
        strict modes = yes
# 1st we deny all hosts
        hosts deny = *
# then allow ONLY the BackupPC server.  Files will be read only if
# 1 - a request comes from the BackupPC server IP
# 2 - the auth matches
# 3 - an rsyncd (with the correct auth) is running on the client
# 4 - the files have ownership and privs that allows the backuppc user to read them
        hosts allow = ###.###.84.82
        ignore errors = no
        ignore nonreadable = yes
        transfer logging = yes
        timeout = 600
        refuse options = checksum dry-run
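
Since the server has to generate these files for every client, the per-client values can be filled into a template. The sketch below is only an illustration in Python of how that might be done; it is not the code used by the web interface. The template is deliberately trimmed to a few of the options shown above, and the names RSYNCD_CONF, render_rsyncd_conf, and render_rsyncd_secrets are assumptions. The one-line "user:password" layout of the secrets file is the standard rsyncd format.

```python
from string import Template

# Trimmed-down template: only the per-client fields are placeholders.
RSYNCD_CONF = Template("""\
auth users = $auth_user
secrets file = /etc/rsyncd.secrets

[$module]
        path = $path
        read only = yes
        uid = backuppc
        gid = backuppc
        hosts deny = *
        hosts allow = $server_ip
""")


def render_rsyncd_conf(auth_user, module, path, server_ip):
    """Fill the template with values taken from the registration form."""
    values = {
        "auth_user": auth_user,
        "module": module,
        "path": path,
        "server_ip": server_ip,
    }
    for key, value in values.items():
        # Reject empty or multi-line values so a form field cannot
        # smuggle extra options into the generated file.
        if not value or "\n" in value:
            raise ValueError(f"bad value for {key!r}")
    return RSYNCD_CONF.substitute(values)


def render_rsyncd_secrets(auth_user, password):
    """rsyncd.secrets holds one 'user:password' pair per line, in cleartext."""
    return f"{auth_user}:{password}\n"
```

For example, render_rsyncd_conf("user_from_form", "home", "/home", "192.0.2.10") returns the text of a config with a single [home] module that only the host 192.0.2.10 (a documentation address used here as a stand-in) may read.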

4. Installation Problems noted

5. Day to day usage

Once the client software is installed, you can either wait for the server to initiate a backup or you can log into the BackupPC web admin page and initiate a backup yourself (advised). The 1st run will do a complete backup of the requested directory tree, taking a time proportional to the size of the directory. Successive backups will take time proportional to the changed files (the backup protocol is rsync).

As an example, my home dir is about 43GB, but after subtracting mp3s, pictures, and Virtualbox partition, there are only about 12GB to be backed up, which took about 35m. The time taken to do an incremental backup was less than 2 min over a 100Mb connection, most of that time exchanging rsync comparison data.
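
To put those numbers in context, here is the arithmetic for the initial backup as a small Python sketch. It assumes decimal gigabytes and that "100Mb" means a 100 Mbit/s link; both are my reading of the figures above, not measured values.

```python
def throughput_mbit_per_s(gigabytes, minutes):
    """Average transfer rate in Mbit/s for a given volume and duration."""
    bits = gigabytes * 1e9 * 8
    seconds = minutes * 60
    return bits / seconds / 1e6


rate = throughput_mbit_per_s(12, 35)
print(f"{rate:.1f} Mbit/s")  # prints "45.7 Mbit/s"
```

Under those assumptions the first full backup averaged a little under half of the nominal link speed.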

The slowdown on my system (a 1.6GHz, 2core laptop) when doing the backup is almost undetectable unless I'm doing a lot of other disk activity.

6. Remaining known issues

7. Server Changes required for the auto-registration mechanism

7.1. Owners and Permissions

BackupPC normally runs under a non-root user and group (normally backuppc for both) and stores its configuration files in /etc/BackupPC. Because of the web interaction, the backuppc group must include the web user (often www-data) so the cgi script can write new configuration files into that directory. The additional options that the web form allows require some additional directories to be created in the root (/etc/BackupPC) dir beyond those created by a standard BackupPC installation.

7.2. Handling Dynamic IPs

An increasing number of people are using laptops as their main computer. Such machines often connect using dynamic IP assignment (DHCP) so that their IP number can vary from day to day.

The feature that allows BackupPC to back up even such roving PCs requires the creation of a new directory (/etc/BackupPC/alive) to hold the periodic IP updates. This directory is populated by the client machines, which periodically send their current IP #s to the server via a restricted, bandwidth-limited rsyncd process running on the server.

On MacOSX (M) and Linux (L), if the client is registered as DHCP, it will get a few additional bits of code that

Every X minutes (15 is the default), the server runs a cron script that checks whether any DHCP info has changed and, if so, stream-edits it into the server's /etc/hosts file between identifying fencepost markers, as below:

     #### BackupPC hosts BEGIN - DO NOT MODIFY THIS LINE ####
     ###.###.###.147    flip.xxx.uci.edu        flip
     ###.###.###.231    jackrabbit.xxx.uci.edu  jackrabbit
     ###.###.###.148    flop.xxx.uci.edu        flop
     ###.###.###.8      tatry.xxx.uci.edu       tatry
     ###.###.###.144    flipper.xxx.uci.edu     flipper
     ###.###.###.31     haggis.xxx.uci.edu      haggis
     ###.###.###.211    mmg-dhcp77.xxx.uci.edu  mmg-dhcp77 < 'real' hostname=fredo
     #### BackupPC hosts END - DO NOT MODIFY THIS LINE ####

These entries, being in the /etc/hosts file, pre-empt the official hostname <-> IP mapping of the client machine. This should bypass the requirement of making up a unique hostname, but may saddle the user with a hostname that has no relationship to the name that the user gave the machine or thinks of it as (see the fredo example above).

We may be able to address this with more sophisticated web forms, but for now, I'm going to just keep the DHCP-supplied hostname and have the user make one up that will be used here.
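
The fencepost update described above can be sketched in a few lines of Python. This is an illustration of the technique only, not the actual cron script; the function name replace_fenced_block is hypothetical, and the sketch works on the file contents as a string rather than editing /etc/hosts in place.

```python
BEGIN = "#### BackupPC hosts BEGIN - DO NOT MODIFY THIS LINE ####"
END = "#### BackupPC hosts END - DO NOT MODIFY THIS LINE ####"


def replace_fenced_block(hosts_text, new_entries):
    """Return hosts_text with the lines between the markers replaced.

    Everything outside the two marker lines is left untouched.
    """
    lines = hosts_text.splitlines()
    stripped = [line.strip() for line in lines]
    try:
        start = stripped.index(BEGIN)
        stop = stripped.index(END, start + 1)
    except ValueError:
        # Refuse to guess where the block goes if a marker is missing.
        raise ValueError("fencepost markers not found; refusing to edit")
    updated = lines[:start + 1] + list(new_entries) + lines[stop:]
    return "\n".join(updated) + "\n"
```

A caller would read /etc/hosts, pass in the freshly collected "IP  FQDN  shortname" lines, and write the result to a temporary file before moving it into place, so a failed run never leaves a half-written hosts file.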

8. Security Concerns FAQ

BackupPC and our web interface to it are not highly secure. They are meant to provide Backup services in a fairly protected environment (local networks, not Enterprise WANs). Because of this, the default configuration provides for reasonable security but not full end-to-end encryption and encrypted storage. A determined, talented attacker should be able to read BackupPC data over the wire, but over modern switched wired networks this should not be trivial. Note that because of security and bandwidth constraints, wireless networks should not be used as a communication channel for BackupPC. We can certainly add ssh encryption to the protocol, but it's not being done now.

Because our implementation of BackupPC uses rsync as the transfer protocol, any security weakness of rsync also applies to BackupPC.

The web registration form constrains Windows clients to use the C:\Documents and Settings directory or a (probably) new C:\Data or C:\Backup dir. This is in part to make users deliberately copy the data they want backed up into that dir, so that they will not inadvertently back up sensitive data. The current exclude list lists the default file types that are excluded.

The client-side configuration files included with the installation package allow an interested client to manipulate many of the rsyncd parameters. She could increase the directories to back up or enable backing up normally excluded files (such as her MP3 collection, her tax returns, and in fact her entire disk, OS included). However, this would show up on a quota scan and the BackupPC administrator could suspend her backups and/or delete her stored backups.

The imalive.py script sends small files back to the server via a bandwidth-constrained (1kB/s) rsyncd process. A normally registered, but malevolent client could configure his client to send a stream of data back to the server to slowly fill up the root file system and therefore crash the server. At this point in the development process, I haven't added a check for that (is the imalive dir larger than X bytes), but it could easily be added to the imalive.py script.
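
A size check of the kind mentioned above might look like the following Python sketch. It is not in the current scripts; the function names and the idea of passing the limit as a parameter are assumptions made for illustration.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def alive_dir_too_big(path, limit_bytes):
    """Return True if the directory holds more than limit_bytes of data."""
    return dir_size_bytes(path) > limit_bytes
```

The cron job could call alive_dir_too_big("/etc/BackupPC/alive", limit) on each run and alert the administrator, or stop accepting updates, when it returns True.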

9. Question & Answers

These Q&As are answered to the best of my ability. Please correct them if you find an error or mistaken assumption.

Q: If a person fills out a form for another PC, can that person subvert the other PC?

A: Only if the owner of the target PC copies and installs the customized package for the target PC. The owner would normally not get the client package emailed to them or if they did and they didn't know what it was, they wouldn't normally(?) follow the directions to install something they didn't ask for.

If the person who filled out the form for another PC was doing so in a support role, the administrative owner of the PC would still have to agree to allow the client software to be installed.

Q: Could a cracker provide a doctored download package to the client?

A: Yes, but they would have to also find a way to introduce it to the target PC and run the installation scripts. They could do this via social engineering, for example.

Q: Is the rsync stream readable in transit?

A: As it's written now, yes, altho with switched networks, this should be less of a problem. In addition, the rsync stream is not a linear stream of files, so while it's possible, it's not trivial. This protocol should not be used for backing up clients over unencrypted wireless networks where all communication streams are broadcast.

Q: What changes are made to the client PC?

A: On WinXP, a .bat file is run by the user to install the package. It installs the Cygwin DLL & rsync as a Windows Service, pokes a hole in the Windows firewall for port 873 (the default rsync port), and allows a specific rsync server to request backups of the share selected during the registration. It leaves the rsync.secrets file, which contains the unencrypted rsync password that the server must present, in the C:\rsyncd directory. If the PC is identified as a DHCP client, it also installs a scheduled task using schtasks that sends the client's current IP number to the server to allow for changing IP #s (the server checks for updated IP#s before attempting a backup). It does not install ssh on the client to use as an encryption tunnel, altho this could be done.

Q: Will the zip file email attachment interfere with the delivery of the notification email?

A: It could, depending on how you've set up your anti-virus and email-scanning software. Some scanning software doesn't allow zipped attachments; some don't allow binaries inside of zipped attachments. There are some work-arounds, but they require a re-write of the server software to send the minimum info (as text files) and to download the rest via URL.

Q: What sensitive information resides on the server?

A: The per-PC configuration files contain the cleartext password for client rsync services, so you must be very careful to keep this directory readable only by the backuppc user. Also, the client files can be read by the root user and anyone who has sudo privs on the server, so it's strongly recommended that the server be dedicated to the BackupPC service or at least that it is not used for shell login by many people. This is not specific to the web interface, but is also the case with the native BackupPC configuration.

Q: Will the server back up my sensitive files?

A: It will back up everything in the directory that you specify (e.g. C:\My Documents). The program of course makes no distinction as to whether files are sensitive or not. It will ignore all the files you tell it to ignore via the C:\rsyncd\rsync.exclude file, which lists the types of files to exclude. You can edit this file to add more kinds of files to ignore. It uses rsync's wildcard patterns (not full regular expressions) to define the filename patterns to ignore.
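
To get a feel for how such exclude patterns behave, the Python sketch below approximates the matching with the standard fnmatch module. It is only an approximation: rsync applies its own rules (for example for patterns containing slashes), and the patterns shown are examples rather than the contents of the shipped exclude file.

```python
from fnmatch import fnmatchcase


def is_excluded(filename, patterns):
    """Return True if the bare filename matches any wildcard pattern.

    Matching is case-sensitive, as rsync's pattern matching is by default.
    """
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


example_patterns = ["*.mp3", "*.iso", "*.tmp"]
print(is_excluded("song.mp3", example_patterns))    # True
print(is_excluded("thesis.doc", example_patterns))  # False
```

Because the match is case-sensitive, a file named SONG.MP3 would not be caught by "*.mp3" in this sketch; listing both spellings is one way to cover that.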

10. Server PreRequisites

11. Setup

I'm going to assume that you're using the Ubuntu 8.04 LTS release.

12. Files created / modified by BackupPC and WebBUPC

/etc
  |-- hosts    <- BackupPC DHCP hosts added between fenceposts
  |-- crontab <- entry for running /etc/BackupPC/alive/alive.py
  |     [3,18,21,33,48 * * * *  root  cd /etc/BackupPC/alive && ./alive.py ]
  `-- BackupPC
      |-- LOCK
      |-- README_apache
      |-- config.pl <- BackupPC configuration file (Perl syntax)
      |-- etchosts <- interim file of data inserted into /etc/hosts
      |-- hosts <- BackupPC hostname list with IP type, declared  user
      |-- htpasswd.4.backuppc <- for BackupPC web interface (encrypted passwords)
      |-- rsyncd.lock
      |-- rsyncd.secrets <- includes CLEARTEXT passwd (RW only BUPC admin user)
      |
      |-- alive <- contains IP#:hostname mapping info
      |   |-- alive.py <- for processing the alive files, see crontab entry above
      |   |-- flip.xxx.uci.edu_hjmangalam_dhcp
      |   |-- flop.xxx.uci.edu_hjmangalam_dhcp
      |   |-- <etc>
      |   |-- haggis.xxx.uci.edu_hjmangalam_dhcp
      |   |-- jackrabbit.xxx.uci.edu_mlwaterm_dhcp
      |   `-- tatry.xxx.uci.edu_tsoeller_dhcp
      |
      |-- backup <- contains backups for debugging; created by alive.py
      |   |-- etchosts.20081017T123806
      |   |-- etchosts.20090122T155833
      |   |-- etchosts.20090123T154801
      |   |-- etchosts.20090123T160301
      |   |-- hosts.20080420T180006
      |   |-- hosts.20080420T180950
      |   |-- hosts.20090120T093145
      |   |-- hosts.20090120T161744
      |   |-- hosts.20090121T122344
      |   |-- htpasswd.4.backuppc.20080420T180006
      |   |-- htpasswd.4.backuppc.20080420T180950
      |   |-- rsyncd.secrets.20090120T093145
      |   |-- rsyncd.secrets.20090120T161744
      |   `-- rsyncd.secrets.20090121T122344
      |
      `-- pc   <- contains config files for each client
       |          all files have to be owned by user.backuppc user/group.
       |          [If root wrote them, backuppc can't read them]
       |-- LOCK
       |-- aries.xxx.uci.edu.pl
       |-- athina.xxx.uci.edu.pl
       |-- bongo.xxx.uci.edu.pl
       |-- cg1.xxx.uci.edu.pl
       |-- flip.xxx.uci.edu.pl             $Conf{RsyncdUserName}  = "dubya";
       |-- kenyi.xxx.uci.edu.pl  ie......  $Conf{RsyncdPasswd}    = "thegreat1";
       |-- flop.xxx.uci.edu.pl             $Conf{RsyncShareName}  = [BACKUP];
       |-- jackrabbit.xxx.uci.edu.pl
       |... etc
       |-- palisade.xxx.uci.edu.pl
       |-- philia.xxx.uci.edu.pl
       |-- pinch2.xxx.uci.edu.pl
       |-- psatarri.xxx.uci.edu.pl
       `-- tatry.xxx.uci.edu.pl
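
The names of the files in the alive directory appear to follow a hostname_user_iptype layout. Assuming that reading of the listing is right (it is inferred from the tree above, not taken from alive.py), a name can be split into its parts with a short Python sketch like this; AliveEntry and parse_alive_name are hypothetical names.

```python
from collections import namedtuple

AliveEntry = namedtuple("AliveEntry", "hostname user ip_type")


def parse_alive_name(filename):
    """Split 'host_user_type' from the right into its three fields."""
    parts = filename.rsplit("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not an alive file name: {filename!r}")
    return AliveEntry(*parts)


entry = parse_alive_name("flip.xxx.uci.edu_hjmangalam_dhcp")
print(entry.hostname, entry.user, entry.ip_type)
```

Splitting from the right means other files in the directory, such as alive.py itself, are rejected with a ValueError rather than misread.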