Wednesday, May 14, 2008
Using rsync for backup, part I
Huge hard drives are cheap these days, and they almost always come bundled with some fancy backup solution. That is not at all surprising, given that backup is precisely what the majority of customers are going to use the new disk space for.
Interestingly enough, there aren't that many good backup utilities for Linux, or open source tools of this sort in general. Most likely this is because people still use "rsync" for Linux, Unix, and multi-platform backups.
"rsync" is very far from being a "perfect" backup tool (more on that below), but at least it is old, stable, reliable, simple to use, and available for all kinds of platforms, including Cygwin and a native Windows port.
Therefore, we will begin by reviewing the most basic ways to use rsync for backups.
Note that the deficiencies listed at the end of this post seem to have prompted the rsync maintainers, over the last few years, to introduce new options which are supposed to help implement backup operations properly. At this moment, though, I don't want to get into this; the following tips are based on rsync version 2.6.3 (protocol version 28, released in 2004) or higher.
First, we need to configure and start an rsync server on a dedicated machine. Here is the plan.
Server configuration (everything is executed as "root"):
- Decide whether you want to run the rsync daemon through inetd (most servers are run this way) or as a stand-alone server (like Apache, for example). The trade-offs between the two approaches are outlined in this passage:
If you start off the rsync daemon through your inet daemon, then you incur much more overhead with each rsync call. You basically restart the rsync daemon for every connection your server machine gets! It's the same reasoning as starting Apache in standalone mode rather than through the inet daemon. It's quicker and more efficient to start rsync in standalone mode if you anticipate a lot of rsync traffic. Otherwise, for the occasional transfer follow the procedure to fire off rsync via the inet daemon. This way the rsync daemon, as small as it is, doesn't sit in memory if you only use it once a day or whatever. Your call.
- Let's assume for the following that you, like me, are going to run rsync via inetd. All modern Linux distributions have the necessary hookup already done for you, and all you need to do is open the appropriate UI and enable the rsync server. However, just in case, here are the command-line configuration instructions from the rsyncd.conf manual page:
When run via inetd you should add a line like this to /etc/services:
rsync 873/tcp
and a single line something like this to /etc/inetd.conf:
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
- Create the config file /etc/rsyncd.conf like this (it is assumed that "/ext/BACKUP" is the root of your backup area):
log file = /ext/BACKUP/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock
secrets file = /etc/rsyncd.scrt
auth users = rsync
read only = no
transfer logging = yes
list = yes
[MyModule]
path = /ext/BACKUP/mymoduledir
comment = Description of what this backup location is for
Create all necessary sub-directories of /ext/BACKUP (including "log") and change the ownership of all "module" directories like /ext/BACKUP/mymoduledir to 'nobody':
# chown nobody /ext/BACKUP/mymoduledir
- Create the password file /etc/rsyncd.scrt with just one line, which looks like this:
rsync:password
('rsync' can be any 'user' name; see the 'auth users' configuration option above.)
Change the access mode of this file to 0600:
# chmod 0600 /etc/rsyncd.scrt
- If external access (from outside of your local network) is required, you can change the default port 873 to something else and/or open this port in your firewall or router.
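The password-file step above can be rolled into one small shell function, so the file never exists with looser permissions, not even for a moment. This is only a sketch: the function name write_secrets_file is my own invention, not something rsync ships with, and it assumes a POSIX-style shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): write a one-line "user:password"
# secrets file that only its owner can read or write.
write_secrets_file() {
    file=$1 user=$2 password=$3
    (
        umask 077    # files created in this subshell get mode 0600
        printf '%s:%s\n' "$user" "$password" > "$file"
    ) || return 1
    chmod 0600 "$file"    # tighten the mode if the file already existed
}
```

On the server, run as root, this would be called as: write_secrets_file /etc/rsyncd.scrt rsync password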
Client configuration (executed by any user who has read access to the files being backed up):
- Create the file /etc/rsync.p (if acting as root) or ~/.rsync.p (if a regular user) with just one line, which is the password from the server configuration section. Change its mode to 0600.
- You can now start a backup job by executing this command:
rsync -az path_1 ... path_N --password-file /etc/rsync.p rsync@server::MyModule
where path_1 ... path_N are all the directories you want to copy to the MyModule backup location.
Note however that (a) rsync attaches a special meaning to paths which end with a slash, like '/home/user/', basically interpreting them as '/home/user/*', that is, the contents of the directory rather than the directory itself; this is most useful when backing up just one path, and (b) for every path in the list, a sub-directory is created under /ext/BACKUP/mymoduledir, named after the last component of the path (after the special treatment of trailing slashes is taken into account, of course). It is your responsibility to make sure these do not overlap.
- If you want to automate regular backups, create a cron job like this:
30 2 * * * /usr/bin/rsync options as above
- As another example, under Cygwin you can use a command like this to back up all files on all drives while excluding some Windows directories:
rsync -az /cygdrive/c /cygdrive/d \
--exclude 'System Volume Information/' --exclude /c/WINNT/ \
--delete-excluded rsync@rsync.myhomeserver.com::laptop
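Since making sure that the last path components do not overlap is left to you, here is a rough sketch of how a list of source paths could be checked before it is handed to rsync. Both function names (dest_name and find_overlaps) are hypothetical helpers of mine, and the check only compares names; it does not talk to rsync at all.

```shell
#!/bin/sh
# Hypothetical helpers (not part of rsync) for checking a list of source paths.

# Print the name a source path would get under the module directory.
# A trailing slash means "copy the contents", so no sub-directory is
# created; that case is printed as ".".
dest_name() {
    case $1 in
        */) echo "." ;;
        *)  basename "$1" ;;
    esac
}

# Print every destination name produced by more than one source path.
find_overlaps() {
    for p in "$@"; do
        dest_name "$p"
    done | sort | uniq -d
}
```

If find_overlaps prints nothing for your list of paths, no two of them end up in the same place on the server. Something like "find_overlaps /home/alice/docs /srv/docs /etc" would flag "docs", because both directories would be merged into one sub-directory of the module.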
Here is why this approach, while useful, needs some rework before it provides a true backup solution:
- There is no way to save file owner information on the server. All files are saved as if created by "nobody" (you can change this default, but you cannot force the server to save the original file ownership data);
- If a directory (in the original location) has restricted access, the server might not be able to back up files within this directory (you can force the server to relax permissions, but then you lose the original permission information);
- While a detailed log file can be saved (to this end we suggested the 'transfer logging' option above), there are no (good) tools to analyze it and report backup failures, especially for backups which are run in the background by cron.
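The last point can be softened a little with a thin wrapper around the cron job. The sketch below does not analyze the rsync log; it only records whether the command exited with a zero status, which is the least a background job should tell you. The function name run_logged and the log path in the usage line are my own assumptions.

```shell
#!/bin/sh
# Hypothetical cron wrapper: run a command, append one status line to a log
# file, and pass the command's exit status through to the caller.
run_logged() {
    log=$1
    shift
    status=0
    "$@" || status=$?
    stamp=$(date '+%Y-%m-%d %H:%M:%S')
    if [ "$status" -eq 0 ]; then
        echo "$stamp OK: $*" >> "$log"
    else
        echo "$stamp FAILED (exit $status): $*" >> "$log"
        # Also complain on stderr; on many systems cron mails job output.
        echo "backup failed with exit status $status" >&2
    fi
    return "$status"
}
```

In a script called from cron this would be used as: run_logged /var/log/backup-status.log rsync -az path_1 ... path_N --password-file /etc/rsync.p rsync@server::MyModule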
For xinetd, the corresponding service definition looks like this:
service rsync
{
    disable = no
    socket_type = stream
    wait = no
    user = root
    server = /usr/bin/rsync
    server_args = --daemon
    log_on_failure += HOST
    instances = 2
}
After adding it, do "/etc/init.d/xinetd restart".