Note: if you want to encrypt your backup device, see the follow-up article: rsync backup to encrypted volume.
It's important to back up your computer files so you don't lose them if your PC fails. But if you have gigabytes of data you want to keep safe, simply copying all of your files from one place to another will likely take hours.
Luckily Linux offers a command-line tool called rsync which only copies a file to a backup location if the file is new or has changed since the last backup. This selective copying usually means an rsync backup takes a fraction of the time taken by a simple "copy all" backup.
Seeing as you ought to backup your system frequently, it makes sense to create a backup script which runs customised rsync commands to backup all of the files you want to keep, but none of those you don't. This page is all about creating a Bash script which runs an rsync backup. I'm using Kubuntu 14.10, but the advice on this page should apply to users of other Linux distributions (though you may need to vary some steps, so check carefully for any incompatible commands, options, etc).
Important note: this page is intended to help people discover how useful rsync can be. You follow the advice on this page at your own risk, so make sure you understand the consequences of any commands or actions you intend to take. Read the man pages for any commands or flags with which you're not familiar.
The first step is deciding which directories need regular backup. Even though Linux keeps most user files in the /home/ directories, it's possible that you have valuable files stored elsewhere on the system. I have a file backup checklist which might help you to avoid forgetting anything.
Once you've decided which directories need to be part of the backup, it's time to decide which of their sub-directories do not need to be part of the backup. For example, in Ubuntu Linux when running the GNOME desktop, each home directory contains a hidden directory called ".gvfs" which caused me a lot of trouble when I first started using rsync (the directory seems to refer to itself, so that rsync gets stuck in an infinite loop of backing up the same files over and over again). You may also want to avoid backing up your deleted items, so the ".local/share/Trash" directory needs to be skipped. You might also have other directories which you don't need in your backup, so note these down.
A useful tool is the Disk Usage Analyser. This can scan a directory tree and graphically show you how much disk space each directory uses. This sort of analysis can turn up some surprises, such as a log file which has grown to a massive size.
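If you prefer the terminal, the du command can produce a similar per-directory summary. The /home/bob path below is only an example; point it at whichever directory you want to inspect:
du -h --max-depth=1 /home/bob | sort -h
The sort -h step simply orders the entries by human-readable size, so the largest directories end up at the bottom of the list.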
Finally, to reduce the amount of backup time and storage needed, it makes a lot of sense to go through your selected directories and see whether there are old or duplicated files which can be deleted. A useful tool for finding duplicate files is FSlint which can search a target directory and report on either duplicate files or empty directories, both of which clutter up your system. Just make sure you carefully read the instructions for FSlint before you use its delete or merge options because they can radically change your filesystem.
A backup should survive the failure of your main hard disk, so it makes little sense to save your backup to a path on that disk. This means you're likely to be using an external hard disk, a network-attached storage device on your own network, or a storage server on a remote network.
Before you can write to a hard disk you need to have a suitably large target partition on the external disk. The easiest way to do this is to connect the external drive (which Ubuntu will mount automatically for you) and then run the graphical tool GParted (which ought to be installed by default in Ubuntu and can be found in the main menu under System / Administration). With your USB hard drive connected you can see the details of any existing partitions (and make certain you are looking at your USB-connected hard disk and not one of your main system hard disks). Then if necessary you can tell GParted to delete partitions, resize partitions, or create new partitions in unallocated space.
For your backup volume, it makes sense to create a partition which uses the same file system as the volume which hosts your personal files. For most Ubuntu users their home directory will be held on a partition which is formatted using the ext3 or ext4 filesystem (and you can check this using GParted), so an ext3 or ext4 partition is probably the best choice for your backup volume. You also need to make the backup volume partition at least large enough to contain all of the files you intend to backup, but it will usually be wise to make the backup partition at least two or three times bigger than currently necessary, as you're very likely to need more space as time goes on. If you want to keep historical backup copies (for example, one for each year) then you'll need even more space.
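If you'd rather check from a terminal than in GParted, df can report the filesystem type of the partition holding a given path (again, /home/bob is only an example path):
df -T /home/bob
The second column of the output shows the filesystem type, such as ext4.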
Note that if you need to be able to read the partition from a Windows operating system, you'll need to use either the NTFS or FAT32 format because Windows has no time for non-Microsoft formats. The problem with using one of these formats is that they won't be able to correctly store your Linux file structure, and information such as symlinks and owner and group permissions will be discarded when writing to an NTFS or FAT32 partition. The data in your files ought to remain intact, but the loss of metadata can mean that restoring from backup results in a very different data landscape to what you had originally, and you'll have to spend a lot of time redefining permissions on your restored files.
Note: many newer versions of Linux (including Kubuntu 14.10) will mount an external hard drive partition to a consistent path based on the volume label given to the partition. For example, if the partition has the volume label "Buffalo_backup" then when user bob attaches the drive and mounts that partition, Kubuntu will mount it at path /media/bob/Buffalo_backup and this path should be the same each time. If you find this is the case in your Linux desktop then you can skip this section and leave your /etc/fstab file untouched.
Sometimes, however, Linux will allocate a different device name each time you connect a USB hard drive. For instance the drive might appear as /dev/sdd5 one time and then /dev/sde4 another time. Because this will be a nuisance from the point of view of a backup script, we need to add the UUID of our USB backup target volume to the /etc/fstab file so that we can mount the drive using the same command each time. The UUID of the backup partition can be found in GParted by right-clicking on the target partition and selecting "Information". Next you need to decide where you want this backup target mounted, such as /media/Buffalo_backup for instance, and also what mount options should be used. Add all this information to the /etc/fstab file in a line which will look something like this:
# The Buffalo DriveStation backup partition
UUID=5fc5ac7d-085e-4f3f-b5b2-6c7ee32b3d9c /media/Buffalo_backup ext4
noauto,group,relatime,journal_checksum,auto_da_alloc
(Note that the second and third lines should actually all be one single line, but they won't fit on this webpage as one line.)
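If you prefer the command line to GParted, the blkid tool can also report the UUID of the backup partition. The device name /dev/sdd5 below is just an example; check which device name your backup partition currently has (for instance with lsblk) before running it:
sudo blkid /dev/sdd5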
The group mount option is a little strange, as it tells Linux to permit mounting of this partition to any user who is a member of the same group as the special device which represents this partition, such as /dev/sdd5. But USB devices don't always get the same special device path, and the owner and group values for special devices seem to get reset to "root" and "disk" regularly anyway. So if you want to limit mounting of your backup partition to a particular set of users, you can use the group mount option and then make those users members of the "disk" group. (I'd much rather see an explicit "group=somegroup" mount option, but this method will have to do for now.)
Also note the noauto mount option. This tells Linux not to automatically mount the drive (on bootup for example). However, Ubuntu still seems to attempt to mount the partition if you connect the USB hard drive after bootup, which is annoying as it generates a permissions error if you're using the group mount option. I don't know how to force Ubuntu (or GNOME) to obey the noauto option.
The mount path (/media/Buffalo_backup in our example) should be an empty directory and it should already exist, so make sure to create the path using mkdir if it does not already exist. Also, the mount point path must not contain spaces as this might cause trouble later on.
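For example, assuming the /media/Buffalo_backup mount point used above and a user called bob (both only examples), you could create the mount point and add bob to the "disk" group like this:
sudo mkdir --parents /media/Buffalo_backup
sudo usermod --append --groups disk bob
Note that bob will need to log out and back in before the new group membership takes effect.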
Now users who belong to group "disk" ought to be able to mount the backup partition on the USB hard drive by simply typing mount /media/Buffalo_backup into a terminal (while the USB drive is connected and powered on, obviously). This makes it easy to mount the drive from within a script, as the partition can now be mounted with the same command every time.
If, for example, you have a network-attached storage drive on your network at 192.168.178.250 which offers a Samba share called "share" available to Samba user "sam" and you want the mountpoint to be /media/Maxtor_backup and marked as being owned by Linux user account "bob" and Linux group "bob", then you can do this in one easy command:
sudo mount -t cifs -o username=sam,uid=bob,gid=bob \
//192.168.178.250/share /media/Maxtor_backup
(Note that a backslash followed immediately by a single newline is ignored by Bash, so they are used on this webpage to avoid scrollbars where long commands appear. But you don't have to have these line breaks in your own script.)
If the mount point path (/media/Maxtor_backup in this example) does not already exist, you'll need to create it with mkdir first. Your mount point should be an empty directory (to avoid confusion, otherwise the files in that directory will be temporarily unreachable while the backup volume is mounted on the same path). Also, the mount point path must not contain spaces because this may lead to trouble later.
If the connection is made successfully, you will be prompted to enter the password for the Samba user account "sam". (Note that you'll first be asked for your Linux password if sudo has not already been used in the last few minutes.)
If your target Samba server supports CIFS Unix extensions then you may not need to specify the uid and gid arguments. But if you do specify these arguments, or if the target server does not support the CIFS Unix extensions, be aware that the owner and group information for your files will probably not be applied to the backup copies. This is unlikely to affect the actual content of the backup files, but it will mean that if you need to restore your files from this Samba share then every file will be owned by the owner and group specified in the uid and gid arguments. The mode information will also be discarded, and symlinks will be skipped or throw a warning. This is likely to mean a lot of permissions restructuring on your restored filesystem to get it to resemble your original filesystem.
If you're connecting to a remote server (that is, you're sending your backup to a machine which is not on your local network) then you don't need to mount your backup volume, because you simply tell the rsync command to connect to a remote path. However, this is an advanced topic and is not covered by this page. See the rsync and rsyncd.conf man pages for more information about sending a backup to a remote server.
(If you are sending your backup to a remote machine, be aware that the rsyncd.conf man page recommends using SSH to encrypt the transfer, because using the rsync daemon protocol currently offers no encryption of the data in transit.)
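Purely as an illustration of the shape of such a command, a push to a remote machine over SSH might look like the line below. The hostname backup.example.com, the user name bob, and the destination path are all made up; the real flags and paths would depend on your own setup and on the configuration of the remote server:
rsync --archive --verbose -e ssh /var/www/ bob@backup.example.com:/srv/backup/var/www/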
Suppose we've identified the /var/www/ directory tree (that is, everything in /var/www/ and its sub-directories and so on) as being in need of regular backup. And that we've identified that the /var/www/.Trash-1000/ directory does not need to be part of the backup. And that we don't want any hidden directories to be included in the backup. Then if our backup volume is an ext (Linux) filesystem mounted at /media/Buffalo_backup/ we might use the following call to rsync:
sudo rsync --archive --hard-links \
--verbose --human-readable --itemize-changes --progress \
--delete --delete-excluded --exclude='/.Trash-1000/' --exclude='/.*/' \
/var/www/ /media/Buffalo_backup/var/www/
(Again, note that a backslash followed immediately by a single newline is ignored by Bash, and you don't have to have these line breaks in your own script, it just makes this webpage look tidier.)
This command basically says copy new and changed files from the /var/www/ directory tree to the /media/Buffalo_backup/var/www/ directory tree. It's very important to end both of these paths with a forward slash, otherwise the behaviour of rsync changes and the wrong set of files will be backed up to the wrong destination directory. Also note that rsync will not create the backup directory for you, and will throw an error if the directory does not already exist. So before executing the rsync command you may want to call:
mkdir --parents /media/Buffalo_backup/var/www/
The behaviour of rsync can be modified using its many different flags (or options). The flags I've used are:
--archive : a shorthand which is equivalent to --recursive, --links, --perms, --times, --group, --owner, and --devices --specials (which are all described below). Do not use --archive if your backup volume does not support Linux permissions, owner/group information, symlinks, devices and special files, because permission and owner information will be discarded and you'll see warnings generated for every symlink, device or special file encountered. Instead just specify the flags that do apply to your backup volume, such as --recursive and --times.
--recursive : tells rsync to descend into sub-directories, so the whole directory tree is backed up.
--links : copies symlinks as symlinks rather than following them. If you want rsync to follow symlinks and copy the files they point to, look at the --copy-links flag. See the rsync man page for the details, and also read about the --copy-unsafe-links and --safe-links flags which may be of interest.
--perms : preserves the permissions of each file on the backup copy.
--times : preserves the last-modification timestamps, which rsync relies upon to decide whether a file has changed since the last backup. If your backup volume uses FAT32 (which stores timestamps less precisely), look at the --modify-window flag.
--group and --owner : preserve the group and owner of each file (--owner only takes effect when rsync is run as super-user).
--devices --specials : preserve device files and special files (again, this requires super-user privileges).
--hard-links : looks for files which are hard-linked together in the source and recreates the links in the backup rather than storing the content twice.
--verbose : makes rsync report more about what it is doing.
--human-readable : outputs sizes in a human-readable format (for example 1.2M rather than a long string of digits).
--itemize-changes : prints a short change summary for every file which is created, updated or deleted.
--progress : shows progress information while each file is being transferred.
--delete : deletes files from the backup directory which no longer exist in the source directory, so that the backup mirrors the source.
--delete-excluded : also deletes files from the backup directory which are now excluded by an --exclude flag.
--exclude : skips any file or directory which matches the given pattern. One of the --exclude flags above uses the pattern '/.*/' which means that every hidden directory (whose name begins with a dot) will be skipped by rsync. (Warning: I cannot recommend excluding all hidden directories, because sometimes very important files are kept in hidden directories. For example, a Git repository stores its commit data in a .git directory, and forgetting to backup your Git repository data will hurt. It's much better to exclude specific hidden directories which you are sure you don't want.)
This is just a small fraction of the total number of flags that rsync offers to modify the way it behaves. Make sure to see the man page for rsync to find out whether any of the other flags are more suitable to your requirements (and to understand better the flags listed on this page).
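One extra flag worth knowing about while you're experimenting is --dry-run: combined with --verbose or --itemize-changes it makes rsync report what it would copy or delete without actually changing anything, which is a relatively safe way to test an exclude pattern before running the real backup. For example, reusing the /var/www/ command from above:
sudo rsync --archive --verbose --itemize-changes --dry-run \
--delete --delete-excluded --exclude='/.Trash-1000/' --exclude='/.*/' \
/var/www/ /media/Buffalo_backup/var/www/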
Given that rsync has so many options, once you've crafted the rsync command that does what you need it's a good idea to store it in a script so that you can easily call it again in future. The example script below wraps up the following actions: it checks whether the backup volume is mounted and mounts it if necessary, builds a backup path which includes the current year, creates the target directories, runs the rsync commands while logging their output to a file via the tee command, and finally asks whether the backup volume should be unmounted.
Here is the script in full:
#!/bin/bash
# Script to backup personal files to the external USB drive.
# Specify the mount point here (DO NOT end mount_point with a forward-slash).
mount_point='/media/Buffalo_backup'
echo "#####"
echo ""
# Check whether target volume is mounted, and mount it if not.
if ! mountpoint -q ${mount_point}/; then
echo "Mounting the external USB drive."
echo "Mountpoint is ${mount_point}"
if ! mount ${mount_point}; then
echo "An error code was returned by mount command!"
exit 5
else echo "Mounted successfully.";
fi
else echo "${mount_point} is already mounted.";
fi
# Target volume **must** be mounted by this point. If not, die screaming.
if ! mountpoint -q ${mount_point}/; then
echo "Mounting failed! Cannot run backup without backup volume!"
exit 1
fi
echo "Preparing to transfer differences using rsync."
# Use the year to create a new backup directory each year.
current_year=`date +%Y`
# Now construct the backup path, specifying the mount point followed by the path
# to our backup directory, finishing with the current year.
# (DO NOT end backup_path with a forward-slash.)
backup_path=${mount_point}'/rsync-backup/'${current_year}
echo "Backup storage directory path is ${backup_path}"
echo "Starting backup of /home/bob . . . "
# Create the target directory path if it does not already exist.
mkdir --parents ${backup_path}/home/bob/
# Use rsync to do the backup, and pipe output to tee command (so it gets saved
# to file AND output to screen).
# Note that the 2>&1 part simply instructs errors to be sent to standard output
# so that we see them in our output file.
sudo rsync --archive --verbose --human-readable --itemize-changes --progress \
--delete --delete-excluded \
--exclude='/.gvfs/' --exclude='/Examples/' --exclude='/.local/share/Trash/' \
--exclude='/.thumbnails/' --exclude='/transient-items/' \
/home/bob/ ${backup_path}/home/bob/ 2>&1 | tee /home/bob/rsync-output.txt
echo "Starting backup of /var/www . . . "
mkdir --parents ${backup_path}/var/www/
# This time use the -a flag with the tee command, so that it appends to the end
# of the rsync-output.txt file rather than start a new file from scratch.
sudo rsync --archive --verbose --human-readable --itemize-changes --progress \
--delete --delete-excluded \
--exclude='/.Trash-1000/' \
/var/www/ ${backup_path}/var/www/ 2>&1 | tee -a /home/bob/rsync-output.txt
# Ask user whether target volume should be unmounted.
echo -n "Do you want to unmount ${mount_point} (no)"
read -p ": " unmount_answer
unmount_answer=${unmount_answer,,} # make lowercase
if [ "$unmount_answer" == "y" ] || [ "$unmount_answer" == "yes" ]; then
if ! umount ${mount_point}; then
echo "An error code was returned by umount command!"
exit 5
else echo "Dismounted successfully.";
fi
else echo "Volume remains mounted.";
fi
echo ""
echo "####"
To modify this Bash script to suit you, first you need to change the value of the mount_point variable so that it matches the path at which you'll mount your backup volume. For an external USB drive this should match whatever mount path you've specified for the backup volume in your /etc/fstab file, as described earlier on this page, or whatever mount path Linux consistently uses when mounting the backup volume. For a Samba network share this needs to be whatever mount path you intend to use for the mount command.
If your backup volume is on an external USB drive then the simple mount ${mount_point} line can stay as it is. However, if your backup is to be written to a local Samba network share then you need to replace this mount command with something similar to the command suggested in the section about Samba earlier on this page, but make sure to use ${mount_point} in the command at the point where the mount path goes.
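As a rough sketch, and assuming the same Samba details as the earlier example (the server address, share name and account names are only illustrative), the mount call inside the script might become:
sudo mount -t cifs -o username=sam,uid=bob,gid=bob \
//192.168.178.250/share ${mount_point}
Be aware that this will prompt for the Samba password every time the script runs; mount.cifs also supports a credentials= option pointing at a file containing the username and password, which you may prefer for unattended runs (see the mount.cifs man page).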
In the above script I've created a variable called current_year and then used it and mount_point to create a final variable called backup_path which is the actual path to which the backup files will be written. By using the current year, calculated using the date command, you can automatically create a new backup each year. This is a good idea, just in case your files ever become damaged or corrupted and you don't notice until after running your backup script. Then you can at least refer back to a previous year's backup of the file. In fact, you could do this monthly rather than yearly if you need to be paranoid. But bear in mind that every time you start a fresh backup, rsync will have to copy everything (because it's starting from scratch in a new, empty backup directory and won't be able to simply copy new and changed files since the last backup took place). This not only takes more time, it also takes up much more disk space.
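If you do decide on monthly rather than yearly backup sets, one way (just a sketch, with current_month as a suggested variable name) is to include the month in the date format and use that variable when building backup_path:
current_month=`date +%Y-%m`
backup_path=${mount_point}'/rsync-backup/'${current_month}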
Modify the value of the backup_path variable as you see fit, but make sure that it doesn't end with a slash, as this is added as necessary later in the script. Also make sure that it cannot contain spaces because spaces in the path, even if escaped with backslashes, may lead to the script calling rsync commands on partial paths (reading up to where the space is encountered) which is really not what you want. So check that the current_year and mount_point cannot contain spaces, and that any string literals you place around these variables to form the value of backup_path are free of spaces.
The mkdir command is called before each rsync command, to make sure that the directory structure needed to store each backup set definitely exists, as rsync will fail with an error if this isn't the case. Change the path after ${backup_path} from /var/www/ or /home/bob/ to match whichever path the corresponding rsync command will backup.
Next comes the rsync command, so replace this with your own carefully crafted rsync command. If you want to log the entire output of the rsync run, make sure to add 2>&1 to the end of the rsync command, and then pipe the output to the tee command. Give the tee command the path to a log file, and use the -a flag if you want tee to append to an existing log file rather than start a new one. Now all rsync output, including errors, will be output to the terminal and also written to the specified log file.
The above script contains two rsync commands, but your script can contain any number you like. Just bear in mind that you need to call the rsync command with sudo because flags such as --perms, --owner and --devices require the command to be run as super-user. This means that if one rsync command takes a long time to complete, you'll probably have to enter your password again by the time the script runs the next rsync command. This will become a nuisance if you have many rsync commands in one script, as the script may pause and wait for the password every time it reaches a new "sudo rsync" line. One way around this might be to run the script itself using the sudo command, which ought to mean you only need to enter your password once, though giving the entire script super-user privilege may introduce security risks so be wary if you take this route.
At the end of the script, the user is asked whether they want to unmount the backup volume. If the user enters y or yes then the backup volume will be unmounted. Any other input will leave the backup volume mounted.
rsync relies on file timestamps to quickly work out whether or not a file has changed since the last backup. This usually works fine, but if you have some directory or file whose timestamp does not get updated even when its content changes then rsync will very likely fail to copy recent changes, leaving your backup copy further and further out of date.
As an example, I use VeraCrypt to manage a tiny encrypted file in my home directory, to hold sensitive data. But by default VeraCrypt does not update the last-modified timestamp of the encrypted file (presumably to increase plausible deniability in countries with authoritarian regimes). Because the timestamp was not getting updated, I almost lost several months' worth of critical changes to text documents because rsync presumed that the encrypted file was exactly the same as it had been the last time I ran a backup. To disable this feature of VeraCrypt: go to "Settings", then the "Security" tab, and then untick (make empty) the box "Preserve modification timestamp of file containers". But also check that this has the desired effect, because there have been problems reported even with this box unticked.
Also bear this in mind with any other specialised file or directory whose last-modification timestamp may not be updated even when its content has changed.
If the timestamps are simply not going to work for your scenario, note that rsync also offers a --checksum option which tells rsync to use a 128-bit checksum, instead of the timestamp, to work out whether the file content has changed. The checksum mode will be considerably slower, and will involve considerably more IO (disk read) activity, so try to limit checksum mode to specific directories. See the follow-up article for an example of this targeted checksum approach.
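As a rough illustration only (the directory name is made up, and the follow-up article covers this properly), a targeted checksum run over a single directory of encrypted containers might look like this:
sudo rsync --archive --checksum --verbose --itemize-changes \
/home/bob/encrypted-containers/ ${backup_path}/home/bob/encrypted-containers/ \
2>&1 | tee -a /home/bob/rsync-output.txt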
Once you've run your rsync script and produced a backup of your files, they will be stored in a readily usable form on the backup volume. This means that restoring from backup is as simple as just copying everything from the most recent backup directory to your new working system disk. You can use your preferred method of copying everything from one place to another, but using the standard cp command is probably the easiest method. Whichever method you use, don't forget about hidden files, which are not always copied automatically by command line tools, and are not always visible by default when using graphic user interface tools.
Using a Bash console, the following cp command should copy all (including hidden) directories and files from the specified directory on the backup volume to the appropriate directory on the system disk.
cp --archive /media/Buffalo_backup/home/bob/. /home/bob/
Customise this command to suit your own file paths and read the man page for cp to check which flags suit you. (The man page for cp is not exactly rich with information, but read through it anyway and do some hunting around online for more information if you're not sure whether the --archive flag is what you need.)
The dot at the end of the source path is important: without the dot you will find that everything has been copied into a new directory on your system disk with the path /home/bob/bob which is very probably not what you want. Without the dot you'll probably also find that hidden files and directories don't get copied.
The --archive option tells the cp command to recurse into all sub-directories, and to preserve links and all file attributes. I have learnt the hard way that failing to preserve the timestamps will mean that the copies placed onto your system disk will all be set with the current date and time. Which might not bother you until you next run your rsync backup script and realise that every file on your system disk now appears to be newer than the backup copy, forcing rsync to copy every single file to the backup volume (and setting the timestamps of the files on the backup volume to the newer date). So even if you don't use the --archive flag, consider at least using the --preserve flag with a value of timestamps.
Once you've restored from backup, it's probably a good idea to rename the backup directory on the backup volume from "2014" to something like "2014 pre-rebuild". This is just so that if your new filesystem contains problems, your next backup won't overwrite the previous, trusted backup.
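For example, assuming the backup paths used in the script above (adjust to suit your own layout):
sudo mv '/media/Buffalo_backup/rsync-backup/2014' '/media/Buffalo_backup/rsync-backup/2014 pre-rebuild'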
Also remember that if your backup volume does not support symlinks, file permissions, owner and group information, etc, you will need to manually reconstruct this structural information on your new filesystem once you copy the files from your backup.
If you want to encrypt the backup device/volume, then see the follow-up article rsync backup to encrypted volume.