Unison is a piece
of software that allows you to keep your files synchronized and backed
up across different computers. This guide describes the advantages of
keeping your files synchronized and provides a tutorial for setting up
Unison.
Introduction
Unison, a free
cross-platform file synchronization program, can not only provide you
with multiple backups of your files, but more importantly, grant you the
freedom to simultaneously use different computers with access to all of
your files, thus liberating you from the confines of one particular
machine. Unison allows you to access the same set of files from any
computer (running Mac OS X, Windows XP, or UNIX/Linux variants) and
keeps these files up-to-date by propagating the changes you make in any
one copy to the others during synchronization (and asking you what to do
if the same file was changed in more than one place). I
personally use Unison to keep replicas of all of my personal files
across several different computers. Because my documents and
configuration files are accessible from every computer, I am free to use
whichever one is the most convenient at the moment without the hassle of
transferring files using floppy disks, USB drives, or email.
For example, if I am in the office on a Linux machine and want to work
on a paper for a class, I can just open up the file and start typing.
Before I leave, I simply synchronize that directory to my server. Then
when I get home, I synchronize on my Mac and continue working on my
paper. If I feel like watching TV later while continuing to work, I can
simply switch to using my Windows laptop. Then I can finish up final
edits at the office the next morning on my Linux box. By the time I am
done, not only have I edited the same paper on three
different computers without the hassle of emailing copies to myself, but
I have three identical copies of it so that if any one of my computers
blows up, I can still turn in my paper on time. Unison has allowed me
to have the peace of mind that comes with having my files seamlessly
backed up while I am working on them and also the freedom of being able
to do my work wherever and whenever is convenient.
This document describes some of the benefits of using Unison and
provides some tips on doing so, although it is not meant to be a
comprehensive how-to guide or user
manual. Please email me if you have any
questions, comments, or suggestions.
The Benefits of Unison
Liberation from a particular machine or operating system: This
is perhaps the most practical and visible advantage of using Unison in
your daily computing life. If you can have access to your files from
any machine that you use (and assuming that you have programs on each
machine that can utilize these files), then it really doesn't matter
which one you use. Furthermore, if you can put custom configuration
files for your shell or applications in your Unison hierarchy and
simply use symlinks to refer to these copies on every computer that
you use, then you can have a uniform working environment. For
example, I use the BASH shell on every computer, whether it be Windows
XP with Cygwin, Mac OS X, FreeBSD, or whatever Linux is in front of me
at the moment. I have a common BASH configuration file shared by all
machines, and files particular to each machine. My prompt looks the
same on all machines, and I can use all of the same aliases and shell
functions. When I find a cool BASH function while browsing the
Internet at work, I can simply add it to my common BASH config file,
sync it, and when I get home at night, I can access that same
function on my home machine. This freedom allows you to transcend the
incessant bickering over which operating system is better: you can use
whatever OS has the programs you want for some particular application,
or simply whatever OS is in front of you at the moment.
Live backups via file replication: Your personal data
(documents, photographs, emails, etc.) is the most valuable
component in your interaction with computers, because it can be
irreplaceable if lost. As I have mentioned in my Backup Quick Tips mini-article, data
backup is something that everybody should do, but unfortunately, few
people do it on a regular basis. In contrast to traditional backup
methods, the great benefit of using Unison to replicate your files
across different computers is that your backups are alive. They are
not sitting on some archive tape in the basement; they are on the hard
drives of each and every computer you use.
Seamless control and verification of backups: By synchronizing
your Unison file replicas, you are the one who controls your backups
so that you can be confident that they are being performed correctly.
You verify the integrity of your backups simply by switching computers
and accessing the files during your normal course of work. One
problem with backups, dubbed "Backup Trauma" in this humorous video
clip, is that they may not be
performed correctly. Backup trauma occurs when you think that your
organization is properly backing up your files, when in fact they are
not. You never consider backing up your own files because you know
that your company takes care of that. If you lose a file, you don't
sweat because you know that the sysadmins have a backup, but to your
surprise, their backup wasn't done properly ... that's when backup
trauma strikes. With Unison, though, you control your own backups,
and the more replicas you have, the less likely that you will lose
your data.
Fast, non-traumatic recovery from hardware failures: A hard
drive crash or total computer meltdown is traumatic for most people.
Why? Not because they need to pay a few hundred dollars for new
hardware, but because they have just lost most or all of their
precious data. If they are somewhat diligent about backups, they
probably have some old backup CDs from a few months ago, but that's
still a few months of lost work. With Unison, you back up basically
as often as you use your computer, so at worst you will lose only the
work you have done since your last synchronization, typically a few
hours. If one of your machines dies, then it is annoying to pay for new
hardware and install your OS and software again (which is much easier
if you have an OS with automated package management software such as
Fink, RPM, or apt-get), but it is non-traumatic because you
have not lost any data. If you replicated the configuration files for
your favorite applications, then restoring their pre-crash state is as
easy as reinstalling them and moving those files back to the correct
places. Unison allows data to transcend hardware; after all, hardware
is cheap and plentiful, but your data is irreplaceable.
Who Should Use Unison
I am not going to preach that everybody in the world should use Unison.
I think that everybody should back up their data regularly, but Unison
is overkill for simply backing up data. However, if you use more than
one computer on a regular basis, then you can probably gain benefits
from Unison. Here are some typical configurations for different types
of users:
Casual home user with no access to a server: If you are a
typical home user who has a laptop and desktop computer but no access
to a file server, then you probably use a USB flash drive or an external hard drive
to shuttle files back and forth between your computers. With Unison,
you can still use that method of transferring data, except that you
can be confident that all of your computers will always have
up-to-date copies of files (as long as you remember to synchronize).
For example, you can do some work on your laptop, synchronize with the
removable drive, move the drive to the desktop computer, synchronize
again before you start working, and have both computers (as well as
the removable drive) contain the most recent versions of all files,
regardless of which computer you used to edit them.
University student: If you are a student at a modern
university, you probably have a certain amount of storage space on the
university servers as well as ssh remote login access, which
is enough to run Unison as long as the same Unison version is available
there (or you can install it in your own account). You should definitely take advantage of this
space because it is probably well-maintained and regularly backed-up.
After all, your tuition is helping to pay the salaries of people who
are in charge of protecting your data. You can synchronize your
various machines against the school's servers.
Personal server administrator: The ideal way to run Unison is
if you can set up your own personal server with ssh login
capabilities (this is possible with any flavor of UNIX or Linux, Mac
OS X, and Windows XP with Cygwin). My suggestion is to dedicate one
computer as your 'Unison server' which holds all of your relevant data
and synchronize all of your computers to that server. This is the
setup that my colleagues and I use.
Security Concerns
When I first tell people about the benefits of keeping multiple replicas
of their personal data on different machines, preferably at different
physical locations, one recurring concern is security: If I place my
data on the university server, wouldn't people have access to it? If I
start my own server and run Unison via ssh, wouldn't
anybody on the Internet technically be able to connect and see my files?
These are all valid concerns, but think about the following: Would you
rather risk losing your data or having somebody else access it? The
safest way to secure your data is to have one computer with all of your
files on it and never connect it to the Internet, but that's obviously
unreasonable. It is true that the more places your data resides, the
more vulnerable it is to third-party snoopers. However, if you are
careful with choosing strong passwords, using secure tools like
ssh, and storing your data on reliable servers, then you
should be fine.
Furthermore, how much do other people really care about your data? How
sensitive is your data anyway? If it's just old lab reports and
vacation photos, then it's no big embarrassment if someone gets hold
of those files. However, if you have truly sensitive information, then
either do not store it online or encrypt it first using a program
such as ccrypt before uploading it. The deal with online
security (like real-life personal security) is that if somebody is
really out to get you, then they will find a way to get you. The best
way to defend yourself under normal circumstances is to take some simple
precautions to prevent yourself from being the target of random
undirected attacks, and to not make enemies who would want to steal your
data.
Level 1: Basic Unison setup
Here are some general tips for setting up Unison. This is not meant to
be a comprehensive guide, and is merely a supplement to the official
Unison manual.
1. Organizing your files:
Even before you install Unison, you need to first organize all of the
files you want to synchronize in your replicas. Before you run Unison
for the first time on your data, it is important that all of your files
and folders are named and organized the way that you want. This is
because Unison does not know when things are renamed. If
memo.txt is renamed to memo-pad.txt, then Unison
thinks that the file memo.txt has been deleted and a new file
memo-pad.txt was created. Of course, you can rename files and
directories all you want, but Unison will simply think that you deleted
the old ones and created identical new ones, which could get annoying.
I suggest that all of your files be organized in sub-directories under
one main directory, which will be the root directory for your
synchronization.
2. Determining your roots:
Now you need to figure out which computers and hard drives you want to
use to house the replicas of your files (these locations are called
roots), and how they are going to communicate with one another (either
locally or remotely via ssh). I recommend a star topology
where one server (if possible) with a constant Internet connection is
the central root, and all other computers synchronize with it remotely
via ssh. This effectively turns the Unison peer-to-peer system
into a client-server system. If you don't have access to a server, then
you can use a removable hard drive as your central root and move it to
different computers when you want to synchronize the files.
3. Installing Unison:
There are several versions of Unison available for download, and it is
very important that you install the SAME version on every computer you
want to use with Unison, because different versions are generally not
compatible with one another. I personally use Unison 2.13.16 because it is fairly
stable and readily available for binary install in package management
software on many platforms. For Windows XP, I run Unison under Cygwin and use the Cygwin Setup tool to
select and install the Unison 2.13.16 binary. For Mac OS X, I use Fink.
For Debian-based Linux distros, I use apt-get, and the same goes for
other operating systems with package management software. You should
only download the Unison source and compile it as a last resort.
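Since every machine must run the same version, it can help to compare both ends before the first run. The following is a minimal sketch, not part of Unison: the function names unison_version and check_unison_versions are hypothetical, and it assumes that unison -version prints a line containing the dotted version number (e.g. "unison version 2.13.16") and that you can already ssh to the remote host.

```bash
# Hypothetical helpers (not part of Unison): compare the Unison version
# installed locally with the one installed on a remote host.

# Print the first dotted version number from `unison -version`.
# Any arguments are used as a command prefix, so
#   unison_version            runs: unison -version
#   unison_version ssh HOST   runs: ssh HOST unison -version
unison_version () {
    "$@" unison -version 2>/dev/null | grep -o '[0-9][0-9.]*' | head -n 1
}

# Succeed only if both sides report the same version.
check_unison_versions () {
    local host=$1 here there
    here=$(unison_version)
    there=$(unison_version ssh "$host")
    if [ -z "$here" ] || [ -z "$there" ] ; then
        echo "could not determine the Unison version on both sides" >&2
        return 2
    fi
    if [ "$here" != "$there" ] ; then
        echo "version mismatch: local $here, $host $there" >&2
        return 1
    fi
    echo "both sides run Unison $here"
}
```

For example, check_unison_versions pgbovine@some.fileserver.com. If the remote side cannot be determined even though Unison is installed there, the non-interactive ssh shell may simply not have unison on its PATH.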
4. Performing the initial copies:
Before you run Unison for the first time to synchronize files between
two locations (either two computers connected through a network or a
computer and a removable hard drive), you should first copy all of the
files (which should all be located under some root directory) from one
location to the other one. (You can use scp to securely copy
files over a network.) This step ensures that both replicas will start
out identical (except possibly for permissions bits, which don't exist
in Windows, although Cygwin tries to simulate them).
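The initial copy can be wrapped in a small function so that the same command works for a remote server and for a removable drive. This is a hypothetical helper (initial_copy is not a standard command): it assumes that a destination containing a colon, such as user@host:, is remote and should go through scp, and that anything else is a local path. In both cases the root directory is copied into an existing destination directory, and -p preserves modification times and modes.

```bash
# Hypothetical helper: copy a whole Unison root into an existing
# directory, either over the network (destination contains a colon,
# e.g. user@host:) or locally (e.g. a mounted removable drive).
initial_copy () {
    local src=$1 dest=$2
    if [ ! -d "$src" ] ; then
        echo "initial_copy: $src is not a directory" >&2
        return 1
    fi
    case "$dest" in
        *:*) scp -rp "$src" "$dest" ;;  # remote: recursive, keep times and modes
        *)   cp -Rp "$src" "$dest" ;;   # local: recursive, keep times and modes
    esac
}
```

With the example roots from this guide, initial_copy ~/my-unison-root pgbovine@some.fileserver.com: would create my-unison-root in the remote home directory. For a drive, pass its (system-dependent) mount point; if the destination directory does not exist yet, cp creates it as the copy itself rather than placing the root inside it.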
5. Setting up your Unison profile:
When you invoke Unison on a computer, it looks for a profile in the
~/.unison/ directory on that computer to learn which two locations
(called roots) to synchronize and which options to use.
I have one common profile named common.prf with properties that
are shared by all profiles:
# Helps out a lot on Windows
fastcheck = true
# place new files at the top of the list
sortnewfirst = true
# turn on ssh compression
rshargs = -C
ignore = Name Thumbs.db
ignore = Name *~
ignore = Name *.tmp
Among other things, this common profile tells Unison to detect changes
by checking each file's size and modification time instead of reading
its full contents, even on Windows where this is not the default (which
greatly speeds up scanning), and to ignore certain temporary and useless
files. Every profile I set up
will include this one. Here is an example of a simple profile named
simple.prf:
include common
root = /home/pgbovine/my-unison-root
root = ssh://pgbovine@some.fileserver.com//home/pgbovine/my-unison-root
When I invoke Unison with the command unison simple, it looks
in the ~/.unison/ directory for a profile named
simple.prf and effectively appends all of the options listed in
that profile to the command line. The two
root parameters are very important. These are the two
locations that Unison is trying to synchronize between. In this case,
the first root is the my-unison-root directory on my computer
(where all of my data is stored), and the second root is the same
directory on a remote file server (which must also be able to run Unison
2.13.16).
6. Running Unison for the first time:
OK, now that you have made your initial copies and set up a basic
profile which tells Unison which two locations (roots) to synchronize,
you are ready to run Unison for the first time. You can invoke Unison
by typing unison simple, and it will use the options in
simple.prf (and common.prf because of the 'include
common' statement). During this first run, Unison will take quite a
long time because it traverses all of the files and builds up auxiliary
metadata about each one of them (stored in a file in the
~/.unison/ directory). After it is done, it will ask you
questions when there are conflicts between files. Press ? to
see the choices that you have when Unison asks you a question.
However, no files should be different during this initial run because
you have just made a fresh identical copy across the two roots. Most
likely, you will encounter conflicts in the permissions bits, because
different operating systems have different defaults (called the 'umask'
in UNIX environments). Assuming that your first root is where you
copied the files from, you will want to settle all conflicts by
propagating changes from left to right. (In my experience, on Mac OS X
some files are locked by default so that you cannot change their
permissions, and Unison will thus fail when trying to propagate
permissions to them. Use the 'chflags nouchg' command on those files to
clear the lock so that you can set their permissions.)
If you don't care about permissions bits, use the -perms 0
option to stop synchronizing them. I always find these
bits irritating, so I ignore them whenever possible. For example, when
I synchronize between my Mac and its VFAT-formatted removable hard
drive, I use -perms 0, fastcheck true, and
pretendwin true to ignore permissions and perform fast checking
even on the relatively slow and primitive VFAT filesystem.
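The same removable-drive settings could also live in a profile instead of on the command line. The following is a sketch: the profile name usbdrive.prf and both root paths are hypothetical (where the drive is mounted depends on your system), and it uses the preference names given above, which belong to the Unison version this guide uses. fastcheck is left out because common.prf already sets it.

```
# ~/.unison/usbdrive.prf (hypothetical name and paths)
include common
root = /Users/pgbovine/my-unison-root
root = /Volumes/USBDRIVE/my-unison-root
# do not synchronize permissions bits; VFAT cannot store them
perms = 0
# fastcheck = true is inherited from common.prf
pretendwin = true
```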
If you are tired of repeatedly telling Unison to propagate all changes
from left to right, then press q to quit. You should then
invoke Unison with the -auto and -force options: -auto accepts the
default actions without asking, and -force makes every change go in
favor of the root that you specify, so that you don't need to mash
buttons a zillion times to confirm your selections. In this example,
you should use:
unison simple -auto -force /home/pgbovine/my-unison-root
to perform a sync where the first root always wins. After Unison
finishes propagating all changes, those two roots have now been
initialized. When you run Unison again on those two roots, it should go
much faster because the metadata has already been stored. You need to
repeat this process with every pair of roots that you want to
synchronize. If possible, I suggest that you adopt a star topology and
synchronize all roots against a central server root, which minimizes the
number of pair-wise synchronizations required.
7. Using Unison:
Now that you've got Unison set up, all you need to do is remember to
type unison your-profile-name at the shell every time that you
want to synchronize all of your files. If you name your profile
default.prf, then you can simply type unison without a
profile name. Remember to synchronize every time right after you log in
to a machine and right before you log out. Unison is only effective if
you use it :) Try editing files, adding new files/directories, and
deleting files/directories to get accustomed to how Unison works. That's
basically all there is to it: run Unison, tell it which way to
propagate files if there are conflicts, and answer other questions if
necessary.
Level 2: Making Unison part of your daily life
Now that you have Unison set up and running, here are some more tips for
enhancing your experience.
Setting up uniform configuration files:
I have the same BASH configuration scripts on every computer that I use
(this can easily extend to configuration files for other applications)
because I keep them all in my Unison repository. I also keep all of my
Unison profiles in the repository. For example, all of your various
BASH configuration scripts for different machines can be located in
my-unison-root/Documents/config/. Then, you can simply make
symlinks from your home directory on every machine to the appropriate
files within my-unison-root/Documents/config/. Therefore, your
configuration files are backed up and your directory structure is
preserved, but you can still access the files from any machine via
symlinks. This provides some platform-independence, because you can
have the same configuration files for your favorite applications
regardless of what computer you use.
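Creating those symlinks by hand on every machine is easy to get wrong, so a small helper may be worth keeping in the repository too. This is a minimal sketch under stated assumptions: link_config and the file name bashrc-common are hypothetical, the configuration directory is the my-unison-root/Documents/config/ example from above, and an existing regular file in your home directory is moved aside to a .bak copy rather than deleted.

```bash
# Hypothetical helper: make $HOME/LINK_NAME a symlink to FILE inside the
# Unison config directory, e.g.  link_config bashrc-common .bashrc
link_config () {
    local src="$HOME/my-unison-root/Documents/config/$1"
    local link="$HOME/$2"
    if [ ! -e "$src" ] ; then
        echo "link_config: $src does not exist" >&2
        return 1
    fi
    # keep an existing real file as a backup instead of overwriting it
    if [ -e "$link" ] && [ ! -L "$link" ] ; then
        mv "$link" "$link.bak"
    fi
    # -f replaces an old symlink; -n avoids descending into a linked directory
    ln -sfn "$src" "$link"
}
```

Because the helper only ever points links into the Unison root, running it again on a machine that is already set up simply refreshes the links.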
Setting the UNISONLOCALHOSTNAME environment variable:
Unison identifies hosts by the hostname that the operating system
reports. Sometimes this hostname can change without
your knowledge. Also, when you log in remotely to university computers
that all mount a network file system, you may log in to a different
computer every time and hence have a different hostname. Unison will not
be able to recognize that a certain root has already been synchronized
if the hostname changes. Therefore, just to be safe, you should always
set a fixed UNISONLOCALHOSTNAME environment variable for every computer
(or one name for a whole cluster that shares your home directory)
in the shell initialization file (e.g., .bashrc or one of your own
custom files sourced by .bashrc). Here is an example of such a line in
BASH:
export UNISONLOCALHOSTNAME=little-mac
This makes sure that Unison will always recognize my Mac by the name
'little-mac', regardless of whether its real hostname changes.
Use 'scp' to copy large amounts of data remotely before running Unison:
Unison sometimes stalls or times out if it has to copy large amounts of
data across the Internet, so if you know that you have added a huge
directory to one root, use the scp tool to securely copy it
remotely before running Unison. Then Unison will run a lot faster
because it will find identical copies on both sides and decide that
those files do not need to be propagated in either direction. I have heard (but never tried
myself) that running Unison with the -debug XXX option (see
User Manual for what to fill in for XXX) will prevent the timeouts on
large transfers.
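The pre-copy step can be scripted so that the directory lands at the same relative place in the remote root. This is a sketch with hypothetical names and values: precopy is not a standard command, the remote user, host, and root are the example values from simple.prf, and it assumes the parent directory already exists on the remote side (scp will not create it).

```bash
# Hypothetical helper: scp a directory from inside the local Unison root
# to the matching location under the remote root before running Unison.
precopy () {
    local remote="pgbovine@some.fileserver.com"        # example host
    local remote_root="/home/pgbovine/my-unison-root"   # example root
    local root dir rel parent
    root=$(cd "$HOME/my-unison-root" && pwd -P) || return 1
    dir=$(cd "$1" && pwd -P) || return 1
    case "$dir" in
        "$root"/*) rel=${dir#"$root"/} ;;
        *) echo "precopy: $1 is not inside $root" >&2 ; return 1 ;;
    esac
    parent=$(dirname "$rel")   # "." if the directory sits directly in the root
    # -r copies recursively, -p keeps modification times and modes
    scp -rp "$dir" "$remote:$remote_root/$parent/"
}
```

For instance, precopy ~/my-unison-root/Pictures/2005 would copy that folder into Pictures/ under the remote root; afterwards, running Unison should find the same contents on both sides.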
Synchronizing only parts of your directory structure:
I have Unison running on several computers and remote storage accounts,
but not all of them have enough space to store all of my files. Thus, I
have configured my Unison profile to only synchronize a subset of paths
from my Unison root directory to those computers, and to ignore certain
types of files. This can be accomplished using the path and
ignore options. Here is an example profile that demonstrates
this:
# Paths to synchronize
path = Documents
path = Pictures/Old pictures
# Ignore AVI video files and the Documents/Videos directory
ignore = Name *.avi
ignore = Name *.AVI
ignore = Path Documents/Videos
Between a certain pair of roots (unspecified here), I only want to
synchronize the Documents and Pictures/Old pictures
directories (and all of their sub-directories), and ignore all files
that end with an .avi extension as well as files in
Documents/Videos.
uwd() - The very useful Unison Working Directory function:
In order to speed up Unison's performance, it is useful to direct it to
only synchronize the current directory (and all of its sub-directories).
This is especially useful if your roots contain many files and
directories. For example, if I modify one file in some documents folder,
I don't want to wait for Unison to traverse through all of my pictures
folders before concluding the obvious fact that they haven't changed at
all. I want to be able to run Unison only on the current folder. This
can be accomplished with a -path option when invoking Unison.
Thus, if I know that only the files in a folder named
Documents/current-docs relative to the location of the root
have changed, then I can invoke Unison using:
unison profile-name -path Documents/current-docs
inserting the appropriate profile for profile-name. This will
only run Unison on the Documents/current-docs directory, which
is exactly what you want ... unless (here is a really subtle but
important point) the profile-name profile contains
path entries of its own. Remember that Unison takes all
entries in the profile and appends them to the command line. Thus, if
profile-name contained the line
path = Documents
then the command above would behave as if you had typed
unison profile-name -path Documents/current-docs -path Documents
which does NOT do the correct thing, because it synchronizes everything
in the Documents directory. One workaround is to use an explicit
-path option only with profiles that have no path entries of
their own.
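Since mixing a command-line -path with a profile's own path entries is easy to do by accident, a quick check can catch it. This is a minimal sketch: profile_has_paths is a hypothetical name, it only looks at the named .prf file in ~/.unison/ and does not follow include lines, and it treats any uncommented line starting with path = as a path entry.

```bash
# Hypothetical guard: succeed (status 0) if ~/.unison/NAME.prf has its own
# "path = ..." entries, fail (1) if not, and return 2 if the file is missing.
# Lines pulled in through "include" are not checked.
profile_has_paths () {
    local prf="$HOME/.unison/$1.prf"
    if [ ! -r "$prf" ] ; then
        echo "profile_has_paths: cannot read $prf" >&2
        return 2
    fi
    grep -Eq '^[[:space:]]*path[[:space:]]*=' "$prf"
}
```

For example, if profile_has_paths profile-name-for-uwd succeeds, that profile should not be combined with a command-line -path.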
Instead of explicitly typing in your current directory relative to the
Unison root in order to synchronize it, you can write a shell
function that calls pwd -P to get the name of the current
directory (the -P gets the true path without symlinks), checks
whether it is a subdirectory of your root, and if so, strips the name
of the root from the front to get the proper path to pass to the
Unison -path option. Here is an example of such a function, which I call uwd(), for
Unison Working Directory, that you can incorporate into your BASH
configuration file:
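# unison the current working directory only
function uwd () {
    local root cwd rel
    # resolve symlinks in both paths so that they can be compared
    root=$(cd "$HOME/my-unison-root" && pwd -P) || return 1
    cwd=$(pwd -P)
    case "$cwd" in
        "$root")   rel="" ;;              # at the root itself
        "$root"/*) rel=${cwd#"$root"/} ;; # strip the root prefix
        *) echo "$cwd is not a directory in a Unison hierarchy on this machine" >&2
           return 1 ;;
    esac
    if [ -z "$rel" ] ; then
        unison profile-name-for-uwd                 # whole root
    else
        unison profile-name-for-uwd -path "$rel"    # just this subtree
    fi
}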
Unfortunately, there is no one uwd() script that will work on every
computer (because your directory structure will differ), but the general
idea is evident in my script. You can use it as the basis for
developing your own scripts. Remember that
profile-name-for-uwd MUST NOT contain any path
entries, or uwd() may not work properly.
Now that you have uwd() properly set up in your BASH configuration file,
you can go into any directory that is a subdirectory of the root, and
simply type uwd to synchronize it and all of its
subdirectories. I find that uwd() is a very useful function for
speeding up Unison because when I know that I have modified files only
in a certain directory, there is no point in synchronizing the entire
root.
Level 3: Taking Unison to the Extreme
If you have made it this far, then you're probably ready for some more
advanced ideas which I have not yet implemented myself:
Automating Unison with cron jobs:
The one overhead of working with Unison is that you need to remember to
run it every time you log in to and log out of a computer (or whenever you
want to update or back up your files). If you forget to run it sometime,
then some of your files may be out of sync, which can be annoying. If
you have machines which you always leave on (which I don't), then you
can schedule nightly cron jobs to run Unison and automatically
synchronize all of your roots. If you do so, you will need to run it
with the -batch option so that it does not ask any questions at
all, and also set up ssh key-based (or other) authentication
so that you don't need to manually type in your password every time. If
you can manage to set this up, then you have transformed Unison into an
automated live backup tool in addition to its normal duties, and have
truly become a Unison master.
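Because this idea has not been tried here, the following is only an untested sketch. It assumes passwordless ssh already works (for example, a key pair made with ssh-keygen whose public key is listed in ~/.ssh/authorized_keys on the server) and that the profiles you name exist in ~/.unison/. The function name unison_nightly and the log file ~/.unison/nightly.log are hypothetical. With -batch, Unison asks no questions, so conflicting changes are skipped and left for you to resolve the next time you run it interactively.

```bash
# Hypothetical nightly wrapper around `unison -batch`, meant to be called
# from a cron job. Usage: unison_nightly PROFILE [PROFILE ...]
unison_nightly () {
    local log="$HOME/.unison/nightly.log" prf status=0
    mkdir -p "$HOME/.unison"
    for prf in "$@" ; do
        echo "=== $(date): $prf ===" >> "$log"
        # -batch: ask no questions; conflicts are skipped, not resolved
        if ! unison "$prf" -batch >> "$log" 2>&1 ; then
            echo "unison $prf exited with a non-zero status" >> "$log"
            status=1
        fi
    done
    return "$status"
}
```

To use it, you might put the function in an executable bash script (hypothetically ~/bin/unison-nightly) that ends with unison_nightly simple, and add a crontab entry such as 0 3 * * * $HOME/bin/unison-nightly to run it at 3:00 every night. cron runs jobs with a minimal PATH, so you may need to call unison by its full path.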
Acknowledgments
This article was formulated out of ideas shared and developed in
discussions with Derek Rayside,
who indoctrinated me into the Unison way of life.