Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

Unison File Synchronizer: Liberation through Data Replication

Summary
Unison is a piece of software that allows you to keep your files synchronized and backed up across different computers. This guide describes the advantages of keeping your files synchronized and provides a tutorial for setting up Unison.

Introduction

Unison, a free cross-platform file synchronization program, can not only provide you with multiple backups of your files, but more importantly, grant you the freedom to simultaneously use different computers with access to all of your files, thus liberating you from the confines of one particular machine. Unison allows you to access the same set of files from any computer (running Mac OS X, Windows XP, or UNIX/Linux variants) and keeps these files up-to-date by always maintaining the most recently-modified version of each file during synchronization. I personally use Unison to keep replicas of all of my personal files across several different computers. Because my documents and configuration files are accessible from every computer, I am free to use whichever one is the most convenient at the moment without the hassle of transferring files using floppy disks, USB drives, or email.

For example, if I am in the office on a Linux machine and want to work on a paper for a class, I can just open up the file and start typing. Before I leave, I simply synchronize that directory to my server. Then when I get home, I synchronize on my Mac and continue working on my paper. If I feel like watching TV later while continuing to work, I can simply switch to using my Windows laptop. Then I can finish up final edits at the office the next morning on my Linux box. By the time I am done, not only have I edited the same paper on three different computers without the hassle of emailing copies to myself, but I have three identical copies of it so that if any one of my computers blows up, I can still turn in my paper on time. Unison has given me the peace of mind that comes with having my files seamlessly backed up while I am working on them and also the freedom of being able to do my work wherever and whenever is convenient.

This document describes some of the benefits of using Unison and provides some tips on doing so, although it is not meant to be a comprehensive how-to guide or user manual. Please email me if you have any questions, comments, or suggestions.

The Benefits of Unison

  • Liberation from a particular machine or operating system: This is perhaps the most practical and visible advantage of using Unison in your daily computing life. If you have access to your files from any machine that you use (and assuming that you have programs on each machine that can utilize these files), then it really doesn't matter which one you use. Furthermore, if you put custom configuration files for your shell or applications in your Unison hierarchy and simply use symlinks to refer to these copies on every computer that you use, then you can have a uniform working environment. For example, I use the BASH shell on every computer, whether it be Windows XP with Cygwin, Mac OS X, FreeBSD, or whatever Linux is in front of me at the moment. I have a common BASH configuration file shared by all machines, and files particular to each machine. My prompt looks the same on all machines, and I can use all of the same aliases and shell functions. When I find a cool BASH function while browsing the Internet at work, I can simply add it to my common BASH config file and sync it, and when I get home at night, I can access that same function on my home machine. This freedom allows you to transcend the incessant bickering over which operating system is better: you can use whatever OS has the programs you want for some particular application, or simply whatever OS is in front of you at the moment.

  • Live backups via file replication: Your personal data (documents, photographs, emails, etc.) is the most valuable component in your interaction with computers, because it can be irreplaceable if lost. As I have mentioned in my Backup Quick Tips mini-article, data backup is something that everybody should do, but unfortunately, few people do it on a regular basis. In contrast to traditional backup methods, the great benefit of using Unison to replicate your files across different computers is that your backups are alive. They are not sitting on some archive tape in the basement; they are on the hard drives of each and every computer you use.

  • Seamless control and verification of backups: By synchronizing your Unison file replicas, you are the one who controls your backups so that you can be confident that they are being performed correctly. You verify the integrity of your backups simply by switching computers and accessing the files during your normal course of work. One problem with backups, dubbed "Backup Trauma" in this humorous video clip, is that they may not be performed correctly. Backup trauma occurs when you think that your organization is properly backing up your files, when in fact it is not. You never consider backing up your own files because you know that your company takes care of that. If you lose a file, you don't sweat because you know that the sysadmins have a backup, but to your surprise, their backup wasn't done properly ... that's when backup trauma strikes. With Unison, though, you control your own backups, and the more replicas you have, the less likely it is that you will lose your data.

  • Fast, non-traumatic recovery from hardware failures: A hard drive crash or total computer meltdown is traumatic for most people. Why? Not because they need to pay a few hundred dollars for new hardware, but because they have just lost most or all of their precious data. If they are somewhat diligent about backups, they probably have some old backup CDs from a few months ago, but that's still a few months of lost work. With Unison, you back up basically as often as you use your computer, so you will at worst lose only the data that you have been working on for the past few hours. If one of your machines dies, then it is annoying to pay for new hardware and install your OS and software again (which is trivial if you have an OS with automated package management software such as Fink, RPM Manager, or apt-get), but it is non-traumatic because you have not lost any data. If you replicated the configuration files for your favorite applications, then restoring their pre-crash state is as easy as re-installing and moving those files back to the correct places. Unison allows data to transcend hardware; after all, hardware is cheap and plentiful, but your data is irreplaceable.

Who Should Use Unison

I am not going to preach that everybody in the world should use Unison. I think that everybody should back up their data regularly, but Unison is overkill for simply backing up data. However, if you use more than one computer on a regular basis, then you can probably benefit from Unison. Here are some typical configurations for different types of users:

  • Casual home user with no access to a server: If you are a typical home user who has a laptop and desktop computer but no access to a file server, then you probably use a removable USB or hard drive to shuttle files back and forth between your computers. With Unison, you can still use that method of transferring data, except that you can be confident that all of your computers will always have up-to-date copies of files (as long as you remember to synchronize). For example, you can do some work on your laptop, synchronize with the removable drive, move the drive to the desktop computer, synchronize again before you start working, and have both computers (as well as the removable drive) contain the most recent versions of all files, regardless of which computer you used to edit them.

  • University student: If you are a student at a modern university, you probably have a certain amount of storage space on the university servers as well as ssh remote login access, which is enough to run Unison. You should definitely take advantage of this space because it is probably well-maintained and regularly backed up. After all, your tuition is helping to pay the salaries of people who are in charge of protecting your data. You can synchronize your various machines against the school's servers.

  • Personal server administrator: The ideal way to run Unison is to set up your own personal server with ssh login capabilities (this is possible with any flavor of UNIX or Linux, Mac OS X, and Windows XP with Cygwin). My suggestion is to dedicate one computer as your 'Unison server', which holds all of your relevant data, and to synchronize all of your other computers to that server. This is the setup that my colleagues and I use.

Security Concerns

When I first tell people about the benefits of keeping multiple replicas of their personal data on different machines, preferably at different physical locations, one recurring concern is security: If I place my data on the university server, wouldn't people have access to it? If I start my own server and run Unison via ssh, wouldn't anybody on the Internet technically be able to connect and see my files?

These are all valid concerns, but think about the following: Would you rather risk losing your data or having somebody else access it? The safest way to secure your data is to have one computer with all of your files on it and never connect it to the Internet, but that's obviously unreasonable. It is true that the more places your data resides, the more vulnerable it is to third-party snoopers. However, if you are careful with choosing strong passwords, using secure tools like ssh, and storing your data on reliable servers, then you should be fine.

Furthermore, how much do other people really care about your data? How sensitive is your data anyway? If it's just old lab reports and vacation photos, then it's no big embarrassment if someone gets a hold of those files. However, if you have truly sensitive information, then either do not store it online or encrypt it first using a program such as ccrypt before uploading it. The deal with online security (like real-life personal security) is that if somebody is really out to get you, then they will find a way to get you. The best way to defend yourself under normal circumstances is to take some simple precautions to prevent yourself from being the target of random undirected attacks, and to not make enemies who would want to steal your data.

Level 1: Basic Unison setup

Here are some general tips for setting up Unison. This is not meant to be a comprehensive guide, and is merely a supplement to the official Unison manual.

1. Organizing your files:

Even before you install Unison, you need to first organize all of the files you want to synchronize in your replicas. Before you run Unison for the first time on your data, it is important that all of your files and folders are named and organized the way that you want. This is because Unison does not know when things are renamed. If memo.txt is renamed to memo-pad.txt, then Unison thinks that the file memo.txt has been deleted and a new file memo-pad.txt was created. Of course, you can re-name files and directories all you want, but Unison will simply think that you deleted and created identical new versions, which could get annoying.

I suggest that all of your files be organized in sub-directories under one main directory, which will be the root directory for your synchronization.

2. Determining your roots:

You need to now figure out which computers and hard drives you want to use to house the replicas of your files (these locations are called roots), and how they are going to communicate with one another (either locally or remotely via ssh). I recommend a star topology where one server (if possible) with a constant Internet connection is the central root, and all other computers synchronize with it remotely via ssh. This effectively turns the Unison peer-to-peer system into a client-server system. If you don't have access to a server, then you can use a removable hard drive as your central root and move it to different computers when you want to synchronize the files.

3. Installing Unison:

There are several versions of Unison available for download, and it is very important that you install the SAME version on every computer you want to use with Unison. This is because different versions are not compatible at all. I personally use Unison 2.13.16 because it is fairly stable and readily available for binary install in package management software on many platforms. For Windows XP, I run Unison under Cygwin and use the Cygwin Setup tool to select and install the Unison 2.13.16 binary. For Mac OS X, I use Fink. For Debian-based Linux distros, I use apt-get, and the same goes for other operating systems with package management software. You should only download the Unison source and compile it as a last resort.
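Since a version mismatch stops everything, it can be worth checking both sides before going further. The following is only a sketch: the function names are hypothetical, and it assumes that ssh login to the remote host already works, that unison is on the remote PATH, and that `unison -version` prints a line of the form "unison version 2.13.16".

```shell
# Pull the version number (third word) out of a line such as
# "unison version 2.13.16".
unison_version_of () {
  printf '%s\n' "$1" | awk '{ print $3 }'
}

# Hypothetical helper: compare the local Unison version with the one on a
# remote host. Assumes ssh login works and unison is on the remote PATH.
check_unison_versions () {
  local host=$1 here there
  here=$(unison_version_of "$(unison -version)")
  there=$(unison_version_of "$(ssh "$host" unison -version)")
  if [ "$here" = "$there" ]; then
    echo "OK: both sides run Unison $here"
  else
    echo "MISMATCH: local $here, remote $there" >&2
    return 1
  fi
}

# Example (host name taken from the sample profile in this guide):
# check_unison_versions pgbovine@some.fileserver.com
```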

4. Performing the initial copies:

Before you run Unison for the first time to synchronize files between two locations (either two computers connected through a network or a computer and a removable hard drive), you should first copy all of the files (which should all be located under some root directory) from one location to the other one. (You can use scp to securely copy files over a network.) This step ensures that both replicas will start out identical (except possibly for permissions bits, which don't exist in Windows, although Cygwin tries to simulate them nonetheless).
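As a rough illustration of this initial copy, the sketch below wraps scp in a small function. The function name and the DRY_RUN switch are hypothetical conveniences, and the host and paths are just the ones used in the sample profile in this guide. The -r flag copies the directory recursively, and -p asks scp to preserve modification times and modes.

```shell
# Hypothetical helper: copy a whole root to a second location before the
# first synchronization. Set DRY_RUN=1 to print the command instead of
# running it.
initial_copy () {
  local src=$1 dest=$2
  if [ -n "$DRY_RUN" ]; then
    echo "scp -r -p $src $dest"
  else
    scp -r -p "$src" "$dest"
  fi
}

# Example: copy the local root into the home directory on the server.
# initial_copy /home/pgbovine/my-unison-root pgbovine@some.fileserver.com:/home/pgbovine/
```

For a removable drive, a plain recursive copy to the mounted drive serves the same purpose.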

5. Setting up your Unison profile:

On the computer where you are invoking Unison, it looks for a profile located in the ~/.unison/ directory to know which two locations (called roots) to synchronize and which options to invoke Unison with. I have one common profile named common.prf with properties that are shared by all profiles:

# Helps out a lot on Windows
fastcheck = true

# place new files at the top of the list
sortnewfirst = true

# turn on ssh compression
rshargs = -C

ignore = Name Thumbs.db
ignore = Name *~
ignore = Name *.tmp

Among other things, this common profile tells Unison to perform a fast file name/date check for Windows (which greatly speeds up performance) and ignore certain temporary and useless files. Every profile I set up will include this one. Here is an example of a simple profile named simple.prf:

include common

root = /home/pgbovine/my-unison-root
root = ssh://pgbovine@some.fileserver.com//home/pgbovine/my-unison-root

When I invoke Unison with the command unison simple, it looks in the ~/.unison/ directory for a profile named simple.prf and virtually appends all of the options listed in that profile onto the command-line when invoking Unison. The two root parameters are very important. These are the two locations that Unison is trying to synchronize between. In this case, the first root is the my-unison-root directory on my computer (where all of my data is stored), and the second root is the same directory on a remote file server (which must also be able to run Unison 2.13.16).

6. Running Unison for the first time:

OK, now that you have made your initial copies and set up a basic profile which tells Unison which two locations (roots) to synchronize, you are ready to run Unison for the first time. You can invoke Unison by typing unison simple, and it will use the options in simple.prf (and common.prf because of the 'include common' statement). During this first run, Unison will take quite a long time because it traverses through all files and builds up auxiliary metadata about each one of them (stored in a file in the ~/.unison/ directory). After it is done, it will ask you questions when there are conflicts between files. Press ? to see the choices that you have when Unison asks you a question.

However, no files should be different during this initial run because you have just made a fresh identical copy across the two roots. Most likely, you will encounter conflicts in the permissions bits, because different operating systems have different defaults (called the 'umask' in UNIX environments). Assuming that your first root is where you copied the files from, you will want to settle all conflicts by propagating changes from left-to-right. (In my experience, the default setting on Mac OS X is that you cannot change permissions on certain files, and Unison will thus fail when trying to propagate permissions. Use the 'chflags nouchg' command to change this so that you can set file permissions.)

If you don't care about permissions bits, use the -perms 0 option to not synchronize any permissions bits. I always find these bits irritating, so I ignore them whenever possible. For example, when I synchronize between my Mac and its VFAT-formatted removable hard drive, I use -perms 0, fastcheck true, and pretendwin true to ignore permissions and perform fast checking even on the relatively slow & primitive VFAT filesystem.
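To make the removable-drive case concrete, here is a sketch of a shell function that writes such a profile. Everything specific in it is an assumption for illustration: the function name, the profile name removable.prf, and both root paths (including the /Volumes/BACKUP mount point). Only the three preferences themselves come from the paragraph above.

```shell
# Hypothetical helper: write a profile for syncing with a VFAT-formatted
# removable drive. Pass a directory as $1 to write somewhere other than
# ~/.unison (useful for trying it out).
write_removable_profile () {
  local dir=${1:-$HOME/.unison}
  mkdir -p "$dir"
  cat > "$dir/removable.prf" <<'EOF'
include common

# Both paths are placeholders; adjust them to your own machine and drive
root = /Users/pgbovine/my-unison-root
root = /Volumes/BACKUP/my-unison-root

# Do not synchronize permission bits
perms = 0
# Use modification time and size instead of reading every file
fastcheck = true
# Use Windows-style behaviour, as described in the text
pretendwin = true
EOF
}
```

After writing the profile, it would be invoked like any other: unison removable.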

If you are tired of repeatedly telling Unison to propagate all changes from left-to-right, then press q to quit. You should then invoke Unison with the -auto and -force options, which forces it to always synchronize in favor of the root that you specify so that you don't need to mash buttons a zillion times to confirm your selections. In this example, you should use:

unison simple -auto -force /home/pgbovine/my-unison-root

to perform a sync where the first root always wins. After Unison finishes propagating all changes, those two roots have now been initialized. When you run Unison again on those two roots, it should go much faster because the metadata has already been stored. You need to repeat this process with every pair of roots that you want to synchronize. If possible, I suggest that you adopt a star topology and synchronize all roots against a central server root, which minimizes the number of pair-wise synchronizations required.

7. Using Unison:

Now that you've got Unison set up, all you need to do is remember to type unison your-profile-name at the shell every time that you want to synchronize all of your files. If you name your profile default.prf, then you can simply type unison without a profile name. Remember to synchronize every time right after you log in to a machine and right before you log out. Unison is only effective if you use it :) Try editing files, adding new files/directories, and deleting files/directories to get accustomed to how Unison works. That's basically all there is to it. Run a synchronization, tell Unison which way to propagate files if there are conflicts, and answer other questions if necessary.

Level 2: Making Unison part of your daily life

Now that you have Unison set up and running, here are some more tips for enhancing your experience.

Setting up uniform configuration files:

I have the same BASH configuration scripts on every computer that I use (this can easily extend to configuration files for other applications) because I keep them all in my Unison repository. I also keep all of my Unison profiles in the repository. For example, all of your various BASH configuration scripts for different machines can be located in my-unison-root/Documents/config/. Then, you can simply make symlinks from your home directory on every machine to the appropriate files within my-unison-root/Documents/config/. This way, your configuration files are backed up and your directory structure is preserved, but you can still access the files from any machine via symlinks. This provides some platform-independence, because you can have the same configuration files for your favorite applications regardless of what computer you use.
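The symlink step can be sketched as a small function. The name link_config and the file names in the example are hypothetical; the only part taken from the text is the idea of linking from the home directory to a file under my-unison-root/Documents/config/.

```shell
# Hypothetical helper: make $2 a symlink to the replicated config file $1.
link_config () {
  local target=$1 link=$2
  # Keep any existing regular file as a backup instead of overwriting it
  if [ -e "$link" ] && [ ! -L "$link" ]; then
    mv "$link" "$link.orig"
  fi
  # -s: symbolic link, -f: replace an existing link, -n: do not follow it
  ln -sfn "$target" "$link"
}

# Example (file names are placeholders):
# link_config "$HOME/my-unison-root/Documents/config/bashrc-common" "$HOME/.bashrc-common"
```

Because the links live in the home directory, outside the Unison root, Unison itself only ever sees the real files.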

Setting the UNISONLOCALHOSTNAME environment variable:

Unison uniquely identifies hosts by the hostname reported by the operating system. Sometimes this hostname can change without your knowledge. Also, when you log in remotely to university computers that all mount a network file system, you may log in to a different computer every time and hence have different hostnames. Unison will not be able to recognize that a certain root has already been synchronized if the hostname changes. Therefore, just to be safe, you should always set a unique UNISONLOCALHOSTNAME environment variable for every computer in the shell initialization file (e.g., .bashrc or one of your own custom files sourced by .bashrc). Here is an example of such a line in BASH:

export UNISONLOCALHOSTNAME=little-mac

This makes sure that Unison will always recognize my Mac by the name 'little-mac', regardless of whether its real hostname changes.

Use 'scp' to copy large amounts of data remotely before running Unison:

Unison sometimes stalls or times out if it has to copy large amounts of data across the Internet, so if you know that you have added a huge directory to one root, use the scp tool to securely copy it remotely before running Unison. Then Unison will run a lot faster because it will think that those files are up-to-date and do not need to be propagated in either direction. I have heard (but never tried myself) that running Unison with the -debug XXX option (see User Manual for what to fill in for XXX) will prevent the timeouts on large transfers.

Synchronizing only parts of your directory structure:

I have Unison running on several computers and remote storage accounts, but not all of them have enough space to store all of my files. Thus, I have configured my Unison profile to only synchronize a subset of paths from my Unison root directory to those computers, and to ignore certain types of files. This can be accomplished using the -path and ignore options. Here is an example profile that demonstrates this:

# Paths to synchronize
path = Documents
path = Pictures/Old pictures

# Ignore all video files
ignore = Name *.avi
ignore = Name *.AVI
ignore = Path Documents/Videos

Between a certain pair of roots (unspecified here), I only want to synchronize the Documents and Pictures/Old pictures directories (and all of their sub-directories), and ignore all files that end with an .avi extension as well as files in Documents/Videos.

uwd() - The very useful Unison Working Directory function:

In order to speed up Unison's performance, it is useful to direct it to only synchronize the current directory (and all of its sub-directories). This is especially useful if your roots contain many files and directories. For example, if I modify one file in some documents folder, I don't want to wait for Unison to traverse through all of my pictures folders before concluding the obvious fact that they haven't changed at all. I want to be able to run Unison only on the current folder. This can be accomplished with a -path option when invoking Unison. Thus, if I know that only the files in a folder named Documents/current-docs relative to the location of the root have changed, then I can invoke Unison using:

unison profile-name -path Documents/current-docs

inserting the appropriate profile for profile-name. This will only run Unison on the Documents/current-docs directory, which is exactly what you want ... unless (here is a really subtle but important point) the profile-name profile contains -path entries of its own. Remember that Unison takes all entries in the profile and appends them onto the command-line. Thus, if profile-name looked like the following:

# Paths to synchronize
path = Documents
path = Pictures/Old pictures

then invoking the above command will actually cause Unison to execute the following:

unison -path Documents -path "Pictures/Old pictures" -path Documents/current-docs

which does NOT do the correct thing because it synchronizes everything in the Documents directory. One work-around is to only invoke the explicit -path option on profiles without explicit path entries.

Instead of explicitly typing in your current directory relative to the Unison root in order to synchronize it, you can instead write a shell function that calls pwd -P to get the name of the current directory (the -P gets the true path without symlinks), check if it's a sub-directory of your root, and if so, subtract the name of the root to get the proper path to pass into the Unison -path option. Here is an example of such a function, which I call uwd(), for Unison Working Directory, that you can incorporate into your BASH configuration file:

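# unison the current working directory only
function uwd () {
  # true path of the current directory, with symlinks resolved
  local p
  p=$(pwd -P)
  # strip the root prefix to get a path relative to the root
  # (the second line handles being in the root directory itself)
  p=${p#*"$HOME"/my-unison-root/}
  p=${p#*"$HOME"/my-unison-root}

  if [ -d "$HOME/my-unison-root/$p" ] ; then
    unison profile-name-for-uwd -path "$p"

  else
    echo "$p is not a directory in a Unison hierarchy on this machine"
  fi
}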

Unfortunately, there is no one uwd() script that will work on every computer (because your directory structure will differ), but the general idea is evident in my script. You can use it as the basis for developing your own scripts. Remember that profile-name-for-uwd MUST NOT contain any path entries, or uwd() may not work properly.

Now that you have uwd() properly set up in your BASH configuration file, you can go into any directory that is a sub-directory of the root, and simply type uwd to synchronize it and all of its sub-directories. I find that uwd() is a very useful function for speeding up Unison because when I know that I have only modified files in a certain directory, there is no point in synchronizing the entire root.
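When adapting uwd() to a different directory layout, it can help to factor the prefix-stripping step into a helper that is easy to try on its own. The sketch below is one possible variant, not the function shown above: unison_relpath is a hypothetical name, and unlike uwd() it matches the root only at the very start of the path, so a sibling directory such as my-unison-root2 is rejected.

```shell
# Hypothetical helper: print the path of directory $2 relative to root $1.
# Prints an empty line for the root itself; returns non-zero if $2 is not
# inside the root.
unison_relpath () {
  local root=${1%/} dir=$2
  case $dir in
    "$root")   printf '\n' ;;
    "$root"/*) printf '%s\n' "${dir#"$root"/}" ;;
    *)         return 1 ;;
  esac
}

# Example use inside a uwd-style function:
# p=$(unison_relpath "$HOME/my-unison-root" "$(pwd -P)") &&
#   unison profile-name-for-uwd -path "$p"
```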

Level 3: Taking Unison to the Extreme

If you have made it this far, then you're probably ready for some more advanced ideas which I have not yet implemented myself:

Automating Unison with cron jobs:

The one overhead of working with Unison is that you need to remember to run it every time you log in to or log out of a computer (or whenever you want to update or back up your files). If you forget to run it sometime, then some of your files may be out-of-sync, which can be annoying. If you have machines which you always leave on (which I don't), then you can schedule nightly cron jobs to run Unison and automatically synchronize all of your roots. If you do so, you will need to run it with the -batch option so that it does not ask any questions at all, and also do some ssh or other authentication configuration so that you don't need to manually type in your password every time. If you can manage to set this up, then you have transformed Unison into an automated live backup tool in addition to its normal duties, and have truly become a Unison master.
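As a starting point for such a cron job, here is an untested-in-production sketch of a wrapper that runs one profile with -batch and appends the output to a log. The function name, the UNISON override variable, the log location, the script path, and the 3 a.m. schedule are all assumptions for illustration, and it presumes that password-less (key-based) ssh login has already been arranged.

```shell
# Hypothetical wrapper for cron: run one profile non-interactively and log
# the result. $1 is the profile name, $2 an optional log file.
# The UNISON variable can point at a different unison binary.
unison_batch () {
  local profile=$1 log=${2:-$HOME/unison-cron.log} status
  {
    echo "=== $(date): syncing profile $profile ==="
    "${UNISON:-unison}" "$profile" -batch
    status=$?
    echo "exit status: $status"
  } >> "$log" 2>&1
  return "$status"
}

# Example crontab entry (assumed schedule and script path), nightly at 3 a.m.:
# 0 3 * * * bash -c '. "$HOME/bin/unison-batch.sh" && unison_batch simple'
```

Checking the log the next morning shows whether Unison skipped any conflicting files, since -batch leaves conflicts unresolved rather than asking.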

Acknowledgments

This article was formulated out of ideas shared and developed in discussions with Derek Rayside, who indoctrinated me into the Unison way of life.

Created: 2005-08-03
Last modified: 2007-12-07