Monthly Archives: November 2011

TT: Uni-directional file synchronization between different hosts with Unison

When you work with at least two computers on the same project on a daily basis you might have a problem. You need to get changed files from host A to host B and vice versa. The problem getting bigger when you work in addition on different operation systems or use more than two hosts. On UNIX/Linux the preferred tool for such a task is Rsync. Unfortunately Rsync synchronize only in one direction, it doesn’t work very well when more than two hosts are involved (and it isn’t really comfortable to set up on Windows) and can’t use a secure communication channel. Another approach is to check-in changed source files into a version control system, like CVS. On host A you check it in and on host B you check it out afterwards. But this means you always need a more or less stable variant of your code, so that other developer can, at least compile, or much better use it. That is not always the case (especially when you leave the office at 11:00 p.m.) and it also doesn’t cover files which aren’t handled by a version control system. Luckily there is a solution for all the problems mentioned which is called Unison. So here comes the second post in the ToolTips series, which covers an easy and portable way for file synchronization.

Installing Unison

Most modern Linux distributions include Unison in their package manage system. On Mac OS X you can use MacPorts. Alternatively you could download a binary version for Mac OS X or Windows here. To prevent surprises and unnecessary trouble it might be a good idea to make sure that every involved system use the same version of Unison. At least on Linux and Mac OS X it is relatively easy to compile Unison from the sources.

Setting up public/private key authentication for ssh

One advantage of Unison over Rsync is that you can use different communication channels for the file transfers. One is ssh. As I always prefer/demand encrypted communication this is a big plus of course. In the default setup you can just use ssh. But for a little bit more comfort I suggest to create a public/private key pair for the authentication.
The following creates public/private keys without a password. Although this is much more easier to use, it should be only used on hosts which are trusted. If you are in doubt, use the normal password approach or even better create a public/private key pair with a password. Create a new public/private key pair with the following command:

user@host-a ~ $ ssh-keygen -t rsa

When you are asked for a password just hit Enter twice. The command creates the private key in ~/.ssh/id_rsa and the public key in ~/.ssh/id_rsa.pub. Now copy the public key to host B:

user@host-a ~ $ scp ~/.ssh/id_rsa.pub host_B:.ssh/authorized_keys

If you already have some public keys on host B, make sure you append the new key and not overwrite the file by the above command. Make the file accessible by the user only with:

user@host-b ~ $ chmod 600 ~/.ssh/authorized_keys

Now you should be able to connect to host B without any interaction needed.

Configuring Unison

Like in the long UNIX tradition, Unison is configured using text files. The files are located in the ~/.unison directory. You can configure more than one synchronization target by choosing a meaningful name. There exists one default target which is configured in the file default.prf. Because I have more than one target I prefer to split the configuration into several files. You can include other project files with the include statement as shown here:

# directory on host a (this is where Unison will be executed)
root       = /mnt/data/projects
# directory on host b (this is the remote host)
root       = ssh://host-b//mnt/data/projects
# which directories to sync?
include projects_files.prf
# options
include options.prf
ignorecase = false
# unison executable on the server
servercmd  = /usr/local/bin/unison

We setup the root directories on both machines, including the configuration file for the project target and some generic option file. We also overwriting the default unison location, because this is a self compiled version. The file options.prf looks like this:

# No staled nfs and mac store files
ignore  = Name .nfs*
ignore  = Name .DS_Store
# options
log     = true
rsrc    = true
auto    = true
#debug   = verbose
#logfile = ~/.unison/unison.log

This just set some generic options which are valid for all my targets. For the specific target projects the file projects_files.prf contains mainly the directories and files which should be ignored:

# No ISOs
ignore    = Path vms/ISO
# Ignore VBox branches
ignore    = Path vbox-*
# No binary output from the other platforms
ignore    = Path vbox*/out/*
# One exception:
ignorenot = Path vbox/out/linux.amd64.additions
# No wine stuff
ignore    = Path vbox*/wine.*
# Tools
ignore    = Name vbox*/tools/{FetchDir,freebsd*,os2*}

So in general, you configure the directory to synchronize and later define directories or files which should be ignored. As you see, you can include or exclude paths as you like. Even simple bash wildcards are possible. As shown in this example I exclude all binary files of a VirtualBox build, because they are useless on another platform. Understanding how Unison decide which directories or files should be synchronized is sometimes difficult. So I suggest to carefully read the documentation and just use the “try and failure” approach ;). Another reason for splitting up the configuration files is you can synchronize these files as well. I have another target which synchronize several configuration files, e.g. .bashrc, .profile, .vim* and the sub-project files of Unison like the projects_files.prf. You can’t synchronize e.g. default.prf, cause the root directories are different from host to host, but the general configuration is always the same. My home target looks like this:

# Which directories/files to sync?
path = .bashrc
path = .ion3
path = .gdbinit
path = .cgdb
path = .valgrind-vbox.supp
path = .vim
path = .vimrc
path = .gvimrc
path = .Xdefaults
path = .gnupg
path = .unison/options.prf
path = .unison/home_files.prf
path = .unison/projects_files.prf
# Do not sync:
ignore = Path .vim/.netrwhist
ignore = Path .ion3/default-session*
ignore = Path .cgdb/readline_history.txt

One of the strengths over other synchronization tools is, you can do this for others host as well. So if you synchronize between host a and host b you can also synchronize between host c and host b. However, a little bit of discipline is necessary. There should be one host which all other host synchronize again.
If you now execute unison the project target will be used. If you execute unison home the files of the home target will be synchronized.

Conclusion

Unison is a very powerful tool. You can synchronize between more than two hosts (OS independent), in a secure way and uni-directional. Currently there is no better tool and I use it on a daily basis.