


File Synchronization

Today, many people use several computers -- one computer at home, one or several computers at the workplace, and possibly a laptop or PDA on the road. Many files are needed on all these computers. You may want to work on any of these computers, modify the files, and subsequently have the latest version of the data available on all of them.


Data Synchronization Software

Data synchronization is no problem for computers that are permanently linked by means of a fast network. In this case, use a network file system like NFS and store the files on a server, enabling all hosts to access the same data via the network. This approach is impossible if the network connection is poor or not permanent. When you are on the road with a laptop, you need to keep copies of all needed files on the local hard disk. However, it will then be necessary to synchronize modified files. When you modify a file on one computer, you have to make sure that a copy of the file is updated on all other computers. For occasional copies, this can be done manually with scp or rsync. However, if many files are involved, the procedure can be complicated and requires great care to avoid errors, such as overwriting a new file with an old file.
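As a minimal sketch of the care a manual copy requires, the following shell function copies a file only if the target is missing or older than the source, so a new file is not overwritten with an old one. The function name copy_if_newer is an illustrative assumption. The function compares modification times only and does not detect files that were changed on both sides.

```shell
# Copy SRC over DST only when DST is missing or older than SRC.
# Hypothetical helper; compares modification times only.
copy_if_newer() {
    src=$1
    dst=$2
    if [ ! -e "$dst" ] || [ "$src" -nt "$dst" ]; then
        cp -p "$src" "$dst"      # -p preserves the modification time
        echo "copied"
    else
        echo "skipped"
    fi
}
```

Called as copy_if_newer SOURCE TARGET, the function reports whether it copied or skipped the file.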


Caution


[Risk of data loss] Before you start managing your data with a synchronization system, you should be well acquainted with the program used and test its functionality. A backup is indispensable for important files.

The time-consuming and error-prone task of manually synchronizing data can be avoided by using one of the programs that employ various methods to automate this job. The following summaries are merely intended to convey a general understanding of how these programs work and how they can be used. If you plan to use them, read the program documentation.


InterMezzo

The idea of InterMezzo is the implementation of a file system that exchanges files via the network like NFS, but stores local copies on the individual computers, thus ensuring that the files are available even when the network connection is down. The local copies can be edited. All changes are noted in a special log file. When the connection is restored, these changes are automatically forwarded and the files are synchronized. More information about InterMezzo is available in /usr/share/doc/packages/InterMezzo/InterMezzo-HOWTO.html, if the package is installed.


Unison

Unison is not a network file system. Rather, the files are simply saved and edited locally. The program Unison can be executed manually to synchronize files. When the synchronization is performed for the first time, a database is created on the two hosts, containing check sums, time stamps, and permissions of the selected files. The next time it is executed, Unison can recognize which files were changed and propose the transmission from or to the other host. Usually all suggestions can be accepted.


CVS

CVS, which is mostly used for managing program source versions, offers the possibility to keep copies of the files on multiple computers. Accordingly, it is also suitable for our purpose.

CVS maintains a central repository on the server, in which not only the files but also changes to files are saved. Changes that are performed locally are committed to the repository and can be retrieved from other computers by means of an update. Both procedures must be initiated by the user.

CVS is very resilient to errors in the event that changes occur on several computers. The changes are merged and, if changes took place in the same lines, a conflict will be reported. When a conflict occurs, the database remains in a consistent state. The conflict is only visible for resolution on the client host.
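The line-based merge described above can be tried out with a three-way merge of two edited copies against their common ancestor. The sketch below uses the separate GNU diff3 tool as a stand-in. CVS performs its merge internally, so this is an illustration of the principle, not the CVS implementation. The function name three_way_merge is an assumption.

```shell
# Three-way merge of two edited copies against their common ancestor,
# using GNU diff3 as a stand-in for the merge CVS performs internally.
# Prints the merged text. Returns 0 on a clean merge and 1 if both
# copies changed the same lines (conflict markers are then included).
three_way_merge() {
    mine=$1
    base=$2
    theirs=$3
    diff3 -m "$mine" "$base" "$theirs"
}
```

If the two copies change different lines, the output contains both changes. If they change the same line, the output contains both variants between conflict markers and the return value is 1.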


mailsync

In contrast to the synchronization tools covered in the previous sections, mailsync merely serves the purpose of synchronizing e-mails between mailboxes. The procedure can be applied to local mailbox files as well as to mailboxes on an IMAP server.

Based on the message ID contained in the e-mail header, the individual messages are either synchronized or deleted. Synchronization is possible between individual mailboxes and between mailbox hierarchies.


Determining Factors for Selecting a Program


Client-Server vs. Peer-to-Peer

Two different models are commonly used for distributing data. In the first model, all clients synchronize their files with a central server.

The server must be accessible by all clients at least from time to time. This model is used by CVS and InterMezzo.

The other possibility is to let equal hosts synchronize their data among each other. This is the approach Unison uses.


Portability

InterMezzo is a solution that only supports Linux systems at present. In the past, the support was limited to 32-bit little endian architectures (ix86). Due to the migration from the perl-based lento to InterSync, this limitation no longer applies. Nevertheless, caution is needed when synchronizing between different architectures, as this feature has not yet been tested thoroughly. CVS and Unison are also available for many other operating systems, including various Unix and Windows systems.


Interactive vs. Automatic

In InterMezzo, the data synchronization normally occurs automatically in the background as soon as a network connection can be established with the server. Manual intervention is only required when conflicts arise.

In CVS and Unison, the data synchronization is started manually by the user. This enables more control over the data to synchronize and easier conflict handling. On the other hand, if the synchronization intervals are too long, conflicts are more likely to occur.


Speed

Due to their interactive character, Unison and CVS appear to be slower than InterMezzo, which operates in the background. Usually CVS is somewhat faster than Unison.


Conflicts: Incidence and Solution

Conflicts do not occur often in CVS, even if several people work on one large program project. This is because the documents are merged on the basis of individual lines. When a conflict occurs, only one client is affected. Usually conflicts can easily be resolved in CVS.

Unison reports conflicts, allowing the affected files to be excluded from the synchronization. However, changes cannot be merged as easily as in CVS.

Due to the noninteractive character of InterMezzo, conflicts cannot simply be resolved interactively. When conflicts occur, InterSync terminates with an alert message. In this case, the system administrator must intervene and possibly transfer files manually (using rsync or scp) to restore a consistent state.


Selecting and Adding Files

InterMezzo synchronizes the entire file system. Newly added files in the file system automatically appear on the other computers.

Configured in the simplest way possible, Unison synchronizes an entire directory tree. New files appearing in the tree are automatically included in the synchronization.

In CVS, new directories and files must be added explicitly using the command cvs add. Thus, the user has more control over the files to synchronize. On the other hand, new files are often overlooked, especially if the question marks in the output of cvs update are ignored because of the number of files.


History

An additional feature of CVS is that old file versions can be reconstructed. A brief editing remark can be inserted for each change and the development of the files can easily be traced later based on the content and the remarks. This is a valuable aid for theses and program texts.


Data Volume and Hard Disk Requirements

An adequate amount of free space for all distributed data is required on the hard disks of all involved hosts. CVS requires additional space for the repository on the server. The file history is also stored on the server, requiring even more space. When files in text format are changed, only the modified lines need to be saved. Binary files require additional space amounting to the size of the file every time the file is changed.
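As a rough worked example of the last point, the following sketch estimates the additional repository space needed for a binary file. It assumes that every change stores a full copy of the file, as described above. The function name binary_growth_kb is an illustrative assumption.

```shell
# Rough extra repository space (in KB) for a binary file that is
# committed REVISIONS times after the initial import. Assumes every
# change stores a full copy, as described above.
binary_growth_kb() {
    size_kb=$1
    revisions=$2
    echo $((size_kb * revisions))
}

binary_growth_kb 2048 10    # a 2 MB image changed ten times: prints 20480
```

A text file of the same size would need far less space, because only the modified lines are saved for each change.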


GUI

Unison offers a graphical user interface that displays the synchronization procedures Unison wants to perform. Accept the proposal or exclude individual files from the synchronization. In text mode, you can interactively confirm the individual procedures.

Experienced users normally control CVS from the command line. However, graphical user interfaces are available for Linux, such as cervisia, as well as for other operating systems, like wincvs. Many development tools (such as kdevelop) and text editors (such as emacs) provide CVS support. The resolution of conflicts is often much easier with these front-ends.

InterMezzo does not offer that much comfort. On the other hand, normally it does not require any interaction and should simply do its job in the background once it is configured.


User Friendliness

The configuration of InterMezzo is relatively difficult and should only be performed by a system administrator who has some experience with Linux. The configuration requires root privileges. Unison is easy to use and is also suitable for newcomers.

CVS is more difficult to use. Users should understand the interaction between the repository and local data. Changes to the data should first be merged locally with the repository. This is done with the command cvs update. Then the data must be sent back to the repository with the command cvs commit. If this procedure has been grasped, newcomers will also be able to use CVS with ease.


Security Against Attacks

During transmission, the data should be protected against interception and manipulation. Both Unison and CVS can easily be used via ssh (Secure Shell), providing security against attacks of this kind. Use of CVS or Unison via rsh (Remote Shell) should be avoided and access by way of the CVS ``pserver'' mechanism in insecure networks is not advisable either.

In InterMezzo, the data synchronization is performed via http. This protocol can easily be intercepted or altered. To increase the level of security, SSL can be used, but this makes the configuration a little bit more difficult. InterMezzo should only be used without SSL in secure, trustworthy networks.


Protection Against Data Loss

CVS looks back on a long record of deployment by developers for the management of program projects and is extremely stable. As the development history is saved, CVS even provides protection against certain user errors, such as the unintentional deletion of a file.

Unison is still relatively new, but boasts a high level of stability. However, it is more sensitive to user errors. Once the synchronization of the deletion of a file has been confirmed, there is no way to restore the file.

Currently, InterMezzo is still in alpha stage. As the files are stored in a separate file system, the probability of a major data loss is relatively low. However, something could go wrong with the file synchronization itself, leaving behind wrecked files. The resilience to user errors is also limited: when a file is deleted locally, the same step is performed on all synchronized hosts. For this reason, backups are strongly recommended.

                 InterMezzo    Unison        CVS           mailsync
C-S/equal        C-S           P2P           C-S           P2P
Portability      Linux(i386)   Lin,Un*x,Win  Lin,Un*x,Win  Lin,Un*x
Interactive      -             x             x             -
Speed            ++            -             o             +
Conflicts        -             o             ++            +
File selection   File system   Directory     Selection     Mailbox
History          -             -             x             -
Hard disk space  o             o             -             +
GUI              -             +             o             -
Difficulty       -             +             o             o
Attacks          -             +(ssh)        +(ssh)        +(SSL)
Data loss        o             +             ++            +


Introduction to InterMezzo

Architecture

InterMezzo uses a special file system type. The files are stored locally on the hard disks of the individual hosts. One of the file systems available in Linux is used for this purpose, preferably ext3 or one of the other journaling file systems. Following the preparation of the partition, the file system is mounted with the type intermezzo. The kernel loads a module with InterMezzo support and all changes performed in this file system are written to a log file.

Following these preliminary steps, InterSync can be started. This program starts a web server, such as apache, that other hosts can access to exchange data. When configuring a client, the name of the server must be specified in InterSync. This server will be contacted. A freely selectable designation for the file system -- the ``fileset'' -- is used to identify the file system.

InterSync is the next generation of the older InterMezzo system, which used a perl daemon called lento for the file synchronization. The documentation of InterSync still refers to this older system occasionally. However, this system has been replaced by InterSync. Unfortunately, the module included in standard kernels still supports lento and does not work with InterSync. A newer module is available in the SuSE kernel. For self-compiled kernels, the kernel module should be built with the package km_intersync.

The configuration of InterMezzo requires system administrator permissions. As indicated above, the configuration of InterMezzo is relatively difficult and should therefore be performed by an experienced system administrator. The configuration described below does not provide any protective mechanisms. Thus, others can easily intercept and manipulate the data synchronized with InterMezzo. For this reason, the configuration should only take place in a trustworthy environment, such as a wired private network protected by a firewall.

Configuring an InterMezzo Server

One of the hosts, preferably one having a good network connection, is assigned the role of the server. The entire data synchronization traffic traverses this server.

A separate file system must be set up for the data storage. If you do not have a spare partition and do not use LVM, the easiest way is to set up the file system in the form of a ``loop device'', which enables the use of a file in the local file system as a separate file system.

In the following example, an InterMezzo/ext3 file system with a size of 256 MB is set up in the root directory. The fileset is assigned the designation fset0.


dd if=/dev/zero of=/izo0 bs=1024 count=262144
mkizofs -r fset0 /izo0  # The warning can be ignored
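The count value in the dd command follows from the desired size: with a block size of 1024 bytes, 256 MB correspond to 262144 blocks. To create an image of a different size, the value can be recalculated as in the following sketch. The function name blocks_for_mb is an illustrative assumption.

```shell
# Number of 1024-byte blocks needed for an image of SIZE_MB megabytes.
blocks_for_mb() {
    size_mb=$1
    echo $((size_mb * 1024 * 1024 / 1024))
}

blocks_for_mb 256    # prints 262144
```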

This file system is mounted to /var/cache/intermezzo.


mount -t intermezzo -o fileset=fset0,loop /izo0 /var/cache/intermezzo


To do this automatically when the system is booted, an entry must be made in the file /etc/fstab. Now InterSync can be configured by customizing /etc/sysconfig/intersync. The following is entered in this file:

INTERSYNC_CLIENT_OPTS="--fset=fset0"
INTERSYNC_CACHE=/var/cache/intermezzo
INTERSYNC_PROXY=""

Now InterSync can be started with the following command:


/etc/init.d/intersync start

To do this automatically when the system is booted, the service can be entered in the list of services to start with insserv intersync.

Configuring InterMezzo Clients

The configuration of the clients (one server can serve multiple clients) is virtually the same as that of the server. The only difference is that the name of the server has to be specified in the variable INTERSYNC_CLIENT_OPTS when configuring /etc/sysconfig/intersync:


INTERSYNC_CLIENT_OPTS="--fset=fset0 --server=sun.cosmos.com"

sun.cosmos.com must be replaced with the network name of the server. If possible, the file systems on all computers should have the same size.

Troubleshooting

As soon as a client is started, changes to files located in the directory /var/cache/intermezzo/ should also be visible on the server and all other clients. If this is not the case, this is usually because no connection can be established to the server or due to a configuration error such as a wrong ``fileset'' designation. To find the error, analyze the log messages in the system log /var/log/messages. The web server started logs its data to /var/intermezzo-X/. The log file of the kernel, which records changes to the file system, is located in /var/cache/intermezzo/.intermezzo/fset0/kml and can be viewed with kmlprint.

When a conflict occurs, normally one of the InterSync processes is aborted. If the file synchronization is no longer performed, look for indications in the log file and use /etc/init.d/intersync status to check if the synchronization service is still active.
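When searching the system log, it can help to filter for the relevant lines. The following is a minimal sketch. The function name recent_intermezzo_log is hypothetical, and the search words are an assumption about what the log messages contain.

```shell
# Show the most recent InterMezzo/InterSync lines from a log file.
# LOGFILE defaults to the system log named above. The search words
# are an assumption about what the messages contain.
recent_intermezzo_log() {
    logfile=${1:-/var/log/messages}
    grep -i -e intermezzo -e intersync "$logfile" | tail -n 20
}
```

Reading /var/log/messages usually requires root privileges. A different log file can be passed as the first argument.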

If necessary, refer to the package documentation:


/usr/share/doc/packages/intersync/

http://www.inter-mezzo.org/


Introduction to Unison


Uses

Unison is an excellent solution for synchronizing and transferring entire directory trees. The synchronization is performed in both directions and can be controlled by means of an intuitive graphical front-end. A console version can also be used. The synchronization can be automated so interaction with the user is not required, but experience is necessary.


Requirements

Unison must be installed on the client as well as on the server. In this context, the term server refers to a second, remote host (unlike with CVS, explained in Introduction to CVS).

In the following section, Unison is used together with ssh. An ssh client must be installed on the client and an ssh server must be installed on the server.


Using Unison

The approach used by Unison is the association of two directories (``roots'') with each other. This association is symbolic -- it is not an online connection. In our example, the directory layout is as follows:

Client: Server:
/home/tux/dir1 /home/geeko/dir2

You want to synchronize these two directories. The user is known as tux on the client and as geeko on the server. The first thing to do is to test whether the client-server communication works:



unison -testserver /home/tux/dir1 ssh://geeko@server//home/geeko/dir2



The most frequently encountered problems are:

  • The Unison versions used on the client and server are not compatible
  • The server does not allow SSH connections
  • One or both of the specified paths do not exist

If everything works, omit the option -testserver.

During the first synchronization, Unison does not yet know the relationship between the two directories and submits suggestions for the transfer direction of the individual files and directories. The arrows in the ``Action'' column indicate the transfer direction. A question mark means that Unison is not able to make a suggestion regarding the transfer direction, as both versions were either changed or are new.

The arrow keys can be used to set the transfer direction for the individual entries. If the transfer directions are correct for all displayed entries, simply click ``Go''.
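The way such suggestions can be derived may be sketched as follows: for each file, the checksum recorded at the last synchronization is compared with the current checksums on both hosts. This is a simplified, hypothetical illustration and not Unison code. The function name suggest_action and the output strings are assumptions.

```shell
# Suggest a transfer direction for one file from three checksums:
# the value recorded at the last synchronization (ARCHIVE) and the
# current values on both hosts. An empty string stands for "no file".
# Hypothetical sketch, not Unison code.
suggest_action() {
    archive=$1
    local_sum=$2
    remote_sum=$3
    if [ "$local_sum" = "$remote_sum" ]; then
        echo "nothing to do"
    elif [ "$local_sum" = "$archive" ]; then
        echo "<-"        # only the remote copy changed
    elif [ "$remote_sum" = "$archive" ]; then
        echo "->"        # only the local copy changed
    else
        echo "?"         # both changed, or both are new and differ
    fi
}
```

During the first synchronization, the archive value is empty for every file, so a file that exists on only one host is proposed for transfer to the other host.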

The characteristics of Unison (e.g., whether to perform the synchronization automatically in clear cases) can be controlled by means of command-line parameters specified when the program is started. The complete list of all parameters can be viewed with unison -help.

For each pair, a synchronization log is maintained in the user directory ~/.unison. Configuration sets such as ~/.unison/example.prefs can also be stored in this directory:

~/.unison/example.prefs
root=/home/foobar/dir1
root=ssh://fbar@server//homes/fbar/dir2
batch=true

To start the synchronization, specify this file as the command-line parameter as in unison example.prefs.


More Information

The official documentation of Unison is extremely useful. For this reason, this section merely provides a brief introduction. The complete manual is available at http://www.cis.upenn.edu/~bcpierce/unison/ and in the SuSE package unison.


Introduction to CVS

Uses

CVS is suitable for synchronization purposes if individual files are edited frequently and are stored in a loose file format, such as ASCII text or program source text. The use of CVS for synchronizing data in other formats, such as JPEG files, is possible, but leads to large amounts of data, as all variants of a file are stored permanently on the CVS server. Furthermore, in such cases most of the capabilities of CVS cannot be used.

The use of CVS for synchronizing files is only possible if all workstations can access the same server. In contrast, with the program Unison the data can be passed by host A through the hosts B and C to the server S.

Configuring a CVS Server

The ``server'' is the place where all valid files are located, including the latest versions of all files. Any stationary workstation can be used as a server. If possible, the data of the CVS should be included in regular backups.

When configuring a CVS server, it might be a good idea to grant the user access to the server via ssh. If the user is known to the server as newbie and the CVS software is installed on the server as well as on the client (e.g., notebook), the following environment variables must be set on the client side:


CVS_RSH=ssh
CVSROOT=newbie@server:/serverdir

The command cvs init can be used to initialize the CVS server from the client side. This needs to be done only once.

Finally, the synchronization must be assigned a name. Select or create a directory on the client that will exclusively contain files to manage with CVS (the directory can also be empty). The name of the directory will also be the name of the synchronization. In our example, we use a directory called synchome. Change to this directory and enter the following command to set the synchronization name to synchome:


   cvs import synchome NEWBIE NEWBIE_0

Many CVS commands require a comment. For this purpose, CVS starts an editor (the editor defined in the environment variable $EDITOR or vi if no editor was defined). The editor call can be circumvented by entering the comment in advance on the command line, such as in the following example:


cvs import -m 'this is a test' synchome NEWBIE NEWBIE_0


Using CVS

From now on, the synchronization repository can be ``checked out'' from all hosts:


cvs co synchome

This creates a new subdirectory synchome on the client. To commit your changes to the server, change to the directory synchome (or one of its subdirectories) and enter cvs commit.

By default, all files (including subdirectories) are committed to the server. To commit only individual files or directories, specify them as in cvs commit file1 directory1. New files and directories must be added to the repository before they are committed to the server with a command like cvs add file1 directory1. Subsequently, the newly added files and directories can be committed: cvs commit file1 directory1.

If you change to another workstation, ``check out'' the synchronization repository, provided this has not been done during an earlier session at the same workstation (see above).

The synchronization with the server is started with the command cvs update. You can also update individual files or directories as in cvs update file1 directory1. If you first want to see the difference from the versions stored on the server, use the command cvs diff or cvs diff file1 directory1. Use cvs -nq update to see which files would be affected by an update.

Here are some of the status symbols displayed during an update:

U  The local version was updated.
M  The local version was modified. If there were changes on the server, it was possible to merge the differences in the local copy.
P  The local version was patched with the version on the server.
C  The local file conflicts with the current version in the repository.
?  This file does not exist in the CVS repository.

The status M indicates a locally modified file. Either commit the local copy to the server or remove the local file and run the update again. In this case, the missing file will be retrieved from the server. If you commit a locally modified file and the file was changed and committed before in the same line, you might get a conflict indicated with C.

In this case, look at the conflict marks (<<<<<<< and >>>>>>>) in the file and decide between the two versions. As this can be a rather unpleasant job, you might decide to abandon your changes, delete the local file, and enter cvs up to retrieve the current version from the server.
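If an update lists many files, the entries that need attention are easily overlooked. The status letters listed above can be filtered with standard tools, as in the following minimal sketch. The function name needs_attention is hypothetical. It assumes the usual output format of one status letter, a space, and the file name.

```shell
# Read "cvs -nq update" style output on standard input and print the
# names of files marked "?" (unknown to CVS) or "C" (conflict).
# Assumes lines of the form "<letter> <file name>".
needs_attention() {
    grep -E '^[?C] ' | cut -c3-
}
```

A possible call is cvs -nq update | needs_attention. Files reported with a question mark may still need to be added with cvs add.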

More Information

This section merely offers a brief introduction to the many possibilities of CVS. Extensive documentation is available at the following URLs:

        http://www.cvshome.org/
        http://www.gnu.org/manual/


Introduction to mailsync

Uses

mailsync is mainly suitable for the following three tasks:

  • Synchronization of locally stored e-mails with e-mails stored on a server
  • Migration of mailboxes to a different format or to a different server
  • Integrity check of a mailbox or search for duplicates

Configuration and Use

mailsync distinguishes between the mailbox itself (the ``store'') and the connection between two mailboxes (the ``channel''). The definitions of the stores and channels are stored in ~/.mailsync. The following paragraphs explain a number of store examples.

A simple definition might appear as follows:

store saved-messages {
        pat     Mail/saved-messages
        prefix  Mail/
}

Mail/ is a subdirectory of the user's home directory that contains e-mail folders, including the folder saved-messages. If mailsync is started with


mailsync -m saved-messages

an index of all messages in saved-messages will be listed. If the following definition is made

store localdir {
          pat     Mail/*
          prefix  Mail/
}

the command mailsync -m localdir will cause all messages stored under Mail/ to be listed. In contrast, the command mailsync localdir lists the folder names. The specifications of a store on an IMAP server appear as follows:

store imapinbox {
  server  {mail.edu.harvard.com/user=gulliver}
  ref     {mail.edu.harvard.com}
  pat     INBOX
}

The above example merely addresses the main folder on the IMAP server. A store for the subfolders would appear as follows:

store imapdir {
  server  {mail.edu.harvard.com/user=gulliver}
  ref     {mail.edu.harvard.com}
  pat     INBOX.*
  prefix  INBOX.
}

If the IMAP server supports encrypted connections, the server specification should be changed to

server  {mail.edu.harvard.com/ssl/user=gulliver}

or, if the server certificate is not known, to

server {mail.edu.harvard.com/ssl/novalidate-cert/user=gulliver}

The prefix will be explained later.

Now the folders under Mail/ should be connected to the subdirectories on the IMAP server:

channel folder localdir imapdir {
        msinfo .mailsync.info
}

mailsync uses the msinfo file to keep track of the messages that have already been synchronized.

The command mailsync folder does the following:

  • The mailbox pattern is expanded on both sides
  • The prefix is removed from the resulting folder names
  • The folders are synchronized in pairs (or created if they do not exist)

Accordingly, the folder INBOX.sent-mail on the IMAP server will be synchronized with the local folder Mail/sent-mail (provided the definitions explained above exist). The synchronization between the individual folders is performed as follows:

  • If a message already exists on both sides, nothing happens
  • If the message is missing on one side and is new (not listed in the msinfo file), it will be transmitted there
  • If the message exists on only one side and is old (already listed in the msinfo file), it will be deleted there (because the message that had obviously existed on the other side was deleted)
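The folder mapping and the three rules above can be summarized in a short sketch. This is an illustration of the described behavior, not code taken from mailsync. The function names map_folder and sync_action and the output words are assumptions.

```shell
# Map a folder name from one store to the other by swapping prefixes,
# for example INBOX.sent-mail -> Mail/sent-mail.
map_folder() {
    name=$1
    from_prefix=$2
    to_prefix=$3
    rest=${name#"$from_prefix"}     # remove the prefix of the source store
    echo "$to_prefix$rest"
}

# Decide what happens to one message. ON_A and ON_B say whether the
# message exists in each store, SEEN whether its message ID is already
# listed in the msinfo file (all three are "yes" or "no").
sync_action() {
    on_a=$1
    on_b=$2
    seen=$3
    if [ "$on_a" = "$on_b" ]; then
        echo "nothing"       # present on both sides (or on neither)
    elif [ "$seen" = "yes" ]; then
        echo "delete"        # old message, removed on the other side
    else
        echo "copy"          # new message, transmit to the other side
    fi
}
```

For example, map_folder INBOX.sent-mail INBOX. Mail/ yields Mail/sent-mail, and sync_action yes no no yields copy.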

To know in advance which messages will be transmitted and which will be deleted during a synchronization, start mailsync with a channel and a store:


mailsync folder localdir

This command produces a list of all messages that are new on the local host as well as a list of all messages that would be deleted on the IMAP side during a synchronization. Similarly, the command


mailsync folder imapdir

produces a list of all messages that are new on the IMAP side and a list of all messages that would be deleted on the local host during a synchronization.

Possible Problems

In the event of a data loss, the safest method is to delete the relevant channel log file msinfo. Accordingly, all messages that only exist on one side will be viewed as new and therefore will be transmitted during the next synchronization.

Only messages with a message ID are included in the synchronization. Messages lacking a message ID are simply ignored, which means they are neither transmitted nor deleted. A missing message ID is usually caused by faulty programs when sending or writing a message.
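To find out whether a local mailbox contains such messages, the headers can be checked before the first synchronization. The following is a minimal sketch for mailbox files in mbox format, where each message is assumed to start with a ``From '' separator line. The function name count_missing_ids is hypothetical.

```shell
# Count the messages in an mbox file whose header has no Message-ID
# line. Messages are assumed to start with a "From " separator line.
# The header of a message ends at the first empty line.
count_missing_ids() {
    awk '
        /^From / {
            if (inmsg && !hasid) missing++
            inmsg = 1; inhdr = 1; hasid = 0
            next
        }
        inhdr && /^$/ { inhdr = 0 }
        inhdr && tolower($0) ~ /^message-id:/ { hasid = 1 }
        END {
            if (inmsg && !hasid) missing++
            print missing + 0
        }
    ' "$1"
}
```

A possible call is count_missing_ids Mail/saved-messages, provided that this folder is stored in mbox format.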

On certain IMAP servers, the main folder is addressed with INBOX and subfolders are addressed with a randomly selected name (in contrast to INBOX and INBOX.name). Therefore, for such IMAP servers, it is not possible to specify a pattern exclusively for the subfolders.

After the successful transmission of messages to an IMAP server, the mailbox drivers (c-client) used by mailsync set a special status flag. For this reason, some e-mail programs, like mutt, are not able to recognize these messages as new. The setting of this special status flag can be disabled with the option -n.

More Information

The README in /usr/share/doc/packages/mailsync/, which is included in the package mailsync, provides additional information. In this connection, RFC 2076 ``Common Internet Message Headers'' is of special interest.


root 2003-11-05