Two phase expunge

A vanilla Cyrus mailstore removes mail messages immediately when they are expunged and mail folders immediately when they are deleted. Any request by a user to recover mail messages or mail folders deleted by accident has to be dealt with by going to dump tape, with limited chance of success given the infrequent nature of dumps. This isn't a very satisfactory state of affairs, and it compares poorly to the regime on the previous generation of our mail system, where snapshots of user accounts are made several times a day and kept online for at least a week.

In theory, user agents are supposed to protect users from deleting messages by accident: the IMAP delete/expunge model requires an explicit confirmation from the user before messages are removed. Unfortunately:

Other IMAP user agents and Webmail clients follow the desktop metaphor and implement a Trash folder, though this isn't a natural IMAP concept, and is slightly awkward to implement within IMAP.

Experience suggests that people manage to delete mailboxes and expunge messages by accident despite the safeguards which do exist. Consequently we have implemented a "two phase expunge" system where deleted mailboxes and expunged messages are hidden immediately, but only actually removed from disk at a later point in time. Deleted folders and expunged messages remain available to user agents (most significantly our Webmail interface) through special mailbox hierarchies which are invisible until referenced. In the Unix namespace, these names are:

.DELETED/             Recently deleted mail folders
.EXPUNGED/            Recently expunged messages
.EXPUNGED/.DELETED/   Recently expunged messages in recently deleted folders

When a folder is deleted, a suffix recording the time of deletion is appended to its name. This deals with the case of mail folders with high turnover rates (e.g. postponed-msgs), where a folder of the same name may be deleted repeatedly:

A001 list ".DELETED/" postponed%
* LIST (\Noinferiors) "/" ".DELETED/postponed-msgs-20030826-20:52:19"
 . . .
* LIST (\Noinferiors) "/" ".DELETED/postponed-msgs-20030829-11:26:32"
A001 OK Completed (0.000 secs 19 calls)
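
The naming scheme visible in the LIST output above can be sketched in a few lines of Python. This is only an illustration, not the code used inside the mailstore: the function names are hypothetical, and the timestamp format is inferred from the two example names shown above.

```python
from datetime import datetime
from typing import Tuple

# Assumption: format inferred from names such as
# ".DELETED/postponed-msgs-20030826-20:52:19" in the LIST output.
STAMP_FORMAT = "%Y%m%d-%H:%M:%S"
DELETED_PREFIX = ".DELETED/"


def deleted_name(folder: str, when: datetime) -> str:
    """Build the hidden name for a folder deleted at time `when`."""
    return f"{DELETED_PREFIX}{folder}-{when.strftime(STAMP_FORMAT)}"


def split_deleted_name(name: str) -> Tuple[str, datetime]:
    """Recover the original folder name and the deletion time."""
    if not name.startswith(DELETED_PREFIX):
        raise ValueError(f"not a deleted-folder name: {name!r}")
    # The stamp is the last two '-'-separated fields (date, then time),
    # so folder names that themselves contain '-' are left intact.
    folder, date_part, time_part = name[len(DELETED_PREFIX):].rsplit("-", 2)
    return folder, datetime.strptime(f"{date_part}-{time_part}", STAMP_FORMAT)
```

Because the suffix differs for every deletion, each deleted copy of postponed-msgs gets its own name, and the deletion time can be read back from the name alone.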

Expunged messages and deleted mail folders are normally removed by an overnight expire job which scans through user accounts and removes everything that was expunged or deleted more than a given number of days ago. In addition, the quota system is modified to record the amount of expunged data that a particular user is holding (this doesn't count against their normal, live quota). This allows us to place space limits as well as time limits on that data. A special overflow log records accounts which are taking up excessively large amounts of expunge disk quota: the expire job runs automatically for those accounts, to prevent denial of service attacks and other unpleasant effects if a large volume of mail is flowing through a specific account. All of these variables can be tuned individually for each user.
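
The combination of a time limit and a space limit can be sketched as follows. This is a minimal illustration under stated assumptions, not the expire job itself: the ExpungedItem record, the function name and its parameters are hypothetical, and removing the oldest data first when an account is over its space limit is an assumed policy that the description above does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ExpungedItem:
    """Hypothetical record for one expunged message or deleted folder."""
    name: str
    removed_at: datetime  # when it was expunged or deleted
    size: int             # bytes held on disk


def select_for_removal(items: List[ExpungedItem], now: datetime,
                       max_age_days: int, max_bytes: int) -> List[ExpungedItem]:
    """Pick the items an expire pass would remove for one account."""
    cutoff = now - timedelta(days=max_age_days)

    # Time limit: everything expunged/deleted before the cutoff goes.
    doomed = [item for item in items if item.removed_at < cutoff]

    # Space limit (assumed policy): drop the oldest surviving items
    # until the remaining expunged data fits within max_bytes.
    kept = sorted((item for item in items if item.removed_at >= cutoff),
                  key=lambda item: item.removed_at)
    used = sum(item.size for item in kept)
    while kept and used > max_bytes:
        oldest = kept.pop(0)
        used -= oldest.size
        doomed.append(oldest)
    return doomed
```

In this sketch max_age_days and max_bytes are passed in per call, which mirrors the point that the limits can be tuned individually for each user.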

A two phase expunge system does of course require more disk space than a system which immediately removes deleted folders and expunged messages. Our experience so far suggests that the overhead is around 50% if we want to hold onto most data for a period of a month. However, modern hard disks are large, typically larger than necessary given that the high I/O load associated with email applications calls for lots of spindles. Expunged data occupies disk blocks, but doesn't create additional disk I/O.

A nice side effect of the two phase expunge system involves concurrent access to mail folders: one session can fetch messages which have recently been expunged by another. In a vanilla Cyrus, this causes an IMAP_IOERROR condition. We use this to good advantage in the replication system.


David Carter <dpc22@cam.ac.uk>