Better Butter Archive

One of the newer technologies in the Linux world that is stable and ready for production is Btrfs, the 'butter' filesystem. At Tate Dev Ops I have made use of it to create a revolutionary archive system. In the past, backups were cumbersome, large, and resource-hungry. Snapshots of an XFS volume, for instance, require one to restore a whole volume containing many things, and involve a great deal of duplication. Beyond being a backup, this archive system is workable on the small scale, such as the history of one user's files or a single page of a website.

The first concept to grasp is delta compression, where 'delta' means 'what has changed'. The butter filesystem has a newer feature, the reference link (reflink), which is an option to the copy command (cp --reflink), not to the link command (ln). Making one of these copies involves no duplication of data. When either file changes after the copy, the butter filesystem stores only a delta: the blocks that make the copy different. The result is that space is saved. A smaller archive can also mean less data to cache, which may improve performance.
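As a rough illustration of the reference-link copy described above, here is a minimal Python sketch. It assumes GNU coreutils cp is available; the function name reflink_copy is hypothetical and not part of the archive system itself.

```python
import subprocess
from pathlib import Path


def reflink_copy(src: Path, dst: Path) -> None:
    """Copy src to dst, sharing data blocks where the filesystem allows it."""
    # --reflink=auto (GNU coreutils cp) requests a reference-link copy and
    # falls back to an ordinary copy on filesystems without reflink support.
    # --reflink=always would fail instead of silently duplicating the data.
    subprocess.run(
        ["cp", "--reflink=auto", str(src), str(dst)],
        check=True,
    )
```

After such a copy, writing to either file leaves the other untouched; on a reflink-capable filesystem only the changed blocks take up new space.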

To secure the archive process, the SSH 'forced command' feature is used, and each system to be archived has two key pairs. One key pair runs a 'restricted rsync'; rsync is a tool for synchronising file changes over a network. The other key pair simply triggers the snapshot process and nothing else. The synchronisation writes to a directory called 'current'; it runs as root and preserves permissions and extended attributes. The snapshot script is triggered afterwards and copies 'current', as reference links, to a directory named after the date. The result is a delta-compressed snapshot for a specific point in time.
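The two pieces described above might be sketched as follows. This is an assumption-laden outline, not the actual script: the names forced_command_entry and take_snapshot are hypothetical, the snapshot directory is assumed to be named with an ISO date (YYYY-MM-DD), and GNU coreutils cp is assumed. The authorized_keys options restrict and command="..." are standard OpenSSH (restrict needs OpenSSH 7.2 or later).

```python
import datetime
import subprocess
from pathlib import Path
from typing import Optional


def forced_command_entry(command: str, public_key: str) -> str:
    """Build one authorized_keys line whose key can only ever run `command`."""
    # "restrict" turns off forwarding and PTY allocation; command="..."
    # replaces whatever the connecting client asked to run.
    escaped = command.replace('"', '\\"')
    return f'restrict,command="{escaped}" {public_key}'


def take_snapshot(archive_root: Path, today: Optional[datetime.date] = None) -> Path:
    """Copy archive_root/current to archive_root/<YYYY-MM-DD> as reference links."""
    today = today or datetime.date.today()
    target = archive_root / today.isoformat()
    if target.exists():
        raise FileExistsError(f"snapshot {target} already exists")
    # -a keeps ownership, permissions, timestamps and extended attributes;
    # --reflink=auto shares data blocks where supported, else copies normally.
    subprocess.run(
        ["cp", "-a", "--reflink=auto", str(archive_root / "current"), str(target)],
        check=True,
    )
    return target
```

One key would get an entry whose command is the restricted rsync (for example the rrsync helper shipped with rsync, if that is what is installed), and the other an entry whose command runs the snapshot script.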

With this structure, where archives are simply directories and files on a filesystem, one gains great power. One can log into the archive server with a password, access the snapshots, and work with them on a small scale. In fact, one can even chroot into a snapshot and start a service such as MySQL to recover a single table. One can also compare snapshots with one another, bringing the hindsight in the archive to bear on an audit.
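Because snapshots are ordinary directories, comparing them needs nothing special. The sketch below, with the hypothetical name file_history, walks the dated snapshot directories and reports each point at which one file's content changed. It assumes the snapshot directories sit side by side under one root and have names that sort chronologically, such as ISO dates.

```python
import hashlib
from pathlib import Path
from typing import List, Tuple


def file_history(archive_root: Path, relative_path: str) -> List[Tuple[str, str]]:
    """Return (snapshot name, sha256) for each snapshot where the file changed."""
    history = []
    previous = None
    # ISO dates sort chronologically as plain strings; "current" is skipped.
    snapshots = sorted(
        p for p in archive_root.iterdir() if p.is_dir() and p.name != "current"
    )
    for snapshot in snapshots:
        candidate = snapshot / relative_path
        if candidate.is_file():
            digest = hashlib.sha256(candidate.read_bytes()).hexdigest()
        else:
            digest = "absent"
        # Record only the snapshots where the content differs from the one before.
        if digest != previous:
            history.append((snapshot.name, digest))
            previous = digest
    return history
```

An auditor could call this for a single page of a website to see on which dates it was altered.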

These actions must be performed as root, however, so the operator on the recovery/access side of the archive must be very trustworthy. That administrator could add key pairs with SSH forced commands that run scripts submitted by users of the system. It is of course not difficult to have many archive servers, and work is being done so that SELinux can be used to limit archive access on the recovery/access side.

There are plans to modify operating system package managers so that they record checksums of all the files they install. These can simply be stored on the systems where the packages are installed. They then end up in the archive, and one can find things like rootkits by checking the integrity of files in the present against the checksums stored when they were installed. This archive is about far more than backups: it ensures integrity and empowers recovery.
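The integrity check described above could look roughly like this. It is only a sketch under assumptions: the planned package-manager changes are not shown, SHA-256 is an arbitrary choice of checksum, and the names build_manifest and changed_files are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a checksum for every regular file under root (install time)."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and not p.is_symlink()
    }


def changed_files(root: Path, manifest: Dict[str, str]) -> List[str]:
    """List files that are missing or no longer match the stored checksum."""
    problems = []
    for name, expected in sorted(manifest.items()):
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems
```

The manifest would be built at installation time and kept with the system; later, changed_files can be run against the live system or against any snapshot in the archive.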