OFF: Drives and backups (was: Re: +++stop press+++advertising aid item french hassan+++one man isolator+++)
Paul Mather
paul at GROMIT.DLIB.VT.EDU
Mon Aug 25 12:23:26 EDT 2008
On 25 Aug 2008, at 10:52 AM, Arjan Hulsebos wrote:
> On Mon, 25 Aug 2008 09:31:24 -0500, Carl Edlund Anderson wrote
>> Well, up until now, I use the incredibly dumb and brute force method
>
> Out of curiosity, has anyone gone through the drill of restoring their
> data from a backup? We once backed up a fileserver, swapped its disk
> for a fresh one, and did a restore. The process seemed to be
> successful, only to find out at the next boot that mere random
> patterns had been written to the disk. The only way to get a usable
> disk again was a low-level format....
>
> Not to scare you in any way, but if you're _not_ using Carl's brute
> force method, do go through the drill. It's worth it.
I've restored from backups several times, including, I'm happy to say
as a Mac user, a successful complete system restore from a Time
Machine backup. (I've also done many successful restores from Tivoli
TSM regular backups.)
Usually, when I'm archiving data to DVD, I'll include an MD5 checksum
of all the data on the disc along with the files themselves. Then, I
check the written data against that. It's no protection against the
DVD subsequently becoming unreadable, but at least it guards against
the backup-wrote-random-gibberish-instead-of-the-data-you'd-intended
scenario you mention above. All decent backup software will let you
verify a backup, too. DDS DAT drives, which I've used in the past
for backups, feature read-after-write tape heads, which verify data
as it is written.
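If you want to roll your own checksum manifest, here's a minimal
sketch of the idea in Python (my own illustration; the "MD5SUMS"
manifest name and the function names are just placeholder
assumptions):

    import hashlib
    import os

    def md5sum(path, bufsize=1 << 20):
        # Compute the MD5 digest of a file, reading in chunks so
        # large files don't have to fit in memory.
        h = hashlib.md5()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(bufsize)
                if not chunk:
                    break
                h.update(chunk)
        return h.hexdigest()

    def write_manifest(root, manifest="MD5SUMS"):
        # Record a digest for every file under root, one per line.
        with open(manifest, "w") as out:
            for dirpath, _, filenames in os.walk(root):
                for name in sorted(filenames):
                    path = os.path.join(dirpath, name)
                    out.write("%s  %s\n" % (md5sum(path), path))

    def verify_manifest(manifest="MD5SUMS"):
        # Re-read every file named in the manifest and compare
        # digests against what was recorded.
        ok = True
        with open(manifest) as f:
            for line in f:
                digest, path = line.rstrip("\n").split("  ", 1)
                if md5sum(path) != digest:
                    print("MISMATCH: %s" % path)
                    ok = False
        return ok

The idea: run write_manifest over the staging directory, burn the
MD5SUMS file along with the data, then mount the finished disc and
run verify_manifest against the mounted copy.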
One of the things to remember about hard drives is that though they
have automatic bad sector reallocation, it is only triggered on a
write. So, if you have data sitting on a sector that subsequently
goes bad, there's nothing the drive can do about it. That's where
redundant schemes like RAID come in: the array can try to recover
the data automatically from the other drives, or from parity
information. Alas, this is also where you can discover that other
drives in the RAID, say the same model from the same manufacturer or
batch, have failed in similar fashion, and the multiple failures
cause the RAID itself to fail.
Because of this cluster failure phenomenon, enterprise-level RAID
controllers will usually have an option to periodically "police" the
entire surface of all attached drives, making sure the data are
still readable.
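You can approximate that patrol in userland. Here's a rough Python
sketch (my own illustration, not any controller's actual mechanism)
that just reads every file end-to-end and reports the ones the drive
can no longer return:

    import os

    def patrol_read(root):
        # Read every file under root in full. This forces the drive
        # to fetch each sector holding live file data, so latent
        # read errors surface now rather than during a RAID rebuild.
        bad = []
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "rb") as f:
                        while f.read(1 << 20):
                            pass
                except (IOError, OSError) as e:
                    print("READ ERROR: %s (%s)" % (path, e))
                    bad.append(path)
        return bad

The obvious limitation: it only touches sectors holding file data,
not free space or filesystem metadata, which is why the real thing
lives in the controller (or in the filesystem, as below).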
Arjan, you might want to look into using the ZFS filesystem for your
fileservers. One of its main design features is that it tries not to
trust data coming from the various subsystems unless it can verify
them, so it employs checksumming and redundancy at several levels.
It tries to be proactive about data integrity, too: it has a "scrub"
function that tries to discover bad sectors and, in a redundant
configuration, automatically resilvers when bad data are discovered.
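For example (assuming a pool named "tank"; substitute your own pool
name), you'd kick off and monitor a scrub with:

    zpool scrub tank
    zpool status tank

"zpool status" reports the scrub's progress and any checksum errors
found and repaired along the way.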
ZFS is supported by Solaris and FreeBSD 7+, to give two examples.
(Mac OS X 10.5 Leopard features read-only ZFS support and a read-write
ZFS kernel module via the Developer Zone. Hopefully, Time Machine in
10.6 will use ZFS as its underlying file system...)
Cheers,
Paul.
e-mail: paul at gromit.dlib.vt.edu
"Without music to decorate it, time is just a bunch of boring production
deadlines or dates by which bills must be paid."
--- Frank Vincent Zappa