Back me up, store me away, and do so redundantly

I, like many others, have had my fair share of hard drive crashes; and like many others, I have my tastes when it comes to brands. My favorite brand is Seagate; my least favorite is Maxtor. This poses a big problem now that the two have merged, so I usually lean towards Western Digital these days. The point is that you can love a brand as much as you want, but hard drives can and will fail, and they will do so at the least appropriate moment.

The best case scenario is that you have a very recent backup. The worst case scenario is that you don’t have any backup, and you lose data that is valuable from either an emotional or a professional point of view. Often, from both. This usually leads to nervous breakdowns, extensive cursing, going through a list of past, present and future deities to blame, and possibly weeping. I’ve done all of that, and I’m not ashamed to admit it.

I have since taken on a “handmade” backup strategy. Time Machine takes care of the main system (minus a few folders), and I make extra copies of specific material (my photo collection, for instance) on different hard drives. It kind of works, and it’s better than backing things up on DVD, but it still feels flaky.
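As a rough illustration of what those extra copies amount to, here is a minimal Python sketch that copies a folder onto several drives and checks every copy against a SHA-256 checksum of the original. The function names and any paths passed to it are hypothetical; this is not the actual procedure described above, just one way it could be scripted.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy every file under `source` to each destination root.

    Returns the list of copies whose checksum differs from the original;
    an empty list means every copy was verified.
    """
    mismatches = []
    for file in sorted(p for p in source.rglob("*") if p.is_file()):
        original = sha256_of(file)
        for dest_root in destinations:
            target = dest_root / file.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(file, target)  # copy2 also preserves timestamps
            if sha256_of(target) != original:
                mismatches.append(target)
    return mismatches
```

Calling `copy_and_verify(Path("Photos"), [Path("/Volumes/DiskA/Photos"), Path("/Volumes/DiskB/Photos")])` (made-up paths) would return an empty list if both copies match. Reading every file back is slow, but it catches a copy that went wrong silently.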

Optical media is the worst. It is cheap, but the limited size of each disc (4.4 GB) calls for voodoo rituals when trying to back up something bigger than that, not to mention having to go through the same thing in reverse when the time comes to pull it out again. Moreover, when the first CD-Rs came out, the manufacturers said that they would last decades. It never happened. Of course, quality differs, but I have had allegedly good discs, namely Verbatim and Sony, die on me after less than three years. When DVD-Rs came out, manufacturers said that these would last centuries. Yet they barely last a decade, unless you keep them in time capsules. The issue is that, unlike pressed discs, user-recordable optical media stores its data in a layer of organic dye. As such, it is easily attacked by molds and fungi. I have witnessed with my own eyes the decay of a DVD-R, starting from the outside and slowly, quite literally, eating it up towards the center. The solution would be to re-burn everything every 3 or 4 years, but this adds to the expense and is just extremely inconvenient, not to mention that it takes up a lot of space, in the most physical sense of the term.
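The voodoo ritual of spreading a large backup over several discs is essentially a bin-packing problem. Below is a minimal sketch in Python using the first-fit decreasing heuristic; the function name is made up for illustration, and the capacity constant assumes that the 4.4 GB figure is meant in binary gigabytes. It only decides which whole file goes on which disc, and it does not split single files that are larger than one disc.

```python
# Assumption: 4.4 GB per disc, counted in binary gigabytes.
DISC_CAPACITY = int(4.4 * 1024**3)


def pack_onto_discs(file_sizes: dict[str, int],
                    capacity: int = DISC_CAPACITY) -> list[list[str]]:
    """Assign files to discs using first-fit decreasing.

    `file_sizes` maps a file name to its size in bytes. The result holds
    one list of file names per disc.
    """
    discs: list[list[str]] = []
    free: list[int] = []  # remaining bytes on each disc
    by_size = sorted(file_sizes.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        if size > capacity:
            raise ValueError(f"{name} does not fit on a single disc")
        for index, remaining in enumerate(free):
            if size <= remaining:
                discs[index].append(name)
                free[index] -= size
                break
        else:
            # No existing disc has room, so start a new one.
            discs.append([name])
            free.append(capacity - size)
    return discs
```

First-fit decreasing is not guaranteed to use the fewest discs possible, but it is simple and usually gets close. Keeping the returned list around also helps with the reverse ritual, since it records which disc each file ended up on.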

Hard drives are a better solution: a much higher density (which cynics would define as the ability to lose more data at once), and generally, with today’s technology, a much higher reliability. Yet I have had drives die on me just because the power went out at the wrong time, or simply out of the blue. The click of death is a nightmare to me, and while cryogenic therapy can help sometimes, it’s not guaranteed. It also seems, from my empirical experience, that hard drives paradoxically last longer if they are used on a daily basis. Keep a disk off for a few years, and it may just never work again.

While having an array of hard disks works, it’s still not the best way to handle backups. However, a distinction should be made between backups and storage. The two concepts often overlap, but they are fundamentally different. A backup is a safety copy, something that you need to be able to recover should the main copy become inaccessible. Storage is for material that you put aside and that you may never need again. In other words, the main copy of a backup set is always available, but there is effectively no main copy of things stored away.

USB hard drives are a good solution for both, but they have one drawback: as you need more space, you start collecting power bricks and using up USB ports, leading to the purchase of USB hubs chained to one another in a cascade. All of this adds extra risks: what if one hub dies and takes anything connected to it, both directly and indirectly, with it? USB enclosures are a better way to handle this, since you only have one or two of them and swap the disks inside. This procedure usually takes some time and involves dealing with small screws.

USB docks come to the rescue: they serve a purpose very similar to USB enclosures, but they stand upright and take bare disks from the top, much like a toaster. It’s a breeze to switch disks like that.

A common solution for storage, especially if more than one machine is used in a given household or office, is NAS, or Network Attached Storage. At its minimum, it’s a very basic computer with one hard drive and an Ethernet port, providing access to the former through common protocols such as SMB/CIFS, AFP or NFS. The Linksys NSLU2, for example, is a very small device with a slow CPU (ARM5 at 266 MHz) and little memory (32 MB), and takes up to two USB hard drives. A whole set of unofficial firmwares adds extra capabilities, but with so little power and with the forced use of USB, it’s still quite limited.

More current self-enclosed NAS boxes, such as the Netgear ReadyNAS family, have two or more slots. This is when things become interesting, because RAID enters the picture. I’m not going to discuss RAID here, so please head over to Wikipedia to learn more if you’re not familiar with the matter.

Two-slot devices usually support three modes: RAID-0, RAID-1 and JBOD. Since we’re talking reliability, RAID-1 is nice. With two 1-terabyte disks, you get 1 TB of space (50% waste), and either drive can fail while the other retains the data. Not bad. In any case, two-slot NAS boxes can be found for as cheap as €100, with better versions starting at €150 or so. Note that I’m using the prices for Italy. The disks are not included, so with a €150 box and two 1-TB drives (each priced €60), the total price is €270. With only one terabyte of usable space, this means €0.26/GB, with no real ability to expand beyond replacing both disks with bigger units and keeping the 50% waste.
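For anyone who wants to plug in their own local prices, the arithmetic above fits in a few lines of Python. This is only a sketch: the function name is made up, and counting 1 TB as 1024 GB is an assumption on my part, chosen because it reproduces the €0.26 figure (with 1000 GB per terabyte, it would round to €0.27).

```python
def cost_per_gb(enclosure_eur: float, disk_eur: float,
                disks: int, usable_tb: float) -> float:
    """Total price divided by usable gigabytes (assuming 1 TB = 1024 GB)."""
    total = enclosure_eur + disk_eur * disks
    return total / (usable_tb * 1024)


# A 150 EUR two-slot box with two 60 EUR disks in RAID-1: 1 TB usable.
two_slot_raid1 = cost_per_gb(enclosure_eur=150, disk_eur=60,
                             disks=2, usable_tb=1)
print(f"two-slot RAID-1: {two_slot_raid1:.2f} EUR/GB")
# prints: two-slot RAID-1: 0.26 EUR/GB
```

The same function works for any other box: only the enclosure price, the number of disks and the usable capacity change.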

Four-slot units belong to another level, and are mostly targeted at SOHO users. They are priced accordingly (hardly anything below €320, diskless) but support RAID-5. This is where things get hot. Four 1-terabyte disks yield about 3 TB of usable space (25% waste), and any one disk can die at a given time without taking the data with it. This is nice. The total price of a fully loaded NAS would be at least €560, with a cost per gigabyte of €0.18.
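The waste percentages follow directly from how each mode uses its disks. Here is a small Python sketch, assuming equally sized disks; the function names and mode labels are my own, and it covers only the modes mentioned in this post. The last line recomputes the four-slot cost per gigabyte, counting 1 TB as 1024 GB.

```python
def usable_disks(mode: str, disks: int) -> int:
    """How many disks' worth of space is usable, for equally sized disks."""
    if mode in ("raid0", "jbod"):
        return disks        # all the space is usable, nothing is redundant
    if mode == "raid1":
        return 1            # every disk holds a copy of the same data
    if mode == "raid5":
        if disks < 3:
            raise ValueError("RAID-5 needs at least three disks")
        return disks - 1    # one disk's worth of space goes to parity
    raise ValueError(f"unknown mode: {mode}")


def waste_percent(mode: str, disks: int) -> float:
    """Share of the raw capacity that is not available for data."""
    return 100 * (1 - usable_disks(mode, disks) / disks)


two_slot_waste = waste_percent("raid1", 2)    # 50.0
four_slot_waste = waste_percent("raid5", 4)   # 25.0

# 320 EUR enclosure plus four 60 EUR disks of 1 TB each, in RAID-5.
four_slot_cost = (320 + 4 * 60) / (usable_disks("raid5", 4) * 1024)
```

With these numbers, `four_slot_cost` comes out at roughly 0.18 EUR/GB, matching the figure above. The relative waste of RAID-5 shrinks as disks are added, which is why the bigger units look better per gigabyte despite the higher entry price.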

An alternative is to use an actual computer to do all of that. There are operating systems specifically developed for that, such as FreeNAS (based upon FreeBSD), which is so light on resources that you can install it on a 32 MB (yes, thirty-two megabytes) CompactFlash card, and run it off there. Or you can boot it off a USB stick, or even the CD itself (saving the configuration on a USB stick, which is useful if the machine is old and doesn’t boot off USB at all.) It also supports ZFS, which is extremely neat.

Now, I currently have an old machine that I frankensteinized from different sources. It’s running FreeBSD 8.1 at the moment, and it mostly serves as a download central. The specs are low, really low: AMD Duron 750 MHz, 512 MB PC100 RAM (some of which is defective according to Memtest86+, but I haven’t had any problems in actual usage so far), 120 GB IDE hard drive. It is not really suited for number crunching, and it’s often faster to download things off Usenet than it is to repair and unpack them. This makes me a bit wary of using RAID-5, and ZFS-based RAID-Z is definitely out of the picture (the recommended minimum is a 64-bit CPU and 2 GB of RAM.) The good news is that I installed a two-port PCI SATA controller I had lying around and FreeBSD recognized it immediately, so I could easily hook up a couple of SATA drives to it and use RAID-1, which I suppose is better than nothing. I could do that with the current FreeBSD setup, or I could get a €5 CompactFlash-to-IDE adapter and finally put those extra-small cards I’ve had for years to good use with FreeNAS. I would effectively just need to get two hard drives, and that would let me sail by for a while.

The best thing would of course be building a dedicated new machine, powerful enough to handle RAID-Z (either with FreeNAS or with FreeBSD.) I toyed around with the idea last night while browsing a website that I have some discounts at. A decent machine, coupled with a couple of 1-TB disks, would set me back about €360. That’s roughly the price of a diskless four-slot NAS box, but with two disks included and the ability to grow over time. However, I have concerns about energy consumption (embedded devices are generally less demanding than general-purpose machines) and, in all honesty, having such a thing to run FreeNAS, which is somewhat “castrated,” feels a little bit overkill. Of course, while RAID-5 seems to be a bit tricky on FreeBSD as it requires non-official kernel patches, RAID-Z is supported out of the box and should do fine.

All in all, the cheapest intermediate solution would probably be purchasing two disks and the CF-IDE adapter, and mirroring the disks using FreeNAS. Another good thing would be finding some PC100/PC133 memory of a decent size, say a couple of 512 MB sticks. Then, as needs grow and as money allows, I may switch to a brand-new dedicated file server, with two more disks and RAID-Z.

All of this, of course, whilst keeping in mind that RAID is not a backup solution in itself, and only offers protection against drive failure. User-driven deletions are, well, as catastrophic as they’ve ever been.