Leave well alone


ZFS not being a viable option on Fedora, I wanted to create a RAID5 array using mdadm, formatted with XFS.

There are lots of suggestions around the web about how best to do the XFS formatting to achieve optimal performance.

Essentially, the optimization process revolves around setting the stripe unit and stripe width. What the “correct” values are can vary according to who you read (and, more pertinently, what you expect to store on your array: lots of little files, or a preponderance of very big files like virtual machine images, movies and so on).

On the other hand, there’s a school of thought that says XFS is reasonably smart and if you just let it default to automatically-determined values, things will probably be ok.

So I thought I’d check it out.

First, I created my array and let mdadm complete the initial ‘recovery’, so no background I/O was taking place. Then I formatted it as follows:

mkfs.xfs -f -b size=4096 -d sunit=512,swidth=1024 -L bulkdata /dev/md0
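For anyone wondering where numbers like sunit=512 and swidth=1024 come from: mkfs.xfs counts both in 512-byte sectors, so they follow from the array's chunk size and the number of data-bearing disks. Here is a minimal sketch of that sum, assuming a 256KiB chunk and a three-disk RAID5 (neither figure is stated above; they are simply values that reproduce the command). The function name is made up for illustration.

```shell
# Sketch only: derive mkfs.xfs stripe settings from an md array's geometry.
# ASSUMPTIONS: 256 KiB chunk and 3 disks in RAID5 (so 2 data disks per stripe).
xfs_stripe_geometry() {
    chunk_kib=$1     # md chunk size in KiB
    data_disks=$2    # disks holding data in each stripe (RAID5: total disks minus 1)
    sunit=$(( chunk_kib * 1024 / 512 ))   # stripe unit, in 512-byte sectors
    swidth=$(( sunit * data_disks ))      # stripe width, also in 512-byte sectors
    echo "sunit=${sunit},swidth=${swidth}"
}

xfs_stripe_geometry 256 2   # prints sunit=512,swidth=1024
```

The real chunk size is reported by mdadm --detail /dev/md0. If it differs from the assumed 256KiB, hand-picked values like these would not match the array.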

And then I benchmarked the array using Gnome’s built-in disks utility:

So not too shabby: 145MB/sec read, 69MB/sec write and an access time of 15.5msec.

Then I reformatted the array like so:

mkfs.xfs -L bulkdata /dev/md0

Re-measuring the performance in the same way as before, these were the results:

This time, reads of 165MB/s, writes of 74MB/s and access times of 15.5msec. So, significantly faster reads, moderately faster writes and access times about the same: I’d say letting XFS work it out for itself is probably as good a strategy as any, for my hardware at any rate.
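To put a number on “significantly” and “moderately”, the two sets of results can be compared as percentage changes. A small sketch (the helper name is made up; the figures are the ones quoted above):

```shell
# Percentage change from an old throughput figure to a new one.
pct_change() {
    LC_ALL=C awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

pct_change 145 165   # reads:  prints 13.8
pct_change 69 74     # writes: prints 7.2
```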

I repeated the tests multiple times, and I also introduced different formatting options with different values for the stripe unit and width and blocksize: I could certainly get worse results, but I was never able to improve on the ‘just let it work it out’ results.

Your mileage may vary, of course. But me: I’m leaving my XFS array well alone and letting it sort itself out!

Storage Solutions

Long-time readers may recall that I bought at least one of these:

It’s a Drobo and it’s been a disaster. First, because it’s temperamental: move it, or power it off then on, or basically breathe anywhere in its general direction, and it will dramatically start flashing red lights, indicating total failure (and lost data).

When you have suppressed the sick feeling in your stomach at that thought, you can power it off and on a few more times and, probably, it will decide to reboot in ‘green light’ mode. At which point your data is safe for as long as you don’t breathe again.

Secondly, it’s incredibly slow to re-protect your data. One of the advertised joys of the Drobo was that you could eject one of the four existing hard disks, pop in a new one of greater size and then just sit back and wait for the thing to re-distribute your data over the whole array so that it was all protected once more. Which is, indeed, what happens… except that you wait a long, long, very long time. And throughout the duration of that wait, your data isn’t protected against another hardware failure at all. When I last tried to do this with four 2TB drives, the box spent 6.5 days thinking about it. You run with unprotected data for a week and see how you get on sleeping!

And third: the Drobo was and remains relatively noisy. Easily audible above the noise of a loud action movie it’s providing the data for, that’s for sure.

We had a council clean-up this week, so my Drobos finally met their destiny as land-fill, as should have been the case many years ago. I hated them, and I’d never buy another. Good riddance to bad rubbish, I say!

Let me instead introduce you to this beauty:

It’s an HP Proliant N40L Micro Tower Server. It shipped with a single 250GB hard disk, but with space for three others of any capacity. It also shipped with 2GB of ECC RAM and an AMD Turion II running at 1.5GHz.

For that lot, I paid the princely sum of AU$264, from these guys, who were the cheapest I could find. I notice that today, they’re quoting an extra $25 for the same config, so obviously I got lucky!

I added 8GB of ECC RAM -and fitting it was not fun. It requires pulling the entire motherboard out, and that can only be done by unplugging all power, SATA and other cables. Those cables fit tightly in the confined space available -and the SATA connector, in particular, seems to have been welded into place by cruel people with exceptionally fine welding skills. My previous experience of detaching soldered SATA connectors from motherboards came back to haunt me… but we got there in the end. I’ve also added a USB 3.0 expansion card to fill the available PCIe slot, but I didn’t bother adding a graphics card, so it’s strictly on-board graphics for me.

All up, including postage, I think one of these things cost me AU$370. So naturally, I got two. And that still means I’ve bought two brand new, quality servers for hardly much more than I paid for one of my original Drobos!

Of course, you only get 250GB of storage for that price: you have to fill the other three slots (non-hot-swappable) yourself. Fortunately, I had quite a few 2TB drives knocking about the place, thanks in part to me having recently destroyed Drobos en masse. So, pop those in (very easily!), and I now have 6.2TB in each server.

Time then to install a server OS… and it could have been Scientific Linux, of course. But since I have my Technet subscription, and I wouldn’t mind learning more about Windows 2008 R2 administration, on that goes instead. Turn the three 2TB drives in each server into a single 4TB RAID-5 array… and I now have 8TB of protected storage across the two boxes, humming very quietly in the background. I think each server consumes about 50W, which seems economical enough. Plus, the big bonus: I can barely hear them, even though they sit just behind my triple-monitor setup, precisely at ear-height.

A 1.5GHz CPU doesn’t sound like much: I’ve seen people slagging the Turion IIs off as though they were barely in the Intel Atom class. But Windows 2008 is certainly responsive when I remote desktop to it, and both boxes are running virtual machines (one each) without seeming to struggle. I’ve no complaints about them, anyway: fast enough for this old-timer.

Essentially, all I’ve ever wanted is near-silent, RAID-protected storage for my music and other multimedia collections. The Drobos failed to provide that on so many levels, it wasn’t funny. These new HP boxes, in contrast, do the job just fine… and give me a capable, stable platform on which to run two large-ish (6GB RAM and 200GB HDD) virtual machines permanently and at the same time, which is a nice bonus. At AU$264, I’d recommend them to anyone, though at AU$289, I find even my enthusiasm waning a bit.

But I still like them a lot, and they have cheered me up no end, allowing me to dump the damn Drobos. That’s worth almost any price, come to think of it.

Big RAID on Linux

My New Year resolutions finally started to come good recently, when I was allowed to buy 3 x 2TB drives for my desktop. I expected that fitting them into the PC case itself would be a pain (I never seem to have enough SATA power connectors, though SATA data cables are practically coming out of my ears! How can there possibly be such a mis-match??!), but I hadn’t expected it to be quite such a trial turning them into a RAID5 system that my Linux distro could recognise and use. But it was… or, at least, it felt like it at the time!

First, it’s “fake raid”. I’m too cheap to buy a real hardware RAID card -and besides, I’d never get it past the Household Budget Watch Committee of One. So, it’s there in the motherboard’s BIOS… a quick F1 on boot-up, a button-press here, and F10 there… bingo, I have 4TB of usable storage and I can afford a hard disk failure. Nice.

Now, boot up Linux and check with the Gparted tool: bummer. There are /dev/sdb, /dev/sdc and /dev/sdd, each identified separately as a 2TB drive (well, OK… 1.82TiB, but that’s inflation for you). But they’re all separate from each other, and there’s no apparent understanding that, actually, all three are doing teamwork now. Luckily for me, the problem is in Gparted, not my hard disks (or even my fake raid setup): it simply doesn’t “do” fake raid.
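The 1.82TiB figure is less inflation than units: drive makers count in powers of ten, while tools like Gparted report in powers of two. Here is a small sketch of that conversion, together with the usual RAID5 rule of thumb that one disk’s worth of space goes to parity. The function names are purely illustrative.

```shell
# Convert decimal terabytes (10^12 bytes) to binary tebibytes (2^40 bytes).
tb_to_tib() {
    LC_ALL=C awk -v tb="$1" 'BEGIN { printf "%.2f\n", tb * 1e12 / 2^40 }'
}

# Usable RAID5 capacity: (number of disks - 1) * size of each disk.
raid5_usable_tb() {
    echo $(( ($1 - 1) * $2 ))
}

tb_to_tib 2                           # one 2TB drive: prints 1.82
raid5_usable_tb 3 2                   # three 2TB drives: prints 4
tb_to_tib "$(raid5_usable_tb 3 2)"    # the 4TB array in TiB: prints 3.64
```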

The good news, however, is that the tools which *can* do fake raid are available -and are, in fact, probably already installed on your distro. The key one is dmraid. If you just type sudo dmraid (or become root and type dmraid), you’ll know soon enough if it’s installed: you should get an error message complaining that no arguments or options have been given. If instead you are told “command not found”, then it’s not installed and you’ll have to install it using your distro’s package manager and (if I were doing it) reboot afterwards. The other tool you’ll need to set things up properly is parted. That’s not Gparted, the graphical tool which doesn’t understand fake raid, but its command line cousin which does.

So: assuming that dmraid’s been installed and is running (it runs by default in all later editions of Ubuntu and its derivatives, for example), you’ll first need to know under what name your fake raid device has been detected. If you do ls /dev/mapper/*, you should see a weird device name listed there. Mine happened to be isw_ifdbedffj_safedata, and I recognised this to be my fake raid because the name “safedata” was one I’d assigned in the BIOS setup screen when creating the array in the first place.

Now that you know the device name, you can partition it. In the old days of peanut-sized hard disks, you’d have done something like fdisk /dev/sda to begin the process of partitioning the sda hard disk. Try that now, however, and you’ll be in trouble because (a) fdisk doesn’t like working with large hard drives and (b) /dev/sda isn’t the right device name! Instead, you work with the parted tool to set up partitions on (in my case) /dev/mapper/isw_ifdbedffj_safedata. It trips less easily off the fingers and keyboard, that’s for sure! But at least it will work. Here’s what I did:

sudo parted /dev/mapper/isw_ifdbedffj_safedata
(parted) mklabel gpt
Warning: The existing disk label on /dev/mapper/isw_ifdbedffj_safedata will be destroyed
and all data on this disk will be lost. Do you want to continue?
Yes/No? yes
(parted) mkpart primary ext4 4 -1
(parted) align-check optimal 1
1 aligned
(parted) name 1 safedata
(parted) quit

The mklabel gpt command there causes this large volume to be created with a GUID Partition Table (as opposed to the more usual Master Boot Record, which can’t cope with volume sizes much larger than 2TB). This is something we’re probably going to have to get used to now that 3TB disks are available for quite reasonable sums!
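The MBR ceiling falls out of simple arithmetic: a Master Boot Record stores sector addresses as 32-bit numbers, so the most it can describe is 2^32 sectors. A quick sketch, assuming the common 512-byte sector size (the function name is illustrative):

```shell
# Largest size an MBR partition table can address for a given sector size.
mbr_max_bytes() {
    sector_bytes=$1
    echo $(( 4294967296 * sector_bytes ))   # 4294967296 = 2^32 addressable sectors
}

mbr_max_bytes 512                                    # prints 2199023255552
echo $(( $(mbr_max_bytes 512) / 1099511627776 ))     # the same in TiB: prints 2
```

That is 2TiB, or roughly 2.2TB in drive-maker units, which is why a 4TB array needs GPT.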

The other interesting command in that lot of gibberish was this one: mkpart primary ext4 4 -1. From the name, you can probably guess this is the command that is actually making or creating the partition. I wanted a single volume of 4TB in size, so I’m creating a single primary partition which will eventually use ext4 as its file system. The tricky bit is those last 2 numbers. They tell parted where the new partition should start and stop, expressed as offsets from the disk’s “inner track”, with ‘-1’ having the special meaning of ‘keep going until you run out of disk platter!’. My code, for example, says “start at the 4MB mark and continue until the end of the disk”. Which probably prompts the next obvious question: why start at 4MB? Why not at 0?

Well, here’s the message I got when I did start at 0:

(parted) mkpart primary ext4 0 -1                                       
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel?

I’m afraid we’re talking about that hoary old chestnut, partition boundary alignment. Your raid array has a stripe size; the volume is made up of clusters; if the partition boundaries aren’t aligned right, then the one can cross over the other and have the effect of causing what ought to have been one I/O operation to become two. Windows suffers from the same thing, incidentally, and there’s even an article available on the issue (that probably explains it better than I just did!). Long story cut short, therefore: by luck (certainly not by skill and profound insight), I found that sacrificing the first 4MB of my hard disk allowed my partition boundaries to align correctly (and thus give me a substantial performance boost for nearly nothing). The number you’d have to sacrifice to achieve the same thing will depend entirely on your stripe size, cluster size and (probably) the wind direction that day… so experiment. The align-check command you see me do simply gets parted to confirm that the newly-created partition really is properly aligned.
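A quick way to see what “aligned” means here: the partition’s starting byte offset should be a whole multiple of the stripe size. The sketch below checks two starting points. It assumes a 128KiB stripe (a made-up figure, since the real one isn’t given above) and reads the 4 in the mkpart command as 4MiB. The second start, sector 34, is the first sector after a standard GPT header and partition table on a disk with 512-byte sectors.

```shell
# Succeeds if a starting offset (in bytes) is a whole multiple of the stripe size.
is_aligned() {
    start_bytes=$1
    stripe_bytes=$2
    [ $(( start_bytes % stripe_bytes )) -eq 0 ]
}

stripe=$(( 128 * 1024 ))    # ASSUMED stripe size: 128 KiB

for start in $(( 4 * 1024 * 1024 )) $(( 34 * 512 )); do
    if is_aligned "$start" "$stripe"; then
        echo "$start: aligned"
    else
        echo "$start: not aligned"
    fi
done
# prints:
#   4194304: aligned
#   17408: not aligned
```

This only illustrates the arithmetic; parted’s own align-check remains the thing to trust on a real device.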

Once parted has done its work, it’s relatively easy to format the new partition with a new file system. I say “relatively” there, only because the formatting options for the ext4 file system are a pain in the neck! Here’s the command I issued:

sudo mkfs -t ext4 -m 0 -O extents,uninit_bg,dir_index,filetype,has_journal,sparse_super -L safedata /dev/mapper/isw_ifdbedffj_safedata

Nice! The main points of interest here are that the command mkfs is being applied to the correct device (i.e., /dev/mapper/isw_ifdbedffj_safedata); I’m giving the resulting file system a label (that’s the -L bit) of “safedata”, too; and I’m making sure the file system uses extents and a journal (extents makes it fast, a journal makes it safe). What the other options are doing… well, that’s what documentation is for!

Incidentally, when I first issued that command, I was told “/dev/mapper/isw_ifdbedffj_safedata is apparently in use by the system; will not make a filesystem here!” Quite how a disk volume with no file system could actually be in use by the system, I haven’t the faintest idea… but a reboot cured the problem and allowed me to format the thing without a problem. (I realise this is very much the Windows User approach to Linux difficulties, but there are times when switching the thing off and on again actually works!)

Finally, it’s time to mount the new file system -for which, of course, you need a mountpoint. I also like to ensure I assign ownership and permissions on the drive once it’s been mounted:

sudo mkdir /data
sudo mount /dev/mapper/isw_ifdbedffj_safedata /data
sudo chown -R hjr:users /data
sudo chmod -R 775 /data

And if that all works, you polish things off by editing /etc/fstab so that the new volume is re-mounted automatically every time the PC restarts. Fstab edits can get clever, sexy (sort-of) and convoluted… but I kept mine very short and to-the-point:

/dev/mapper/isw_ifdbedffj_safedata /data    ext4    defaults 0 0
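Short as it is, that line has six whitespace-separated fields, and the two trailing zeros are easy to gloss over. A small sketch that labels each field (the function name is made up for illustration):

```shell
# Print the six fields of an fstab entry with a label for each.
describe_fstab_line() {
    printf '%s\n' "$1" | awk '{
        print "device:     " $1
        print "mountpoint: " $2
        print "fstype:     " $3
        print "options:    " $4
        print "dump:       " $5
        print "fsck pass:  " $6
    }'
}

describe_fstab_line "/dev/mapper/isw_ifdbedffj_safedata /data    ext4    defaults 0 0"
```

A dump value of 0 means the volume is skipped by the dump backup tool, and an fsck pass of 0 means it is not checked at boot.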

Another reboot to check the thing actually does what it says on the tin, and we’re (finally!) sorted.

Quickie NAS

I never actually used Windows Home Server (WHS), but I thought about doing so often enough. Its killer feature (for me)? The ability to plug in different disks, of different sizes, from different vendors (and even using different interfaces -I have a lot of old PATA drives kicking around!), and have them appear to the rest of the world as one large storage ‘pool’, with in-built redundancy. This was called ‘Drive Extender’… and has just been removed as a feature from the new Version 2 Home Server product. It seems a bit of a weird decision on Microsoft’s part, removing one of the two key product differentiators that made Home Server special. It wouldn’t surprise me to see the entire thing killed off, to be honest.

Anyway, the reason I no longer care too much about WHS is that I have my own way of doing networkable, extensible bulk storage: a Drobo with a WDTV Live media player.

My particular “first generation” Drobo only takes 4 hard disks -newer and more expensive ones can take up to 8. But using 2TB drives, that still means 6TB of usable storage (4 x 2TB, minus 2TB used for data protection), which is enough to be going on with. If 3TB and 4TB drives ever make an appearance, I’ll probably be a firmware update away from being able to increase my protected, usable storage space accordingly. Regardless, you can stick any combination of SATA drives into the Drobo you happen to have handy, and swap out smaller drives for bigger ones as your storage needs grow (and as your wallet finds it can cope). There’s no networking capability (you’ll need to pay stupid money for the Drobo Share, or the Drobo FS to get that), but you do get extensible, protected, set-and-forget storage that more-or-less just works (see below for the ‘more-or-less’ bit!).

The WDTV Live is a good media player. I had a plain old WDTV before the ‘Live’ version came out, and the upgrade gets you a networked media player. Set it up with an IP address, plug in an ethernet cable and it immediately makes itself visible as a Windows (well, Samba, anyway!) Share on the local network. Other than that capability and a slightly slicker front-end, there’s not a lot of difference: the thing is still capable of playing just about any media format you throw at it, has no problem with High Definition content, has a lovely “ten foot interface” that anyone can drive within seconds… and just works, beautifully.

Stick these two products together, then, and what do you have? Basically, the Drobo just plugs into the WDTV Live via USB, is then seen as a single giant volume full of multimedia files… and the contents of that volume are then shared around the rest of the network, thanks to the Samba-sharing nature of the WDTV. When I rip a new CD on my PC in the study, therefore, it’s trivial to copy the output to the Drobo sitting under the TV on the other side of the house, despite the Drobo not having ‘intrinsic’ networking of its own. So what you actually end up with is a NAS that does excellent duty as a media server and player. Someone should design a product that includes both bits of functionality in the one box!

The networked Drobo FS costs about AU$850. The standalone Drobo Share costs AU$300. Neither would be able to play a bean on my TV! My non-networked Drobo cost AU$599, and the WDTV Live cost a further AU$189… so I end up with viewable, networked, extensible, protected storage for AU$788 instead of non-viewable, etc, etc for AU$850 – AU$900. (Hard disks cost extra, of course).

I’d thoroughly recommend the WDTV Live… it’s really plain sailing to use, and you couldn’t get a more capable, simpler media player. We ditched the original WDTV player a year or so ago for the joys of Windows Media Center running on a spare PC… but the usual Windows problems meant that experiment turned into something of a disaster (crashes, driver problems, forever updating etc etc). We were so pleased to be able to junk the complexities of Windows for the highly-functional simplicities of the WDTV once more!

I wish I could recommend Drobo quite so unreservedly. If you’d asked me three months ago, I would have done. But since then, I made the mistake of purchasing a new one for an elderly friend. I mention the ‘elderly’ bit because his requirements, above all, are for something that simply works, without fuss, bother or the need for constant fiddling and tweaking. He is a very non-technical person, and his movie collection needs to be safe without having to think about it. A pity, then, that the Drobo unit I purchased for him turned out to be defective: it didn’t work at first, it then worked long enough to copy a couple of terabytes onto, and then it decided not to work again once it had been plugged into a different PC. It would hang during its boot sequence; it would declare it couldn’t find some disks, then decide it could see those after all but now couldn’t see the ones it had no problems seeing before; it would not be detected by Windows 7 at all, and then it would be detected without a problem, until you rebooted the PC -after which it would revert to being undetectable. It was bonkers, frankly. Precisely what you don’t want when you buy ‘safe, protected, reliable storage’!

Naturally, you get the odd lemon turning up whenever you take the hardware-purchasing plunge, but I can tell you: getting one lemon makes you have second thoughts about the earlier purchase that has never put a foot wrong! It just undermines your confidence in the product as a whole, in short. And it doesn’t help that their “support desk” has the same senseless, robotic and dumbed-down attitude that all support desks seem to go for these days. All I wanted was a returns authorisation number. Instead, I get asked to produce a diagnostic log. Fair enough: I try to do that, and I can’t because the unit is completely unresponsive. I tell them this. They reply not with a ‘Jeez, it’s screwed then!’ but with a ‘well, can you try using the Firewire port instead of the USB one’! I don’t even have a Firewire port, I point out. Well, I’ll need to escalate this to the next level of support, I am told. No you won’t, I say… either authorise the return right now, or I start consulting lawyers. At which point, the return was authorised without further comment!

I don’t like the fact that I had to wrestle with them like that. The second I couldn’t produce a diagnostic log because the unit had hung, they should have authorised the return. The suggestion to plug it into a Firewire port reflects the fact, I think, that Drobos are very popular in Apple Mac circles… and I imagine the dumbed-down, treat-you-like-a-moron, have-you-tried-turning-it-back-off-and-on style of support is designed to cater to that particular type of audience. It didn’t do anything to endear me to them or their product, though! (Can you tell??!)

Anyway, I’d like to say that the guy who actually sold me the thing couldn’t have been nicer or more solicitous: he’s gone out of his way (literally: he turned up at the office today to pick the defective unit up personally) to see me right with a new Drobo that works properly. Time will tell on that score, I guess, but I can’t really fault his efforts thus far. Meanwhile, my own, original Drobo sits there quietly under the telly doing sterling service without the slightest issue. So yes, on balance, I would still recommend it. Just make sure you get an excellent vendor -and don’t waste too much time with their useless technical “support”. My vendor, by the way, who comes highly recommended, is Ross at Ineedstorage.