Uptime

I feel moderately pleased with myself. The picture on the left is from a work system that is pretty heavily used: it has a database running on it with one table containing around 16 million rows, those rows holding around 12GB of free-form text (indexed with Oracle Text). That one table gets queried about five times a second, with response times of under 0.5 seconds.

It probably counts as a bit of a tiddler in the great scheme of things, but when I took it on, queries against a quarter of the records were averaging 26-second response times. I feel as if I've earned my money!

(Apropos of nothing in particular, their uptime was originally less than 90 days… they were running Windows 2003. I'm just sayin'…)

It would have been longer, too, except that the SysAdmin shorted the rack one day about thirteen months ago (by plugging in a USB device somewhere. Go figure!)

Big RAID on Linux

My New Year resolutions finally started to come good recently, when I was allowed to buy 3 x 2TB drives for my desktop. I expected that fitting them into the PC case itself would be a pain (I never seem to have enough SATA power connectors, though SATA data cables are practically coming out of my ears. How can there possibly be such a mismatch?!), but I hadn't expected it to be quite such a trial turning them into a RAID5 array that my Linux distro could recognise and use. But it was… or, at least, it felt like it at the time!

First, it's "fake raid". I'm too cheap to buy a real hardware RAID card -and besides, I'd never get it past the Household Budget Watch Committee of One. So, it's there in the motherboard's BIOS… a quick F1 on boot-up, a button-press here, an F10 there… bingo, I have 4TB of usable storage and I can afford a hard disk failure. Nice.

Now, boot up Linux and check with the Gparted tool: bummer. There are /dev/sdb, /dev/sdc and /dev/sdd, each identified separately as a 2TB drive (well, OK… 1.82TiB, but that’s inflation for you). But they’re all separate from each other, and there’s no apparent understanding that, actually, all three are doing teamwork now. Luckily for me, the problem is in Gparted, not my hard disks (or even my fake raid setup): it simply doesn’t “do” fake raid.

The good news, however, is that the tools which *can* do fake raid are available -and are, in fact, probably already installed on your distro. The key one is dmraid. If you just type sudo dmraid (or become root and type dmraid), you'll know soon enough if it's installed: you should get an error message complaining that no arguments or options have been given. If instead you are told "command not found", then it's not installed and you'll have to install it using your distro's package manager (and, if it were me, reboot afterwards). The other tool you'll need to set things up properly is parted. That's not Gparted, the graphical tool which doesn't understand fake raid, but its command line cousin, which does.
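In other words, something along these lines (the apt-get line is the Ubuntu-flavoured example; substitute your own distro's package manager as required):

sudo dmraid                   # an error about missing arguments/options means it's installed
sudo apt-get install dmraid   # only needed if you saw "command not found"
sudo reboot                   # not compulsory, but it's what I'd do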

So: assuming that dmraid’s been installed and is running (it runs by default in all later editions of Ubuntu and its derivatives, for example), you’ll first need to know under what name your fake raid device has been detected. If you do ls /dev/mapper/*, you should see a weird device name listed there. Mine happened to be isw_ifdbedffj_safedata, and I recognised this to be my fake raid because the name “safedata” was one I’d assigned in the BIOS setup screen when creating the array in the first place.
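If the ls doesn't make it obvious, dmraid itself can report what it has found; its -s switch displays the status of any discovered raid sets (the names and sizes will obviously be whatever yours are):

sudo dmraid -s
ls -l /dev/mapper/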

Now that you know the device name, you can partition it. In the old days of peanut-sized hard disks, you'd have done something like fdisk /dev/sda to begin the process of partitioning the sda hard disk. Try that now, however, and you'll be in trouble because (a) fdisk doesn't like working with large hard drives and (b) /dev/sda isn't the right device name! Instead, you work with the parted tool to set up partitions on (in my case) /dev/mapper/isw_ifdbedffj_safedata. It trips less easily off the fingers and keyboard, that's for sure! But at least it will work. Here's what I did:

sudo parted /dev/mapper/isw_ifdbedffj_safedata
(parted) mklabel gpt
Warning: The existing disk label on /dev/mapper/isw_ifdbedffj_safedata will be destroyed
and all data on this disk will be lost. Do you want to continue?
Yes/No? yes
(parted) mkpart primary ext4 4 -1
(parted) align-check optimal 1
1 aligned
(parted) name 1 safedata
(parted) quit

The mklabel gpt command there causes this large volume to be created as a GUID Partition Table (GPT) drive, as opposed to a more usual Master Boot Record one, which can't cope with partitions much bigger than 2TiB. This is something we're probably going to have to get used to now that 3TB disks are available for quite reasonable sums!
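Should you ever want to confirm which sort of label a disk ended up with, parted's print command will say; it can be run non-interactively, too:

sudo parted /dev/mapper/isw_ifdbedffj_safedata print

Look for the "Partition Table: gpt" line in its output.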

The other interesting command in that lot of gibberish was this one: mkpart primary ext4 4 -1. From the name, you can probably guess this is the command that actually makes, or creates, the partition. I wanted a single volume of 4TB in size, so I'm creating a single primary partition which will eventually use ext4 as its file system. The tricky bit is those last two numbers. They tell parted where the new partition should start and stop, expressed as offsets (in megabytes, by default) from the start of the disk, with '-1' having the special meaning of 'keep going until you run out of disk platter!'. My command, for example, says "start at the 4MB mark and continue until the end of the disk". Which probably prompts the next obvious question: why start at 4MB? Why not at 0?

Well, here’s the message I got when I did start at 0:

(parted) mkpart primary ext4 0 -1                                       
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel?

I'm afraid we're talking about that hoary old chestnut, partition boundary alignment. Your raid array has a stripe size; the volume is created out of clusters; and if the partition boundaries aren't aligned right, the one can straddle the other, turning what ought to have been one I/O operation into two. Windows suffers from the same thing, incidentally, and there's even an article available on the issue (that probably explains it better than I just did!). Long story cut short, therefore: by skill and profound insight (all right: by luck), I found that sacrificing the first 4MB of my hard disk allowed my partition boundaries to align correctly (and thus gave me a substantial performance boost for nearly nothing). The number you'd have to sacrifice to achieve the same thing will depend entirely on your stripe size, cluster size and (probably) the wind direction that day… so experiment. The align-check command you saw me issue simply gets parted to confirm that the newly-created partition really is properly aligned.
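Two footnotes on alignment, for what they're worth. First, if you give parted the start and end as percentages rather than absolute offsets, it snaps the partition boundaries to the optimal alignment for you, which takes the guesswork (and the wind direction) out of it:

(parted) mkpart primary ext4 0% 100%

Second, the kernel will often volunteer the magic number itself; the dm-0 below is whichever device-mapper node your array came up as (ls -l /dev/mapper will tell you):

cat /sys/block/dm-0/queue/optimal_io_size

The figure reported is in bytes, and partition starts which are a multiple of it will keep that warning away.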

Once parted has done its work, it's relatively easy to format the new partition with a new file system. I say "relatively" only because the formatting options for the ext4 file system are a pain in the neck! Here's the command I issued:

sudo mkfs -t ext4 -m 0 -O extents,uninit_bg,dir_index,filetype,has_journal,sparse_super -L safedata /dev/mapper/isw_ifdbedffj_safedata

Nice! The main points of interest here are that the mkfs command is being applied to the correct device (i.e., /dev/mapper/isw_ifdbedffj_safedata); that I'm giving the resulting file system a label (that's the -L bit) of "safedata"; and that I'm making sure the file system uses extents and a journal (extents make it fast, a journal makes it safe). What the other options are doing… well, that's what documentation is for!
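If you'd like to double-check the result, blkid and tune2fs (both bog-standard tools) will confirm the label and the feature list:

sudo blkid /dev/mapper/isw_ifdbedffj_safedata
sudo tune2fs -l /dev/mapper/isw_ifdbedffj_safedata

The first should report TYPE="ext4" and LABEL="safedata"; the second lists, amongst much else, the filesystem features that got switched on.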

Incidentally, when I first issued that mkfs command, I was told "/dev/mapper/isw_ifdbedffj_safedata is apparently in use by the system; will not make a filesystem here!" Quite how a disk volume with no file system on it could actually be in use by the system, I haven't the faintest idea… but a reboot cured the problem and allowed me to format the thing without further complaint. (I realise this is very much the Windows User approach to Linux difficulties, but there are times when switching the thing off and on again actually works!)
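(If you'd rather diagnose than reboot, these two might at least reveal who is holding the device, though I make no promises: in my case it was the off-and-on-again treatment that actually did the trick.)

sudo dmsetup ls --tree
ls /sys/block/dm-*/holders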

Finally, it’s time to mount the new file system -for which, of course, you need a mountpoint. I also like to ensure I assign ownership and permissions on the drive once it’s been mounted:

sudo mkdir /data
sudo mount /dev/mapper/isw_ifdbedffj_safedata /data
sudo chown -R hjr:users /data
sudo chmod -R 775 /data
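A quick sanity check that the mount and the permissions have both taken effect:

df -h /data
ls -ld /data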

And if that all works, you polish things off by editing /etc/fstab so that the new volume is re-mounted automatically every time the PC restarts. Fstab edits can get clever, sexy (sort-of) and convoluted… but I kept mine very short and to-the-point:

/dev/mapper/isw_ifdbedffj_safedata /data    ext4    defaults 0 0
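Incidentally, since we gave the file system a label at format time, a LABEL= entry works just as well and spares you the device-name mouthful:

LABEL=safedata    /data    ext4    defaults 0 0

You can also test the entry without rebooting: sudo umount /data followed by sudo mount -a should bring /data straight back if the line is good.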

Another reboot to check the thing actually does what it says on the tin, and we’re (finally!) sorted.

Scientifically Flash

Red Hat Enterprise Linux 6 was released a couple of months back (in November 2010, if memory serves). I liked the look of it (basically, Fedora 12 with a lot of Enterprise-class stability added), but was looking forward to trying it out for free when CentOS 6 was released. Two months later, however, there's still no real sign of an actual CentOS 6 (though this post on the developer's list suggests that there should be a beta available Real Soon Now).

Not wanting to wait any longer, therefore, I installed Scientific Linux 6 the other day. It's only available as an Alpha version (number 8 or 9, I believe) and I got my copy here. Alpha or not, it seems pretty stable to me, and I recommend it.

Scientific Linux is another one of those distros which are built from the original Red Hat source code, once various trademarks and logos have been removed. It’s therefore practically binary-equivalent to the “real thing”, but is made available for zero cost, and updates are available from standard repositories without payment. If you care about such things, the distro gets its name from the fact that CERN and Fermilab (amongst others) use it: true geekdom indeed!

Oracle installs on it fine, incidentally.

Getting Flash working is a bit of a trial (nothing new there, then). Basically, the download from Adobe is a 32-bit library (called libflashplayer.so) which you extract from its tarball and then copy to /usr/lib/mozilla/plugins.
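In shell terms, that boils down to something like the following (the tarball's exact name is whatever Adobe is currently shipping, so adjust accordingly):

tar -xzf install_flash_player_*_linux.tar.gz
sudo cp libflashplayer.so /usr/lib/mozilla/plugins/

But you also have to issue (as root) the following command: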

yum install nspluginwrapper.i686 alsa-plugins-pulseaudio.i686

Once you re-start Firefox (version 3.6.13 after a yum update, if you’re wondering), you’ll be able to watch the videos on (for example) the BBC News website -my standard Flash test!- without a problem.

To install other packages which aren't in the "standard" repositories, such as Stellarium, I simply followed the instructions here about adding the RPMForge repository. Binary compatibility is a wonderful thing -it means the instructions, though ostensibly meant for CentOS, apply to Scientific Linux perfectly well, too.
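For the record, the recipe boils down to installing RPMForge's release RPM and then yumming as normal. The exact filename below is illustrative only; grab whichever release package their instructions currently point at:

sudo rpm -Uvh rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm
sudo yum install stellarium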

Anyway: Scientific Linux. If you’re at all concerned that CentOS seems to be losing a bit of its mojo, it’s a viable alternative!

Handbrake Excellence!

Ripping Blu-ray discs has always been painful, despite the RipBot264 program doing its best to simplify things. It's therefore excellent news to hear that version 0.9.5 of Handbrake has now been released -and that one of its key new features is that it finally understands what a Blu-ray disc is and how one is laid out. Being able to interpret the source disc correctly, it's now able to encode a rip from one directly, too.

Handbrake itself is not the simplest software in the world to master, but it comes with a variety of "presets" that have long made it possible to rip good ol' DVDs in as simple a manner as I can imagine. And once you did 'master' the thing, you had enormous control over the quality and speed of your DVD rips. To have the same capabilities finally extended to Blu-ray discs has therefore made me very happy!

New Year’s (Spending) Resolutions

New year, new business plan to be put to The Other Half… here’s hoping.

1. 8GB of RAM is so last year -and makes it really tricky when you’re building a 4-node virtual RAC cluster on your desktop. Obviously, 16GB is the new black, and accordingly two of these ought to be considered an essential priority purchase. If it helps get it past the budget keeper, I could settle for just getting one set of the 2x4GB RAM sticks, thus bringing my current box up to a total of 12GB for half the cost. But you know the other 4GB will be needed pretty soon, so you might as well blow the full AU$300 up-front, it seems to me.

2. Yes, I know we purchased several 2TB hard disks this year. Most of them are sitting in the Drobo, though, and storing our music and movie collections, so that was really a 'family purchase', not a 'Howard Rogers Personal Account' one. So two more of these would be nice. If I put those Samsung 2TB hard disks together with the two I already have on my personal account, that would be 6TB of usable RAIDed storage, which is probably just about enough for all those RAC nodes and shared ASM disks -and there'd just about be room left over to back up the Drobo, too. At least until the 3TB equivalents come out…

3. Yes, I know I bought a crappy 64GB first-gen solid-state drive about 2 years ago and have been swearing about it ever since. Yup, I realise it was AU$560 down the drain, basically. But that was two years ago, and surely I have done enough washing up since to wash away that particular sin? If so, one of these would be nice. 120GB makes a nice boot drive without being too ostentatiously grandiose. Yes, it's true I don't reboot Linux very often; I told you months ago that a lack of reboots was one of the especially good things about Linux, and I stand by it. But we get a lot of thunderstorms, and the power failures they bring mean the occasional reboot regardless… so saving a couple of seconds once every few months is clearly worth it. Trust me on this.

4. It is true that you very generously allowed me to get a 460W totally silent power supply for one of the PCs earlier this year. Now I'd like you to repeat your generosity and let me get an 850W near-silent one, too, for the other PC. The 460W was only ever intended for my Dual Core; my i7 remains thunderously noisy, which is ironic given that I use it to play music more than pretty much anything else. Silence is golden, and the purchase of a second near-silent power supply will return my study to the den of peace and quiet we always intended it to be.

5. I know I may have inadvertently mentioned wanting one of these 9-screen multi-monitor setups a few months back. Accordingly, I understand you might believe that I am doomed to be dissatisfied, in record-short time, with anything less than 9 monitors. There is probably some truth in that line of reasoning. But even I know the difference between a fantasy nerd-drool and the real world, usually. So a second one of these would do me just fine, and give me a chance to retire that awful 22-inch thing I've been using as my second monitor for, oh… at least 18 months.

Thank you for your consideration, and if you could just leave the credit card on the study table, I’d be grateful.