Further to my recent post, the good news is that the ZFS team have released a version of ZFS which will work on the latest Fedora:
The 0.6.5.9 version supports kernels up to 4.10 (which is handy, as that’s the next kernel release I can upgrade my Fedora 25 to, so there’s some future-proofing going on there at last). And I can confirm that, indeed, ZFS installs correctly now.
I’m still a bit dubious about ZFS on a fast-moving distro such as Fedora, though, because just a single kernel update could potentially render your data inaccessible until the good folk at the ZFS development team decide to release new kernel drivers to match. But the situation is at least better than it was.
With a move to the UK pending, I am disinclined to suddenly start wiping terabytes of hard disk and dabbling with a new file system… but give it a few weeks and who knows?!
I’m referring to Churchill 1.7, a major overhaul of the Churchill framework and its accompanying documentation, making it work on RHCSL 6.8, for 12c only, with Enterprise Manager Cloud Control 13c and a complete overhaul of the bootstrap options available when kickstarting O/S installations.
At some point toward the end of January, it morphed into practically a complete re-write… and I thought seriously about calling it quits and declaring it to be version 2.0. But I’ve stuck with the incremental versioning for now. (I’ve been saving version 2 for when I get round to making it work with RHCSL 7.x distros).
I’m finished in another sense, though, too: the contract to purchase a house in Nottingham is ready to sign and it accordingly looks very much as though I’ll be becoming an ex-Aussie (or a re-Englishman, I suppose, depending on your point of view) on or around 6th March. I may not have much time to post here given the packing, flight-booking, passport-checking, Internet banking, etc etc shenanigans that now ensue. If I can I will, but otherwise I’ll be back online toward the end of March, live from Nottingham 🙂
In modernising Churchill to work for Oracle 12c and the latest 6.x releases of RHCSL, I’ve encountered a bizarre bug (#19476913 if you’re able to check up on it), whereby startup of the cluster stack on a remote node fails if its hostname is longer than (or equal to) the hostname of the local node.
That is, if you are running the Grid Infrastructure installer from Alpher (6 characters) and pushing to Bethe (5 characters) then the CRS starts on Bethe just fine: local 6 is greater than remote 5. But if you are running the GI installer on Gamow (5 characters) and pushing to Dalton (6 characters) then the installer’s attempt to restart the CRS on Dalton will fail, since now local 5 is less than remote 6. Alpher/Bethe managed to dodge this bullet, of course -but only by pure luck.
The symptoms are that during the installation of Grid Infrastructure, all works well until the root scripts are run, at which point (and after a long wait), this pops up:
Poke around in the [Details] of that dialog and you’ll see this:
CRS-2676: Start of 'ora.cssdmonitor' on 'dalton' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'dalton'
CRS-2672: Attempting to start 'ora.diskmon' on 'dalton'
CRS-2676: Start of 'ora.diskmon' on 'dalton' succeeded
CRS-2676: Start of 'ora.cssd' on 'dalton' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'dalton'
CRS-2672: Attempting to start 'ora.ctssd' on 'dalton'
CRS-2883: Resource 'ora.ctssd' failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-4000: Command Start failed, or completed with errors.
2017/02/18 10:21:41 CLSRSC-117: Failed to start Oracle Clusterware stack
Died at /u01/app/12.1.0/grid/crs/install/crsinstall.pm line 914.
The installation log is not much more useful: it just documents everything starting nicely until it fails for no discernible reason when trying to start ora.ctssd.
Take exactly the same two nodes and do the installation from the Dalton node, though, and everything just works -so it’s not, as I first thought it might be, something to do with networks, firewalls, DNS name resolution or the myriad other things that RAC depends on being ‘right’ before it will work. It’s purely and simply a matter of whether the local node’s name is longer or shorter than the remote node’s!
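A pre-flight check along the following lines would flag a risky pair of names before the installer is ever launched. It is only a sketch: the function name is made up for illustration, it is not part of Churchill or of Oracle's installer, and it assumes the rule is exactly as described above (the remote node's short hostname must be strictly shorter than the local node's).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: not part of Churchill or the Oracle installer.
# Succeeds only when the remote hostname is strictly shorter than the local
# one, which is the condition under which the bug described above is avoided.
remote_name_is_safe() {
    local local_host="${1%%.*}"     # drop any domain part, keep the short name
    local remote_host="${2%%.*}"
    [ "${#remote_host}" -lt "${#local_host}" ]
}

report() {
    if remote_name_is_safe "$1" "$2"; then
        echo "$1 -> $2: ok"
    else
        echo "$1 -> $2: remote name too long, expect the CRS start to fail"
    fi
}

report alpher bethe     # local 6, remote 5
report gamow dalton     # local 5, remote 6
report dalton gamow     # local 6, remote 5
```

Running the installer from whichever node has the longer name is the other way to get a passing result, as the Dalton-first experiment shows.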
The problem is fixed in PSU 1 for 184.108.40.206, but it’s inappropriate to mandate its use in Churchill, since that’s supposed to work with the vanilla software available from OTN (I assume my readers lack support contracts, so everything has to work as-supplied from OTN for free).
The obvious fix for Churchill, therefore, is to either (a) make the ‘Gamow’ name one character longer (maybe spell it incorrectly as ‘gammow’?); (b) find a ‘D’ name that is both a physicist and only 4 characters long or fewer; or (c) change both names, ensuring that the second is shorter than the first.
Largely due to the distinct lack of short-named, D-named physicists, I’ve gone for the (c) option: Churchill 1.7 therefore builds its Data Guard cluster using hosts geiger and dirac. Paul Dirac (that’s him on the top-left) was an English theoretical physicist, greatly admired by Richard Feynman (which makes him something of a star in these parts), who formulated the relativistic equation of motion for the wave function of the electron. He used his equation to predict the existence of the positron -and of anti-matter in general, something for which he won a share of the 1933 Nobel prize for physics. Geiger is a frankly much less distinguished physicist whose main claim to fame is that he invented (most of) the Geiger counter and wasn’t (apparently) a Nazi. He gets into the Churchill Pantheon by the skin of his initial letter and not much else, to be honest!
Short version then: Churchill 1.7 now uses Alpher/Bethe and Geiger/Dirac clusters, and both Gamow and Dalton are no more. Quite a bit of documentation needs updating to take account of this trivial change! Hopefully, I should have that sorted by the end of the day. And that will teach me to test all parts of Churchill before declaring that ‘it works with 12c’. (Oooops!)
Careful readers will note that I’ve re-jigged the look and feel of the place!
I am not entirely sure I like the results as yet, but it’s certainly a bit slicker and (I think) more functional.
If you spot any visual ‘anomalies’ arising, let me know in the comments… and if you can’t stand the new look (or feel it’s the best thing since sliced bread), let me know there, too. These things can always be reversed or tweaked 🙂
The main one is that I’ve removed a few dependencies on .i686 packages. That means the O/S installations can now all take place off the first DVD alone. No second DVD is prompted for, in other words.
This in turn brings about the biggest single benefit of the new release: it works with CentOS 6.8 (and Scientific Linux 6.8, too).
In fact, Churchill 1.7 now reverts to making CentOS 6.8 the default O/S assumed to be in use. (You can still always specify ‘redhat’, ‘sl’ or ‘oel’ if you prefer to use real Red Hat, Scientific Linux or Oracle Enterprise Linux, of course; and you can always specify an earlier distro version if you prefer to stick with (say) 6.3 -though I can’t think why you’d particularly want to).
The other big change is that the bootstrap lines are now trivially easy. Back in 2013 when I first released Churchill, it seemed like a good idea to make it as flexible as possible so that users could specify their own IP addresses and hostnames; but this just made for really lengthy bootstrap lines and confused the heck out of everybody!
So, the simplify brush has been daubed all over Churchill. You now must use the speed keys 1 to 4 as you build your nodes (or you can instead specify their corresponding hostnames). By specifying the speed key or hostname, you automatically define all the pesky details about IP address and interconnect IP address. It makes things a lot simpler and less confusing, I think. It also makes it a bit less flexible… but that’s the price you pay for simplicity. Ask the Gnome developers!
Other changes flow in consequence: the filecopy=y/n parameter is no longer required. If you are building nodes 1 or 3, file copying is assumed to be ‘yes’; if you are building nodes 2 or 4, it’s assumed to be ‘no’. Likewise, there’s no dg=y/n parameter any more: if you are building nodes 1 or 2, it’s assumed to be ‘no’; but build nodes 3 or 4, it’s assumed to be ‘yes’.
It is still possible to say sk=1&rac=n (or hostname=alpher&rac=n), though: that’s if you are building node 1 but want to use it in standalone mode.
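The defaults described above boil down to a small lookup table, sketched here in shell. This is purely illustrative and is not Churchill's own code: the function name is invented, and only the pairing of speed key 1 with alpher is stated outright. The other three key-to-hostname pairings are an assumption based on the Alpher/Bethe and Geiger/Dirac cluster names.

```shell
#!/usr/bin/env bash
# Illustrative only: this is not Churchill's own code.
# Only "sk=1 means alpher" is stated in the text; the other three
# key-to-hostname pairings are assumed from the cluster names.
node_defaults() {
    local sk="$1" host filecopy dg
    case "$sk" in
        1) host=alpher ;;
        2) host=bethe  ;;
        3) host=geiger ;;
        4) host=dirac  ;;
        *) echo "unknown speed key: $sk" >&2; return 1 ;;
    esac
    # Nodes 1 and 3 default to file copying; nodes 2 and 4 do not.
    case "$sk" in 1|3) filecopy=y ;; *) filecopy=n ;; esac
    # Nodes 3 and 4 default to Data Guard; nodes 1 and 2 do not.
    case "$sk" in 3|4) dg=y ;; *) dg=n ;; esac
    echo "hostname=$host filecopy=$filecopy dg=$dg"
}

for sk in 1 2 3 4; do
    node_defaults "$sk"
done
```

The point of the sketch is simply that a single key now carries everything the old filecopy= and dg= parameters used to spell out by hand.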
As with the previous release (1.6), Churchill only works to build 12c standalone and RAC/Data Guard environments: there is no Oracle 11g support these days.
These are substantial changes and mean Churchill now works in ways quite different from before. That obviously affects the way it is documented. Those documentation changes are being made as I write and should be ‘live’ by the time you read this. Since the old versions of Churchill are kept available on the old site, I’ll keep the original documentation available on that old site too, at least for now.
On this site, however, only the 1.7 version will be available and documented.
Taking its cue from the news that the Chrome and Firefox browsers are, in their latest releases, to start flagging non-https websites as ‘insecure’, the Dizwell menagerie of sites has now gone all-https.
I believe that anything linked to http will automatically re-direct to the correct https equivalent, but since I don’t have the age of the Universe to check, I can’t be 100% sure about it. Please let me know if you find something that doesn’t seem to be working that once did, particularly if it stops you using something like Atlas.
Frankly, there’s not much personal information on this site at the best of times, so there was no real point in making the move before now. That said, it’s also true that Google have been down-rating http links in their search results pages for quite a while, favouring https ones instead. Anything I can do to bump my pages up a notch at Google is a good thing, I guess 🙂
There are lots of suggestions around the web about how best to do the XFS formatting to achieve optimal performance.
Essentially, the optimisation process revolves around choosing stripe unit and stripe width settings. What the “correct” values are for those things can vary according to who you read (and, more pertinently, what you expect to store on your array -lots of little files, or a preponderance of very big files like virtual machine images, movies and so on).
On the other hand, there’s a school of thought that says XFS is reasonably smart and if you just let it default to automatically-determined values, things will probably be ok.
So I thought I’d check it out.
First, I created my array and let mdadm complete the initial ‘recovery’, so no background I/O was taking place. Then I formatted it as follows:
And then I benchmarked the array using Gnome’s built-in disks utility:
So not too shabby: 145MB/sec read, 69MB/sec write and an access time of 15.5msec.
Then I reformatted the array like so:
mkfs.xfs -L bulkdata /dev/md0
Re-measuring the performance in the same way as before, these were the results:
This time, reads of 165MB/s, writes of 74MB/s and access times of 15.5msec. So, significantly faster reads, moderately faster writes and access times about the same: I’d say letting XFS work it out for itself is probably as good a strategy as any -for my hardware, at any rate.
I repeated the tests multiple times, and I also introduced different formatting options with different values for the stripe unit and width and blocksize: I could certainly get worse results, but I was never able to improve on the ‘just let it work it out’ results.
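For anyone wanting to repeat the experiment with explicit values, the usual rule of thumb is that the stripe unit (su) matches the md chunk size and the stripe width (sw) is the number of data-bearing disks. The sketch below only prints a mkfs.xfs command rather than running it. The function name is invented, and the RAID level, disk count and chunk size in the example call are made-up numbers, not the geometry of the array benchmarked here.

```shell
#!/usr/bin/env bash
# Sketch only: prints a mkfs.xfs command, it does not format anything.
# Arguments: RAID level, total member disks, md chunk size in KiB, device.
# The example values at the bottom are invented for illustration.
xfs_stripe_cmd() {
    local level="$1" disks="$2" chunk_kib="$3" dev="$4" data_disks
    case "$level" in
        0) data_disks=$disks ;;
        5) data_disks=$((disks - 1)) ;;    # one disk's worth of parity
        6) data_disks=$((disks - 2)) ;;    # two disks' worth of parity
        *) echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
    # su = stripe unit (the md chunk size); sw = number of data-bearing disks
    echo "mkfs.xfs -f -L bulkdata -d su=${chunk_kib}k,sw=${data_disks} ${dev}"
}

xfs_stripe_cmd 5 4 512 /dev/md0
```

The chunk size of an existing array can be read from the output of mdadm --detail before plugging numbers in.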
Your mileage may vary, of course. But me: I’m leaving my XFS array well alone and letting it sort itself out!
I didn’t know this site existed until today, but it’s fair to say that since I found it (courtesy of a link on the Peppermint Linux website), I am having difficulties keeping my wallet closed.
ToH will not approve of the extravagance of Linux embroidery on my t-shirts, however, so it might (unfortunately) be easier to hold fire than I had first thought… but who knows how long I shall be able to hold out?!
Sadly, they don’t do a Fedora one. Or a Manjaro one.
Mercifully, they don’t do an Oracle Enterprise one either (I guess the t-shirt would have to be tomato red if they did).
The SUSE chameleon one is really nice… but I don’t run that these days, so wearing it would just make me a fraud. Same goes for the Debian swirl-only…
So I figure I may have to settle for a simple Tux logo.
Unless I change my main desktop distro again. Hmmm…
You will immediately notice from the second screenshot that version 0.6.5.8 of whatever it is which happens to have been screenshotted only supports up to 4.8 kernels, whereas the first screenshot shows that a Fedora 25 installation is using kernel version 4.9. Clearly that Fedora installation won’t be able to run whatever is being referred to in the second screenshot.
So what is it that I’ve taken that second screen shot of? This:
Ooops. It happens to be a screenshot of the current stable release number of the world’s greatest file system for Linux.
Put together, and in plain English, the combination of the two version numbers means: I can’t install ZFS on Fedora.
Or rather, I could have done so when Fedora 25 was freshly installed, straight off the DVD (because it ships with a 4.8 kernel, so the 0.6.5.8 version of ZFS would have worked just fine on that). ZFS on 4.8-kernel-using-Fedora 25 works fine, therefore.
But if I had, say, copied 4.8TB of data onto a freshly created zpool and then updated Fedora, I would now not be able to access my 4.8TB of data at all (because the relevant ZFS kernel modules won’t be able to load into the newly-installed 4.9 kernel). Which sort of makes the ZFS file system a bit less than useful, no?!
Of course, once they release version 0.7 of ZFS (which is currently at release candidate 2 state), then we’re back in business -because ZFS 0.7 supports 4.9 kernels. Unless Fedora go and update themselves to using kernel 4.10, of course… in which case it’s presumably back to being inaccessible once more. And so, in cat-and-mouse fashion, ad infinitum…
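One defensive habit is to compare a kernel version against the highest kernel the installed ZFS release claims to support before rebooting into it. The sketch below does just that comparison. The function name is invented, and the "highest supported kernel" figure has to be looked up by hand in the ZFS release notes, since nothing here queries ZFS itself.

```shell
#!/usr/bin/env bash
# Sketch only. The highest supported kernel must be supplied by hand from
# the ZFS release notes; this does not ask ZFS anything.
kernel_supported() {
    local kernel="$1" max_supported="$2" kernel_mm highest
    # Reduce a full kernel release string to its major.minor part (e.g. 4.9)
    kernel_mm="$(echo "$kernel" | cut -d. -f1,2)"
    # sort -V orders version strings numerically, so 4.10 sorts after 4.9
    highest="$(printf '%s\n%s\n' "$kernel_mm" "$max_supported" | sort -V | tail -n 1)"
    [ "$highest" = "$max_supported" ]
}

# ZFS 0.6.5.8 supports kernels up to 4.8
if kernel_supported "$(uname -r)" 4.8; then
    echo "running kernel is within what ZFS 0.6.5.8 supports"
else
    echo "running kernel is newer than ZFS 0.6.5.8 supports"
fi
```

The same comparison could be run against the version of a pending kernel update rather than the running kernel, which is where it would actually save some grief.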
But here’s the thing: Fedora is, by design, bleeding edge, cutting edge… you name your edge, Fedora is supposed to be on it! So it is likely to be getting new kernel releases every alternate Thursday afternoon, probably. What chance the ZFS developers will match that release cadence, do you think… given that their last stable release is now 4 months old?
About zilch I’d say. Which gives rise to a certain ‘impedance mismatch’, no? Try running ZFS on Fedora, it seems to me, and you’ll be consigning yourself to quite regularly not being able to access your data at all for weeks or months on end, several times a year. (Point releases of the 4.x kernel have been coming every two or three months since 4.0 was unleashed in April 2015, after all).
It strikes me that ZFS and Fedora are, in consequence, not likely to be good bed-fellows, which is a shame.
Perhaps it is time to investigate the data preservative characteristics of Btrfs at last?!
Incidentally, try installing ZFS on a 4.9-kernel-using-Fedora 25 whilst the 0.6.5.8 version of ZFS is the latest-and-greatest on offer and the error you’ll get is this:
The keywords to look for are ‘Bad return status’ and ‘spl-dkms scriptlet failed’. Both mean that the spl-dkms package didn’t get installed, and the net effect of that is the ZFS kernel modules don’t get loaded. In turn, this means trying to issue any ZFS-related commands will fail:
Of course, you will think that you should then do as the error message tells you: run ‘/sbin/modprobe zfs’ manually. It’s only when you try to do so you see the more fundamental problem:
And there’s no coming back from that. 🙁
No practical ZFS for a distro? That’s a bit of a deal-breaker for me these days.
I nearly fell off my chair this week. In my recent and fairly exhaustive trawl through over two dozen distros and their variants, I found one I liked a lot. Which maybe isn’t so chair-topplingly surprising, in and of itself.
The real surprise was that the distro in question was (drum roll, please…): Fedora.
And I nearly fell off my chair backwards when I further found that the desktop environment I liked most in the Fedora context was… Gnome.
It’s clean and uncluttered (especially compared to the busy-ness that is the Fedora KDE spin). It is responsive. And once one has discovered Gnome Shell extensions such as Dash-to-Dock; or used Gnome Tweak Tool to add back Maximize and Minimize buttons to the window decorations… it turns out to be quite highly usable and productive.
All of which surprised me a lot: I have been avoiding anything to do with Gnome in general for quite a few years (since the whole Gnome Shell debacle in 2011) and anything to do with Fedora specifically for some more years than that. But both would appear to have made stealthy progress that impresses this particular traveller from foreign lands (i.e., KDE and Manjaro!) no end. Fedora even looks typographically sane these days. Who would have thought that possible, given their cavalier approach to all things font-y in the past?
The one (quite big) black spot is the lack of an extension that allows Gnome’s windows to wobble or a Desktop Cube to spin (there is one for wobbly windows, but it doesn’t work very well). Can I live without wobbly windows? Possibly, given other things the Gnome environment provides (such as Boxes, which potentially means no more VirtualBox or VMware, though it’s by no means a perfect replacement as yet; and -most impressively- excellent integration with cloud services like Google Drive).
In consequence whereof, I think my days of feeling forced to use KDE might be behind me. I have a spare laptop or two that will become guinea pigs in a ‘transition to Fedora’ experiment this week. We’ll see how it goes…