The Dizwell Blog

What were the odds?

In modernising Churchill to work with Oracle 12c and the latest 6.x releases of RHCSL, I’ve encountered a bizarre bug (#19476913 if you’re able to check up on it), whereby startup of the cluster stack on a remote node fails if its hostname is longer than (or equal to) the hostname of the local node.

That is, if you are running the Grid Infrastructure installer from Alpher (6 characters) and pushing to Bethe (5 characters) then the CRS starts on Bethe just fine: local 6 is greater than remote 5. But if you are running the GI installer on Gamow (5 characters) and pushing to Dalton (6 characters) then the installer’s attempt to restart the CRS on Dalton will fail, since now local 5 is less than remote 6. Alpher/Bethe managed to dodge this bullet, of course -but only by pure luck.
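
If you want to check whether your own node pair is at risk before kicking off an install, the arithmetic is easy enough to script (a sketch; ‘dalton’ merely stands in for whatever your remote node is called):

# Compare hostname lengths: per bug 19476913, trouble looms unless the local
# (installing) node's name is strictly longer than the remote node's.
local_node=$(hostname -s)
remote_node=dalton
if [ ${#local_node} -gt ${#remote_node} ]; then
  echo "Fine: ${local_node} is longer than ${remote_node}, so CRS should start remotely"
else
  echo "Beware: expect the CRS restart on ${remote_node} to fail during the root scripts"
fi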

The symptoms are that during the installation of Grid Infrastructure, all works well until the root scripts are run, at which point (and after a long wait), this pops up:

Poke around in the [Details] of that dialog and you’ll see this:

CRS-2676: Start of 'ora.cssdmonitor' on 'dalton' succeeded 
CRS-2672: Attempting to start 'ora.cssd' on 'dalton' 
CRS-2672: Attempting to start 'ora.diskmon' on 'dalton' 
CRS-2676: Start of 'ora.diskmon' on 'dalton' succeeded 
CRS-2676: Start of 'ora.cssd' on 'dalton' succeeded 
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'dalton' 
CRS-2672: Attempting to start 'ora.ctssd' on 'dalton' 
CRS-2883: Resource 'ora.ctssd' failed during Clusterware stack start. 
CRS-4406: Oracle High Availability Services synchronous start failed. 
CRS-4000: Command Start failed, or completed with errors. 
2017/02/18 10:21:41 CLSRSC-117: Failed to start Oracle Clusterware stack 
Died at /u01/app/12.1.0/grid/crs/install/crsinstall.pm line 914.

The installation log is not much more useful: it just documents everything starting nicely until it fails for no discernible reason when trying to start ora.ctssd.

Take exactly the same two nodes and do the installation from the Dalton node, though, and everything just works -so it’s not, as I first thought it might be, something to do with networks, firewalls, DNS name resolution or the myriad other things that RAC depends on being ‘right’ before it will work. It’s purely and simply a matter of whether the local node’s name is longer or shorter than the remote node’s!

The problem is fixed in PSU 1 for 12.1.0.2, but it’s inappropriate to mandate its use in Churchill, since that’s supposed to work with the vanilla software available from OTN (I assume my readers lack support contracts, so everything has to work as-supplied from OTN for free).

The obvious fix for Churchill, therefore, is to (a) make the ‘Gamow’ name one character longer (maybe spell it incorrectly as ‘gammow’?); (b) find a ‘D’ name that belongs to a physicist and is only 4 characters long or fewer; or (c) change both names, ensuring that the second is shorter than the first.

Largely due to the distinct lack of short-named, D-named physicists, I’ve gone for the (c) option: Churchill 1.7 therefore builds its Data Guard cluster using hosts geiger and dirac. Paul Dirac (that’s him on the top-left) was an English theoretical physicist, greatly admired by Richard Feynman (which makes him something of a star in these parts), who formulated the relativistic equation of motion for the wave function of the electron. He used his equation to predict the existence of the positron -and of anti-matter in general- something for which he won a share of the 1933 Nobel Prize in Physics. Geiger is, frankly, a much less distinguished physicist whose main claim to fame is that he invented (most of) the Geiger counter and wasn’t (apparently) a Nazi. He gets into the Churchill Pantheon by the skin of his initial letter and not much else, to be honest!

Short version then: Churchill 1.7 now uses Alpher/Bethe and Geiger/Dirac clusters, and both Gamow and Dalton are no more. Quite a bit of documentation needs updating to take account of this trivial change! Hopefully, I should have that sorted by the end of the day. And that will teach me to test all parts of Churchill before declaring that ‘it works with 12c’. (Oooops!)

A lick of paint…

Careful readers will note that I’ve re-jigged the look and feel of the place!

I am not entirely sure I like the results as yet, but it’s certainly a bit slicker and (I think) more functional.

If you spot any visual ‘anomalies’ arising, let me know in the comments… and if you can’t stand the new look (or feel it’s the best thing since sliced bread), let me know there, too. These things can always be reversed or tweaked 🙂

Churchill Changes

I’ve just published version 1.7 of Churchill. It contains a lot of changes.

The main one is that I’ve removed a few dependencies on .i686 packages. That means the O/S installations can now all take place off the first DVD alone. No second DVD is prompted for, in other words.

This in turn brings about the biggest single benefit of the new release: it works with CentOS 6.8 (and Scientific Linux 6.8, too).

In fact, Churchill 1.7 now reverts to making CentOS 6.8 the default O/S assumed to be in use. (You can still always specify ‘redhat’, ‘sl’ or ‘oel’ if you prefer to use real Red Hat, Scientific Linux or Oracle Enterprise Linux, of course; and you can always specify an earlier distro version if you prefer to stick with (say) 6.3 -though I can’t think why you’d particularly want to).

The other big change is that the bootstrap lines are now trivially easy. Back in 2013 when I first released Churchill, it seemed like a good idea to make it as flexible as possible so that users could specify their own IP addresses and hostnames; but this just made for really lengthy bootstrap lines and confused the heck out of everybody!

So, the simplify brush has been daubed all over Churchill. You now must use the speed keys 1 to 4 as you build your nodes (or you can instead specify their corresponding hostnames). By specifying the speed key or hostname, you automatically define all the pesky details about IP address and interconnect IP address. It makes things a lot simpler and less confusing, I think. It also makes it a bit less flexible… but that’s the price you pay for simplicity. Ask the Gnome developers!

Other changes flow in consequence: the filecopy=y/n parameter is no longer required. If you are building nodes 1 or 3, file copying is assumed to be ‘yes’; if you are building nodes 2 or 4, it’s assumed to be ‘no’. Likewise, there’s no dg=y/n parameter any more: if you are building nodes 1 or 2, it’s assumed to be ‘no’; but build nodes 3 or 4, it’s assumed to be ‘yes’.

It is still possible to say sk=1&rac=n (or hostname=alpher&rac=n), though: that’s if you are building node 1 but want to use it in standalone mode.
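
To put that in concrete terms, a bootstrap line now reduces to something like the following (a sketch only; the URL is a placeholder, only the sk, hostname and rac parameters are taken from the description above, and the precise syntax may differ from the real documentation):

# Hypothetical bootstrap lines: the URL is a stand-in for the real Churchill address.
ks=http://example.com/churchill.ks?sk=1              # node 1 (alpher): IPs, interconnect and file copying all implied
ks=http://example.com/churchill.ks?hostname=dirac    # the same idea by hostname; a Data Guard node, so dg is implied
ks=http://example.com/churchill.ks?sk=1&rac=n        # node 1 again, but built for standalone use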

As with the previous release (1.6), Churchill only works to build 12c standalone and RAC/Data Guard environments: there is no Oracle 11g support these days.

These are substantial changes and mean Churchill now works in ways quite different from before. That obviously affects the way it is documented. Those documentation changes are being made as I write and should be ‘live’ by the time you read this. Since the old versions of Churchill are kept available on the old site, I’ll keep the original documentation available on that old site too, at least for now.

On this site, however, only the 1.7 version will be available and documented.

The Invisible Man

Taking its cue from the news that the Chrome and Firefox browsers are, in their latest releases, starting to flag non-https websites as ‘insecure’, the Dizwell menagerie of sites has now gone all-https.

I believe that anything linked to http will automatically re-direct to the correct https equivalent, but since I don’t have the age of the Universe to check, I can’t be 100% sure about it. Please let me know if you find something that doesn’t seem to be working that once did, particularly if it stops you using something like Atlas.
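
For what it’s worth, the redirection is just the usual blanket rewrite -something along these lines if you happen to be on Apache (the server and the exact snippet are assumptions, not a copy of what this site actually uses):

# Hypothetical .htaccess / vhost snippet: push every plain-http request to https.
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]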

Frankly, there’s not much personal information on this site at the best of times, so there was no real point in making the move before now. That said, it’s also true that Google have been down-rating http links in their search results pages for quite a while, favouring https ones instead. Anything I can do to bump my pages up a notch at Google is a good thing, I guess 🙂

Leave well alone


ZFS not being a viable option on Fedora, I wanted to create a RAID5 array using mdadm, formatted with XFS.

There are lots of suggestions around the web about how best to do the XFS formatting to achieve optimal performance.

Essentially, the optimisation process revolves around setting stripe unit and stripe width values. What the “correct” values are for those settings can vary according to who you read (and, more pertinently, according to what you expect to store on your array: lots of little files, or a preponderance of very big files, like virtual machine images, movies and so on).

On the other hand, there’s a school of thought that says XFS is reasonably smart and if you just let it default to automatically-determined values, things will probably be ok.

So I thought I’d check it out.

First, I created my array and let mdadm complete the initial ‘recovery’, so no background I/O was taking place. Then I formatted it as follows:

mkfs.xfs -f -b size=4096 -d sunit=512,swidth=1024 -L bulkdata /dev/md0
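
For anyone wondering where those numbers come from: sunit and swidth are expressed in 512-byte sectors, sunit being the RAID chunk size and swidth being sunit multiplied by the number of data disks. The values above would therefore correspond to a geometry something like the following (the three-disk layout and 256KiB chunk are my inference, not figures quoted anywhere in this post):

# Hypothetical geometry implied by sunit=512,swidth=1024:
#   chunk size = 512 sectors x 512 bytes = 256KiB          (sunit)
#   data disks = swidth / sunit = 2, i.e. a 3-disk RAID5   (swidth)
# The array itself would then have been created along these lines (device names assumed):
mdadm --create /dev/md0 --level=5 --raid-devices=3 --chunk=256 /dev/sdb /dev/sdc /dev/sdd
# And the same formatting can be written more readably with su/sw:
mkfs.xfs -f -b size=4096 -d su=256k,sw=2 -L bulkdata /dev/md0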

And then I benchmarked the array using Gnome’s built-in disks utility:

So not too shabby: 145MB/sec read, 69 MB/sec write and an access time of 15.5msec.
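
(If you’d rather not use the Disks GUI, rough command-line equivalents look something like this -for the record, these are not what produced the figures above, and the mount point is made up:)

# Sequential read speed straight off the array device:
sudo hdparm -t /dev/md0
# Sequential write speed through the filesystem, bypassing the page cache:
dd if=/dev/zero of=/mnt/bulkdata/ddtest bs=1M count=2048 oflag=direct status=progress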

Then I reformatted the array like so:

mkfs.xfs -L bulkdata /dev/md0

Re-measuring the performance in the same way as before, these were the results:

This time, reads of 165MB/s, writes of 74MB/s and access times of 15.5msec. So, significantly faster reads, moderately faster writes and access times about the same: I’d say letting XFS work it out for itself is probably as good a strategy as any -for my hardware, at any rate.

I repeated the tests multiple times, and I also introduced different formatting options with different values for the stripe unit and width and blocksize: I could certainly get worse results, but I was never able to improve on the ‘just let it work it out’ results.

Your mileage may vary, of course. But me: I’m leaving my XFS array well alone and letting it sort itself out!

Must… Lock… Wallet… Away…

I didn’t know this site existed until today, but it’s fair to say that since I found it (courtesy of a link on the Peppermint Linux website), I am having difficulties keeping my wallet closed.

ToH will not approve of the extravagance of Linux embroidery on my t-shirts, however, so it might (unfortunately) be easier to hold fire than I had first thought… but who knows how long I shall be able to hold out?!

Sadly, they don’t do a Fedora one. Or a Manjaro one.

Mercifully, they don’t do an Oracle Enterprise one either (I guess the t-shirt would have to be tomato red if they did).

The SUSE chameleon one is really nice… but I don’t run that these days, so wearing it would just make me a fraud. Same goes for the Debian swirl-only…

So I figure I may have to settle for a simple Tux logo.

Unless I change my main desktop distro again. Hmmm…

Fed Up, Zed Up?

Spot the problem shown in these two screen grabs:

You will immediately notice from the second screenshot that version 0.6.5.8 of whatever it is which happens to have been screenshotted only supports up to 4.8 kernels, whereas the first screenshot shows that a Fedora 25 installation is using kernel version 4.9. Clearly that Fedora installation won’t be able to run whatever is being referred to in the second screenshot.

So what is it that I’ve taken that second screen shot of? This:

Ooops. It happens to be a screenshot of the current stable release number of the world’s greatest file system for Linux.

Put together, and in plain English, the combination of the two version numbers means: I can’t install ZFS on Fedora.

Or rather, I could have done so when Fedora 25 was freshly installed, straight off the DVD (because it ships with a 4.8 kernel, so the 0.6.5.8 version of ZFS would have worked just fine on that). ZFS on 4.8-kernel-using-Fedora 25 works fine, therefore.

But if I had, say, copied 4.8TB of data onto a freshly created zpool and then updated Fedora, I would now not be able to access my 4.8TB of data at all (because the relevant ZFS kernel modules won’t be able to load into the newly-installed 4.9 kernel). Which sort of makes the ZFS file system a bit less than useful, no?!

Of course, once they release version 0.7 of ZFS (which is currently at release candidate 2 state), then we’re back in business -because ZFS 0.7 supports 4.9 kernels. Unless Fedora go and update themselves to using kernel 4.10, of course… in which case it’s presumably back to being inaccessible once more. And so, in cat-and-mouse fashion, ad infinitum…

But here’s the thing: Fedora is, by design, bleeding edge, cutting edge… you name your edge, Fedora is supposed to be on it! So it is likely to be getting new kernel releases every alternate Thursday afternoon, probably. What chance the ZFS developers will match that release cadence, do you think… given that their last stable release is now 4 months old?

About zilch I’d say. Which gives rise to a certain ‘impedance mismatch’, no? Try running ZFS on Fedora, it seems to me, and you’ll be consigning yourself to quite regularly not being able to access your data at all for weeks or months on end, several times a year. (Point releases of the 4.x kernel have been coming every two or three months since 4.0 was unleashed in April 2015, after all).
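
If you were nonetheless determined to run ZFS on Fedora, the only defence I can think of would be to hold the kernel back until a compatible ZFS release appears -something along these lines, though it’s a sketch and emphatically not a recommendation:

# Blunt instrument: tell dnf to leave the kernel alone until ZFS catches up
# (and remember to remove the line again afterwards!).
echo 'exclude=kernel*' | sudo tee -a /etc/dnf/dnf.conf
# (dnf's 'versionlock' plugin would do the same job more surgically, if you prefer.)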

It strikes me that ZFS and Fedora are, in consequence, not likely to be good bed-fellows, which is a shame.

Perhaps it is time to investigate the data preservative characteristics of Btrfs at last?!

Incidentally, try installing ZFS on a 4.9-kernel-using-Fedora 25 whilst the 0.6.5.8 version of ZFS is the latest-and-greatest on offer and the error you’ll get is this:

The keywords to look for are ‘Bad return status’ and ‘spl-dkms scriptlet failed’. Both mean that the spl-dkms package didn’t get installed, and the net effect of that is the ZFS kernel modules don’t get loaded. In turn, this means trying to issue any ZFS-related commands will fail:

Of course, you will think that you should then do as the error message tells you: run ‘/sbin/modprobe zfs’ manually. It’s only when you try to do so that you see the more fundamental problem:

And there’s no coming back from that. 🙁
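
For anyone hitting this on their own machine, the quickest tell-tale is dkms itself (a generic check, not a transcription of the screenshots above):

# If the spl and zfs modules had built properly, dkms would list them as
# 'installed' against the running kernel; after the failed scriptlet they simply won't appear.
dkms status
uname -r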

No practical ZFS for a distro? That’s a bit of a deal-breaker for me these days.

Recliner Rocker Shocker!

I nearly fell off my chair this week. In my recent and fairly exhaustive trawl through over two dozen distros and their variants, I found one I liked a lot. Which maybe isn’t so chair-topplingly surprising, in and of itself.

The real surprise was that the distro in question was (drum roll, please…): Fedora.

And I nearly fell off my chair backwards when I further found that the desktop environment I liked most in the Fedora context was… Gnome.

It’s clean and uncluttered (especially compared to the busy-ness that is the Fedora KDE spin). It is responsive. And once one has discovered Gnome Shell extensions such as Dash-to-Dock; or used Gnome Tweak Tool to add back Maximize and Minimize buttons to the window decorations… it turns out to be quite highly usable and productive.

All of which surprised me a lot: I have been avoiding anything to do with Gnome in general for quite a few years (since the whole Gnome Shell debacle in 2011) and anything to do with Fedora specifically for some more years than that. But both would appear to have made stealthy progress that impresses this particular traveller from foreign lands (i.e., KDE and Manjaro!) no end. Fedora even looks typographically sane these days. Who would have thought that possible, given their cavalier approach to all things font-y in the past?

The one (quite big) black spot is the lack of an extension that allows Gnome’s windows to wobble or a Desktop Cube to spin (there is one for wobbly windows, but it doesn’t work very well). Can I live without wobbly windows? Possibly, given other things the Gnome environment provides (such as Boxes, which potentially means no more VirtualBox or VMware, though it’s by no means a perfect replacement as yet; and -most impressively- excellent integration with cloud services like Google Drive).

In consequence whereof, I think my days of feeling forced to use KDE might be behind me. I have a spare laptop or two that will become guinea pigs in a ‘transition to Fedora’ experiment this week. We’ll see how it goes…

A Universal Pre-Installer

It is the new year, and nearly my birthday. So I thought I would treat myself to a streamlined and modular way of installing Oracle 12c onto practically any Linux distro I fancied.

Say ‘hi’ to Atlas, a single script that shoulders the burden of doing all the preparatory work needed to get Oracle running nicely.

No matter what distro you’re running, you just download Atlas; you chmod it to make it executable, and then you run it. It sorts out everything else after that for you.
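
In other words, the routine amounts to no more than this (the URL is a placeholder; the real download location is on the landing page mentioned below):

# Hypothetical download location; see the Atlas landing page for the real one.
wget http://example.com/atlas.sh
chmod +x atlas.sh
./atlas.sh    # probably wants root (or sudo), since the preparatory work involves system-level changes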

Atlas therefore replaces the menagerie of per-distro scripts I developed over the past year (eg, Kirk for CentOS; Mandela for Ubuntu; Mercury for Manjaro and so on). Where those per-distro scripts worked to get 11g installed, I’ll keep them (because Atlas is 12c-only), though I won’t maintain them further. But if the distro-specific script only did 12c, it now disappears: Atlas is its complete functional replacement.

I’ve put together a landing page, explaining what specific distros Atlas has been tested with (at the last count somewhere north of 20) and the details of how it works and how to use it.

Whilst I’ve got the thing working on all distros mentioned on that page, distro-specific documentation will take a bit of time to arrive. That for Debian is already done. The others are coming, hopefully before the week is out.

Recompile with -fPIC

Let me start by wishing a happy New Year to all my readers, complete with fireworks from our local council’s display!

And then let’s swiftly move on to the bad news!

If you are interested in installing Oracle onto non-approved Linux distros, you are very soon going to have to contend with this sort of error message:

/usr/bin/ld: /u01/app/oracle/product/12.1.0/db_1/lib//libpls12.a(pci.o): relocation R_X86_64_32 against `.rodata.str1.4' can not be used when making a shared object; recompile with -fPIC

This will be found in the Oracle installer’s log immediately after the “linking phase” of the Oracle installation starts.

Unfortunately, the error message dialog that appears at this point looks like this:

…and that particular error message has long been familiar from 12.1.0.1 installs on assorted distros. The workarounds then were to add various compilation flags to assorted makefiles.

But in this case, the graphical error dialog deceives: for a start, this is happening on 12.1.0.2, and although the dialog text is the same as back in the 12.1.0.1 days, the underlying cause is completely different. It’s only when you inspect the installActions log that you (eventually!) see the error text I showed above, which tells you that this is no “ordinary” compilation problem.

Welcome to the world of position-independent code.

Putting it as simply as I know how, the basic idea of position-independent code is that it allows execution of code regardless of its absolute memory address. It’s thus a ‘good thing’, on the whole.

Trouble is, if objects within the code you’re trying to compile haven’t themselves been compiled to be position-independent, then you aren’t yourself allowed to compile the code that references them into shared libraries.

As the error message above says, since “pci.o” isn’t position-independent, you can’t compile references to it into the libpls12 library. Note that the error message does not mean that your attempt to compile libpls12 should use -fPIC: if it meant that, you could do something about it. No: it’s telling you that pci.o was compiled by Oracle without -fPIC. Only if they re-compile that object with the -fPIC compiler option switched on would you then be able to compile it into the libpls12 library successfully.
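
You can reproduce the same class of failure in miniature, well away from Oracle (a toy sketch: the file is made up, and I’m forcing non-PIC code explicitly where Oracle’s objects simply shipped that way):

# toy.c: any function that references a string constant will do.
echo 'const char *msg(void){ return "hello"; }' > toy.c

gcc -fno-pic -c toy.c -o toy.o    # build a deliberately non-PIC object...
gcc -shared toy.o -o libtoy.so    # ...and the link fails with the familiar
                                  # "recompile with -fPIC" complaint
gcc -fPIC -c toy.c -o toy.o       # rebuild the object position-independently...
gcc -shared toy.o -o libtoy.so    # ...and the shared-library link now succeeds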

If you’re best mates with Larry, then, perhaps you’ll be able to get him to do the necessary recompilations for you! Mere mortals, however, are stuck with the unhappy fact that vast swathes of the Oracle 12c code-base has not been compiled to be position-independent… and there’s nothing you can do to fix that when your version of gcc insists that it should be.

The problem doesn’t manifest itself on all distros: Ubuntu 16.04, for example, has no problem installing 12c at all (apart from the usual ones associated with not using a supported distro, of course!) But Ubuntu 16.10 does have this fatal problem. Similarly, Debian 8 is fine with Oracle 12c, but Debian 9 (the testing branch, so not yet ready for mainstream release) fails. And whereas Manjaro didn’t have a problem earlier in the year when I first released my mercury pre-installer script, it does now.

This, of course, gives us a clue: there’s clearly some component of these distros which is being upgraded over time so that later releases fail where earlier ones didn’t. So what’s the upgraded component causing all the trouble?

Perhaps unsurprisingly, given that it’s a compilation error that shows you there’s a problem in the first place, it turns out that the gcc compiler is the culprit.

If you do a fresh install of Ubuntu 16.04 (the long-term support version, so still very much current and relevant), whilst making sure NOT to update anything as part of the installation process itself, issuing the command gcc -v will show you that version 5.4.0 is in use. Do the same thing on Ubuntu 16.10, however, and you’ll discover you’re now using gcc version 6.2.0.

A fresh Debian 8.6 install, subjected to an apt-get install gcc command,  ends up running gcc version 4.9.2. The same thing done to a fresh Debian 9 install results in a gcc version of 6.2.1.

Manjaro is a rolling release, of course, so its software components are forever being incrementally upgraded: it makes finding out what gcc version was in use at the start of the year rather tricky! So I don’t have hard evidence for the gcc version shift there -but my main desktop is currently reporting version 6.2.1, so I’ll stick my neck out and say that, had I checked back in January 2016, I would probably have found it to be around version 5.something.

In short, for all three distros currently under my microscope, a shift from gcc 4- or 5-something to 6-something has taken place… and broken Oracle’s installation routine in the process.

It means that all distros will come across this compilation problem eventually, as they upgrade their gcc versions. Expect Fedora to keel over in short order, for example, when their version 26 is released next April (assuming they go for a late-release version of gcc 6.something, which I expect they will). No doubt we’ll have all moved on to installing Oracle 12c Release 2 by then, which is probably suitably position-independent throughout… so maybe no-one will ever have to worry about this issue again. But in the meantime… the constantly changing nature of gcc is a problem.

So, what’s to be done if you want Oracle 12c installed on these distros with their fancy new gcc versions? Nothing I could really think of, except to ensure that the old, functional versions of gcc and related development tools are installed …and that can be easier said than done!

On Debian 9 (‘testing’), for example, instead of just saying apt-get install gcc, you now need to say apt-get install gcc-5, which ensures the ‘right’ compiler version is installed, with which the Oracle installer can live. Thus can this screenshot be taken:

…which shows me happily querying the EMP table on 12.1.0.2 whilst demonstrating that I’m running Debian Testing (codenamed “Stretch”). That’s only possible by careful curating of your development tool versions.

The same sort of ‘install old gcc version and make it the default’ trick is required to get 12c running on Ubuntu 16.10 too:

-though I had to specifically install gcc-4.9 rather than “gcc-5”, since the compilation error still arose when ‘gcc-5’ was installed. These things get tricky!
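
For completeness, the ‘make it the default’ part can be done along these lines (a sketch; the priority number is arbitrary, and plain symlinks would work just as well):

# Ubuntu 16.10, hypothetically: install the older compiler...
sudo apt-get install gcc-4.9
# ...and point the unversioned 'gcc' command at it while the Oracle installer runs.
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 100
gcc -v    # should now report 4.9.x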

Anyway: there it is. Gcc’s constant version increments create havoc with Oracle 12c installations. Coming to a distro near you, soonish!