Timing…

If the secret of comedy is timing, then Oracle must be the funniest corporation around!

With less than 24 hours to go before my flight back to the UK, they make 12c Release 2 available for general use.

Naturally, all my PCs and servers are packed, so I only have a feeble airplane-ready spare to do anything on. I have managed a 12cR2 install using Atlas on CentOS 7, so I can at least confirm the new version doesn’t really change anything much as far as the installation process is concerned and hardly anything about Atlas needs to be modified to deal with it. But I haven’t tested all the other distros Atlas claims to work with, so I’m not releasing a 12cR2-ready version of Atlas yet. First order of business when I reach Blighty’s shores, I guess.

The major drama as far as 12cR2 is concerned, from my point of view, is that the EMP table doesn’t exist by default any more 🙁 (but rdbms/admin/utlsampl.sql will create it if needed, as it has always done). A sad day indeed, then.

Plus there are thunderstorms forecast for Sydney tomorrow. It’ll be a bumpy take-off…

Finished

Finished! And only three weeks late!

I’m referring to Churchill 1.7, a major overhaul of the Churchill framework and its accompanying documentation, making it work on RHCSL 6.8, for 12c only, with Enterprise Manager Cloud Control 13c and a complete overhaul of the bootstrap options available when kickstarting O/S installations.

At some point toward the end of January, it morphed into practically a complete re-write… and I thought seriously about calling it quits and declaring it to be version 2.0. But I’ve stuck with the incremental versioning for now. (I’ve been saving version 2 for when I get round to making it work with RHCSL 7.x distros).

I’m finished in another sense, though, too: the contract to purchase a house in Nottingham is ready to sign and it accordingly looks very much as though I’ll be becoming an ex-Aussie (or a re-Englishman, I suppose, depending on your point of view) on or around 6th March. I may not have much time to post much here given the packing, flight-booking, passport-checking, Internet banking, etc etc shenanigans that now ensue. If I can I will, but otherwise I’ll be back online toward the end of March, live from Nottingham 🙂

What were the odds?

In modernising Churchill to work for Oracle 12c and the latest 6.x releases of RHCSL, I’ve encountered a bizarre bug (#19476913 if you’re able to check up on it), whereby startup of the cluster stack on a remote node fails if its hostname is longer than (or equal to) the hostname of the local node.

That is, if you are running the Grid Infrastructure installer from Alpher (6 characters) and pushing to Bethe (5 characters) then the CRS starts on Bethe just fine: local 6 is greater than remote 5. But if you are running the GI installer on Gamow (5 characters) and pushing to Dalton (6 characters) then the installer’s attempt to restart the CRS on Dalton will fail, since now local 5 is less than remote 6. Alpher/Bethe managed to dodge this bullet, of course -but only by pure luck.

The symptoms are that during the installation of Grid Infrastructure, all works well until the root scripts are run, at which point (and after a long wait), this pops up:

Poke around in the [Details] of that dialog and you’ll see this:

CRS-2676: Start of 'ora.cssdmonitor' on 'dalton' succeeded 
CRS-2672: Attempting to start 'ora.cssd' on 'dalton' 
CRS-2672: Attempting to start 'ora.diskmon' on 'dalton' 
CRS-2676: Start of 'ora.diskmon' on 'dalton' succeeded 
CRS-2676: Start of 'ora.cssd' on 'dalton' succeeded 
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'dalton' 
CRS-2672: Attempting to start 'ora.ctssd' on 'dalton' 
CRS-2883: Resource 'ora.ctssd' failed during Clusterware stack start. 
CRS-4406: Oracle High Availability Services synchronous start failed. 
CRS-4000: Command Start failed, or completed with errors. 2017/02/18 10:21:41 
CLSRSC-117: Failed to start Oracle Clusterware stack Died at /u01/app/12.1.0/grid/crs/install/crsinstall.pm line 914.

The installation log is not much more useful: it just documents everything starting nicely until it fails for no discernible reason when trying to start ora.ctssd.

Take exactly the same two nodes and do the installation from the Dalton node, though, and everything just works -so it’s not, as I first thought it might be, something to do with networks, firewalls, DNS names resolution or the myriad other things that RAC depends on being ‘right’ before it will work. It’s purely and simply a matter of whether the local node’s name is longer or shorter than the remote node’s!

The problem is fixed in PSU 1 for 12.1.0.2, but it’s inappropriate to mandate its use in Churchill, since that’s supposed to work with the vanilla software available from OTN (I assume my readers lack support contracts, so everything has to work as-supplied from OTN for free).

The obvious fix for Churchill, therefore, is to (a) either make the ‘Gamow’ name one character longer (maybe spell it incorrectly as ‘gammow’?); or find a ‘D’ name that is both a physicist and only 4 characters long or fewer; or (c) change both names ensuring that the second is shorter than the first.

Largely due to the distinct lack of short-named, D-named physicists, I’ve gone for the (c) option: Churchill 1.7 therefore builds its Data Guard cluster using hosts geiger and dirac. Paul Dirac (that’s him on the top-left) was an English theoretical physicist, greatly admired by Richard Feynman (which makes him something of a star in these parts) and invented the relativistic equation of motion for the wave function of the electron. He used his equation to predict the existence of the positron -and of anti-matter in general, something for which he won a share of the 1933 Nobel prize for physics. Geiger is a frankly much less distinguished physicist whose main claim to fame is that he invented (most of) the Geiger counter and wasn’t (apparently) a Nazi. He gets into the Churchill Pantheon by the skin of his initial letter and not much else, to be honest!

Short version then: Churchill 1.7 now uses Alpher/Bethe and Geiger/Dirac clusters, and both Gamow and Dalton are no more. Quite a bit of documentation needs updating to take account of this trivial change! Hopefully, I should have that sorted by the end of the day. And that will teach me to test all parts of Churchill before declaring that ‘it works with 12c’. (Oooops!)

Churchill Changes

I’ve just published version 1.7 of Churchill. It contains a lot of changes.

The main one is that I’ve removed a few dependencies on .i686 packages. That means the O/S installations can now all take place off the first DVD alone. No second DVD is prompted for, in other words.

This in turn brings about the biggest single benefit of the new release: it works with CentOS 6.8 (and Scientific Linux 6.8, too).

In fact, Churchill 1.7 now reverts to making CentOS 6.8 the default O/S assumed to be in use. (You can still always specify ‘redhat’, ‘sl’ or ‘oel’ if you prefer to use real Red Hat, Scientific Linux or Oracle Enterprise Linux, of course; and you can always specify an earlier distro version if you prefer to stick with (say) 6.3 -though I can’t think why you’d particularly want to).

The other big change is that the bootstrap lines are now trivially easy. Back in 2013 when I first released Churchill, it seemed like a good idea to make it as flexible as possible so that users could specify their own IP addresses and hostnames; but this just made for really lengthy bootstrap lines and confused the heck out of everybody!

So, the simplify brush has been daubed all over Churchill. You now must use the speed keys 1 to 4 as you build your nodes (or you can instead specify their corresponding hostnames). By specifying the speed key or hostname, you automatically define all the pesky details about IP address and interconnect IP address. It makes things a lot simpler and less confusing, I think. It also makes it a bit less flexible… but that’s the price you pay for simplicity. Ask the Gnome developers!

Other changes flow in consequence: the filecopy=y/n parameter is no longer required. If you are building nodes 1 or 3, file copying is assumed to be ‘yes’; if you are building nodes 2 or 4, it’s assumed to be ‘no’. Likewise, there’s no dg=y/n parameter any more: if you are building nodes 1 or 2, it’s assumed to be ‘no’; but build nodes 3 or 4, it’s assumed to be ‘yes’.

It is still possible to say sk=1&rac=n (or hostname=alpher&rac=n), though: that’s if you are building node 1 but want to use it in standalone mode.

As with the previous release (1.6), Churchill only works to build 12c standalone and RAC/Data Guard environments: there is no Oracle 11g support these days.

These are substantial changes and mean Churchill now works in ways quite different from before. That obviously affects the way it is documented. Those documentation changes are being made as I write and should be ‘live’ by the time you read this. Since the old versions of Churchill are kept available on the old site, I’ll keep the original documentation available on that old site too, at least for now.

On this site, however, only the 1.7 version will be available and documented.

A Universal Pre-Installer

It is the new year, and nearly my birthday. So I thought I would treat myself to a streamlined and modular way of installing Oracle 12c onto practically any Linux distro I fancied.

Say ‘hi’ to Atlas, a single script that shoulders the burden of doing all the preparatory work needed to get Oracle running nicely.

No matter what distro you’re running, you just download Atlas; you chmod it to make it executable, and then you run it. It sorts out everything else after that for you.

Atlas therefore replaces the menagerie of per-distro scripts I developed over the past year (eg, Kirk for CentOS; Mandela for Ubuntu; Mercury for Manjaro and so on). Where those per-distro scripts worked to get 11g installed, I’ll keep them (because Atlas is 12c-only), though I won’t maintain them further. But if the distro-specific script only did 12c, it now disappears: Atlas is its complete functional replacement.

I’ve put together a landing page, explaining what specific distros Atlas has been tested with (at the last count somewhere north of 20) and the details of how it works and how to use it.

Whilst I’ve got the thing working on all distros mentioned on that page, distro-specific documentation will take a bit of time to arrive. That for Debian is already done. The others are coming, hopefully before the week is out.

Recompile with -fPIC

Let me start by wishing a happy New Year to all my readers, complete with fireworks from our local council’s display!

And then let’s swiftly move on to the bad news!

If you are interested in installing Oracle onto non-approved Linux distros, you are very soon going to have to contend with this sort of error message:

/usr/bin/ld: /u01/app/oracle/product/12.1.0/db_1/lib//libpls12.a(pci.o): relocation R_X86_64_32 against `.rodata.str1.4' can not be used when making a shared object; recompile with -fPIC

This will be found in the Oracle installer’s log immediately after the “linking phase” of the Oracle installation starts.

Unfortunately, the error message dialog that appears at this point looks like this:

…and that particular error message has long been familiar from 12.1.0.1 installs on assorted distros. The workarounds then were to add various compilation flags to assorted makefiles.

But in this case, the graphical error dialog deceives: for a start, this is happening on 12.1.0.2, and although the dialog text is the same as back in the 12.1.0.1 days, the underlying cause is completely different. It’s only when you inspect the installActions log do you (eventually!) see the error text I showed above, which tells you that this is no “ordinary” compilation problem.

Welcome to the world of position-independent code.

Putting it as simply as I know how, the basic idea of position-independent code is that it allows execution of code regardless of its absolute memory address. It’s thus a ‘good thing’, on the whole.

Trouble is, if objects within the code you’re trying to compile haven’t themselves been compiled to be position-independent, then you aren’t yourself allowed to compile the code that references them into shared libraries.

As the error message above says, since “pci.o” isn’t position-independent, you can’t compile references to it into the libpls12 library. Note that the error message does not mean that your attempt to compile libpls12 should use fPIC: if it meant that, you could do something about it. No: it’s telling you that pci.o was compiled by Oracle without fPIC. Only if they re-compile that object with the fPIC compiler option switched on would you then be able to compile it into the libpls12 library successfully.

If you’re best mates with Larry, then, perhaps you’ll be able to get him to do the necessary recompilations for you! Mere mortals, however, are stuck with the unhappy fact that vast swathes of the Oracle 12c code-base has not been compiled to be position-independent… and there’s nothing you can do to fix that when your version of gcc insists that it should be.

The problem doesn’t manifest itself on all distros: Ubuntu 16.04, for example, has no problem installing 12c at all (apart from the usual ones associated with not using a supported distro, of course!) But Ubuntu 16.10 does have this fatal problem. Similarly, Debian 8 is fine with Oracle 12c, but Debian 9 (the testing branch, so not yet ready for mainstream release) fails. And whereas Manjaro didn’t have a problem earlier in the year when I first released my mercury pre-installer script, it does now.

This, of course, gives us a clue: there’s clearly some component of these distros which is being upgraded over time so that later releases fail where earlier ones didn’t. So what’s the upgraded component causing all the trouble?

Perhaps unsurprisingly, given that it’s a compilation error that shows you there’s a problem in the first place, it turns out that the gcc compiler is the culprit.

If you do a fresh install of Ubuntu 16.04 (the long-term support version, so still very much current and relevant), whilst making sure NOT to update anything as part of the installation process itself, issuing the command gcc -v will show you that version 5.4.0 is in use. Do the same thing on Ubuntu 16.10, however, and you’ll discover you’re now using gcc version 6.2.0.

A fresh Debian 8.6 install, subjected to an apt-get install gcc command,  ends up running gcc version 4.9.2. The same thing done to a fresh Debian 9 install results in a gcc version of 6.2.1.

Manjaro is a rolling release, of course, so it’s software components are forever being incrementally upgraded: it makes finding out what gcc version was in use at the start of the year rather tricky! So I don’t have hard evidence for the gcc version shift there -but my main desktop is currently reporting version 6.2.1, so I’ll stick my neck out and say that I would lay odds that, had I checked back in January 2016, I think I would have found it to be around version 5.something.

In short, for all three distros currently under my microscope, a shift from gcc 4- or 5-something to 6-something has taken place… and broken Oracle’s installation routine in the process.

It means that all distros will eventually come across this compilation problem as they eventually upgrade their gcc versions. Expect Fedora to keel over in short order, for example, when their version 26 is released next April (assuming they go for a late-release version of gcc 6.something, which I expect they will). No doubt we’ll have all moved on to installing Oracle 12c Release 2 by then, which is probably suitably position-independent throughout… so maybe no-one will ever have to worry about this issue again. But in the meantime… the constantly changing nature of gcc is a problem.

So, what’s to be done if you want Oracle 12c installed on these distros with their fancy new gcc versions? Nothing I could really think of, except to ensure that the old, functional versions of gcc and related development tools are installed …and that can be easier said than done!

On Debian 9 (‘testing’), for example, instead of just saying apt-get install gcc, you need to now say apt-get install gcc-5, which ensures the ‘right’ compiler version is installed, with which the Oracle installer can live. Thus can this screenshot be taken:

…which shows me happily querying the EMP table on 12.1.0.2 whilst demonstrating that I’m running Debian Testing (codenamed “Stretch”). That’s only possible by careful curating of your development tool versions.

The same sort of ‘install old gcc version and make it the default’ trick is required to get 12c running on Ubuntu 16.10 too:

-though I had to specifically install gcc-4.9 rather than “gcc-5”, since the compilation error still arose when ‘gcc-5’ was installed. These things get tricky!

Anyway: there it is. Gcc’s constant version increments create havoc with Oracle 12c installations. Coming to a distro near you, soonish!

Try it, Sam. Try Fedora.

bluefedora02Fedora 25, the latest release of the Fedora project, was released on 22nd November. Unfortunately, I was busy celebrating Benjamin Britten’s birthday and thus missed the whole thing.

I’d already tested my Bogart script (to help install Oracle 12c on Fedora) with the first beta of the release, so I wasn’t expecting any problems with the eventual production release -and there aren’t any. The thing works as advertised.

Fedora remains one of my ‘bargepole’ distros (as in, “I wouldn’t touch it with a…”), but I know a lot of people that like it for some reason or other, so Bogart is for them!

My Dear Watson…

I was idly browsing through Distrowatch today and noticed that at Number 5 on their list of popular distros for the past six months was one called Elementary OS.

It’s not one I’ve ever heard of before, but it’s Ubuntu-derived, and I got to wondering (a) what it was like and (b) would it run Oracle?

In regards to question (a)… it’s a fairly weird desktop and I don’t particularly like it. The default browser is Epiphany, which is about as non-mainstream as you can get and sets the tone for much else. Fancy changing the colour theme of the default terminal? Well, you can’t (at least, there’s no obvious place to do it that I could find). Fancy writing a letter? Or doing a quick bit of spreadsheeting? Tough, then… because there’s no default Office suite installed (you can add one, obviously. But the point remains that the distro appears to be lacking much of what you’d expect to be included in a modern Linux distro). Windows are closed by clicking on the left-hand side of their top bar, where most “normal” folk are used to clicking in the top right-hand corner. You can’t minimise windows, either: there’s one button that acts as a generic ‘resizer’. Click once to maximise and click twice to resize back to its previous non-maximised size. If you want actual minimization, you have to right-click the app’s title bar and select that option from the context menu… which doesn’t seem terribly sensible to me.

It’s mostly all remediable, I suppose, but you’re in for a lot of configuration and downloading …and I can’t frankly see why you’d go to the effort.

On the other hand, I found the distro very fast (to start, to reboot, to open apps, to close them). It seemed extremely responsive… but I still don’t think that makes the lack of normality a sensible trade-off, personally.

So it’s not a distro I’m going to be doing much with, albeit after a fairly cursory look at it. (Read this article for lots more screenshots and what seems to me a reasonable overall assessment). But I can see that someone wanting a fast, straightforward distro that doesn’t invite a lot of tinkering, would find a lot in it that pleases. (I can imagine parents finding its interface useful for young kids, for example).

I guess that means I needn’t worry too much about whether Oracle runs on it or not, then: there aren’t going to be too many young kids clamouring for it, after all!

But it does. Its Ubuntu heritage means you can download my Mandela script, and make one alteration to it: change line 122 to read:

if [ "$VERCHECK" != "Description: elementary OS 0.4 Loki" ]; then

(i.e., alter the text that the command lsb_release -d is tested to return). Once that’s done, run the script and reboot …and then proceed as if the distro were actually Ubuntu (as described in this article) and you’ll have 12c Enterprise Edition running on it in no time at all:

virtualbox_holmes_28_11_2016_17_41_10

Anyway: for a bit of fun on a wet Wednesday (or Monday!) afternoon, it’s worth a dabble, but I’m frankly surprised the distro rates as highly as it does on Distrowatch and I wouldn’t be losing sleep over it if you’ve never tried it. 🙂

New openSUSE

opensuse-64-bit-2016-11-19-16-03-09I was a bit busy this week, what with having my second cataract removed and all… so I missed the release of a new version of openSUSE on 16th November.

However, I’ve caught up now and as the thumbnail to the left attests, my quatro script works on the new release just fine to facilitate easy Oracle Enterprise Edition installations.

See the Quatro main page for screen-shotted walk-through details of how to perform the installation.

State of Play revisited

churchill150I mentioned some time ago that Churchill was looking pretty sick since no RHCSL 6.x distro with a version number greater than 6.6 seemed to deliver its software on a single DVD any more, and since multi-DVD installs break Churchill, Churchill was doomed.

But I was (happily) quite wrong on that point. It turns out that, in fact, CentOS is the only RHCSL distro that has chosen to package its distro in such a way that it breaks Churchill.

I’ve checked OEL 6.6, 6.7 and 6.8; and then I checked Scientific 6.6., 6.7 and 6.8; and I even checked genuine Red Hat 6.6, 6.7 and 6.8… and every single one of them works fine with Churchill.

So, as I mentioned last time, that now means Scientific Linux is the distro that Churchill will assume is in use if you use its ‘speed keys’ to build your Oracle server nodes. (Rather more bluntly, it means I’ve stopped using CentOS at all, for anything). You can still use OEL or Red Hat if you prefer, by spelling out your requirements on the boot strap line. But use a speed key to save some typing, and SL will be assumed.

And, of course, if you want to use CentOS 6.7 or 6.8 to build your Oracle servers using Churchill, forget it: because they just don’t work.

Why is the new default Scientific Linux and not, say, OEL? Simply because it’s (a) free to obtain without signing up for anything and (b) it’s not as butt-ugly as OEL when you boot it up (white text on a red background is neither pretty nor particularly legible).

Other than this issue of CentOS stopping working, the latest version of Churchill (we’re now up to version 1.6) introduces a rather more radical change of tack: no more support for Oracle 11g installations, of any sort, whatsoever. Of course, version 1.5 is still available which does still do 11g quite nicely -but 11g is so long in the tooth now, and completely out of support unless you happen to be using 11.2.0.4, for which an expensive paid support agreement is needed before it can even be downloaded, that it seemed a waste of time to keep 11g support in newer Churchill releases. So, 12c only it is.

Which of course means that Churchill’s hardware requirements have just gone through the roof to match Oracle’s own. A 2GB VM just doesn’t cut it any more. The new hardware minima are: 5GB RAM and a 40GB hard disk. So building a 2-node RAC and a 2-node Active Data Guard setup on your desktop now basically demands a 32GB desktop… and no laptops need apply.

All of which in turn means that I think it might be time to re-write the Churchill documentation (especially as it currently exists only as pages on the old version of this site): take the opportunity to spruce it up, refresh it, make it part of the native WordPress site and so on. Expect something soon-ish.