Catch Up Delayed…

Sadly, my uncle died on Easter Sunday, at the (for our family!) grand age of 82. My aunt is not taking it particularly well which, given they were married for 59 years, is not entirely surprising.

I will say that getting 12cR2 installed on Ubuntu-based distros is not just a question of altering the gcc version in use, as I had hoped in my last post: I tried that without success. Meaning that we are still without 12cR2 working on any Ubuntu-based distro and with no obvious, simple fixes.

And, I’m afraid, given the need to organise family stuff and deal with funeral attendance etc., it’s unlikely I’m going to get much further with Ubuntu and 12cR2 for some weeks to come.

I’ll get back to it when I can. Meantime, please forgive and understand the posting silence that will now ensue for some time.

Atlas and 12cR2

Continuing the saga of “catching up” with software developments whilst I was en route to the UK from Australia, the question next to be addressed is simply: does Atlas work to allow Oracle 12c Release 2 to be installed onto the many and various distros it claims to support.

The answer is a bit mixed, I’m afraid!

First thing to say: I deliberately coded Atlas to specifically declare 12cR2 ‘didn’t work yet’ when I first released it. So, for it to work for 12cR2 at all, that bit of code has to be removed -and that means Atlas straightaway gets bumped to version 1.2.

Once that code has gone, the good news is that Atlas will successfully help 12cR2 be installed on the majority of the distros it works with (15 out of 19, as it happens). But the bad news is that if your distro is Ubuntu or even just Ubuntu-based, then the linking phase of the Oracle 12c Release 2 installation fails irreparably after Atlas has been run.

The complete list of distros and their results with Atlas for 12c Release 2 is therefore as follows:

Debian 8.2+ ............................ Works fine
Linux Mint 18+ ......................... Fails
Mint Debian Edition 2+ ................. Works fine
Red Hat ES 7.0+ ........................ Works fine
Scientific Linux 7.0+ .................. Works fine
CentOS 7.0+ ............................ Works fine
OpenSuse Leap 42+ ...................... Works fine
Antergos  2016.11+ ..................... Works fine
elementary OS 0.4+ ..................... Fails
Mageia 5+ .............................. Works fine
Korora 25+ ............................. Works fine
Zorin Core 12 .......................... Fails
Ubuntu 16+ ............................. Fails
Manjaro 15+ ............................ Works fine 
Fedora 23+ ............................. Works fine
Peppermint Linux 7+ .................... Fails
GeckoLinux Static 422+ ................. Works fine
Chapeau Linux 24+ ...................... Works fine
PCLinuxOS 2016+ ........................ Works fine

As I say, the pattern is pretty obvious and I suspect the problem is that the gcc versioning tricks I had to pull to get 12cR1 to compile properly on any Ubuntu-based distro back in January are now the cause of woes for 12cR2. The failure always manifests itself in the following manner:

There’s no recovering from that as yet, but I hope to get it sorted within the week.

On the other hand, for any distro listed as working fine in the above list, you can expect the usual plain sailing:

That’s specifically Oracle 12.2.0.1 running on GeckoLinux, which is a spin of openSUSE Tumbleweed, but you get the same outcome for all non-Ubuntu distros.

Whilst testing all this, I discovered a couple of distros which had incremented their version strings since January (Manjaro, for instance, now reports itself as version ’17.something’, so the part of Atlas where it checked for version strings containing the numbers 15 or 16 obviously needed updating). Those sorts of versioning updates are now also included in Atlas 1.2.

There is very little updating of the Atlas doco to do, happily. For a start, you will obtain the new Atlas version just by running exactly the same wget command as was previously documented: the URL alias simply points to the latest version, but the URL itself doesn’t change.

When you install Oracle 12cR2 onto any of these non-standard distros (except the RHCSL ones, of course), you will get this dire-looking warning:

It’s better than the 12cR1 equivalent, which was to say ‘your system is inadequate’! Anyway, for all the distros for which Atlas works at all, it’s perfectly OK to say ‘yes, I want to continue’. The installation will succeed anyway.

Have fun… and wish me luck whilst battling with Ubuntu later this week 🙂

Atlas, Ubuntu 17 and 12cR1

As I mentioned last post, one major issue that has arisen since I left Australia is this: does Atlas run on the freshly-released Ubuntu 17.04 to make Oracle 12c Release 1 (12.1.0.2) installations a piece of cake? It was working fine on 16.04 and 16.10; but would things break with the fresh 17.04 release?

Here’s the answer:

To be fair, I had to alter the Atlas script in one tiny respect: it originally tested for the existence of ’16’ as one component of the distro’s version string. That was obviously a bit restrictive! I’ve now added ’17’ as a test, too… and that means Atlas’s version bumps to 1.1.

You don’t need to do anything different than was originally documented, though: the ‘wget http://bit.do/dizatlas -O atlas.sh‘ instruction works as it always did. It’s just that it now downloads the newer Atlas script instead of the original.

Catch Up

Well, it’s been a rather longer break than I had expected, but I now found myself living in my new 1930s house in sunny Nottingham. The sale finally completed on March 24th; I moved in on April 1st; the furniture from Australia doesn’t get here until early June. In the meantime, I have one sofa bed, 2 plates, 2 cups, 3 pairs of knives and forks and a kettle to live with. A comfortable transition it is not!

But I have proper Internet (50Mb/sec seems like luxury to me, anyway) and the cats arrived safely from Down Under just in time for Easter. They seem to have survived the journey without mishap and don’t appear to be too put out about it. Finding an English cat food they actually like is taking time and many trips to different supermarkets, however!

Short version: private life is still a bit on-hold and will be until at least June, after which we can think of renovating the place properly and turning it into the Art Deco palace we think it could be.

But back to techie stuff: there are two major items of interest that require some catching up, I think. The first is the fact that Ubuntu 17.04 was released on 13th April. Does Atlas work with that as it did for 16.10?

Secondly, and much more substantially: does Atlas work with the new Oracle 12c Release 2 version, for any or all of the distros that it was working with for 12cR1?

This being Day 2 of having a functioning Internet, and all my servers still in a cargo ship’s container somewhere between Sydney and Suez, it’s going to take a while before I can definitively address these issues. But I am at least back on the job… whenever the cats will get off my keyboard, at least.

Timing…

If the secret of comedy is timing, then Oracle must be the funniest corporation around!

With less than 24 hours to go before my flight back to the UK, they make 12c Release 2 available for general use.

Naturally, all my PCs and servers are packed, so I only have a feeble airplane-ready spare to do anything on. I have managed a 12cR2 install using Atlas on CentOS 7, so I can at least confirm the new version doesn’t really change anything much as far as the installation process is concerned and hardly anything about Atlas needs to be modified to deal with it. But I haven’t tested all the other distros Atlas claims to work with, so I’m not releasing a 12cR2-ready version of Atlas yet. First order of business when I reach Blighty’s shores, I guess.

The major drama as far as 12cR2 is concerned, from my point of view, is that the EMP table doesn’t exist by default any more 🙁 (but rdbms/admin/utlsampl.sql will create it if needed, as it has always done). A sad day indeed, then.

Plus there are thunderstorms forecast for Sydney tomorrow. It’ll be a bumpy take-off…

Fedora ZFS

Further to my recent post, the good news is that the ZFS team have released a version of ZFS which will work on the latest Fedora:

The 0.6.5.9 version supports kernels up to 4.10 (which is handy, as that’s the next kernel release I can upgrade my Fedora 25 to, so there’s some future-proofing going on there at last). And I can confirm that, indeed, ZFS installs correctly now.

I’m still a bit dubious about ZFS on a fast-moving distro such as Fedora, though, because just a single kernel update could potentially render your data inaccessible until the good folk at the ZFS development team decide to release new kernel drivers to match. But the situation is at least better than it was.

With a move to the UK pending, I am disinclined to suddenly start wiping terabytes of hard disk and dabbling with a new file system… but give it a few weeks and who knows?!

Finished

Finished! And only three weeks late!

I’m referring to Churchill 1.7, a major overhaul of the Churchill framework and its accompanying documentation, making it work on RHCSL 6.8, for 12c only, with Enterprise Manager Cloud Control 13c and a complete overhaul of the bootstrap options available when kickstarting O/S installations.

At some point toward the end of January, it morphed into practically a complete re-write… and I thought seriously about calling it quits and declaring it to be version 2.0. But I’ve stuck with the incremental versioning for now. (I’ve been saving version 2 for when I get round to making it work with RHCSL 7.x distros).

I’m finished in another sense, though, too: the contract to purchase a house in Nottingham is ready to sign and it accordingly looks very much as though I’ll be becoming an ex-Aussie (or a re-Englishman, I suppose, depending on your point of view) on or around 6th March. I may not have much time to post much here given the packing, flight-booking, passport-checking, Internet banking, etc etc shenanigans that now ensue. If I can I will, but otherwise I’ll be back online toward the end of March, live from Nottingham 🙂

What were the odds?

In modernising Churchill to work for Oracle 12c and the latest 6.x releases of RHCSL, I’ve encountered a bizarre bug (#19476913 if you’re able to check up on it), whereby startup of the cluster stack on a remote node fails if its hostname is longer than (or equal to) the hostname of the local node.

That is, if you are running the Grid Infrastructure installer from Alpher (6 characters) and pushing to Bethe (5 characters) then the CRS starts on Bethe just fine: local 6 is greater than remote 5. But if you are running the GI installer on Gamow (5 characters) and pushing to Dalton (6 characters) then the installer’s attempt to restart the CRS on Dalton will fail, since now local 5 is less than remote 6. Alpher/Bethe managed to dodge this bullet, of course -but only by pure luck.

The symptoms are that during the installation of Grid Infrastructure, all works well until the root scripts are run, at which point (and after a long wait), this pops up:

Poke around in the [Details] of that dialog and you’ll see this:

CRS-2676: Start of 'ora.cssdmonitor' on 'dalton' succeeded 
CRS-2672: Attempting to start 'ora.cssd' on 'dalton' 
CRS-2672: Attempting to start 'ora.diskmon' on 'dalton' 
CRS-2676: Start of 'ora.diskmon' on 'dalton' succeeded 
CRS-2676: Start of 'ora.cssd' on 'dalton' succeeded 
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'dalton' 
CRS-2672: Attempting to start 'ora.ctssd' on 'dalton' 
CRS-2883: Resource 'ora.ctssd' failed during Clusterware stack start. 
CRS-4406: Oracle High Availability Services synchronous start failed. 
CRS-4000: Command Start failed, or completed with errors. 2017/02/18 10:21:41 
CLSRSC-117: Failed to start Oracle Clusterware stack Died at /u01/app/12.1.0/grid/crs/install/crsinstall.pm line 914.

The installation log is not much more useful: it just documents everything starting nicely until it fails for no discernible reason when trying to start ora.ctssd.

Take exactly the same two nodes and do the installation from the Dalton node, though, and everything just works -so it’s not, as I first thought it might be, something to do with networks, firewalls, DNS names resolution or the myriad other things that RAC depends on being ‘right’ before it will work. It’s purely and simply a matter of whether the local node’s name is longer or shorter than the remote node’s!

The problem is fixed in PSU 1 for 12.1.0.2, but it’s inappropriate to mandate its use in Churchill, since that’s supposed to work with the vanilla software available from OTN (I assume my readers lack support contracts, so everything has to work as-supplied from OTN for free).

The obvious fix for Churchill, therefore, is to (a) either make the ‘Gamow’ name one character longer (maybe spell it incorrectly as ‘gammow’?); or find a ‘D’ name that is both a physicist and only 4 characters long or fewer; or (c) change both names ensuring that the second is shorter than the first.

Largely due to the distinct lack of short-named, D-named physicists, I’ve gone for the (c) option: Churchill 1.7 therefore builds its Data Guard cluster using hosts geiger and dirac. Paul Dirac (that’s him on the top-left) was an English theoretical physicist, greatly admired by Richard Feynman (which makes him something of a star in these parts) and invented the relativistic equation of motion for the wave function of the electron. He used his equation to predict the existence of the positron -and of anti-matter in general, something for which he won a share of the 1933 Nobel prize for physics. Geiger is a frankly much less distinguished physicist whose main claim to fame is that he invented (most of) the Geiger counter and wasn’t (apparently) a Nazi. He gets into the Churchill Pantheon by the skin of his initial letter and not much else, to be honest!

Short version then: Churchill 1.7 now uses Alpher/Bethe and Geiger/Dirac clusters, and both Gamow and Dalton are no more. Quite a bit of documentation needs updating to take account of this trivial change! Hopefully, I should have that sorted by the end of the day. And that will teach me to test all parts of Churchill before declaring that ‘it works with 12c’. (Oooops!)

A lick of paint…

Careful readers will note that I’ve re-jigged the look and feel of the place!

I am not entirely sure I like the results as yet, but it’s certainly a bit slicker and (I think) more functional.

If you spot any visual ‘anomalies’ arising, let me know in the comments… and if you can’t stand the new look (or feel it’s the best thing since sliced bread), let me know there, too. These things can always be reversed or tweaked 🙂

Churchill Changes

I’ve just published version 1.7 of Churchill. It contains a lot of changes.

The main one is that I’ve removed a few dependencies on .i686 packages. That means the O/S installations can now all take place off the first DVD alone. No second DVD is prompted for, in other words.

This in turn brings about the biggest single benefit of the new release: it works with CentOS 6.8 (and Scientific Linux 6.8, too).

In fact, Churchill 1.7 now reverts to making CentOS 6.8 the default O/S assumed to be in use. (You can still always specify ‘redhat’, ‘sl’ or ‘oel’ if you prefer to use real Red Hat, Scientific Linux or Oracle Enterprise Linux, of course; and you can always specify an earlier distro version if you prefer to stick with (say) 6.3 -though I can’t think why you’d particularly want to).

The other big change is that the bootstrap lines are now trivially easy. Back in 2013 when I first released Churchill, it seemed like a good idea to make it as flexible as possible so that users could specify their own IP addresses and hostnames; but this just made for really lengthy bootstrap lines and confused the heck out of everybody!

So, the simplify brush has been daubed all over Churchill. You now must use the speed keys 1 to 4 as you build your nodes (or you can instead specify their corresponding hostnames). By specifying the speed key or hostname, you automatically define all the pesky details about IP address and interconnect IP address. It makes things a lot simpler and less confusing, I think. It also makes it a bit less flexible… but that’s the price you pay for simplicity. Ask the Gnome developers!

Other changes flow in consequence: the filecopy=y/n parameter is no longer required. If you are building nodes 1 or 3, file copying is assumed to be ‘yes’; if you are building nodes 2 or 4, it’s assumed to be ‘no’. Likewise, there’s no dg=y/n parameter any more: if you are building nodes 1 or 2, it’s assumed to be ‘no’; but build nodes 3 or 4, it’s assumed to be ‘yes’.

It is still possible to say sk=1&rac=n (or hostname=alpher&rac=n), though: that’s if you are building node 1 but want to use it in standalone mode.

As with the previous release (1.6), Churchill only works to build 12c standalone and RAC/Data Guard environments: there is no Oracle 11g support these days.

These are substantial changes and mean Churchill now works in ways quite different from before. That obviously affects the way it is documented. Those documentation changes are being made as I write and should be ‘live’ by the time you read this. Since the old versions of Churchill are kept available on the old site, I’ll keep the original documentation available on that old site too, at least for now.

On this site, however, only the 1.7 version will be available and documented.