A Miscellany

It’s been one of those ‘nothing remarkable ever happens’ periods. So, in desperation, I decided to try to find a blog post or two in the unremarkable instead.

Let’s start with my little HP Folio 13, my near-two-year-old notebook, of which I said in a recent blog piece: “the Folio only has 4GB RAM, so running multiple simultaneous VMs is not really an option: this Oracle will have to run on the physical machine or not at all”.

Absolutely accurate as it stands, in that the thing does indeed ship with only a 128GB hard disk and 4GB of RAM, which is not enough to hold a decent party, let alone run a decent database.

However, I had reckoned without these guys. Their web site tools found me this:

It’s a 250GB mSATA drive (mSATA essentially being the innards of an ordinary solid-state drive without the fancy external casing). At a stroke, and for a relatively modest outlay, I was able to double my disk capacity and its speed. Virtualisation on such a storage platform becomes distinctly doable.

My second purchase was this:

For a mere AU$100, that 8GB stick of laptop RAM doubles the laptop’s existing capacity -and, again at a stroke, makes it more than capable of hosting a 3-machine Oracle RAC.

Fitting these goodies was not a piece of cake, I have to say, what with me being blessed with fingers as dainty as a French boulangerie’s Baguette-Rex. For the most part, I followed the instructions provided by this kind Internet soul without incident, though in my heavy-handed attempts to open the case I still managed to rip out the connector ribbons that make minor details like the keyboard and monitor work. I’m pleased to report, however, that the relevant connectors appear to have been designed with complete klutzes in mind, so I was able to reconnect them when required and the laptop is now operating normally once more.

So now I am blessed with a 16GB, 1.5TB-SSHD monster of a Toshiba laptop for running anything serious (for example, a 2-node RAC plus 2-node Data Guard setup, for practising patches, failovers and switchovers). It is technically portable, so I can brace my neck and arms and lug it into work on the train if I have to.

But with the peanut-sized hardware upgrades mentioned here, however clumsily fitted by yours truly, I am now additionally blessed with a svelte, barely-noticeable 8GB, 250GB-SSD HP ultrabook that I can carry around for hours and not mind… and it’s good enough to run a Windows virtual machine with SQL Server alongside a 2-node Oracle RAC, so practising patching, SQL Server→Oracle replication and suchlike database-y things is trivially easy, without breaking my neck or upper arms.

It’s nice to have rescued a near-two-year-old ultrabook from oblivion, too, because the additional hardware has not only extended the original machine’s technical capacity, it has just about doubled its useful lifetime.

Flushed with my new hardware capabilities, then, I recently decided to dry-rehearse the update of an Oracle 11.2.0.3.0 RAC to 11.2.0.3.9 (i.e., by applying the January 2014 PSU to it, which for Grid+RAC purposes is patch 17735354). It didn’t go awfully well, to be honest -and the reason why was instructive!

The basic process of applying a Grid+RAC patch to a node is:

  1. Copy the patchfile to an empty directory owned by the oracle user (I used /home/oracle/patches), and unzip it there
  2. Make sure the /u01/app/grid/OPatch and /u01/app/oracle/product/11.2.0/db_1/OPatch directories on all nodes are wiped and replaced with the latest unzipped p6880880 download (that gets your patching binaries right)
  3. Create an ‘ocm response file’ by issuing the command /u01/app/grid/OPatch/ocm/bin/emocmrsp -no_banner -output /home/oracle/ocm.rsp (on all nodes)
  4. Become the root user, set your PATH to include /u01/app/grid/OPatch and then launch opatch auto /home/oracle/patches -ocmrf /home/oracle/ocm.rsp (the whole sequence is sketched as shell commands just below)
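
Put together -and assuming the directory layout above, with zip file names that are only illustrative, so check them against what you actually downloaded- the whole sequence looks something like this at the command line:

# Steps 1-3, run as the oracle user on every node:
mkdir -p /home/oracle/patches && cd /home/oracle/patches
unzip /path/to/p17735354_112030_Linux-x86-64.zip                    # Step 1: unzip the GI PSU here

rm -rf /u01/app/grid/OPatch                                         # Step 2: replace OPatch in both homes
rm -rf /u01/app/oracle/product/11.2.0/db_1/OPatch                   #         with the latest p6880880 download
unzip /path/to/p6880880_112000_Linux-x86-64.zip -d /u01/app/grid
unzip /path/to/p6880880_112000_Linux-x86-64.zip -d /u01/app/oracle/product/11.2.0/db_1

/u01/app/grid/OPatch/ocm/bin/emocmrsp -no_banner -output /home/oracle/ocm.rsp    # Step 3: the ocm response file

# Step 4, run as root:
export PATH=$PATH:/u01/app/grid/OPatch
opatch auto /home/oracle/patches -ocmrf /home/oracle/ocm.rsp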

After you launch the patch application utility at Step 4, it’s all supposed to be smooth sailing. Unfortunately, whenever I did this on Gamow (the primary node of my standby site, and thus the first node to be patched in a ‘standby-first’ scenario), I got this result:

2014-02-17 12:56:45: Starting Clusterware Patch Setup
Using configuration parameter file: /u01/app/grid/crs/install/crsconfig_params

Stopping RAC /u01/app/oracle/product/11.2.0/db_1 ...
Stopped RAC /u01/app/oracle/product/11.2.0/db_1 successfully

patch /home/oracle/patches/17592127/custom/server/17592127  apply successful for home  /u01/app/oracle/product/11.2.0/db_1 
patch /home/oracle/patches/17540582  apply successful for home  /u01/app/oracle/product/11.2.0/db_1 

Stopping CRS...
Stopped CRS successfully

patch /home/oracle/patches/17592127  apply failed  for home  /u01/app/grid

Starting CRS...
CRS-4123: Oracle High Availability Services has been started.
Failed to patch QoS users.

Starting RAC /u01/app/oracle/product/11.2.0/db_1 ...
Started RAC /u01/app/oracle/product/11.2.0/db_1 successfully

opatch auto succeeded.

If you read it fast enough, you might just glance at the last line there and think everything is tickety-boo: “opatch auto succeeded”, after all! You might even scan through some of the lines leading up to that point, which say happy things like “17592127 apply successful for home /u01/app/oracle/product/11.2.0/db_1”, and conclude that all’s well. But a keener eye is needed to notice that *one* line says “17592127 apply failed for home /u01/app/grid” and another mentions something about having “Failed to patch QoS users”. So what’s going on: has opatch succeeded or not?

The answer lies in the log file which opatch tells you it has created. Mine had this sort of stuff in it:

2014-02-17 13:06:51: Successfully removed file: /tmp/fileS5bCZV
2014-02-17 13:06:51: /bin/su exited with rc=1

2014-02-17 13:06:51: Error encountered in the command /u01/app/grid/bin/qosctl -autogenerate
>  Syntax Error: Invalid usage
>
>  Usage: qosctl <username> <command>
>
>    General
>      username - JAZN authenticated user. The users password will always be prompted for.
>
>    Command are:
>      -adduser <username> <password> |
>      -checkpasswd <username> <password> |
>      -listusers |
>      -listqosusers |
>      -remuser <username> |
>      -setpasswd <username> <old_password> <new_password> |
>      -help 
>
>  End Command output
2014-02-17 13:06:51: Running as user oracle: /u01/app/grid/bin/crsctl start resource ora.oc4j
2014-02-17 13:06:51: s_run_as_user2: Running /bin/su oracle -c ' /u01/app/grid/bin/crsctl start resource ora.oc4j '
2014-02-17 13:07:06: Removing file /tmp/file102UrG
2014-02-17 13:07:06: Successfully removed file: /tmp/file102UrG
2014-02-17 13:07:06: /bin/su successfully executed

Again, that last line shows opatch has a nasty habit of declaring success at the drop of a hat! It may distract you from seeing that there’s been a syntactical problem: the patch tool was trying to execute qosctl -autogenerate and encountered a syntax error instead. Clearly, the qosctl program didn’t like “autogenerate” as a command switch. Perhaps at this point you think, “Another fine Oracle stuff-up, but as I don’t use Quality of Service features anyway, this won’t be of significance to me”.

Unfortunately, it will -because the syntax error here is not really what you’re supposed to be looking at: it’s a clue. The autogenerate command would be syntactically correct if the qosctl binaries had been patched to 11.2.0.3.9 (the autogenerate switch was introduced somewhere around 11.2.0.3.5), so it can only be reported as a syntax error because the binaries haven’t been patched successfully. And if this particular qosctl binary wasn’t patched, there’s a very good chance that some other binaries you do make use of were skipped too.
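
If you’d rather confirm that suspicion directly than take the log’s word for it, the Oracle inventory will tell you which interim patches each home actually ended up with. A quick check, using the same homes and patch number as above:

# Run as the Oracle software owner: list the interim patches recorded against each home
/u01/app/grid/OPatch/opatch lsinventory -oh /u01/app/grid | grep 17592127
/u01/app/oracle/product/11.2.0/db_1/OPatch/opatch lsinventory -oh /u01/app/oracle/product/11.2.0/db_1 | grep 17592127

If the first grep comes back empty while the second finds the patch, the grid home is the one that got skipped -which is precisely what the ‘apply failed for home /u01/app/grid’ line in the screen output was trying to tell you.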

But to see evidence for whether that’s a problem or not, you have to look upwards in the patching log, and keep a sharp eye out for this:

2014-02-17 13:05:22: The apply patch output is Oracle Interim Patch Installer version 11.2.0.3.6
 Copyright (c) 2013, Oracle Corporation.  All rights reserved.

 Oracle Home       : /u01/app/grid
 Central Inventory : /u01/app/oraInventory
    from           : /u01/app/grid/oraInst.loc
 OPatch version    : 11.2.0.3.6
 OUI version       : 11.2.0.3.0
 Log file location : /u01/app/grid/cfgtoollogs/opatch/opatch2014-02-17_13-05-18PM_1.log

 Verifying environment and performing prerequisite checks...
 Prerequisite check "CheckSystemSpace" failed.
 The details are:
 Required amount of space(6601.28MB) is not available.
 UtilSession failed:
 Prerequisite check "CheckSystemSpace" failed.
 Log file location: /u01/app/grid/cfgtoollogs/opatch/opatch2014-02-17_13-05-18PM_1.log

 OPatch failed with error code 73

2014-02-17 13:05:22: patch /home/oracle/patches/17592127  apply failed  for home  /u01/app/grid

So this comes from about a minute and a half before the qosctl syntax error report… and is clearly the source of the original ‘apply failed’ error that was displayed as part of opatch’s screen output. And the cause of that error is now apparent: the patch failed because the ‘CheckSystemSpace’ prerequisite check failed. Or, in plain English, I haven’t got enough free disk space to apply this patch.

If you’re like me, that will surprise you. My file system has a reasonable amount of free space, after all:

[oracle@gamow db_1]$ df -h
Filesystem         Size  Used Avail Use% Mounted on
/dev/sda2           21G   15G  5.3G  74% /
tmpfs              1.9G  444M  1.5G  24% /dev/shm
balfour:/griddata   63G  3.1G   57G   6% /gdata
balfour:/dbdata     63G  3.1G   57G   6% /ddata

5.3GB of free space is not exactly generous, but it’s non-trivial, too… and yet it isn’t enough for this patch to feel comfortable: the prerequisite check wants 6601.28MB (call it 6.5GB) free before it will even try.
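
Had I thought to check beforehand, OPatch can run that same prerequisite test on its own, before you commit to a full ‘opatch auto’ run. A sketch, assuming the unzipped GI patch sits where I put it earlier (the exact sub-directory under /home/oracle/patches depends on which patch you downloaded):

# Run as the Oracle software owner, pointing at the grid home and the unzipped patch directory:
export ORACLE_HOME=/u01/app/grid
/u01/app/grid/OPatch/opatch prereq CheckSystemSpace -ph /home/oracle/patches/17592127 -oh /u01/app/grid

Run against a file system with only 5.3GB free, that check ought to fail for exactly the same reason -but at least it would fail before anything had been stopped or half-patched.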

Anyway, to cut a long story short(er): never just focus on the bleeding obvious messages reported by OPatch. Dig deeper, look harder… you’ll probably find something which explains that the obscurely-worded “Failed to patch QoS users” is actually just a plea for more disk space.
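
A rough-and-ready way of doing that digging is simply to search the log that opatch auto says it has written for anything that smells of failure, rather than trusting the cheerful summary on screen. Something along these lines, substituting the log path your own run reports:

# Substitute the log file path that your own 'opatch auto' run prints out:
LOG=/path/to/your/opatch_auto.log
grep -inE 'fail|error code' "$LOG"

In my log, that would have surfaced both the CheckSystemSpace failure and the ‘apply failed for home /u01/app/grid’ line straight away.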

I’ll wrap this blog piece up by saying that I deliberately create my RAC nodes with only 25GB hard disks (it says so in the instructions!). I wondered after this experience whether I’d need to modify my Salisbury and Asquith articles to specify a larger hard disk size than that… but actually, it turns out not to be necessary. Instead, make sure you delete the contents of the /osource directory before you start patching (that means wiping out the binaries needed for installing Oracle and Grid… by now, of course, you need neither). If you do this, therefore:

[oracle@gamow osource]$ cd grid
[oracle@gamow grid]$ rm -rf *
[oracle@gamow grid]$ cd ..
[oracle@gamow osource]$ cd database
[oracle@gamow database]$ rm -rf *
[oracle@gamow database]$ df -h
Filesystem         Size  Used Avail Use% Mounted on
/dev/sda2           21G   12G  8.2G  59% /
tmpfs              1.9G  444M  1.5G  24% /dev/shm
balfour:/griddata   63G  3.1G   57G   6% /gdata
balfour:/dbdata     63G  3.1G   57G   6% /ddata

…then I can promise you that 8.2GB of free space is adequate and the 11.2.0.3.9 PSU will be applied without error at the second time of asking.

Of course, you may prefer simply to increase the size of the hard disk you’re working on so that there’s loads of free space, regardless of whether you delete things or not. That’s the approach I first took, too… and I ran into all sorts of problems when I tried it. But that’s a story for another blog piece, I think!