The Crackle of Static

I’ve been off-air for quite a while of late, I realise, and the sound of the static crackle in place of my words of wit and wisdom must be getting quite deafening for some. :-)

Well, it’s just because I’ve been incredibly busy at work. I am the development DBA, because there was supposed to be an entirely separate team of operational DBAs. They never materialized; instead, a plan was hatched to outsource operational DBA responsibilities to an entirely separate company. But that never materialized either. Which has left yours truly the development AND operations DBA… for an organization whose primary database runs on Solaris, RAC + Active Data Guard on ASM, uses Virtual Private Database and has 6TB of disk space allocated to it to match the 32 CPUs and 64GB RAM. It’s a non-trivial database environment, in other words, and if I screw it up, millions of people will notice and questions might reasonably be asked in Parliament. :-( (I speak quite literally, too! This database is running the public transport system for the whole of Sydney: if it stuffs up, a million commuters will be screaming about it on the front page of the Sydney Morning Herald the next day!)

Oh, and I am also the SQL Server DBA. Indeed, I have morphed into being something of the world-wide corporate expert on SQL Server-to-Oracle replication.

I barely get time to think, basically, let alone maintain a blog.

So, when I can, I will; but often of late, I can’t.

One little nugget I will just drop here (because there’s no other place where it would make a lot of sense!): last Saturday night, I spent 6 hours repeatedly trying to patch our Standby site from 11.2.0.3.5 to 11.2.0.3.9, before declaring it a failure and rolling it back. The following Monday, a colleague took a look at the logs I’d spent a long time poring over to no effect and within seconds had said, “That’s your problem right there: what directory did you run opatch from?”

I’ll back up a little at this point: the standard Oracle doco on RAC installation on ASM makes provision for splitting the ownership of the database software from the ownership of the Grid Infrastructure software. Thus, it proposes creating users generically called “oracle” and “grid” to own each bit respectively. I’ll call this the ‘split ownership’ model of Oracle software installation. It’s standard, well-documented, but optional.

Anyone who has had a go creating their own Salisbury or Asquith RAC and/or Data Guard setups will know, however, that I don’t really like the split ownership model: my installation instructions tell you to create an oracle user and have him own both the database and Grid Infrastructure software alike. What we might call the “unitary ownership model”. That’s perfectly standard, too, and fully supported. I’d have said it’s more common, too, than split ownership, which is really intended for vast organisations that have large, separate teams of Sysadmins and DBAs (who generally don’t talk much to each other).

Unfortunately, if your mindset is expecting unitary ownership, as mine is, you get into the habit of applying Oracle RAC & Grid patches by becoming root, travelling to the /home/oracle directory and issuing the necessary opatch auto commands. I say “unfortunately” because this won’t work in a split ownership environment!

Why not? Because although you run the patch in as the root user, the patching script does its work by issuing commands as the oracle and grid users at different times. And that means both the oracle and grid users have to have equal read/write access to the directory from which you launch the patch, otherwise one or other of them (or both if you try running it out of /root, for example) will not be able to determine their current working directory properly… and it’s downhill all the way from that point on.

So my six hours of frustration on Saturday arose entirely from the fact that I was sitting in /home/oracle when I ran the opatch command. When the time came for the grid user to issue his commands, he had no read/write privileges on oracle’s private home directory, and so the thing failed miserably. I didn’t recognise the symptoms for what they were saying (in fairness, it was late and my eyes had somewhat glazed over); my wiser, genius System Admin colleague spotted it at once.

Last night I made another attempt to run the same patch in. This time, I launched the opatch command from /u01/app, a directory to which both the grid and oracle users had equivalent access. Forty minutes later, patching completed with success.

So, the short version: if you do split ownership of your RACs, watch out for where you run your patches from! You have to find a directory to which both owners have equivalent access. Private home directories won’t cut it. Yes, it’s probably documented a million times in the official doco… but I got burnt, nonetheless.
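
For anyone who wants a quick sanity check before launching a patch, here is a minimal sketch in Python of the idea. It is not part of opatch or any Oracle tooling. The function names (has_rwx, users_locked_out) are made up for illustration, and it rests on the assumption that "equivalent access" means read, write and search permission via the plain POSIX permission bits. It ignores ACLs and does not check the parent directories, so treat a clean result as a hint rather than a guarantee.

```python
import os
import pwd
import stat


def has_rwx(mode, dir_uid, dir_gid, uid, gids):
    """Return True if a user (uid, member of groups gids) has read, write
    and search permission on a directory with the given mode and ownership.
    Assumption: only the classic owner/group/other bits matter (no ACLs)."""
    if uid == 0:
        return True  # root is not restricted by the permission bits
    if uid == dir_uid:
        bits = (mode >> 6) & 0o7   # owner bits apply
    elif dir_gid in gids:
        bits = (mode >> 3) & 0o7   # group bits apply
    else:
        bits = mode & 0o7          # "other" bits apply
    return bits == 0o7


def users_locked_out(path, usernames):
    """Return the subset of usernames that lack rwx on path.
    Hypothetical helper: looks up real accounts on the local machine."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    blocked = []
    for name in usernames:
        pw = pwd.getpwnam(name)
        gids = os.getgrouplist(name, pw.pw_gid)
        if not has_rwx(mode, st.st_uid, st.st_gid, pw.pw_uid, gids):
            blocked.append(name)
    return blocked
```

Run as root from the directory you intend to patch from, something like users_locked_out(os.getcwd(), ["oracle", "grid"]) should come back as an empty list; if either name appears in it, pick another directory. The user names here are the generic ones from the Oracle doco, so substitute whatever your site actually uses.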