Atlas and Debian 9

Debian 9 (“Stretch”) was released on Sunday 18th June.

There are lots of changes under the hood, but few that make a major difference to the desktop experience (unless you count the death of ‘Ice Weasle’ and the return of vanilla ‘Firefox’ major).

Unfortunately, as I mentioned way back in January, the Atlas-assisted Oracle installation fails completely when run on the finished Debian 9 because of problems with the latest version of gcc and g++, resulting in an ‘error in invoking target links proc gen_pcscfg procob’ failure as soon as the linking phase starts.

Happily, it’s fixable by forcing the use of an older version of the gcc compiler, using the commands (run as root, before attempting the Oracle installation at all):

apt-get install gcc-4.9 g++-4.9
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 50
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 50
update-alternatives --install /usr/bin/cpp cpp-bin /usr/bin/cpp-4.9 50
update-alternatives --set g++ /usr/bin/g++-4.9
update-alternatives --set gcc /usr/bin/gcc-4.9
update-alternatives --set cpp-bin /usr/bin/cpp-4.9

When you are about to launch the database/runInstaller, check that your environment has picked up the ‘old’ compiler version by typing gcc -v… you should see a great pile of techie stuff, ending in the lines:

Thread model: posix
gcc version 4.9.2 (Debian 4.9.2-10)

If your oracle user sees something like that, then the Oracle installation will succeed:

The problem affects 12cR2 just as much as 12cR1. Fortunately, the same fix applies equally well to both Oracle versions, too:

I haven’t ported this fix into Atlas as yet: it’s something you have to do manually at the moment, but when time permits, I’ll automate this as I automate everything else!

Atlas now includes the fix for Debian 9 and therefore now correctly automates 12cR1 and 12cR2 installations on that distro without error.

Atlas and 12cR2

Continuing the saga of “catching up” with software developments whilst I was en route to the UK from Australia, the question next to be addressed is simply: does Atlas work to allow Oracle 12c Release 2 to be installed onto the many and various distros it claims to support.

The answer is a bit mixed, I’m afraid!

First thing to say: I deliberately coded Atlas to specifically declare 12cR2 ‘didn’t work yet’ when I first released it. So, for it to work for 12cR2 at all, that bit of code has to be removed -and that means Atlas straightaway gets bumped to version 1.2.

Once that code has gone, the good news is that Atlas will successfully help 12cR2 be installed on the majority of the distros it works with (15 out of 19, as it happens). But the bad news is that if your distro is Ubuntu or even just Ubuntu-based, then the linking phase of the Oracle 12c Release 2 installation fails irreparably after Atlas has been run.

The complete list of distros and their results with Atlas for 12c Release 2 is therefore as follows:

Debian 8.2+ ............................ Works fine
Linux Mint 18+ ......................... Fails
Mint Debian Edition 2+ ................. Works fine
Red Hat ES 7.0+ ........................ Works fine
Scientific Linux 7.0+ .................. Works fine
CentOS 7.0+ ............................ Works fine
OpenSuse Leap 42+ ...................... Works fine
Antergos  2016.11+ ..................... Works fine
elementary OS 0.4+ ..................... Fails
Mageia 5+ .............................. Works fine
Korora 25+ ............................. Works fine
Zorin Core 12 .......................... Fails
Ubuntu 16+ ............................. Fails
Manjaro 15+ ............................ Works fine 
Fedora 23+ ............................. Works fine
Peppermint Linux 7+ .................... Fails
GeckoLinux Static 422+ ................. Works fine
Chapeau Linux 24+ ...................... Works fine
PCLinuxOS 2016+ ........................ Works fine

As I say, the pattern is pretty obvious and I suspect the problem is that the gcc versioning tricks I had to pull to get 12cR1 to compile properly on any Ubuntu-based distro back in January are now the cause of woes for 12cR2. The failure always manifests itself in the following manner:

There’s no recovering from that as yet, but I hope to get it sorted within the week.

On the other hand, for any distro listed as working fine in the above list, you can expect the usual plain sailing:

That’s specifically Oracle 12.2.0.1 running on GeckoLinux, which is a spin of openSUSE Tumbleweed, but you get the same outcome for all non-Ubuntu distros.

Whilst testing all this, I discovered a couple of distros which had incremented their version strings since January (Manjaro, for instance, now reports itself as version ’17.something’, so the part of Atlas where it checked for version strings containing the numbers 15 or 16 obviously needed updating). Those sorts of versioning updates are now also included in Atlas 1.2.

There is very little updating of the Atlas doco to do, happily. For a start, you will obtain the new Atlas version just by running exactly the same wget command as was previously documented: the URL alias simply points to the latest version, but the URL itself doesn’t change.

When you install Oracle 12cR2 onto any of these non-standard distros (except the RHCSL ones, of course), you will get this dire-looking warning:

It’s better than the 12cR1 equivalent, which was to say ‘your system is inadequate’! Anyway, for all the distros for which Atlas works at all, it’s perfectly OK to say ‘yes, I want to continue’. The installation will succeed anyway.

Have fun… and wish me luck whilst battling with Ubuntu later this week 🙂

Recompile with -fPIC

Let me start by wishing a happy New Year to all my readers, complete with fireworks from our local council’s display!

And then let’s swiftly move on to the bad news!

If you are interested in installing Oracle onto non-approved Linux distros, you are very soon going to have to contend with this sort of error message:

/usr/bin/ld: /u01/app/oracle/product/12.1.0/db_1/lib//libpls12.a(pci.o): relocation R_X86_64_32 against `.rodata.str1.4' can not be used when making a shared object; recompile with -fPIC

This will be found in the Oracle installer’s log immediately after the “linking phase” of the Oracle installation starts.

Unfortunately, the error message dialog that appears at this point looks like this:

…and that particular error message has long been familiar from 12.1.0.1 installs on assorted distros. The workarounds then were to add various compilation flags to assorted makefiles.

But in this case, the graphical error dialog deceives: for a start, this is happening on 12.1.0.2, and although the dialog text is the same as back in the 12.1.0.1 days, the underlying cause is completely different. It’s only when you inspect the installActions log do you (eventually!) see the error text I showed above, which tells you that this is no “ordinary” compilation problem.

Welcome to the world of position-independent code.

Putting it as simply as I know how, the basic idea of position-independent code is that it allows execution of code regardless of its absolute memory address. It’s thus a ‘good thing’, on the whole.

Trouble is, if objects within the code you’re trying to compile haven’t themselves been compiled to be position-independent, then you aren’t yourself allowed to compile the code that references them into shared libraries.

As the error message above says, since “pci.o” isn’t position-independent, you can’t compile references to it into the libpls12 library. Note that the error message does not mean that your attempt to compile libpls12 should use fPIC: if it meant that, you could do something about it. No: it’s telling you that pci.o was compiled by Oracle without fPIC. Only if they re-compile that object with the fPIC compiler option switched on would you then be able to compile it into the libpls12 library successfully.

If you’re best mates with Larry, then, perhaps you’ll be able to get him to do the necessary recompilations for you! Mere mortals, however, are stuck with the unhappy fact that vast swathes of the Oracle 12c code-base has not been compiled to be position-independent… and there’s nothing you can do to fix that when your version of gcc insists that it should be.

The problem doesn’t manifest itself on all distros: Ubuntu 16.04, for example, has no problem installing 12c at all (apart from the usual ones associated with not using a supported distro, of course!) But Ubuntu 16.10 does have this fatal problem. Similarly, Debian 8 is fine with Oracle 12c, but Debian 9 (the testing branch, so not yet ready for mainstream release) fails. And whereas Manjaro didn’t have a problem earlier in the year when I first released my mercury pre-installer script, it does now.

This, of course, gives us a clue: there’s clearly some component of these distros which is being upgraded over time so that later releases fail where earlier ones didn’t. So what’s the upgraded component causing all the trouble?

Perhaps unsurprisingly, given that it’s a compilation error that shows you there’s a problem in the first place, it turns out that the gcc compiler is the culprit.

If you do a fresh install of Ubuntu 16.04 (the long-term support version, so still very much current and relevant), whilst making sure NOT to update anything as part of the installation process itself, issuing the command gcc -v will show you that version 5.4.0 is in use. Do the same thing on Ubuntu 16.10, however, and you’ll discover you’re now using gcc version 6.2.0.

A fresh Debian 8.6 install, subjected to an apt-get install gcc command,  ends up running gcc version 4.9.2. The same thing done to a fresh Debian 9 install results in a gcc version of 6.2.1.

Manjaro is a rolling release, of course, so it’s software components are forever being incrementally upgraded: it makes finding out what gcc version was in use at the start of the year rather tricky! So I don’t have hard evidence for the gcc version shift there -but my main desktop is currently reporting version 6.2.1, so I’ll stick my neck out and say that I would lay odds that, had I checked back in January 2016, I think I would have found it to be around version 5.something.

In short, for all three distros currently under my microscope, a shift from gcc 4- or 5-something to 6-something has taken place… and broken Oracle’s installation routine in the process.

It means that all distros will eventually come across this compilation problem as they eventually upgrade their gcc versions. Expect Fedora to keel over in short order, for example, when their version 26 is released next April (assuming they go for a late-release version of gcc 6.something, which I expect they will). No doubt we’ll have all moved on to installing Oracle 12c Release 2 by then, which is probably suitably position-independent throughout… so maybe no-one will ever have to worry about this issue again. But in the meantime… the constantly changing nature of gcc is a problem.

So, what’s to be done if you want Oracle 12c installed on these distros with their fancy new gcc versions? Nothing I could really think of, except to ensure that the old, functional versions of gcc and related development tools are installed …and that can be easier said than done!

On Debian 9 (‘testing’), for example, instead of just saying apt-get install gcc, you need to now say apt-get install gcc-5, which ensures the ‘right’ compiler version is installed, with which the Oracle installer can live. Thus can this screenshot be taken:

…which shows me happily querying the EMP table on 12.1.0.2 whilst demonstrating that I’m running Debian Testing (codenamed “Stretch”). That’s only possible by careful curating of your development tool versions.

The same sort of ‘install old gcc version and make it the default’ trick is required to get 12c running on Ubuntu 16.10 too:

-though I had to specifically install gcc-4.9 rather than “gcc-5”, since the compilation error still arose when ‘gcc-5’ was installed. These things get tricky!

Anyway: there it is. Gcc’s constant version increments create havoc with Oracle 12c installations. Coming to a distro near you, soonish!

New openSUSE

opensuse-64-bit-2016-11-19-16-03-09I was a bit busy this week, what with having my second cataract removed and all… so I missed the release of a new version of openSUSE on 16th November.

However, I’ve caught up now and as the thumbnail to the left attests, my quatro script works on the new release just fine to facilitate easy Oracle Enterprise Edition installations.

See the Quatro main page for screen-shotted walk-through details of how to perform the installation.

SQL Server Replication to Oracle 12c

No version of SQL Server supports replicating to Oracle 12c out of the box.

This took us a little by surprise at work, since things had been replicating just fine in our equally-12c test environment. The difference in Production that arose this week, however, was that somebody decided to re-initialise the SQL Server publish-and-subscribe replication in that environment. This caused SQL Server to drop the replication-controlling table from Oracle and to re-create it from scratch -and only at that point does the lack of 12c support reveal itself.

The symptoms are easy to describe: if you check your replication monitor in the SQL Server Management Studio, you’ll see this error (or variations on it):

sqlserverrepl01

The cause of this is also quite easily found. In the SQL Server msdb database (under Programmability > Stored Procedures > System Stored Procedures, if you’re browsing in the Management Studio), you’ll find a procedure called sys.sp_MSrepl_createdatatypemappings. It contains this code snippet:

IF OBJECT_ID(N'dbo.MSdbms_datatype', 'U') IS NULL
 BEGIN
 print 'Creating table MSdbms_datatype'

create table MSdbms_datatype
 (
 datatype_id int NOT NULL IDENTITY,
 dbms_id int NOT NULL,
 type sysname NOT NULL,
 createparams int NOT NULL DEFAULT 0,
 CONSTRAINT pk_MSdbms_datatype PRIMARY KEY (datatype_id),
 CONSTRAINT fk_MSdbms_datatype_dbms_id FOREIGN KEY (dbms_id) REFERENCES MSdbms (dbms_id)
 )
 
 exec dbo.sp_MS_marksystemobject 'MSdbms_datatype'

--
 -- Define default dbms data types
 --
 exec sys.sp_MSrepl_MSSQLdatatypes 'MSSQLServer'
 exec sys.sp_MSrepl_DB2datatypes 'DB2'
 exec sys.sp_MSrepl_ORAdatatypes 'Oracle'
 exec sys.sp_MSrepl_ORAdatatypes 'Oracle', '8'
 exec sys.sp_MSrepl_ORAdatatypes 'Oracle', '9'
 exec sys.sp_MSrepl_ORAdatatypes 'Oracle', '10'
 exec sys.sp_MSrepl_ORAdatatypes 'Oracle', '11'
 exec sys.sp_MSrepl_SASdatatypes 'SYBASE'
 END

You’ll note that it’s creating a table to govern the ‘data type mappings’ between SQL Server and various other databases -in other words, and as an example, where SQL Server says ‘INT’ or ‘BIGINT’, Oracle will say ‘NUMBER’. SQL Server needs to be able to translate its data types to those used by ‘foreign’ databases, and this code sets that translation mechanism up.

Unfortuately, you’ll also note that whilst the code goes on to populate this data mapping table with translation entries for Oracle versions 8, 9, 10 and 11… there’s nothing at all for Oracle 12! It’s this lack of a description as to which SQL Server data types should become which Oracle 12c ones that explains replication’s inability to create its MSREPL7 table correctly.

If, despite receiving that earlier error when first attempting to create a new subscription to an Oracle database, you describe the MSREPL7 table that will actually have been created in Oracle, you’ll see this sort of thing:

sqlserverrepl02

Notice all those “INTERVAL DAY(2) TO SECOND(6)” columns? They’re actually supposed to be NUMBER(10) and NUMBER(3) columns. Those BINARY FILE LOB columns are all supposed to be RAW(16), too.

MSREPL7 is the ‘governor’ for SQL Server replication: it records what tables are supposed to be replicating and what transaction they’re up to in the destination (Oracle) database. When the SQL Server source database next wants to send transactions to Oracle, it’s this table which tells it which transactions have already been sent and which have not. So when this ‘governor’ table is screwed up, you won’t get very far with any replication!

As an aside, things are slightly more complicated if you upgrade to 12c from a previously-working 11g. In that case, you’d already have had replication working nicely, even after completing the 12c upgrade process, and thus the MSREPL7 table would have all the right data types and there’d be no impediment to replication continuing to work. The other tables you’re replicating will also already exist in your newly-upgraded 12c database and they’ll have all the right columns too. So they will replicate nicely as well. So, you will think that all is well… because it is!

It’s only when someone decides to reinitialize  replication that it will all go pear-shaped, because it’s only at that point that SQL Server will drop the tables being replicated and attempt to re-create them… with all the wrong data types! When you then attempt to push a number from SQL Server into a column in Oracle that’s been freshly-created as an INTERVAL data type, that’s when sparks will fly.

This is why we didn’t realise that upgrading to 12c would be an issue in our various test environments: no-one had attempted to re-initialize the replication; and with many multi-hundred-million rows to be pushed across into Oracle, it was a fair bet that no-one would be daft enough to do that in Production either! Unfortunately, it turned out that my gambling instincts were wrong on this occasion: someone was daft enough to hit the wrong menu options, and thus the trouble began.

I couldn’t find much about this problem on Google, which was rather surprising (if one major database has trouble replicating to another major database, I would have expected to hear a lot about it by now!)

I did find enough to make me look at the column types in MSREPL7, and notice that all sorts of odd data types were in use. That got me hunting down any references to how SQL Server knows what data type maps to what in Oracle. And finally, that got me to the stored procedure the code from which appears above.

Looking at that code, it’s just begging for an extra line to be added to it, no?! I mean, if it read:

exec sys.sp_MSrepl_ORAdatatypes 'Oracle'
 exec sys.sp_MSrepl_ORAdatatypes 'Oracle', '8'
 exec sys.sp_MSrepl_ORAdatatypes 'Oracle', '9'
 exec sys.sp_MSrepl_ORAdatatypes 'Oracle', '10'
 exec sys.sp_MSrepl_ORAdatatypes 'Oracle', '11'
 exec sys.sp_MSrepl_ORAdatatypes 'Oracle', '12'

…then maybe all the troubles would go away, right?

But you can’t just run that sort of thing in the standard SQL Server Management Studio. For this, you need to run it in DAC (or ‘dedicated administrative connection’) mode. To do that, fire up SSMS as normal and connect to your SQL Server instance as normal. Click New Query to start a new query window. Finally, click the ‘change connection’ button, just to the left of the database selector:

sqlserverrepl03

That will pop-up a new dialog box to allow you to make a fresh connection to your SQL Server instance:

sqlserverrepl04

But notice this time that I’ve stuck ADMIN: in front of my usual ‘server name’ connection string. That will connect you in the right ‘god mode’. You can’t really tell anything is different from a non-DAC connection, apart from the ‘admin:’ identifiers scattered all round the SSMS interface if you look in all the right places!

Anyway, once you’ve got a DAC connection, you can type in this lot of SQL and run it:

USE master

GO

exec sys.sp_MSrepl_ORAdatatypes 'Oracle', '12'

GO

exec sys.sp_MSrepl_MSSQL_ORA_datatypemappings @source_dbms = N'MSSQLSERVER', @destination_dbms = N'ORACLE', @destination_version = '12'

GO

If all goes according to plan, you should see this sort of response:

sqlserverrepl05

Once the data types have been created for Oracle 12, drop the MSREPL7 table from your Oracle database and then try re-initializing your replication subscription once again. This time, it should just work:

sqlserverrepl06

That shows you me dropping MSREPL7 in a SQL*Plus window in blue; then I went and re-initialized the replication in SQL Server (which you can’t see me doing). You then see me describing the structure of MSREPL7 in the blue window: notice (a) the table has been re-created despite just being dropped; and (b) it’s been created with NUMBER and RAW data types for its columns, not INTERVALS and BLOBs. The Synchronization Status progress window finally shows that replication transactions were successfully delivered to the Oracle database.

You can also issue the required fixup SQL in the command-line sqlcmd tool. Connect to your database in DAC mode with the “-A” switch:

sqlcmd -S winsql\winsqldb -E -A

The -S takes the form it does here because it describes the server AND the instance to connect to (I use named instances). Hence “winsql” is the server name and “winsqldb” is the instance to connect to running on that server. The -E switch means “use Windows authentication”, because that’s what I do! If you use named users, you’d use the -U switch and specify the username (and the -P switch if you fancied typing his password in clear text on the command line!). The -A switch means, as I’ve mentioned, connect in DAC mode.

Note that DAC mode is only available locally: that is, you have to be working on the actual server that’s hosting the SQL Server instance. If you insist on running the fixup SQL remotely, you can enable remote DAC access by right-clicking the instance in SSMS, clicking Facets, selecting Surface Area Configuration from the drop-down menu and then setting RemoteDacEnabled to True. You also need the SQL Browser service to be running (which you can switch on from the normal services utility, or by running the SQL Server Configuration Manager, clicking the SQL Server Services item and then the SQL Browser item in the right-hand pane (right-click it and click Start).

Incidentally: you may have thought (as I innocently did) that upgrading your SQL Server from (say) 2008 R2 to (say) 2016 would help. All I can tell you is that the code snippet which appears at the top of this piece is as identically 12c-deficient in SQL Server 2016 as it is in any previous version. I found this surprising and completely inexplicable: I’m prepared to forgive SQL Server vintage 2008 for not knowing about Oracle 12c; I find it much harder to understand why SQL Server 2016 is silent on the subject!

So, as far as I know, this trick of manually adding a reference to 12c in DAC mode is the only way to get any version of SQL Server replicating to Oracle at all.

Whether bolting on a ’12’ to the end of a bit of code and running it is actually supported by Microsoft, I can’t say: I’ve looked for any indication it’s a properly-supported technique in all the usual places and turned up nothing either way.

But if you’re replicating SQL Server to Oracle 11g and thinking of upgrading to 12c any time soon, it might be useful to bear in mind that you’re going to need this technique should you ever need to re-initialize your subscriptions 🙂

State of Play revisited

churchill150I mentioned some time ago that Churchill was looking pretty sick since no RHCSL 6.x distro with a version number greater than 6.6 seemed to deliver its software on a single DVD any more, and since multi-DVD installs break Churchill, Churchill was doomed.

But I was (happily) quite wrong on that point. It turns out that, in fact, CentOS is the only RHCSL distro that has chosen to package its distro in such a way that it breaks Churchill.

I’ve checked OEL 6.6, 6.7 and 6.8; and then I checked Scientific 6.6., 6.7 and 6.8; and I even checked genuine Red Hat 6.6, 6.7 and 6.8… and every single one of them works fine with Churchill.

So, as I mentioned last time, that now means Scientific Linux is the distro that Churchill will assume is in use if you use its ‘speed keys’ to build your Oracle server nodes. (Rather more bluntly, it means I’ve stopped using CentOS at all, for anything). You can still use OEL or Red Hat if you prefer, by spelling out your requirements on the boot strap line. But use a speed key to save some typing, and SL will be assumed.

And, of course, if you want to use CentOS 6.7 or 6.8 to build your Oracle servers using Churchill, forget it: because they just don’t work.

Why is the new default Scientific Linux and not, say, OEL? Simply because it’s (a) free to obtain without signing up for anything and (b) it’s not as butt-ugly as OEL when you boot it up (white text on a red background is neither pretty nor particularly legible).

Other than this issue of CentOS stopping working, the latest version of Churchill (we’re now up to version 1.6) introduces a rather more radical change of tack: no more support for Oracle 11g installations, of any sort, whatsoever. Of course, version 1.5 is still available which does still do 11g quite nicely -but 11g is so long in the tooth now, and completely out of support unless you happen to be using 11.2.0.4, for which an expensive paid support agreement is needed before it can even be downloaded, that it seemed a waste of time to keep 11g support in newer Churchill releases. So, 12c only it is.

Which of course means that Churchill’s hardware requirements have just gone through the roof to match Oracle’s own. A 2GB VM just doesn’t cut it any more. The new hardware minima are: 5GB RAM and a 40GB hard disk. So building a 2-node RAC and a 2-node Active Data Guard setup on your desktop now basically demands a 32GB desktop… and no laptops need apply.

All of which in turn means that I think it might be time to re-write the Churchill documentation (especially as it currently exists only as pages on the old version of this site): take the opportunity to spruce it up, refresh it, make it part of the native WordPress site and so on. Expect something soon-ish.

Minty Fresh

Mint2Having just finished a push for domestic consistency by installing Ubuntu 16.04 on practically every machine I control (which therefore definitely excludes a certain Significant Other’s Windows PC!), I now see that the Linux Mint crew have just released a new version of their green and fragrant distro.

Never one to let the opportunity of a new distro installation pass me by if I can help it, I have accordingly just installed the Mate version onto my ancient(circa 2009) laptop… and it’s running nicely. Pretty slow, of course, but hardly unbearable. I probably don’t want to be doing Blender renders on it any time soon, but it’s fine for a bit of web editing, browsing, video playing and photo up-touching.

It also has its windows controls (maximise, minimise and close) on the right side of the window title bar (which is to say, yes it’s on the right-hand side, but it’s also the correct side as far as I’m concerned!) It is a small cosmetic change from vanilla Ubuntu, I guess; but it’s enough that it might persuade me to deploy it on my main PC instead of the current Ubuntu install. Time will tell.

I noticed in passing that though I had written simple installation scripts for Oracle 12c for a lot of other distros I’ve dabbled with over the years, I had curiously neglected to write one for Linux Mint, of whatever vintage.

So I’ve rectified that. On the Oracle articles page is a new one for installing 12.1.0.2 on Linux Mint 18. I’ve run out of inspiration for my script names now, so this one is just called Mentha -which, as we all know, is the Latin for ‘mint’.

I tested it on both the Cinnamon and Mate spins, but the screenshots in the article are all from Mate (for absolutely no reason at all, other than I happened to have it handy at the time the screenshots needed to be taken).

I should perhaps mention that I’m trialling the use of new ‘slideshow’ technology in this new article: it means the bazillion screenshots don’t take up nearly so much room on the screen and fade neatly from one to the other as you step through the slideshow. The image captions provide the instructions. It looks good -but there are at least two issues with it. One, if you’re using script blockers, the slideshow won’t work properly. And two, the plain-text version of the article (obtainable by clicking the ‘print’ link at the very end of the it) displays all the screenshots sans instructions, which is somewhat less than useful! If that is an issue for you, let me know and I’ll re-think… I am not entirely sure how many people bother with the plain-text versions of articles, so I don’t know how much of a deal-breaker this is. As I say, tell me (in the comments) if it is one for you…

Oracle on Ubuntu 16.04

mandelaWith Ubuntu 16.04 as my new desktop, it was inevitable that I would need to cook up one of my pre-installer scripts to help automate the installation of Oracle 12c onto it.

Like the pre-installers for various other distros and O/Ses, I’ve given my script a name… and “Mandela” was the obvious choice. Ubuntu is a South African word that means ‘oneness of humanity’ (very roughly!). Mandela was intimately aware of it and was a great exponent of it. It also meant a lot to Mark Shuttleworth, the founder of Canonical, developer of Ubuntu, the O/S. It’s the reason why he called his distro ‘Ubuntu’ in the first place.

So anyway: I’ve put together an article on how you use mandela (the Oracle preinstaller) to make installation of 12c on Ubuntu relatively painless. The script is available for download either here or here.

Additionally, I found I wanted to install SQL Developer on Ubuntu, so there’s now an article for that, too.

Heffalumps and Woozles

heffalumpsThe article about using clooney.sh to configure Solaris as a platform on which to run Oracle 11gR2 or 12c is finally complete.

Which means you have the perfect environment in which to replicate my recent bug-hunting expedition!

Take an 11gR2 (i.e., 11.2.0.3 or 11.2.0.4) database running on Solaris (or Linux or Windows, come to that), and create a non-SYS user with DBA privileges:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE    11.2.0.4.0      Production
TNS for Solaris: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production

SQL> create user hjr identified by hjr;

User created.

SQL> grant dba to hjr;

Grant succeeded.

Now, connected as that new non-SYS user, here is a little script to create and populate two tables, one partitioned and one not (so you need the Enterprise Edition for this, otherwise you can’t use partitioning):

create table ABC (key  NUMBER(10) not null);
insert into abc values (10);
insert into abc values (11);
insert into abc values (9);

create table DEF (key  NUMBER(10) not null)
partition by range (key)
(partition p0 values less than (10),
 partition p1 values less than (11),
 partition p2 values less than (12),
 partition p3 values less than (13));

alter table ABC add primary key (key) using index;
create index DEF_IX on DEF (KEY) local;
 
insert into def values (10);
insert into def values (11);
commit;

Table ABC contains three rows; table DEF contains 2 -and they happen to share values with two of the rows in the ABC table. There’s an index on both tables, too: on the partitioned table, the index is locally partitioned.

Now, create a little procedure that does nothing more than join the two tables and select for values greater than or equal to something:

create or replace procedure codetest (v_k IN number)
is
  v_txt varchar2(1000);
begin
  select
  abc.key
    into v_txt
    from abc, def
  where abc.key = def.key
    and abc.key >= v_k
    and rownum < 2;
  dbms_output.put_line(v_txt);
end;
/

So you can execute that procedure like so:

SQL> set serveroutput on
SQL> exec codetest(10);
10

PL/SQL procedure successfully completed.

SQL> exec codetest(11);
11

PL/SQL procedure successfully completed.

So both values 10 and 11 work fine, and the little select statement contained within the procedure correctly fetches both values as required. So far, so normal.

Now do this:

alter index DEF_IX MODIFY PARTITION p1 unusable;

That renders the p1 partition of the index on the DEF table unusable. Note that partition p1 would hold values “less than 11 (and greater than 9)”, so no way, no how is value 11 involved in that partition. Now run the procedure again, selecting for value 11:

SQL> exec codetest(11);
BEGIN codetest(11); END;

*
ERROR at line 1:
ORA-01008: not all variables bound
ORA-06512: at "SYS.CODETEST", line 5
ORA-06512: at line 1

Though partition p1 cannot be relevant for the query, the query fails declaring that “not all variables bound” …a classic error that you often get when submitting SQL queries using bind or substitution variables (along the lines of “select * from emp where ename=:B1”), and when you’ve forgotten to assign a value to one of the variables.

But there are no bind or subtitution variables in this code, except in the sense that v_k is one… and that definitely has had a value assigned to it. So why on Earth is the ORA-01008 being raised?

When this happened on one of my production databases, Artem (my esteemed co-DBA) and I puzzled about it for ages. We suspected corruption (but DBMS_REPAIR said all was fine). We re-wrote the select part of the PL/SQL code every which way we could think of, just in case some syntactical issue was tripping up the optimizer. We did this, we did that… but then Artem did the following:

SQL> alter index DEF_IX REBUILD PARTITION p1;

Index altered.

SQL> select status from dba_ind_partitions where index_name='DEF_IX';

STATUS
--------
USABLE
USABLE
USABLE
USABLE

(So we know that all the index partitions are fine once more). And:

SQL> alter index DEF_IX MODIFY PARTITION p1 unusable;

Index altered.

SQL> exec codetest(10);
10

PL/SQL procedure successfully completed.

SQL> exec codetest(11);
11

PL/SQL procedure successfully completed.

So this time, we’ve marked the same partition as before as unusable before we do anything else, and then we run the procedure selecting for value 10… which is involved in the index partition we’ve just invalidated. And yet the code still works! And, even more surprisingly, if the procedure worked fine on value 10, it goes on to work fine on value 11 too -the same value it had previously failed at in the presence of the same unusable index partition.

So then we immediately did this:

SQL> alter system flush shared_pool;

System altered.

SQL> exec codetest(11);
BEGIN codetest(11); END;

*
ERROR at line 1:
ORA-01008: not all variables bound
ORA-06512: at "SYS.CODETEST", line 5
ORA-06512: at line 1

Just by flushing the shared pool, the code reverts back to its previous behaviour, where it raises an ORA-01008 for value 11 …the same value it just worked for!

I’ve no way of knowing precisely what is going on, but it seems pretty apparent we have a bug here -but one that took a lot of hunting down! The ingredients needed to make the bug reveal itself would seem to be:

  • Joining a non-partitioned table to a partitioned table
  • The non-partitioned table has values less than any of the values in the partitioned tables
  • The two tables share at least two values
  • You query to retrieve the higher of the two values in the partitioned table that overlap/match with values in the non-partitioned table. In this case, the overlapping values were 10 and 11 -query for 10 and the ORA-01008 never arises. Query for 11 in the presence of unusable index partitions and the non-existence of a previously-compiled shareable cursor and the ORA-01008 errors come thick and fast.
  • And your query must use “>=”. If it selects for “<=” or “=”, then the index invalidation never causes the ORA-01008 to appear, no matter what index partitions are unusable or what value you select for.
  • You have to have the initialization parameter skip_unusable_indexes set to TRUE (which is the default in 11g). If it’s set to FALSE, then the existence of unusable index partitions never causes an ORA-01008 to be raised under any circumstances.
  • You’re not doing this as SYS. SYS seems to be immune from whatever code-path causes the ORA-01008. Non-SYS users can be victims of it!

If you get all those conditions met, then marking an index partition unusable seems to cause the internal, recursive SQL (that’s generated under the hood whenever you make use of the database) to be using a bind variable to which no value has been assigned. The ORA-01008 is therefore “correct” in the sense that there’s genuinely a bind variable somewhere that’s missing a value, but it’s a problem with Oracle’s own internal SQL, not the code you’re running.

If you happen to query for value 10, you don’t encounter the problem in the presence of an unusable index partition -and the cursor that’s created (and stored in the library cache) when your query is successfully parsed is then available for use by the procedure when you then make it query for value 11. So if you’ve successfully queried for value 10, you can successfully query for value 11 (or any other value), because a parsed cursor is already at your disposal. But flush the shared pool, and that cursor is wiped -so that when you now select for value 11, a new cursor has to be compiled… and the problem with the recursive SQL needed to make that happen then rears its ugly head once more.

Happily, the bug would seem to have been fixed in 12.1.0.2, but 11.2.04 remains supported for a good few years yet! Therefore I’ll be raising an SR soon enough, and we’ll see what Oracle has to say about it. I’ve already raised an Oracle forums question about it, and got some useful suggestions which got us on the right track (though quite a few more comments about the quality of my code than were strictly warranted, I feel! Why do people look at a forest and see trees quite so often?)

There are so many variables that make the difference between the execution of the procedure fail or succeed, though, that this truly did feel like Pooh and Piglet felt when hunting Woozles. And at the end, I felt as if I’d fallen in a Heffalump trap for Poohs (or Piglets):

“I have been Foolish and Deluded,” said he, “and I am a Bear of No Brain at All.”
“You’re the Best Bear in All the World,” said Christopher Robin soothingly.
“Am I?” said Pooh hopefully. And then he brightened up suddenly.

Gone

As promised, Salisbury and Asquith have been “retired” and have accordingly disappeared from the front page. They can still be reached from the Downloads page, though, should anyone still need them.

Churchill is now very nearly completely documented and replaces both. The only thing still missing is the description of how to create a Wilson server to act as an Enterprise Manager Cloud Control, and that should be finished by the end of the week.

I’ve also set up my own “ownCloud” hosting service and am hosting the Churchill ISO from there rather than from Dropbox. I think it’s all working and the necessary files are available to all, but if you run into any problems, just drop me a line and I’ll get it sorted ASAP.