
Build Active Data Guard

1.0 Introduction

The last part of the Churchill framework is the construction of a second cluster, with hosts named Geiger and Dirac. This cluster looks to Attlee for its shared storage, in the same way that Alpher/Bethe find theirs on Churchill. For Geiger/Dirac to work at all, you therefore need Attlee built and running in the background.

We install Oracle 12c Grid Infrastructure and Database software onto Geiger/Dirac, of course; but we don’t let the Oracle installer create a database for us. That’s because we’re going to use Geiger/Dirac as a standby clone of the database already running on Alpher/Bethe (so, at that point, both those servers and their Churchill ‘sponsor’ need to be running in the background too!) This all means that you need hefty physical resources to be running the complete Churchill/Alpher/Bethe and Attlee/Geiger/Dirac infrastructure: you needn’t apply if your physical PC doesn’t have at least 32GB RAM, basically.

In this article, I shall only skim over how you create the Geiger/Dirac virtual machines: they are built identically to Alpher/Bethe and you should know how to build them by now. Similarly, I shall treat lightly the subject of performing the Grid Infrastructure and Database software installations: you should be comfortable with doing both of those by now, too. Instead, I’ll concentrate on the process of how we clone the orcl database from Alpher/Bethe to Geiger/Dirac and how we then go on to make Alpher/Bethe the primary site and Geiger/Dirac the active data guard standby (i.e., it’s running in read-only mode whilst constantly being updated with redo shipped from Alpher/Bethe). This is what Oracle calls their Maximum Availability Architecture (or MAA) and is pretty much the pinnacle of what you can do with Oracle’s RDBMS. It therefore fully warrants the time I’ll spend describing it.

2.0 Hardware Configuration

To build this part of the Churchill framework, you need two new virtual machines built with:

  • 2 x CPUs
  • 5120MB RAM
  • 40GB hard disk
  • 2 x host-only network interfaces
  • 1 x CD-ROM drive

The only tricky bit in that lot is to make sure that whilst both network interfaces use host-only networking, one should be attached to the same subnet that Churchill (or Attlee) uses; the other should be connected to the second host-only interface that Alpher/Bethe use as their private interconnect. The former is generally called vboxnet0 and the latter vboxnet1 if you’re using VirtualBox for your virtualization; on VMware, you get to choose your interface names, so I can’t be prescriptive about them here. See Section 3 of this earlier article if you need reminding about the significance of the network interfaces in a Churchill context.

Here’s a screenshot of how I’ve built my Geiger VM:

Note the key points already mentioned: 5120MB of base memory; 2 virtual processors; 2 network adapters, both host-only, but each on a different subnet; a single ‘optical drive’, fitted with an ISO of CentOS 6.8; and a 40GB hard disk.

In case it’s not obvious, I’m using CentOS 6.8 in that screenshot because that’s what I built my entire Churchill framework out of. If you opted to use (say) Scientific Linux 6.6 when you built Churchill, then you are required to use it for every other component of the Churchill framework, too. Geiger and Dirac cannot, in other words, use a different O/S than any of their siblings.

3.0 Booting the new servers

As is usual in the Churchill framework, you boot each of your new servers in turn, press <TAB> when the first boot menu appears and then type a bootstrap line onto the end of whatever text is already present: this allows the installation to be automated from that point on, thanks to whatever Kickstart script your bootstrap line invokes. All bootstrap lines begin with ks=http://churchill/ks.php? …and then you bolt on appropriate other components to make things suit each VM being built.

For Geiger, the key component of the bootstrap line is simply ‘sk=3’; for Dirac, it’s just ‘sk=4’. Since your VMs have two possible network interfaces, you need to add ‘ksdevice=eth0’ onto the end of the bootstrap line, too. In addition, if you are building your VMs using anything other than CentOS 6.8, you need to say what you are using as an OS, with appropriate values for a ‘distro=’ and a ‘version=’ bootstrap parameter.

Remember that in Churchill, the split ownership model is used by default. That is, the Grid Infrastructure software is owned by someone called ‘grid’, whilst the database software is owned by someone different, called ‘oracle’. If you prefer to use the unitary ownership model, where one user (oracle) owns both software installations, then you must specify an additional bootstrap parameter of split=n. Your ownership model has to be consistent across the Churchill framework, though: if Alpher/Bethe used split ownership, you must not now say that Geiger/Dirac should use unitary ownership, nor vice versa.
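
Putting those pieces together, a bootstrap line for Geiger might therefore look like one of the following (purely illustrative; the exact parameters depend on the choices just described):

ks=http://churchill/ks.php?sk=3 ksdevice=eth0
ks=http://churchill/ks.php?sk=3&distro=sl&version=67&split=n ksdevice=eth0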

Here is me about to boot Geiger, for example:

Since there is no mention there of distro=<something>&version=<somethingelse>, you can tell just from the bootstrap line itself that I must be using CentOS 6.8. Had I not been doing so, though, you might have seen this sort of thing instead:

You will see that if you are required to specify distro and version, the bootstrap line may wrap onto the next line. That doesn’t matter, so long as you don’t introduce any spaces and each parameter is bolted directly onto the line with a “&” character. The only space allowed is between the last bootstrap parameter and the ‘ksdevice’ item. The order of the parameters also doesn’t matter. It is as legitimate to use ‘distro=sl&sk=3&version=67’ as it would be to use ‘sk=3&version=67&distro=sl’, for example: provided all the necessary ‘bits and pieces’ are mentioned somewhere in the bootstrap line, their position doesn’t matter.

The only other thing to remind you before you start booting and building your new VMs: make sure Churchill AND Attlee servers are both built and running in the background before you start. Churchill provides the Kickstart script to the new VMs, plus the distro software, plus the Oracle Grid and Database software; Attlee provides the fake ASM disks that the new VMs will use as their cluster storage. So both need to be present and working before either new VM build can succeed.

Once you press [Enter] on the completed bootstrap line, your O/S install will begin and complete entirely automatically: the next thing you’ll see is this:

…followed fairly soon after by:

…at which point, you should do as requested and press [Enter] again to make the VM reboot. Make sure it boots back up from its hard disk, not the O/S installation disk.

You will find upon successful reboot that you can log on to your new VMs as user root (password oracle) or as user oracle (password also oracle). If you didn’t override the default behaviour of using split ownership, your new VMs will also have a user account for user grid (password also oracle…spot the pattern!). On Geiger, you should find that the /osource directory is populated with Oracle software shipped over from Churchill. No similar software will have been copied to Dirac -and this means that your subsequent Oracle installations will have to be driven from Geiger, which is the ‘principal’ server of the Geiger/Dirac pair.

Remote SSH connections to Geiger are possible using IP address 192.168.8.103. Those to Dirac need to use IP address: 192.168.8.104.

4.0 Installing Grid Infrastructure

To perform the Grid Infrastructure installation, therefore, you now connect to Geiger from your physical desktop like so:

ssh -X grid@192.168.8.103

That’s the ‘public’ IP address which Churchill assigns to Geiger on our behalf. I’m connecting there as the user grid, because I let the split-ownership model apply (it’s the default and I didn’t make a point of switching it off). If you said split=n when building Geiger (and Dirac), you’d connect as the user oracle instead. In all cases, you specify the ‘-X’ switch when connecting so that you can have applications running on Geiger ‘paint’ their graphical screens onto your physical desktop. That way, Geiger can run a graphical software installer despite not having any graphics capabilities of its own!

Be clear that at this point, you not only need Geiger up to connect with, but you also need Churchill, Attlee and Dirac up and running in the background too. Churchill is still the primary provider of network services to the entire framework; Attlee is the provider of shared storage to the Geiger/Dirac cluster… and, of course, you can’t build a Geiger/Dirac cluster unless both constituent nodes are available at the same time!

So, if all four servers are running, and you’ve successfully connected to Geiger via ssh with X forwarding enabled, you now invoke the Grid Infrastructure install by typing the command:

/osource/grid/runInstaller

Geiger, and your desktop, will respond accordingly:

Now, as I mentioned in the introduction, I am not going to walk you through every last screen you’ll meet in this wizard: you should probably know how to perform a Grid Infrastructure install by now! But I’ll mention some of the more complex bits of the process, where there might be doubt about how best to proceed.

For starters, on that first screenshot, you do want to take the option to install Grid Infrastructure for a Cluster. It will be a standard cluster and you’ll need to do an ‘Advanced Installation’. You’ll then come to this screen:

You don’t have a choice in the name of your Cluster, your SCAN Name or your SCAN Port: they have to be filled in as you see here, because only these values have been configured on Churchill and Attlee to be DNS-resolvable. So: dgscan must be the name for both the cluster and the SCAN, and 1521 is the default port number and needs to be accepted for use. Switch off GNS, too, because we don’t use that in the Churchill framework.

This screen starts off by only knowing about the geiger.dizwell.home host (because it’s the local server). You need to click [Add] to add details of Dirac: the Public hostname is just ‘dirac.dizwell.home’ and the Virtual hostname is ‘dirac-vip.dizwell.home’. After that, click the [SSH connectivity] button and type in the password for the user shown. In unitary ownerships, this will be the password for user ‘oracle’; in split ownerships, it’s the password for user ‘grid’: in either case, the password is ‘oracle’ unless you went to the effort of changing it beforehand. Click [Setup] to have the ssh connection tested and verified, then [Next] to move on when it has been.

This screen identifies what network interfaces exist and to what use each should be put:

Unfortunately, the use for the second interface starts off wrong and you need to correct it to read ‘private’ as shown here. You end up with the 192…. interface being used for public communication and the 10…. interface used for ‘private’ communications between the two nodes of the cluster.

When asked, you are using Standard ASM.

When you eventually see this screen:

You first need to click the [Change Discovery Path] button and then type in /u01/app/oradata as the path to the ASM disks. You should also switch to using ‘External’ redundancy. When you finally get a list of 8 candidate disks displayed in the central part of the screen, click 5 of them to be used. You can select more or fewer if you prefer, but 5 definitely works and leaves 3 other disks to be played with later on, so it’s a good balance. You definitely do not want to select disks 7 or 8 though: steps we take later expect to be able to use those two disks for creating an ARCHIVE disk group, and if you’ve already nabbed them for DATA, things will start failing badly. So… leave disks 7 and 8 out of it for now, please!

Do not use Intelligent Platform Management Interface when prompted; and there’s no cloud control to register with (you may by now have built Wilson and so actually have a Cloud Control you could use at that point, but I would recommend you not to do so, though ultimately it’s your choice). The OS groups and file paths can be accepted as they are, without change. I recommend you do let the root scripts be run automatically (the password for root is, of course, ‘oracle’).

The only pre-requisite check that should fail is this one:

If nothing else is listed as a problem, you’re fine to click the Ignore All option and proceed regardless.

File copying proceeds as normal and then you’re prompted to execute the root scripts:

Click [Yes] there and the installation will move into its concluding phase. After a long effort, you should eventually see this:

At which point, congratulations: you have a Geiger/Dirac cluster.

5.0 Installing Database Software

Now you have a working cluster, it’s time to install the Oracle database software onto it. Importantly, this time, we’re only installing the database software: we are specifically not creating a database. So, once again, I won’t go into every last detail of how you install the Oracle 12c software, since you should be used to doing it by now, but I’ll highlight the particularly important screens you’ll encounter.

You should first connect to Geiger as the oracle user (regardless of whether you’ve adopted the split or unitary ownership model, that’s going to mean using the username ‘oracle’). You need to make that connection from your physical desktop with X-forwarding enabled, like so:

ssh -X oracle@192.168.8.103

The Oracle database software installer is then invoked with the command:

/osource/database/runInstaller

After confirming that you don’t want to receive security updates, you’ll meet one of the most important screens of the entire install:

The default option is to create a database as well as install the software -but for Geiger/Dirac, we need the ‘install database software only’ option instead. Despite that, however, you are performing a RAC-software install, so the response to this screen:

…is to make sure the ‘RAC database installation’ option is selected before clicking [Next].

If you went for the unitary ownership model, then you will already have established ssh connectivity between Geiger and Dirac when performing the initial Grid Infrastructure installation. In the split ownership model, however, that GI install was done as ‘grid’ and this one is being done as ‘oracle’ -and so you will need to make sure that the oracle user can ssh between the nodes as well as the grid user can.

You can then confirm you’re installing 12c Enterprise Edition; when prompted for various file paths, accept the default suggestions in all cases. The same goes for operating system groups: all the defaults will already be correct, thanks to Churchill’s automation efforts. There won’t be any prerequisite errors to deal with.

Installation proceeds as you’d expect:

…until you are prompted to run the root script:

At this point, simply open a new ssh connection to each node in turn as the root user and type in (or copy/paste in) the path and filename of the root.sh script shown in the dialog:

You’ll notice from the above screenshot that the script prompts you for ‘the location of the local bin directory’ -and even offers a default suggestion of  ‘/usr/local/bin’ by way of an answer. It’s not a bad default on the whole -and therefore it’s quite OK to just press [Enter] at that prompt to accept it.

Don’t forget to run the same script on Dirac.

Once the script has been run on both nodes, click [OK] to close the initial pop-up, at which point you’ll be told the installation of the database software was successful in the main dialog. Click [Close] there to finish the installation off.

It is not essential to do so, but I would strongly recommend that you use your Virtualization technology to create a new snapshot of your entire Churchill framework at this point. That means taking near-simultaneous snapshots of Churchill, Alpher, Bethe, Attlee, Geiger and Dirac -in as quick a succession as you can manage. If you want to be truly sure of creating a baseline that you can revert back to at any point, confident that the database was not in flux when the snapshots were taken at different times, shut the orcl database down first (with the command srvctl stop database -d orcl issued by the oracle user) and bring it back up after all snapshots are complete (with the command srvctl start database -d orcl, also issued by the oracle user). The reason for taking these snapshots is straightforward enough: what follows makes substantial changes to the database and getting it back to its original state should anything go wrong would be complex. Reverting to a previously-taken snapshot, though, is a piece of cake. Your choice, of course, but I’d do it, for sure!
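
If you do decide to quiesce the database around the snapshots, the commands boil down to this, run as the oracle user on Alpher (or Bethe):

srvctl stop database -d orcl
# ...take your snapshots of Churchill, Alpher, Bethe, Attlee, Geiger and Dirac...
srvctl start database -d orcl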

6.0 Preparing for Data Guard

At this point, you have a functioning Geiger/Dirac cluster. It’s not actually running a database as yet, but the software to do so is in-place and ready to go. The plan now is therefore to clone the existing database running on Alpher/Bethe across to the new-but-bare cluster on Geiger/Dirac. However, there are a few things that need preparing in both clusters before that can be done. For example, Alpher and Bethe are probably (if you’ve followed the instructions!) not running in Archivelog mode -but it’s a requirement for Data Guard that they should be. Similarly, Alpher/Bethe currently permit ‘nologging’ operations, which would break a Data Guard setup if left unchanged, so switching on ‘forced logging’ is a necessary prerequisite.
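
To give you a flavour of what that involves: switching on forced logging, for example, is a one-line command issued on the primary in SQL*Plus (though, as you’ll see in a moment, you won’t need to type it by hand):

-- on the primary (Alpher), connected as sysdba; the preparation script described below does this for you
alter database force logging;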

And so on. There are lots of niggly little things which need switching on at Alpher/Bethe or Geiger/Dirac or, indeed, on both sides.

You could do it all by hand of course, but one little typo can ruin things: and typing lots of stuff is not the ‘Churchill way’!

If you log on as root to each of the Alpher, Bethe, Geiger and Dirac nodes in that order, you’ll find a script called dgprep.sh already sitting there, ready to run. All you need do is invoke it with the command:

cd /root
./dgprep.sh

If you try to cheat and run it as anyone other than root, it won’t let you, but will warn you and quit. If you do run it as root, however, it will do all the Primary/Standby prerequisite-setting for you. It will do different things on different nodes: on Alpher and Geiger, for example, it will create an entirely new ASM disk group for you, which takes a bit of time. Once it’s been done on the first node of each cluster, it needn’t be done on the second (the first nodes share their storage with the second, after all), so when the script runs on Bethe and Dirac, it will skip that particular step.

So, run the script as root four times on the four different nodes, one at a time, waiting for it to finish on one node before moving on to the next and doing the four nodes in order (A, B, G, D).

Here’s me running it on a ssh session remotely-connected to Alpher, for example:

Notice how the script reports what it’s doing and then, at the end, tells you to press a key to really finish things off. As soon as you do that, the shell script deletes itself -because re-running it would not be a good idea! It makes substantial changes to the way the database works and it’s the sort of thing you only want to do once. In that screenshot, too, you’ll see that I’ve lined up connections to the other three nodes, in order, ready for the script to run on them each in turn. Here’s a screenshot of my next run on Bethe, for example:

So it’s the same sort of thing -and the same ‘press a key and I’ll delete myself’ ending, but you’ll note that the script performs much less work on Bethe than on Alpher. As I mentioned, this is because in a cluster, a lot of things you only need to do on one node before they are automatically visible to the second, at which point it becomes redundant to do them on the second node too.

If you want to see a detailed explanation of what that script does and why, I’ve written about it in this separate article. You are welcome to read that and issue all the commands by hand, if you prefer. The point of the dgprep.sh script is simply to make it trivially easy to execute that which would otherwise require large amounts of typing in which (if you are as fat-fingered as I am!) you are probably going to make a mistake, somewhere along the line!

7.0 Cloning the Database

Once you’ve completed the preparatory steps successfully on all four nodes, you switch back to Alpher and issue these commands:

su - oracle
srvctl config database -db orcl

This is an instruction to reveal all that the cluster registry knows about how the orcl database is configured. You’ll get a response similar to this:

Database unique name: orcl
Database name: orcl
Oracle home: /u01/app/oracle/product/12.1.0/db_1
Oracle user: oracle
Spfile: +DATA/ORCL/PARAMETERFILE/spfile.288.936300889
Password file: +DATA/ORCL/PASSWORD/pwdorcl.276.936300685
Domain: dizwell.home
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: 
Disk Groups: DATA
Mount point paths: 
Services: 
Type: RAC
Start concurrency: 
Stop concurrency: 
OSDBA group: dba
OSOPER group: 
Database instances: orcl1,orcl2
Configured nodes: alpher,bethe
Database is administrator managed

Note, especially, the ‘Password file’ entry above: it’s the location of the database’s password file. You’ll note that it’s actually stored inside an ASM diskgroup, which is a new feature of 12c. This complicates things a bit, because we need both clusters to have copies of this file and it’s tricky copying stuff between different ASM diskgroups stored on servers that are remote from each other. So, we’re going to have to go about it bit-by-bit as you’ll shortly see. Note, too, the ‘Spfile’ entry above: that’s the location of the spfile and we’ll need to know that for later, too.

So here’s how we go about copying that password file to our Standby node:

su - grid
asmcmd
cp +DATA/ORCL/PASSWORD/pwdorcl.276.936300685 /tmp/orapworcldg1
mkalias +DATA/orcl/parameterfile/spfile.288.936300889 +DATA/orcl/spfileorcl.ora
exit
chmod 777 /tmp/orapworcldg1
exit

I’m becoming the grid user (because I’m using split ownership; you’d do everything as the oracle user if you used unitary ownership, switching ORACLE_HOME and ORACLE_SID to the correct values for working with ASM) and launching the asmcmd utility. That lets us copy the password file (whose path and name we know because of the earlier srvctl command -if yours is different, you obviously type in the path you saw with that srvctl command rather than the specific one I use here) to the /tmp directory: it’s now an ordinary file. We happen also to make an alias for the spfile, so that we can later refer to it by the simplified name “spfileorcl.ora” instead of its ‘proper’ name, which includes lots of numbers and dots and is a pain to type correctly (and again, use the right numbers and dots, obtained by srvctl earlier, not just the ones shown here)! Finally, we quit asmcmd and make the copy of the password file in the /tmp directory accessible to anyone and everyone.

Now, we become the oracle user again (or continue as the oracle user if you’re using unitary ownership!):

su - oracle
scp /tmp/orapworcldg1 geiger:$ORACLE_HOME/dbs/orapworcldg1

Here, I’m copying the password file to just Geiger, using the scp utility. Although the password file is not ASM-based on Geiger, its existence anywhere will allow us to create an instance called “orcldg1” and to use that to clone the full database from Alpher/Bethe. Be warned that you will be prompted with a pile of scary-looking stuff when making your first scp connection between servers in this way:

Just confirm ‘yes’ when prompted and then type in the oracle user’s password on the remote nodes (which is ‘oracle’, of course). The file copying will start immediately you do so, as you can see.

Once the password file is across, you switch to Geiger and start a single instance up, as follows:

su - oracle
sqlplus / as sysdba
startup nomount
exit

We startup in the ‘nomount’ state because, of course, at this stage of the game there is no controlfile to read/write, nor any other part of a physical database to make use of: we can only create the memory structures and background processes needed to interact with a database which doesn’t exist yet. You should see confirmation that an SGA is running and has memory allocated, at which point you can exit from SQL*Plus:

With that done, it’s now the moment of truth. Back on Alpher, as the oracle user, you launch the RMAN utility and instruct it to clone the live, running orcl database across to the new node, as follows:

su - oracle
rman target sys/oracle@orcl auxiliary sys/oracle@orcldg1 << EOF
 run {
 allocate channel d1 type disk;
 allocate channel d2 type disk;
 allocate channel d3 type disk;
 allocate channel d4 type disk;
 allocate auxiliary channel d5 type disk;
 duplicate database for standby from active database nofilenamecheck dorecover
 spfile
 parameter_value_convert 'orcl','orcldg'
 set db_unique_name='orcldg'
 set db_file_name_convert='/orcl/','/orcldg/'
 set log_file_name_convert='/orcl/','/orcldg/'
 set fal_server='orcl'
 set fal_client='orcldg'
 set standby_file_management='AUTO'
 set log_archive_config='dg_config=(orcldg,orcl)'
 set log_archive_dest_1='LOCATION=use_db_recovery_file_dest', 'valid_for=(ALL_ROLES,ALL_LOGFILES)'
 set log_archive_dest_2='SERVICE=orcl LGWR ASYNC NOAFFIRM delay=0 reopen=300 VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=orcl'
 set log_archive_dest_state_2='enable'
 set remote_listener='dgscan:1521'
 set instance_number='1';}
EOF

Cut-and-paste that lot in, all in one chunk: it launches RMAN, connects to both the Alpher/Bethe orcl database and the Geiger/Dirac instance, sets a bunch of parameters over on the Geiger/Dirac instance as part of the operation and then begins ‘duplicating the active database’ between the clusters. It looks complicated, because it is. It will either work entirely or collapse in a heap half-way through! If you’ve followed all the instructions up to this point, it ought to work just fine! (Just note, that last ‘EOF’ needs to be left-aligned in the text you paste. If it’s spaced away from the left margin at all, it won’t work properly. If you would prefer getting hold of a script that is known to work, rather than cut-and-paste from a web page that may contain all sorts of formatting artifacts, download it here).

Quite a lot of stuff will scroll by as the cloning process does its work:

So long as you see it ‘starting media recovery’ and then declaring ‘Recovery Manager complete’, all is well: the thing has worked as intended.

8.0 Checking it out

You now have a working database running on Geiger. We should switch there now and check out what we’ve actually got for our efforts. As the oracle user, connect to the running instance as follows:

sqlplus / as sysdba

SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /u01/app/oracle/product/12.1.0
                                                 /db_1/dbs/spfileorcldg1.ora
SQL> select status from v$instance;

STATUS
------------
MOUNTED

SQL> select database_role from v$database;

DATABASE_ROLE
----------------
PHYSICAL STANDBY

SQL> select name from v$controlfile;

NAME
--------------------------------------------------------------------------------
+DATA/ORCLDG/CONTROLFILE/current.277.936374515

SQL> select count(*) from gv$instance;

COUNT(*)
----------
 1

SQL> show parameter cluster

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     TRUE
cluster_database_instances           integer     2
cluster_interconnects                string

So this tells us that we’ve created a database which knows it is to act as a standby database; that it’s been left in the MOUNT state; that it’s not actually clustered (its instance count is 1) but it’s cluster-able (cluster_database is set to TRUE). We can also see that RMAN has been smart enough to know to place the control file within an appropriate ASM disk group -but that it’s also saddled us with an spfile which is sitting on Geiger’s local hard disk (and is thus not shareable with Dirac).

We can issue one more command, too, once we’ve exited out of SQL*Plus back to the command prompt:

oracle@geiger.dizwell.home:~ [orcldg1]$ srvctl config database -db orcldg
PRCD-1120 : The resource for database orcldg could not be found.
PRCR-1001 : Resource ora.orcldg.db does not exist

RMAN is good -but it’s not that good! It hasn’t configured the cluster registry for this database, so there’s no way we can manage this database using srvctl commands.

What we see, therefore, is that RMAN has done maybe 90% of the job for us -but an important 10% remains to be done by hand. That’s the final stage of creating an active standby database in the Churchill framework -and what we’ll tackle next.

9.0 Turning Clone into Standby

We have to tackle a number of issues to complete our job. The first is to get that locally-stored spfile moved into the ASM disk group so that Dirac can see and use it too. A similar thing has to be done for the password file (which we copied to our nodes as local files earlier, but now need to be stored on shared storage).

9.1 Shared Spfile

On Geiger, as the oracle user, issue these commands:

sqlplus / as sysdba
create pfile='/tmp/orcldg.pfile' from spfile;
create spfile='+DATA/orcldg/spfileorcl.ora' from pfile='/tmp/orcldg.pfile';
exit
rm $ORACLE_HOME/dbs/spfile*
rm $ORACLE_HOME/dbs/initorcldg1.ora
echo "SPFILE='+DATA/orcldg/spfileorcl.ora'" > $ORACLE_HOME/dbs/initorcldg1.ora

In SQL*Plus, we use standard commands to create a temporary copy of the in-use spfile, then we create a new spfile inside the ASM diskgroup from that temporary copy. Back at the command line, we remove all existing local copies of spfiles and init.oras, in case they confuse things. Finally, we create a new instance-specific init.ora which contains the single instruction to look in the DATA disk group for the actual spfile.
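
If you want to reassure yourself that the spfile really has landed in ASM before moving on, a quick check with asmcmd (as the grid user in split ownership, or the oracle user otherwise) should list it:

asmcmd ls -l +DATA/orcldg/spfileorcl.ora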

We now repeat the same sort of thing on Dirac (also as the oracle user):

rm $ORACLE_HOME/dbs/spfile*
rm $ORACLE_HOME/dbs/initorcldg2.ora
echo "SPFILE='+DATA/orcldg/spfileorcl.ora'" > $ORACLE_HOME/dbs/initorcldg2.ora

We obviously don’t need to re-create the spfile again: that’s now in ASM and is thus already shared between both nodes, even though it happened to have been created on Geiger. But we do need to clear out any local copies of init files that Dirac may possess (it probably doesn’t possess any and those first two commands will therefore just error with a ‘no such file’ message, but it pays to be certain!) and again create a local, instance-specific init.ora which contains the pointer to where the shared spfile can be found.

9.2 Register with the Cluster Registry

Back on Geiger, as the oracle user, issue these commands:

srvctl add database -d orcldg -o $ORACLE_HOME -r physical_standby
srvctl add instance -d orcldg -i orcldg1 -n geiger
srvctl add instance -d orcldg -i orcldg2 -n dirac

That tells the CRS that a new database exists, called orcldg, which is to act as a standby database. We then say that two separate instances should be configured as being related to that database: one should run on Geiger, called orcldg1; and one should run on Dirac, called orcldg2.
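
If you now repeat the srvctl config command that failed earlier, it should describe the orcldg database and its two instances instead of throwing the PRCD-1120 error:

srvctl config database -db orcldg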

9.3 Sharing the Password File

Once the database and its instances have been registered in the Cluster Registry, we can make the Geiger-private password file (which we copied across from Alpher earlier on) an ASM-stored and thus Geiger/Dirac-shared password file, as follows:

#As oracle user on Geiger
#------------------------
chmod 777 $ORACLE_HOME/dbs/orapworcldg1
mv $ORACLE_HOME/dbs/orapworcldg1 /tmp
#As grid user on Geiger (or, in unitary environments, as oracle user but with +ASM as the SID)
#------------------------
asmcmd 
mkdir +DATA/ORCLDG/PASSWORD
pwmove --dbuniquename orcldg /tmp/orapworcldg1 +DATA/ORCLDG/PASSWORD/orapworcldg 
exit
#As oracle user on Geiger
#------------------------
rm /tmp/orapworcldg1
srvctl modify database -db orcldg -pwfile +DATA/ORCLDG/PASSWORD/orapworcldg

Again, it’s a lot of faffing about, but in essence, as the oracle user you’re copying the password file to a location that’s available to everyone so that you can then become the ASM user and use the “pwmove” command in ASMCMD to take that password file off the file system and write it to an ASM disk (you may get an error that the original file couldn’t be deleted from /tmp, but that’s not a problem: so long as it got written to the PASSWORD directory, things will work). You then invoke srvctl as the oracle user to tell the Cluster Registry about the existence of the new password file. You should probably go around your file system at this point, too, and delete any and all copies of the password file which are hanging about the place!

9.4 Finishing it off

As the oracle user, on Geiger, at the command prompt, issue these commands:

sqlplus / as sysdba
shutdown immediate
exit

srvctl start instance -d orcldg -i orcldg1

That is, you start by shutting down the instance that RMAN created for you, directly within SQL*Plus. That’s because it is an unregistered instance (i.e., its creation wasn’t registered with the CRS) and its continued existence would confuse things badly. Once it’s been shut down, you can then use srvctl to bring up just the first instance now properly registered with the “orcldg” database in the CRS.

Just a note of warning: I frequently have to issue that command more than once before it works. On a heavily-loaded system, there seems to be a delay in ASM making all its disk groups available at once and the control file often gets ‘lost’ as a consequence. Don’t fret if your first attempt fails, therefore: just give it a minute to catch its breath and then try again. Second time around usually nails it for me.

Once the first instance is up, you do this:

sqlplus / as sysdba
alter system set instance_number=2 scope=spfile sid='orcldg2';
exit

That just gets the instance number parameter value correct for any instance attempting to join the cluster that is called ‘orcldg2’. Once that’s done, you can truly get the thing finished:

srvctl stop database -d orcldg
srvctl start database -d orcldg

…which is to say, you kill off everything that’s currently associated with “orcldg” and then you start the entire database up afresh -and if you’ve done everything I’ve described before, that should mean starting instances on both Geiger and Dirac:
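
You can verify that with the same query we ran earlier, issued in SQL*Plus on either node:

select count(*) from gv$instance;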

If you get a count of 2, that means you now have a clustered standby database. Except for one minor detail: the database isn’t actually acting as a standby at this stage, because it’s not processing any of the redo which the Alpher/Bethe system is sending it. That requires one final SQL*Plus command:

sqlplus / as sysdba
alter database recover managed standby database using current logfile disconnect from session;

This will launch the Managed Recovery Process (MRP) on Geiger, whose job it is to intercept redo sent from Alpher or Bethe and process it on the cloned standby, thus keeping the standby ‘in sync’ with the primary.
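
If you want to check that the MRP really is running, a quick query on the standby (in SQL*Plus on Geiger, say) should show an MRP0 process in its output:

select process, status from v$managed_standby;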

Once that command returns (it takes a few seconds), you should be able to do this on Alpher or Bethe (in SQL*Plus):

create table dizwell as select table_name from dba_tables;

…and this on Dirac (or Geiger):

select count(*) from dizwell;

You may need to wait a few seconds for the second command to return anything (because you’re probably running 6 virtual machines on a single, over-heating and struggling desktop PC at this point!), but if you give it a few seconds at most, you should get a meaningful count of records from Dirac for a table you just created over on Alpher… meaning that Alpher/Bethe is sending redo to Geiger/Dirac, which is processing it (eventually!) and thus keeping itself in-sync with its primary:

The fact that you can query stuff in the Geiger/Dirac system tells you, too, that this is true MAA (Maximum Availability Architecture): a RAC system over on primary is redundant (because either Alpher or Bethe could blow up and still leave the other instance running and available to users); it’s sending its redo to Geiger/Dirac in near-real-time, thus ensuring transactions are ‘off site’ and protected from Alpher and Bethe both blowing up at the same time; the Geiger/Dirac system is itself a RAC, so if either Geiger or Dirac blows up, the remaining instance is able to act as a standby all on its own; and users can connect to Geiger/Dirac to at least query data (even if, as a read-only standby, it’s not able to be used to modify that data). It’s multiply-redundant, and therefore horribly complex… which explains why it can be so satisfying when it all finally ‘just works’!!

10. Conclusion

Phew! It doesn’t get much harder than this in Oracle… and articles don’t get much longer than this, either! It’s long, because it’s complex and there are a lot of moving parts to keep an eye on. But if you’ve been able to do work on Alpher/Bethe and see the results appear almost instantaneously on Geiger/Dirac, you’ve successfully achieved something close to the pinnacle of Oracle DBA-ing. (There’s always Exadata, of course, but we will move on swiftly…)

In this article, you’ve built two new Oracle servers; turned them into a cluster by installing Grid Infrastructure; installed Oracle database software on both; and then cloned the existing database from Alpher/Bethe over onto the new cluster, all the while using ASM as our cluster storage technology. By careful manipulation of the init.ora on both ends of the connection, you’ve arranged for Alpher/Bethe to send redo in near-real-time over to Geiger/Dirac and thus created an Active Data Guard configuration, providing maximum availability.

Now that you have your 2-node/2-node setup, you are in a position to experiment with things like failover, switchover and setting up the Data Guard Broker. But these are complex subjects best left for another article!