Real Application Clusters: Unique Backup Challenges
Before we dig any deeper, it’s helpful to consider the architectural nature of a RAC database. Essentially, you have at least two different servers, each with its own memory and local disks, and each connected to a shared disk array. Oracle uses this hardware by creating two instances, one on each node, each with its own SGA and PGA memory areas. Each instance has its own redo logs, but they reside on the shared disk and are accessible by the other nodes. All control files and datafiles are shared between the two instances, meaning there is only one database, with two instances accessing and updating the data simultaneously. Figure 21-1 provides an oversimplified look at RAC.
FIGURE 21-1
RAC at its most basic
From the RMAN perspective, this architecture creates interesting challenges for taking backups.
First of all, multiple instances are running, but RMAN can connect to only a single node. This shouldn’t pose any problems for backing up the datafiles, but we do have a problem when it comes to archive logs. Each instance is archiving its own redo logs to a local drive rather than to the shared disks, so the issue is how we get to those other archive logs. But let’s start by considering the datafile backups.
Datafile Backups
Datafile backups in a RAC environment are pretty much the same as datafile backups in a single-node database: RMAN connects to a node and issues a backup database command. The memory that RMAN needs to perform the backup operation will be grabbed from that one node. If backing up to disk, the backups will be local to that node; if backing up to tape, that instance will have to be configured for integration with your MML.
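At its simplest, this looks like any other single-instance backup. A minimal sketch, assuming a Net alias prod1 that resolves to a single node (the alias and password here are placeholders):

rman target sys/password@prod1
rman> backup database;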
The RMAN Snapshot Control File and RAC
You need to move the snapshot control file to a shared location if you plan to run backups from more than one node. If you do not move the snapshot control file to a shared file system such as OCFS or an ASM disk group, then you must make sure that the local destination is identical on all nodes. To see the snapshot control file location, use the show command:

rman> show snapshot controlfile name;

To change the value, use the configure command:

rman> configure snapshot controlfile name to
'/u02/oradata/grid10/snap_grid10.scf';
The only problem with this scenario is that it puts quite a bit of load on a single node. This may be what you are after; if not, there is a better way. RMAN can connect to only a single node initially, but it can allocate channels at all of your nodes during an actual backup operation. The following shows an example of how this would be done:
configure default device type sbt;
configure device type sbt parallelism 2;
configure channel 1 device type sbt
connect 'sys/password@prod1';
configure channel 2 device type sbt
connect 'sys/password@prod2';
backup database;
Then, you can run your backup, and RMAN will spread the work between your two nodes.
RAC datafiles sometimes have something known as node affinity, where a particular datafile is accessed faster from one node than from another. If this is the case for your cluster, RMAN knows about it and will back up the datafile from the node where it can be read the fastest. If there is no node affinity on your system, RMAN just distributes the work across the two channels as it would any two channels used to parallelize a backup. Obviously, you could allocate two channels at each node, or three, four, or more. How many channels each node utilizes should be based on the same performance parameters we explored in Chapter 16.
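For instance, to drive two channels on each node of a two-node cluster, the configuration might look something like the following sketch (the connect strings and channel counts are illustrative, not prescriptive):

configure device type sbt parallelism 4;
configure channel 1 device type sbt connect 'sys/password@prod1';
configure channel 2 device type sbt connect 'sys/password@prod1';
configure channel 3 device type sbt connect 'sys/password@prod2';
configure channel 4 device type sbt connect 'sys/password@prod2';
backup database;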
Automatic Distribution of Backup Work Across Nodes
Since Oracle Database 10g Release 2, RMAN can utilize information gleaned from Oracle’s Cluster Ready Services (CRS) to provide better RAC integration. Most importantly, you no longer have to configure a channel to specifically connect at a particular node. If you have two nodes and you set parallelism to 2, RMAN will query CRS for the node information and automatically spread the two channels across the two nodes. In addition, CRS keeps track of node utilization and will spread RMAN backup jobs out to those nodes that are currently being least utilized, to avoid I/O traffic jams. This is a significant automation improvement, and the lesson to take away is simple: don’t try to out-think CRS. Let it do the footwork of determining how to distribute work across your cluster.
Just set your level of parallelism equal to the total number of nodes you want involved in your backup, and let CRS do the work.
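In other words, the per-node connect strings from the earlier example become optional. A minimal sketch of the CRS-aware approach, assuming a two-node cluster:

configure default device type sbt;
configure device type sbt parallelism 2;
backup database;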
Archive Log Backups
Archive log backups are far trickier than datafile backups, because each node is responsible for its own archiving, which means that each node has unshared files that only it can access. If we connect to only one node and issue a backup archivelog all command, RMAN will look in the control file and discover the listing for the archive logs from both nodes, but when it looks at the local node, it will find only the archive logs from that node, and it will error out.
Of course, the question may be posed, “Why not write archive logs to a raw partition on the shared disk array?” The answer is that you could, if you don’t mind the administrative nightmare.
Think about it: with raw partitions, you can write only one file per partition, which means you would have to anticipate all of your archive log filenames and create symbolic links to a partition that exists for each archive log. Such a task is simply too labor intensive for even the most scheming Unix-scripting mind among us. It is better to use a shared disk volume that has a clustered file system or that is managed by Oracle ASM.
RMAN, RAC, and Net Connections
RAC comes with many extremely powerful load-balancing and failover features as part of the Net configuration, but this means changes in the listener.ora file and in the tnsnames.ora files for both the cluster nodes and the clients. RMAN is a little too picky for these features: it can connect to only one node and cannot fail over or be load balanced. Therefore, the Net aliases you use for the target connection and for the connect clause of the channel allocation string must be configured to connect to a single node with a dedicated server. This means that you cannot use the same Net aliases configured for failover that you use for other connection purposes.
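For illustration only, a dedicated single-node alias might look like the following sketch; the host, port, and service names are assumptions for this example, not values given in the chapter:

PROD1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = winrac1)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = prod)
      (INSTANCE_NAME = prod1)
    )
  )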
If you insist on leaving archive logs local to each node, a solution is available that allows RMAN to cope with the non-shared disk locations. First, make sure that each node is archiving to a unique file location. For example, prod1 archives to a directory called /u04/prod1/arch, and prod2 archives to /u04/prod2/arch. Then, you can allocate channels at each node, as you did to load balance the datafile backups earlier, and back up the archive logs:

configure default device type sbt;
configure device type sbt parallelism 2;
configure channel 1 device type sbt
connect 'sys/password@prod1';
configure channel 2 device type sbt
connect 'sys/password@prod2';
backup archivelog all delete input;
RMAN has a feature known as autolocate that identifies which archive logs belong to which node and that attempts to back them up only from that node. In this way, you don’t have to specify in RMAN which logs you need backed up at which node; RMAN can figure it out for you.
Another option that would allow you to perform your archive log backup from a single node would be to NFS mount the archive log destination of the other node. For example, at the node winrac1, you have local archive logs located at /u04/prod1/arch. Then, on winrac1, you NFS
mount the drive /u04/prod2/arch on winrac2 as /u04/prod2/arch. That way, when you run your archive log backups, RMAN checks the control file for the archive log locations, and it can find both locations while connected to only prod1. Figure 21-2 illustrates this methodology.
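As a sketch, the read-only mount on winrac1 might be created with something like the following, assuming winrac2 already exports /u04/prod2/arch over NFS (the export configuration is not shown here):

mount -t nfs -o ro winrac2:/u04/prod2/arch /u04/prod2/arch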
The only problem in the scenarios we’ve provided so far is that you are giving yourself a single point of failure for archive logs. If you archive your logs only to their respective nodes and you lose a node, you lose the archive logs from that node. That means you may have to perform point-in-time recovery of the entire database to the point right before the node was lost.
A better strategy is to set up each node with a LOG_ARCHIVE_DEST_2 parameter that writes to another node. One way to approach this task is to build on the NFS mount strategies already discussed in this chapter. Instead of just NFS mounting the archive destination of the other node in READ ONLY mode, consider NFS mounting a drive on the other node with write access, and then setting that NFS mount as a second archive destination.

FIGURE 21-2
Mounting the archive log destination

Take our two-node RAC database, for example. On winrac1, we could mount the shared directory /u04/prod2/arch from winrac2, and on winrac2, we could mount winrac1’s /u04/prod1/arch directory. Then, we could set up the init.ora files for each node, as shown next:
winrac1 init.ora file:

log_archive_dest_1='location=/u04/prod1/arch'
log_archive_dest_2='location=/u04/prod2/arch'
…

winrac2 init.ora file:

log_archive_dest_1='location=/u04/prod2/arch'
log_archive_dest_2='location=/u04/prod1/arch'
When set up like this, Oracle writes archive logs from each node to the archive destination of the other node. This gives us an elegant solution for backing up the archive logs from a single node and provides us with fault tolerance in case a node is lost.
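With each node now writing its logs to both destinations, a single RMAN connection can see a copy of every archive log. A minimal sketch, reusing the placeholder prod1 alias from earlier:

rman target sys/password@prod1
rman> backup archivelog all;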
Avoid Archive Log Backup Complexity with ASM and the Flash Recovery Area
All of these complications can be avoided by creating a location on the shared disk array that has a volume cooked with a cluster file system, such as OCFS. Even better, you can employ ASM as