Oracle Essentials: Oracle Database 11g
Authors: Rick Greenwald
As shown in Figure 11-7, TAF can automatically reconnect clients to another instance of the database, which provides access to the same database as the original instance.
Figure 11-7. Failover with TAF and Real Application Clusters (panels: Before Failure and After Failure, each showing Oracle instances that share one Oracle Database; after failure, the client automatically reconnects to a surviving instance, TAF can resubmit queries automatically, and applications can be made failover-aware and can resubmit transactions)
The high-availability benefits of TAF include the following:
Transparent reconnection
Clients don’t have to manually reconnect to a surviving instance. You can optionally configure TAF to preconnect clients to an alternate instance in addition to their primary instance when they log on. Preconnecting clients to an alternate instance removes the overhead of establishing a new connection when automatic failover takes place. For systems with a large number of connected clients, this preconnection avoids the overhead and delays caused by flooding the alternate instance with a large number of simultaneous connection requests.
Automatic resubmission of queries
TAF can automatically resubmit queries that were active at the time the first instance failed and can resume sending results back to the client. Oracle will reexecute the query as of the time the original query started. Oracle’s read consistency will therefore provide the correct answer regardless of any activity since the query began. However, when the user requests the “next” row from a query, Oracle will have to process through all rows from the start of the query until the requested row, which may result in a performance lag.
Callback functions
Oracle8i enhanced TAF by enabling the application developer to register a “callback function” with TAF. Once TAF has successfully reconnected the client to the alternate instance, the registered function will be called automatically. The application developer can use the callback function to reinitialize various aspects of session state as desired.
Failover-aware applications
Application developers can leverage TAF by writing “failover-aware” applications that resubmit transactions that were lost when the client’s primary instance failed, further reducing the impact of failure. Note that unlike query resubmission, TAF itself doesn’t automatically resubmit the transactions that were in-flight.
Rather, it provides a framework for a seamless failover that can be leveraged by application developers.
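To make the benefits above more concrete, here is a toy, in-memory Python simulation. It is not OCI or any Oracle driver API. `Instance`, `TafClient`, `InstanceDown`, `fetch_all`, and the SCN-keyed `database` dict are all hypothetical stand-ins. The simulation shows a preconnected backup session. It shows a query re-executed as of its original SCN, with rows already delivered being reprocessed but not re-sent. It also shows a registered callback that runs after reconnection.

```python
from typing import Callable, Dict, List, Optional


class InstanceDown(Exception):
    """Hypothetical error: the instance serving this session has failed."""


class Instance:
    """Toy stand-in for one instance; every instance sees the same database."""

    def __init__(self, name: str, database: Dict[int, List[str]]):
        self.name = name
        self.database = database  # SCN -> query result as of that SCN
        self.alive = True

    def query(self, as_of_scn: int) -> List[str]:
        if not self.alive:
            raise InstanceDown(self.name)
        # Read consistency: the result reflects the database as of as_of_scn.
        return list(self.database[as_of_scn])


class TafClient:
    """Hypothetical client-side failover logic (not the OCI API)."""

    def __init__(self, primary: Instance, backup: Instance, preconnect: bool = True):
        self.current = primary
        self.backup = backup
        self.preconnect = preconnect  # backup session already opened at logon
        self.connects_at_failover = 0
        self.rows_reprocessed = 0
        self.callbacks: List[Callable[[Instance], None]] = []

    def register_callback(self, fn: Callable[[Instance], None]) -> None:
        self.callbacks.append(fn)

    def _failover(self) -> None:
        if not self.preconnect:
            self.connects_at_failover += 1  # a new connection is needed right now
        self.current = self.backup
        for fn in self.callbacks:
            fn(self.current)  # e.g., reinitialize session state

    def fetch_all(self, as_of_scn: int, fail_after: Optional[int] = None) -> List[str]:
        """Fetch all rows; fail_after simulates the primary dying mid-fetch."""
        delivered: List[str] = []
        while True:
            try:
                for i, row in enumerate(self.current.query(as_of_scn)):
                    if i < len(delivered):
                        self.rows_reprocessed += 1  # re-read but not re-sent
                        continue
                    if fail_after is not None and len(delivered) == fail_after:
                        fail_after = None
                        self.current.alive = False
                        raise InstanceDown(self.current.name)
                    delivered.append(row)
                return delivered
            except InstanceDown:
                if self.current is self.backup:
                    raise  # no surviving instance left in this toy model
                self._failover()


db = {100: ["r1", "r2", "r3", "r4"]}
client = TafClient(Instance("inst1", db), Instance("inst2", db))
log: List[str] = []
client.register_callback(lambda inst: log.append(f"session reinitialized on {inst.name}"))
rows = client.fetch_all(as_of_scn=100, fail_after=2)
print(rows)                     # ['r1', 'r2', 'r3', 'r4']
print(client.rows_reprocessed)  # 2
print(log)                      # ['session reinitialized on inst2']
```

Transactions are deliberately left out of this model. As the text says, TAF does not resubmit in-flight transactions. A failover-aware application would therefore catch the failover error itself and replay its own unit of work.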
How TAF works
TAF is implemented in the Oracle Call Interface (OCI) layer, a low-level API for establishing and managing Oracle database connections. When the instance to which a client is connected fails, the client’s server process ceases to exist. The OCI layer in the client can detect the absence of a server process on the other end of the channel and automatically establish a new connection to another instance. The alternate instance to which TAF reconnects users is specified in the Oracle Net configuration files, which are described in the Oracle Net documentation.
Because OCI is a low-level API, writing programs with OCI requires more effort and sophistication on the part of the developer. Fortunately, Oracle uses OCI to write client tools and various drivers, so that applications using these tools can leverage TAF.
Support for TAF in ODBC and JDBC drivers is especially useful; it means that TAF can be leveraged by any client application that uses these drivers to connect to Oracle. For example, TAF can provide automatic reconnection for a third-party query tool that uses ODBC. To implement TAF with ODBC, set up an ODBC data source that uses an Oracle Net service name that is configured to use TAF in the Oracle Net configuration files. ODBC uses Oracle Net and can therefore leverage the TAF feature.
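The Python function below is a sketch of the kind of Oracle Net entry the text refers to. It renders a `tnsnames.ora` connect descriptor that enables TAF through the `FAILOVER_MODE` clause of `CONNECT_DATA`. `FAILOVER_MODE` and its `TYPE`, `METHOD`, `BACKUP`, `RETRIES`, and `DELAY` parameters are Oracle Net settings. `TYPE = SELECT` corresponds to the query resubmission described earlier, and `METHOD = PRECONNECT` corresponds to preconnecting to an alternate instance. The helper function, host names, service names, and parameter values are hypothetical. Check the Oracle Net reference for your release before relying on the exact syntax.

```python
from typing import List, Optional


def taf_tns_entry(alias: str, hosts: List[str], service: str,
                  failover_type: str = "SELECT", method: str = "BASIC",
                  backup_alias: Optional[str] = None,
                  retries: int = 20, delay: int = 15, port: int = 1521) -> str:
    """Render a tnsnames.ora entry whose CONNECT_DATA enables TAF."""
    if failover_type not in {"SESSION", "SELECT", "NONE"}:
        raise ValueError("TYPE must be SESSION, SELECT, or NONE")
    if method not in {"BASIC", "PRECONNECT"}:
        raise ValueError("METHOD must be BASIC or PRECONNECT")
    if method == "PRECONNECT" and backup_alias is None:
        # This sketch insists on a BACKUP alias so the preconnected session has a target.
        raise ValueError("PRECONNECT requires backup_alias in this sketch")

    # One ADDRESS per node; FAILOVER = ON lets Oracle Net try the next address.
    addresses = "\n".join(
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {h})(PORT = {port}))" for h in hosts
    )
    mode = f"(TYPE = {failover_type})(METHOD = {method})"
    if backup_alias is not None:
        mode += f"(BACKUP = {backup_alias})"
    mode += f"(RETRIES = {retries})(DELAY = {delay})"

    return (
        f"{alias} =\n"
        "  (DESCRIPTION =\n"
        "    (ADDRESS_LIST =\n"
        "      (FAILOVER = ON)\n"
        f"{addresses}\n"
        "    )\n"
        "    (CONNECT_DATA =\n"
        f"      (SERVICE_NAME = {service})\n"
        f"      (FAILOVER_MODE = {mode})\n"
        "    )\n"
        "  )\n"
    )


print(taf_tns_entry("sales", ["node1", "node2"], "sales.example.com",
                    method="PRECONNECT", backup_alias="sales_backup"))
```

An ODBC data source, or any other Oracle Net client, would then simply name the hypothetical alias `sales`. The failover behavior lives in the Oracle Net configuration rather than in the application.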
TAF and various Oracle configurations
Although TAF combined with Real Application Clusters is the most obvious configuration for high availability, TAF can also be used with a single Oracle instance or with multiple databases, each accessible from a single instance. Some possible configurations are as follows:
• TAF can automatically reconnect clients back to their original instances for cases in which the instance failed but the node did not. An automated monitoring system, such as Oracle Enterprise Manager, can detect instance failure quickly and restart the instance. The fast-start recovery features in Oracle enable very low crash recovery times. Users that aren’t performing heads-down data entry work can be automatically reconnected by TAF and might never be aware that their instance failed and was restarted.
• In simple clusters, TAF can reconnect users to the instance started by simple hardware failover on the surviving node of a cluster. The reconnection cannot occur until the alternate node has started Oracle and has performed crash recovery.
• When there are two distinct databases, each with a single instance, TAF can reconnect clients to an instance that provides access to a different database running in another data center. This clearly requires replication of the relevant data between the two databases. Oracle fortunately provides automated support for data replication, which is covered in the later section entitled “Complete Site
Recovering from Failures
Despite the prevalence of redundant or protected disk storage, media failures can and do occur. In cases in which one or more Oracle datafiles are lost due to disk failure, you must use database backups to recover the lost data.
There are times when simple human or machine error can also lead to the loss of data, just as a media failure can. For example, an administrator may accidentally delete a datafile, or an I/O subsystem may malfunction, corrupting data on the disks.
The key to being prepared to handle these types of failures is implementing a good backup-and-recovery strategy and understanding the power of Oracle’s newer features such as Flashback.
Developing a Backup-and-Recovery Strategy
Proper development, documentation, and testing of your backup-and-recovery strategy is one of the most important activities in implementing an Oracle database. You must test every phase of the backup-and-recovery process to ensure that the entire process works, because once a disaster hits, the complete recovery process must work flawlessly.
Some companies test the backup procedure but fail to actually test recovery using the backups taken. Only when a failure requires the use of the backups do companies discover that the backups in place were unusable for some reason. It’s critical to test the entire cycle from backup through restore and recovery.
Taking Oracle Backups
Two basic types of backups are available with Oracle:
Hot backup
The datafiles for one or more tablespaces are backed up while the database is active.
Cold backup
The database is shut down and all the datafiles, redo log files, and control files are backed up.
With a hot backup, not all of the datafiles must be backed up at once. For instance, you may want to back up a different group of datafiles each night. You must be sure to keep backups of the archived redo logs that date back to your oldest backed-up datafile, since you’ll need them to perform rollforward recovery from the time of that oldest datafile backup.
Some DBAs with very large databases back up the various datafiles over several runs.
Some DBAs back up the datafiles that contain data subject to frequent changes more frequently (for example, daily), and back up datafiles containing more static data less often (for example, weekly). There are commands to back up the control file as well; this should be done after all the datafiles have been backed up.
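The Python sketch below makes the retention rule concrete: keep archived redo logs back to the oldest datafile backup. It assumes a hypothetical rotation in which one group of datafiles is backed up each night. Each backup is recorded with the log sequence number that was current when it began. The file names, group layout, and "ten log switches per night" rate are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DatafileBackup:
    datafile: str
    night: int
    start_log_seq: int  # log sequence current when this datafile's backup began


def rotate(groups: List[List[str]], nights: int, seq_per_night: int) -> Dict[str, DatafileBackup]:
    """Back up one group per night, round-robin; return the latest backup of each file."""
    latest: Dict[str, DatafileBackup] = {}
    for night in range(nights):
        group = groups[night % len(groups)]
        seq = night * seq_per_night + 1  # hypothetical: logs advance at a steady rate
        for df in group:
            latest[df] = DatafileBackup(df, night, seq)
    return latest


def oldest_needed_log(latest: Dict[str, DatafileBackup]) -> int:
    """Archived logs from this sequence onward must be kept for rollforward recovery."""
    return min(b.start_log_seq for b in latest.values())


groups = [["users01.dbf", "orders01.dbf"], ["hist01.dbf"], ["ref01.dbf"]]
latest = rotate(groups, nights=7, seq_per_night=10)
print(oldest_needed_log(latest))  # 41
```

After a week, the most recent backup of `hist01.dbf` is the oldest of the current set, dating from night 4. Archived logs from sequence 41 onward must therefore survive. Older logs matter only if you also intend to recover from older backups.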
If the database isn’t archiving redo logs (this is known as running in NOARCHIVELOG mode and is described in Chapter 2), you can take only complete cold backups. If the database is archiving redo logs, it can be backed up while running.
Regardless of backup type, you should also back up the INIT.ORA or SPFILE file and password files, because these are key files for the operation of your Oracle database.
While not required, you should also back up the various scripts used to create and further develop the database. These scripts represent an important part of the documentation of the structure and evolution of the database.
For more information about the different types of backups and variations on these types, please refer to your Oracle documentation as well as the third-party books listed in Appendix B.
Using Backups to Recover
Two basic types of recovery are possible with Oracle, based on whether or not you are archiving the redo logs:
Complete database recovery
If the database did not archive redo logs, only a complete cold backup is possible. Correspondingly, only a complete database recovery can be performed: you restore the database files, redo logs, and control files from the backup. The database is essentially restored as of the time of the backup. All work done since the time of the backup is lost, and a complete recovery must be performed even if only one of the datafiles is damaged. The potential for lost work, together with the need to restore the entire database to correct a partial failure, is why most shops avoid this situation by running their databases in ARCHIVELOG mode.
Figure 11-8 illustrates backup and recovery for a database without archived redo logs.
Partial or targeted restore and rollforward recovery
When you’re running the Oracle database in ARCHIVELOG mode, you can restore only the damaged datafile(s) and can apply redo log information from the time the backup was taken to the point of failure. The archived and online redo logs reproduce all the changes to the restored datafiles to bring them up to the same point in time as the rest of the database. This procedure minimizes the time for the restore and recovery operations. Partial recovery like this can be done with the database down. Alternatively, the affected tablespace(s) can be placed offline and recovery can be performed with the rest of the database available. Oracle9i improved the granularity of the recovery process by also enabling restore and recovery of individual data blocks instead of providing restore and recovery only of entire datafiles.
Figure 11-9 illustrates backup and recovery with archived redo logs.
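The difference between the two recovery types, following the T to T+10 timeline of Figures 11-8 and 11-9, can be sketched with a toy Python model. It rests on simplifying assumptions. A "datafile" is a dict of block contents. The redo stream, standing in for the archived plus online redo logs, is a list of (SCN, datafile, block, value) records. Rollforward replays every record newer than the backup. The names `roll_forward`, `Datafiles`, and `Redo` are hypothetical, and this is not how Oracle actually stores redo.

```python
from copy import deepcopy
from typing import Dict, List, Set, Tuple

Datafiles = Dict[str, Dict[int, str]]   # datafile name -> {block number: contents}
Redo = List[Tuple[int, str, int, str]]  # (SCN, datafile, block, new contents)


def roll_forward(files: Datafiles, redo: Redo, after_scn: int, only: Set[str]) -> None:
    """Reapply every change newer than after_scn to the named datafiles."""
    for scn, datafile, block, value in redo:
        if scn > after_scn and datafile in only:
            files[datafile][block] = value


# Time T: a backup is taken at SCN 100.
live: Datafiles = {"users01.dbf": {1: "a"}, "orders01.dbf": {1: "x"}}
backup_scn, backup = 100, deepcopy(live)

# T to T+10: new work, all of it recorded in the redo stream.
redo: Redo = [(101, "users01.dbf", 1, "b"),
              (102, "orders01.dbf", 1, "y"),
              (103, "orders01.dbf", 2, "z")]
roll_forward(live, redo, after_scn=backup_scn, only=set(live))

# T+10: a disk failure destroys orders01.dbf.
del live["orders01.dbf"]

# NOARCHIVELOG: restore the whole database from the backup; work since T is lost.
without_archiving = deepcopy(backup)

# ARCHIVELOG: restore only the damaged file, then replay redo into it.
with_archiving = deepcopy(live)
with_archiving["orders01.dbf"] = deepcopy(backup["orders01.dbf"])
roll_forward(with_archiving, redo, after_scn=backup_scn, only={"orders01.dbf"})

print(without_archiving)  # {'users01.dbf': {1: 'a'}, 'orders01.dbf': {1: 'x'}}
print(with_archiving)     # {'users01.dbf': {1: 'b'}, 'orders01.dbf': {1: 'y', 2: 'z'}}
```

In the ARCHIVELOG case, the undamaged `users01.dbf` is never touched, and only the damaged file is restored and rolled forward. Narrowing the `only` set further, down to individual blocks, is the idea behind the block-level recovery mentioned above.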
Figure 11-8. Database backup and recovery without archived redo logs (without archiving, the work from T to T + 10 is lost: (1) full cold backup of datafiles, control files, and online redo logs at time T; (2) disk failure at T + 10; (3) complete restore to time T)
Figure 11-9. Database backup and recovery with archived redo logs (with archiving, restore is minimized and no work is lost: (1) hot backup of datafiles and control files at time T; (2) disk failure at T + 10; (3) restore only the damaged datafiles; (4) replay changes from the archived and online redo logs)
Obviously, the redo logs are extremely important. Oracle first enabled analysis of these files through the LogMiner tool in Oracle8i. Since Oracle9i, LogMiner has been accessible through an Oracle Enterprise Manager GUI, and it provides log analysis for all datatypes. If the redo log has become corrupted, LogMiner can now optionally read past corrupted records in order to analyze the impact on transactions after the corruption.
Recovery Manager
Recovery Manager (RMAN), first available with Oracle8, provides server-managed online backup and recovery. RMAN does the following:
• Backs up one or more datafiles to disk or tape
• Backs up archived redo logs to disk or tape
• Restores datafiles from disk or tape
• Restores and applies archived redo logs to perform recovery
• Automatically parallelizes both the reading and writing of the various Oracle files being backed up
RMAN performs the backup operations and updates a catalog (stored in an Oracle database) with the details of what backups were taken and where they were stored.
You can query this catalog for critical information, such as datafiles that have not been backed up or datafiles whose backups have been invalidated through NOLOGGING operations performed on objects contained in those datafiles.
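The catalog check described above can be sketched in Python. The catalog here is just a hypothetical list of records carrying each datafile's last backup SCN and latest NOLOGGING change SCN. In practice you would ask RMAN itself, for example with `REPORT NEED BACKUP` or `REPORT UNRECOVERABLE`, rather than reimplementing this logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatafileRecord:
    name: str
    last_backup_scn: Optional[int]     # None: never backed up
    last_nologging_scn: Optional[int]  # latest NOLOGGING change in this file, if any


def needs_backup(files: List[DatafileRecord]) -> List[str]:
    """Files never backed up, or changed by NOLOGGING operations after their last backup."""
    result = []
    for f in files:
        if f.last_backup_scn is None:
            result.append(f.name)
        elif f.last_nologging_scn is not None and f.last_nologging_scn > f.last_backup_scn:
            # The NOLOGGING change wrote no redo, so recovery from the
            # older backup cannot reconstruct it: the backup is invalidated.
            result.append(f.name)
    return result


catalog = [
    DatafileRecord("system01.dbf", 5_000, None),
    DatafileRecord("users01.dbf", None, None),
    DatafileRecord("stage01.dbf", 5_000, 6_200),
    DatafileRecord("hist01.dbf", 7_000, 6_500),
]
print(needs_backup(catalog))  # ['users01.dbf', 'stage01.dbf']
```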
RMAN also uses the catalog to perform incremental backups. RMAN will back up only database blocks that have changed since the last backup. When RMAN backs up only the individual changed blocks in the database, the overall backup and recovery time can be significantly reduced for databases in which a small percentage of the data in large tables changes. Since Oracle Database 10g, RMAN can apply incremental backups to an image backup of the database. Improvements in methods used by RMAN in recent Oracle releases have greatly enhanced performance for incremental backups.
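Below is a minimal Python sketch of block-level incremental backup. It also rolls an image copy forward with that incremental, which the text says RMAN supports since Oracle Database 10g. It relies on a simplifying assumption: each block carries the SCN of its last change, and an incremental copies only blocks whose SCN is newer than the previous backup. The data structures and function names are hypothetical. Real RMAN incrementals are organized by backup level and can use a change-tracking file to avoid scanning every block.

```python
from typing import Dict, Tuple

Block = Tuple[int, str]      # (SCN of last change, contents)
Datafile = Dict[int, Block]  # block number -> block


def incremental(datafile: Datafile, since_scn: int) -> Datafile:
    """Copy only the blocks changed after since_scn."""
    return {n: b for n, b in datafile.items() if b[0] > since_scn}


def apply_incremental(image_copy: Datafile, inc: Datafile) -> None:
    """Roll an image copy forward by overwriting it with the changed blocks."""
    image_copy.update(inc)


live: Datafile = {n: (100, f"v0-{n}") for n in range(1000)}  # 1,000 blocks as of SCN 100
image_copy = dict(live)  # full image copy taken at SCN 100

# Later, only a small fraction of the blocks change.
for n in (7, 42, 500):
    live[n] = (150, f"v1-{n}")

inc = incremental(live, since_scn=100)
apply_incremental(image_copy, inc)
print(len(inc), image_copy == live)  # 3 True
```

The incremental holds 3 of 1,000 blocks, yet applying it brings the image copy fully up to date. This is why incrementals pay off when only a small percentage of a large database changes between backups.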