Read Oracle Essentials Oracle Database 11g Online
Authors: Rick Greenwald
RMAN reads and writes Oracle blocks, not operating system blocks. While RMAN
is backing up a datafile, Oracle blocks can be written to it, but RMAN will read and write in consistent Oracle blocks, not operating system blocks within an Oracle block.
The following list summarizes the RMAN capabilities that enable high availability:
• Automated channel failover during backup and restore
• Automated failover to a previous backup during restore when the current backup is missing or corrupt
• Automated new database and temporary file creation during recovery
• Automated recovery to a previous point in time
• Block media recovery while the datafile is online
• Block change tracking for fast incremental backups
• Merged incremental backups
• Backup and restore of required files only
278
|
Chapter 11: Oracle and High Availability
• Retention policy ensuring that relevant backups are available
• Resumable backup and restore if operations failed
• Automatic backup of the control file and server parameter file Since Oracle Database 10
g
, RMAN is also used to support automated disk-based backup. Disk-based strategies have an advantage over tape: they enable random access to any data such that only changes need be backed up or recovered. RMAN
can be set up to run a backup job to disk at a specific time. RMAN manages the deletion of backup files that are no longer necessary. In combination with ASM, RMAN
will write all backups, archive logs, control file autobackups, and datafile copies to a designated disk group. The single storage location is referred to as the Flash Recovery Area.
More recently, Oracle introduced the Information Lifecycle Management (ILM) Assistant for managing online data and allocating data to appropriate tiers of disk performance. In Oracle Database 11
g
, the Flashback data archive feature has been added to ILM, enabling the storing and tracking of all transactional changes to a record. This feature, available through Oracle’s Total Recall Option, allows you to gain access to previous database records.
Read-Only Tablespaces
Oracle 7.3 introduced read-only tablespaces. Using the ALTER TABLESPACE command in SQL, youcan mark a tablespace as read-only. No changes are possible to the objects stored in a read-only tablespace. Youcan toggle a tablespace between read/write and read-only states as you wish.
Once a tablespace is in read-only mode, it can be backed up once and doesn’t have to be backed up again, since its contents cannot change unless it’s placed in read/
write mode. Marking a tablespace as read-only allows entire sections of a database to be marked read-only, backed up once, and excluded from regular backups thereafter.
If a datafile of a read-only tablespace is damaged, youcan restore it directly from the backup without any recovery. Because no changes were made to the datafiles, no redo log information needs to be applied. For databases with significant static or historical data, this option can significantly simplify and streamline backup and restore operations.
Read-only tablespaces, combined with Oracle’s ability to partition a table on a range or list of column values (for example, a date) provide powerful support for the rolling windows common to data warehouses (described in
Chapter 10).
Once a new month’s data is loaded, indexed, and so on, the relevant tablespaces can be marked read-only and backed up once, removing the tablespaces datafile(s) from the cycle of ongoing backup and significantly reducing the time required for those backup operations.
Recovering from Failures
|
279
Point-in-Time Recovery
Oracle 7.3 introduced point-in-time recovery (PITR) for the entire database. Pointin-time recovery allows a DBA to restore the datafiles for the database and apply redo information up to a specific time or System Change Number (SCN). This limited type of recovery is useful for cases in which an error occurred—for example, if a table was dropped accidentally or a large number of rows were deleted incorrectly.
The DBA can restore the database to the point in time just prior to the event to undo the results of the mistake.
A difficulty with database-level point-in-time recovery is that the entire database has to be restored. In response to this limitation, Oracle8 introduced point-in-time recovery at the tablespace level within the database. Point-in-time recovery based on a tablespace allows a DBA to restore and recover a specific tablespace or set of tablespaces to a particular point in time. Only the tablespace(s) containing the desired objects need to be recovered. This has been a very useful improvement given the ever-increasing size of today’s databases.
However, this tablespace feature needs to be used carefully, since objects in one tablespace may have dependencies, such as referential integrity constraints, on objects in other tablespaces. For example, suppose that Tablespace1 contains the EMP table and Tablespace2 contains the DEPT table, and a foreign key constraint links these two tables together for referential integrity. If youwere to recover Tablespace2 to an earlier point than Tablespace1, youmight find that youhad rows in the EMP table that contained an invalid foreign key value, since the matching primary key entry in the DEPT table had not been rolled forward to the place where the primary key value to which the EMP table refers had been added. The newer Flashback capability (described in the next section), particularly the Flashback Table feature, now provides an easier-to-use alternative for table recovery.
Flashback
Oracle9
i
introduced a recovery approach called
Flashback
, which was designed to help in recovering from user errors. Flashback Query was the first example of this feature made available by Oracle. The concept behind a Flashback Query is simple.
You can execute a query against the database as of a particular time or System Change Number (SCN). Oracle delivers the result set as it would have appeared if the query were run at that time, using the undo log information segments to reconstruct the data, which can then be used to correct the results of the errant action.
Oracle Database 10
g
added a much wider range of flashback capabilities, including:
Flashback Versions Query
Returns all the versions of rows in a particular query over a span of time.
Flashback Transaction Query
Returns all the changes made by a specific transaction.
280
|
Chapter 11: Oracle and High Availability
Flashback Drop
When an object is dropped, it is placed in a Recycle Bin, so a user can simply un-drop the object to restore it.
Flashback Table
Returns a table to a specific point in time.
Flashback Database
Returns the entire database to a particular point in time. Can be used instead of point-in-time recovery in some situations.
Flashback Restore Points
Enables canceling of planned database changes using user-defined labels (instead of SCNs or timestamps). Can also be used with Data Guard and RMAN to resynchronize a clone database.
Oracle Database 11
g
adds a Flashback Transaction command for backing out an individual transaction and its dependent transactions by utilizing undo data to revert data to its original state.
As we noted earlier in this chapter, Oracle now provides a Flashback Data Archive capability through the Oracle Database 11
g
Total Recall Option. This data archive is established for a defined retention period. Update and delete operations are then recorded in tables that map to the database tables being tracked. If youhave specified an “AS OF” clause in your SQL specifying a particular moment it time, you will then have access to the data as it appeared at that moment. The updates and deletes are rolled back, as appropriate, and show youwhat the data looked like at the specified point in time.
Complete Site Failure
Protection from the complete failure of your primary Oracle site poses significant challenges. Your organization must carefully evaluate the risks to its primary site.
These risks include physical and environmental problems as well as hardware risks.
For example, is the data center in an area prone to floods, tornadoes, or earthquakes?
Are power failures a frequent occurrence? Earlier versions of this book treated events such as “a terrorist attack or an airplane crash into the data center” as remote possibil-ities, but, unfortunately, these scenarios no longer seem so implausible.
Protection from primary site failure involves monitoring of and redundancy controls for the following:
• Data center power supply
• Data center climate control facilities
• Database server redundancy
• Database redundancy
• Data redundancy
Complete Site Failure
|
281
The first three items on the list are aimed at preventing the failure of the data center.
Data server redundancy, through simple hardware failover or Real Application Clusters, provides protection from node failure within a data center but not from complete data center loss.
Should the data center fail completely, the last two items—database redundancy and data redundancy—provide for disaster recovery.
Emerging Technologies: Clusters Across a Distance
Some vendors are now offering clustering solutions that allow the nodes of the cluster to be separated by enough distance to allow one node to survive the failure of the data center that contains the other node. In fact, it is anticipated that many grid computing deployments will occur this way in the future. The clustering of nodes separated by a few kilometers is becoming possible using sophisticated interconnect technologies that can function over greater distances. The disks are mirrored with a copy at each site to allow each site to function in the event of a complete failure of the other site.
These solutions are intriguing because they can provide data server redundancy and data center redundancy in a single solution. If one node (or the data center containing it) fails, the node in the other data center provides failover.
A simpler approach often used in data warehousing is to create duplicate grid implementations at primary and secondary sites. Extraction scripts from the source systems load both data warehouses simultaneously. If one of the target sites fails, ETL remains queued such that the failed system can be updated once it is recovered.
Oracle Data Guard: Standby Database for Redundancy
Oracle’s physical standby database functionality was introduced in Oracle 7.3 to provide database redundancy. In Oracle9
i
, this concept was extended to include support for a logical standby database. The enhanced feature set is called Oracle Data Guard.
The concept of a physical standby database is simple—keep a copy of the database files at a second location, ship the redo logs to the second site as they are filled, and apply them to the copy of the database. This process keeps the standby database “a few steps” behind the primary database. If the primary site fails, the standby database is opened and becomes the production database. The potential data loss is limited to the transactions in any redo logs that have not been shipped to the standby
site. Figure 11-10 illustrates the standby database feature.
The physical standby database can be used to offload reporting, such as end-of-day reports, from the primary server to the standby server. The ability to offload reporting requests provides flexibility for reporting and queries and can help performance on the primary server while making use of the standby server.
282
|
Chapter 11: Oracle and High Availability
Production
Standby
System
System
• Production database is copied
(New York)
(London)
to the standby system
• Standby system is started in
standby recovery mode
• As archived redo logs are
generated on the production
Archived
systems, they are transferred to
Redo Logs
the standby system and applied
Oracle
Oracle
Instance
Instance
• If the production system fails,
the standby system is activated
Database
Database
Copy
Figure 11-10. Standby database
When the standby database was being used for reporting, archived redo information from the primary site could not be applied prior to Oracle Database 10
g
. Recovery could only continue when the standby database was closed again. This factor had important implications for the time it took to recover from an outage with the standby database. Oracle Database 10
g
introduced a real-time apply, enabling redo data to be applied at the standby as soon as it was received.
The physical standby database is still more useful as of Oracle Database 11
g
since by deploying the Active Data Guard Option you can query the standby database while the redo apply is active. The implication of all of these enhancements is that youcan use your disaster recovery database to handle some of your query workload, as a site for database backups, and as a site to test database changes.
Logical standby database
Oracle Data Guard also offers a logical standby database capability. With this capability, the standard Oracle archive logs are transformed into SQL transactions, and these are applied to an open standby database. The logical standby database is different physically from the primary standby database and can be used for different tasks.
For example, the primary database might be indexed for transaction processing while the standby database might be indexed for data warehousing. Although physically different from the primary database, the secondary database is logically the same and can take over processing in case the primary fails. As archive logs are shipped from the primary to the secondary, undo records in the shipped archive log can be compared to the logical standby undo records to guard against potential corruption. As of Oracle Database 10
g
, youcan instantiate the logical standby database without quiescing the primary.