Read Oracle Essentials Oracle Database 11g Online
Authors: Rick Greenwald
In Oracle releases prior to Version 7.3, the database wasn’t available until all uncommitted transactions rolled back. Although a DBA could control the checkpoint frequency and therefore control the time required for the rollforward phase of instance recovery, the number of uncommitted transactions at the time of the crash varied tremendously so the time needed for rollback could not really be accurately controlled or predicted. In a busy OLTP system, there are typically a fair number of in-flight transactions requiring rollback after a crash. This situation led to variable and unpredictable times for crash recovery.
The solution to this problem,
deferred rollback
, was introduced in Oracle 7.3. Oracle opens the database after the rollforward phase of recovery and performs the rollback of uncommitted transactions in the background. This process reduces database downtime and helps to reduce the variability of recovery times by deferring the rollback phase.
But what if a user’s transaction begins working in a database block that contains some changes left behind by an uncommitted transaction? If this happens, the user’s transaction will trigger a foreground rollback to undo the changes and will then proceed when rollback is complete. This action is transparent to the user—he doesn’t receive error messages or have to resubmit the transaction.
Oracle8
i
further optimized the deferred rollback process by limiting the rollback triggered by a user transaction to the block in which the transaction is interested. For example, suppose there is a large uncommitted transaction that affected 500 database blocks. Prior to Oracle8
i
, the first user transaction that touched one of those 500 blocks would trigger a foreground rollback and absorb the overhead of rolling back the entire transaction. Leveraging fast-start rollback, the user’s transaction would roll back only the changes to the block in which it was interested. New transactions would not have to wait for the complete rollback of large uncommitted transactions.
System Failure
|
261
Today, rollback management is made simpler by automated features in the database. For example, as of Oracle Database 10
g
, automatic undo retention tuning occurs controlling the amount of undo information held in rollbacks. The Redo Logfile Size Advisor determines the optimal smallest redo logfile size based on the FAST_START_MTTR_TARGET setting and database statistics gathered.
Protecting Against System Failure
There are a variety of approaches youcan take to help protect your system against the ill effects of system crashes and failures, including the following:
• Providing component redundancy
• Using Real Application Clusters
• Using Transparent Application Failover software services
Component Redundancy
As basic protection, the various hardware components that make up the database server itself must be fault-tolerant.
Fault-tolerance
, as the name implies, allows the overall hardware system to continue to operate even if one of its components fails.
This, in turn, implies redundant components and the ability to detect component failure and seamlessly integrate the failed component’s replacement. The major system components that should be fault-tolerant include the following:
• Disk drives
• Disk controllers
• CPUs
• Power supplies
• Cooling fans
• Network cards
• System buses
Disk failure is the largest area of exposure for hardware failure, since disks have the shortest mean times to failure of any of the components in a computer system. Disks also present the greatest variety of redundant solutions, so discussing that type of failure in detail should provide the best example of how high availability can be implemented with hardware.
Disk redundancy
Although the mean time to failure of an individual disk drive is very high, the ever-increasing number of disks used for today’s very large databases results in more frequent failures. Protection from disk failure is usually accomplished using RAID
262
|
Chapter 11: Oracle and High Availability
(Redundant Array of Inexpensive Disks) technology. The use of redundant storage has become common for systems of all sizes and types for two primary reasons: the real threat of disk failure and the proliferation of packaged, relatively affordable RAID solutions.
RAID technology uses one of two concepts to achieve redundancy:
Mirroring
The actual data is duplicated on another disk in the system.
Striping with parity
Data is striped on multiple disks, but instead of duplicating the data itself for redundancy, a mathematical calculation termed
parity
is performed on the data and the result is stored on another disk. You can think of parity as the sum of the striped data. If one of the disks is lost, youcan reconstruct the data on that disk using the surviving disks and the parity data. The lost data represents the only unknown variable in the equation and can be derived. You can conceptual-ize this as a simple formula:
A + B + C + D = E
in which A–D are data striped across four disks and E is the parity data on a fifth disk. If youlose any of the disks, youcan solve the equation to identify the missing component. For example, if youlose the B drive youcan solve the formula as:
B = E – A – C – D
There are a number of different disk configurations or types of RAID technology, which are formally termed
levels
. The basics of RAID technology were introduced in
Chapter 7,
but
Table 11-2
summarizes the most relevant levels of RAID in a bit more detail, in terms of their cost, high availability, and the way Oracle uses each RAID
level.
Which RAID Levels Should You Use with Oracle?
Some people say that you should never use RAID-5 for an Oracle database because of the degraded write performance of this level of RAID. RAID-1 and RAID-0+1 offer better performance, but at double the cost of disk storage. RAID-5 offers a cheaper and reasonable solution, provided that you can meet performance requirements despite the extra write overhead for maintaining parity data. Use these generic guidelines to help determine the appropriate uses of different RAID levels:
• Use RAID-1 for redo log files.
• Use RAID-5 for database files, provided that the write overhead is acceptable and adequate I/O is available.
• Use RAID-1 or RAID-0+1 for database files if RAID-5 write overhead is unacceptable.
Protecting Against System Failure
|
263
Table 11-2. RAID levels relevant to high availability
Level
Disk configuration
Cost
Comments
Oracle usage
0
Simple striping, no
Same cost as unpro-
The term RAID-0 is used
Striping simplifies administra-
redundancy
tected storage.
to describe striping,
tion for Oracle datafiles. Suitable
which increases read
for all types of data for which
and write throughput.
redundancy isn’t required.
However, this is not
really RAID, as there is
no actual redundancy.
1
Mirroring
Twice the cost of
Same write perfor-
Lack of striping adds complexity
unprotected storage.
mance as a single disk.
of managing a larger number of
Read performance may
devices for Oracle. Often used for
improve through servic-
redo logs, since the I/O for redo
ing reads from both
is typically relatively small
copies.
sequential writes. Striped arrays
are more suited to large I/Os or
to multiple smaller, random I/Os.
0+1
Striping and mir-
Twice the cost of
Best of both worlds—
Same usage as RAID-0, but
roring
unprotected storage.
striping increases read
provides protection from disk
and write performance
failure.
and mirroring for
redundancy avoids
“read-modify-write”
overhead of RAID-5.
5
Striping with rotat-
Storage capacity is
Parity data is spread
Cost-effective solution for all Ora-
ing or distributed
reduced by 1/
N
,
across all disks, avoiding
cle data except redo logs.
parity
where
N
is the num-
the potential bottleneck
Degraded write performance
ber of disks in the
found in some other
must be taken into account. Pop-
array. For example,
types of RAID arrays.
ular for reads where adequate I/O
the storage is
Striping increases read
is provided. Write penalties may
reduced by 20%, or
performance. Maintain-
slow loads and index builds.
1/5 of the total disk
ing parity data adds
Often avoided for high-volume
storage, for a 5-disk
additional I/O, decreas-
OLTP due to write penalties.
array.
ing write performance.
Some storage vendors, such as
For each write, the asso-
EMC, have proprietary solutions
ciated parity data must
(RAID-S) to minimize parity over-
be read, modified, and
head on writes.
written back to disk. This
is referred to as the
“read-modify-write”
penalty.
264
|
Chapter 11: Oracle and High Availability
Figure 11-3 illustrates the disk configurations for various RAID levels.
RAID-O: Simple Striping, No Redundancy
DATA
DATA
DATA
DATA
RAID-1: Simple Mirroring
DATA
DATA
RAID-0+1: Striping and Mirroring
64 KB
64 KB
64 KB
64 KB
64 KB
64 KB
64 KB
64 KB
RAID-5: Striping with Distributed Parity
A
B
C
D
PARITY
PARITY
A
B
C
D
D
PARITY
A
B
C
C
D
PARITY
A
B
Figure 11-3. RAID levels commonly used with an Oracle database
Automatic Storage Management
Oracle Database 10
g
and more recent database releases include Automatic Storage Management (ASM). We introduced ASM in
Chapter 5
and described its manageability considerations. ASM enables youto create a pool of storage on disk groups and then manages the placement of database files on the storage. ASM features enable it to replace non-Oracle file systems and logical volume managers for files managed by the Oracle database. An ASM instance manages each of the disks in the disk group, and one ASM instance is provided for each database node in a RAC
environment.
ASM provides “Striping and Mirroring Everything” (SAME) for many types of disks, including “Just a Bunch of Disks” (JBOD) arrays. You can specify groups of disks, and designate a failure group to be used in the result of a disk failure. Mirroring can also be set up on a per-file basis, and you can specify one or two mirrors. ASM
includes the ability to detect disk “hot spots” and redistribute data to avoid disk bottlenecks, as well as the capability of adding disks to a disk group without any interruption in service. DBAs add the disks to disk groups or remove disks from disk groups using Oracle Enterprise Manager.
Protecting Against System Failure
|
265
Stored data is automatically rebalanced when disks are added or removed. When a drive fails, remirroring to remaining drives is automatic. These features make ASM
ideal for managing a database storage grid and allow youto use cheaper disk systems while obtaining higher levels of availability and performance.
Oracle Database 11
g
introduced a fast mirror resynchronization capability enabling faster recovery from transient failures. ASM can now be set to only resynchronize changed ASM disk extents for limited duration failures.
Simple Hardware Failover
Oracle recovers automatically from a system crash. The automatic recovery protects data integrity, critical in a relational database, but also results in downtime as the database recovers from a crash. When a hardware failure occurs, the ability to quickly detect a system crash and initiate recovery is crucial to minimizing the associated downtime.
When an individual server fails, the instance running on that node fails as well.
Depending on the cause, the failed node may not return to service quickly or be noticed immediately. Either way, companies that wish to protect their systems from the failure of a node typically employ a cluster of machines to achieve simple hardware
failover
. Failover is the ability of a surviving node in a cluster to assume the responsibilities of a failed node. Although failover doesn’t directly address the issue of the reliability of the underlying hardware, automated failover can reduce the downtime from hardware failure.
The concept is very simple: a combination of software and hardware “watches” over the cluster. Typically, this monitoring is done by regularly checking a
heartbeat
, which is a message sent between machines in the cluster. If Machine A fails, Machine B will detect the failure through the loss of the heartbeat and will execute scripts to take over control of the disks, assume Machine A’s network address, and restart the processes that failed with Machine A. From an Oracle database perspective, the entire set of events is identical to an instance crash followed by an instance recovery.