Oracle RMAN 11g Backup and Recovery
Authors: Robert Freeman
Oracle created a plug-in for VSS called the Oracle VSS Writer, a separate Windows service that runs independently from the Oracle Database service. The Oracle VSS Writer coordinates the specific activities required to take a VSS copy of the database.
Oracle VSS Writer is capable of making either component-level backups (i.e., file by file, such as datafiles and control files) or full volume backups. When making component-level backups of datafiles, the VSS Writer keeps track of redo generated separately from existing mechanisms, and then, during restore, it applies the redo automatically to the components that were backed up.
When VSS makes a full volume backup, nothing magical occurs. The snapshot can still catch a database’s data blocks in mid-write, leaving them fuzzy. So the Oracle VSS Writer still does the same things we’ve discussed so far in this chapter: it puts datafiles into hot backup mode for the duration of the datafile backup, so that the archive logs will have full copies of changed blocks to overwrite any fuzzy blocks.
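As a rough illustration of the hot backup mode described above, the manual equivalent might look like the following sketch, written in RMAN command-file syntax (where # starts a comment). The snapshot step is only a placeholder, and the Oracle VSS Writer performs this coordination itself, so this is not something to run alongside it.

```sql
# Manual equivalent of hot backup mode (a sketch; the VSS Writer automates this).
# Assumes the database is open and running in ARCHIVELOG mode.
SQL 'ALTER DATABASE BEGIN BACKUP';

# ... take the volume snapshot or copy the datafiles here (placeholder step) ...

SQL 'ALTER DATABASE END BACKUP';

# Archive the current online redo log so that the redo generated during the
# backup window is available as archived redo for recovery.
SQL 'ALTER SYSTEM ARCHIVE LOG CURRENT';
```

While a datafile is in hot backup mode, the first change to each block writes a full image of that block to the redo stream, which is what lets recovery overwrite any fuzzy blocks in the copy.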
The difference is the level of integration that we are starting to see. As sync/split vendors offer better interface points for their technologies, as Microsoft has done, Oracle can provide better automation of tasks that otherwise would have to be scripted separately by the system administrator or DBA.
Summary
In this chapter, we covered how a hardware sync and split architecture would impact your backup and recovery solutions. We discussed how to implement sync and split with the Oracle database and how to take RMAN backups from a split mirror copy of the database. Finally, we discussed how to use an existing Oracle RDBMS to implement a software-based sync and split environment.
CHAPTER 23
RMAN in the Workplace: Case Studies
Part IV: RMAN in the Oracle Ecosystem
We have covered a number of different topics in this book, and we are sure you have figured out that you might face almost an infinite number of recovery combinations.
In this chapter, we provide various case studies to help you review your knowledge of backup and recovery (see if you can figure out the solution before you read it).
When you do come across these situations, these case studies may well help you avoid some mistakes that you might otherwise make when trying to recover your database. You can even use these case studies to practice performing recoveries so that you become an RMAN backup and recovery expert.
Before we get into the case studies, though, the following section provides a quick overview about facing the ultimate disaster, a real-life failure of your database.
Before the Recovery
Disaster strikes. Often, when you are in a recovery situation, everyone is in a big rush to recover the database. Customers are calling, management is panicking, and your boss is looking at you for answers, all of which is making you nervous, wondering if your résumé is up to date. When the real recovery situation occurs, stop. Take a few moments to collect yourself and ask these questions:
1. What is the exact nature of the failure?
2. What are the recovery options available to me?
3. Might I need Oracle support?
4. Is there anyone who can act as a second pair of eyes for me during this recovery?
Let’s address each of these questions in detail.
What Is the Exact Nature of the Failure?
Here’s some firsthand experience from one of the authors. Back in the days when I was contracting, I was paged one night (on Halloween, no less!) because a server had failed, and once they got the server back up, none of the databases would come up. Before I received the page, the DBAs at this site had spent upward of eight hours trying to restart the 25 databases on that box. Most of the databases would not start. The DBAs had recovered a couple of the seemingly lost databases, yet even those databases still would not open. The DBAs called Oracle, and Oracle seemed unsure as to what the problem was. Finally, the DBAs paged me (while I was out trick-or-treating with my kids).
Within about 20 minutes after arriving at the office, I knew what the answer was. I didn’t find the answer because I was smarter than all the other DBAs there (I wasn’t, in fact). I found the answer for a couple of reasons. First, I approached the problem from a fresh perspective (after eight hours of problem solving, one’s eyes tend to become burned and red!). Second, I looked to find the nature of the failure rather than just assuming the nature of the failure was a corrupted database.
What ended up being the problem, pretty clearly to a fresh pair of eyes, was a set of corrupted Oracle libraries. Once we recovered those libraries, all the databases came up quickly, without a problem. The moral of the story is that when you have a database that has crashed, or that will not open, do not assume that the cause is a corrupted datafile or a bad disk drive. Find out for sure what the problem is by investigative analysis. Good analysis may take a little longer to begin with, but, generally, it will prove valuable in the long run.
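When the instance can at least be started, one possible first step in that investigation on Oracle 11g is the Data Recovery Advisor, sketched below in RMAN command-file syntax. It only reports problems it can see from inside the database, so it would not have flagged the damaged Oracle libraries in this story; an empty failure list is itself a hint to look at the operating system and the Oracle software installation.

```sql
# Sketch: ask RMAN what it believes is wrong before restoring anything.
# Assumes RMAN is connected to the target and the instance is started.

# Show failures the Data Recovery Advisor has recorded.
LIST FAILURE;

# Ask for repair options for the listed failures; nothing is changed yet.
ADVISE FAILURE;

# Preview the suggested repair script without running it.
REPAIR FAILURE PREVIEW;
```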
What Recovery Options Are Available?
Recovery situations can offer a number of solutions. Again, back when I was a consultant, I had a customer who had a disk controller drive fail over a weekend, and the result was the loss of file systems on the box, including files belonging to an Oracle database in ARCHIVELOG mode. The DBA at the customer site went ahead and recovered the entire database (about 150GB), which took, as I recall, a couple of hours.
The following Monday, the DBA and I had a discussion about the recovery method he selected. The corrupted file systems actually impacted only about five database datafiles (the other file systems contained web server files that we were not concerned with). The total size of the impacted database datafiles was no more than 8 or 10GB. The DBA was pretty upset about having to come into the office and spend several hours recovering the database. When I asked the DBA why he hadn’t just recovered the five datafiles instead of the entire database, he replied that it just had not occurred to him.
The moral of this story is that it’s important to consider your recovery options. The type of recovery you do may make a big difference in how long it takes you to recover your database.
Another moral of this story is to really become a backup and recovery expert. Part of the reason the DBA in this case had not considered datafile recovery, I think, is that he had never done such a recovery. When facing a stressful situation, people tend not to consider options they are not familiar with. So, we strongly suggest you set up a backup and recovery lab and practice recoveries until you can do them in your sleep.
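To make the datafile-level option in this story concrete, here is a minimal sketch in RMAN command-file syntax. The file numbers 7 and 8 are hypothetical stand-ins for the damaged files, and the sketch assumes an ARCHIVELOG-mode database whose damaged files do not belong to the SYSTEM tablespace, so the rest of the database can stay open.

```sql
# Sketch: restore and recover only the damaged datafiles.
# File numbers 7 and 8 are hypothetical; find the real ones with
# REPORT SCHEMA or by querying V$RECOVER_FILE.

# Take only the damaged files offline; everything else stays available.
SQL 'ALTER DATABASE DATAFILE 7 OFFLINE';
SQL 'ALTER DATABASE DATAFILE 8 OFFLINE';

# Restore just those files from backup, then apply redo to bring them current.
RESTORE DATAFILE 7, 8;
RECOVER DATAFILE 7, 8;

# Bring the recovered files back online.
SQL 'ALTER DATABASE DATAFILE 7 ONLINE';
SQL 'ALTER DATABASE DATAFILE 8 ONLINE';
```

Restoring 8 to 10GB instead of 150GB is where the time savings in the story would come from.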
Might Oracle Support Be Needed?
You might well be a backup and recovery expert, but even the experts need help from time to time. This is what Oracle support is there for. Even though I feel like I know something about backup and recovery, I ask myself if the failure looks to be something that I might need Oracle support for. Generally, if the failure is something odd, even if I think I can solve it on my own, I “prime” support by opening a service request on the problem. That way, if I need help, I have already provided Oracle with the information they need (or at least some initial information) and have them primed to support me should I need it. If you are paying for Oracle support, use it now; don’t wait for later.
Who Can Act as a Second Pair of Eyes During Recovery?
When I’m in a stressful situation, first of all it’s nice to have someone to share the stress with. Somehow I feel a bit more comfortable when someone is there just to talk things out with.
Further, when you are working on a critical problem, mistakes can be costly. Having a second, experienced pair of eyes there to support you as you recover your database is a great idea!
Recovery Case Studies
Now to the meat of the chapter, the recovery case studies. In this section, we provide you with a number of case studies listed next in the order they appear:
1. Recovering from complete database loss in NOARCHIVELOG mode with a recovery catalog
2. Recovering from complete database loss in NOARCHIVELOG mode without a recovery catalog
3. Recovering from complete database loss in ARCHIVELOG mode without a recovery catalog
4. Recovering from complete database loss in ARCHIVELOG mode with a recovery catalog
5. Recovering from the loss of the SYSTEM tablespace
6. Recovering online from the loss of a datafile or tablespace
7. Recovering from loss of an unarchived online redo log
8. Recovering through resetlogs
9. Completing a failed duplication manually
10. Using RMAN duplication to create a historical subset of the target database
11. Recovering from a lost datafile in ARCHIVELOG mode using an image copy in the flash recovery area
12. Recovering from running the production datafile out of the flash recovery area
13. Using Flashback Database and media recovery to pinpoint the exact moment to open the database with resetlogs
In each of these case studies, we provide you with the following information:
■ The Scenario: Outlines the environment for you
■ The Problem: Defines a problem that needs to be solved
■ The Solution: Outlines the solution for you, including RMAN output solving the problem
Now, let’s look at our case studies!
Case #1: Recovering from Complete Database Loss (NOARCHIVELOG Mode) with a Recovery Catalog
The Scenario
Thom is a new DBA at Unfortunate Company. Upon arriving at his new job, he finds that his databases are not backed up at all, and that they are all in NOARCHIVELOG mode. Because Thom’s manager will not shell out the money for additional disk space for archived redo logs, Thom is forced to do offline backups, which he begins doing the first night he is on the job. Thom also has turned on autobackups of his control file and has converted the database so that it is using an SPFILE. Finally, Thom has created a recovery catalog schema in a different database that is on a different database server.
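Thom's setup in this scenario could be sketched roughly as follows, in RMAN command-file syntax. The connect string in the comment, including the catalog owner rcat_owner and the service name catdb, is hypothetical, and the conversion to an SPFILE and any tape channel configuration are not shown.

```sql
# Sketch of Thom's setup; all names in the connect string are hypothetical.
# Started from the OS prompt as, for example:
#   rman TARGET / CATALOG rcat_owner@catdb

# One-time: create the recovery catalog in the catalog schema, then
# register the target database in it.
CREATE CATALOG;
REGISTER DATABASE;

# Persistent setting: back up the control file and SPFILE automatically.
CONFIGURE CONTROLFILE AUTOBACKUP ON;

# Nightly offline (consistent) backup. In NOARCHIVELOG mode the database
# must be cleanly shut down and only mounted while it is backed up.
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
BACKUP DATABASE;
ALTER DATABASE OPEN;
```

Because the database is shut down cleanly before the backup, the result is a consistent backup that can be restored and opened without applying archived redo, which is the only kind of whole-database backup a NOARCHIVELOG database supports.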
The Problem
Unfortunate Company’s cheap buying practices catch up to it in the few days following Thom’s initial work, when the off-brand (cheap) disks that it has purchased all become corrupted due to a bad controller card. Thom’s database is lost.
Thom’s offline database backup strategy includes tape backups to a local tape drive. Once the hardware problems are solved, the system administrator quickly rebuilds the lost file systems, and Thom quickly gets the Oracle software installed. Now, Thom needs to get the database back up and running immediately.