Tuesday, September 30, 2014

Find the correct SCN for restore/Recover

The scenario

To describe starting situation, let us suppose this restore/recover scenario (we are talking about incomplete recover).
  1. startup nomount
  2. restore control file
  3. alter database mount
  4. restore database
Next step should be database recover (with or without until clause). But this action finishes with: 
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of alter db command at 01/11/2010 12:08:19
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: '+DATA/hac/datafile/system.366.708004385'
So we are stacked because recover could not be consistent. The main problem is that restored data files need ALL to be recovered at least to largest SCN number+1.

This situation is explained world wide but main main question is: what is the lowest SCN that can recover database enough to be open? 

Solution-RMAN list backup

First solution is to look in RMAN repository or use control file records to retrieve backup of datafile 1. In a case of RMAN catalog you'll not have too much trouble because you'll find enough information to see backup of wanted datafile and it's SCN. Output may looks like:
RMAN> list backup of datafile 1;
 
List of Backup Sets
===================
 
BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
645     Full    127.30M    DISK        00:01:02     06-DEC-09
        BP Key: 645   Status: AVAILABLE  Compressed: YES  Tag: TAG20091206T183018
        Piece Name: +BCKP/hac/backupset/2009_12_06/tag20091206t183018_0.7966.704917821
  List of Datafiles in backup set 645
  File LV Type Ckp SCN    Ckp Time  Name
  ---- -- ---- ---------- --------- ----
  1       Full 35728924   06-DEC-09 +DATA/hac/datafile/system.282.694017165
...
If you are lucky and get result look in column "Ckp SCN" and you'll get missing SCN. Unfortunately this may not help you always.

In nocatalog mode it is more then likely that your command may retrieve no data:

RMAN> list backup of datafile 1;
 
 
RMAN>
No backup! The reason is no backup was taken in time provided with control_file_record_keep_time parameter. 

Solution-RMAN list copy

Next solution is usage of RMAN's "list copy", which shows only backup of "datafile 1", with the same retention as described in previously explained in "List backup" solution. However if you are lucky and get result it is more or less like:
RMAN> list copy of datafile 1;
 
List of Datafile Copies
Key     File S Completion Time Ckp SCN    Ckp Time        Name
------- ---- - --------------- ---------- --------------- ---------------------------------------------
1917    1    A 08-JAN-10       49152352   08-JAN-10       /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_nvl2urjr
1867    1    X 07-JAN-10       49125664   07-JAN-10       /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_m4l2sii8
1841    1    X 07-JAN-10       49117957   07-JAN-10       /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_l7l2sb2u
1813    1    X 18-SEP-09       9643175    18-SEP-09       /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_edkpitiv
1666    1    A 17-SEP-09       9174432    17-SEP-09       /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_ahkpfgq1

Again, regardless shown several SCNs this might not be enough to help you.

Solution-use x$ (C based) tables

x$ tables are Oracle's core tables based on pure C data and should be handle with additional care. The beauty of them is they are available when database in mounted and when all "dba_*" views are not available as well as some v$ based ones.

To see status of all restored datafiles can be done with this SQL

set linesize 200;
set pagesize 100;
col inst_id for 9999999 heading 'Instance #'
col file_nr for 9999999 heading 'File #'
col file_name for A50 heading 'File name'
col checkpoint_change_nr for 99999999999999 heading 'Checkpoint #'
col checkpoint_change_time for A20 heading 'Checkpoint time'
col last_change_nr for 99999999999999 heading 'Last change #'
SELECT
      fe.inst_id,
      fe.fenum file_nr,
      fn.fnnam file_name,
      TO_NUMBER (fe.fecps) checkpoint_change_nr,
      fe.fecpt checkpoint_change_time,
      fe.fests last_change_nr,
      DECODE (
              fe.fetsn,
              0, DECODE (BITAND (fe.festa, 2), 0, 'SYSOFF''SYSTEM'),
              DECODE (BITAND (fe.festa, 18),
                      0, 'OFFLINE',
                      2, 'ONLINE',
                      'RECOVER')
      ) status
FROM x$kccfe fe,
     x$kccfn fn
WHERE    (   (fe.fepax != 65535 AND fe.fepax != 0 )
          OR (fe.fepax = 65535 OR fe.fepax = 0)
         )
     AND fn.fnfno = fe.fenum
     AND fe.fefnh = fn.fnnum
     AND fe.fedup != 0
     AND fn.fntyp = 4
     AND fn.fnnam IS NOT NULL
     AND BITAND (fn.fnflg, 4) != 4
ORDER BY fe.fenum
;
The output might looks like:
Instance #   File # File name                     CheckpointCheckpoint time      Last change #  STATUS
---------- -------- ----------------------------- ----------- -------------------- -------------- ------
         2        1 /u01/system.312.711642293         891734 02/25/2010 10:24:21   891734         SYSTEM
         2        2 /u01/undotbs1.311.711642299       891734 02/25/2010 10:24:21   891734         ONLINE
         2        3 /u01/sysaux.310.711642299         891734 02/25/2010 10:24:21   891734         ONLINE
         2        4 /u01/undotbs2.307.711642303       891734 02/25/2010 10:24:21   891734         ONLINE
         2        5 /u01/undotbs3.306.711642305       891734 02/25/2010 10:24:21   891734         ONLINE
         2        6 /u01/users.315.711642305          891734 02/25/2010 10:24:21   891734         ONLINE
 
rows selected.
Suppose that real 891734 represents imaginary "xxxxxxxx" number. As you see it represents SCN numbers of each datafile's time when it was last changed. These "xxxxxxxx" numbers are all probably different (opposite in this case). This is tipical representation of inconsistent backup.

So your job is to find record where "File #" column value is equal "1". This is usually first record named like "system.yyy" and check it's last change SCN. And this is the number you are looking for. 

Remember that SCN for recover is always xxxxxxxx+1, for one bigger then one found in list! 

To cover the whole story, for successful recover of any backup set like that, you need to apply archive logs which's SCN's are between min(xxxxxxxx)-max(xxxxxxxx)+1 but you do not need to specify min(xxxxxxxx) because RMAN will know that automatically.

So your restore should use until SCN clause in RMAN script. In pseudo language that looks like:

startup nomount;
restore controlfile;
alter database mount;
 
set until scn xxxxxxxx+1;
 
restore database;
recover database;
 
alter database open resetlogs;

If all SCNs from previous query are the same then (as they are in our case) then you are dealing with consistent (aka cold backup). Cold backup means that all datafiles are having same SCN so practicly they do not need recover but only as declarative term (you do not need any archived log file beside). In a cases like that, your restore RMAN script might looks like:

restore database;
recover database noredo;
alter database open resetlogs;

However, in some consistent backup cases, I remember that I found on some database version a bug as explained at the begginig (ORA-01152). Based on previously shown real numbers, your "full proof" restore script for consistent backup, should look like:

set until scn=891735;
restore database;
recover database noredo;
alter database open resetlogs;
So there is no affraid of use until clause even in restoring consistent backups. This is especially the case in "backup as copy" when you deal with backup outside of ASM storage.

Solution-RMAN restore database preview

To prevent searching for information in restore/recover time, there is elegant method that will tell you which SCN is needed for successful restore.

After regular database backup, use MAN command which will collect and show all important data for this backupset. 

backup database;
restore database preview;
At the end of command shows exact SCN that you need for successful recover. At the end of log file output is like:
Media recovery start SCN is xxxxxxxxx
Recovery must be done beyond SCN xxxxxxxxx to clear data files fuzziness
Finished restore at 17-SEP-09
released channel: t1
released channel: t2
 
RMAN>
 
Recovery Manager complete.
where xxxxxxxxx+1 represent SCN you are searching for.

No comments:

Post a Comment