Friday, March 18, 2011

Oracle Database Switch over and failover steps

Switchover and Failover for Oracle RAC Database Using Physical Standby

  1. Switchover
1)      This is for:
a.       Scheduled and unscheduled maintenance at primary site (e.g., OS patching, database patching).
b.      A primary database that is running normally --- the switchover operation must start from the primary site.
c.       Application DR Tests --- switchover can be used for a DR test that tests non-database parts (Failover is the right procedure for database DR test).
2)      Steps
a.       Shutdown all primary instances except one.
b.      Shutdown all standby instances except one.
c.       On the primary site
                                                                                 i.            SQL> select switchover_status from V$database;
                                                                                ii.            If switchover_status shows ‘sessions active’ go to step iii. If switchover_status shows ‘TO_STANDBY’ go to step v.
                                                                              iii.            SQL> select sid, process, program from v$session where type='user' and sid <>(select distinct sid from V$mystat);
                                                                              iv.            If there are rows returned, contact customers and ask if the proceses can be killed. If yes go to step v. Otherwise stop here until they log out.
                                                                               v.            SQL> alter database commit to switchover to physical standby with session shutdown;
                                                                              vi.            SQL>shutdown immediate;
                                                                            vii.            SQL>exit;
                                                                           viii.            Restart the instance and mount it as standby
         SQL> startup nomount;
         SQL> alter database mount standby database;
                                                                               ix.            Start the recovery
         SQL> alter database recover managed standby database disconnect;
d.      On the DR site.
                                                                                 i.            SQL>alter database recover managed standby database cancel;
                                                                                ii.            SQL>alter database recover managed standby database nodelay disconnect from session through last switchover; --- this is needed when a time lag used.
                                                                              iii.            SQL> alter database commit to switchover to primary with session shutdown;
                                                                              iv.            SQL> alter database open;
e.       On the original primary site mount the other instances as standby.
f.        On the original DR site start up and open the other instances (as primary).
g.       Verify the new data guard configuration
                                                                                 i.            on new primary
SQL> ALTER SYSTEM ARCHIVE LOG CURRENT;
                                                                               ii.            on new standby --- make sure the log file is copied over and applied or will be applied (if time lag in apply is used).


  1. Failover
1)      This is for:
a.       A situation when primary database is completely gone and can’t be recovered within a time window as defined by business units.
b.      A real DR test --- failover operation is performed completely on DR site. There is no dependency on DR site.
c.       Repairing or trouble shooting a database due to human mistakes or data corruption --- time lag has to be used in this case.  
d.      Documentum is currently using 4 hr time lag.
2)      Steps
a.       Identify and resolve any gaps in the archived redo logfiles.
                                                                                 i.            SQL>  select thread#, low_sequence#, high_ sequence# from v$archive_gap
                                                                                ii.            SQL>  alter database register physical logfile ‘2_820_657623969.log’
b.      Identify, copy and register any other missing archived redo logfiles.
                                                                                 i.            SQL>  select unique thread# as thread, max(sequence#) over (partition by thread#) as last from v$archived_log;
                                                                                ii.            SQL>  alter database register physical logfile ‘1_820_657623970.log’
c.       We should use log writer (lgwr) and standby redo for log transfer, which will help avoid steps a and b.
d.      On DR site shutdown all the standby instances except one.
e.       On DR site initiate a failover.
                                                                                 i.            SQL> alter database recover managed standby database finish force;
                                                                                ii.            SQL> alter database commit to switchover to primary;
                                                                              iii.            SQL> alter database open;
                                                                              iv.            SQL> shutdown immediate;
                                                                               v.            SQL> startup;
f.    Failover with a time lag and without left logs applied. This is only for repairing or trouble shooting a database due to human mistakes or data corruption.
                                                                                 i.            SQL> alter database activate physical standby database;
                                                                                ii.            SQL> shutdown immediate;
                                                                              iii.            SQL> startup;
g.       Startup and open other instances on DR site.
h.   Rebuild physical standby on the original primary site after it comes ba

4 comments:

  1. Hi friends....plzs help me,i'm newer to oracle dba,
    i have struggle in DR primary and standby by configuration after the script,
    RMAN> run {
    2> allocate channel prmy1 type disk;
    3> allocate channel prmy2 type disk;
    4> allocate channel prmy3 type disk;
    5> allocate channel prmy4 type disk;
    6> allocate auxiliary channel stby type disk;
    7> duplicate target database for standby from active database
    8> spfile
    9> parameter_value_convert 'pc00prmy','pc00sby1'
    10> set db_unique_name='pc00sby1'
    11> set db_file_name_convert='/pc00prmy/','/pc00sby1/'
    12> set log_file_name_convert='/pc00prmy/','/pc00sby1/'
    13> set control_files='/u01/app/oracle/oradata/pc00sby1.ctl'
    14> set log_archive_max_processes='5'
    15> set fal_client='pc00sby1'
    16> set fal_server='pc00prmy'
    17> set standby_file_management='AUTO'
    18> set log_archive_config='dg_config=(pc00prmy,pc00sby1)'
    19> set log_archive_dest_1='service=pc00prmy ASYNC
    20> valid_for=(ONLINE_LOGFILE,PRIMARY_ROLE) db_unique_name=pc00prmy'
    21> ;
    22> }

    After this step the standby database s not shutdown and start clone in dr configuration.

    ReplyDelete
  2. Thank You Guys :-) lovely see your comments and the steps worked :-)

    ReplyDelete