Vadi Oracle DBA: Log Shipping from Primary to Standby not working

An Oracle Active Data Guard (ADG) configuration on 11.2.0.4 went into a hung state, with many messages like the following in the alert logs of both the primary and the ADG instance:

WARN: ARC5: Terminating pid 31981640 hung on an I/O operation
Thu Feb 20 05:00:38 2014
Killing 1 processes with pids 66912506 (Process by index) in order to remove hung processes. Requested by OS process 19267810

Archived log generation and database load were both normal at the primary site.

The problem we observed was that archived logs were not being transferred from the primary to the ADG standby.

Even shutdown immediate made no progress, and we could not collect a hang analysis for the database.

On analysis, the processes being killed in the alert log messages turned out to be the ARCx (archiver) processes.
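
To see which archivers are being terminated and how often, the warnings can be summarized straight from the alert log. This is only a sketch: it assumes the warning lines have exactly the format shown above, and the function name count_hung_arc and the alert log path are placeholders, not part of any Oracle tooling.

```shell
# Summarize "hung on an I/O operation" warnings per archiver process.
# $1 is the path to an alert log (placeholder; use your own location).
count_hung_arc() {
    # Keep only lines such as:
    #   WARN: ARC5: Terminating pid 31981640 hung on an I/O operation
    # and reduce each one to the archiver name (ARC5).
    sed -n 's|^WARN: \(ARC[0-9a-z]\): Terminating pid [0-9]* hung on an I/O operation.*$|\1|p' "$1" |
        sort |
        uniq -c |
        awk '{ print $1, $2 }'   # one "<count> <ARCn>" line per archiver
}
```

Called as count_hung_arc /path/to/alert_SID.log, it prints one line per archiver with the number of times that archiver was terminated.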

This kind of problem usually occurs after OS or network errors, or after a restart of the primary or standby instance or a reboot of the primary or standby node, any of which can abruptly break log shipping between the primary and the standby.

Cause for this problem:

ARCx processes on the primary that are stuck on the network indefinitely, or that are responsible for updating the APPLIED column, hang and cannot recover by themselves.
These processes may also be used for local and remote archiving, heartbeat, and FAL (fetching logs) on the primary.
So when they are all stuck and their number has reached the maximum specified by log_archive_max_processes, they can cause ambiguous errors such as those shown above.
In the worst case, all ARCx processes on the primary are stuck and cannot do local archiving, so all online redo log files fill up and the primary database hangs.
This may be due to a standby database crash, network errors, or some abrupt outage on the standby or primary.

The other common cause is the firewall.

Solution:

ARCx processes on the primary need to be restarted.

The steps below assume that log transport from the primary is configured through log_archive_dest_2.

Please perform the following:

1) If the Data Guard Broker is running, disable Data Guard Broker on both primary and standby:

SQL> alter system set dg_broker_start=FALSE;

2) On the Primary Database:

- Set log transport state to DEFER status:
SQL> alter system set log_archive_dest_state_2='defer';
SQL> alter system switch logfile;
- Reset log_archive_dest_2
SQL> show parameter log_archive_dest_2
SQL> alter system set log_archive_dest_2 = '........';
- Switch logfiles on the Primary
SQL> alter system switch logfile;

3) On the Standby Database:

- Cancel Managed Recovery
SQL> alter database recover managed standby database cancel;
- Shutdown the Standby Database
SQL> shutdown immediate

4) On the Primary: kill the ARCx processes. The database respawns them automatically and immediately, and is not harmed by this.

ps -ef | grep -i arc
kill -9 <ospid of ARC process> <another ospid of ARC process> ...
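
Because grep -i arc also matches the grep command itself and any other process whose command line happens to contain "arc", it can help to narrow the list before killing anything. A minimal sketch, assuming the archiver background processes appear in ps -ef as ora_arc<n>_<ORACLE_SID> with the PID in the second column; the function name arc_pids is hypothetical:

```shell
# Print the OS pids of the archiver processes of one instance.
# Reads `ps -ef` output on stdin; $1 is the instance name (ORACLE_SID).
arc_pids() {
    awk -v sid="$1" '
        # The last field is the process name, e.g. ora_arc5_PROD.
        # "ora_arc5_" is 9 characters, so the SID starts at position 10.
        $NF ~ /^ora_arc[0-9a-z]_/ && substr($NF, 10) == sid { print $2 }
    '
}

# Review the list first; this function only prints pids and kills nothing:
#   ps -ef | arc_pids "$ORACLE_SID"
```

The exact string comparison on the SID keeps the archivers of other instances on the same host out of the list.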

5) On the standby server, start up the Standby Database and resume Managed Recovery:

SQL> startup mount;
SQL> alter database recover managed standby database [using current logfile] disconnect;

6) Re-enable Log Transport Services on the Primary:

SQL> alter system set log_archive_dest_state_2='enable';

At this point all the ARCx processes should be up and running on the Primary.

7) Re-enable the Data Guard Broker on both Primary and Standby, if applicable:

SQL> alter system set dg_broker_start=true;

8) Please work with your Network Administrator to make sure the following Firewall Features are disabled.

SQLNet fixup protocol
Deep Packet Inspection (DPI)
SQLNet packet inspection
SQL Fixup
SQL ALG (Juniper firewall)

ORACLE Reference Doc: Logs are not shipped to the physical standby database (Doc ID 1130523.1)

Sunday, 23 February 2014
