Multimaster Replication :How to resolve Replication Errors

Posted by Sagar Patil

While using multimaster replication often you would be bombarded with errors like “No data found”, “Unique Key Violated” etc .

These are purely data mismatch errors and a user intervention needed to resolve them.

I often come across these errors when we run applications at multiple sites which are part of same replication group/tables when they are designed to run at one replication site at any given time.

You can get list of these errors at DEFERROER view with transaction ids but there is no easy way to locate the parameters,columns and tables on which these errors occurred.

Thankfully there is a solution if you have EM installed.

Read more…

API to Retrieve runtime Replication Parameters

Posted by Sagar Patil

We often get thousands of replication conflits/errors at current system. A resolution is generally looking at EM console and resolving each error manually.  Please see examples below.

This approach is working well until we have less than some thousands error x’actions at DEFERROR. Anything above that is causing a performance hit. Also DEFERROR /DEFTRANS get highly fragmented adding to the current problem. A solution would be writing a PLSQL API to retrieve runtime parameters as and when needed than using EM console to retrieve millions of rows.

SELECT argcount, schemaname, packagename, procname
FROM defcall WHERE  deferred_tran_id in
(select  deferred_tran_id from deftran where rownum <100)

ARGCOUNT	SCHEMANAME	PACKAGENAME	PROCNAME
17	CVD_SYSTEM	VOUCHER	REP_UPDATE
9	CVD_SYSTEM	VOUCHER_EVENT	REP_INSERT
17	CVD_SYSTEM	VOUCHER	REP_UPDATE
9	CVD_SYSTEM	VOUCHER_EVENT	REP_INSERT
17	CVD_SYSTEM	VOUCHER	REP_UPDATE
9	CVD_SYSTEM	VOUCHER_EVENT	REP_INSERT
17	CVD_SYSTEM	VOUCHER	REP_UPDATE
9	CVD_SYSTEM	VOUCHER_EVENT	REP_INSERT
17	CVD_SYSTEM	VOUCHER	REP_UPDATE
9	CVD_SYSTEM	VOUCHER_EVENT	REP_INSERT
17	CVD_SYSTEM	VOUCHER	REP_UPDATE
9	CVD_SYSTEM	VOUCHER_EVENT	REP_INSERT
17	CVD_SYSTEM	VOUCHER	REP_UPDATE
9	CVD_SYSTEM	VOUCHER_EVENT	REP_INSERT
17	CVD_SYSTEM	VOUCHER	REP_UPDATE
9	CVD_SYSTEM	VOUCHER_EVENT	REP_INSERT
17	CVD_SYSTEM	VOUCHER	REP_UPDATE
9	CVD_SYSTEM	VOUCHER_EVENT	REP_INSERT
SELECT e.destination, e.deferred_tran_id, e.origin_tran_db, e.origin_tran_id,
  e.start_time, count(e.callno)
FROM
 sys.deferror e, sys.defcall c WHERE e.deferred_tran_id = c.deferred_tran_id
  GROUP BY e.destination, e.deferred_tran_id, e.origin_tran_db,
  e.origin_tran_id, e.start_time ORDER BY 1
DESTINATION	DEFERRED_TRAN_ID	ORIGIN_TRAN_DB	ORIGIN_TRAN_ID	START_TIME	COUNT(E.CALLNO)CPI_O5CA.O2	105.45.83730	CPI_O5CB.O2	120.23.60368	09/11/2008 20:24:50	3
CPI_O5CA.O2	107.12.84097	CPI_O5CB.O2	121.41.62034	09/11/2008 20:24:51	1
CPI_O5CA.O2	107.13.84100	CPI_O5CB.O2	123.47.50472	09/11/2008 20:24:52	3
CPI_O5CA.O2	108.1.83596	CPI_O5CB.O2	123.42.50472	09/11/2008 20:24:52	1
SELECT error_msg FROM deferror
    WHERE deferred_tran_id = '107.12.84097' AND callno = 2

ERROR_MSG
ORA-01403: no data found
SELECT t.deferred_tran_id, d.dblink, t.start_time, count(c.callno)
               FROM sys.deftran t, sys.defcall c, sys.deftrandest d
            WHERE d.deferred_tran_id = t.deferred_tran_id
           AND c.deferred_tran_id = t.deferred_tran_id
          GROUP BY t.deferred_tran_id, t.start_time, d.dblink ORDER BY 3 desc
Find out the Argument Count and Call number :
select deferred_tran_id, callno, argcount, procname, packagename, schemaname
             from defcall;
Substitute Transaction ID, Argument Number and Call Number to locate parameter for that Call 

DECLARE
       atype       INTEGER;
       csform          NUMBER;
       arg_no      NUMBER;
       callno      NUMBER;
       tid         VARCHAR2(22);
       outval      VARCHAR2(2000);
BEGIN
    arg_no:=1;
    callno:='0';
    tid:='105.45.83730';
    atype:=dbms_defer_query.get_arg_type(callno, arg_no, tid);
    csform:=dbms_defer_query.get_arg_form(callno, arg_no, tid);
    IF atype = 2 THEN
      outval:=to_char(dbms_defer_query.get_number_arg(callno, arg_no, tid));
    ELSIF atype = 96 and
          csform = 1 THEN
      outval:=dbms_defer_query.get_char_arg(callno, arg_no, tid);
    ELSIF atype = 96 and
          csform = 2 THEN
      outval:=translate(dbms_defer_query.get_nchar_arg(callno, arg_no, tid) USING CHAR_CS);
    ELSIF atype= 1 and
          csform = 1 THEN
      outval:=dbms_defer_query.get_varchar2_arg(callno, arg_no, tid);
    ELSIF atype= 1 and
              csform = 2 THEN
      outval:=translate(dbms_defer_query.get_nvarchar2_arg(callno, arg_no, tid) USING CHAR_CS);
    ELSIF atype = 12 THEN
         outval:=to_char(dbms_defer_query.get_date_arg(callno, arg_no, tid), 'DD-MON-YYYY HH24:MI:SS');
    ELSIF atype = 23 THEN
         outval:=rawtohex(dbms_defer_query.get_raw_arg(callno, arg_no, tid));
    ELSIF atype = 11 THEN
         outval:=dbms_defer_query.get_rowid_arg(callno, arg_no, tid);
    ELSE
         outval:='-Binary Value-';
      -- RAISE NO_DATA_FOUND;
    END IF;
    dbms_output.put_line(outval);
END;

Rectify Differences Between Replicated Tables

Posted by Sagar Patil

Using the DIFFERENCES Procedure

The DIFFERENCES procedure compares two replicas of a table, and determines all rows in the first replica that are not in the second and all rows in the second that are not in the first. The output of this procedure is stored in two user-created tables. The first table stores the values of the missing rows, and the second table is used to indicate which site contains each row.

The RECTIFY procedure uses the information generated by the DIFFERENCES procedure to rectify the two tables. Any rows found in the first table and not in the second are inserted into the second table. Any rows found in the second table and not in the first are deleted from the second table.

To restore equivalency between all copies of a replicated table, complete the following steps:

1. Select one copy of the table to be the “reference” table. This copy will be used to update all other replicas of the table as needed.

2. Determine if it is necessary to check all rows and columns in the table for differences, or only a subset.

For example, it may not be necessary to check rows that have not been updated since the last time that you checked for differences. Although it is not necessary to check all columns, your column list must include all columns that make up the primary key (or that you designated as a substitute identity key) for the table.

3. After determining which columns you will be checking in the table, create two tables to hold the results of the comparison.

You must create one table that can hold the data for the columns being compared. For example, if you decide to compare the employee_id, salary, and department_id columns of the employees table, then your CREATE statement would need to be similar to the following:

CREATE TABLE hr.missing_rows_data (
  employee_id     NUMBER(6),
  salary          NUMBER(8,2),
  department_id   NUMBER(4));

You must also create a table that indicates where the row is found. This table must contain three columns with the datatypes shown in the following example:

CREATE TABLE hr.missing_rows_location (
  present     VARCHAR2(128),
  absent      VARCHAR2(128),
  r_id        ROWID);

4. Suspend replication activity for the replication group containing the tables that you want to compare. Although suspending replication activity for the group is not a requirement, rectifying tables that were not quiesced first can result in inconsistencies in your data.

BEGIN
   DBMS_REPCAT.SUSPEND_MASTER_ACTIVITY (
      gname => 'hr_repg');
END;
/

5. At the site containing the “reference” table, call the DIFFERENCES procedure in the DBMS_RECTIFIER_DIFF package.

For example, if you wanted to compare the employees tables at the New York and San Francisco sites, then your procedure call would look similar to the following:

BEGIN
   DBMS_RECTIFIER_DIFF.DIFFERENCES (
      sname1              =>   'hr',
      oname1              =>   'employees',
      reference_site      =>   'ny.world',
      sname2              =>   'hr',
      oname2              =>   'employees',
      comparison_site     =>   'sf.world',
      where_clause        =>   '',
      column_list         =>   'employee_id,salary,department_id',
      missing_rows_sname  =>   'hr',
      missing_rows_oname1 =>   'missing_rows_data',
      missing_rows_oname2 =>   'missing_rows_location',
      missing_rows_site   =>   'ny.world',
      max_missing         =>    500,
      commit_rows         =>    50);
END;

6. Rectify the table at the “comparison” site to be equivalent to the table at the “reference” site by calling the RECTIFY procedure in the as shown in the DBMS_RECTIFIER_DIFF package following example:

BEGIN
   DBMS_RECTIFIER_DIFF.RECTIFY (
      sname1              =>   'hr',
      oname1              =>   'employees',
      reference_site      =>   'ny.world',
      sname2              =>   'hr',
      oname2              =>   'employees',
      comparison_site     =>   'sf.world',
      column_list         =>   'employee_id,salary,department_id',
      missing_rows_sname  =>   'hr',
      missing_rows_oname1 =>   'missing_rows_data',
      missing_rows_oname2 =>   'missing_rows_location',
      missing_rows_site   =>   'ny.world',
      commit_rows         =>    50);
END;
/

The RECTIFY procedure temporarily disables replication at the “comparison” site while it performs the necessary insertions and deletions, as you would not want to propagate these changes. RECTIFY first performs all of the necessary DELETE operations and then performs all of the INSERT operations. This ensures that there are no violations of a PRIMARY KEY constraint.

After you have successfully executed the RECTIFY procedure, your missing rows tables should be empty.

Listing General Information About the Error Transactions at a Replication Site

Posted by Sagar Patil

Find out total number of error X’actions

SELECT   COUNT ( * )
FROM   (SELECT   DEFERRED_TRAN_ID,
ORIGIN_TRAN_DB,
DESTINATION,
TO_CHAR (START_TIME, ‘DD-Mon-YYYY hh24:mi:ss’)
TIME_OF_ERROR,
ERROR_NUMBER,
ERROR_MSG
FROM   DEFERROR);

Locate Earliest date for the Error X’actions

SELECT   TO_CHAR (MAX (START_TIME), ‘DD-Mon-YYYY hh24:mi:ss’) TIME_OF_ERROR
FROM   DEFERROR;

Locate Latest date for the Error X’actions

SELECT   TO_CHAR (MIN (START_TIME), ‘DD-Mon-YYYY hh24:mi:ss’) TIME_OF_ERROR
FROM   DEFERROR;

Locate details of Error X’actions

COLUMN DBLINK HEADING ‘SERVER NAME’ FORMAT AAAAAA
COLUMN AVG_LATENCY  HEADING ‘Average Latency’ FORMAT 9999999
COLUMN DEFERRED_TRAN_ID HEADING ‘Deferred|Transaction|ID’ FORMAT A13
COLUMN ORIGIN_TRAN_DB HEADING ‘Origin|Database’ FORMAT A20
COLUMN DESTINATION HEADING ‘Destination|Database’ FORMAT A20
COLUMN TIME_OF_ERROR HEADING ‘Time of|Error’ FORMAT A12
COLUMN ERROR_NUMBER HEADING ‘Oracle|Error|Number’ FORMAT 999999
COLUMN ERROR_MSG HEADING ‘Oracle|Error|Message’ FORMAT A35
COLUMN RECEIVER HEADING ‘Receiver’ FORMAT A25

SELECT   DEFERRED_TRAN_ID,
ORIGIN_TRAN_DB,
DESTINATION,
TO_CHAR (START_TIME, ‘DD-Mon-YYYY hh24:mi:ss’) TIME_OF_ERROR,
ERROR_NUMBER,
ERROR_MSG
FROM   DEFERROR
ORDER BY   START_TIME;

Deferred                                                              Oracle Oracle
Transaction   Origin               Destination          Time of        Error Error
ID            Database             Database             Error         Number Message
————- ——————– ——————– ———— ——- ———————————–
7.26.5257664  CPI_L1_B.O2          CPI_L1_A.O2          30-Jul-2008     1403 ORA-01403: no data found
04:39:37

4.9.5280783   CPI_L1_B.O2          CPI_L1_A.O2          30-Jul-2008     1403 ORA-01403: no data found
04:39:46

4.19.5280577  CPI_L1_B.O2          CPI_L1_A.O2          30-Jul-2008     1403 ORA-01403: no data found

Delete Errors (Transaction Queue, Errors)

Posted by Sagar Patil

This call will delete all Errors from replication Queue i.e deferror & deftran

BEGIN
DBMS_DEFER_SYS.DELETE_ERROR( null, null);
END;
CPI_O1>>select count(*) from deferror;
COUNT(*)
----------
71000
CPI_O1>>BEGIN DBMS_DEFER_SYS.DELETE_ERROR( null, null); END;
PL/SQL procedure successfully completed.
CPI_O1VA>>commit;
Commit complete.
CPI_O1>>select count(*) from deferror;
COUNT(*)
----------
0

Also look at this slightly more complicated example. This example will first try to re-apply the error before deleting it:

spool apply_errors.sql
select 'exec dbms_defer_sys.execute_error(''' || deferred_tran_id ||''','''||
destination || ''')' from deferror;
spool off
@apply_errors commit;

spool delete_errors.sql
select 'exec dbms_defer_sys.delete_error(''' || deferred_tran_id ||''','''||
destination || ''')' from deferror;
spool off
@delete_errors commit;

Here’s how to delete all errors having XYZ.COM as a destination:

BEGIN
DBMS_DEFER_SYS.DELETE_ERROR(null, 'XYZ.COM' );
END;

Above API can only delete records. If you had built up a massive replication queue then Deftran, DefError, DefTranDest would grow to enormous size. Even after deleting all records a simple query “select count(*) from Deftrans, DefError£ can takes hours to return results. Under this case , please run following truncate to release space back to system.

truncate table SYSTEM.def$_aqerror

truncate table SYSTEM.def$_aqcall

select dblink,last_txn_count,last_error_number,total_txn_count,total_round_trips,
       total_bytes_received,total_bytes_sent,avg_throughput
       from DEFSCHEDULE where DBLINK like '&1'
DBLINK   LAST_TXN_COUNT    LAST_ERROR_NUMBER    TOTAL_TXN_COUNT    TOTAL_ROUND_TRIPS    TOTAL_BYTES_RECEIVED    TOTAL_BYTES_SENT    AVG_THROUGHPUT
TEST_O5VA    0    0    7779752    1032081    162998568    266832988    13.0574377737869
TEST_O5VC    0    0    7779798    292695    51866294    117581934    10.174205757076

BEGIN  DBMS_DEFER_SYS.CLEAR_PROP_STATISTICS (dblink => ‘TEST_O5VA’); END;

BEGIN  DBMS_DEFER_SYS.CLEAR_PROP_STATISTICS (dblink => ‘TEST_O5VB’); END;

DBLINK    LAST_TXN_COUNT    LAST_ERROR_NUMBER    TOTAL_TXN_COUNT    TOTAL_ROUND_TRIPS    TOTAL_BYTES_RECEIVED    TOTAL_BYTES_SENT    AVG_THROUGHPUT
TEST_O5VA    0    0    0    0    0    0    0
TEST_O5VC    0        0    0    0    0

ORA-02055: distributed update operation failed; rollback required

Posted by Sagar Patil

Problem

SQL> BEGIN
DBMS_REPCAT.ADD_MASTER_DATABASE (
gname => 'scott_repg',
master => 'repb',
use_existing_objects => TRUE,
copy_rows => TRUE,
propagation_mode => 'ASYNCHRONOUS');
END;
/
BEGIN
*
ERROR at line 1:
ORA-02055: distributed update operation failed; rollback required
ORA-00001: unique constraint (SYSTEM.REPCAT$_REPCAT_PRIMARY) violated
ORA-06512: at "SYS.DBMS_REPCAT_UTL", line 2649
ORA-06512: at "SYS.DBMS_REPCAT_RPC", line 313
ORA-06512: at "SYS.DBMS_REPCAT_MAS", line 223
ORA-06512: at "SYS.DBMS_REPCAT_MAS", line 2248
ORA-06512: at "SYS.DBMS_REPCAT", line 146
ORA-06512: at line 2

Solution :

  1. Delete rows from SYSTEM.REPCAT$_REPCAT view.
  2. Stop push job “dbms_defer_sys.push(destination=>’REPA’)”
  3. “dbms_defer_sys.push(destination=>’REPB’)” else you will receive lock errors.
  4. ReEnable the jobs when done.

Replication Error "QUISCED" Group. How to use DBMS_DEFER_SYS.EXECUTE_ERROR & DBMS_DEFER_SYS.DELETE_ERROR

Posted by Sagar Patil

Delete all Errors from replication sites
Check if there are any errors "select * from deferror;"

select sname, master, status from sys.dba_repcat; - make sure the status is not QUISCED
select id, TIMESTAMP, Master Status from dba_repcatlog

Now if there are any values returned then use following API and replace ID in parameter with values returned above.
repA> exec dbms_repcat.purge_master_log(&ID,’repa’,’scott_repg’);
repB> exec dbms_repcat.purge_master_log(&ID,’repa’,’scott_repg’);

Errors in administration tasks, such as adding groups, are found in DBA_REPCATLOG. This view also holds admin tasks that are still pending. For example:

SELECT ONAME,
REQUEST,
MESSAGE
FROM dba_repcatlog
WHERE status = ‘ERROR’

Once transactions are submitted, any errors are logged in the DEFERROR view. There is a procedure to reapply the error, DBMS_DEFER_SYS.EXECUTE_ERROR. Or you can delete the error using DBMS_DEFER_SYS.DELETE_ERROR. As a last resort, you can remove errors by deleting entries from the table SYSTEM.DEF$_ERROR.

The EXECUTE_ERROR procedure forces execution of a transaction that originally failed , leaving a record in DEFERROR. You might call this procedure if you have repaired the error (for example, a conflict in the advanced replication option) and you now wish to re-attempt the transaction. If another error occurs during EXECUTE_ERROR, the attempt is aborted and the last error encountered is returned as an exception. Upon successful completion, the procedure deletes the entries from the DEFERROR data dictionary view. If the original call has been applied to all other destinations, then the procedure also removes the entries from DEFCALL and DEFTRAN.

As with the DELETE_ERROR and DELETE_TRAN procedures, you may pass NULLs to indicate wildcards.

DBMS_DEFER_SYS.EXECUTE_ERROR (
deferred_tran_id IN VARCHAR2,
destination IN VARCHAR2)

deferred_tran_id
The ID number from the DefError view of the deferred transaction that you want to re-execute. If this parameter is null, all transactions queued for DESTINATION that originated from the DEFERRED_TRAN_DB are re-executed.

destination
The fully qualified database name from the DefError view of the database to which the transaction was originally queued. This parameter must not be null.

Restrictions
Note the following restrictions on calling EXECUTE_ERROR:
The destination parameter may not be NULL.
The deferred_tran_id and deferred_tran_db parameters must either both be NULL or both be NOT NULL. If they are NULL, all transactions in DEFERROR destined for destination are applied.

To delete a transaction from the DefError view . If there are not other DefTranDest or DefError entries for the transaction, the transaction is deleted from the DefTran and DefCall views as well.

DBMS_DEFER_SYS.DELETE_ERROR(
deferred_tran_id IN VARCHAR2,
destination IN VARCHAR2)

deferred_tran_id
The ID number from the DefError view of the deferred transaction that you want to remove from the DefError view. If this parameter is null, all transaction meeting the requirements of the other parameters are removed.

destination
The fully qualified database name from the DefError view of the database to which the transaction was originally queued. If this parameter is null, all transactions meeting the requirements of the other parameters are removed from the DefError view.

Top of Page

Top menu