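EM Grid Console Active Only on RAC Instance 1
Case 1: The EM console is running on Node 1. When Node 1 is shut down, the services fail over to Node 2, but the EM console (dbconsole) does not fail over with them. In the session below, dbconsole is then started manually on Node 2 with emctl.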
[oracle@wygora02 ~]$ showcrs
HA Resource Target State
----------- ------ -----
ora.wygora01.ASM1.asm ONLINE OFFLINE
ora.wygora01.LISTENER_WYGORA01.lsnr ONLINE OFFLINE
ora.wygora01.gsd ONLINE OFFLINE
ora.wygora01.ons ONLINE OFFLINE
ora.wygora01.vip ONLINE OFFLINE
ora.wygora02.ASM2.asm ONLINE ONLINE on wygora02
ora.wygora02.LISTENER_WYGORA02.lsnr ONLINE ONLINE on wygora02
ora.wygora02.gsd ONLINE UNKNOWN on wygora02
ora.wygora02.ons ONLINE UNKNOWN on wygora02
ora.wygora02.vip ONLINE ONLINE on wygora02
ora.wygprod.db ONLINE ONLINE on wygora02
ora.wygprod.wygprod.cs ONLINE ONLINE on wygora02
ora.wygprod.wygprod.wygprod1.srv ONLINE OFFLINE
ora.wygprod.wygprod.wygprod2.srv ONLINE ONLINE on wygora02
ora.wygprod.wygprod1.inst OFFLINE OFFLINE
ora.wygprod.wygprod2.inst ONLINE ONLINE on wygora02
emctl start dbconsole
TZ set to GB-Eire
Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
http://wygora01.wyg-asp.com:1158/em/console/aboutApplication
Agent Version : 10.1.0.4.1
OMS Version : 10.1.0.4.0
Protocol Version : 10.1.0.2.0
Agent Home : /u01/app/oracle/product/10.2.0/db_1/wygora02_wygprod2
Agent binaries : /u01/app/oracle/product/10.2.0/db_1
Agent Process ID : 26599
Parent Process ID : 26554
Agent URL : http://wygora02.wyg-asp.com:3938/emd/main
Started at : 2008-03-13 15:58:50
Started by user : oracle
Last Reload : 2008-03-13 15:58:50
Last successful upload : 2008-03-13 16:43:03
Last attempted upload : 2008-03-13 16:44:54
Total Megabytes of XML files uploaded so far : 6.40
Number of XML files pending upload : 1
Size of XML files pending upload(MB) : 0.00
Available disk space on upload filesystem : 65.82%
Agent is already started. Will restart the agent
This will stop the Oracle Enterprise Manager 10g Database Control process. Continue [y/n] :y
Stopping Oracle Enterprise Manager 10g Database Control ... ... Stopped.
Agent is not running.
Starting Oracle Enterprise Manager 10g Database Control ..... started.
-----------------------------------------------------------------
Logs are generated in directory /u01/app/oracle/product/10.2.0/db_1/wygora02_wygprod2/sysman/log
[oracle@wygora01 ~]$ showcrs
HA Resource Target State
----------- ------ -----
ora.wygprod.db ONLINE ONLINE on wygora01
ora.wygprod.wygprod.cs ONLINE ONLINE on wygora02
ora.wygprod.wygprod.wygprod1.srv ONLINE OFFLINE
ora.wygprod.wygprod.wygprod2.srv ONLINE ONLINE on wygora02
ora.wygprod.wygprod1.inst OFFLINE OFFLINE
ora.wygprod.wygprod2.inst ONLINE ONLINE on wygora02
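When reading long listings like the ones above, it can help to filter for resources whose target is ONLINE but whose state is something else. The following is a minimal sketch, assuming the listing has one resource per line with the resource name, target and state as the first three columns; the function name not_online is made up for this example.

```shell
#!/bin/sh
# Print resources whose Target is ONLINE but whose State is not ONLINE.
# Input: a showcrs-style listing on stdin
# (columns: resource name, target, state, optionally "on <host>").
not_online() {
    # Only lines starting with "ora." are resources; header lines are skipped.
    awk '$1 ~ /^ora\./ && $2 == "ONLINE" && $3 != "ONLINE" { print $1, $3 }'
}
```

A possible use is `showcrs | not_online`, which would print each affected resource followed by its current state.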
How to recover from a Loss of Voting Disk
Loss of Voting Disk
Find where the voting disks are located using “crsctl query css votedisk” (“crsctl check crs” only reports the health of the CRS daemons).
Back up the voting disk:
dd if=/dev/raw/votingdisk of=/vmasmtest/BACKUP/VOTING/votingdisk_06_may_07
dd: reading `/dev/raw/votingdisk': No such device or address
305172+0 records in
305172+0 records out
[root@vmractest1 VOTING]# ls -l
total 152744
-rw-r--r-- 1 oracle dba 156248064 May 6 16:40 votingdisk_06_may_07
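Because dd printed a read error above even though it copied 305172 records, it may be worth verifying the backup before relying on it. The following is a minimal sketch, assuming GNU stat and GNU cmp are available; the function name backup_and_verify is made up for this example, and both paths are supplied by the caller.

```shell
#!/bin/sh
# Copy a voting disk to a backup file, then verify the copy byte for byte.
# Assumes GNU stat (-c %s) and GNU cmp (-n LIMIT).
backup_and_verify() {
    src=$1
    dest=$2
    # dd may report a read error at the end of a raw device, as in the
    # output above, so the result is judged by the comparison below
    # rather than by dd's exit status.
    dd if="$src" of="$dest"
    size=$(stat -c %s "$dest") || return 1
    # An empty backup is never acceptable.
    [ "$size" -gt 0 ] || return 1
    # Compare only as many bytes as the backup file holds.
    cmp -s -n "$size" "$src" "$dest"
}
```

Usage would look like `backup_and_verify /dev/raw/votingdisk /vmasmtest/BACKUP/VOTING/votingdisk_06_may_07`; a non-zero return means the backup should not be trusted.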
Simulate the loss by deleting the voting disk with the rm command.
Check the RAC status with “crs_stat -t”.
Check the alert log on both nodes; both instances should report that the instance was terminated.
Check available backups
Restore the voting disk:
dd if=/vmasmtest/BACKUP/VOTING/votingdisk_06_may_07 of=/dev/raw/votingdisk
305172+0 records in
305172+0 records out
Restart CRS:
/etc/init.d/init.crs start
Check and restart all cluster components:
./crsctl check crs
./crsctl query css votedisk
./crsctl start resources
Log in to the database and confirm that everything is working.
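The restore steps above can be collected into one script. The following is a minimal sketch that only prints the commands by default (a dry run) so the order can be reviewed first; DRY_RUN, CRS_BIN and the function names are assumptions made for this example, while the commands themselves are the ones used in this section.

```shell
#!/bin/sh
# Print (DRY_RUN=1, the default) or execute (DRY_RUN=0) a command.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Restore the voting disk from a backup and bring the cluster back.
# CRS_BIN is a hypothetical variable naming the directory that holds crsctl.
restore_voting_disk() {
    backup=$1
    device=$2
    crs_bin=${CRS_BIN:-.}
    run dd if="$backup" of="$device" || return 1
    run /etc/init.d/init.crs start || return 1
    run "$crs_bin/crsctl" check crs
    run "$crs_bin/crsctl" query css votedisk
    run "$crs_bin/crsctl" start resources
}
```

When run for real, CRS may need some time to come up after init.crs start, so the check commands may have to be repeated before they report a healthy stack.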
TAF Failover Configuration and Testing
Configure the service on RAC servers for a failover
TNS Client side config
PROD =
  (DESCRIPTION =
    (enable=broken)
    (LOAD_BALANCE = yes)
    (ADDRESS = (PROTOCOL = TCP)(HOST = oravip01.oracledbasupport.com)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = oravip02.oracledbasupport.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = prod)
      (failover_mode=(type=select)(method=basic))
    )
  )
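A hand-edited tnsnames.ora entry like the one above is easy to break with a missing or extra parenthesis. The following is a minimal sketch of a sanity check that counts parentheses in an entry read from standard input; the function name check_parens is made up for this example, and the check does not validate keywords or ignore comment lines.

```shell
#!/bin/sh
# Report whether the parentheses in a tnsnames.ora entry (stdin) balance.
# Prints "balanced" and returns 0, or prints the imbalance and returns 1.
check_parens() {
    awk '
        {
            for (i = 1; i <= length($0); i++) {
                c = substr($0, i, 1)
                if (c == "(") depth++
                else if (c == ")") depth--
                if (depth < 0) bad = 1   # a ")" appeared before its "("
            }
        }
        END {
            if (bad || depth != 0) {
                print "unbalanced (depth " (depth + 0) ")"
                exit 1
            }
            print "balanced"
        }'
}
```

For example, `check_parens < entry.txt` could be run on a file holding just the PROD entry before it is copied to the clients.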
Let’s test a Failover – Connect to an Oracle Instance 1 or 2
[oracle@ora02 ~]$ showcrs
HA Resource Target State
----------- ------ -----
ora.ora01.ASM1.asm ONLINE ONLINE on ora01
ora.ora01.LISTENER_ora01.lsnr ONLINE ONLINE on ora01
ora.ora01.gsd ONLINE UNKNOWN on ora01
ora.ora01.ons ONLINE UNKNOWN on ora01
ora.ora01.vip ONLINE ONLINE on ora01
ora.ora02.ASM2.asm ONLINE ONLINE on ora02
ora.ora02.LISTENER_ora02.lsnr ONLINE ONLINE on ora02
ora.ora02.gsd ONLINE UNKNOWN on ora02
ora.ora02.ons ONLINE UNKNOWN on ora02
ora.ora02.vip ONLINE ONLINE on ora02
ora.prod.db ONLINE ONLINE on ora01
ora.prod.prod.cs ONLINE ONLINE on ora02
ora.prod.prod.prod1.srv ONLINE ONLINE on ora01
ora.prod.prod.prod2.srv ONLINE ONLINE on ora02
ora.prod.prod1.inst ONLINE ONLINE on ora01
ora.prod.prod2.inst ONLINE ONLINE on ora02
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
prod2
[oracle@ora02 ~]$ crs_stop ora.prod.prod2.inst
Attempting to stop `ora.prod.prod2.inst` on member `ora02`
Stop of `ora.prod.prod2.inst` on member `ora02` succeeded.
At this stage the connections are diverted to the prod1 instance.
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
prod1
[oracle@ora02 ~]$ showcrs
HA Resource Target State
----------- ------ -----
ora.ora01.ASM1.asm ONLINE ONLINE on ora01
ora.ora01.LISTENER_ora01.lsnr ONLINE ONLINE on ora01
ora.ora01.gsd ONLINE UNKNOWN on ora01
ora.ora01.ons ONLINE UNKNOWN on ora01
ora.ora01.vip ONLINE ONLINE on ora01
ora.ora02.ASM2.asm ONLINE ONLINE on ora02
ora.ora02.LISTENER_ora02.lsnr ONLINE ONLINE on ora02
ora.ora02.gsd ONLINE UNKNOWN on ora02
ora.ora02.ons ONLINE UNKNOWN on ora02
ora.ora02.vip ONLINE ONLINE on ora01
ora.prod.db ONLINE ONLINE on ora01
ora.prod.prod.cs ONLINE ONLINE on ora02
ora.prod.prod.prod1.srv ONLINE ONLINE on ora01
ora.prod.prod.prod2.srv ONLINE OFFLINE
ora.prod.prod1.inst ONLINE ONLINE on ora01
ora.prod.prod2.inst OFFLINE OFFLINE
[oracle@ora02 ~]$ crs_start ora.prod.prod2.inst
Attempting to start `ora.prod.prod2.inst` on member `ora02`
Start of `ora.prod.prod2.inst` on member `ora02` succeeded.
What happens if the server is restarted?
I am connected to the prod2 instance; when its server reboots, my connection migrates to prod1 automatically.
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
prod2
SQL> select count(*) from
(select * from dba_source union select * from dba_source union select * from dba_source union select * from dba_source union select * from dba_source);
COUNT(*)
----------
292465
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
prod1
Let’s see how RAC load balancing works. Write a small SQL test script (verify.sql) like the one below:
REM the following query is for TAF connection verification
col sid format 999
col serial# format 9999999
col failover_type format a13
col failover_method format a15
col failed_over format a11
SELECT sid, serial#, failover_type, failover_method, failed_over FROM v$session WHERE username = 'SU';
REM the following query is for load balancing verification
SELECT instance_name FROM v$instance;
exit
REM We can also combine two queries:
col inst_id format 999
col sid format 999
col serial# format 9999999
col failover_type format a13
col failover_method format a15
col failed_over format a11
SELECT inst_id, sid, serial#, failover_type, failover_method, failed_over FROM gv$session WHERE username = 'SU';
REM a simple select to see the distribution of users when testing connection : load balancing
SELECT inst_id, COUNT ( * ) FROM gv$session GROUP BY inst_id;
Write a loop.sh file that opens a number of SQL connections. Copy and paste at least 100 copies of the two lines below. The Oracle listener load-balances by directing each new connection to the least loaded RAC instance.
nohup sqlplus system/0ra01@failover @verify.sql &
sleep 1
nohup sqlplus system/0ra01@failover @verify.sql &
sleep 1
nohup sqlplus system/0ra01@failover @verify.sql &
sleep 1
nohup sqlplus system/0ra01@failover @verify.sql &
sleep 1
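Instead of pasting the same two lines a hundred times, the sessions can be started from a loop. The following is a minimal sketch of that idea; the function name launch_sessions and the SQLPLUS and OUT overrides are assumptions made for this example, and unlike the lines above it redirects output itself rather than using nohup, so the sessions will not survive a logout.

```shell
#!/bin/sh
# Start COUNT background sessions, pausing DELAY seconds between launches,
# and append all of their output to one file.
# SQLPLUS and OUT are hypothetical overrides added for this sketch.
launch_sessions() {
    count=$1
    connect=$2                  # e.g. user/password@failover
    delay=${3:-1}
    out=${OUT:-nohup.out}
    i=0
    while [ "$i" -lt "$count" ]; do
        "${SQLPLUS:-sqlplus}" "$connect" @verify.sql >> "$out" 2>&1 &
        sleep "$delay"
        i=$((i + 1))
    done
    wait                        # block until every background session exits
}
```

A call such as `launch_sessions 100 system/0ra01@failover` would correspond to the 100 pasted entries, with the results collected in nohup.out.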
Run loop.sh and note how the connections are shared between the RAC 1 and RAC 2 nodes:
[oracle@ora01 scripts]$ grep prod2 nohup.out | wc -l
35
[oracle@ora01 scripts]$ grep prod1 nohup.out | wc -l
41
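The two grep commands above can be folded into a single tally per instance. The following is a minimal sketch, assuming the instance names follow the prod1, prod2 pattern used in this test; the function name tally_instances is made up for this example. Like grep piped to wc -l, it counts each matching line once.

```shell
#!/bin/sh
# Count the lines that mention each instance name in the collected output.
# Assumes instance names look like prod1, prod2, ... as in this test.
tally_instances() {
    awk '
        {
            # Count a line once, under the first instance name it contains.
            if (match($0, /prod[0-9]+/))
                n[substr($0, RSTART, RLENGTH)]++
        }
        END { for (k in n) print k, n[k] }' "$1" | sort
}
```

Running `tally_instances nohup.out` prints one line per instance with its connection count, which makes an uneven split easy to spot.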