- Cloning is not cross platform
- Clone usually is done in 2 phases - prepare_clone.pl in source and clone.pl in target
- Clone does take care of inventory and relink so further patches can be applied on the new cloned home
- For Oracle database and CRS, prepare_clone.pl is not required
- ORACLE_HOME and ORACLE_HOME_NAME should be specified when clone.pl is run, in 11g, ORACLE_BASE should be specified also
- If Oracle inventory is not in default location, before running clone.pl, edit cs.properties and add
-ignoreSysPrereqs -invPtrLoc <
oraInst_path> /oraInst.loc - Make sure to run root.sh after clone.pl is completed
- You may need to create oraInst.loc file manually on target if the file does not exists, to do so, just create a file in default location
with inventory_loc=<
oraInventory_path> - The alternative of running clone.pl is to use runInstaller for cloning, the format would be like this :
./runInstaller -clone -silent -ignorePreReq ORACLE_HOME="
" ORACLE_HOME_NAME=" " ORACLE_BASE=" " (in Windows it would be setup.exe) - If the server has more than one perl installed, it is recommended to set PERL5LIB
- If clone is done on target server which has inventory and the cloned ORACLE_HOME has already been used at somepoint,
first ORACLE_HOME needs to be de-attached using this command : ./runInstaller -detachHome ORACLE_HOME=
This is why it is recommended to clone in brand new directory to avoid any issue. - Clone log in target is created in $ORACLE_HOME/clone/logs
Oracle DBA tips
TIP 106# : Quick tips on Cloning Oracle
As Oracle clone is one of the quickest way to create new software binaries with all the patches from a source database and it comes handy specially when Oracle software installation media is not available. In this post, I am trying to remind myself and readers some tips about Oracle clone which worth to know:
TIP 105# : ACFS and bugs
In Oracle Grid Infrastructure 11g Release 2, Oracle extends the capability of ASM by introducing the ASM
Cluster Filesystem, or ACFS. In other words, Oracle ASM 11g Release 2 has capability to the manage all data, including Oracle database files, Oracle Clusterware files and non-structured, general-purpose data such as log files and text files. This sounds amazing, however in my last experience with ACFS on 112034 on Linux 64bit, I found pretty nasty bugs which I think worth sharing.| Bug | Workaround | Bug No |
| Copy( cp) Performance On ACFS | Do
not use ls while cp is running OR Wait for 11204 |
|
| ls -l' and 'find' Commands Slow on ACFS | No workaround - Wait for 11204 | 10418517 |
TIP 104#: Node Eviction in RAC 11gr2 due to temporary network hiccups on heartbeat communication
In next couple months, I will examine different eviction scenarios in RAC 11gR2 and when I see some strange behavior, I will do my best to inform it in this blog, so RAC lover, please stay tuned and here is the scenario number 1 :
AS you know, in 11gR2, oracle uses UDP protocol for heartbeats between the nodes.
In this post, I present a node eviction scenario when UDP communication is blocked between the nodes and you see that depends on where and how UDP is blocked, a different situation could occur.
The test is done on two node RAC in 11.2.0.3PSU3 version on Linux
42517 is the port which ocssd on racnode2 sends its heartbeat.
To break heartbeat communication between node2 and node1, any outgoing traffic on racnode2 for ocssd process (port 42517) is blocked with this command
alter log on node1 shows that node2 is evicted (bootless eviction) and then it is reconfigured and is joined the cluster.
ocssed on racnode2 has more details, I highlighted couple key lines
A key note here is that udp is reconfigured to be run on different port and after that node2 is able to join the cluster and starts up all its resources.
The following netstat also confirms that ocssd.bin listens on the new port
If I sum up, the following sequence of events occurs :
This sounds perfect as node2 is able to recover by itself. It looks like transparent and straight forward recovery.
Let see how this failure is recovered if UDP hiccups occur on node1 (master node in two node RAC)
To block heartbeat, all outgoing traffic on port 36613 is blocked
Based on scenario 1, I expected to see the same sequence of events. In other words, I expected to see node2 is evicted and is reconfigured and is rejoined the cluster.
However, in this case, it is seen that node2 is evicted and then as it is shown in below cssd is hung in start up and joining the cluster.
After reviewing all logs (quite length log, so I avoid to copy it here!), It is seen that ocssd in node2 complains about network heartbeat and no reconfig attempt is done, also in node1, after blocking UDP, the interface was disabled and no try is done to setup communication on differently.
As I mentioned earlier, Although UDP port is unblocked, still the following error is reported on node2 and node1 repeatedly.
It turned out that the issue is reported as Bug 14281269 : "NODE CAN'T REJOIN THE CLUSTER AFTER A TEMPORARY INTERCONNECT FAILURE - PROBLEM:after an interconnect failure on the first node the second node restarts the clusterware (rebootless restart) as expected, but can't join the cluster again till the interconnect interface of node1
is not shutdown/startup manually "
At the time of posting this, there is no patch available and the suggested workaround is to bounce interconnect interface.
In my test, even bouncing node2 (evicted node) did not help and I ended up to kill gipc daemon on node1 (master node/surviving node) and it did help and whole cluster recovered and node2 was able to join the cluster.
To conclude, in 2 node RAC :
AS you know, in 11gR2, oracle uses UDP protocol for heartbeats between the nodes.
In this post, I present a node eviction scenario when UDP communication is blocked between the nodes and you see that depends on where and how UDP is blocked, a different situation could occur.
The test is done on two node RAC in 11.2.0.3PSU3 version on Linux
Scenario 1: When UDP communication is blocked on the second node
In this scenario, outgoing UDP for ocssd process on node2 is blocked.
To do so, we find out UDP port on which ocssd is listening and will disable any outgoing trafficnetstat -a --inet |grep -i udp | grep -i racnode2 udp 0 0 racnode2-priv:14081 *:* udp 0 0 racnode2-priv:52358 *:* udp 0 0 racnode2-priv:52242 *:* udp 0 0 racnode2-priv:42517 *:* --> ocssd udp 0 0 racnode2-priv:31126 *:* udp 0 0 racnode2-priv:60741 *:* [root@racnode2 ~]# lsof -i :42517 COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME ocssd.bin 3005 grid 55u IPv4 22340 UDP racnode2-priv:42517
42517 is the port which ocssd on racnode2 sends its heartbeat.
To break heartbeat communication between node2 and node1, any outgoing traffic on racnode2 for ocssd process (port 42517) is blocked with this command
iptables -A OUTPUT -s 192.168.2.152 -p udp --sport 42517 -j DROP
alter log on node1 shows that node2 is evicted (bootless eviction) and then it is reconfigured and is joined the cluster.
[cssd(3015)]CRS-1612:Network communication with node racnode2 (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.840 seconds 2012-12-31 10:50:54.213 [cssd(3015)]CRS-1611:Network communication with node racnode2 (2) missing for 75% of timeout interval. Removal of this node from cluster in 6.820 seconds 2012-12-31 10:50:58.232 [cssd(3015)]CRS-1610:Network communication with node racnode2 (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.810 seconds 2012-12-31 10:51:01.056 [cssd(3015)]CRS-1607:Node racnode2 is being evicted in cluster incarnation 249572820; details at (:CSSNM00007:) in /u01/app/11.2.0/grid/log/racnode1/cssd/ocssd.log. 2012-12-31 10:51:02.584 [cssd(3015)]CRS-1625:Node racnode2, number 2, was manually shut down 2012-12-31 10:51:02.590 [cssd(3015)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racnode1 . 2012-12-31 10:51:02.630 [crsd(3393)]CRS-5504:Node down event reported for node 'racnode2'. 2012-12-31 10:51:05.827 [crsd(3393)]CRS-2773:Server 'racnode2' has been removed from pool 'Generic'. 2012-12-31 10:51:05.829 [crsd(3393)]CRS-2773:Server 'racnode2' has been removed from pool 'ora.orcl'. 2012-12-31 10:51:37.987 [cssd(3015)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racnode1 racnode2 . 2012-12-31 10:52:13.720 [crsd(3393)]CRS-2772:Server 'racnode2' has been assigned to pool 'Generic'. 2012-12-31 10:52:13.720 [crsd(3393)]CRS-2772:Server 'racnode2' has been assigned to pool 'Generic'. 2012-12-31 10:52:13.720 [crsd(3393)]CRS-2772:Server 'racnode2' has been assigned to pool 'ora.orcl'.
ocssed on racnode2 has more details, I highlighted couple key lines
2012-12-31 10:51:01.129: [ CSSD][3019058064]################################### 2012-12-31 10:51:01.129: [ CSSD][3019058064]clssscExit: CSSD aborting from thread clssnmvKillBlockThread 2012-12-31 10:51:01.129: [ CSSD][3019058064]################################### . . 2012-12-31 10:51:02.559: [ CSSD][3029027728]clssgmClientShutdown: total iocapables 0 2012-12-31 10:51:02.559: [ CSSD][3029027728]clssgmClientShutdown: graceful shutdown completed. 2012-12-31 10:51:02.559: [ CSSD][3029027728]clssnmSendManualShut: Notifying all nodes that this node has been manually shut down . . 2012-12-31 10:51:25.352: [ CSSD][3040868032]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1356979885 2012-12-31 10:51:25.353: [ CSSD][3040868032]clssscmain: Environment is production . . 2012-12-31 10:51:26.167: [GIPCHTHR][3024477072] gipchaWorkerCreateInterface: created local interface for node 'racnode2', haName 'CSS_racnode-cluster', inf 'udp://192.168.2.152:29788'
A key note here is that udp is reconfigured to be run on different port and after that node2 is able to join the cluster and starts up all its resources.
The following netstat also confirms that ocssd.bin listens on the new port
[root@racnode2 ~]# netstat -a --inet |grep -i udp | grep -i racnode2 udp 0 0 racnode2-priv:31126 *:* udp 0 0 racnode2-priv:35489 *:* udp 0 0 racnode2-priv:38321 *:* udp 0 0 racnode2-priv:60741 *:* udp 0 0 racnode2-priv:10321 *:* udp 0 0 racnode2-priv:29788 *:* --> new port is created..... [root@racnode2 working]# lsof -i :29788 COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME ocssd.bin 5919 grid 52u IPv4 764918 UDP racnode2-priv:29788
If I sum up, the following sequence of events occurs :
- udp communication for heartbeat is blocked (outgoing udp on ocssd port)
- Node1 evicts Node2
- Node2 is able to stop all IO capable resources and as the result, no need to boot the node (11g feature).
- Node2 starts CSSD and reconfigures UDP port
- Node2 is able to join the cluster
This sounds perfect as node2 is able to recover by itself. It looks like transparent and straight forward recovery.
Let see how this failure is recovered if UDP hiccups occur on node1 (master node in two node RAC)
Scenario 2: When UDP communication is blocked on the first node
To follow the same step, UDP port for hearbeat is found and it is blocked as it is shown in belowbash-3.2$ netstat -a --inet |grep -i udp | grep -i racnode1 udp 0 0 racnode1-priv:36613 *:* udp 0 0 racnode1-priv:36892 *:* udp 0 0 racnode1-priv:26055 *:* udp 0 0 racnode1-priv:13167 *:* udp 0 0 racnode1-priv:17914 *:* udp 0 0 racnode1-priv:51067 *:* [root@racnode1 ~]# lsof -i :36613 COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME ocssd.bin 3010 grid 55u IPv4 19676 UDP racnode1-priv:36613
To block heartbeat, all outgoing traffic on port 36613 is blocked
iptables -A OUTPUT -s 192.168.2.151 -p udp --sport 36613 -j DROP
Based on scenario 1, I expected to see the same sequence of events. In other words, I expected to see node2 is evicted and is reconfigured and is rejoined the cluster.
However, in this case, it is seen that node2 is evicted and then as it is shown in below cssd is hung in start up and joining the cluster.
[root@racnode1 ~]# crsctl check cluster -all
**************************************************************
racnode1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode2:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
**************************************************************
[root@racnode2 ~]# crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE Abnormal Termination
ora.cluster_interconnect.haip
1 ONLINE OFFLINE
ora.crf
1 ONLINE ONLINE racnode2
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE OFFLINE STARTING
ora.cssdmonitor
1 ONLINE ONLINE racnode2
ora.ctssd
1 ONLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.drivers.acfs
1 ONLINE ONLINE racnode2
ora.evmd
1 ONLINE OFFLINE
ora.gipcd
1 ONLINE ONLINE racnode2
ora.gpnpd
1 ONLINE ONLINE racnode2
ora.mdnsd
1 ONLINE ONLINE racnode2
Even,unblocking the same port with dropping the rule from iptables does not help and still CSS on node2 is not able to join the cluster.iptables -L
iptables -D OUTPUT -s 192.168.2.151 -p udp --sport 36613 -j DROP
[root@racnode1 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
After reviewing all logs (quite length log, so I avoid to copy it here!), It is seen that ocssd in node2 complains about network heartbeat and no reconfig attempt is done, also in node1, after blocking UDP, the interface was disabled and no try is done to setup communication on differently.
As I mentioned earlier, Although UDP port is unblocked, still the following error is reported on node2 and node1 repeatedly.
Node 2
==========
[ CSSD][3013077904]clssnmvDHBValidateNcopy: node 1, racnode1, has a disk HB, but no network HB, DHB has rcfg 249572810, wrtcnt, 181183, LATS 1471304, lastSeqNo 181182, uniqueness 1356788464, timestamp 1356790643/1482244
Node1
=========
[GIPCHALO][3023862672] gipchaLowerProcessNode: no valid interfaces found to node for 25790 ms, node 0xa062a88 { host 'racnode2', haName 'CSS_racnode-cluster', srcLuid be28e076-9f3aafb1, dstLuid 61a1f895-ba260945 numInf 0, contigSeq 2639, lastAck 2626, lastValidAck 2638, sendSeq [2627 : 2683], createTime 4294328280, sentRegister 1, localMonitor 1, flags 0x2408 }
It turned out that the issue is reported as Bug 14281269 : "NODE CAN'T REJOIN THE CLUSTER AFTER A TEMPORARY INTERCONNECT FAILURE - PROBLEM:after an interconnect failure on the first node the second node restarts the clusterware (rebootless restart) as expected, but can't join the cluster again till the interconnect interface of node1
is not shutdown/startup manually "
At the time of posting this, there is no patch available and the suggested workaround is to bounce interconnect interface.
In my test, even bouncing node2 (evicted node) did not help and I ended up to kill gipc daemon on node1 (master node/surviving node) and it did help and whole cluster recovered and node2 was able to join the cluster.
[root@racnode1 working]# ps -ef |grep -i gipc
grid 2961 1 0 05:40 ? 00:00:16 /u01/app/11.2.0/grid/bin/gipcd.bin
root 7709 4792 0 06:44 pts/1 00:00:00 grep -i gipc
[root@racnode1 working]# kill -9 2961
[root@racnode1 working]# ps -ef |grep -i gipc
grid 7717 1 15 06:44 ? 00:00:00 /u01/app/11.2.0/grid/bin/gipcd.bin
root 7755 4792 0 06:44 pts/1 00:00:00 grep -i gipc
[/u01/app/11.2.0/grid/bin/oraagent.bin(3528)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:5} in /u01/app/11.2.0/grid/log/racnode1/agent/crsd/oraagent_grid/oraagent_grid.log.
2012-12-29 06:44:56.972
[/u01/app/11.2.0/grid/bin/orarootagent.bin(3535)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:2:23} in /u01/app/11.2.0/grid/log/racnode1/agent/crsd/orarootagent_root/orarootagent_root.log.
2012-12-29 06:44:56.974
[/u01/app/11.2.0/grid/bin/oraagent.bin(3741)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:5:63} in /u01/app/11.2.0/grid/log/racnode1/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2012-12-29 06:44:57.098
[ohasd(2414)]CRS-2765:Resource 'ora.ctssd' has failed on server 'racnode1'.
2012-12-29 06:44:59.141
[ctssd(7732)]CRS-2401:The Cluster Time Synchronization Service started on host racnode1.
2012-12-29 06:44:59.141
[ctssd(7732)]CRS-2407:The new Cluster Time Synchronization Service reference node is host racnode1.
2012-12-29 06:45:01.164
[cssd(3010)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racnode1 racnode2 .
2012-12-29 06:45:02.363
[crsd(7759)]CRS-1012:The OCR service started on node racnode1.
2012-12-29 06:45:03.155
[evmd(7762)]CRS-1401:EVMD started on node racnode1.
2012-12-29 06:45:05.147
[crsd(7759)]CRS-1201:CRSD started on node racnode1.
2012-12-29 06:45:38.798
[crsd(7759)]CRS-2772:Server 'racnode2' has been assigned to pool 'Generic'.
2012-12-29 06:45:38.800
[crsd(7759)]CRS-2772:Server 'racnode2' has been assigned to pool 'ora.orcl'.
===== alert for node2 =========
2012-12-29 06:39:38.132
[cssd(7700)]CRS-1605:CSSD voting file is online: /dev/sda1; details in /u01/app/11.2.0/grid/log/racnode2/cssd/ocssd.log.
2012-12-29 06:45:01.165
[cssd(7700)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racnode1 racnode2 .
2012-12-29 06:45:03.641
[ctssd(8061)]CRS-2401:The Cluster Time Synchronization Service started on host racnode2.
2012-12-29 06:45:03.641
[ctssd(8061)]CRS-2407:The new Cluster Time Synchronization Service reference node is host racnode1.
2012-12-29 06:45:05.257
[ohasd(2405)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2012-12-29 06:45:16.836
[ctssd(8061)]CRS-2408:The clock on host racnode2 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2012-12-29 06:45:25.475
[crsd(8199)]CRS-1012:The OCR service started on node racnode2.
2012-12-29 06:45:25.541
[evmd(8079)]CRS-1401:EVMD started on node racnode2.
2012-12-29 06:45:27.331
[crsd(8199)]CRS-1201:CRSD started on node racnode2.
2012-12-29 06:45:35.642
[/u01/app/11.2.0/grid/bin/oraagent.bin(8321)]CRS-5016:Process "/u01/app/11.2.0/grid/opmn/bin/onsctli" spawned by agent "/u01/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/racnode2/agent/crsd/oraagent_grid/oraagent_grid.log"
2012-12-29 06:45:36.181
[/u01/app/11.2.0/grid/bin/oraagent.bin(8347)]CRS-5011:Check of resource "orcl" failed: details at "(:CLSN00007:)" in "/u01/app/11.2.0/grid/log/racnode2/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2012-12-29 06:45:37.301
[/u01/app/11.2.0/grid/bin/oraagent.bin(8321)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/u01/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/racnode2/agent/crsd/oraagent_grid/oraagent_grid.log"
To conclude, in 2 node RAC :
- Network hiccups on heartbeat port on node2 is recovered automatically.
- Network hiccups on heartbeat port on node1 requires manual intervention due to bug 14281269
- Due to several reported bug, it is recommended to be on 11203 PSU3 at least.check out metalink note for other bugs: List of gipc defects that prevent GI from starting/joining after network hiccups [ID 1488378.1])
TIP 103# : GoldenGate and statistics
After having GoldenGate setup in your environment, the main question you most likely face is whether or not GoldenGate is working and if so, how much work is done.
GoldenGate provides stats command to report the work which is done, the following present how to use it properly and effectively.
To get stats from now on wards, follow the below steps :
Please be careful that if you do not reset the numbers, all the reported statistics is since startup of a GoldenGate process. In other words, the following stats commands work in the following way if numbers are not reset
Hope this helps for those who may find numbers misleading/incorrect.
GoldenGate provides stats command to report the work which is done, the following present how to use it properly and effectively.
To get stats from now on wards, follow the below steps :
1. Reset numbers : stats extract ext_int, reset
2. Apply the changes
3. Reterive stats :
stats extract ext_int, table xxxx.yyyy, latest
stats extract ext_int, totalsonly xxxx.yyyy, latest
Please be careful that if you do not reset the numbers, all the reported statistics is since startup of a GoldenGate process. In other words, the following stats commands work in the following way if numbers are not reset
stats extract ext_int, totalsonly xxxx.yyyyy --> since startup of ext_int stats extract ext_int, daily, table xxxx.yyyy --> since startup of day stats extract ext_int, daily, table owner.* --> for all tables since startup of day
Hope this helps for those who may find numbers misleading/incorrect.
TIP 102#: GoldenGate replicat performance
One of the nice feature of GoldenGate is BATCHSQL which enables GoldenGate replicat to organize similar SQL statements into array and apply them at once instead of normal mode to apply SQL statement one by one. Oracle document says that around 5K bytes of data change per row, BATCHSQL loses its benefits.
The following shows a sample run of BATCHSQL with INSERTAPPEND and how it saves resources and improves the performance.
Scenario 1 : Small table with massive changes
Initial load :
To find out the impact of BATCHSQL in terms of performance, The changes are tested on Oracle 11203 using GG 11201 and the changes are shipped to target with Oracle 11203 using GG 11201. The following shows different setting for replicat and how each setting improves the performance
Replicat setting 1 : Default setting - No BATCHSQL, No INSERTAPPEND
The following trace shows that 224450 times delete and 224450 times insert is run and it took 1261.54+27.72=1289.26 seconds to complete executions of these two statements (Overheads and other internal executions are ignored).
Replicat setting 2 : BATCHSQL, INSERTAPPEND
The following shows that only 450 times delete and 450 times insert are run and it took only 452.10+0.71=452.81 seconds to complete the executions of two statements.
Also it shows that GoldenGate runs insert in DIRECT PATH with adding hints.
Replicat setting 3 : BATCHSQL, No INSERTAPPEND
So far, BATCHSQL and INSERTAPPEND sounds promising for massive changes in small table. In the next post, I will try to test this feature with wider table to see how this feature performs for changes around 5KB per row.
The following shows a sample run of BATCHSQL with INSERTAPPEND and how it saves resources and improves the performance.
Scenario 1 : Small table with massive changes
create table repuser.tbl1 (id number, name varchar2(30)); alter table repuser.tbl1 add constraint uc1 unique (id);
Initial load :
SQL> delete from repuser.tbl1; 0 rows deleted. SQL> insert into repuser.tbl1 select object_id*-1,object_name from all_objects where object_id is not null; 56135 rows created. SQL> c/-1/-100000 1* insert into repuser.tbl1 select object_id*-100000,object_name from all_objects where object_id is not null SQL> / 56135 rows created. SQL> c/-100000/1 1* insert into repuser.tbl1 select object_id*1,object_name from all_objects where object_id is not null SQL> / 56135 rows created. SQL> commit; Commit complete.
To find out the impact of BATCHSQL in terms of performance, The changes are tested on Oracle 11203 using GG 11201 and the changes are shipped to target with Oracle 11203 using GG 11201. The following shows different setting for replicat and how each setting improves the performance
Replicat setting 1 : Default setting - No BATCHSQL, No INSERTAPPEND
GGSCI (localhost.localdomain) 33> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT RUNNING RENA_IN 00:22:24 00:00:02 It takes more than 22 minutes to apply changes (224450 deletes and 224450 inserts)
The following trace shows that 224450 times delete and 224450 times insert is run and it took 1261.54+27.72=1289.26 seconds to complete executions of these two statements (Overheads and other internal executions are ignored).
DELETE FROM "REPUSER"."TBL1"
WHERE
"ID" = :b0 AND ROWNUM = 1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 224540 1261.54 1296.81 2979 567004335 238878 224540
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 224541 1261.54 1296.81 2979 567004335 238878 224540
INSERT INTO "REPUSER"."TBL1" ("ID","NAME")
VALUES
(:a0,:a1)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 224540 26.91 27.72 0 787 237105 224540
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 224541 26.91 27.72 0 787 237105 224540
Replicat setting 2 : BATCHSQL, INSERTAPPEND
GGSCI (localhost.localdomain) 23> ! info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT RUNNING RENA_IN 00:07:04 00:00:00 The same changes only took more than 7 mins to be completed. It is more than 3 times faster setting 1.
The following shows that only 450 times delete and 450 times insert are run and it took only 452.10+0.71=452.81 seconds to complete the executions of two statements.
Also it shows that GoldenGate runs insert in DIRECT PATH with adding hints.
DELETE FROM "REPUSER"."TBL1"
WHERE
"ID" = :b0 AND ROWNUM = 1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 450 440.00 452.10 12 139050526 238818 224540
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 451 440.00 452.10 12 139050526 238818 224540
INSERT /*+ APPEND APPEND_VALUES */ INTO "REPUSER"."TBL1" ("ID","NAME")
VALUES
(:a0,:a1)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 450 0.60 0.71 0 922 5263 224540
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 451 0.60 0.71 0 922 5263 224540
Replicat setting 3 : BATCHSQL, No INSERTAPPEND
GGSCI (localhost.localdomain) 23> ! info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT RUNNING RENA_IN 00:07:54 00:00:00 It took around 8 minutes to apply the same change, Timing is very close to BATCHSQL/INSERTAPPEND but almost 3 times faster than traditional GoldenGate.Trace also shows that 450 executions of delete and 450 executions of inserts for the same amount of the change.Elapsed time is 518.87+0.48=519.35 seconds.
DELETE FROM "REPUSER"."TBL1"
WHERE
"ID" = :b0 AND ROWNUM = 1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 450 509.74 518.87 0 168600692 238673 224540
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 451 509.74 518.87 0 168600692 238673 224540
INSERT INTO "REPUSER"."TBL1" ("ID","NAME")
VALUES
(:a0,:a1)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 450 0.46 0.48 0 2312 11070 224540
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 451 0.46 0.48 0 2312 11070 224540
So far, BATCHSQL and INSERTAPPEND sounds promising for massive changes in small table. In the next post, I will try to test this feature with wider table to see how this feature performs for changes around 5KB per row.
Subscribe to:
Posts (Atom)