Channel: zaheer.appsdba's Groups Activities

Oracle 12c RAC Flex Cluster Mystery


Introduction

As discussed earlier, Oracle Flex ASM and Flex Cluster are new options introduced with Oracle 12c Grid Infrastructure. In my previous articles we saw how to install, configure and scale a Flex Cluster. This article covers some less obvious facts about Oracle 12c Flex Cluster.

In recent projects we worked on Oracle 12c Flex Cluster deployments and came across some issues that are not documented. This article covers the two major issues we encountered while working with Oracle Flex Cluster.

Before we continue, I would like to clarify the key difference between a standard cluster installation and a Flex Cluster installation.

 

  • A standard cluster can be installed without GNS, in which case all virtual host names and virtual IP addresses must be configured manually.
  • A Flex Cluster installation uses GNS, and in this configuration all virtual host names and virtual IP addresses are assigned through GNS subdomain delegation.

Issue-1 :

During the installation of the Flex Cluster the software was copied to all cluster nodes, and the installer prompted us to execute the "root.sh" script on all participating cluster nodes.

 

 

We executed the "root.sh" script on flexrac1 and it failed with an error.

 

[root@flexrac1 /]# /u01/grid/12.1.0/root.sh
Performing root user operation for Oracle 12c

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/grid/12.1.0

Enter the full pathname of the local bin directory: [/usr/local/bin]:
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/grid/12.1.0/crs/install/crsconfig_params
2015/03/27 01:36:43 CLSRSC-363: User ignored prerequisites during installation

OLR initialization - successful
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
2015/03/27 01:37:40 CLSRSC-330: Adding Clusterware entries to file '/etc/inittab'

CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'flexrac1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'flexrac1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'flexrac1'
CRS-2676: Start of 'ora.mdnsd' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.evmd' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'flexrac1'
CRS-2676: Start of 'ora.gpnpd' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'flexrac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'flexrac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'flexrac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'flexrac1'
CRS-2676: Start of 'ora.diskmon' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'flexrac1' succeeded

ASM created and started successfully.

Disk Group GRID created successfully.

CRS-2672: Attempting to start 'ora.crf' on 'flexrac1'
CRS-2672: Attempting to start 'ora.storage' on 'flexrac1'
CRS-2676: Start of 'ora.storage' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.crf' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'flexrac1'
CRS-2676: Start of 'ora.crsd' on 'flexrac1' succeeded
CRS-4256: Updating the profile
Successful addition of voting disk b924385202a04f41bfa64e62712717e8.
Successfully replaced voting disk group with +GRID.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE b924385202a04f41bfa64e62712717e8 (/dev/oracleasm/disks/DATA1) [GRID]
Located 1 voting disk(s).
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'flexrac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'flexrac1'
CRS-2677: Stop of 'ora.crsd' on 'flexrac1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'flexrac1'
CRS-2673: Attempting to stop 'ora.crf' on 'flexrac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'flexrac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'flexrac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'flexrac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'flexrac1'
CRS-2677: Stop of 'ora.storage' on 'flexrac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'flexrac1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'flexrac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'flexrac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'flexrac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'flexrac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'flexrac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'flexrac1' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'flexrac1'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'flexrac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'flexrac1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'flexrac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'flexrac1'
CRS-2677: Stop of 'ora.cssd' on 'flexrac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'flexrac1'
CRS-2677: Stop of 'ora.gipcd' on 'flexrac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'flexrac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

The script hung at this stage and after some time finished with a "failed" status.

Further investigation showed the following errors in the installation log file:

[crsd(23378)]CRS-2772:Server 'flexrac1' has been assigned to pool 'Free'.
2015-03-28 22:49:22.488:
[gnsd(23641)]CRS-10001:CLSGN-0121: Trace level set to 1.
2015-03-28 22:49:24.002:
[gnsd(23641)]CRS-10001:CLSGN-0125: GNSD started on node flexrac1.
2015-03-28 22:50:17.736:
[/u01/grid/12.1.0/bin/orarootagent.bin(23516)]CRS-5017:The resource action "ora.flexrac1.vip start" encountered the following error:
CRS-5005: IP Address: 192.168.1.13 is already in use in the network
. For details refer to "(:CLSN00107:)" in "/u01/grid/12.1.0/log/flexrac1/agent/crsd/orarootagent_root/orarootagent_root.log".
2015-03-28 22:50:19.785:
[crsd(23378)]CRS-2807:Resource 'ora.flexrac1.vip' failed to start automatically.
2015-03-28 22:51:36.498:
[/u01/grid/12.1.0/bin/orarootagent.bin(23516)]CRS-5017:The resource action "ora.flexrac1.vip start" encountered the following error:
CRS-5005: IP Address: 192.168.1.13 is already in use in the network
. For details refer to "(:CLSN00107:)" in "/u01/grid/12.1.0/log/flexrac1/agent/crsd/orarootagent_root/orarootagent_root.log".
2015-03-28 22:59:26.674:
[gnsd(23641)]CRS-10001:CLSGN-0000: no error

CLSGN-00178: Resolution of name "GNSTESTHOST.flex-cluster.oralabs.com" failed.
2015-03-28 22:59:26.676:
[gnsd(23641)]CRS-10001:CLSGN-0000: no error

CLSGN-00178: Resolution of name "GNSTESTHOST.flex-cluster.oralabs.com" failed.
2015-03-28 22:59:26.678:
[gnsd(23641)]CRS-10001:CLSGN-0000: no error

CLSGN-00178: Resolution of name "GNSTESTHOST.flex-cluster.oralabs.com" failed.
2015-03-28 22:59:26.678:
[gnsd(23641)]CRS-10001:CLSGN-0000: no error

CLSGN-00178: Resolution of name "GNSTESTHOST.flex-cluster.oralabs.com" failed.
2015-03-28 22:59:26.680:
[gnsd(23641)]CRS-10001:(:CLSGN00002:)CLSGN-0201: first self-check name resolution failed.
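
In a long log the repeated CRS-5005 entries are easy to miss. The following is a minimal sketch that pulls the conflicting addresses out of log text. The function name is hypothetical, and the pattern is based only on the message wording in the excerpt above, so it may need adjusting for other versions.

```shell
# Print each distinct address that a CRS-5005 message reports as already in use.
# Reads log text on stdin.
conflicting_ips() {
    sed -n 's/.*CRS-5005: IP Address: \([0-9.][0-9.]*\) is already in use.*/\1/p' | sort -u
}

# Usage (the path is a placeholder; point it at your own log file):
#   conflicting_ips < /path/to/logfile
```

Run against the excerpt above, this would report the single address 192.168.1.13.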

After analyzing this part of the log we tried to reach the IP address "192.168.1.13" and found that it was reachable.

[root@flexrac1 /]# ping 192.168.1.13
PING 192.168.1.13 (192.168.1.13) 56(84) bytes of data.
64 bytes from 192.168.1.13: icmp_seq=1 ttl=64 time=0.028 ms
64 bytes from 192.168.1.13: icmp_seq=2 ttl=64 time=0.018 ms
64 bytes from 192.168.1.13: icmp_seq=3 ttl=64 time=0.026 ms
64 bytes from 192.168.1.13: icmp_seq=4 ttl=64 time=0.024 ms
--- 192.168.1.13 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.018/0.024/0.028/0.003 ms
[root@flexrac1 /]#

The address range defined in the DHCP scope for this cluster runs from 192.168.1.13 to 192.168.1.25. By default GNS tries to assign 192.168.1.13, as it is the first address in the range.

The root.sh script fails because it cannot allocate the first IP address from the DHCP range. This is strange behavior: if an IP address is not free, the next available address from the range should be assigned instead.

The Oracle Grid Infrastructure GNS configuration was trying to allocate the VIP address "192.168.1.13" to the virtual host flexrac1-vip, but that address was not free to be allocated. All cluster nodes are supposed to have two active Ethernet cards, one for the public network and one for the private network. But when we checked the server there were three active Ethernet cards.

"eth0" was configured as the public interface

"eth1" was configured as the private interface

"eth2" should have been inactive


But when we checked, "eth2" was active and configured to obtain its IP address via DHCP. As a result the address "192.168.1.13", which was supposed to be assigned to the virtual interface, had been assigned to this physical interface, and Grid Infrastructure was unable to allocate the required virtual IP address to flexrac1.

After identifying this we disabled the interface "eth2" and executed the root.sh script again.
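
A quick way to confirm which local interface holds a given address is to filter the one-line output of the Linux "ip" command. This is only a sketch: the function name is hypothetical, and it assumes the usual "ip -o -4 addr show" layout, where the second field is the interface name and the fourth is the address with its prefix length. The very low round-trip times in the ping output above would also be consistent with the address being held locally.

```shell
# Print the interface(s) that hold the given IPv4 address.
# Reads `ip -o -4 addr show` output on stdin.
iface_holding_ip() {
    awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) print $2 }'
}

# Usage on a cluster node:
#   ip -o -4 addr show | iface_holding_ip 192.168.1.13
```

If the command prints a physical interface name, the address is already taken on that node and cannot be used for the VIP.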

Execution of "root.sh" script after fixing the problem:

PRKO-2188 : All the node applications already exist. They were not recreated.
PRKF-1107 : GNS server already configured
PRKZ-1072 : SCAN name "flexrac-cluster-scan.flex-cluster.oralabs.com" is already registered on network 1
PRCS-1028 : Single Client Access Name (SCAN) listener resources already exist on network 1
PRCN-3004 : Listener LISTENER_LEAF already exists
PRCA-1095 : Unable to create ASM resource because it already exists.
PRCN-3004 : Listener ASMNET1LSNR_ASM already exists
CRS-5702: Resource 'ora.GRID.dg' is already running on 'flexrac1'
PRCR-1086 : resource ora.cvu is already registered
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'flexrac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'flexrac1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'flexrac1'
CRS-2673: Attempting to stop 'ora.GRID.dg' on 'flexrac1'
...........
...........
...........
CRS-2672: Attempting to start 'ora.scan3.vip' on 'flexrac1'
CRS-2672: Attempting to start 'ora.scan2.vip' on 'flexrac1'
CRS-2672: Attempting to start 'ora.scan1.vip' on 'flexrac1'
CRS-2672: Attempting to start 'ora.flexrac1.vip' on 'flexrac1'
CRS-2676: Start of 'ora.scan3.vip' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN3.lsnr' on 'flexrac1'
CRS-2676: Start of 'ora.scan2.vip' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN2.lsnr' on 'flexrac1'
CRS-2676: Start of 'ora.scan1.vip' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'flexrac1'
CRS-2676: Start of 'ora.flexrac1.vip' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN3.lsnr' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.oc4j' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN2.lsnr' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'flexrac1' succeeded
CRS-6016: Resource auto-start has completed for server flexrac1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2015/03/28 23:20:24 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded. 

Once root.sh has been executed on the first node, GNS assigns one virtual IP to the cluster node and three SCAN IPs to the SCAN name.

[oracle@flexrac1 ~]$ olsnodes -i
flexrac1 192.168.1.13
[oracle@flexrac1 ~]$

[oracle@flexrac1 ~]$ srvctl config scan
SCAN name: flexrac-cluster-scan.flex-cluster.oralabs.com, Network: 1
Subnet IPv4: 192.168.1.0/255.255.255.0/eth0
Subnet IPv6:
SCAN 0 IPv4 VIP: -/scan1-vip/192.168.1.14
SCAN name: flexrac-cluster-scan.flex-cluster.oralabs.com, Network: 1
Subnet IPv4: 192.168.1.0/255.255.255.0/eth0
Subnet IPv6:
SCAN 1 IPv4 VIP: -/scan2-vip/192.168.1.15
SCAN name: flexrac-cluster-scan.flex-cluster.oralabs.com, Network: 1
Subnet IPv4: 192.168.1.0/255.255.255.0/eth0
Subnet IPv6:
SCAN 2 IPv4 VIP: -/scan3-vip/192.168.1.16
[oracle@flexrac1 ~]$

If we look at the VIP allocation here, "192.168.1.13" was allocated to flexrac1-vip and the subsequent addresses 192.168.1.14, 192.168.1.15 and 192.168.1.16 were allocated to the SCAN.

List of VIPs after completion of the root.sh script:


[oracle@flexrac1 ~]$ olsnodes -i
flexrac1 192.168.1.13
flexrac3 192.168.1.17
flexrac2 192.168.1.18
flexrac4 <none>
flexrac5 <none>
[oracle@flexrac1 ~]$
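
In the listing above three nodes received a VIP and two show "<none>". A small filter makes the nodes without a VIP easy to spot on larger clusters. This is a sketch with a hypothetical function name, and it assumes the two-column output shown above.

```shell
# Print the nodes for which `olsnodes -i` reports no VIP.
# Reads the command output on stdin: column 1 is the node, column 2 the VIP.
nodes_without_vip() {
    awk '$2 == "<none>" { print $1 }'
}

# Usage:
#   olsnodes -i | nodes_without_vip
```

For the listing above this would print flexrac4 and flexrac5.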

Recommendation:

We should ensure that the same number of active interfaces is used on all participating cluster nodes. If additional interfaces exist on the cluster nodes, we must ensure that they are not configured for DHCP. Additional interfaces configured for DHCP can cause the "root.sh" script to fail.
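
The check recommended above can be scripted before running the installer. The sketch below assumes a Red Hat style layout with ifcfg-* files under /etc/sysconfig/network-scripts and the BOOTPROTO and ONBOOT settings. The function name is hypothetical, and other distributions or NetworkManager-managed setups would need a different check.

```shell
# List interfaces whose config file requests DHCP and is enabled at boot.
# Takes the directory that holds the ifcfg-* files as an optional argument.
dhcp_ifaces() {
    dir=${1:-/etc/sysconfig/network-scripts}
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        if grep -Eiq '^BOOTPROTO="?dhcp"?' "$f" && grep -Eiq '^ONBOOT="?yes"?' "$f"; then
            # Strip the directory and the ifcfg- prefix to get the interface name
            echo "${f##*/ifcfg-}"
        fi
    done
}

# Usage, as root on each cluster node:
#   dhcp_ifaces
```

Any interface it prints, other than ones intentionally on DHCP, is a candidate for being disabled or reconfigured before "root.sh" is run.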

Issue-2 :

The addition of nodes failed with the following error:

[oragrid@flexnode1 addnode]$ ./addnode.sh  -silent "CLUSTER_NEW_NODES={flexnode6,flexnode7,flexnode8}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={flexnode6-vip,flexnode7-vip}" "CLUSTER_NEW_NODE_ROLES={hub,hub,leaf}"

Starting Oracle Universal Installer...

 

Checking Temp space: must be greater than 120 MB.   Actual 4575 MB    Passed

Checking swap space: must be greater than 150 MB.   Actual 5210 MB    Passed

[FATAL] PRVG-11408 : API called with unequal sized arrays for nodes, VIPs and node roles

[oragrid@flexnode1 addnode]$

With silent execution of the node addition the error message is "[FATAL] PRVG-11408", but when we execute the same command in GUI mode the error message is "INS-08107".

There is no clear information available for this error code on Oracle Support. But the error message from the command line was somewhat informative, as it mentions "API called with unequal sized arrays for nodes, VIPs and node roles".

After seeing this issue we tried adding only one hub node and one leaf node to the existing cluster configuration, and the node addition completed without any issue.
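
Since the message talks about array sizes, it can help to count the entries in each list before calling addnode.sh. The sketch below does only that for the three lists from the failing command above. The function name is hypothetical, and the sketch does not reproduce whatever validation the installer actually performs.

```shell
# Count the comma-separated entries in a {a,b,c} style list.
count_items() {
    list=$(printf '%s' "$1" | tr -d '{}')
    if [ -z "$list" ]; then
        echo 0
    else
        printf '%s\n' "$list" | awk -F, '{ print NF }'
    fi
}

# The three lists passed to addnode.sh in the failing command
echo "nodes: $(count_items '{flexnode6,flexnode7,flexnode8}')"
echo "VIPs:  $(count_items '{flexnode6-vip,flexnode7-vip}')"
echo "roles: $(count_items '{hub,hub,leaf}')"
```

For the failing command this reports three nodes, two VIPs and three roles.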

 

[oragrid@flexnode1 addnode]$ ./addnode.sh   "CLUSTER_NEW_NODES={flexnode6,flexnode8}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={flexnode6-vip}" "CLUSTER_NEW_NODE_ROLES={hub,leaf}"

Starting Oracle Universal Installer...

 

Checking Temp space: must be greater than 120 MB.   Actual 4575 MB    Passed

Checking swap space: must be greater than 150 MB.   Actual 5210 MB    Passed

Checking monitor: must be configured to display at least 256 colors.    Actual 16777216    Passed

This behavior of Flex Cluster is not clear to me, as the initial install also did not consist of an equal number of Hub and Leaf nodes. In the initial install we had 3 Hub nodes and 2 Leaf nodes.

An unequal number of Hub and Leaf nodes in the addnode command should not be a problem. I am currently working with Oracle Support on this issue, and once I have a satisfactory answer from MOS I will update this article.

Recommendation:  

At this stage I would recommend adding an equal number of Hub and Leaf nodes when scaling up an existing Flex Cluster environment.

Conclusion:

In this article we have seen two major issues which we encountered during the deployment of a Flex Cluster. The errors described in this article are not listed on My Oracle Support. Flex Cluster installation and configuration is simple, but troubleshooting is a little difficult, as not many customers are really using the Flex Cluster option yet. I hope this article will help those who are planning to deploy Oracle 12c Flex Cluster.

