
ZFS Memory Tuning for Oracle Databases & Applications on Oracle Solaris 11


Introduction:

Solaris is one of the most widely used operating environments for Oracle databases. Proper operating system configuration is one of the key factors for database and application performance: even if the database itself is tuned perfectly, a poorly configured operating system underneath it will keep the database from performing as expected. For good database performance every layer matters, including the operating system, the database, the network and the virtualization.

The ZFS file system was introduced in Sun Solaris 10, and it is a very capable file system. In Solaris 10, using ZFS for the "root" file system was optional, but starting with Oracle Solaris 11, ZFS is the default "root" file system. ZFS performance is very good compared to the traditional UFS file system.

The Oracle Solaris ZFS file system uses an "Adaptive Replacement Cache" (ZFS ARC), carved out of the system's main memory, to speed up I/O operations. By default ZFS uses all available free memory on the system to cache I/O against ZFS file systems, but Solaris allows us to cap this so that ZFS uses only a specified amount of memory for its caching operations.

In this article we will explore how to configure the ZFS ARC appropriately for better system performance.

Environment Details:

Oracle Solaris 11.2 - X86

Oracle Database 12.1.0.2

ZFS file system

Demonstration:

>> Check the memory utilization of OS before starting an Oracle Instance:

>> Check the current memory utilization with Database up and running

Here, note the values of "ZFS File Data" and "Free (freelist)". ZFS file data is the ZFS cache data held in main memory, and "freelist" is the amount of free memory currently available on the system.

>> Now we will perform an I/O operation so that we can watch the ZFS file data utilization grow.

First I/O operation:

root@soltest:/u01/oradb/oracle/oradata# ls
TESTDB TESTDB1
root@soltest:/u01/oradb/oracle/oradata# du -sh *
3K TESTDB
3.7G TESTDB1
root@soltest:/u01/oradb/oracle/oradata# cp -r TESTDB1 TESTDB1.BKP &
[1] 3346
root@soltest:/u01/oradb/oracle/oradata#

- We are copying 3.7 GB of data on a ZFS file system.

>> Now check the ZFS memory utilization:

All of the system memory is now utilized by the ZFS file system. At this stage some systems start facing problems: ZFS does not free memory pages for the database or application to use, which results in bad performance at the application and database layer.

The Below graph displays the memory utilization before and after I/O operation on ZFS file system:

 

- The free memory graph decreases continuously once the ZFS I/O operations start.

>> Shutdown the Database and check the memory utilization

SQL*Plus: Release 12.1.0.1.0 Production on Thu Mar 31 20:00:40 2016

Copyright (c) 1982, 2013, Oracle. All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show sga

Total System Global Area 1970864128 bytes
Fixed Size 2362648 bytes
Variable Size 469762792 bytes
Database Buffers 1493172224 bytes
Redo Buffers 5566464 bytes
SQL> shut immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>

- Shutting down the instance freed the memory that was allocated to the SGA.

If we start the database here, it will start without issue. But on certain systems I have seen it fail to start due to insufficient memory, because ZFS is not freeing its cache.

>> Error during startup of database due to Insufficient free memory:

oracle@soltest:/oradb01/oracle/oradata$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Sun Jan 24 14:39:25 2016

Copyright (c) 1982, 2014, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup nomount
ORA-27102: out of memory
SVR4 Error: 12: Not enough space
Additional information: 1671
Additional information: 16106127360
Additional information: 4815060992
SQL>

On this system the total memory is 32 GB, of which 26 GB is utilized by ZFS file data, and ZFS is not releasing that memory even when the database requests it.

Even while the database is up, any application process requesting memory from the system is denied it because of this problem, which eventually results in bad application and database performance.

To recover from this situation we would need to reboot the server to free up the memory.

>> Check the ZFS ARC statistics:

root@soltest:~# kstat -p -m zfs -n arcstats
zfs:0:arcstats:buf_size 856576
zfs:0:arcstats:c 79317672
zfs:0:arcstats:c_max 7498948608
zfs:0:arcstats:c_min 67108864
zfs:0:arcstats:class misc
zfs:0:arcstats:crtime 10.045481101
zfs:0:arcstats:data_size 70853312
zfs:0:arcstats:deleted 247266
zfs:0:arcstats:demand_data_hits 4891628
zfs:0:arcstats:demand_data_misses 18829
zfs:0:arcstats:demand_metadata_hits 1050892
zfs:0:arcstats:demand_metadata_misses 10239
zfs:0:arcstats:evict_l2_cached 0
zfs:0:arcstats:evict_l2_eligible 27127086592
zfs:0:arcstats:evict_l2_ineligible 2900732416
zfs:0:arcstats:evict_mfu 11182559232
zfs:0:arcstats:evict_mru 18845259776
zfs:0:arcstats:hash_chain_max 6
zfs:0:arcstats:hash_chains 5221
zfs:0:arcstats:hash_collisions 95890
zfs:0:arcstats:hash_elements 37288
zfs:0:arcstats:hash_elements_max 122845
zfs:0:arcstats:hits 6058515
zfs:0:arcstats:l2_abort_lowmem 0
zfs:0:arcstats:l2_cksum_bad 0
zfs:0:arcstats:l2_evict_lock_retry 0
zfs:0:arcstats:l2_evict_reading 0
zfs:0:arcstats:l2_feeds 0
zfs:0:arcstats:l2_hdr_size 0
zfs:0:arcstats:l2_hits 0
zfs:0:arcstats:l2_io_error 0
zfs:0:arcstats:l2_misses 29068
zfs:0:arcstats:l2_read_bytes 0
zfs:0:arcstats:l2_rw_clash 0
zfs:0:arcstats:l2_size 0
zfs:0:arcstats:l2_write_bytes 0
zfs:0:arcstats:l2_writes_done 0
zfs:0:arcstats:l2_writes_error 0
zfs:0:arcstats:l2_writes_hdr_miss 0
zfs:0:arcstats:l2_writes_sent 0
zfs:0:arcstats:memory_throttle_count 0
zfs:0:arcstats:meta_limit 0
zfs:0:arcstats:meta_max 48748056
zfs:0:arcstats:meta_used 8354032
zfs:0:arcstats:mfu_ghost_hits 19244
zfs:0:arcstats:mfu_hits 5468733
zfs:0:arcstats:misses 261357
zfs:0:arcstats:mru_ghost_hits 15595
zfs:0:arcstats:mru_hits 312947
zfs:0:arcstats:mutex_miss 851
zfs:0:arcstats:other_size 7497456
zfs:0:arcstats:p 35364864
zfs:0:arcstats:prefetch_data_hits 70836
zfs:0:arcstats:prefetch_data_misses 214018
zfs:0:arcstats:prefetch_metadata_hits 45159
zfs:0:arcstats:prefetch_metadata_misses 18271
zfs:0:arcstats:size 79207344
zfs:0:arcstats:snaptime 27590.580059777
root@soltest:~#
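The raw byte counters above are hard to read at a glance. A minimal sketch of how the key `arcstats` counters can be converted to megabytes with awk; here the values captured above are inlined so the arithmetic is reproducible, but on a live system you would pipe `kstat -p zfs:0:arcstats` in directly:

```shell
# Sample of the kstat output shown above (module:instance:name:stat value)
kstat_output='zfs:0:arcstats:size 79207344
zfs:0:arcstats:c 79317672
zfs:0:arcstats:c_max 7498948608
zfs:0:arcstats:c_min 67108864'

# size = current ARC footprint, c = adaptive target, c_max/c_min = hard bounds
arc_summary=$(printf '%s\n' "$kstat_output" | awk '
    $1 ~ /:size$/  { size = $2 }
    $1 ~ /:c$/     { tgt  = $2 }
    $1 ~ /:c_max$/ { cmax = $2 }
    $1 ~ /:c_min$/ { cmin = $2 }
    END {
        mb = 1024 * 1024
        printf "ARC current size : %.0f MB\n", size / mb
        printf "ARC target (c)   : %.0f MB\n", tgt  / mb
        printf "ARC hard maximum : %.0f MB\n", cmax / mb
        printf "ARC hard minimum : %.0f MB\n", cmin / mb
    }')
printf '%s\n' "$arc_summary"
```

On this box `c_max` works out to roughly 7.2 GB, which is why an idle ARC can quietly grow into memory the database will later need.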



To avoid this issue it is highly recommended to cap the ZFS ARC at a specific amount of memory for caching operations.

Memory configuration for ZFS ARC:

To limit the ZFS ARC memory size there are two options available:

1 - Configure "zfs_arc_max" value in /etc/system

2 - Use script "set_user_reserve.sh" 

Option 1 is supported up to Oracle Solaris 11.1. Starting with Oracle Solaris 11.2, Oracle recommends using the "set_user_reserve.sh" script instead. Note that option 2 cannot be used on operating environments older than Solaris 11.2.

- Option 1 requires a system reboot for the parameter to take effect.

- Option 2 does not require a reboot, but we need to update "/etc/system" for the setting to persist across reboots.

The "set_user_reserve.sh" script can be downloaded from MOS tech note Doc ID 1663862.1.

root@soltest:~/scripts# ./set_user_reserve.sh -fp 20
Adjusting user_reserve_hint_pct from 0 to 20
Adjustment of user_reserve_hint_pct to 20 successful.
Make the setting persistent across reboot by adding to /etc/system

#
# Tuning based on MOS note 1663862.1, script version 1.0
# added Friday, April 1, 2016 03:00:55 AM AST by system administrator : <me>
set user_reserve_hint_pct=20

root@soltest:~/scripts#

- A value of 20 reserves 20% of system memory for applications and the database; the rest remains available to the ZFS ARC. The good thing is that this value can be changed at any time.
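The effect of the percentage can be sanity-checked with plain shell arithmetic. This is a sketch, assuming the 32 GB server described earlier in this article; the "left for the ARC" figure is an approximation of what the kernel caches may consume, not an exact kernel guarantee:

```shell
total_bytes=$((32 * 1024 * 1024 * 1024))   # 32 GB of physical memory
gib=$((1024 * 1024 * 1024))

# user_reserve_hint_pct is the share of memory withheld for applications;
# the ZFS ARC has to fit (approximately) into whatever remains.
reserve_report=$(for pct in 20 80; do
    reserved=$((total_bytes * pct / 100))
    arc_room=$((total_bytes - reserved))
    printf 'user_reserve_hint_pct=%d: %d GB for apps, ~%d GB left for the ARC\n' \
        "$pct" $((reserved / gib)) $((arc_room / gib))
done)
printf '%s\n' "$reserve_report"
```

This makes it obvious why raising the value from 20 to 80, as done below, forces the ARC to shrink dramatically.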

>> Check the memory utilization:

>> Now change user_reserve_hint_pct to 80, and we can see the change is applied dynamically


root@soltest:~/scripts# ./set_user_reserve.sh -fp 80
Adjusting user_reserve_hint_pct from 20 to 80
Friday, April 1, 2016 01:12:57 PM AST : waiting for current value : 43 to grow to target : 45
Friday, April 1, 2016 01:13:03 PM AST : waiting for current value : 47 to grow to target : 50
Friday, April 1, 2016 01:13:16 PM AST : waiting for current value : 50 to grow to target : 55
Friday, April 1, 2016 01:13:35 PM AST : waiting for current value : 57 to grow to target : 60
Friday, April 1, 2016 01:13:52 PM AST : waiting for current value : 60 to grow to target : 65
Adjustment of user_reserve_hint_pct to 80 successful.
Make the setting persistent across reboot by adding to /etc/system

#
# Tuning based on MOS note 1663862.1, script version 1.0
# added Friday, April 1, 2016 01:14:12 PM AST by system administrator : <me>
set user_reserve_hint_pct=80

>> Now start an I/O operation and check the ZFS file data utilization:

root@soltest:/u01/oradb/oracle/oradata# cp -r TESTDB2 TESTDB2.BKP &
[1] 1828
root@soltest:/u01/oradb/oracle/oradata# jobs
[1]+ Running cp -r TESTDB2 TESTDB2.BKP &
root@soltest:/u01/oradb/oracle/oradata#

Although we are running a copy job on a file of around 4 GB, ZFS file data utilization stays at about 480 MB. Before setting this parameter, ZFS was utilizing all available memory.

On releases before Oracle Solaris 11.2, the only option to limit the ZFS ARC size is to set the "zfs_arc_max" parameter to the required value and reboot the server for it to take effect.

cat /etc/system
####ZFS ARCH 3.2 GB######
set zfs:zfs_arc_max = 3435973837
oracle@soltest:/oradb01/oracle/oradata$

This parameter limits the ARC size to 3.2 GB. On Solaris 11.2 and 11.3 this parameter still works if configured in "/etc/system", but the setting is deprecated, so it is recommended to use option 2 as listed above.
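Where does the odd-looking byte figure in /etc/system come from? It is simply the desired cap in GB multiplied out to bytes. A small sketch of the conversion (awk is used because the shell's integer arithmetic cannot handle the fractional 3.2):

```shell
# Derive the zfs_arc_max byte value for a desired ARC cap in GB.
# 3.2 GB * 1024^3 bytes/GB, rounded to the nearest byte.
cap_gb=3.2
zfs_arc_max=$(awk -v gb="$cap_gb" 'BEGIN { printf "%.0f", gb * 1024 * 1024 * 1024 }')
echo "set zfs:zfs_arc_max = $zfs_arc_max"
```

The result matches the value shown in the /etc/system fragment above, so the same one-liner can be reused for any other cap.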

Conclusion:

This parameter tells the system how much of the server's memory it should use for ZFS caching, so it is highly recommended to set it appropriately before deploying any application or database on the Oracle Solaris operating environment. On Solaris 11.2, no reboot is required to reserve memory away from the ZFS ARC. By setting this parameter, administrators can meet future application and database memory requirements.

 

 


Oracle RAC on Solaris LDOM Shared Disk configuration


Introduction:

These days, words like "cloud", "virtualization" and "consolidation" are used very frequently in the IT industry, yet many organizations are still not convinced to run their application and database workloads on virtualized platforms.

Is it really a bad idea to run databases and applications on a virtualized platform?

I believe the answer to this question is both "yes" and "no". If a virtualized environment is poorly configured and deployed, then the answer is yes: the business may face bad system performance, outages, and the time lost in finding and fixing the root cause of problems.

If the virtualized environment is properly deployed with all hardware considerations in mind, then the answer is no: such environments are far better in terms of availability and hardware utilization.

This article does not cover a step-by-step method for configuring and installing Real Application Clusters on Oracle Solaris SPARC virtualization (LDOMs); rather, it covers the best practices you should follow when configuring the shared disk devices inside the LDOMs that will be used for the RAC installation.

The diagram below shows a typical deployment of SPARC-based virtualization: there are two servers, and Oracle RAC is installed on an LDOM from each physical server.

The main purpose of this article is to help individuals who are planning to use LDOMs for their RAC deployments understand what to consider while configuring the shared storage devices on LDOM servers.

If the shared devices are not configured properly, we may encounter node eviction issues every now and then, so we must be very careful while configuring them. In this article I will demonstrate one issue that was encountered by at least three customers.

Environment Details:

  • Two Physical Severs - Oracle Sparc T5-4
  • RAC deployed on two ldom from 2 physical servers each
  • Oracle ZFS Storage was used for Shared Storage

S.No.  Server          Description
1      controlhost01   Control domain - server1
2      racnode1        Guest LDOM on server1
3      controlhost02   Control domain - server2
4      racnode2        Guest LDOM on server2

 

Oracle Grid Infrastructure 11.2.0.3 had been running without issues, but one of the servers was rebooted for maintenance, and after the reboot that node kept getting evicted from the cluster.

Observations:

Log message from the operating system : 

Jan 14 03:52:10 racnode1 last message repeated 1 time
Jan 14 04:26:32 racnode1 CLSD: [ID 770310 daemon.notice] The clock on host racnode1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Jan 14 04:45:22 racnode1 vdc: [ID 795329 kern.notice] NOTICE: vdisk@1 disk access failed
Jan 14 04:49:33 racnode1 last message repeated 5 times
Jan 14 04:50:23 racnode1 vdc: [ID 795329 kern.notice] NOTICE: vdisk@1 disk access failed
Jan 14 04:51:14 racnode1 last message repeated 1 time
Jan 14 05:00:11 racnode1 CLSD: [ID 770310 daemon.notice] The clock on host racnode1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Jan 14 05:33:34 racnode1 last message repeated 1 time
Jan 14 05:45:25 racnode1 vdc: [ID 795329 kern.notice] NOTICE: vdisk@1 disk access failed
Jan 14 05:48:54 racnode1 last message repeated 4 times
Jan 14 05:49:44 racnode1 vdc: [ID 795329 kern.notice] NOTICE: vdisk@1 disk access failed
Jan 14 05:52:22 racnode1 last message repeated 3 times
Jan 14 06:09:21 racnode1 CLSD: [ID 770310 daemon.notice] The clock on host racnode1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Log message from the GI logs : 

NOTE: cache mounting group 3/0xF98788E3 (OCR) succeeded
NOTE: cache ending mount (success) of group OCR number=3 incarn=0xf98788e3
GMON querying group 1 at 10 for pid 18, osid 8795
Thu Jan 28 02:15:18 2016
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup ARCH was mounted
GMON querying group 2 at 11 for pid 18, osid 8795
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATA was mounted
GMON querying group 3 at 12 for pid 18, osid 8795
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
SUCCESS: diskgroup OCR was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:2} */
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent *//* {0:0:2} */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent *//* {0:0:2} */
Thu Jan 28 02:15:19 2016
WARNING: failed to online diskgroup resource ora.ARCH.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Thu Jan 28 02:15:36 2016
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR

 

[/u01/grid/bin/oraagent.bin(9694)]CRS-5818:Aborted command 'check' for resource 'ora.ARCH.dg'. Details at (:CRSAGF00113:) {1:57521:2} in /u01/grid/log/racnode1/agent/crsd/oraagent_grid/oraagent_grid.log.
2016-01-28 02:18:42.213
[/u01/grid/bin/oraagent.bin(9694)]CRS-5818:Aborted command 'check' for resource 'ora.DATA.dg'. Details at (:CRSAGF00113:) {1:57521:2} in /u01/grid/log/racnode1/agent/crsd/oraagent_grid/oraagent_grid.log.
2016-01-28 02:18:42.213
[/u01/grid/bin/oraagent.bin(9694)]CRS-5818:Aborted command 'check' for resource 'ora.LISTENER_SCAN3.lsnr'. Details at (:CRSAGF00113:) {1:57521:2} in /u01/grid/log/racnode1/agent/crsd/oraagent_grid/oraagent_grid.log.
2016-01-28 02:18:42.410
[/u01/grid/bin/oraagent.bin(9694)]CRS-5016:Process "/u01/grid/opmn/bin/onsctli" spawned by agent "/u01/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/grid/log/racnode1/agent/crsd/oraagent_grid/oraagent_grid.log"
2016-01-28 02:18:50.897
[/u01/grid/bin/oraagent.bin(9694)]CRS-5818:Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {1:57521:2} in /u01/grid/log/racnode1/agent/crsd/oraagent_grid/oraagent_grid.log.

Cause:

On further investigation we found that the logical device names used by ASM are not the same on both LDOM cluster nodes, and this is the reason one of the instances kept getting evicted.

RACNODE1

 

root@controlhost01:~# ldm list -o disk racnode
NAME
racnode

DISK
    NAME             VOLUME                       TOUT ID   DEVICE  SERVER         MPGROUP
    OS               OS@racnode                      0      disk@0  primary
    cdrom            cdrom@racnode                   1      disk@1  primary
    DATA             DATA@RACDisk                    5      disk@5  primary
    OCR1             OCR1@RACDisk                    2      disk@2  primary
    OCR2             OCR2@RACDisk                    3      disk@3  primary
    OCR3             OCR3@RACDisk                    4      disk@4  primary
    ARCH             ARCH@RACDisk                    6      disk@6  primary

root@controlhost01:~#

RACNODE2

root@controlhost02:~# ldm list -o disk racnode
NAME
racnode

DISK
    NAME             VOLUME                       TOUT ID   DEVICE  SERVER         MPGROUP
    OS               OS@racnode                      0      disk@0  primary
    OCR1             OCR1@RACDisk                    2      disk@2  primary
    OCR2             OCR2@RACDisk                    3      disk@3  primary
    OCR3             OCR3@RACDisk                    4      disk@4  primary
    DATA             DATA@RACDisk                    5      disk@5  primary
    ARCH             ARCH@RACDisk                    1      disk@1  primary

root@controlhost02:~#

 

Observe that the number of devices for racnode differs between controlhost01 and controlhost02: controlhost01 has seven devices in total while controlhost02 has six, and there is a difference in the logical device IDs as well.

controlhost01 ==> ARCH    ARCH@RACDisk    6    disk@6    primary
controlhost02 ==> ARCH    ARCH@RACDisk    1    disk@1    primary

On one node the ARCH volume is exposed as logical device ID 6, and on the other as ID 1. Let's see the device names allocated on each cluster node:
RACNODE2:
-bash-3.2# echo|format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c0d0 <SUN-DiskImage-150GB cyl 4264 alt 2 hd 96 sec 768>
          /virtual-devices@100/channel-devices@200/disk@0
       1. c0d2 <DGC-VRAID-0533 cyl 51198 alt 2 hd 256 sec 16>  OCR1
          /virtual-devices@100/channel-devices@200/disk@2
       2. c0d3 <DGC-VRAID-0533 cyl 51198 alt 2 hd 256 sec 16>  OCR2
          /virtual-devices@100/channel-devices@200/disk@3
       3. c0d4 <DGC-VRAID-0533 cyl 51198 alt 2 hd 256 sec 16>  OCR3
          /virtual-devices@100/channel-devices@200/disk@4
       4. c0d5 <DGC-VRAID-0533 cyl 44556 alt 2 hd 255 sec 189>  DATA
          /virtual-devices@100/channel-devices@200/disk@5
       5. c0d6 <DGC-VRAID-0533 cyl 63998 alt 2 hd 256 sec 96>  ARCH
          /virtual-devices@100/channel-devices@200/disk@6
Specify disk (enter its number): Specify disk (enter its number):
-bash-3.2#
RACNODE1:
-bash-3.2# echo|format
Searching for disks...done
 
AVAILABLE DISK SELECTIONS:
       0. c0d0 <SUN-DiskImage-150GB cyl 4264 alt 2 hd 96 sec 768>
          /virtual-devices@100/channel-devices@200/disk@0
       1. c0d1 <DGC-VRAID-0533 cyl 63998 alt 2 hd 256 sec 96>  ARCH
          /virtual-devices@100/channel-devices@200/disk@1
       2. c0d2 <DGC-VRAID-0533 cyl 51198 alt 2 hd 256 sec 16>  OCR1
          /virtual-devices@100/channel-devices@200/disk@2
       3. c0d3 <DGC-VRAID-0533 cyl 51198 alt 2 hd 256 sec 16>  OCR2
          /virtual-devices@100/channel-devices@200/disk@3
       4. c0d4 <DGC-VRAID-0533 cyl 51198 alt 2 hd 256 sec 16>  OCR3
          /virtual-devices@100/channel-devices@200/disk@4
       5. c0d5 <DGC-VRAID-0533 cyl 44556 alt 2 hd 255 sec 189>  DATA
          /virtual-devices@100/channel-devices@200/disk@5
Specify disk (enter its number): Specify disk (enter its number):
-bash-3.2#
 
Let's check the ASM disks from ASM instance's on both rac nodes:
 
racnode2
 
SQL> select name, path from v$asm_disk;
 
NAME                                               PATH
-------------------------------------------------- --------------------
OCR_0002                                           /dev/rdsk/c0d4s4
DATA_0000                                          /dev/rdsk/c0d5s4
OCR_0000                                           /dev/rdsk/c0d2s4
OCR_0001                                           /dev/rdsk/c0d3s4
ARCH_0000                                                   /dev/rdsk/c0d1s4
 
racnode1
 
 
SQL> select name, path from v$asm_disk;
 
NAME                                               PATH
-------------------------------------------------- --------------------
OCR_0002                                           /dev/rdsk/c0d4s4
DATA_0000                                          /dev/rdsk/c0d5s4
OCR_0000                                           /dev/rdsk/c0d2s4
OCR_0001                                           /dev/rdsk/c0d3s4
ARCH_0000                                                   /dev/rdsk/c0d6s4

Here is the problem: the ASM disk path for the ARCH disk group is different on the two nodes, and this creates a problem for Grid Infrastructure in determining which path is the correct, valid one.
So we must be very careful about the logical device names of shared disks.
If we come across such a situation, what should we do to overcome the issue?
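Mismatches like this can be caught mechanically before Grid Infrastructure is ever installed. A sketch that compares the vdisk IDs of the shared RAC volumes across the two control domains; the two listings from this article are inlined as sample files, but on a live system you would capture the output of `ldm list -o disk racnode` from each control host instead:

```shell
# Inlined `ldm list -o disk` data (columns: NAME VOLUME ID DEVICE SERVER)
cat > /tmp/node1_disks.txt <<'EOF'
OS    OS@racnode    0 disk@0 primary
cdrom cdrom@racnode 1 disk@1 primary
DATA  DATA@RACDisk  5 disk@5 primary
OCR1  OCR1@RACDisk  2 disk@2 primary
OCR2  OCR2@RACDisk  3 disk@3 primary
OCR3  OCR3@RACDisk  4 disk@4 primary
ARCH  ARCH@RACDisk  6 disk@6 primary
EOF
cat > /tmp/node2_disks.txt <<'EOF'
OS    OS@racnode    0 disk@0 primary
OCR1  OCR1@RACDisk  2 disk@2 primary
OCR2  OCR2@RACDisk  3 disk@3 primary
OCR3  OCR3@RACDisk  4 disk@4 primary
DATA  DATA@RACDisk  5 disk@5 primary
ARCH  ARCH@RACDisk  1 disk@1 primary
EOF

# Keep only the shared RAC volumes and their vdisk IDs, sorted for comparison.
extract_ids() { awk '$2 ~ /@RACDisk$/ { print $1, $3 }' "$1" | sort; }
extract_ids /tmp/node1_disks.txt > /tmp/ids1.txt
extract_ids /tmp/node2_disks.txt > /tmp/ids2.txt

if diff /tmp/ids1.txt /tmp/ids2.txt; then
    echo "shared vdisk IDs match"
else
    echo "MISMATCH: fix vdisk IDs before installing Grid Infrastructure"
fi
```

With this article's data, the diff flags ARCH immediately (ID 6 versus ID 1), which is exactly the mismatch that caused the evictions.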

On racnode1 there is an additional device allocated, a CDROM, and it is the one that shifted the logical device IDs.
root@controlhost01:~# ldm list -o disk racnode
NAME
racnode
 
DISK
    NAME             VOLUME                       TOUT ID   DEVICE  SERVER         MPGROUP
    OS               OS@racnode                      0      disk@0  primary
    cdrom            cdrom@racnode                   1      disk@1  primary
    DATA             DATA@RACDisk                    5      disk@5  primary
    OCR1             OCR1@RACDisk                    2      disk@2  primary
    OCR2             OCR2@RACDisk                    3      disk@3  primary
    OCR3             OCR3@RACDisk                    4      disk@4  primary
    ARCH             ARCH@RACDisk                    6      disk@6  primary
 
root@controlhost01:~#
 
On racnode2 the CDROM device doesn't even exist:
 
root@controlhost02:~# ldm list -o disk racnode
NAME
racnode
 
DISK
    NAME             VOLUME                       TOUT ID   DEVICE  SERVER         MPGROUP
    OS               OS@racnode                        0    disk@0  primary
    OCR1             OCR1@RACDisk                      2    disk@2  primary
    OCR2             OCR2@RACDisk                      3    disk@3  primary
    OCR3             OCR3@RACDisk                      4    disk@4  primary
    DATA             DATA@RACDisk                      5    disk@5  primary
    ARCH             ARCH@RACDisk                      1    disk@1  primary
 
root@controlhost02:~#

 

There is no CDROM available for racnode2.

The solution for this issue is to remove the incorrectly numbered logical device from the guest domain via the control domain and assign it again with the correct logical device ID. We also need to delete the CDROM from the racnode1 guest domain.

Remove the CDROM:
root@controlhost01:~# ldm rm-vdisk cdrom racnode1
Check the status of the devices:
root@controlhost01:~# ldm list -o disk racnode
NAME
database
 
DISK
    NAME             VOLUME                      TOUT ID   DEVICE  SERVER         MPGROUP
    OS               OS@RACDisk                        0    disk@0  primary      
    DATA             DATA@RACDisk                      5    disk@5  primary      
    OCR1             OCR1@RACDisk                      2    disk@2  primary      
    OCR2             OCR2@RACDisk                      3    disk@3  primary      
    OCR3             OCR3@RACDisk                      4    disk@4  primary      
    ARCH             ARCH@RACDisk                      6    disk@6  primary      
 
root@controlhost01:~#
 
- The CDROM is now removed from the guest domain.
Now it is time to remove the ASM logical device, so we must ensure that a backup exists and can be restored. ASM will not recognize the device once it has been removed and reconnected; we must follow the complete procedure of provisioning a new LUN to ASM.

Action Plan:

 
 
- Perform a backup of the data residing on the disk group
- Drop the disk group
- Remove the device from the guest domain
- Add the device back using the correct logical name
- Label the device
- Change the ownership and permissions
- Create the ASM disk group
- Restore the data into the ASM disk group
 
 
I will not demonstrate the detailed steps for the action plan above, but I will list the steps that need to be performed at the control domain and the guest domain.
- Remove the incorrect logical device from the guest domain:
 
root@controlhost01:~# ldm rm-vdisk ARCH racnode
root@controlhost01:~# ldm list -o disk racnode
NAME
database
 
DISK
    NAME             VOLUME                      TOUT ID   DEVICE  SERVER                            MPGROUP
    OS               OS@RACDisk                        0    disk@0  primary                         
    DATA             DATA@RACDisk                      5    disk@5  primary                         
    OCR1             OCR1@RACDisk                      2    disk@2  primary                         
    OCR2             OCR2@RACDisk                      3    disk@3  primary                         
    OCR3             OCR3@RACDisk                      4    disk@4  primary                         
 
root@controlhost01:~#
 
The ARCH disk is gone now that the logical device has been removed.
- Add the logical device back with the correct name:
root@controlhost01:~# ldm add-vdisk id=1 ARCH ARCH@RACDisk racnode
root@controlhost01:~# ldm list -o disk racnode
NAME
database
 
DISK
    NAME             VOLUME                      TOUT ID   DEVICE  SERVER         MPGROUP
    OS               OS@RACDisk                        0    disk@0  primary
    DATA             DATA@RACDisk                      5    disk@5  primary
    OCR1             OCR1@RACDisk                      2    disk@2  primary
    OCR2             OCR2@RACDisk                      3    disk@3  primary
    OCR3             OCR3@RACDisk                      4    disk@4  primary
    ARCH             ARCH@RACDisk                      1    disk@1  primary
 
root@host01:~#
 
The newly added device is now available with the correct logical device ID.
Label the disk at the guest LDOM operating system, then change the permissions and ownership of the newly added device. After this step the device is ready to be used by an ASM disk group.

Conclusion:

The purpose of this article is to help individuals who are implementing Oracle RAC on Oracle Solaris LDOMs. It took us and Oracle Support three days to complete the root cause analysis for this problem. I strongly recommend verifying the logical device names across all cluster nodes before installing the cluster software, through multiple hard and soft reboots, and testing again after the cluster installation.
 

Installation of CTXSYS and Oracle text in Oracle Database 11g


Recently there was a requirement to import two production schemas onto one of the test servers, so we created an empty database manually and imported the schemas on the target server. The import failed for some of the packages, functions and stored procedures; when we checked the logs, certain dependent database system objects were missing from the database, and this is what caused the import to fail.

When I checked the registry of the database, only the components below were listed:

SQL> select COMP_ID, COMP_NAME, STATUS from dba_registry;



COMP_ID    COMP_NAME                            STATUS
---------- ------------------------------------ ------
CATALOG    Oracle Database Catalog Views        VALID
CATPROC    Oracle Database Packages and Types   VALID

The above components are installed by the database creation scripts:

@?/rdbms/admin/catalog.sql
@?/rdbms/admin/catproc.sql

The schemas I was trying to import were looking for objects belonging to the Oracle Workspace Manager and Oracle Text components, so these components had to be installed manually.
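Which components are missing can be worked out mechanically by comparing the registries of the two databases. A sketch, assuming each list has been spooled to a plain text file with something like `select comp_id from dba_registry order by comp_id;` (the file names and inlined contents here are illustrative, matching this article's scenario):

```shell
# Component lists as spooled from the source and the new target database.
cat > /tmp/source_components.txt <<'EOF'
CATALOG
CATPROC
CONTEXT
OWM
EOF
cat > /tmp/target_components.txt <<'EOF'
CATALOG
CATPROC
EOF

# comm -23 prints lines unique to the first (sorted) file: the components
# present in the source database but missing from the target.
missing=$(comm -23 /tmp/source_components.txt /tmp/target_components.txt)
echo "Missing in target:"
printf '%s\n' "$missing"
```

Here the comparison reports CONTEXT (Oracle Text) and OWM (Oracle Workspace Manager), which is exactly the pair we had to install by hand.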

Installation of Oracle Workspace Manager (OWM):

Execute script:

SQL> @$ORACLE_HOME/rdbms/admin/owminst.plb 

Function created.


Grant succeeded.



PL/SQL procedure successfully completed.


Grant succeeded.


Procedure created.



Type created.


Type created.



PL/SQL procedure successfully completed.


Table created.


Index created.



Index created.



Table created.

......

.....

Installation of Oracle Text:

Execute script:

SQL> spool text_install.txt
SQL> @?/ctx/admin/catctx.sql change_on_install SYSAUX TEMP NOLOCK
...creating user CTXSYS

User created.
Grant succeeded.
Grant succeeded.
Grant succeeded.
Grant succeeded.
Grant succeeded.
Grant succeeded.
Grant succeeded.
Grant succeeded.
Grant succeeded.
Grant succeeded.
.....
.......
SQL> connect "CTXSYS"/"change_on_install"
Connected.
SQL> @?/ctx/admin/defaults/dr0defin.sql "AMERICAN";
old 1: SELECT DECODE('&nls_language',
new 1: SELECT DECODE('AMERICAN',

LA
--
us

Creating lexer preference...

PL/SQL procedure successfully completed.

Creating wordlist preference...

PL/SQL procedure successfully completed.

Creating stoplist...

PL/SQL procedure successfully completed.
PL/SQL procedure successfully completed.

Creating default policy...

PL/SQL procedure successfully completed.

SQL>

Lock user account:

SQL> alter user ctxsys account lock password expire;

User altered.

SQL>

Verify installation of Oracle Text:

SQL> set pages 1000
col object_name format a40
col object_type format a20
col comp_name format a30
column library_name format a8
column file_spec format a60 wrap
spool text_install_verification.log
SQL>
SQL> select comp_name, status, substr(version,1,10) as version from dba_registry where comp_id = 'CONTEXT';

COMP_NAME STATUS
------------------------------ --------------------------------------------
VERSION
----------------------------------------
Oracle Text VALID
11.2.0.4.0
SQL> select * from ctxsys.ctx_version;

VER_DICT
----------------------------------------
VER_CODE
----------------------------------------
11.2.0.4.0
11.2.0.4.0
SQL> select substr(ctxsys.dri_version,1,10) VER_CODE from dual;

VER_CODE
----------------------------------------
11.2.0.4.0

SQL> select object_type, count(*) from dba_objects where owner='CTXSYS' group by object_type;

OBJECT_TYPE COUNT(*)
-------------------- ----------
INDEX 63
TYPE BODY 6
INDEXTYPE 4
PROCEDURE 2
TABLE 50
TYPE 36
VIEW 77
FUNCTION 2
LIBRARY 1
PACKAGE BODY 63
OPERATOR 6
LOB 2
SEQUENCE 3
PACKAGE 74

14 rows selected.

Check components after installation:

SQL> select COMP_ID, COMP_NAME, STATUS from dba_registry; 

COMP_ID    COMP_NAME                            STATUS
---------- ------------------------------------ ------
CATALOG    Oracle Database Catalog Views        VALID
CATPROC    Oracle Database Packages and Types   VALID
OWM        Oracle Workspace Manager             VALID
CONTEXT    Oracle Text                          VALID

Conclusion:

If you create a database manually and plan to import database objects from another database, first verify the components installed in the source database, then install the same components in the manually created database.

Purge Oracle Enterprise manager 12c – logfiles


It is always recommended to purge or move unwanted files from the system. The same applies to the logfiles of Oracle Enterprise Manager 12c Cloud Control: if we do not purge these logs, they keep growing.

I received a threshold alert for disk space usage on the OEM server. All database log/archive purge jobs were working fine, but certain logfiles still needed to be purged.

Where are the logfiles that are consuming significant disk space?

/OEM/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1

[oem@oemnode1 EMGC_OMS1]$ ls -lrt
total 518388
drwxr-x--- 2 oem dba      4096 Feb 18  2013 security
drwxr----- 3 oem dba      4096 Feb 18  2013 cache
drwxr-x--- 5 oem dba      4096 Feb 18  2013 data
drwxr----- 3 oem dba      4096 Feb 18  2013 sysman
drwxr----- 3 oem dba      4096 Feb 18  2013 adr
drwxr----- 8 oem dba      4096 Feb 18  2013 stage
drwxr-x--- 5 oem dba      4096 Mar  2 15:53 tmp
drwxr-x--- 5 oem dba     24576 Jun 16 13:48 logs
[oem@oemnode1 EMGC_OMS1]$

Let's see the size of these directories:

[oem@oemnode1 EMGC_OMS1]$ du -sh *
5.2G    adr
144K    cache
4.5M    data
25G     logs
4.0K    security
4.2M    stage
91M     sysman
825M    tmp
[oem@oemnode1 EMGC_OMS1]$

The “logs” directory is 25 GB and was never purged since the installation (around 2 years ago). 25 GB in 2 years is not bad, but it still does not need to stay on the system.
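To find out which files account for the space, du piped through sort works well. The sketch below builds a small scratch directory so it is self-contained; the file names and sizes are made up, and on the OMS server you would run the final command against the real logs directory:

```shell
# build a scratch directory with one big and one small file
mkdir -p /tmp/logs_demo
dd if=/dev/zero of=/tmp/logs_demo/EMGC_OMS1.out00001 bs=1024 count=100 2>/dev/null
dd if=/dev/zero of=/tmp/logs_demo/EMGC_OMS1.log bs=1024 count=10 2>/dev/null

# list the largest entries first (sizes in KB)
du -ak /tmp/logs_demo | sort -rn | head -5
```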

Check the file count inside logs directory:

[oem@oemnode1 logs]$ ls -l | wc -l
507
[oem@oemnode1 logs]$

Remove logfiles:

[oem@oemnode1 logs]$ rm EMGC_OMS1.out0*

Size after deleting the .out files:

[oem@oemnode1 logs]$ du -sh .
270M    .
[oem@oemnode1 logs]$

If you want to keep these logfiles for future reference, you can tar them and move the archive to tape or another backup server.

[oem@oemnode1 EMGC_OMS1]$ tar cvzf logs.tar.gz logs

Size after tar:

[oem@oemnode1 EMGC_OMS1]$ du -sh logs*
25G     logs
507M    logs.tar.gz
[oem@oemnode1 EMGC_OMS1]$

We can even schedule a cron job for periodic deletion of these files from the system.
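As a minimal sketch of such a cron-driven purge (the scratch directory, file names, and 30-day retention below are assumptions; on the OMS host you would target the real logs directory):

```shell
# scratch directory standing in for .../servers/EMGC_OMS1/logs
mkdir -p /tmp/oms_logs_demo
touch -d "40 days ago" /tmp/oms_logs_demo/EMGC_OMS1.out00001   # old rotated log
touch /tmp/oms_logs_demo/EMGC_OMS1.log                         # current log

# delete rotated .out files older than 30 days
find /tmp/oms_logs_demo -name 'EMGC_OMS1.out*' -mtime +30 -delete

ls /tmp/oms_logs_demo    # only EMGC_OMS1.log remains
```

A matching weekly crontab entry (the schedule and retention are examples) would be: 0 2 * * 0 find /OEM/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/logs -name 'EMGC_OMS1.out*' -mtime +30 -delete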

Thanks for reading.

Oracle 12c RAC – olsnodes CLI options


This post covers the multiple options that can be used with the “olsnodes” command, one of the most useful commands for Oracle Grid Infrastructure management and troubleshooting.

This command is located under $GRID_HOME/bin and can be executed as the “root”, “GI”, or “DB” user.

  • To list nodes participating in the cluster:
[root@flexnode2 bin]# pwd
/u01/grid/12.1.0/bin
[root@flexnode2 bin]# ./olsnodes
flexnode1
flexnode2
flexnode3
flexnode4
flexnode5
[root@flexnode2 bin]#
  • To list nodes participating in the cluster with node numbers:
[root@flexnode2 bin]# ./olsnodes -n
flexnode1 1
flexnode2 2
flexnode3 3
flexnode4 100
flexnode5 101
[root@flexnode2 bin]#
  • To list nodes with their assigned virtual IPs:
[root@flexnode2 bin]# ./olsnodes -i
flexnode1 192.168.2.21
flexnode2 192.168.2.25
flexnode3 192.168.2.26
flexnode4 <none>
flexnode5 <none>
[root@flexnode2 bin]#
  • To list the node roles participating in the cluster:
[root@flexnode2 bin]# ./olsnodes -a
flexnode1 Hub
flexnode2 Hub
flexnode3 Hub
flexnode4 Leaf
flexnode5 Leaf
[root@flexnode2 bin]#
  • To list node status:
[root@flexnode2 bin]# ./olsnodes -s
flexnode1 Active
flexnode2 Active
flexnode3 Active
flexnode4 Active
flexnode5 Active
[root@flexnode2 bin]#
  • To list whether nodes are pinned or unpinned:
[root@flexnode2 bin]# ./olsnodes -t
flexnode1 Unpinned
flexnode2 Unpinned
flexnode3 Unpinned
flexnode4 Unpinned
flexnode5 Unpinned
[root@flexnode2 bin]#
  • To list the private interconnect IP:
[root@flexnode2 bin]# ./olsnodes -l -p
flexnode2 10.10.2.82
[root@flexnode2 bin]#

This lists the private interconnect IP only for the node where the command is executed.
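The output of olsnodes also lends itself to simple scripting. Below is a sketch that counts nodes per role; the sample output is stored in a variable so the snippet runs standalone, while on a live node you would pipe $GRID_HOME/bin/olsnodes -a directly into awk:

```shell
# sample `olsnodes -a` output (as captured above), inlined for a standalone demo
olsnodes_output='flexnode1 Hub
flexnode2 Hub
flexnode3 Hub
flexnode4 Leaf
flexnode5 Leaf'

# count nodes per role; prints lines such as "Hub 3" and "Leaf 2"
echo "$olsnodes_output" | awk '{count[$2]++} END {for (r in count) print r, count[r]}'
```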

Conclusion:

This command is not only useful for monitoring and managing the cluster, it is also very useful for troubleshooting Grid Infrastructure operational issues. The Cluster Verification Utility (CVU) itself uses this command to list node names when the “-n” option is provided.

12.1.0.2 – runInstaller precheck failed on kernel version


If you’re installing Oracle Database 12.1.0.2 on Oracle Linux 6 with kernel version 2.6.32, you’re likely to hit this issue. I was installing it for a customer and runInstaller reported one of the mandatory checks as failed, as shown in the screenshot below:

[Screenshot: runInstaller kernel version prerequisite check failed]

 

As per the bug, community discussion, and MOS tech note mentioned below, this warning can be ignored and the installation can be continued.

Bug 19287706 : 12.1.0.2 KERNEL OF PROPER VERSION IS NOT FOUND EXPECTED “2.6.39”

https://community.oracle.com/thread/3879942?start=0&tstart=0

12.1.0.2 Installation on OEL6 Fails with Error KERNEL OF PROPER VERSION IS NOT FOUND Expected = “2.6.39” (Doc ID 1924549.1)

As suggested by support in the MOS note, we can simply ignore this prerequisite check and proceed with the installation.

Hope this helps :)

regards,

X A H E E R

Oracle

The Oracle Database (commonly referred to as Oracle RDBMS or simply as Oracle) is an object-relational database management system (ORDBMS) produced and marketed by Oracle Corporation.

Adding Nodes to Oracle 12c Flex Cluster


Introduction:

Oracle Database with the Real Application Clusters option allows running multiple database instances on different physical or virtual servers, sharing database files that reside on shared storage. In RAC the database is deployed on multiple cluster nodes, but to the application it appears as a single unified database, with the workload shared across the cluster nodes.

Starting from Oracle 12c, Oracle introduced a new cluster option, "Flex Cluster", which consists of Hub nodes and Leaf nodes. In my previous articles I illustrated how to configure a flex cluster using the GNS service:

http://www.toadworld.com/platforms/oracle/w/wiki/11508.oracle-12c-flex-cluster-installation-using-widows-dnsdhcp-server-part-i

http://www.toadworld.com/platforms/oracle/w/wiki/11509.oracle-12c-flex-cluster-installation-using-widows-dnsdhcp-server-part-ii

One of the major benefits of Oracle Real Application Clusters (RAC) is SCALABILITY. In a traditional non-clustered environment, when a database server runs out of computing capacity we need downtime to upgrade CPU/RAM. This is not the case with RAC: new database cluster nodes can be added on the fly without impacting running operations. The application load can then be distributed across the cluster nodes automatically, or server pools can be used to define how the load is distributed.

In this article we will see how to add nodes to an existing flex cluster environment. Adding nodes to a flex cluster is almost identical to adding nodes to a standard cluster, but there are certain differences between the two.

Environment Details:

  • flexrac1 to flexrac5 are existing cluster nodes.
  • flexrac6 and flexrac8 are new cluster nodes.

S.No  Node Name  Public IP      Private IP   SW Version  Node Description
1     DANODE1    192.168.2.1    -            2008        DNS/DHCP Server for Node
2     flexrac1   192.168.2.81   10.10.2.81   12.1.0.1    Cluster Hub-Node
3     flexrac2   192.168.2.82   10.10.2.82   12.1.0.1    Cluster Hub-Node
4     flexrac3   192.168.2.83   10.10.2.83   12.1.0.1    Cluster Hub-Node
5     flexrac4   192.168.2.84   10.10.2.84   12.1.0.1    Cluster Leaf-Node
6     flexrac5   192.168.2.85   10.10.2.85   12.1.0.1    Cluster Leaf-Node
7     flexrac6   192.168.2.86   10.10.2.86   12.1.0.1    New Cluster Hub-Node
8     flexrac8   192.168.2.88   10.10.2.88   12.1.0.1    New Cluster Leaf-Node

The following diagram illustrates the node addition to an existing flex cluster environment:

Before we proceed with the addition of cluster nodes, let's recall some key points about flex clusters:

1 - Only HUB nodes run ASM instances, so shared storage must be configured only for HUB nodes.

2 - LEAF nodes do not require connectivity to shared storage.

3 - Virtual IPs are allocated only to HUB nodes.

Pre-requisite steps:

1 - The new cluster nodes being added to the existing cluster environment should have identical computing resources.

2 - All operating system prerequisites should be completed before adding the nodes to the cluster:

  •   OS users and groups
  •   Kernel parameters
  •   Install all OS-level packages
  •   OS file system directories for Grid and RDBMS homes
  •   Configure shared devices on HUB nodes
  •   Configure network interfaces for the public and private networks on all Hub and Leaf nodes

3 - Enable SSH across all cluster nodes.

4 - Verify that all public and private network interfaces can communicate with each other.

5 - Add entries for the new cluster nodes to the GNS subdomain.

6 - Execute cluvfy to identify any missing prerequisites.

Note: In a flex cluster we should not add the VIPs to the /etc/hosts file of any cluster node, as they are assigned to the cluster nodes through the GNS service.


In this demonstration I will not cover the detailed steps of configuring prerequisites such as RPM installation and kernel parameters; I will focus on the steps that are specific to the flex cluster.

Add new cluster node entries to the GNS Subdomain:

A flex cluster itself requires GNS to be configured, and all virtual name resolution is done through the GNS/DNS server, so we must add all new host entries to the GNS subdomain.

- flexnode6 and flexnode8 are the new cluster nodes that will join the cluster.

Configure shared storage on HUB nodes:

We should attach the same shared devices used by the other Hub nodes and configure the ASM library.

Configure ASM library:

[root@flexnode6 ~]# oracleasm configure -i
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver. The following questions will determine whether the driver is
loaded on boot and what permissions it will have. The current values
will be shown in brackets ('[]'). Hitting <ENTER> without typing an
answer will keep that current value. Ctrl-C will abort.

Default user to own the driver interface []: oragrid
Default group to own the driver interface []: dbagrid
Start Oracle ASM library driver on boot (y/n) [n]: y
Scan for Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration: done
[root@flexnode6 ~]#

 SCAN ASM disks :

[root@flexnode6 ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "DISK1"
Unable to instantiate disk "DISK1"
Instantiating disk "DISK2"
Unable to instantiate disk "DISK2"
Instantiating disk "DISK3"
Unable to instantiate disk "DISK3"
[root@flexnode6 ~]#

- scandisks is still not able to instantiate the disks.

Restart ASM library service:

[root@flexnode6 ~]# service oracleasm restart
Dropping Oracle ASMLib disks: [ OK ]
Shutting down the Oracle ASMLib driver: [ OK ]
Initializing the Oracle ASMLib driver: [ OK ]
Scanning the system for Oracle ASMLib disks: [ OK ]
[root@flexnode6 ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
[root@flexnode6 ~]#

- List ASM disks:

[root@flexnode6 ~]# oracleasm listdisks
DISK1
DISK2
DISK3
[root@flexnode6 ~]#

SSH Setup on all cluster nodes:

SSH must be set up between all cluster nodes, so we configure SSH between the existing cluster nodes and the new nodes joining the cluster. We can use the sshUserSetup.sh script to configure SSH from the command line, or set it up in GUI mode through runInstaller. In this article we will only use the command-line method.

[oragrid@flexnode1 sshsetup]$ ./sshUserSetup.sh -user oragrid -hosts "flexnode1 flexnode2 flexnode3 flexnode4 flexnode5 flexnode6  flexnode8" -advance -confirm -noPromptPassphrase -confirm -advance
The output of this script is also logged into /tmp/sshUserSetup_2016-06-25-05-21-10.log
Hosts are flexnode1 flexnode2 flexnode3 flexnode4 flexnode5 flexnode6 flexnode7 flexnode8
user is oragrid
Platform:- Linux
Checking if the remote hosts are reachable
PING flexnode1.dbamaze.com (192.168.2.81) 56(84) bytes of data.
64 bytes from flexnode1.dbamaze.com (192.168.2.81): icmp_seq=1 ttl=64 time=0.047 ms
64 bytes from flexnode1.dbamaze.com (192.168.2.81): icmp_seq=2 ttl=64 time=0.015 ms

--- flexnode1.dbamaze.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.000/0.015/0.047/0.017 ms
PING flexnode2.dbamaze.com (192.168.2.82) 56(84) bytes of data.
64 bytes from flexnode2.dbamaze.com (192.168.2.82): icmp_seq=1 ttl=64 time=0.237 ms
64 bytes from flexnode2.dbamaze.com (192.168.2.82): icmp_seq=2 ttl=64 time=0.385 ms

--- flexnode2.dbamaze.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 0.192/0.310/0.436/0.091 ms
PING flexnode3.dbamaze.com (192.168.2.83) 56(84) bytes of data.
64 bytes from flexnode3.dbamaze.com (192.168.2.83): icmp_seq=1 ttl=64 time=0.201 ms
64 bytes from flexnode3.dbamaze.com (192.168.2.83): icmp_seq=2 ttl=64 time=0.973 ms

--- flexnode3.dbamaze.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 0.195/0.445/0.973/0.293 ms
PING flexnode4.dbamaze.com (192.168.2.84) 56(84) bytes of data.
64 bytes from flexnode4.dbamaze.com (192.168.2.84): icmp_seq=1 ttl=64 time=0.228 ms
64 bytes from flexnode4.dbamaze.com (192.168.2.84): icmp_seq=2 ttl=64 time=0.245 ms

--- flexnode4.dbamaze.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.228/0.362/0.610/0.154 ms
PING flexnode5.dbamaze.com (192.168.2.85) 56(84) bytes of data.
64 bytes from flexnode5.dbamaze.com (192.168.2.85): icmp_seq=1 ttl=64 time=0.286 ms
64 bytes from flexnode5.dbamaze.com (192.168.2.85): icmp_seq=2 ttl=64 time=0.538 ms

--- flexnode5.dbamaze.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4002ms
rtt min/avg/max/mdev = 0.244/0.364/0.538/0.101 ms
PING flexnode6.dbamaze.com (192.168.2.86) 56(84) bytes of data.
64 bytes from flexnode6.dbamaze.com (192.168.2.86): icmp_seq=1 ttl=64 time=0.408 ms
64 bytes from flexnode6.dbamaze.com (192.168.2.86): icmp_seq=2 ttl=64 time=0.370 ms

--- flexnode6.dbamaze.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.144/0.285/0.408/0.112 ms
PING flexnode7.dbamaze.com (192.168.2.87) 56(84) bytes of data.

--- flexnode7.dbamaze.com ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4000ms

PING flexnode8.dbamaze.com (192.168.2.88) 56(84) bytes of data.
64 bytes from flexnode8.dbamaze.com (192.168.2.88): icmp_seq=1 ttl=64 time=1.47 ms
64 bytes from flexnode8.dbamaze.com (192.168.2.88): icmp_seq=2 ttl=64 time=0.470 ms

Verify SSH connectivity between nodes:

[oragrid@flexnode1 sshsetup]$ ssh flexnode6 date
Sat Jun 25 05:29:32 AST 2016
[oragrid@flexnode1 sshsetup]$ ssh flexnode2 date
Sat Jun 25 05:30:00 AST 2016
[oragrid@flexnode1 sshsetup]$ ssh flexnode3 date
Sat Jun 25 05:30:38 AST 2016
[oragrid@flexnode1 sshsetup]$ ssh flexnode4 date
Sat Jun 25 05:30:09 AST 2016
[oragrid@flexnode1 sshsetup]$ ssh flexnode5 date
Sat Jun 25 05:29:58 AST 2016
[oragrid@flexnode1 sshsetup]$ ssh flexnode6 date
Sat Jun 25 05:30:03 AST 2016
[oragrid@flexnode1 sshsetup]$ ssh flexnode8 date
Sat Jun 25 05:30:12 AST 2016
[oragrid@flexnode1 sshsetup]$

Execute cluvfy from any one of the active cluster nodes

Cluvfy is a utility that helps identify missing prerequisites on the cluster nodes. The command should be executed from any one of the active cluster nodes, preferably a HUB node. I have not included the detailed cluvfy output; only the areas that need attention are listed.

 

[oragrid@flexnode1 ~]$ cluvfy stage -pre nodeadd -n flexnode6,flexnode8 -fixup -verbose

 

Checking ASMLib configuration.
Node Name Status
------------------------------------ ------------------------
flexnode1 passed
flexnode6 passed
flexnode7 passed
flexnode8 (failed) ASMLib configuration is incorrect.

ERROR:
PRVF-10110 : ASMLib is not configured correctly on the node "flexnode8"
Result: Check for ASMLib configuration failed.

Checking Flex Cluster node role configuration...
Flex Cluster node role configuration check passed


NOTE:
No fixable verification failures to fix

Pre-check for node addition was unsuccessful.
Checks did not pass for the following node(s):
flexnode8
[oragrid@flexnode1 ~]$

flexnode8 is a leaf node, so this error can be safely ignored.

 

Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "flexrac1.dbamaze.com" is a valid domain name
Checking if the GNS VIP belongs to same subnet as the public network...
Public network subnets "192.168.2.0, 192.168.2.0, 192.168.2.0" match with the GNS VIP "192.168.2.0, 192.168.2.0, 192.168.2.0"
Checking if the GNS VIP is a valid address...
GNS VIP "192.168.2.20" resolves to a valid IP address
Checking the status of GNS VIP...
Checking if FDQN names for domain "flexrac1.dbamaze.com" are reachable

GNS resolved IP addresses are reachable

GNS resolved IP addresses are reachable

GNS resolved IP addresses are reachable

GNS resolved IP addresses are reachable
Checking status of GNS resource...
Node Running? Enabled?
------------ ------------------------ ------------------------
flexnode1 no yes
flexnode2 yes yes
flexnode3 no yes
flexnode4 no yes
flexnode5 no yes

GNS resource configuration check passed
Checking status of GNS VIP resource...
Node Running? Enabled?
------------ ------------------------ ------------------------
flexnode1 no yes
flexnode2 yes yes
flexnode3 no yes
flexnode4 no yes
flexnode5 no yes

GNS VIP resource configuration check passed.

GNS integrity check passed

Checking Flex Cluster node role configuration...
Flex Cluster node role configuration check passed

- This check must pass, as GNS is required for allocating the VIPs; if this check fails, we may not be able to add the nodes successfully.

Addition of Cluster Nodes:

Once all prerequisites are in place, we are ready to add the new cluster nodes.

Check the existing cluster nodes before addition of nodes:

[oragrid@flexnode1 addnode]$ olsnodes -a
flexnode1 Hub
flexnode2 Hub
flexnode3 Hub
flexnode4 Leaf
flexnode5 Leaf
[oragrid@flexnode1 addnode]$

- Execute addnode.sh script from $GI_HOME/addnode directory

addnode.sh can be executed in GUI or CLI mode. In this demonstration I am using the GUI mode; for CLI mode we just need to specify the "-silent" option with the addnode.sh script.

We can add hub and leaf nodes at the same time by specifying the role of each node in the addnode.sh command parameters.

The addnode.sh script should be executed from any one of the active cluster nodes; in this demonstration it is executed from "flexnode1"

[oragrid@flexnode1 addnode]$ ./addnode.sh "CLUSTER_NEW_NODES={flexnode6,flexnode8}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={flexnode6-vip}" "CLUSTER_NEW_NODE_ROLES={hub,leaf}"
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 120 MB. Actual 4575 MB Passed
Checking swap space: must be greater than 150 MB. Actual 5210 MB Passed
Checking monitor: must be configured to display at least 256 colors. Actual 16777216 Passed

- We should ensure that the HUB and LEAF node roles are assigned appropriately. For HUB nodes we can specify "AUTO" for the virtual hostname, as all virtual hostnames will then be allocated automatically.


- Ensure that SSH connectivity is working properly.

- Pre-requisite checks are in progress

- All pre-requisite checks were successful; now we are ready to install the GI software on the new cluster nodes.

- Copying the GI home to the remote nodes is in progress

- Copying of the GI home completed successfully; now we are ready to execute the root.sh script on the new cluster nodes.

Execution of orainstRoot.sh:


The scripts should be executed in the same sequence as listed in the above screen.

[root@flexnode6 oraInventory]# ls
ContentsXML logs oraInst.loc orainstRoot.sh
[root@flexnode6 oraInventory]# sh orainstRoot.sh
Changing permissions of /u01/oracle/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/oracle/oraInventory to dbagrid.
The execution of the script is complete.
[root@flexnode6 oraInventory]#

- Similarly, execute it on flexnode8.

Execution of root.sh:


[root@flexnode6 oraInventory]# /u01/grid/12.1.0/root.sh
Performing root user operation for Oracle 12c

The following environment variables are set as:
ORACLE_OWNER= oragrid
ORACLE_HOME= /u01/grid/12.1.0

Enter the full pathname of the local bin directory: [/usr/local/bin]:
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/grid/12.1.0/crs/install/crsconfig_params
2016/07/04 19:57:43 CLSRSC-363: User ignored prerequisites during installation

OLR initialization - successful
2016/07/04 19:58:12 CLSRSC-330: Adding Clusterware entries to file '/etc/inittab'

CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'flexnode6'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'flexnode6'
CRS-2677: Stop of 'ora.drivers.acfs' on 'flexnode6' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'flexnode6' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'flexnode6'
CRS-2672: Attempting to start 'ora.evmd' on 'flexnode6'
CRS-2676: Start of 'ora.mdnsd' on 'flexnode6' succeeded
CRS-2676: Start of 'ora.evmd' on 'flexnode6' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'flexnode6'
CRS-2676: Start of 'ora.gpnpd' on 'flexnode6' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'flexnode6'
CRS-2676: Start of 'ora.gipcd' on 'flexnode6' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'flexnode6'
CRS-2676: Start of 'ora.cssdmonitor' on 'flexnode6' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'flexnode6'
CRS-2672: Attempting to start 'ora.diskmon' on 'flexnode6'
CRS-2676: Start of 'ora.diskmon' on 'flexnode6' succeeded
CRS-2789: Cannot stop resource 'ora.diskmon' as it is not running on server 'flexnode6'
CRS-2676: Start of 'ora.cssd' on 'flexnode6' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'flexnode6'
CRS-2672: Attempting to start 'ora.ctssd' on 'flexnode6'
CRS-2676: Start of 'ora.ctssd' on 'flexnode6' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'flexnode6' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'flexnode6'
CRS-2676: Start of 'ora.asm' on 'flexnode6' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'flexnode6'
CRS-2676: Start of 'ora.storage' on 'flexnode6' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'flexnode6'
CRS-2676: Start of 'ora.crf' on 'flexnode6' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'flexnode6'
CRS-2676: Start of 'ora.crsd' on 'flexnode6' succeeded
CRS-6017: Processing resource auto-start for servers: flexnode6
CRS-2672: Attempting to start 'ora.ons' on 'flexnode6'
CRS-2672: Attempting to start 'ora.proxy_advm' on 'flexnode6'
CRS-2676: Start of 'ora.ons' on 'flexnode6' succeeded
CRS-2676: Start of 'ora.proxy_advm' on 'flexnode6' succeeded
CRS-6016: Resource auto-start has completed for server flexnode6
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2016/07/04 20:02:19 CLSRSC-343: Successfully started Oracle clusterware stack

clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
2016/07/04 20:02:36 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

[root@flexnode6 oraInventory]#


- Similarly, execute root.sh on flexnode8.

Node addition completed successfully

Node status after execution of root.sh script:

[oragrid@flexnode1 addnode]$ olsnodes -a
flexnode1 Hub
flexnode2 Hub
flexnode3 Hub
flexnode6 Hub
flexnode4 Leaf
flexnode5 Leaf
flexnode8 Leaf
[oragrid@flexnode1 addnode]$

Verify the cluster services:

[oragrid@flexnode1 addnode]$ crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------------------------------------------
ora....SM.lsnr ora....er.type 0/5 0/ ONLINE ONLINE flexnode1
ora.DATA.dg ora....up.type 0/5 0/ ONLINE ONLINE flexnode1
ora.GRID.dg ora....up.type 0/5 0/ ONLINE ONLINE flexnode1
ora....ER.lsnr ora....er.type 0/5 0/ ONLINE ONLINE flexnode1
ora....AF.lsnr ora....er.type 0/5 0/ OFFLINE OFFLINE
ora....N1.lsnr ora....er.type 0/5 0/0 ONLINE ONLINE flexnode1
ora....N2.lsnr ora....er.type 0/5 0/0 ONLINE ONLINE flexnode2
ora....N3.lsnr ora....er.type 0/5 0/0 ONLINE ONLINE flexnode3
ora.MGMTLSNR ora....nr.type 0/0 0/0 ONLINE ONLINE flexnode3
ora.asm ora.asm.type 0/5 0/0 ONLINE ONLINE flexnode2
ora.cvu ora.cvu.type 0/5 0/0 ONLINE ONLINE flexnode3
ora.flexcdb.db ora....se.type 0/2 0/1 ONLINE ONLINE flexnode1
ora....E1.lsnr application 0/5 0/0 ONLINE ONLINE flexnode1
ora....de1.ons application 0/3 0/0 ONLINE ONLINE flexnode1
ora....de1.vip ora....t1.type 0/0 0/0 ONLINE ONLINE flexnode1
ora....E2.lsnr application 0/5 0/0 ONLINE ONLINE flexnode2
ora....de2.ons application 0/3 0/0 ONLINE ONLINE flexnode2
ora....de2.vip ora....t1.type 0/0 0/0 ONLINE ONLINE flexnode2
ora....E3.lsnr application 0/5 0/0 ONLINE ONLINE flexnode3
ora....de3.ons application 0/3 0/0 ONLINE ONLINE flexnode3
ora....de3.vip ora....t1.type 0/0 0/0 ONLINE ONLINE flexnode3
ora....E6.lsnr application 0/5 0/0 ONLINE ONLINE flexnode6
ora....de6.ons application 0/3 0/0 ONLINE ONLINE flexnode6
ora....de6.vip ora....t1.type 0/0 0/0 ONLINE ONLINE flexnode6
ora.gns ora.gns.type 0/5 0/0 ONLINE ONLINE flexnode3
ora.gns.vip ora....ip.type 0/0 0/0 ONLINE ONLINE flexnode3
ora.mgmtdb ora....db.type 0/2 0/1 ONLINE ONLINE flexnode3
ora....network ora....rk.type 0/5 0/ ONLINE ONLINE flexnode1
ora.oc4j ora.oc4j.type 0/1 0/2 ONLINE ONLINE flexnode3
ora.ons ora.ons.type 0/3 0/ ONLINE ONLINE flexnode1
ora.proxy_advm ora....vm.type 0/5 0/ ONLINE ONLINE flexnode1
ora.scan1.vip ora....ip.type 0/0 0/0 ONLINE ONLINE flexnode1
ora.scan2.vip ora....ip.type 0/0 0/0 ONLINE ONLINE flexnode2
ora.scan3.vip ora....ip.type 0/0 0/0 ONLINE ONLINE flexnode3
[oragrid@flexnode1 addnode]$

Startup ASM instance on new HUB node:

The ASM instance may not come up automatically on the newly added HUB node, so we may need to start it manually.

[root@flexnode6 oraInventory]# ps -ef | grep pmon
oragrid 3800 1 0 Jul04 ? 00:00:01 apx_pmon_+APX4
root 12505 881 0 01:23 pts/2 00:00:00 grep pmon
[root@flexnode6 oraInventory]#


[oragrid@flexnode6 ~]$ . oraenv
ORACLE_SID = [+ASM3] ? +ASM4
The Oracle base has been changed from to /u01/oracle


[oragrid@flexnode6 ~]$ sqlplus / as sysasm

SQL*Plus: Release 12.1.0.1.0 Production on Tue Jul 5 01:28:49 2016

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup
ASM instance started

Total System Global Area 1135747072 bytes
Fixed Size 2297344 bytes
Variable Size 1108283904 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL>

- Check the status of ASM Instance 

SQL> select INST_ID, INSTANCE_NUMBER, INSTANCE_NAME, HOST_NAME from gv$instance;

INST_ID     INSTANCE_NUMBER    INSTANCE_NAME     HOST_NAME
---------- ---------------  ----------------   ----------------------------------------------------------------
4                4               +ASM4            flexnode6
3                3               +ASM3            flexnode3
2                2               +ASM2            flexnode2
1                1               +ASM1            flexnode1

SQL>

Conclusion:

Oracle RAC allows us to add nodes to an existing cluster without disrupting the existing services. The procedure for adding nodes to a standard cluster and a flex cluster is similar, with some minor differences, and we can add HUB and LEAF nodes at the same time. We should ensure that the GNS subdomain delegation has a large enough range of reserved IPs for the number of nodes being added; if the reserved IP lease range is smaller than the number of nodes being added, the node addition may fail.

adpatch – Relink of module “fndrwxit.so” failed


Recently we were doing an EBS patching activity for one of our customers, and adpatch was failing while relinking one of the library files. The runtime error message was as follows:

Done with link of fnd executable 'fndrwxit.so' on Tue Jun 28 22:18:39 SAUST 2016
Relink of module "fndrwxit.so" failed.
See error messages above (also recorded in log file) for possible
reasons for the failure. Also, please check that the Unix userid
running adrelink has read, write, and execute permissions
on the directory /oraprod/R12/apps/apps_st/appl/fnd/12.0.0/bin,
and that there is sufficient space remaining on the disk partition
containing your Oracle Applications installation.

Done with link of product 'fnd' on Tue Jun 28 22:18:39 SAUST 2016
adrelink is exiting with status 1

End of adrelink session
Date/time is Tue Jun 28 22:18:39 SAUST 2016
**********************************************************

Line-wrapping log file for readability ...
Done line-wrapping log file.

Original copy is /oraprod/R12/apps/apps_st/appl/admin/PROD/log/adrelink.lsv
New copy is /oraprod/R12/apps/apps_st/appl/admin/PROD/log/adrelink.log
An error occurred while relinking application programs.
Continue as if it were successful [No] :

The error message suggests checking the disk space on the server, but there was no disk space problem.

Cause:

The problem is that some of the application processes were not stopped before starting the adpatch session.

[oraprod@PRODERP[/oraprod]#ps -ef | grep FND
oraprod 11075672 21102602 0 22:21:30 pts/5 0:00 grep FND
oraprod 11272290 19267706 0 22:02:13 - 0:02 FNDLIBR
oraprod 20578480 1 120 Jun 24 - 1782:52 FNDLIBR
[oraprod@PRODERP[/oraprod]#

Solution:

Kill the running application processes. Then:

  • We can continue the existing patching session by choosing “Yes” when prompted for “Continue as if it were successful [No] :”
  • We can even abort the existing patching session and start again after killing the running application processes.
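Leftover FNDLIBR processes can be spotted and killed from `ps -ef` output before restarting adpatch. The sketch below assumes the process name FNDLIBR and the `ps -ef` column layout shown above; the actual `kill` is left commented for safety:

```shell
# Extract PIDs of FNDLIBR processes from `ps -ef`-style output,
# skipping the "grep FNDLIBR" line itself.
extract_fndlibr_pids() {
  awk '$NF == "FNDLIBR" && $(NF-1) != "grep" { print $2 }'
}

# Real use (run as the applications owner):
#   ps -ef | extract_fndlibr_pids | xargs -r kill

# Demo on the output captured in this post:
sample='oraprod 11075672 21102602 0 22:21:30 pts/5 0:00 grep FNDLIBR
oraprod 11272290 19267706 0 22:02:13 - 0:02 FNDLIBR
oraprod 20578480 1 120 Jun24 - 1782:52 FNDLIBR'
printf '%s\n' "$sample" | extract_fndlibr_pids
```

The demo prints only the two real FNDLIBR PIDs; the `grep` line is excluded by checking the second-to-last field.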

I have encountered this issue on EBS R12.1.3 running on AIX 6.1, but the same may be applicable to other environments as well.

Thanks for reading :)

regards,
X A H E E R

adworker failed – XLIFFImporter.class


During one of our recent patching activities, the adpatch session was hanging without any progress, as shown below:

Assigned: file XLIFFImporter.class on worker 21 for product fnd username APPLSYS.
Assigned: file XLIFFImporter.class on worker 22 for product fnd username APPLSYS.
Assigned: file XLIFFImporter.class on worker 23 for product fnd username APPLSYS.
Assigned: file XLIFFImporter.class on worker 24 for product fnd username APPLSYS.
Program completed successfully

Program completed successfully

Completed: file ar12amg.ldt on worker 1 for product ar username APPS.
Completed: file ARXLAAAD.ldt on worker 5 for product ar username APPS.
Assigned: file XLIFFImporter.class on worker 1 for product fnd username APPLSYS.
Assigned: file XLIFFImporter.class on worker 5 for product fnd username APPLSYS.
Program completed successfully

Completed: file ARXLASD.ldt on worker 6 for product ar username APPS.
Assigned: file XLIFFImporter.class on worker 6 for product fnd username APPLSYS.

When we checked the logfile, the following error messages were reported:

Calling /oraprod/R12/apps/tech_st/10.1.3/appsutil/jdk/jre/bin/java ...
Exception in thread "main" java.sql.SQLRecoverableException: IO Error: Broken pipe
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:421)
at oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:531)
at oracle.jdbc.driver.T4CConnection.(T4CConnection.java:221)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:503)
at java.sql.DriverManager.getConnection(Unknown Source)
at java.sql.DriverManager.getConnection(Unknown Source)
at oracle.apps.ad.worker.AdJavaWorker.getAppsConnection(AdJavaWorker.java:1041)
at oracle.apps.ad.worker.AdJavaWorker.main(AdJavaWorker.java:276)
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)
at java.net.SocketOutputStream.write(SocketOutputStream.java:147)
at oracle.net.ns.DataPacket.send(DataPacket.java:199)
at oracle.net.ns.NetOutputStream.flush(NetOutputStream.java:211)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:227)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:175)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:100)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:85)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:122)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:78)
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1179)
at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1155)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:279)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:186)
at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:366)
at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:752)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:359)
... 8 more

Cause:

The English version of the patch completed successfully without any issues, but the Arabic version failed with the above error message.

Solution:

  • Abort the current adpatch session.
  • Add the parameter “SQLNET.INBOUND_CONNECT_TIMEOUT=0” in sqlnet.ora.
  • Add the parameter “INBOUND_CONNECT_TIMEOUT_<listener_name>=0” in listener.ora.
  • Restart the database and the database listener.
  • Restart adpatch and choose to resume the previous session.
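For reference, the entries would look like the following. This is a sketch using the standard Oracle Net parameter names; `LISTENER` is a placeholder for your actual listener name, and a value of 0 disables the inbound connect timeout entirely, so consider reverting it after patching:

```
# sqlnet.ora
SQLNET.INBOUND_CONNECT_TIMEOUT = 0

# listener.ora
INBOUND_CONNECT_TIMEOUT_LISTENER = 0
```

Restart the listener after editing listener.ora so the change takes effect.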

Drop of TEMP tablespace – Hanging


We were trying to drop a temporary tablespace to free some space on one of the mount points. We had created another TEMP tablespace and made it the default temporary tablespace.

When we issued the command to drop the old TEMP tablespace, it hung:

SQL> drop tablespace TEMP including contents and datafiles;

(the statement hangs)

Cause:

The currently active sessions are using blocks from the old temporary tablespace, and hence the DROP statement is unable to proceed.

SQL> select count(1) from v$sort_usage;

COUNT(1)
----------
21
SQL> select USERNAME, TS# TABLESPACE, SQL_ID from v$sort_usage;
USERNAME                              TS# TABLESPACE                     SQL_ID
------------------------------ ---------- ------------------------------ -------------
XMETA                                   3 TEMP                           3hkjq203y2wth
XMETA                                   3 TEMP                           7x520x1zf28y6
XMETA                                   3 TEMP                           3hkjq203y2wth
XMETA                                   3 TEMP                           5d0ymkhc1f6qb
XMETA                                   3 TEMP                           5d0ymkhc1f6qb
XMETA                                   3 TEMP                           3hkjq203y2wth
XMETA                                   3 TEMP                           3hkjq203y2wth
XMETA                                   3 TEMP                           3hkjq203y2wth
XMETA                                   3 TEMP                           3hkjq203y2wth
XMETA                                   3 TEMP                           7x520x1zf28y6
XMETA                                   3 TEMP                           5d0ymkhc1f6qb

USERNAME                              TS# TABLESPACE                     SQL_ID
------------------------------ ---------- ------------------------------ -------------
XMETA                                   3 TEMP                           3hkjq203y2wth
XMETA                                   3 TEMP                           3hkjq203y2wth
XMETA                                   3 TEMP                           3hkjq203y2wth
XMETA                                   3 TEMP                           3hkjq203y2wth
XMETA                                   3 TEMP                           5d0ymkhc1f6qb
XMETA                                   3 TEMP                           7x520x1zf28y6
XMETA                                   3 TEMP                           bz7kpsapkc870
DSODB                                   3 TEMP                           dv7xcynf4yy74
DSODB                                   3 TEMP                           60vb41snxxabf

Solution:

  • Kill all sessions using the old temporary tablespace and then try to drop it again.
  • If it is a TEST/DEV database and downtime is possible, you can even restart the instance.
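Finding the blocking sessions can be scripted from `v$sort_usage`. The sketch below, to be run as SYSDBA in SQL*Plus, only generates the `ALTER SYSTEM KILL SESSION` commands for review rather than executing them; review each before running it:

```
-- Generate kill commands for sessions still using the old TEMP tablespace.
SELECT 'ALTER SYSTEM KILL SESSION ''' || s.sid || ',' || s.serial# || ''' IMMEDIATE;' AS kill_cmd
  FROM v$session s
  JOIN v$sort_usage u ON u.session_addr = s.saddr
 WHERE u.tablespace = 'TEMP';
```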

RC-50013: Fatal: Instantiate driver – EBS R12.2 Clone


The cloning process of Oracle EBS is simple, but troubleshooting clone issues is not always easy. Recently I was performing the clone configuration steps on a 12.2.5 dbTier on the target system and encountered the issue listed below:

WARNING: [AutoConfig Error Report]
The following report lists errors AutoConfig encountered during each
phase of its execution. Errors are grouped by directory and phase.
The report format is:

[APPLY PHASE]
AutoConfig could not successfully execute the following scripts:
Directory: /u01/ora_prod/PROD/12.1.0/perl/bin/perl -I /u01/ora_prod/PROD/12.1.0/perl/lib/5.14.1 -I /u01/ora_prod/PROD/12.1.0/perl/lib/site_perl/5.14.1 -I /u01/ora_prod/PROD/12.1.0/appsutil/perl /u01/ora_prod/PROD/12.1.0/appsutil/clone
ouicli.pl INSTE8_APPLY 1

AutoConfig is exiting with status 1

WARNING: RC-50013: Fatal: Instantiate driver did not complete successfully.
/u01/ora_prod/PROD/12.1.0/appsutil/driver/regclone.drv

If you search Google, this error points you to an incorrect Perl version, which is not the case here.

Cause:

The script was trying to register the Oracle Home in the inventory but failing to update it.

Solution:

We need to ensure there are no files currently located in the inventory directory; if there are any, we should delete them.

[root@erpnode1 ora_prod]# cd /u01/ora_prod/oraInventory
[root@erpnode1 oraInventory]# ls
ContentsXML logs oui
[root@erpnode1 oraInventory]# ls -lrt
total 12
drwxrwx--- 2 oracle dba 4096 Jun 16 01:34 oui
drwxrwx--- 2 oracle dba 4096 Jun 16 01:40 ContentsXML
drwxrwx--- 2 oracle dba 4096 Jul 15 09:56 logs
[root@erpnode1 oraInventory]# rm -rf *
[root@erpnode1 oraInventory]#
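Rather than deleting the inventory contents outright, it is safer to move them aside first so they can be restored if anything goes wrong. A minimal sketch, assuming the oraInventory path from this environment (adjust to yours):

```shell
# Move the contents of an inventory directory into a timestamped backup
# directory next to it, leaving the inventory itself empty.
backup_inventory() {
  inv=$1
  backup="${inv}.bak.$(date +%Y%m%d%H%M%S)"
  mkdir -p "$backup" &&
  mv "$inv"/* "$backup"/
}

# Real use:
#   backup_inventory /u01/ora_prod/oraInventory
```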

Execute adcfgclone.pl again and it should complete without issues.

Beginning database tier Apply - Fri Jul 15 10:06:46 2016

/u01/ora_prod/PROD/12.1.0/appsutil/clone/bin/../jre/bin/java -Xmx600M -DCONTEXT_VALIDATED=false -Doracle.installer.oui_loc=/u01/ora_prod/PROD/12.1.0/oui -classpath /u01/ora_prod/PROD/12.1.0/appsutil/clone/jlib/xmlparserv2.jar:/u01/ora_prod/PROD/12.1.0/appsutil/clone/jlib/ojdbc6.jar:/u01/ora_prod/PROD/12.1.0/appsutil/clone/jlib/java:/u01/ora_prod/PROD/12.1.0/appsutil/clone/jlib/oui/OraInstaller.jar:/u01/ora_prod/PROD/12.1.0/appsutil/clone/jlib/oui/ewt3.jar:/u01/ora_prod/PROD/12.1.0/appsutil/clone/jlib/oui/share.jar:/u01/ora_prod/PROD/12.1.0/appsutil/clone/jlib/oui/srvm.jar:/u01/ora_prod/PROD/12.1.0/appsutil/clone/jlib/ojmisc.jar oracle.apps.ad.clone.ApplyDBTier -e /u01/ora_prod/PROD/12.1.0/appsutil/PROD_erpnode1.xml -stage /u01/ora_prod/PROD/12.1.0/appsutil/clone -showProgress
APPS Password : Log file located at /u01/ora_prod/PROD/12.1.0/appsutil/log/PROD_erpnode1/ApplyDBTier_07151006.log
| 0% completed
Log file located at /u01/ora_prod/PROD/12.1.0/appsutil/log/PROD_erpnode1/ApplyDBTier_07151006.log
| 15% completed

Completed Apply...
Fri Jul 15 10:15:32 2016

Thanks for reading :)

regards,
X A H E E R

addnode Oracle 12c RAC – Error on Saving cluster Inventory


In a recent assignment we were adding 2 cluster nodes to an existing 8-node cluster and encountered an error while saving the cluster inventory on the new cluster nodes. The screenshots below show where the error was encountered.

[Screenshots F08 and F10: error encountered while saving the cluster inventory]

Cause:

The inventory location does not exist on the new cluster nodes, and the Oracle Clusterware add-node command is trying to update the inventory at the same location used on the existing cluster nodes.

Solution:

We can manually execute the below listed command after creating the required directories on the new cluster nodes.

/u01/grid/12.1.0/oui/bin/runInstaller -attachHome -noClusterEnabled ORACLE_HOME=/u01/grid/12.1.0 ORACLE_HOME_NAME=OraGI12Home1 \
CLUSTER_NODES=flexnode1,flexnode2,flexnode3,flexnode6,flexnode8 CRS=true "INVENTORY_LOCATION=/u01/oracle/oraInventory" -invPtrLoc "/etc/orainst.loc" \
LOCAL_NODE=flexnode6
  • In CLUSTER_NODES we should specify all nodes that will participate in the cluster, including the new nodes being added.
  • LOCAL_NODE is the node from which the inventory is being updated.
  • ORACLE_HOME is the Grid Infrastructure home.

Conclusion:

It is recommended to create the inventory files and directories with proper privileges on the new cluster servers that are being added to the existing cluster.

 

Oracle EBS R12.2 Installation Issue with Start CD 51


Oracle EBS 12.2 installation is performed using the Start Here CD, and Oracle keeps releasing newer versions of the Start CD to include bug fixes and new enhancements. Currently, Start CD 51 is the latest available Start CD. Earlier I posted a blog about Start CD 51; please refer to it if you want more information:

using start CD 12.2.0.51 with existing EBS Stage R12.2

In this post I will discuss the issues I encountered while installing on Oracle Linux 6.1.

The pre-install check for rapidwiz completed without issues, but the database tier installation could not proceed due to an invalid kernel version.

[Screenshot Error_R122: Rapid Install error on the database tier]

Error in appsutil logfile:

Executing command: /u01/stage/stage_R122_51/startCD/Disk1/rapidwiz/jre/Linux_x64/1.6.0//bin/java/bin/java -cp /u01/ora_prod/PROD/12.1.0/temp/PROD_erpnode1/DBInstallHome/fnd/../j22065856_fnd.zip:/u01/ora_prod/PROD/12.1.0/temp/PROD_erpnode1/DBInstallHome/fnd/java/3rdparty/stdalone/xmlparserv2.zip -Doracle.apps.fnd.txk.env_home=/u01/ora_prod/PROD/12.1.0/temp/PROD_erpnode1/ -Doracle.apps.fnd.txk.runtime.config=/u01/ora_prod/PROD/12.1.0/temp/PROD_erpnode1/xmldocs/instDB.xml oracle.apps.fnd.txk.config.InstallService

Fatal Error: TXK Install Service

oracle.apps.fnd.txk.config.ProcessStateException: OUI process failed : Exit=253 See log for details. CMD= /u01/stage/stage_R122_51/TechInstallMedia/database/examples/runInstaller -waitForCompletion -ignoreSysPrereqs -force -silent -responseFile /u01/ora_prod/PROD/12.1.0/temp/PROD_erpnode1/cfgHome/response/DB_HOME/txkDB12cR1_12102_examples.rsp

at oracle.apps.fnd.txk.config.OUIPatchActionNode.processState(OUIPatchActionNode.java:160)

at oracle.apps.fnd.txk.config.PatchActionNode.processState(PatchActionNode.java:187)

at oracle.apps.fnd.txk.config.PatchNode.processState(PatchNode.java:338)

at oracle.apps.fnd.txk.config.PatchesNode.processState(PatchesNode.java:79)

at oracle.apps.fnd.txk.config.InstallNode.processState(InstallNode.java:68)

at oracle.apps.fnd.txk.config.TXKTopology.traverse(TXKTopology.java:594)

at oracle.apps.fnd.txk.config.InstallService.doInvoke(InstallService.java:224)

at oracle.apps.fnd.txk.config.InstallService.invoke(InstallService.java:237)

at oracle.apps.fnd.txk.config.InstallService.main(InstallService.java:291)

When we checked the installation logfile, the exact same error was reported without any clue as to why Rapid Install was failing on the database tier, and there was no document available on MOS.

On further investigation of the inventory installation logfiles, we noticed that the database prerequisite check lists the kernel version as invalid, even though it is a supported version.

more /u01/ora_prod/oraInventory/logs/installActions2016-06-15_10-56-24AM.log
INFO: -----------------------------------------------
INFO: Verification Result for Node:erpnode1
INFO: Expected Value:x86_64
INFO: Actual Value:x86_64
INFO: -----------------------------------------------
INFO: *********************************************
INFO: OS Kernel Version: This is a prerequisite condition to test whether the system kernel version is at least "2.6.39".
INFO: Severity:CRITICAL
INFO: OverallStatus:VERIFICATION_FAILED
INFO: -----------------------------------------------
INFO: Verification Result for Node:erpnode1
INFO: Expected Value:2.6.39
INFO: Actual Value:2.6.32-100.34.1.el6uek.x86_64INFO: Error Message:Kernel of proper version is not found on node "erpnode1" [Expected = "2.6.39" ; Found = "2.6.32-100.34.1.el6uek.x86_64"]
INFO: Cause:Cause Of Problem Not Available
INFO: Action:User Action Not Available
INFO: -----------------------------------------------

The installation was not proceeding further, so we modified the RDBMS prerequisite file listed below to allow the installation to continue:

$STAGE/stage_R122_51/TechInstallMedia/database/database/stage/cvu/cvu_prereq.xml

We should update the kernel prerequisite check in the above file from KERNEL_VER VALUE="2.6.39" to KERNEL_VER VALUE="2.6.32".

File Entries before modification:

<OPERATING_SYSTEM RELEASE="OEL6"><VERSION VALUE="6"/><ARCHITECTURE VALUE="x86_64"/><NAME VALUE="Linux"/><VENDOR VALUE="enterprise"/><KERNEL_VER VALUE="2.6.39"/><KERNEL><PROPERTY NAME="semmsl" NAME2="semmsl2" VALUE="250" SEVERITY="IGNORABLE"/><PROPERTY NAME="semmns" VALUE="32000" SEVERITY="IGNORABLE"/><PROPERTY NAME="semopm" VALUE="100" SEVERITY="IGNORABLE"/><PROPERTY NAME="semmni" VALUE="128" SEVERITY="IGNORABLE"/><PROPERTY NAME="shmmax" SEVERITY="IGNORABLE">

File Entry After Modification:

<OPERATING_SYSTEM RELEASE="OEL6"><VERSION VALUE="6"/><ARCHITECTURE VALUE="x86_64"/><NAME VALUE="Linux"/><VENDOR VALUE="enterprise"/><KERNEL_VER VALUE="2.6.32"/><KERNEL><PROPERTY NAME="semmsl" NAME2="semmsl2" VALUE="250" SEVERITY="IGNORABLE"/><PROPERTY NAME="semmns" VALUE="32000" SEVERITY="IGNORABLE"/><PROPERTY NAME="semopm" VALUE="100" SEVERITY="IGNORABLE"/><PROPERTY NAME="semmni" VALUE="128" SEVERITY="IGNORABLE"/><PROPERTY NAME="shmmax" SEVERITY="IGNORABLE"><STEPS><STEP NAME="PHYSICAL_MEMORY" GREATER_THAN="1024" UNIT="MB" MULTIPLE="0.5"/></STEPS>
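For repeatability, the same edit can be scripted. A sketch, assuming the stage path from this post; it keeps a backup copy of the original file before rewriting it:

```shell
# Downgrade the expected kernel version in the CVU prerequisite file,
# keeping a backup of the original alongside it.
patch_kernel_ver() {
  f=$1
  cp "$f" "$f.orig" &&
  sed 's/KERNEL_VER VALUE="2.6.39"/KERNEL_VER VALUE="2.6.32"/' "$f.orig" > "$f"
}

# Real use:
#   patch_kernel_ver /u01/stage/stage_R122_51/TechInstallMedia/database/database/stage/cvu/cvu_prereq.xml
```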

After modifying this, the check is logged as successful in the database inventory logs, as shown below:

INFO: -----------------------------------------------
INFO: Verification Result for Node:erpnode1
INFO: Expected Value:x86_64
INFO: Actual Value:x86_64
INFO: -----------------------------------------------
INFO: *********************************************INFO: OS Kernel Version: This is a prerequisite condition to test whether the system kernel version is at least "2.6.32".
INFO: Severity:CRITICAL
INFO: OverallStatus:SUCCESSFUL
INFO: -----------------------------------------------INFO: Verification Result for Node:erpnode1
INFO: Expected Value:2.6.32
INFO: Actual Value:2.6.32-100.34.1.el6uek.x86_64
INFO: -----------------------------------------------
INFO: *********************************************
INFO: OS Kernel Parameter: semmsl: This is a prerequisite condition to test whether the OS kernel parameter "semmsl" is properly set.
INFO: Severity:IGNORABLE

But unfortunately the installation was interrupted again with the same issue :(

This time it failed for the Examples CD, with the exact same error in the inventory logfiles. So we must modify the file in the Examples staging directory as well.

Conclusion:

This issue is reported in “Bug 19287706 : 12.1.0.2 KERNEL OF PROPER VERSION IS NOT FOUND EXPECTED "2.6.39"”, which suggests we can safely ignore the warning and proceed with the installation. But that is not the case with Rapid Install, which does not present it as a warning during the pre-install checks. So we must update the values accordingly.

I opened “SR 3-12937598811 : Fatal Error: TXK Install Service – rapidwiz install R12.2 start CD 51” to report this issue and the solution I followed, and Oracle has now officially released MOS note “Rapidwiz Install R12.2 Start CD 51 Fails With Fatal Error: TXK Install Service in Database Pre-install Checks (Doc ID 2155494.1)” documenting the issue with the solution I provided.

Oracle 12c RAC Flex Cluster Mystery


Introduction

As discussed earlier, Oracle Flex ASM and Flex Cluster are new options introduced in Oracle 12c Grid Infrastructure. In my previous articles we have seen how to install, configure, and scale a Flex Cluster. This article demonstrates some of the hidden facts about the Oracle 12c Flex Cluster.

In recent projects we worked on Oracle 12c Flex Cluster deployments and came across some issues that are not documented. This article covers two major issues we encountered while working with the Oracle Flex Cluster.

Before we continue further, I would like to clarify the key difference between a standard cluster installation and a Flex Cluster installation.

  • A standard cluster installation is performed using a non-GNS configuration, meaning all virtual host names and virtual IP addresses must be configured manually.
  • A Flex Cluster installation is performed using a GNS configuration, where all virtual host names and virtual IP addresses are assigned through GNS subdomain delegation.

Issue-1 :

During the installation of the Flex Cluster, the software was copied to all cluster nodes and the installer prompted us to execute the "root.sh" script on all participating cluster nodes, as shown in the screenshot below.

 

 

Executed the "root.sh" script on flexrac1, and it failed with an error:

 

[root@flexrac1 /]# /u01/grid/12.1.0/root.sh
Performing root user operation for Oracle 12c

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/grid/12.1.0

Enter the full pathname of the local bin directory: [/usr/local/bin]:
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/grid/12.1.0/crs/install/crsconfig_params
2015/03/27 01:36:43 CLSRSC-363: User ignored prerequisites during installation

OLR initialization - successful
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
2015/03/27 01:37:40 CLSRSC-330: Adding Clusterware entries to file '/etc/inittab'

CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'flexrac1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'flexrac1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'flexrac1'
CRS-2676: Start of 'ora.mdnsd' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.evmd' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'flexrac1'
CRS-2676: Start of 'ora.gpnpd' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'flexrac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'flexrac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'flexrac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'flexrac1'
CRS-2676: Start of 'ora.diskmon' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'flexrac1' succeeded

ASM created and started successfully.

Disk Group GRID created successfully.

CRS-2672: Attempting to start 'ora.crf' on 'flexrac1'
CRS-2672: Attempting to start 'ora.storage' on 'flexrac1'
CRS-2676: Start of 'ora.storage' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.crf' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'flexrac1'
CRS-2676: Start of 'ora.crsd' on 'flexrac1' succeeded
CRS-4256: Updating the profile
Successful addition of voting disk b924385202a04f41bfa64e62712717e8.
Successfully replaced voting disk group with +GRID.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE b924385202a04f41bfa64e62712717e8 (/dev/oracleasm/disks/DATA1) [GRID]
Located 1 voting disk(s).
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'flexrac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'flexrac1'
CRS-2677: Stop of 'ora.crsd' on 'flexrac1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'flexrac1'
CRS-2673: Attempting to stop 'ora.crf' on 'flexrac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'flexrac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'flexrac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'flexrac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'flexrac1'
CRS-2677: Stop of 'ora.storage' on 'flexrac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'flexrac1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'flexrac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'flexrac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'flexrac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'flexrac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'flexrac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'flexrac1' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'flexrac1'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'flexrac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'flexrac1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'flexrac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'flexrac1'
CRS-2677: Stop of 'ora.cssd' on 'flexrac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'flexrac1'
CRS-2677: Stop of 'ora.gipcd' on 'flexrac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'flexrac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

The script hung at this stage, and after some time it completed with a "failed" status.

On further investigation, the following error was found in the installation log file:

crsd(23378)]CRS-2772:Server 'flexrac1' has been assigned to pool 'Free'.
2015-03-28 22:49:22.488:
[gnsd(23641)]CRS-10001:CLSGN-0121: Trace level set to 1.
2015-03-28 22:49:24.002:
[gnsd(23641)]CRS-10001:CLSGN-0125: GNSD started on node flexrac1.
2015-03-28 22:50:17.736:
[/u01/grid/12.1.0/bin/orarootagent.bin(23516)]CRS-5017:The resource action "ora.flexrac1.vip start" encountered the following error:
CRS-5005: IP Address: 192.168.1.13 is already in use in the network
. For details refer to "(:CLSN00107:)" in "/u01/grid/12.1.0/log/flexrac1/agent/crsd/orarootagent_root/orarootagent_root.log".
2015-03-28 22:50:19.785:
[crsd(23378)]CRS-2807:Resource 'ora.flexrac1.vip' failed to start automatically.
2015-03-28 22:51:36.498:
[/u01/grid/12.1.0/bin/orarootagent.bin(23516)]CRS-5017:The resource action "ora.flexrac1.vip start" encountered the following error:
CRS-5005: IP Address: 192.168.1.13 is already in use in the network
. For details refer to "(:CLSN00107:)" in "/u01/grid/12.1.0/log/flexrac1/agent/crsd/orarootagent_root/orarootagent_root.log".
2015-03-28 22:59:26.674:
[gnsd(23641)]CRS-10001:CLSGN-0000: no error

CLSGN-00178: Resolution of name "GNSTESTHOST.flex-cluster.oralabs.com" failed.
2015-03-28 22:59:26.676:
[gnsd(23641)]CRS-10001:CLSGN-0000: no error

CLSGN-00178: Resolution of name "GNSTESTHOST.flex-cluster.oralabs.com" failed.
2015-03-28 22:59:26.678:
[gnsd(23641)]CRS-10001:CLSGN-0000: no error

CLSGN-00178: Resolution of name "GNSTESTHOST.flex-cluster.oralabs.com" failed.
2015-03-28 22:59:26.678:
[gnsd(23641)]CRS-10001:CLSGN-0000: no error

CLSGN-00178: Resolution of name "GNSTESTHOST.flex-cluster.oralabs.com" failed.
2015-03-28 22:59:26.680:
[gnsd(23641)]CRS-10001:(:CLSGN00002:)CLSGN-0201: first self-check name resolution failed.

After analyzing this piece of the log, we tried to reach IP address "192.168.1.13", and it was reachable.

root@flexrac1 /]# ping 192.168.1.13
PING 192.168.1.13 (192.168.1.13) 56(84) bytes of data.
64 bytes from 192.168.1.13: icmp_seq=1 ttl=64 time=0.028 ms
64 bytes from 192.168.1.13: icmp_seq=2 ttl=64 time=0.018 ms
64 bytes from 192.168.1.13: icmp_seq=3 ttl=64 time=0.026 ms
64 bytes from 192.168.1.13: icmp_seq=4 ttl=64 time=0.024 ms
--- 192.168.1.13 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.018/0.024/0.028/0.003 ms
[root@flexrac1 /]#
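Rather than probing a single address, the whole DHCP lease range can be swept to see which addresses already answer before (re)running root.sh. A sketch; the 192.168.1.x addressing is this environment's, and the `ping -c1 -W1` flags assume Linux:

```shell
# Print the addresses of a lease range within a /24 subnet; pipe through
# ping to find addresses that are already in use.
lease_ips() {
  subnet=$1; first=$2; last=$3
  for i in $(seq "$first" "$last"); do
    echo "$subnet.$i"
  done
}

# Real use:
#   lease_ips 192.168.1 13 25 | while read ip; do
#     ping -c1 -W1 "$ip" >/dev/null 2>&1 && echo "$ip is already in use"
#   done

# Demo: the first three addresses of this cluster's lease.
lease_ips 192.168.1 13 15
```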

The DHCP lease defined in the scope of this cluster runs from 192.168.1.13 to 192.168.1.25. By default, GNS tries to assign 192.168.1.13 first, as it is the first IP address of the DHCP lease.

The root.sh script is failing because it is unable to allocate this first IP address from the DHCP lease. This is strange behavior: if an IP is not free, GNS should simply assign the next available IP address from the lease.

The Oracle Grid Infrastructure GNS configuration was trying to allocate the VIP address "192.168.1.13" to the virtual host flexrac1-vip, but the target IP address was not free to be allocated. All cluster nodes were configured with two active Ethernet cards, one for the public network and another for the private network. But when we checked the server, there were three active Ethernet cards:

  • "eth0" was configured for the public network
  • "eth1" was configured for the private network
  • "eth2" was supposed to be inactive


But when we checked, "eth2" was active and configured to obtain an IP address via DHCP. For this reason, the VIP "192.168.1.13" that was supposed to be assigned to the virtual interface was instead assigned to this physical interface, and hence Grid Infrastructure was unable to allocate the required virtual IP address to flexrac1.

After identifying this, we disabled interface "eth2" and executed the root.sh script again.

Execution of "root.sh" script after fixing the problem:

PRKO-2188 : All the node applications already exist. They were not recreated.
PRKF-1107 : GNS server already configured
PRKZ-1072 : SCAN name "flexrac-cluster-scan.flex-cluster.oralabs.com" is already registered on network 1
PRCS-1028 : Single Client Access Name (SCAN) listener resources already exist on network 1
PRCN-3004 : Listener LISTENER_LEAF already exists
PRCA-1095 : Unable to create ASM resource because it already exists.
PRCN-3004 : Listener ASMNET1LSNR_ASM already exists
CRS-5702: Resource 'ora.GRID.dg' is already running on 'flexrac1'
PRCR-1086 : resource ora.cvu is already registered
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'flexrac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'flexrac1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'flexrac1'
CRS-2673: Attempting to stop 'ora.GRID.dg' on 'flexrac1'
...........
...........
...........
CRS-2672: Attempting to start 'ora.scan3.vip' on 'flexrac1'
CRS-2672: Attempting to start 'ora.scan2.vip' on 'flexrac1'
CRS-2672: Attempting to start 'ora.scan1.vip' on 'flexrac1'
CRS-2672: Attempting to start 'ora.flexrac1.vip' on 'flexrac1'
CRS-2676: Start of 'ora.scan3.vip' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN3.lsnr' on 'flexrac1'
CRS-2676: Start of 'ora.scan2.vip' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN2.lsnr' on 'flexrac1'
CRS-2676: Start of 'ora.scan1.vip' on 'flexrac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'flexrac1'
CRS-2676: Start of 'ora.flexrac1.vip' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN3.lsnr' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.oc4j' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN2.lsnr' on 'flexrac1' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'flexrac1' succeeded
CRS-6016: Resource auto-start has completed for server flexrac1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2015/03/28 23:20:24 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded. 

Once we executed the root.sh script on the first node, GNS assigned one virtual IP to the cluster node and 3 SCAN IPs to the SCAN name.

[oracle@flexrac1 ~]$ olsnodes -i
flexrac1 192.168.1.13
[oracle@flexrac1 ~]$

[oracle@flexrac1 ~]$ srvctl config scan
SCAN name: flexrac-cluster-scan.flex-cluster.oralabs.com, Network: 1
Subnet IPv4: 192.168.1.0/255.255.255.0/eth0
Subnet IPv6:
SCAN 0 IPv4 VIP: -/scan1-vip/192.168.1.14
SCAN name: flexrac-cluster-scan.flex-cluster.oralabs.com, Network: 1
Subnet IPv4: 192.168.1.0/255.255.255.0/eth0
Subnet IPv6:
SCAN 1 IPv4 VIP: -/scan2-vip/192.168.1.15
SCAN name: flexrac-cluster-scan.flex-cluster.oralabs.com, Network: 1
Subnet IPv4: 192.168.1.0/255.255.255.0/eth0
Subnet IPv6:
SCAN 2 IPv4 VIP: -/scan3-vip/192.168.1.16
[oracle@flexrac1 ~]$

If we observe the VIP allocation here, "192.168.1.13" was allocated to flexrac1-vip and the subsequent IPs 192.168.1.14/15/16 were allocated to the SCAN.

List of VIPs after completion of the root.sh script:


[oracle@flexrac1 ~]$ olsnodes -i
flexrac1 192.168.1.13
flexrac3 192.168.1.17
flexrac2 192.168.1.18
flexrac4 <none>
flexrac5 <none>
[oracle@flexrac1 ~]$

Recommendation:

We should ensure that the same number of active network interfaces is configured on all participating cluster nodes. If additional interfaces exist on any cluster node, we must make sure they are not configured for DHCP; additional DHCP-configured interfaces will cause the "root.sh" script to fail.
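As a quick pre-check, a small helper along these lines can flag DHCP-configured interfaces before root.sh is run. This is an illustrative sketch, not an Oracle tool: the helper names are mine and the paths assume an Oracle Linux / RHEL-style `/etc/sysconfig/network-scripts` layout.

```shell
# Illustrative pre-check: flag DHCP-configured interfaces from RHEL/OL-style
# ifcfg files. Helper names are mine; paths assume /etc/sysconfig layout.
is_dhcp_ifcfg() {
  # Succeeds if the given ifcfg file sets BOOTPROTO=dhcp (quotes optional).
  grep -qiE '^BOOTPROTO=.?dhcp.?$' "$1"
}

list_dhcp_interfaces() {
  # Print every ifcfg file under the given directory that uses DHCP.
  local dir=${1:-/etc/sysconfig/network-scripts} f
  for f in "$dir"/ifcfg-*; do
    [ -e "$f" ] || continue
    is_dhcp_ifcfg "$f" && echo "$f"
  done
  return 0
}
```

Running `list_dhcp_interfaces` on each node before root.sh should print nothing; any output is an interface to reconfigure with a static address (or BOOTPROTO=none) first.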

Issue-2 :

The addition of nodes failed with the following error:

[oragrid@flexnode1 addnode]$ ./addnode.sh  -silent "CLUSTER_NEW_NODES={flexnode6,flexnode7,flexnode8}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={flexnode6-vip,flexnode7-vip}" "CLUSTER_NEW_NODE_ROLES={hub,hub,leaf}"

Starting Oracle Universal Installer...

 

Checking Temp space: must be greater than 120 MB.   Actual 4575 MB    Passed

Checking swap space: must be greater than 150 MB.   Actual 5210 MB    Passed

[FATAL] PRVG-11408 : API called with unequal sized arrays for nodes, VIPs and node roles

[oragrid@flexnode1 addnode]$

The silent execution of the node addition fails with "[FATAL] PRVG-11408", whereas the same operation in GUI mode reports "INS-08107".

There is no clear information available for this error code on Oracle support, but the command-line message is somewhat informative: "API called with unequal sized arrays for nodes, VIPs and node roles".

After hitting this issue we tried adding only one hub node and one leaf node to the existing cluster configuration, and the node addition completed without any issue.

 

[oragrid@flexnode1 addnode]$ ./addnode.sh   "CLUSTER_NEW_NODES={flexnode6,flexnode8}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={flexnode6-vip}" "CLUSTER_NEW_NODE_ROLES={hub,leaf}"

Starting Oracle Universal Installer...

 

Checking Temp space: must be greater than 120 MB.   Actual 4575 MB    Passed

Checking swap space: must be greater than 150 MB.   Actual 5210 MB    Passed

Checking monitor: must be configured to display at least 256 colors.    Actual 16777216    Passed

This flex cluster behavior is not clear to me, as the initial installation also did not consist of equal numbers of Hub and Leaf nodes; the initial install had 3 Hub nodes and 2 Leaf nodes.

An unequal number of Hub and Leaf nodes in the addnode command should not be a problem. I am currently working with Oracle support on this issue, and once I have a satisfactory answer from MOS I will update this article.

Recommendation:  

At this stage I would recommend adding equal numbers of Hub and Leaf nodes when scaling up an existing flex cluster environment.
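Since the exact VIP-array rule is still open with Oracle support, the safest pre-flight is simply to surface the size of each array argument before launching addnode.sh. A small illustrative helper (the function names are mine, not part of any Oracle tool):

```shell
# Illustrative pre-flight for addnode.sh: report the size of each array
# argument and fail fast when the node and role counts differ.
count_entries() {
  # "{a,b,c}" -> 3
  echo "$1" | tr -d '{}' | awk -F',' '{print NF}'
}

summarize_addnode_args() {
  local nodes vips roles hubs
  nodes=$(count_entries "$1")
  vips=$(count_entries "$2")
  roles=$(count_entries "$3")
  hubs=$(echo "$3" | tr -d '{}' | tr ',' '\n' | grep -c '^hub$')
  echo "nodes=$nodes vips=$vips roles=$roles hub_roles=$hubs"
  [ "$nodes" -eq "$roles" ]   # every node needs exactly one role entry
}
```

For the failing invocation above this prints `nodes=3 vips=2 roles=3 hub_roles=2`. Whether the VIP array must also match the node count (with placeholder entries for leaf nodes) is exactly the question that remains open with support, so the helper only reports the counts rather than enforcing a VIP rule.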

Conclusion:

In this article we looked at two major issues encountered during deployment of a flex cluster. The errors listed here are not documented on My Oracle Support. Flex cluster installation and configuration is simple, but the troubleshooting part is somewhat difficult because not many customers are really using the flex cluster option yet. I hope this article will help individuals who are planning to deploy an Oracle 12c flex cluster.


Upgrading Oracle 12c Flex Cluster - PART I


Introduction

In the software industry there are frequent version changes to the software running in production; to fix bugs and pick up new enhancements we need to keep upgrading the software stack. A software version upgrade is not a daily operational activity, so it requires additional effort and planning for a successful rollout.

In this article we will see how to upgrade the Oracle 12c Grid Infrastructure software from 12.1.0.1 to 12.1.0.2. The existing cluster is configured with 5 nodes (3 Hub nodes and 2 Leaf nodes). The article covers the step-by-step method for upgrading 12.1.0.1 Grid Infrastructure, and it also covers a special scenario that can occur during an upgrade: suppose you are upgrading a 10-node cluster and, while the upgrade is running, one or more nodes suddenly become unreachable due to hardware failure or some other reason. What should we do as DBAs in such situations? That grey area is also covered in detail in this article.

Cluster nodes details :


The following are the cluster nodes that will be upgraded from 12.1.0.1 to 12.1.0.2

S.No | Node Name | Current Version | Target Version | Node Mode
-----|-----------|-----------------|----------------|------------------
1    | flexrac1  | 12.1.0.1        | 12.1.0.2       | Cluster Hub-Node
2    | flexrac2  | 12.1.0.1        | 12.1.0.2       | Cluster Hub-Node
3    | flexrac3  | 12.1.0.1        | 12.1.0.2       | Cluster Hub-Node
4    | flexrac4  | 12.1.0.1        | 12.1.0.2       | Cluster Leaf-Node
5    | flexrac5  | 12.1.0.1        | 12.1.0.2       | Cluster Leaf-Node

 

Verify the existing Grid Infrastructure version:

Before we begin the upgrade process, let's check the existing version of the Grid Infrastructure:

[oracle@flexrac2 ~]$ . oraenv
ORACLE_SID = [+ASM2] ? +ASM3
The Oracle base has been set to /u01/oracle
[oracle@flexrac2 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [12.1.0.1.0]
[oracle@flexrac2 ~]$
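When scripting the upgrade it is handy to extract the bare version string from that output so it can be compared against the target release. A minimal sketch (the helper name is mine):

```shell
# Sketch: extract the bare version from the one-line summary that
# "crsctl query crs activeversion" prints, i.e. the text inside [...].
active_version() {
  sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Intended use on a cluster node:
#   crsctl query crs activeversion | active_version
```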

Create the new directory for 12.1.0.2 grid software:

The new directory should be created across all cluster nodes, with ownership and group assigned to the Grid Infrastructure owner (oracle:dba in this environment).

[root@flexrac2 ~]# mkdir -p /u01/grid/grid_1212
[root@flexrac2 ~]# chown -R oracle:dba /u01/grid/grid_1212

Oracle 12.1.0.2 Grid Infrastructure will be Installed in this directory.
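Creating the directory node by node can be scripted. The sketch below only echoes the per-node commands (a dry run) using the node names and paths from this article; remove the leading `echo` inside the function to actually execute them over ssh.

```shell
# Dry-run sketch: stage the new Grid home on every cluster node over ssh.
# The function only echoes each command; drop "echo" to execute for real.
nodes="flexrac1 flexrac2 flexrac3 flexrac4 flexrac5"
new_home=/u01/grid/grid_1212

stage_home() {
  local n
  for n in $nodes; do
    echo ssh "root@$n" "mkdir -p $new_home && chown -R oracle:dba $new_home"
  done
}

stage_home
```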

Installation of 12.1.0.2 grid infrastructure:

We can install the Grid Infrastructure software on all cluster nodes while all services are up and running; the services are not impacted until "rootupgrade.sh" is executed. The actual upgrade happens only when the "rootupgrade.sh" script is run.

The installation of the 12.1.0.2 binaries can be performed from any one active cluster node.

- Choose the option to upgrade Grid Infrastructure or ASM.

- Make sure all participating cluster nodes are selected.

- Ensure SSH connectivity is working across all cluster nodes.

- If OEM Cloud Control is configured, this home can be registered with the OMS.

- Select the appropriate OS groups for the different roles.

- Provide the locations of the Oracle Base and the new GI home.

- Skip this option if you want to execute the root scripts manually.

- Here the installer validates the prerequisites on the cluster nodes.

- The GI installer can fix certain items itself; others must be fixed manually. In the above example the kernel parameters and the avahi daemon can be fixed by the installer, whereas the nfs rpm package must be fixed manually.

- Execute the "fixup" script on all cluster nodes:

[root@flexrac1 ~]# /tmp/CVU_12.1.0.2.0_oracle/runfixup.sh
All Fix-up operations were completed successfully.
[root@flexrac1 ~]#

It should similarly be executed on all other cluster nodes.

- GI 12.1.0.2 is now ready to install on all cluster nodes.

- Installation in progress.

- The rootupgrade.sh script is now ready to perform the upgrade.

Execution of rootupgrade.sh - Upgrade of Grid Infrastructure

The installer text above states that we must execute the script on the local node first; it can then be executed on the other cluster nodes in parallel. It also states that the upgrade should be completed on the Hub nodes first before proceeding to the Leaf nodes.
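That ordering rule can be sketched as a simple driver. The loop below only echoes the per-node commands (a dry run) with the node names from this article; remove the leading `echo` to actually run the script on each node.

```shell
# Dry-run sketch of the required ordering: the local node first, then the
# remaining Hub nodes, then the Leaf nodes. Drop "echo" to execute for real.
hub_nodes="flexrac1 flexrac2 flexrac3"
leaf_nodes="flexrac4 flexrac5"

upgrade_order() {
  local n
  for n in $hub_nodes $leaf_nodes; do
    echo ssh "root@$n" /u01/grid/grid_1212/rootupgrade.sh
  done
}

upgrade_order
```

Note that a plain loop like this runs the nodes serially; per the installer text, at minimum the first node must finish before the remaining nodes are launched in parallel.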

- Execution of script on flexrac1 (node1)

[root@flexrac1 ~]# /u01/grid/grid_1212/rootupgrade.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/grid/grid_1212

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/grid/grid_1212/crs/install/crsconfig_params
2016/09/17 19:39:44 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.

2016/09/17 19:40:19 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.

2016/09/17 19:40:26 CLSRSC-464: Starting retrieval of the cluster configuration data

2016/09/17 19:40:39 CLSRSC-465: Retrieval of the cluster configuration data has successfully completed.

2016/09/17 19:40:39 CLSRSC-363: User ignored prerequisites during installation

2016/09/17 19:40:59 CLSRSC-515: Starting OCR manual backup.

2016/09/17 19:41:04 CLSRSC-516: OCR manual backup successful.

2016/09/17 19:41:10 CLSRSC-468: Setting Oracle Clusterware and ASM to rolling migration mode

2016/09/17 19:41:10 CLSRSC-482: Running command: '/u01/grid/12.1.0/bin/crsctl start rollingupgrade 12.1.0.2.0'

CRS-1131: The cluster was successfully set to rolling upgrade mode.
2016/09/17 19:41:25 CLSRSC-482: Running command: '/u01/grid/grid_1212/bin/asmca -silent -upgradeNodeASM -nonRolling false -oldCRSHome /u01/grid/12.1.0 -oldCRSVersion 12.1.0.1.0 -nodeNumber 1 -firstNode true -startRolling false'


ASM configuration upgraded in local node successfully.

2016/09/17 19:41:33 CLSRSC-469: Successfully set Oracle Clusterware and ASM to rolling migration mode

2016/09/17 19:41:33 CLSRSC-466: Starting shutdown of the current Oracle Grid Infrastructure stack

2016/09/17 19:42:39 CLSRSC-467: Shutdown of the current Oracle Grid Infrastructure stack has successfully completed.

OLR initialization - successful
2016/09/17 19:46:14 CLSRSC-329: Replacing Clusterware entries in file '/etc/inittab'

CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2016/09/17 19:49:48 CLSRSC-472: Attempting to export the OCR

2016/09/17 19:49:49 CLSRSC-482: Running command: 'ocrconfig -upgrade oracle dba'

2016/09/17 19:51:02 CLSRSC-473: Successfully exported the OCR

2016/09/17 19:51:10 CLSRSC-486:
At this stage of upgrade, the OCR has changed.
Any attempt to downgrade the cluster after this point will require a complete cluster outage to restore the OCR.

2016/09/17 19:51:10 CLSRSC-541:
To downgrade the cluster:
1. All nodes that have been upgraded must be downgraded.

2016/09/17 19:51:10 CLSRSC-542:
2. Before downgrading the last node, the Grid Infrastructure stack on all other cluster nodes must be down.

2016/09/17 19:51:10 CLSRSC-543:
3. The downgrade command must be run on the node flexrac3 with the '-lastnode' option to restore global configuration data.

2016/09/17 19:51:36 CLSRSC-343: Successfully started Oracle Clusterware stack

clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully taken the backup of node specific configuration in OCR.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
2016/09/17 19:52:03 CLSRSC-474: Initiating upgrade of resource types

2016/09/17 19:52:14 CLSRSC-482: Running command: 'upgrade model -s 12.1.0.1.0 -d 12.1.0.2.0 -p first'

2016/09/17 19:52:14 CLSRSC-475: Upgrade of resource types successfully initiated.

2016/09/17 19:52:20 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

[root@flexrac1 ~]#

Execution of rootupgrade.sh on  flexrac2 (node2):

[root@flexrac2 ~]# /u01/grid/grid_1212/rootupgrade.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/grid/grid_1212

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying oraenv to /usr/local/bin ...
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/grid/grid_1212/crs/install/crsconfig_params
2016/09/17 20:02:04 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.

2016/09/17 20:02:39 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.

2016/09/17 20:02:43 CLSRSC-464: Starting retrieval of the cluster configuration data

2016/09/17 20:02:55 CLSRSC-465: Retrieval of the cluster configuration data has successfully completed.

2016/09/17 20:02:55 CLSRSC-363: User ignored prerequisites during installation


ASM configuration upgraded in local node successfully.

2016/09/17 20:03:13 CLSRSC-466: Starting shutdown of the current Oracle Grid Infrastructure stack

2016/09/17 20:04:57 CLSRSC-467: Shutdown of the current Oracle Grid Infrastructure stack has successfully completed.

OLR initialization - successful
2016/09/17 20:05:24 CLSRSC-329: Replacing Clusterware entries in file '/etc/inittab'

CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2016/09/17 20:07:35 CLSRSC-343: Successfully started Oracle Clusterware stack

clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully taken the backup of node specific configuration in OCR.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
2016/09/17 20:07:45 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

[root@flexrac2 ~]#

Node "flexrac3" is not reachable

At this point the "rootupgrade.sh" script should be executed on the 3rd node, which is a Hub node, but the node is not reachable due to hardware failure.

[root@flexrac2 ~]# ping flexrac3
PING flexrac3.oralabs.com (192.168.1.83) 56(84) bytes of data.
From flexrac2.oralabs.com (192.168.1.82) icmp_seq=9 Destination Host Unreachable
From flexrac2.oralabs.com (192.168.1.82) icmp_seq=10 Destination Host Unreachable
From flexrac2.oralabs.com (192.168.1.82) icmp_seq=11 Destination Host Unreachable
From flexrac2.oralabs.com (192.168.1.82) icmp_seq=14 Destination Host Unreachable
^C
--- flexrac3.oralabs.com ping statistics ---
15 packets transmitted, 0 received, +4 errors, 100% packet loss, time 14011ms
, pipe 3
[root@flexrac2 ~]#

The other two nodes, flexrac4 and flexrac5, which are Leaf nodes, are still available and the upgrade script is pending for them, so we must proceed with executing the script on the remaining cluster nodes.
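Before continuing, it is worth confirming which nodes the clusterware itself considers down. `olsnodes -s` prints each node with its Active/Inactive state, and a small filter (illustrative, not an Oracle tool) reduces that to the inactive list:

```shell
# Sketch: list only the nodes that "olsnodes -s" reports as Inactive.
# Pipe the real command into the filter:
#   olsnodes -s | inactive_nodes
inactive_nodes() {
  awk '$2 == "Inactive" {print $1}'
}
```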

Execution of rootupgrade.sh on  flexrac4 (node4 - Leaf Node): 

[root@flexrac4 bin]# /u01/grid/grid_1212/rootupgrade.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/grid/grid_1212

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/grid/grid_1212/crs/install/crsconfig_params
2016/09/17 20:10:33 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.

2016/09/17 20:11:08 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.

2016/09/17 20:11:12 CLSRSC-464: Starting retrieval of the cluster configuration data

2016/09/17 20:11:25 CLSRSC-465: Retrieval of the cluster configuration data has successfully completed.

2016/09/17 20:11:26 CLSRSC-363: User ignored prerequisites during installation

OLR initialization - successful
2016/09/17 20:12:43 CLSRSC-329: Replacing Clusterware entries in file '/etc/inittab'

CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2016/09/17 20:14:27 CLSRSC-343: Successfully started Oracle Clusterware stack

Successfully taken the backup of node specific configuration in OCR.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
2016/09/17 20:14:37 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

[root@flexrac4 bin]#

Execution of rootupgrade.sh on  flexrac5 (node5 - Leaf Node): 

[root@flexrac5 ~]# /u01/grid/grid_1212/rootupgrade.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/grid/grid_1212

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/grid/grid_1212/crs/install/crsconfig_params
2016/09/17 20:16:31 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.

2016/09/17 20:17:14 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.

2016/09/17 20:17:18 CLSRSC-464: Starting retrieval of the cluster configuration data

2016/09/17 20:17:30 CLSRSC-465: Retrieval of the cluster configuration data has successfully completed.

2016/09/17 20:17:30 CLSRSC-363: User ignored prerequisites during installation

2016/09/17 20:17:40 CLSRSC-466: Starting shutdown of the current Oracle Grid Infrastructure stack

2016/09/17 20:18:21 CLSRSC-467: Shutdown of the current Oracle Grid Infrastructure stack has successfully completed.

OLR initialization - successful
2016/09/17 20:19:09 CLSRSC-329: Replacing Clusterware entries in file '/etc/inittab'

CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2016/09/17 20:20:51 CLSRSC-343: Successfully started Oracle Clusterware stack

Successfully taken the backup of node specific configuration in OCR.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
2016/09/17 20:20:59 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

[root@flexrac5 ~]#

Now the cluster upgrade script "rootupgrade.sh" has completed successfully on flexrac1, flexrac2, flexrac4 and flexrac5. The script has not been executed on the remaining node, flexrac3, because that node is unavailable.

Let's see what we have to do in such situations.

Query the CRS version from any of the upgraded cluster nodes:

We can check the version from any one of the upgraded cluster nodes.

[oracle@flexrac2 grid_1212]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [12.1.0.1.0]
[oracle@flexrac2 grid_1212]$

Note that the active version is still 12.1.0.1.

If we cannot get that node back, we must still finish the upgrade. To finish it, we have to force the clusterware to complete the upgrade, skipping that node.

We execute the "rootupgrade.sh" script again, on any of the upgraded nodes, with the "-force" option; this updates the GI version in the registry and finishes the upgrade on the currently available cluster nodes.

[root@flexrac2 ~]# /u01/grid/grid_1212/rootupgrade.sh -force
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/grid/grid_1212

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/grid/grid_1212/crs/install/crsconfig_params
2016/09/17 20:28:42 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.

2016/09/17 20:28:42 CLSRSC-4012: Shutting down Oracle Trace File Analyzer (TFA) Collector.

2016/09/17 20:29:15 CLSRSC-4013: Successfully shut down Oracle Trace File Analyzer (TFA) Collector.

2016/09/17 20:29:26 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.

2016/09/17 20:29:29 CLSRSC-464: Starting retrieval of the cluster configuration data

2016/09/17 20:29:44 CLSRSC-465: Retrieval of the cluster configuration data has successfully completed.

2016/09/17 20:29:44 CLSRSC-363: User ignored prerequisites during installation

2016/09/17 20:30:01 CLSRSC-343: Successfully started Oracle Clusterware stack

clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully taken the backup of node specific configuration in OCR.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
2016/09/17 20:30:11 CLSRSC-478: Setting Oracle Clusterware active version on the last node to be upgraded

2016/09/17 20:30:11 CLSRSC-482: Running command: '/u01/grid/grid_1212/bin/crsctl set crs activeversion -force'

Attempting to forcibly upgrade the Oracle Clusterware using only the nodes flexrac1, flexrac2, flexrac4, flexrac5.
Started to upgrade the Oracle Clusterware. This operation may take a few minutes.
Started to upgrade the CSS.
The CSS was successfully upgraded.
Started to upgrade Oracle ASM.
Started to upgrade the CRS.
The CRS was successfully upgraded.
Forcibly upgraded the Oracle Clusterware.
Oracle Clusterware operating version was forcibly set to 12.1.0.2.0
CRS-1121: Oracle Clusterware was forcibly upgraded without upgrading nodes flexrac3.
2016/09/17 20:31:18 CLSRSC-479: Successfully set Oracle Clusterware active version

2016/09/17 20:31:28 CLSRSC-476: Finishing upgrade of resource types

2016/09/17 20:31:36 CLSRSC-482: Running command: 'upgrade model -s 12.1.0.1.0 -d 12.1.0.2.0 -p last'

2016/09/17 20:31:36 CLSRSC-477: Successfully completed upgrade of resource types

PRCN-3004 : Listener MGMTLSNR already exists
2016/09/17 20:32:17 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

[root@flexrac2 ~]#

Now check the Grid Infrastructure version again:

[root@flexrac5 bin]# ./crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [12.1.0.2.0]
[root@flexrac5 bin]#

Now we are ready to click "OK" on the "rootupgrade.sh" screen of the installer, and it will proceed further.

- The installer is not able to update the inventory on flexrac3 because the node is not reachable, which is expected.

- Post-upgrade steps in progress.

- This error can be ignored; it is due to the unavailability of the cluster node.

- Click "Yes" to proceed.

The upgrade process is now completed.

Verify the services of cluster nodes:

[oracle@flexrac2 grid_1212]$ crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------------------------------------------
ora....SM.lsnr ora....er.type 0/5 0/ ONLINE ONLINE flexrac1
ora.GRID.dg ora....up.type 0/5 0/ ONLINE ONLINE flexrac1
ora....ER.lsnr ora....er.type 0/5 0/ ONLINE ONLINE flexrac1
ora....AF.lsnr ora....er.type 0/5 0/ OFFLINE OFFLINE
ora....N1.lsnr ora....er.type 0/5 0/0 ONLINE ONLINE flexrac2
ora....N2.lsnr ora....er.type 0/5 0/0 ONLINE ONLINE flexrac1
ora....N3.lsnr ora....er.type 0/5 0/0 ONLINE ONLINE flexrac1
ora.MGMTLSNR ora....nr.type 0/0 0/0 ONLINE ONLINE flexrac1
ora.asm ora.asm.type 0/5 0/0 ONLINE ONLINE flexrac1
ora.cvu ora.cvu.type 0/5 0/0 ONLINE ONLINE flexrac2
ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE flexrac1
ora....ac1.ons application 0/3 0/0 ONLINE ONLINE flexrac1
ora....ac1.vip ora....t1.type 0/0 0/0 ONLINE ONLINE flexrac1
ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE flexrac2
ora....ac2.ons application 0/3 0/0 ONLINE ONLINE flexrac2
ora....ac2.vip ora....t1.type 0/0 0/0 ONLINE ONLINE flexrac2
ora....ac3.vip ora....t1.type 0/0 0/0 ONLINE ONLINE flexrac2
ora.gns ora.gns.type 0/5 0/0 ONLINE ONLINE flexrac2
ora.gns.vip ora....ip.type 0/0 0/0 ONLINE ONLINE flexrac2
ora.mgmtdb ora....db.type 0/2 0/1 ONLINE ONLINE flexrac1
ora....network ora....rk.type 0/5 0/ ONLINE ONLINE flexrac1
ora.oc4j ora.oc4j.type 0/1 0/2 ONLINE ONLINE flexrac1
ora.ons ora.ons.type 0/3 0/ ONLINE ONLINE flexrac1
ora.scan1.vip ora....ip.type 0/0 0/0 ONLINE ONLINE flexrac2
ora.scan2.vip ora....ip.type 0/0 0/0 ONLINE ONLINE flexrac1
ora.scan3.vip ora....ip.type 0/0 0/0 ONLINE ONLINE flexrac1
[oracle@flexrac2 grid_1212]$

The failed node can be joined to the upgraded cluster once it becomes available again. We will explore the options for joining that node to the cluster in the next part of this article.

Conclusion:

The above upgrade scenario was recreated after we faced a similar issue during one of our PRODUCTION cluster upgrades. That upgrade was on a 12-node cluster with 6 Hub nodes and 6 Leaf nodes. During execution of the upgrade script, 2 of the Hub nodes became unreachable due to hardware failure and we had to proceed with the upgrade. To conclude: "rootupgrade.sh -force" completed the upgrade successfully even though one or more cluster nodes were not reachable. 12c Grid Infrastructure also introduces an option to join a failed cluster node to an already upgraded cluster; we will explore that option in the next part of this article.

- We can execute the "rootupgrade.sh" script on the remaining nodes in parallel once it has completed on the first node, from which the upgrade was initiated.

- We should complete "rootupgrade.sh" on the Hub nodes first, and only then execute it on the Leaf nodes.

- Before starting the upgrade, ensure the cluster, database, and operating system logfiles are clean and no major errors are recorded.

Creating a Local repository in Oracle Linux


If a Linux repository is not configured on the system, installing rpms is a difficult task: we need to spend a lot of time resolving dependencies between rpms. Using a yum repository overcomes this problem and simplifies rpm installation. By default, however, yum is configured to use the Oracle public repository, which requires the server to have an internet connection. In most environments servers are not connected to the internet; in such situations we can configure a yum repository locally on the file system using the packages from the installation media.

In this post we will see how quickly we can configure a repository on the local file system using the Linux installation media.

– Create Directory

[root@racnode2]#mkdir -p /OEL/repo

Copy the “Packages” directory from the media to the “/OEL/repo” directory.
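If the media is an ISO image rather than a mounted DVD, the copy can look like the dry-run sketch below. The ISO path and the /mnt/oel mount point are assumptions for illustration; remove the leading `echo` from each command to execute.

```shell
# Dry-run sketch: mount the install media read-only and copy its Packages
# directory into the local repository tree. Paths are illustrative.
iso=/stage/OracleLinux-R6-U8-Server-x86_64-dvd.iso   # hypothetical ISO path
repo=/OEL/repo

stage_packages() {
  echo "mount -o loop,ro $iso /mnt/oel"
  echo "cp -v /mnt/oel/Packages/*.rpm $repo/"
  echo "umount /mnt/oel"
}

stage_packages
```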

– Verify that the createrepo package exists on the system:

If this rpm doesn't exist, we should install it.

[root@racnode2 repo]# rpm -ivh createrepo-0.9.9-24.el6.noarch.rpm
warning: createrepo-0.9.9-24.el6.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
error: Failed dependencies:
python-deltarpm is needed by createrepo-0.9.9-24.el6.noarch
[root@racnode2 repo]# rpm -ivh python-deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm
warning: python-deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
error: Failed dependencies:
deltarpm = 3.5-0.5.20090913git.el6 is needed by python-deltarpm-3.5-0.5.20090913git.el6.x86_64
[root@racnode2 repo]#

– If you encounter dependency errors while installing this rpm, follow the sequence of rpm installations below (deltarpm, then python-deltarpm, then createrepo):

[root@racnode2 repo]# rpm -ivh deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm
warning: deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
Preparing... ########################################### [100%]
1:deltarpm ########################################### [100%]
[root@racnode2 repo]# rpm -ivh python-deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm
warning: python-deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
Preparing... ########################################### [100%]
1:python-deltarpm ########################################### [100%]
[root@racnode2 repo]# rpm -ivh createrepo-0.9.9-24.el6.noarch.rpm
warning: createrepo-0.9.9-24.el6.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
Preparing... ########################################### [100%]
1:createrepo ########################################### [100%]
[root@racnode2 repo]#

– Run the createrepo command to create the repository:

[root@racnode2 repo]# createrepo -v /OEL/repo/
Spawning worker 0 with 3934 pkgs
Worker 0: reading tuned-0.2.19-16.el6.noarch.rpm
Worker 0: reading cyrus-imapd-2.3.16-13.el6_6.x86_64.rpm
Worker 0: reading kde-i18n-Hungarian-3.5.10-11.el6.noarch.rpm
Worker 0: reading samba-3.6.23-33.0.1.el6.x86_64.rpm
Worker 0: reading wpa_supplicant-0.7.3-8.el6.x86_64.rpm
Worker 0: reading libICE-1.0.6-1.el6.i686.rpm
Worker 0: reading sat4j-2.2.0-4.0.el6.noarch.rpm
Worker 0: reading gnome-icon-theme-2.28.0-8.el6.noarch.rpm
Worker 0: reading xmltex-20020625-16.el6.noarch.rpm
Worker 0: reading sblim-cmpi-nfsv4-1.1.0-1.el6.x86_64.rpm
Worker 0: reading python-libs-2.6.6-64.0.1.el6.i686.rpm
Worker 0: reading pcsc-lite-openct-0.6.19-4.el6.x86_64.rpm
Worker 0: reading gzip-1.3.12-22.el6.x86_64.rpm
----

----

Worker 0: reading xorg-x11-fonts-misc-7.2-11.el6.noarch.rpm
Workers Finished
Gathering worker results

Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Starting other db creation: Sun Oct 30 13:35:05 2016
Ending other db creation: Sun Oct 30 13:35:06 2016
Starting filelists db creation: Sun Oct 30 13:35:07 2016
Ending filelists db creation: Sun Oct 30 13:35:11 2016
Starting primary db creation: Sun Oct 30 13:35:11 2016
Ending primary db creation: Sun Oct 30 13:35:13 2016
Sqlite DBs complete
[root@racnode2 repo]#

– Configure the repository file:

[root@racnode2 repo]# vi /etc/yum.repos.d/myrepo.repo
[root@racnode2 repo]# cat /etc/yum.repos.d/myrepo.repo
[Local_Repo]
name=myrepo
baseurl=file:/OEL/repo
enabled=1
gpgcheck=0
[root@racnode2 repo]#

– Check the configured repositories:

[root@racnode2 repo]# yum list
Loaded plugins: refresh-packagekit, security, ulninfo
Local_Repo | 2.9 kB 00:00 ...
Local_Repo/primary_db | 4.1 MB 00:00 ...
http://yum.oracle.com/repo/OracleLinux/OL6/UEKR4/x86_64/repodata/repomd.xml: [Errno 14] PYCURL ERROR 6 - "Couldn't resolve host 'yum.oracle.com'"
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: public_ol6_UEKR4. Please verify its path and try again
[root@racnode2 repo]#

It is still listing the old Oracle public repository, which fails because the host cannot reach the internet. We should remove that repository from the list.

– Move the public repository definition out of the way:

[root@racnode2 yum.repos.d]# ls
myrepo.repo old
[root@racnode2 yum.repos.d]# mv public-yum-ol6.repo old
[root@racnode2 yum.repos.d]# cd old
[root@racnode2 old]# ls
packagekit-media.repo public-yum-ol6.repo
[root@racnode2 old]#

– Check the repositories:

[root@racnode2 yum.repos.d]# yum repolist
Loaded plugins: refresh-packagekit, security, ulninfo
repo id repo name status
Local_Repo myrepo 3,934
repolist: 3,934
[root@racnode2 yum.repos.d]#

Now only the locally configured repository is listed, and we are ready to use it to install the required rpms.
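To guarantee a package comes from this repository even if other repo files reappear later, yum can be pinned to the local repo id. A dry-run sketch (gcc is just an example package; remove the `echo` to execute):

```shell
# Dry-run sketch: install strictly from the Local_Repo id configured above.
# The function only echoes the command; drop "echo" to execute for real.
local_install() {
  echo "yum --disablerepo='*' --enablerepo=Local_Repo install -y $1"
}

local_install gcc
```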
