Finished reading 12c Concepts manual

I decided to read the entire Oracle 12c Concepts manual.  Reading the whole thing is overkill if you just want to learn the new 12c features, but it has been a long time since I read the entire Concepts manual for a current version of Oracle.  I’m not sure I remember correctly, but I think the last version I read all the way through was the 9i version.  I definitely read the Oracle 7 Concepts manual and several other manuals when I was first getting started.

The 12c manual has things in it that seem virtually unchanged from what I remember from earlier versions, but it also has a lot more technical depth than I remember.  It was boring at times reading sections filled with things I’ve worked with for almost 20 years now.  But even these monotonous sections held reminders of things I probably don’t keep at the front of my brain because I don’t use them every day.  Then there were whole sections about features that are either new in 12c or new to me, such as the multitenant features.

I’ve heard many people say that a new DBA (database administrator) should start with the Concepts manual.  Reading it again just confirms this often-stated advice.  Concepts has pointers to all the other manuals where you can find details on the things you found interesting or need for your job.  For example, the sections on partitioned tables reference the VLDB and Partitioning manual.  A new DBA who will be working on partitioned tables might not realize that manual exists, but the Concepts manual will send them there.

I realize that a lot of people don’t like to read manuals, but if you had to force yourself to sit down and read one Oracle manual cover to cover to become a better DBA, it should be Concepts.  It’s the starting point, and after 19 years and counting as an Oracle DBA I still got a lot out of reading it.

– Bobby

Sqlplus script to check for row chaining

Here is the sqlplus script I use to check whether row chaining is degrading a query’s performance:

-- grab the current session's SID so we can read its statistics
VARIABLE monitored_sid number;

begin

SELECT sid into :monitored_sid 
from v$session 
where audsid=USERENV('SESSIONID');

end;
/

-- cumulative count of chained/migrated row pieces this session has
-- had to follow; the statistic name is 'table fetch continued row'
select b.value "table fetch continued rows"
from
V$SESSTAT b
where b.SID = :monitored_sid
and b.STATISTIC# = (SELECT STATISTIC# FROM V$STATNAME 
WHERE NAME='table fetch continued row');

I create a test script with the problem query and put this code after it to see how much chaining affected the query, along the lines of the sketch below.
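For what it’s worth, here is a minimal sketch of how such a test script could be laid out.  The file names myquery.sql and chaincheck.sql are placeholders I’m using here (chaincheck.sql would hold the code above), and since the statistic is cumulative for the session, each test should run in a fresh connection:

-- test.sql - sketch of a chaining test; file names are placeholders
set timing on

-- run the problem query so sqlplus prints its elapsed time
@myquery.sql

-- then report the chained/migrated row pieces fetched (the code above)
@chaincheck.sql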

Here is an example of a fast query:

Elapsed: 00:00:15.87

table fetch continued rows
--------------------------
                     19723

Here is an example of a slow query:

Elapsed: 00:03:17.46

table fetch continued rows
--------------------------
                  23775056

This was the same query for two different date ranges.  The second range had a lot of row chaining and the query was much slower.

– Bobby

Blog second anniversary

Today is the second anniversary of my first post on this blog.

Some time ago I played with a free blog that came with a Yahoo email account my wife and I share, but I didn’t get very far with it.  For this blog I finally decided to spend a small amount of money to get my own hosting and domain name.

I’m using iPage for the hosting.  I’ve been happy with their service.  There were a couple of billing issues, but I opened tickets with them and they were quickly resolved in a fair way.

I evaluated a few blogging platforms and chose WordPress because it let me do what mattered most.  WordPress’s software enabled me to upload sqlplus scripts and their logs along with PowerPoint, Microsoft Word, and PDF files.  WordPress also enabled me to include output from sqlplus with the columns lined up nicely, like this:

SQL> select table_name, blocks
  2  from dba_tables
  3  where owner='SYS'
  4  order by table_name;

TABLE_NAME                         BLOCKS
------------------------------ ----------
ACCESS$                               759
ALERT_QT
APPLY$_CHANGE_HANDLERS                  0
APPLY$_CONF_HDLR_COLUMNS                0
APPLY$_CONSTRAINT_COLUMNS               0
APPLY$_DEST_OBJ                         0
APPLY$_DEST_OBJ_CMAP                    0
APPLY$_DEST_OBJ_OPS                     0
APPLY$_ERROR                            0

I’m pretty happy with the results.  I’ve had some nice interactions with the Oracle community and I’ve shared links to my posts with coworkers on the job.  Plus, I actually search the site myself when I forget how to do something I wrote about.

For example, if I forget where to put the plus sign on an outer join, I search for it on my blog!

Overall it has been an enjoyable experience, and I think a helpful one for me and my employer, as I try to communicate what I’m learning about Oracle database technology, especially in the performance tuning area.

– Bobby

Finished reading multitenant parts of db admin manual

I’m trying to get up to speed on 12c, especially the multitenant features.  So, I decided to read the sections of the manuals related to using multitenant.  Here is the manual I was looking at:

Oracle® Database Administrator’s Guide 12c Release 1 (12.1)

It isn’t great reading.  It is kind of slow-paced and really more of a reference, but since multitenant is a large new feature I wanted to go over the practical details of using it.  Some of the sections had both a command line sqlplus version and a web-based Cloud Control version, so I skipped the Cloud Control version.  I’m more of a command line and v$ view kind of DBA.

Here are the chapter numbers and titles of the sections I read:

36 Overview of Managing a Multitenant Environment
37 Creating and Configuring a CDB
38 Creating and Removing PDBs with SQL*Plus
40 Administering a CDB with SQL*Plus
42 Administering PDBs with SQL*Plus
43 Viewing Information About CDBs and PDBs with SQL*Plus
44 Using Oracle Resource Manager for PDBs with SQL*Plus
46 Using Oracle Scheduler with a CDB

I haven’t really used what I’ve learned yet but I feel like I have a good overview and know where to go to get more information.

I’m also working on reading through the 12c Concepts manual on my Kindle, but I haven’t gotten to the multitenant part.  Once I’m done with that I hope to get serious about studying for the 12c OCP upgrade exam.  It looks like you can sign up for the test now, but I don’t have any practice exam software or a book, and I usually get one of those before I take a test.  I see one book that is supposed to be available in June, so I may have to wait until then.  I have plenty to do between now and then anyway, so I probably wouldn’t be ready to take the test before the book is available; that should work out.

In any case, certification or no, it was helpful to read the db admin guide multitenant sections to get a feel for how it works.  Chapter 44 was probably the most interesting because it talked about using Resource Manager to divide resource usage between the pluggable databases that share the same instance.
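To give a flavor of what chapter 44 covers, here is a minimal sketch of a CDB resource plan.  The plan and PDB names are mine, not from the manual, and this is just my reading of the API, so treat it as a sketch rather than a tested example.  It would be run as a privileged common user in the root:

-- create a CDB plan that gives PDB1 three times the CPU shares of PDB2
-- and caps PDB2 at half the instance's CPU
begin
DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
DBMS_RESOURCE_MANAGER.CREATE_CDB_PLAN(
  plan    => 'daytime_cdb_plan',
  comment => 'divide resources between the PDBs');
DBMS_RESOURCE_MANAGER.CREATE_CDB_PLAN_DIRECTIVE(
  plan               => 'daytime_cdb_plan',
  pluggable_database => 'PDB1',
  shares             => 3,
  utilization_limit  => 100);
DBMS_RESOURCE_MANAGER.CREATE_CDB_PLAN_DIRECTIVE(
  plan               => 'daytime_cdb_plan',
  pluggable_database => 'PDB2',
  shares             => 1,
  utilization_limit  => 50);
DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
end;
/

-- make the plan active for the CDB
alter system set resource_manager_plan = 'daytime_cdb_plan';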

– Bobby

Good blog post on learning Oracle performance

Saw this blog post about how Cary Millsap learned about Oracle performance tuning and I thought it was interesting:

http://carymillsap.blogspot.com/2014/02/how-did-you-learn-so-much-stuff-about.html

It is interesting because I’ve learned so much from Cary Millsap and Jeff Holt’s book, and it is cool to see how Cary got to his position as a well-known Oracle performance expert.  I’ve always found the performance tuning part of my Oracle DBA job the most interesting, so it is intriguing to see someone else who feels the same way about it, how they pursued that interest, and how they made a living doing it.

Cary’s blog post contains good advice for someone who wants to be an Oracle DBA/performance analyst: read the Concepts manual and other relevant manuals, read books by the performance experts, and find opportunities to work on real performance problems.  These are pretty much the steps I’ve taken, although I’ve always worked as an employee of a company instead of being a consultant.  Still, working for a large company has been a good situation for me because we have a variety of systems, and performance is consistently a problem that needs work.

– Bobby

Updates of tables with query high compression slow

In an earlier post about how query high compression eliminated row chaining in tables with more than 255 columns, I got a question: do updates to a table with more than 255 columns run slower with query high compression than against an uncompressed table?

The answer is that, at least in the tests I ran, updates of a table with more than 255 columns took much longer with query high compression than on the uncompressed table.

Here is a zip of my test scripts and their logs: zip

I took a table with 256 columns and populated it with 1,000,000 rows, filling all columns with data.  Then I updated the first column, the last column, and all columns, in three separate transactions.  A sketch of how such a test can be generated follows.
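Nobody wants to type 256 column names, so here is a rough sketch of how a test like this can be generated.  This is my reconstruction, not the exact scripts in the zip, and note that compress for query high requires HCC-capable storage such as Exadata:

-- build a 256 column table twice, uncompressed and query high compressed,
-- and load 1,000,000 fully populated rows into each
declare
cols clob;
vals clob;
begin
for i in 1 .. 256 loop
cols := cols || 'c' || i || ' number' || case when i < 256 then ',' end;
vals := vals || 'mod(level+' || i || ',1000)' || case when i < 256 then ',' end;
end loop;
execute immediate 'create table test_unc (' || cols || ')';
execute immediate 'create table test_qh (' || cols || ') compress for query high';
execute immediate 'insert /*+ append */ into test_unc select ' || vals || ' from dual connect by level <= 1000000';
-- direct path inserts must be committed before the same table is updated
commit;
execute immediate 'insert /*+ append */ into test_qh select ' || vals || ' from dual connect by level <= 1000000';
commit;
end;
/

-- then time the updates against each table, for example:
set timing on
update test_unc set c1 = c1 + 1;
commit;
update test_qh set c256 = c256 + 1;
commit;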

Here are the results:

                   Column 1    Column 256  All Columns

Uncompressed run 1 00:00:11.81 00:00:24.20 00:00:57.89
Uncompressed run 2 00:00:12.25 00:00:23.94 00:00:59.28

Compressed run 1   00:02:27.20 00:02:27.65 00:02:50.08
Compressed run 2   00:02:26.83 00:02:26.33 00:02:47.88

I don’t have the time or motivation to dig any deeper into the causes, but here is what I think is going on based on these results:

  1. Row chaining in the uncompressed table with more than 255 columns causes the update of the last column or of all columns to be slower than updating just the first column.
  2. Updating any or all columns of the table when it is compressed for query high uncompresses the rows (I think they remain compressed for OLTP, but I did not check).
  3. The time it takes to uncompress the rows during the update is much more than the time taken to update the uncompressed table, even in situations where we have row chaining.

Maybe someone could construct a situation where the time taken by row chaining exceeds the time taken to uncompress the rows for the update, but in my test case it isn’t close.  The uncompress takes more than twice the time of the updates that experience row chaining.

– Bobby

Production RMAN recovery

I was on call again last week and again had to do a database recovery using RMAN.  I thought I might as well document some lessons learned even though it went quite smoothly.

First, here are the steps I took.  I’ve modified the commands and script to obscure our real usernames, passwords, database names, and host names, but otherwise this is exactly what I ran.

Environment-wise, we are on Oracle database 11.2.0.3 running on HP-UX 11.31 on 64-bit Itanium.

First I had to build the Unix script that I would run to do the recovery:

rman target / catalog CATUSER/CATPASSWORD@CATDB <<EOF

run {
allocate channel 'dev_0' type 'sbt_tape'
parms 'SBT_LIBRARY=/opt/omni/lib/libob2oracle8_64bit.so,ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=OURDB,OB2BARLIST=VTL_ourhost_OURDB_ora4)';
allocate channel 'dev_1' type 'sbt_tape'
parms 'SBT_LIBRARY=/opt/omni/lib/libob2oracle8_64bit.so,ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=OURDB,OB2BARLIST=VTL_ourhost_OURDB_ora4)';
allocate channel 'dev_2' type 'sbt_tape'
parms 'SBT_LIBRARY=/opt/omni/lib/libob2oracle8_64bit.so,ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=OURDB,OB2BARLIST=VTL_ourhost_OURDB_ora4)';
allocate channel 'dev_3' type 'sbt_tape'
parms 'SBT_LIBRARY=/opt/omni/lib/libob2oracle8_64bit.so,ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=OURDB,OB2BARLIST=VTL_ourhost_OURDB_ora4)';
allocate channel 'dev_4' type 'sbt_tape'
parms 'SBT_LIBRARY=/opt/omni/lib/libob2oracle8_64bit.so,ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=OURDB,OB2BARLIST=VTL_ourhost_OURDB_ora4)';
set until time "to_date('Feb 20 2014 21:38:10','Mon DD YYYY HH24:MI:SS')";
restore database;
recover database;
}

exit
EOF

In my previous recovery scenario, which I blogged about, I didn’t use the catalog because I was recovering to a different database just to retrieve data from before the failure.  In this case I was recovering the production database in place, back to the Thursday night backup we took just before embarking on a PeopleTools upgrade.  So, the first thing I had to do was look up the user, password, and connect string for the catalog.  Then I just tested connecting without doing anything:

rman target / catalog CATUSER/CATPASSWORD@CATDB
exit

Next I had to figure out the allocate channel commands.  In our case we have a graphical front end to our Data Protector backup system, and I was able to extract the log from Thursday night’s backup.  I copied the allocate channel commands unchanged from the log into my script.

The hardest part of setting up this script was knowing what time to use in the set until time command.  I decided to use the last time that appeared in the backup output.  The application was down, so not much updating was occurring, and I wasn’t worried about going past the earliest possible point that I could use.  I knew that our upgrade didn’t start until after the backup finished, so I reasoned that the time of the last backup message would be safe.  But the backup program writes its messages in a different time zone than our database uses.  (I’m in Arizona, but my database is on Chicago time…)  So, getting the time zone right was key; otherwise I could have been off by an hour.  The date and time in the last backup message looked like this in Arizona time:

Time: 2/20/2014 8:38:10 PM

I had to translate it to

Feb 20 2014 21:38:10

This got it into Central Time and 24-hour format.
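Since being off by an hour here would be costly, it is worth sanity checking the string and format in sqlplus before running the restore.  This check is just my habit, not part of the original procedure:

-- confirm the until time string parses with the format used in the script
select to_char(
to_date('Feb 20 2014 21:38:10','Mon DD YYYY HH24:MI:SS'),
'YYYY-MM-DD HH24:MI:SS') until_time_check
from dual;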

Lastly, I considered whether we had enough space for all the archive logs that would be retrieved for the recovery.  Since we were recovering to a point just after a hot backup, taken with the application down, I decided it couldn’t be more than one or two archive logs.  As it turned out, the archive logs were still on disk when the recovery ran, so they didn’t even need to be restored from tape.

To actually run the script I first manually put the database in mount mode:

sqlplus / as sysdba
shutdown immediate
startup mount

Then I ran the recovery script with nohup in the background:

nohup ./recover.sh > recover.out &

Lastly, after carefully reviewing the output from the recovery to make sure it looked right, I opened the database with resetlogs:

sqlplus / as sysdba
alter database open resetlogs;

It went very smoothly.  It was a little nerve-racking recovering the production system in place, but we made two different backups of it before doing the recovery, so presumably we could have gone back to the way things were before the recovery.

– Bobby

Bug 14383007 workaround

We appear to be hitting this bug on our production Exadata RAC system:

Bug 14383007 – Sort runs out of temp space in RAC even when temp space is available

One particular job errors out with ORA-01652 on the TEMP tablespace even though there is tons of free space in the sort segments on other nodes.  So, I got the idea of building a workaround to have this job log in only to the node that has the most free temp space.  Normally space just gets reallocated from the node that has it to the node that needs it, but I guess the bug is that in certain cases this doesn’t happen and you get the ORA-01652.
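To see the kind of imbalance involved, you can check free and used temp space on each node; gv$sort_segment tracks this per instance:

-- free vs. used TEMP blocks on each RAC node
select inst_id,
tablespace_name,
total_blocks,
used_blocks,
free_blocks
from gv$sort_segment
where tablespace_name='TEMP'
order by free_blocks desc;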

Here is my example Unix script (actually this is run on an HP-UX box, don’t ask me why).

# This script demonstrates how to login to the instance
# of a RAC database that has the most free space in 
# the TEMP tablespace.  It takes three arguments
# which are the connection information for the RAC
# database:
# 1 - Oracle database username
# 2 - password
# 3 - connect string
#
# in sqlplus this would be like connecting as username/password@connectstring
#
# Step one - login to the RAC database and extract a connect string that
# connects to the node with the most free temp space.  
# This has several assumptions:
# 1 - temporary tablespace is named TEMP
# 2 - all of the temp space is allocated to sort segments
# 3 - the port number is 1521
NEW_CONNECT_STRING=`sqlplus -s /nolog<<EOF
connect $1/$2@$3
set heading off
set feedback off
set linesize 32000
set trimspool on
select 
'(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(Host = '||
host_name||
')(Port = 1521))) (CONNECT_DATA = (SID = '||
INSTANCE_NAME ||
')))'
from
gv\\$instance
where inst_id=
(select 
inst_id
from 
gv\\$sort_segment
where 
TABLESPACE_NAME='TEMP' and
FREE_BLOCKS =
(select max(FREE_BLOCKS)
from 
gv\\$sort_segment
where 
TABLESPACE_NAME='TEMP'));
exit
EOF
`
# trim the leading space from the connect string
NEW_CONNECT_STRING=`echo $NEW_CONNECT_STRING|sed 's/^ //'`
echo connect string = $1/$2@$NEW_CONNECT_STRING
# Now connect to the node using this connect string just to show
# that it works and query the sort segment view to show that this
# really is the instance with the most free temp space.
sqlplus /nolog<<EOF
set echo on
set termout on
set pagesize 1000
connect $1/$2@$NEW_CONNECT_STRING
-- show current instance
select instance_name from v\$instance;
-- list instances in descending order
-- of free blocks.  current instance
-- should be listed first.
select inst_id,free_blocks
from gv\$sort_segment
where 
TABLESPACE_NAME='TEMP'
order by free_blocks desc;
EOF
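Assuming the script above is saved as maxtempnode.sh (a file name I made up for this example), you would run it like this:

chmod +x maxtempnode.sh
./maxtempnode.sh myuser mypassword mydb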

You wouldn’t want to use this except in a special case like this where you need a workaround for the bug.  Otherwise you would just end up running everything on one node, and all the temp space would get allocated to it.  But if you are hitting bug 14383007 this may be helpful.

– Bobby

Library cache lock scripts for RAC

I’ve been having issues for a long time now with an Exadata RAC database whose user reports experience library cache lock waits.  The challenge is to figure out what is holding the library cache locks that the queries are waiting on.

My starting point on library cache locks has always been this Oracle support document:

How to Find which Session is Holding a Particular Library Cache Lock (Doc ID 122793.1)

But it doesn’t tell you how to find the session across nodes of a RAC database.

I also found this helpful blog post that briefly addresses finding the session across RAC nodes: Library cache lock and library cache pin waits

I’ve spent many hours over more than a year dealing with these waits without a lot of success, so I finally built a script that I could run regularly to try to capture information about the sessions holding the library cache locks.

First, I knew from Oracle’s document that the x$kgllk table could be used to find the blocking session on a single node, so I included queries against this table in my script and set it up to ssh into every node of the cluster and run queries like this against each node:

-- sessions on this instance that are waiting on
-- library cache lock waits
-- unioned with
-- sessions on this instance that are holding locks that other
-- sessions on this instance are waiting on.

insert into myuser1.library_cache_lock_waits
(
SOURCETABLE,
BLOCKER,
SAMPLE_TIME,
INST_ID,
SID,
USERNAME,
STATUS,
OSUSER,
MACHINE,
PROGRAM,
LOGON_TIME,
LAST_CALL_ET,
KGLLKHDL,
KGLLKREQ,
USER_NAME,
KGLNAOBJ,
sql_id,
SQL_FULLTEXT
)
(select
'X\$KGLLK',
'N',
sysdate,
(select INSTANCE_NUMBER from v\$instance),
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
x.KGLLKHDL,
x.KGLLKREQ,
x.USER_NAME,
x.KGLNAOBJ,
s.sql_id,
q.SQL_FULLTEXT
from 
v\$session s, x\$kgllk x, v\$sql q
where
x.kgllkses=s.saddr and
s.sql_id=q.sql_id(+) and
s.event='library cache lock' and
x.KGLLKREQ > 0 and
q.child_number(+)=0)
union all
(select
'X\$KGLLK',
'Y',
sysdate,
(select INSTANCE_NUMBER from v\$instance),
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
x.KGLLKHDL,
x.KGLLKREQ,
x.USER_NAME,
x.KGLNAOBJ,
s.sql_id,
q.SQL_FULLTEXT
from 
v\$session s, x\$kgllk x, v\$sql q,
x\$kgllk x2
where
x.kgllkses=s.saddr and
s.sql_id=q.sql_id(+) and
x.KGLLKREQ = 0 and
x2.KGLLKREQ > 0 and
x2.KGLLKHDL = x.KGLLKHDL and
q.child_number(+)=0);

commit;

The dollar signs are escaped with a backslash because these queries are part of a Unix shell script.  I picked a few columns that I thought would be helpful from v$session and joined to v$sql to get the text of the blocking and blocked SQL.  Note that I ran these queries as SYSDBA.  Here is an example of my test case where the blocker and blocked sessions are both on one node:

SOURCETABLE                    B Sample Time            INST_ID        SID USERNAME                       STATUS   OSUSER                         MACHINE                                                          PROGRAM                                          Logon Time          LAST_CALL_ET KGLLKHDL           KGLLKREQ USER_NAME                      KGLNAOBJ                                                     SQL_ID        RESOURCE_NAME1                 RESOURCE_NAME2                 SQL_FULLTEXT
------------------------------ - ------------------- ---------- ---------- ------------------------------ -------- ------------------------------ ---------------------------------------------------------------- ------------------------------------------------ ------------------- ------------ ---------------- ---------- ------------------------------ ------------------------------------------------------------ ------------- ------------------------------ ------------------------------ --------------------------------------
X$KGLLK                        N 2014-02-17 17:19:01          1       1183 MYUSER1                        ACTIVE   myuser2                        MYMACHINE                                                        sqlplus.exe                                      2014-02-17 17:18:57            5 00000005F9E7D148          2 MYUSER1                        TEST                                                         g4b4j3a8mms0z                                                               select sum(b) from test
X$KGLLK                        Y 2014-02-17 17:19:03          1        995 MYUSER1                        ACTIVE   myuser2                        MYMACHINE                                                        sqlplus.exe                                      2014-02-17 17:18:52           10 00000005F9E7D148          0 MYUSER1                        TEST                                                         gv7dyp7zvspqg                                                               alter table test modify (a char(100))

Next, I noticed that in gv$session, when a session was waiting on library cache lock waits, sometimes FINAL_BLOCKING_INSTANCE and FINAL_BLOCKING_SESSION were populated, and that might lead me to the session holding the lock.  Also, this query and the ones following can run in a less privileged account – you don’t need SYSDBA.

drop table lcl_blockers;

create table lcl_blockers as
select distinct
s.INST_ID,
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
s.sql_id
from
gv\$session s, 
gv\$session s2
where
s2.FINAL_BLOCKING_INSTANCE=s.INST_ID and
s2.FINAL_BLOCKING_SESSION=s.SID and
s2.event='library cache lock';

insert into myuser1.library_cache_lock_waits
(
SOURCETABLE,
BLOCKER,
SAMPLE_TIME,
INST_ID,
SID,
USERNAME,
STATUS,
OSUSER,
MACHINE,
PROGRAM,
LOGON_TIME,
LAST_CALL_ET,
sql_id,
SQL_FULLTEXT
)
select
'GV\$SESSION',
'Y',
sysdate,
s.INST_ID,
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
s.sql_id,
q.SQL_FULLTEXT
from 
lcl_blockers s, gv\$sql q
where
s.sql_id=q.sql_id(+) and
s.INST_ID=q.INST_ID(+) and
q.child_number(+)=0
order by s.INST_ID,s.sid;

commit;

When this works – sporadically in my tests – it shows the same sort of information the previous queries do for same-node locking.  Here is an example of these gv$session queries catching the blocker:

SOURCETABLE                    B Sample Time            INST_ID        SID USERNAME                       STATUS   OSUSER                         MACHINE                                                          PROGRAM                                          Logon Time          LAST_CALL_ET KGLLKHDL           KGLLKREQ USER_NAME                      KGLNAOBJ                                                     SQL_ID        RESOURCE_NAME1                 RESOURCE_NAME2                 SQL_FULLTEXT
------------------------------ - ------------------- ---------- ---------- ------------------------------ -------- ------------------------------ ---------------------------------------------------------------- ------------------------------------------------ ------------------- ------------ ---------------- ---------- ------------------------------ ------------------------------------------------------------ ------------- ------------------------------ ------------------------------ --------------------------------------
GV$SESSION                     Y 2014-02-17 17:19:05          1        995 MYUSER1                        ACTIVE   myuser2                        MYMACHINE                                                        sqlplus.exe                                      2014-02-17 17:18:52           12                                                                                                                         gv7dyp7zvspqg                                                               alter table test modify (a char(100))

Lastly, I got a cross-node query working that uses the view gv$ges_blocking_enqueue.  The key to making this query work was that the pid column in gv$ges_blocking_enqueue is the same as the spid column in gv$process.

-- join gv$ges_blocking_enqueue, gv$session, gv$process to show 
-- cross node library cache lock blockers.  Blocked session will 
-- have event=library cache lock.

drop table ges_blocked_blocker;

create table ges_blocked_blocker as
(select distinct
'N' blocker,
s.INST_ID,
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
s.sql_id,
s.process,
p.spid,
e.RESOURCE_NAME1,
e.RESOURCE_NAME2
from
gv\$session s, 
gv\$process p,
gv\$ges_blocking_enqueue e
where
s.event='library cache lock' and 
s.inst_id=p.inst_id and
s.paddr=p.addr and
p.inst_id=e.inst_id and
p.spid=e.pid and
e.blocked > 0)
union
(select distinct
'Y',
s.INST_ID,
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
s.sql_id,
s.process,
p.spid,
e.RESOURCE_NAME1,
e.RESOURCE_NAME2
from
gv\$session s, 
gv\$process p,
ges_blocked b,
gv\$ges_blocking_enqueue e
where
s.inst_id=p.inst_id and
s.paddr=p.addr and
p.inst_id=e.inst_id and
p.spid=e.pid and
b.RESOURCE_NAME1=e.RESOURCE_NAME1 and
b.RESOURCE_NAME2=e.RESOURCE_NAME2 and
e.blocker > 0);

insert into myuser1.library_cache_lock_waits
(
SOURCETABLE,
BLOCKER,
SAMPLE_TIME,
INST_ID,
SID,
USERNAME,
STATUS,
OSUSER,
MACHINE,
PROGRAM,
LOGON_TIME,
LAST_CALL_ET,
sql_id,
SQL_FULLTEXT,
RESOURCE_NAME1,
RESOURCE_NAME2
)
select
'GV\$GES_BLOCKING_ENQUEUE',
s.blocker,
sysdate,
s.INST_ID,
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
s.sql_id,
q.SQL_FULLTEXT,
s.RESOURCE_NAME1,
s.RESOURCE_NAME2
from 
ges_blocked_blocker s, gv\$sql q
where
s.sql_id=q.sql_id(+) and
s.INST_ID=q.INST_ID(+) and
q.child_number(+)=0
order by s.INST_ID,s.sid;

commit;

Here is some example output from my gv$ges_blocking_enqueue script.  I edited my username, machine name, etc. to obscure them.

SOURCETABLE                    B Sample Time            INST_ID        SID USERNAME                       STATUS   OSUSER                         MACHINE                                                          PROGRAM                                          Logon Time          LAST_CALL_ET KGLLKHDL           KGLLKREQ USER_NAME                      KGLNAOBJ                                                     SQL_ID        RESOURCE_NAME1                 RESOURCE_NAME2                 SQL_FULLTEXT
------------------------------ - ------------------- ---------- ---------- ------------------------------ -------- ------------------------------ ---------------------------------------------------------------- ------------------------------------------------ ------------------- ------------ ---------------- ---------- ------------------------------ ------------------------------------------------------------ ------------- ------------------------------ ------------------------------ --------------------------------------
GV$GES_BLOCKING_ENQUEUE        N 2014-02-17 17:19:55          2        301 MYUSER1                        ACTIVE   myuser2                        MYMACHINE                                                        sqlplus.exe                                      2014-02-17 17:19:46            7                                                                                                                         g4b4j3a8mms0z [0x426d0373][0x224f1299],[LB][ 1114440563,575607449,LB        select sum(b) from test
GV$GES_BLOCKING_ENQUEUE        Y 2014-02-17 17:19:55          1        497 MYUSER1                        ACTIVE   myuser2                        MYMACHINE                                                        sqlplus.exe                                      2014-02-17 17:19:41           13                                                                                                                         gv7dyp7zvspqg [0x426d0373][0x224f1299],[LB][ 1114440563,575607449,LB        alter table test modify (a char(100))

The alter table command on node 1 is holding the lock while the select statement on node 2 is waiting on the library cache lock.

So, I’ve got this running from a script every 15 minutes in production.  It worked great in my test case, but time will tell whether it yields any useful information for our real problems.

– Bobby

P.S. I’ve uploaded a zip of my scripts: zip

Here is a description of the included files:

Test case to create a library cache lock (a sketch of these three scripts follows the file list below):

create.sql – creates a table with one character first column CHAR(1)
alter.sql – alters table expanding CHAR column
query.sql – queries table – waits on library cache lock wait if run while alter.sql is running

all.sh – top-level script – you will need to edit it with the host names for your RAC cluster and your own userid and password

lcl.sh – the x$ table script that is run on each node.  The only quirk is that our profile required a 1 to be entered to choose the first database from a list; you may not need that line.

resultstable.sql – create table to save results

dumpresults.sql – dump out all results

dumpresultsnosql.sql – dump out all results except the sql text, so it is easier to read.
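Based on the example output earlier in the post, the three test case scripts would look something like this.  This is my reconstruction; the zip has the exact versions:

-- create.sql - table with a CHAR(1) first column; the test query
-- sums a second, numeric column
create table test (a char(1), b number);
insert into test select 'x', level from dual connect by level <= 1000000;
commit;

-- alter.sql - expanding the fixed-length CHAR column rewrites every
-- row, so the DDL holds its library cache lock for a noticeable time
alter table test modify (a char(100));

-- query.sql - run from another session or node while alter.sql is
-- running; it waits on 'library cache lock'
select sum(b) from test;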

Here is the definition of the results table:

create table myuser1.library_cache_lock_waits
(
 SOURCETABLE    VARCHAR2(30),
 BLOCKER        VARCHAR2(1),
 SAMPLE_TIME    DATE,
 INST_ID        NUMBER,
 SID            NUMBER,
 USERNAME       VARCHAR2(30),
 STATUS         VARCHAR2(8),
 OSUSER         VARCHAR2(30),
 MACHINE        VARCHAR2(64),
 PROGRAM        VARCHAR2(48),
 LOGON_TIME     DATE,
 LAST_CALL_ET   NUMBER,
 KGLLKHDL       RAW(8),
 KGLLKREQ       NUMBER,
 USER_NAME      VARCHAR2(30),
 KGLNAOBJ       VARCHAR2(60),
 SQL_ID         VARCHAR2(13),
 RESOURCE_NAME1 VARCHAR2(30),
 RESOURCE_NAME2 VARCHAR2(30),
 SQL_FULLTEXT   CLOB
);

P.P.S. This was all tested only on Exadata running 11.2.0.2.

Oracle documentation on the library cache:

The library cache is a shared pool memory structure that stores executable SQL and PL/SQL code. This cache contains the shared SQL and PL/SQL areas and control structures such as locks and library cache handles. In a shared server architecture, the library cache also contains private SQL areas.

[Oracle 12c Concepts manual diagram showing the library cache]

P.P.P.S. Someone emailed me to point out that I left out the definition of the ges_blocked table.  I need to go back and review this because I am not sure I didn’t make a mistake in my all.sh script.  I found a script from that time that includes the following create table statement for the ges_blocked table:

create table ges_blocked as
select
s.INST_ID,
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
s.sql_id,
s.process,
p.spid,
e.RESOURCE_NAME1,
e.RESOURCE_NAME2
from
gv\$session s, 
gv\$process p,
gv\$ges_blocking_enqueue e
where
s.event='library cache lock' and 
s.inst_id=p.inst_id and
s.paddr=p.addr and
p.inst_id=e.inst_id and
p.spid=e.pid and
e.blocked > 0;

Update 4/11/22 – standalone script on GitHub:

librarycachelocks.sql

Parse time by day of the week

I wanted to find out if queries against a particular table were spending time parsing during the week.  We had previously seen issues where queries on the table would be stuck parsing for tens of minutes before they actually started running.  I think this has been resolved, but I’m looking for some validation from ASH.  I know that on Sundays we recompress some subpartitions of the table, which leads to higher parse times if users run queries then, but I’m not trying to resolve that at this time.  I just care about the other days of the week.  So, here is the query I used:

select 
to_char(sample_time,'DAY') sample_day,
IN_PARSE,
-- AWR keeps one ASH sample every 10 seconds, so each sampled
-- session accounts for roughly 10 seconds of time
count(*)*10 total_seconds
from DBA_HIST_ACTIVE_SESS_HISTORY a,sqlids s
where 
a.sql_id= s.sql_id
group by to_char(sample_time,'DAY'),in_parse
order by to_char(sample_time,'DAY'),in_parse;

The table “sqlids” has a list of all the SQL statements that refer to the table.  I populated sqlids like this:

create table sqlids as select SQL_ID
        FROM DBA_HIST_SQLTEXT 
        where 1=2;

DECLARE 
    CURSOR SQL_CURSOR IS 
        SELECT DISTINCT 
            SQL_ID,
            DBID
        FROM DBA_HIST_SQLSTAT; 
    SQL_REC SQL_CURSOR%ROWTYPE;
    CURSOR TEXT_CURSOR(SQL_ID_ARGUMENT VARCHAR2,DBID_ARGUMENT NUMBER)
      IS 
        SELECT  
            SQL_ID
        FROM DBA_HIST_SQLTEXT
        WHERE 
            SQL_TEXT like '%MYTABLE%' and
            SQL_ID = SQL_ID_ARGUMENT and
            DBID = DBID_ARGUMENT;
    TEXT_REC TEXT_CURSOR%ROWTYPE;
BEGIN
    OPEN SQL_CURSOR;
    LOOP
        FETCH SQL_CURSOR INTO SQL_REC;
        EXIT WHEN SQL_CURSOR%NOTFOUND;

        OPEN TEXT_CURSOR(SQL_REC.SQL_ID,SQL_REC.DBID);
        LOOP
            FETCH TEXT_CURSOR INTO TEXT_REC;
            EXIT WHEN TEXT_CURSOR%NOTFOUND;
            insert into sqlids values (TEXT_REC.SQL_ID);
            commit;
         END LOOP;
        CLOSE TEXT_CURSOR;
     END LOOP;
    CLOSE SQL_CURSOR;
END;
/
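As an aside, the nested cursor loops above do the same work as a join, so a single set-based insert should build the same list.  This is my rewrite, not the script I actually ran:

insert into sqlids
select distinct t.sql_id
from DBA_HIST_SQLSTAT s,
DBA_HIST_SQLTEXT t
where t.sql_id = s.sql_id
and t.dbid = s.dbid
and t.sql_text like '%MYTABLE%';

commit;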

Here is the output of the query I mentioned earlier with the parse time in seconds by day of the week:

SAMPLE_DA I TOTAL_SECONDS
--------- - -------------
FRIDAY    N       1092220
FRIDAY    Y        281980
MONDAY    N       2158620
MONDAY    Y         77860
SATURDAY  N       1483420
SATURDAY  Y        259680
SUNDAY    N      18939770
SUNDAY    Y      65665540
THURSDAY  N       1180780
THURSDAY  Y        300800
TUESDAY   N       1286980
TUESDAY   Y        108510
WEDNESDAY N       1399100
WEDNESDAY Y        306890

As I hoped, Sunday is the only day of the week where parse time exceeds non-parse time for the SQL statements against the given table.  I’m still validating this in other ways – most importantly by trying to contact the users to see if they perceive the problem to be gone.

But, I thought I’d pass along the query in case others find it useful.

– Bobby
