Table Statistics With Histograms procedure – TSWH

I’ve uploaded the most recent version of a stored procedure that I use for gathering optimizer statistics on a table in Oracle.  You can download a zip here.  The file buildandtest.bat shows how to build the needed tables and procs and run the supplied tests.

By default the proc tab_stats_with_hists will gather statistics on a table with a small estimate percentage and no histograms.  I like this as a default because if you just run dbms_stats.gather_table_stats with all the defaults it can sometimes run forever on a large table.  If you run tab_stats_with_hists with the defaults it will run quickly even on a large table.
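
In terms of a plain dbms_stats call, the defaults amount to something like this (a sketch with assumed values; the real code is in the zip):

begin
  dbms_stats.gather_table_stats(
    ownname          => 'YOURSCHEMA',
    tabname          => 'YOURTABLE',
    estimate_percent => 0.1,                        -- assumed small sample
    method_opt       => 'FOR ALL COLUMNS SIZE 1');  -- no histograms
end;
/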

If you want to override the defaults you do so by inserting rows into the tables ESTIMATE_TABLES and HIST_COLUMNS.  So, if you want to set the estimate percentage to 1 percent on a table you would do this:

insert into ESTIMATE_TABLES values ('YOURSCHEMA','YOURTABLE',1);

If you wanted a histogram on a column you would do this:

insert into HIST_COLUMNS values ('YOURSCHEMA','YOURTABLE','YOURCOLUMN');
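
For reference, the two override tables look something like this (a sketch inferred from the inserts above; the real definitions and column names are in the zip):

create table ESTIMATE_TABLES
(owner            varchar2(30),
 table_name       varchar2(30),
 estimate_percent number);

create table HIST_COLUMNS
(owner       varchar2(30),
 table_name  varchar2(30),
 column_name varchar2(30));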

To run the proc on a table you would run it like this:

execute tab_stats_with_hists('YOURSCHEMA','YOURTABLE',NULL);

To gather stats on one partition you would add the partition name:

execute tab_stats_with_hists('YOURSCHEMA','YOURTABLE','YOURPARTITION');

The zip includes a test of a non-partitioned table, a partitioned table, and a subpartitioned table.

– Bobby

Testing maximum number of Oracle sessions supported by shared servers

I’ve been testing a shared servers configuration and I was asked to push a test database to its limits to see how many sessions it would support.  It was really just a test of how many sessions could log in, not how many could actively use the database at the same time.

The first thing I found out was that on my platform, Oracle 11.1.0.7 on HP-UX Itanium, the maximum value I could set the sessions init parameter to and still have the database come up was 65535.  It appears that the session id is a 16-bit unsigned number, which results in this limit, at least on the tested platform.  I got this interesting message in the alert log when I set sessions to 100000:

ORA-41009: Session with session number 34463, serial number 1 does not exist
PMON (ospid: 20330): terminating the instance due to error 41009

I got this fun error when I set sessions to 65536:

ORA-00600: internal error code, arguments: [kews_init_ses - bad sesid], [], [],
[], [], [], [], [], [], [], [], []

Otherwise I tested the limits by running one or more Java programs that open up a bunch of sessions and give me a session count after each 100 sessions opened.  I couldn’t open more than around 3700 sessions from my laptop no matter how many Java processes I ran, so I must have hit some client networking limit.  Then I ran about 10 Java processes on a test database server and maxed out at around 11,000 open sessions.  I couldn’t open more than about 2000 per Java process on the db server.  Shared pool memory appeared to be my limiting factor: I had a 7 gig SGA with a 2 gig large pool and a 2 gig shared pool, and I had 32 dispatchers and 32 shared servers.  At around 11,000 connections I started getting shared pool memory errors in the alert log.
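
For what it’s worth, v$resource_limit is a quick way to watch how close a test like this gets to the sessions and processes limits:

select resource_name, current_utilization, max_utilization, limit_value
from v$resource_limit
where resource_name in ('sessions','processes');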

Here is the Java program:

/**

 Attempt to max out the session count on a database.

 */

import java.sql.*;
import javax.sql.*;
import oracle.jdbc.*;
import oracle.jdbc.pool.OracleDataSource;

public class MaxSessions
{
  public static void main (String args [])
    throws SQLException
  {
    // Create an OracleDataSource instance explicitly
    OracleDataSource ods = new OracleDataSource();
    ods.setURL("jdbc:oracle:oci:TEST/TEST@testdb");

    // Retrieve a connection
    Connection conn = ods.getConnection();

    // Array of connections
    int num_connections=1000;

    Connection carray[]=new Connection[num_connections];

    // fill array with connections

    for (int i=0; i<num_connections; i++)
      {
      carray[i] = ods.getConnection();
      if ((i % 100)==0)
        getSessionCount(conn);
      }

    getSessionCount(conn);

    // close connections

    for (int i=0; i<num_connections; i++)
      carray[i].close();

    // Close the connection
    conn.close();
    conn = null;
    System.out.println ("All sessions closed, MaxSessions complete");
  }
  static void getSessionCount(Connection conn)
       throws SQLException
  {
    // Create a Statement
    Statement stmt = conn.createStatement ();

    ResultSet rset = stmt.executeQuery ("select count(*) from v$session");

    while (rset.next ())
      System.out.println ("Session count is " + rset.getString (1));

    // Close the ResultSet
    rset.close();
    rset =  null;

    // Close the Statement
    stmt.close();
    stmt = null;
  }
}

Here is my Unix script to kick 10 of these off at once:

$ cat run10.sh
export JDK_HOME=$ORACLE_HOME/jdk
export JAVAC=$JDK_HOME/bin/javac
export JAVA=$JDK_HOME/bin/java
export CLASSPATH15=.:$ORACLE_HOME/jdbc/lib/ojdbc5.jar:$ORACLE_HOME/jlib/fscontext.jar:$ORACLE_HOME/jlib/orai18n.jar
export LD_LIBRARY_PATH=$ORACLE_HOME/lib32:$LD_LIBRARY_PATH

$JAVAC -classpath $CLASSPATH15 -g  MaxSessions.java
nohup $JAVA -classpath $CLASSPATH15 MaxSessions > run1.out &
nohup $JAVA -classpath $CLASSPATH15 MaxSessions > run2.out &
nohup $JAVA -classpath $CLASSPATH15 MaxSessions > run3.out &
nohup $JAVA -classpath $CLASSPATH15 MaxSessions > run4.out &
nohup $JAVA -classpath $CLASSPATH15 MaxSessions > run5.out &
nohup $JAVA -classpath $CLASSPATH15 MaxSessions > run6.out &
nohup $JAVA -classpath $CLASSPATH15 MaxSessions > run7.out &
nohup $JAVA -classpath $CLASSPATH15 MaxSessions > run8.out &
nohup $JAVA -classpath $CLASSPATH15 MaxSessions > run9.out &
nohup $JAVA -classpath $CLASSPATH15 MaxSessions > run10.out &

Note:  I hacked the Oracle-supplied Java demo program DataSource.java and the batch file rundemo.bat to build this test.  I don’t write much Java so I usually start with an Oracle demo.

Here is the output from one of the Java processes showing the sessions going over 11,000:

$ cat run9.out
Session count is 46
Session count is 1065
Session count is 2081
Session count is 3095
Session count is 4060
Session count is 5036
Session count is 5995
Session count is 6979
Session count is 7996
Session count is 8974
Session count is 9988
Session count is 10874
Session count is 10685
Session count is 11565
Exception in thread "main" java.sql.SQLException: ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 248 bytes of shared memory ("shared pool","unknown object","kgsp-heap","kgllk")

– Bobby

Lessons from RMAN restore and recovery

I was on call last week and had to recover a production database to a particular point in time to recover some data that was lost.  I had to leave the production database intact but use a development server to restore a copy of production as of the needed date and time so the development team could extract the needed data.  Before I forget what I did last week I wanted to document some of the lessons I learned.

Open severity 1 ticket with Oracle support

It was nice to have Oracle support holding my hand even though my pride would have had me try to go it alone.  But, we are paying for support so I might as well use it and they were helpful.  They pointed me to document 1338193.1 “How to Move/Restore DB to New Host and File System using RMAN”.  They also told me not to use our recovery catalog but to only use a control file since we are cloning production to another host instead of recovering it in place.

Recover production on dev host as same name

The recovered database will have the same name as production after the recovery is complete.  So, when I logged in as oracle on my development box (HP-UX) I had to set ORACLE_SID to the production SID:

export ORACLE_SID=PRDDBxx

But, everything else was the same in terms of the environment that was already set up for the development databases.

Create prod copy spfile from dev spfile

I wanted to use a lot of the dev database settings, i.e. memory settings, for our production copy, so I created the new production copy’s spfile from the existing development DB spfile.  I did a create pfile from spfile while connected to the dev database.  Then I edited the three parameters db_name, db_unique_name, and service_names in the pfile to have PRDDBxx instead of DEVDBxx but left the rest alone.  Next I created the prod copy spfile using create spfile from pfile pointing to the edited pfile while connected to the new prod copy database.
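
In SQL*Plus the steps look something like this (a sketch; the pfile path is just an example):

-- connected to the dev database DEVDBxx
create pfile='/tmp/initPRDDBxx.ora' from spfile;

-- edit db_name, db_unique_name, and service_names in /tmp/initPRDDBxx.ora,
-- then, connected to the new PRDDBxx instance:
create spfile from pfile='/tmp/initPRDDBxx.ora';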

Create password file for prod copy

I used orapwd to give the copy of production the same SYS password I had used for the dev database it was replacing.  It was something like this:

/opt/oracle/product/db/11.2.0.3/bin/orapwd file=/opt/oracle/product/db/11.2.0.3/dbs/orapwPRDDBxx password=xyz force=y

Now I think you can do a startup nomount, but not a startup mount, because you have the spfile and password file but not the control files that the spfile points to.

Copy production control file to development control file locations

No need to try to restore the control file from RMAN backups.  Just do

alter database backup controlfile to 'xxx';

while connected to the production database.  Then ftp the resulting file to the new host and copy it to the locations pointed to by the control_files parameter in the new spfile.  Now you can do a startup mount on the new copy of PRDDBxx on the new host.
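
So at this point the sequence on the new host is roughly:

export ORACLE_SID=PRDDBxx
sqlplus / as sysdba
SQL> startup mount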

Get tape system parameters from prod backup logs

I saved off the output for the production full hot backup and log backups so I had all the information I needed for the recovery.  Mainly what I needed was the parameters for the channels, like this:

allocate channel 'dev_0' type 'sbt_tape'
parms 'SBT_LIBRARY=/opt/omni/lib/libob2oracle8_64bit.so,ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=PRDDBxx,OB2BARLIST=VTL_prdhost_PRDDBxx_ora1)';

This includes the library that our backup system uses and information about the backup that we will be restoring from.  I just copied these parameters from the production backup script.

Rename datafiles and tempfiles to fit in new locations on target

set newname for datafile 1 to  '/var/opt/oracle/db01/DEVDBxx/dbf/system01.dbf';

set newname for tempfile 1 to  '/var/opt/oracle/db06/DEVDBxx/dbf/temp01.dbf';

I documented the script I used to build these commands in my previous post.

Figure out how to rename redo logs

I was able to restore the redo logs to the dev db server in the same directory path as production.  But, if there wasn’t enough room I would have had to figure out how to put them somewhere else.  I guess that after the startup mount on your control files you could do something to re-point the logs.  But, I just ended up creating the directory that we had on production on dev, and I had enough space free to use that filesystem.  I.e.

cd /var/opt/oracle/log01
mkdir PRDDBxx
cd PRDDBxx
mkdir log

Set archive log destination or delete logs after their use

I didn’t do either of these things but got lucky.  The recovery brings in all the archive logs you need and leaves them in the archive log destination, even after applying them for the recovery.  The default archive log destination was $ORACLE_HOME/dbs and there was enough space free to hold all the needed logs.  I should have set the archive log destination in the spfile before doing the recovery.  Also, it looks like you can use the DELETE ARCHIVELOG option of the RMAN RECOVER command so that RMAN won’t fill up your archive log destination, because it deletes each archive log as soon as it uses it.  But, I haven’t verified this.  I just got lucky that I had enough space.
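
Something like the following would cover both points, though as I said I haven’t verified the DELETE ARCHIVELOG behavior:

-- set a roomier archive log destination before the restore (path is just an example)
alter system set log_archive_dest_1='LOCATION=/var/opt/oracle/arch/PRDDBxx' scope=spfile;

-- or have RMAN delete each archive log as soon as it has been applied
recover database delete archivelog;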

Use nid command to rename new database

I renamed the recovered production database like this:

nid target=sys/xyz dbname=DEVDBxx

This is a lot easier than what I’m used to doing, which is to recreate the control file by backing it up to trace and editing the SQL to have the new DB name.
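
From memory, the follow-up after nid completes goes roughly like this (verify against the nid documentation):

-- nid shuts the database down when it finishes, then:
-- 1. change db_name from PRDDBxx to DEVDBxx in the spfile/pfile
-- 2. recreate the password file for the new name with orapwd
-- 3. startup mount
-- 4. alter database open resetlogs;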

Order of switch commands

Not sure if this matters but this is where I put the switch commands in my RMAN script:

restore database;
switch datafile all;
switch tempfile all;
recover database;

I wasn’t sure if it mattered if they came before or after the recover command but this order worked for me.

Work with backup admin to free devices

This is another thing I didn’t do that probably would have resulted in a faster restore.  I was using several channels but it appeared that only one virtual tape device at a time was being used.  Presumably all the other devices were doing backups and one or more backups could have been delayed or cancelled to free up devices for the restore.

My final script looked something like this:

rman target / <<EOF

run {
allocate channel 'dev_0' type 'sbt_tape'
parms 'SBT_LIBRARY=/opt/omni/lib/libob2oracle8_64bit.so,ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=PRDDB01,OB2BARLIST=VTL_prodhost_PRDDB01_ora1)';
allocate channel 'dev_1' type 'sbt_tape'
parms 'SBT_LIBRARY=/opt/omni/lib/libob2oracle8_64bit.so,ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=PRDDB01,OB2BARLIST=VTL_prodhost_PRDDB01_ora1)';
allocate channel 'dev_2' type 'sbt_tape'
parms 'SBT_LIBRARY=/opt/omni/lib/libob2oracle8_64bit.so,ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=PRDDB01,OB2BARLIST=VTL_prodhost_PRDDB01_ora1)';
allocate channel 'dev_3' type 'sbt_tape'
parms 'SBT_LIBRARY=/opt/omni/lib/libob2oracle8_64bit.so,ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=PRDDB01,OB2BARLIST=VTL_prodhost_PRDDB01_ora1)';
allocate channel 'dev_4' type 'sbt_tape'
parms 'SBT_LIBRARY=/opt/omni/lib/libob2oracle8_64bit.so,ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=PRDDB01,OB2BARLIST=VTL_prodhost_PRDDB01_ora1)';
set until time "to_date('Jun 18 2013 08:00:00','Mon DD YYYY HH24:MI:SS')";
set newname for datafile 1 to  '/var/opt/oracle/db01/DEVDB01/dbf/system01.dbf';
...
set newname for datafile 150 to  '/var/opt/oracle/db09/DEVDB01/dbf/psimage_5.dbf';
set newname for tempfile 1 to  '/var/opt/oracle/db06/DEVDB01/dbf/temp01.dbf';
restore database;
switch datafile all;
switch tempfile all;
recover database;
}

exit
EOF

I ran this nohup as a script:

nohup ./recover.sh > recover.out &

– Bobby

Renaming files for RMAN restore

Here is a script I hacked together Tuesday night to figure out how to fit all my production data files and temp files on a development server for an RMAN restore of a production backup onto a new server:

DECLARE 
    CURSOR DF_CURSOR IS 
        SELECT
            *
        FROM dba_data_files
        order by FILE_ID;

    DF_REC DF_CURSOR%ROWTYPE;

    CURSOR TF_CURSOR IS 
        SELECT
            *
        FROM dba_temp_files
        order by FILE_ID;

    TF_REC TF_CURSOR%ROWTYPE;

    db01 number;
    db02 number;
    db03 number;
    db04 number;
    db05 number;
    db06 number;
    db07 number;
    db08 number;
    db09 number;

BEGIN
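-- Free space available in each filesystem, in kilobytes (bdf output)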
db08 := 16561550;
db07 := 19548242;  
db06 := 91252087;  
db05 := 29913520;  
db04 := 18507885;  
db03 := 6206062;   
db02 := 64145394;  
db01 := 265206680; 
db09 := 323990034;

    OPEN DF_CURSOR;
LOOP
    FETCH DF_CURSOR INTO DF_REC;
    EXIT WHEN DF_CURSOR%NOTFOUND;
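    -- Try the filesystems in order of free space (db01, then db09, then db06),
    -- leaving 1 meg of headroom; the substr positions assume paths like
    -- /var/opt/oracle/db01/OLDDBXX/dbf/...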

    IF (DB01 > ((DF_REC.BYTES/1024)+1024)) THEN

       dbms_output.put_line('set newname for datafile '||DF_REC.FILE_ID||' to  '''||
         substr(DF_REC.FILE_NAME,1,16)||'db01/NEWDBXX'||substr(DF_REC.FILE_NAME,29)||
         ''';');
       DB01 := DB01 - (DF_REC.BYTES/1024);

    ELSIF (DB09 > ((DF_REC.BYTES/1024)+1024)) THEN

       dbms_output.put_line('set newname for datafile '||DF_REC.FILE_ID||' to  '''||
         substr(DF_REC.FILE_NAME,1,16)||'db09/NEWDBXX'||substr(DF_REC.FILE_NAME,29)||
         ''';');
       DB09 := DB09 - (DF_REC.BYTES/1024);

    ELSIF (DB06 > ((DF_REC.BYTES/1024)+1024)) THEN

       dbms_output.put_line('set newname for datafile '||DF_REC.FILE_ID||' to  '''||
         substr(DF_REC.FILE_NAME,1,16)||'db06/NEWDBXX'||substr(DF_REC.FILE_NAME,29)||
         ''';');
       DB06 := DB06 - (DF_REC.BYTES/1024);

    end if;

END LOOP;
CLOSE DF_CURSOR;

    OPEN TF_CURSOR;
LOOP
    FETCH TF_CURSOR INTO TF_REC;
    EXIT WHEN TF_CURSOR%NOTFOUND;    
   IF (DB06 > ((TF_REC.BYTES/1024)+1024)) THEN

       dbms_output.put_line('set newname for tempfile '||TF_REC.FILE_ID||' to  '''||
         substr(TF_REC.FILE_NAME,1,16)||'db06/NEWDBXX'||substr(TF_REC.FILE_NAME,29)||
         ''';');
       DB06 := DB06 - (TF_REC.BYTES/1024);

    end if;

END LOOP;
CLOSE TF_CURSOR;
END;
/

The filesystems and paths to the datafiles and temp files looked like this:

/var/opt/oracle/db01/OLDDBXX/dbf/system01.dbf

/var/opt/oracle/db06/NEWDBXX/dbf/system01.dbf

Lots of stuff is hard-coded, such as the space available in each filesystem, like this:

db08 := 16561550;

The units are kilobytes, which is what bdf outputs on HP-UX.

This might not be that useful since it isn’t very parameterized or automatic, but if you need some code for fitting a bunch of datafiles into a new set of filesystems it might help you find a place to start.  Also, I tried the three filesystems with the most free space first and didn’t really need any others, which is why the if statements only check db01, db09, and db06, but you could add more.

Output is RMAN commands like these:

set newname for datafile 1 to  '/var/opt/oracle/db01/NEWDBXX/dbf/system01.dbf';

set newname for tempfile 1 to  '/var/opt/oracle/db06/NEWDBXX/dbf/temp01.dbf';

– Bobby

Craig Shallahamer

I’ve been thinking about how to use my blog to describe different individuals who have helped me learn about Oracle performance tuning.  I was thinking about having a single blog entry or page describing several people, but have decided that would be too long and overwhelming to read or to write.  So, I got the idea of writing about a single individual at a time.  Now, the challenge of writing about someone is that this is only my perspective of them and I could have some details wrong.  So, if there are any errors or negative implications, that is on me and no reflection on the individual I’m describing.  I’m including this person because their work has helped me in my job as an Oracle DBA in the performance tuning area, so the intent is to be positive and to encourage other people to benefit from their work.  In most cases these individuals have free or inexpensive materials available that can be of great worth to an Oracle DBA who is working on tuning.

So, I’m starting with Craig Shallahamer because he was the first individual who really helped me learn about Oracle performance.  I started as an Oracle DBA in December of 1994 while working with Campus Crusade for Christ.  At CCC we migrated all of our applications off of a small IBM mainframe and on to an Oracle database running PeopleSoft applications.  But, at some point our new donations system was having bad performance.  It was slower than the mainframe system we had just replaced.  A couple of managers in our IT area ran into Craig and connected him up with me.  It ended up that we were saturating one of our disk arrays with random I/O while our other arrays were underutilized.  Craig’s free papers on Oracle waits and the use of operating system monitoring utilities such as sar -d really helped me understand how to diagnose our disk I/O issues.

If you want to check out Craig’s resources go to his OraPub website.  I have read several of his papers – most notably ones about Oracle waits and about response time/queuing theory.  Also, I’ve learned from his free tool called OSM, which records V$ table values and operating system tool outputs.  Craig sells a new tool called Stori which I know he is excited about, and he has a number of other free tools as well.  I haven’t been able to take one of his paid courses, but I read the materials for his firefighting class several years ago and they were very helpful.  I also read Craig’s first book, which is on forecasting performance.

It is hard to do justice to someone’s work and I’m not sure I’m explaining it that well.  I think of Craig in terms of operating system/hardware, Oracle internals, waits, mathematical models, and performance tuning theory.  It has been helpful for me to just take one of his Excel spreadsheets, play with the values, and think about what the curve means in terms of queuing and performance.  Most DBAs I know aren’t that comfortable with operating system utilities like sar and don’t know what values to look for.  And, few people seem to understand queuing theory and other mathematical models of Oracle systems.

So, Craig is a good source for Oracle performance information that may not be readily available to the average DBA unless they know where to look.  I hope others find his work as helpful in their jobs as I have.

– Bobby

Comparing Characters and Numbers in Oracle

In Oracle, when you compare a character and a number in a SQL WHERE clause, Oracle converts the character value to a number rather than the other way around.  I built a simple example to help me understand how this works.  I wanted to see if the results were different depending on whether you converted the number to a character or the character to a number, and it turns out they are.

Here is how I setup the two tables I would join:

create table ntest (ncolumn number);
create table ctest (ccolumn varchar2(2000));

insert into ntest values (1);
insert into ctest values ('1');
insert into ctest values ('x');

commit;

Here is the query with the comparison of the numeric column ntest.ncolumn to the character column ctest.ccolumn and the results:

SQL> select ncolumn,ccolumn
  2  from ntest,ctest
  3  where
  4  ntest.ncolumn=ctest.ccolumn;
ERROR:
ORA-01722: invalid number

This is the equivalent of adding the to_number conversion function explicitly:

SQL> select ncolumn,ccolumn
  2  from ntest,ctest
  3  where
  4  ntest.ncolumn=to_number(ctest.ccolumn);
ERROR:
ORA-01722: invalid number

In both cases you get the error because Oracle tries to convert the value ‘x’ to a number.

But, if you explicitly convert the number column to a character you don’t get the error:

SQL> select ncolumn,ccolumn
  2  from ntest,ctest
  3  where
  4  to_char(ntest.ncolumn)=ctest.ccolumn;

   NCOLUMN CCOLUMN
---------- ------------
         1 1

The two choices of which column to convert produce two different results.  When you design your application, if you have to compare character and number values, you should put in the to_char or to_number explicitly to force the conversion to be the way you need it, or at least be aware that Oracle converts the character to a number by default.

– Bobby

Yet Another Bind Variable Type Mismatch (YABVTM)

Hopefully this isn’t too redundant.  I saw another performance issue today caused by a type mismatch between a bind variable and the column it was compared to in a SQL statement.  I saw it in some real code and then built a simple test case to prove out the concept: zip of testcase SQL and log.

The column being compared against was type VARCHAR2 and the bind variable was NUMBER.  Converting the variable to a character type with TO_CHAR resolved the issue.

Table definition:

create table test
(OWNER  VARCHAR2(30),
 TABLE_NAME VARCHAR2(30),
 TEST_COLUMN VARCHAR2(2000));

create index testi on test(TEST_COLUMN);

PL/SQL variable of the wrong type (number):

declare
V_NUMBER number;
begin

V_NUMBER := 1;

Execute immediate passing the bind variable that does not match the type of the column TEST_COLUMN:

execute immediate 'UPDATE TEST SET TABLE_NAME=''UPDATED'' 
WHERE TEST_COLUMN=:1' using V_NUMBER;

Note that I put in a carriage return so it would fit the blog width.  It is one line in the script.

Execute immediate converting the variable to a character before passing it so it matches the column in the where clause:

execute immediate 'UPDATE TEST SET TABLE_NAME=''UPDATEDAGAIN'' 
WHERE TEST_COLUMN=:1' using to_char(V_NUMBER);
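
Assembled, the anonymous block looks roughly like this (just the skeleton; the actual script in the zip also does the setup and timing):

declare
  V_NUMBER number;
begin
  V_NUMBER := 1;

  -- bind variable type does not match the VARCHAR2 column: full scan
  execute immediate 'UPDATE TEST SET TABLE_NAME=''UPDATED''
  WHERE TEST_COLUMN=:1' using V_NUMBER;

  -- converted to match the column type: index range scan
  execute immediate 'UPDATE TEST SET TABLE_NAME=''UPDATEDAGAIN''
  WHERE TEST_COLUMN=:1' using to_char(V_NUMBER);
end;
/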

With the TO_CHAR the update is much faster:

No TO_CHAR:

Elapsed: 00:00:01.45

With TO_CHAR:

Elapsed: 00:00:00.04

Plan without the TO_CHAR is full scan:

SQL_ID  41vfab6v87w4g, child number 0
-------------------------------------
UPDATE TEST SET TABLE_NAME='UPDATED' WHERE TEST_COLUMN=:1

Plan hash value: 3859524075

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |      |       |       |  2579 (100)|          |
|   1 |  UPDATE            | TEST |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| TEST |     2 |    46 |  2579   (4)| 00:00:37 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_NUMBER("TEST_COLUMN")=:1)

Plan with the TO_CHAR is index range scan:

SQL_ID  3229aq5w36kst, child number 0
-------------------------------------
UPDATE TEST SET TABLE_NAME='UPDATEDAGAIN' WHERE TEST_COLUMN=:1

Plan hash value: 3736755925

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | UPDATE STATEMENT  |       |       |       |     5 (100)|          |
|   1 |  UPDATE           | TEST  |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| TESTI |     2 |    46 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TEST_COLUMN"=:1)

– Bobby

How to figure out disk network latency using cached blocks

I was working on some disk performance problems and came across a simple way to test the latency or delay built into the networks that connect my database servers to their disk storage.  All I have to do is read some disk blocks from a table several times to be sure they are cached in the disk storage server and then flush the database buffer cache and read from the table once more.  Since the blocks are not cached in the database buffer cache but are cached on the disk array the time it takes to read the blocks approaches the time it takes to copy the blocks over the network.  Of course there is some CPU on both the source and target to copy the blocks in memory but the physical disk read is eliminated and you see close to the minimum time it is possible for a transfer to take.

So, here is my simple test script.  It assumes the user, password, and tnsnames.ora name are passed as parameters and that the user is a DBA user like SYSTEM.

connect &1/&2@&3

-- create test that will show how fast reads
-- can be when cached on the disk subsystem

-- document version

select * from v$version;

-- create small table

drop table test;
create table test as select * from dba_tables where rownum < 1000;

-- query three times to get cached in the disk system

select sum(blocks) from test;
select sum(blocks) from test;
select sum(blocks) from test;

-- flush from database cache

alter system flush buffer_cache;

-- reconnect so session waits are cleared

disconnect
connect &1/&2@&3

select sum(blocks) from test;

-- show average scattered read
-- should be best time you can
-- get since blocks are cached
-- in the disk subsystem

VARIABLE monitored_sid number;

begin

SELECT sid into :monitored_sid from v$session 
where audsid=USERENV('SESSIONID');

end;
/

select  
total_waits,
time_waited_micro/total_waits
FROM V$SESSION_EVENT a
WHERE a.SID= :monitored_sid
and event='db file scattered read';

Here is what the output looks like on a slow system with network problems (over NFS in my case):

TOTAL_WAITS TIME_WAITED_MICRO/TOTAL_WAITS
----------- -----------------------------
          4                       1410.25

Here is what it looks like on a nice fiber network:

TOTAL_WAITS TIME_WAITED_MICRO/TOTAL_WAITS
----------- -----------------------------
          4                          40.5

40 microseconds for a disk read is sweet compared to 1410!

I’ve uploaded the script and three runs I made with it here.

– Bobby

P.S.  I realized this morning that the 40-microsecond read example above is really just pulling from the operating system’s filesystem cache, because the database I was testing on doesn’t use direct IO.  Here is a test on a system with a SAN that uses direct IO:

TOTAL_WAITS TIME_WAITED_MICRO/TOTAL_WAITS
----------- -----------------------------
          5                         478.4

I ran this several times and this was the best result.  The others were mostly in the 600-700 microsecond range.  So, I guess when you run the test script you are really testing everything behind the scenes that could benefit from caching, except for the database’s buffer cache.  Without direct IO you may never reach your SAN network because you will have cached the blocks at the OS filesystem cache level.

Subscribing to Oak Table blogs feed

I’ve seen some very good information posted in this feed, which combines blog postings from many different Oracle performance experts who are part of what is called the “Oak Table”:

http://www.oaktable.net/feed/blog-rss.xml (NO LONGER EXISTS)

I’ve been using Internet Explorer to keep track of new posts in the “Feeds” section of its Favorites.  Here is how to add the Oak Table blog feed to Internet Explorer:

Go to the URL listed above and click on “Subscribe to this feed”

Click on Subscribe button

Success!  Now click on Favorites and then Feeds

For any feed in your list, if you see the feed name in a darker font it means there is a new post.  So, as I have time, I’ll go to my feeds and see which of the ones I’ve subscribed to have new posts.  If you are looking for performance tuning information I highly recommend the Oak Table feed.

– Bobby

inode lock contention

In my forum discussion about free buffer waits I came across a term that I didn’t understand: “inode lock contention”.  I’m pretty sure I had seen this same term years ago on one of Steve Adams’ pages on IO.  But, I didn’t really understand what the term meant, so it was hard to tell whether this was something I was seeing on our production system that was experiencing “free buffer waits”.

First I had to figure out what an inode was.  I knew that it had something to do with the way Unix filesystems work, but reading this article (NO LONGER EXISTS) really helped clear up what inodes are, at least on HP-UX.  Inodes are small chunks of bytes that are used to define a Unix filesystem.  On HP-UX’s VxFS filesystems a type 1 inode can point to up to 10 extents of one or more contiguous 8K blocks on a large filesystem.  The filesystem I’ve been testing on appears to have 32 meg extents, if I’m reading this output from lvdisplay correctly:

LV Size (Mbytes)            1472000
Current LE                  46000

Total size of 1,472,000 meg divided by 46,000 logical extents = 32 meg per extent.

Since the inode can point to 1 to 10 extents it could point to between 32 and 320 meg.

My test case had 15 tables that were more than 1 gigabyte each.  It seems like each table should span multiple inodes, so even if there is locking at the inode level it looks like it won’t lock the entire table at once.  Still, it seems unlikely to me that every time a table is updated, reads from all the other parts of the table pointed to by the same inode are really blocked by an inode lock.  Yet that is what this document (NO LONGER EXISTS) suggests:

“During a read() system call, VxFS will acquire the inode lock in shared mode, allowing many processes to read a single file concurrently without lock contention. However, when a write() system call is made, VxFS will attempt to acquire the lock in exclusive mode. The exclusive lock allows only one write per file to be in progress at a time, and also blocks other processes reading the file. These locks on the VxFS inode can cause serious performance problems when there are one or more file writers and multiple file readers.”

It uses the term “file” but I assume that if you have a large file with multiple inodes it will lock just the pieces associated with the one inode that points to the blocks being written.  The article goes on to explain how you can use the “cio” option to enable concurrent IO and eliminate this inode contention, so writers no longer block readers.  But, I’ve been testing with just the direct IO options and not the cio option and seeing great results.  So, would I see even better improvement with concurrent IO?

I didn’t want to mess with our current filesystem mount options since testing had proven them to be so effective but I found that in glance, a performance monitoring tool like top, you have an option to display inode waits.  So, I took a test that was running with direct IO and had 15 merge statements loading data into the same empty table at once and ran glance to see if there were any inode waits.  There were not:

(glance screenshot showing zero inode waits)

So, I don’t know if I can depend on this statistic in glance or not.  It appears that the direct IO mount options are all we need:

mincache=direct,convosync=direct

filesystemio_options=DIRECTIO

There may be some case within Oracle 11.2.0.3 on HP-UX 11.31 where you can be hampered by inode lock contention despite having direct IO enabled, but my tests have not confirmed it, and I’ve banged pretty hard on my test system with a couple of different types of tests.

– Bobby
