SQLT Emergency Plan Change

I’m on call this week so I didn’t plan to do any blog posting but I had an interesting performance issue this morning that I thought would be helpful to document.  This is somewhat redundant to earlier posts but maybe another real example would be helpful to others (and myself as I try to explain it).

So, I got paged at 5:16 AM today because the session count had gone up on a key database.  I connected using Toad and saw many active sessions.  A number were hung on row locks but others were running a normal query.  At first I thought the locks were the issue until I looked at the AWR report.

I executed DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT, waited a couple of minutes, and then ran it again.  This gave me a short time period while the problem was occurring.  These were the top events:

Top 5 Timed Foreground Events

Event Waits Time(s) Avg wait (ms) % DB time Wait Class
enq: TX – row lock contention 11 517 47038 24.97 Application
db file scattered read 74,786 513 7 24.75 User I/O
db file sequential read 61,708 448 7 21.62 User I/O
DB CPU 431 20.82
read by other session 47,482 416 9 20.08 User I/O

So, this pointed to the locks, but then I looked at the top sql:

Elapsed Time (s) Executions Elapsed Time per Exec (s) %Total %CPU %IO SQL Id
1,447.68 106 13.66 69.86 25.24 73.68 b9ad7h2rvtxwf

This is a pretty normal query but there isn’t anything on our system that runs for more than a fraction of a second normally so a top query running for 13 seconds means there was a problem.

So, then I ran my sqlstat.sql(updated) script for the one query b9ad7h2rvtxwf.  Here were the results (edited to fit on the blog):

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms CPU Average ms IO Average ms
------------- --------------- ------------------------- ---------------- ------------------ -------------- -------------
b9ad7h2rvtxwf      3569562598 22-OCT-13 04.00.09.971 PM             5058         147.520418     55.2906287    84.5061609
b9ad7h2rvtxwf      3569562598 22-OCT-13 05.00.29.056 PM             3553         138.164556     52.4176752    81.5798179
b9ad7h2rvtxwf      3569562598 22-OCT-13 06.00.48.108 PM             1744         131.999904     50.7224771    78.9727196
b9ad7h2rvtxwf      3569562598 22-OCT-13 07.00.07.435 PM             1151         129.623057     42.0590791    85.9294535
b9ad7h2rvtxwf      3569562598 22-OCT-13 08.00.28.121 PM              872         120.398335     36.7775229    83.2977339
b9ad7h2rvtxwf      3569562598 22-OCT-13 09.00.49.553 PM              604         132.635639     33.8245033    97.5104901
b9ad7h2rvtxwf      3569562598 22-OCT-13 10.00.09.004 PM              495         133.393253     33.1111111    99.2305293
b9ad7h2rvtxwf       405424828 22-OCT-13 11.00.29.567 PM              121         34966.0669     5395.86777    28047.5075
b9ad7h2rvtxwf      3569562598 22-OCT-13 11.00.29.567 PM              310         194.172239     34.7419355    159.561652
b9ad7h2rvtxwf       405424828 23-OCT-13 12.00.49.580 AM              367         13515.5776     4255.88556    9348.67928
b9ad7h2rvtxwf       405424828 23-OCT-13 01.00.08.933 AM              198         10156.8853     4131.56566    6100.66929
b9ad7h2rvtxwf       405424828 23-OCT-13 02.00.28.515 AM              121         13047.2503     4077.27273    9055.64753
b9ad7h2rvtxwf       405424828 23-OCT-13 03.00.48.829 AM              106         14410.4077     4095.75472    10457.2538
b9ad7h2rvtxwf       405424828 23-OCT-13 04.00.08.426 AM              132         13294.6225     3334.62121    10180.9116
b9ad7h2rvtxwf       405424828 23-OCT-13 05.00.29.523 AM              408         13075.3841     3327.35294    9828.64507
b9ad7h2rvtxwf       405424828 23-OCT-13 06.00.51.458 AM             1321         13314.1561     3119.74262    10091.0076
b9ad7h2rvtxwf       405424828 23-OCT-13 07.00.13.248 AM             3023         15692.5471     3126.99636     12361.605
b9ad7h2rvtxwf       405424828 23-OCT-13 07.39.33.767 AM             1924         16933.1762     3159.12682      13704.74
b9ad7h2rvtxwf       405424828 23-OCT-13 07.41.17.993 AM              106         13657.3272     3446.88679    10062.0426

So, between 10 and 11 pm last night the new plan 405424828 started being used and it resulted in the query that normally runs for 133 milliseconds (.1 seconds) running instead for 10091 ms = 10 seconds.  Not good when the query is run over 1000 times per hour.

So, I copied the sqlt scripts from another database server and got into the utl directory and ran the following commands:

-- build script to load profile

@coe_xfr_sql_profile.sql b9ad7h2rvtxwf 3569562598

-- run generated script

@coe_xfr_sql_profile_b9ad7h2rvtxwf_3569562598.sql

Then the performance issues all went away because b9ad7h2rvtxwf went back to running in .1 seconds.  Here is a previous post related to using sqlt this way.

The thing that strikes me about this scenario is that it really isn’t very hard to use this method to resolve these kind of issues.  All you need to do is use the awr report and sqlstat.sql script to identify the sql_id and the good plan_hash_value and then you can force it to run this way.

– Bobby

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I have worked in the Information Technology field since 1989. I have a passion for Oracle database performance tuning because I enjoy challenging technical problems that require an understanding of computer science. I enjoy communicating with people about my work.
This entry was posted in Uncategorized. Bookmark the permalink.

7 Responses to SQLT Emergency Plan Change

  1. umyal2001 says:

    Bobby – Thanks for sharing. I am supporting large PeopleSoft client and was looking for better tool to deal with those pesky Cobol queries from payroll. Sounds like this seems to be the perfect candidate? Have you done any calc tuning with SQLT yet? Thanks again.

    • Bobby says:

      I haven’t used SQLT and its sql profile scripts to tune payroll batch processes but I think you are right that this is a good candidate. The payroll processes should have static sql statements with bind variables. So, if you find that the plan changes for the worse on some SQL that is part of a calc job you could use the same process to force the previous good plan to run the next time you run the job. In the past I’ve used hints to do the same thing but that requires a change to the code. I’ve been tuning PeopleSoft since 1995, FYI.

  2. Pingback: Using hints with coe_xfr_sql_profile.sql | Bobby Durrett's DBA Blog

  3. Vineeth says:

    It really helped in fixing bad plan with good plan in my environment

  4. Randhir says:

    Hi Bobby
    This is really awesome post . Many time we have faced this issue . I did this in lower environment and both phv to DBA to import profile from one sqlid to another. Could you please share content of coe_xfr_sql_profile.sql .Or explain what this script is doing . Thanks in advance.

    Randhir

    • Bobby says:

      coe_xfr_sql_profile.sql is part of SQLT which you can download from Oracle support. Look at this support note:

      SQLT (SQLTXPLAIN) – Tool that helps to diagnose a SQL statement performing poorly or one that produces wrong results [ID 215187.1]

      coe_xfr_sql_profile.sql is in the utl directory. You pass it a sql_id to identify a SQL statement and a plan_hash_value to identify the plan that you want to lock it. coe_xfr_sql_profile.sql generates a script that will create a SQL Profile which forces the optimizer to choose the plan for the SQL statement.

      You have to have access to Oracle’s support site to download SQLT. I can’t share it because it is Oracle’s script.

      I have several blog posts about my use of coe_xfr_sql_profile.sql that might be helpful as well. You can search for coe_xfr_sql_profile.sql on my main page.

      Bobby

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.