I’m on call this week so I didn’t plan to do any blog posting but I had an interesting performance issue this morning that I thought would be helpful to document. This is somewhat redundant to earlier posts but maybe another real example would be helpful to others (and myself as I try to explain it).
So, I got paged at 5:16 AM today because the session count had gone up on a key database. I connected using Toad and saw many active sessions. A number were hung on row locks but others were running a normal query. At first I thought the locks were the issue until I looked at the AWR report.
I executed DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT, waited a couple of minutes, and then ran it again. This gave me a short time period while the problem was occurring. These were the top events:
Top 5 Timed Foreground Events
Event | Waits | Time(s) | Avg wait (ms) | % DB time | Wait Class |
---|---|---|---|---|---|
enq: TX – row lock contention | 11 | 517 | 47038 | 24.97 | Application |
db file scattered read | 74,786 | 513 | 7 | 24.75 | User I/O |
db file sequential read | 61,708 | 448 | 7 | 21.62 | User I/O |
DB CPU | 431 | 20.82 | |||
read by other session | 47,482 | 416 | 9 | 20.08 | User I/O |
So, this pointed to the locks, but then I looked at the top sql:
Elapsed Time (s) | Executions | Elapsed Time per Exec (s) | %Total | %CPU | %IO | SQL Id | ||
---|---|---|---|---|---|---|---|---|
1,447.68 | 106 | 13.66 | 69.86 | 25.24 | 73.68 | b9ad7h2rvtxwf |
This is a pretty normal query but there isn’t anything on our system that runs for more than a fraction of a second normally so a top query running for 13 seconds means there was a problem.
So, then I ran my sqlstat.sql(updated) script for the one query b9ad7h2rvtxwf. Here were the results (edited to fit on the blog):
SQL_ID PLAN_HASH_VALUE END_INTERVAL_TIME EXECUTIONS_DELTA Elapsed Average ms CPU Average ms IO Average ms ------------- --------------- ------------------------- ---------------- ------------------ -------------- ------------- b9ad7h2rvtxwf 3569562598 22-OCT-13 04.00.09.971 PM 5058 147.520418 55.2906287 84.5061609 b9ad7h2rvtxwf 3569562598 22-OCT-13 05.00.29.056 PM 3553 138.164556 52.4176752 81.5798179 b9ad7h2rvtxwf 3569562598 22-OCT-13 06.00.48.108 PM 1744 131.999904 50.7224771 78.9727196 b9ad7h2rvtxwf 3569562598 22-OCT-13 07.00.07.435 PM 1151 129.623057 42.0590791 85.9294535 b9ad7h2rvtxwf 3569562598 22-OCT-13 08.00.28.121 PM 872 120.398335 36.7775229 83.2977339 b9ad7h2rvtxwf 3569562598 22-OCT-13 09.00.49.553 PM 604 132.635639 33.8245033 97.5104901 b9ad7h2rvtxwf 3569562598 22-OCT-13 10.00.09.004 PM 495 133.393253 33.1111111 99.2305293 b9ad7h2rvtxwf 405424828 22-OCT-13 11.00.29.567 PM 121 34966.0669 5395.86777 28047.5075 b9ad7h2rvtxwf 3569562598 22-OCT-13 11.00.29.567 PM 310 194.172239 34.7419355 159.561652 b9ad7h2rvtxwf 405424828 23-OCT-13 12.00.49.580 AM 367 13515.5776 4255.88556 9348.67928 b9ad7h2rvtxwf 405424828 23-OCT-13 01.00.08.933 AM 198 10156.8853 4131.56566 6100.66929 b9ad7h2rvtxwf 405424828 23-OCT-13 02.00.28.515 AM 121 13047.2503 4077.27273 9055.64753 b9ad7h2rvtxwf 405424828 23-OCT-13 03.00.48.829 AM 106 14410.4077 4095.75472 10457.2538 b9ad7h2rvtxwf 405424828 23-OCT-13 04.00.08.426 AM 132 13294.6225 3334.62121 10180.9116 b9ad7h2rvtxwf 405424828 23-OCT-13 05.00.29.523 AM 408 13075.3841 3327.35294 9828.64507 b9ad7h2rvtxwf 405424828 23-OCT-13 06.00.51.458 AM 1321 13314.1561 3119.74262 10091.0076 b9ad7h2rvtxwf 405424828 23-OCT-13 07.00.13.248 AM 3023 15692.5471 3126.99636 12361.605 b9ad7h2rvtxwf 405424828 23-OCT-13 07.39.33.767 AM 1924 16933.1762 3159.12682 13704.74 b9ad7h2rvtxwf 405424828 23-OCT-13 07.41.17.993 AM 106 13657.3272 3446.88679 10062.0426
So, between 10 and 11 pm last night the new plan 405424828 started being used and it resulted in the query that normally runs for 133 milliseconds (.1 seconds) running instead for 10091 ms = 10 seconds. Not good when the query is run over 1000 times per hour.
So, I copied the sqlt scripts from another database server and got into the utl directory and ran the following commands:
-- build script to load profile @coe_xfr_sql_profile.sql b9ad7h2rvtxwf 3569562598 -- run generated script @coe_xfr_sql_profile_b9ad7h2rvtxwf_3569562598.sql
Then the performance issues all went away because b9ad7h2rvtxwf went back to running in .1 seconds. Here is a previous post related to using sqlt this way.
The thing that strikes me about this scenario is that it really isn’t very hard to use this method to resolve these kind of issues. All you need to do is use the awr report and sqlstat.sql script to identify the sql_id and the good plan_hash_value and then you can force it to run this way.
– Bobby
Bobby – Thanks for sharing. I am supporting large PeopleSoft client and was looking for better tool to deal with those pesky Cobol queries from payroll. Sounds like this seems to be the perfect candidate? Have you done any calc tuning with SQLT yet? Thanks again.
I haven’t used SQLT and its sql profile scripts to tune payroll batch processes but I think you are right that this is a good candidate. The payroll processes should have static sql statements with bind variables. So, if you find that the plan changes for the worse on some SQL that is part of a calc job you could use the same process to force the previous good plan to run the next time you run the job. In the past I’ve used hints to do the same thing but that requires a change to the code. I’ve been tuning PeopleSoft since 1995, FYI.
Pingback: Using hints with coe_xfr_sql_profile.sql | Bobby Durrett's DBA Blog
It really helped in fixing bad plan with good plan in my environment
Great! SQL Profiles are great for quick fixes.
Hi Bobby
This is really awesome post . Many time we have faced this issue . I did this in lower environment and both phv to DBA to import profile from one sqlid to another. Could you please share content of coe_xfr_sql_profile.sql .Or explain what this script is doing . Thanks in advance.
Randhir
coe_xfr_sql_profile.sql is part of SQLT which you can download from Oracle support. Look at this support note:
SQLT (SQLTXPLAIN) – Tool that helps to diagnose a SQL statement performing poorly or one that produces wrong results [ID 215187.1]
coe_xfr_sql_profile.sql is in the utl directory. You pass it a sql_id to identify a SQL statement and a plan_hash_value to identify the plan that you want to lock it. coe_xfr_sql_profile.sql generates a script that will create a SQL Profile which forces the optimizer to choose the plan for the SQL statement.
You have to have access to Oracle’s support site to download SQLT. I can’t share it because it is Oracle’s script.
I have several blog posts about my use of coe_xfr_sql_profile.sql that might be helpful as well. You can search for coe_xfr_sql_profile.sql on my main page.
Bobby