Alerting on plans that change for the worse

I’ve uploaded a monitoring script that I have worked on: zip.

The script alerts you when the optimizer runs a SQL statement with a potentially new and inefficient plan so you can intervene.  This script improves upon my earlier script which only alerts you to SQL statements running with new plans.  The new script compares the average elapsed time of the current plan with the average of the most often executed plan.  If the new plan averages more than ten times the most often executed plan then the script alerts you to a possible new slow plan.  The elapsed time for the current plan comes from V$SQL and the time for the most often executed plan comes from DBA_HIST_SQLSTAT.

Here is an example output from the first test case:

SQL_ID        PLAN_HASH_VALUE
------------- ---------------
a1fw5xjcwkjqx      1357081020

There are two test cases included in the script.  The first test case has a query which uses an index and the plan changes when I drop the index.

If you look at the output for the first test case you see the good plan:

SQL_ID  a1fw5xjcwkjqx, child number 0
-------------------------------------
select /* 9 */ owner,table_name from test where owner='XYZ' and
table_name='XYZ'

Plan hash value: 268773832

---------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------
|   0 | SELECT STATEMENT |       |       |       |    58 (100)|
|*  1 |  INDEX RANGE SCAN| TESTI | 16222 |   221K|    58   (0)|
---------------------------------------------------------------

The good plan uses the index.  Here is the bad plan.  Note how the sql_id and plan_hash_value correspond to the output of the monitor script.

SQL_ID  a1fw5xjcwkjqx, child number 0
-------------------------------------
select /* 9 */ owner,table_name from test where owner='XYZ' and
table_name='XYZ'

Plan hash value: 1357081020

---------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       | 15708 (100)|
|*  1 |  TABLE ACCESS FULL| TEST |  8111 |   110K| 15708   (1)|
---------------------------------------------------------------

I have this running in a cron job on a development server and I plan to put the script into our production server next week and set it up to alert me with an email when the optimizer runs SQL statements with potentially  new and inefficient plans.

– Bobby

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I have worked in the Information Technology field since 1989. I have a passion for Oracle database performance tuning because I enjoy challenging technical problems that require an understanding of computer science. I enjoy communicating with people about my work.
This entry was posted in Uncategorized. Bookmark the permalink.

13 Responses to Alerting on plans that change for the worse

  1. Ashwani says:

    Hi Bobby,

    could your share shell(sh/ksh) script which we need to schdule in cron ?, actually am looking script which will trigger email when plan change for any sql_id, if no change then no email

    • Bobby says:

      I cannot share the entire script for security reasons. I took out the environment variables and paths. But here is as much of my script as I can share:

      rm planchangemonitor.log
      sqlplus system/MYPASSWORD << EOF
      set linesize 160
      set pagesize 1000
      set echo on
      set termout on
      set trimspool on
      set define off
      -- status active
      -- this plan isn't the most commonly
      -- run plan.
      -- average this plan in v\$sql > 10 x average of
      -- most common plan
      -- status active sessions
      -- sql_id and plan_hash_value
      -- elapsed and executions
      -- max elapsed for the sql
      -- eliminate pl/sql which has
      -- plan hash value of 0
      -- 05/01/2019 Bobby Durrett
      -- Added filter to require sql statement
      -- to have been used at least 3 days ago
      drop table active_sql;
      create table active_sql as
      select
      vs.sql_id,
      sq.plan_hash_value,
      max(sq.ELAPSED_TIME) elapsed,
      max(sq.executions) executions
      from
      v\$session vs,
      v\$sql sq
      where
      vs.sql_id=sq.sql_id and
      vs.SQL_CHILD_NUMBER=sq.child_number and
      vs.status='ACTIVE' and
      sq.plan_hash_value <> 0 and
      vs.sql_id in
      (
      select
      ss.sql_id
      from
      dba_hist_sqlstat ss,
      dba_hist_snapshot sn
      where
      ss.SNAP_ID = sn.SNAP_ID and
      ss.DBID = sn.DBID and
      ss.INSTANCE_NUMBER = sn.INSTANCE_NUMBER and
      to_date(to_char(sn.BEGIN_INTERVAL_TIME,'YYYY-MM-DD'),'YYYY-MM-DD') < sysdate - 3
      )
      group by
      vs.sql_id,
      sq.plan_hash_value;
      -- to get the most frequently
      -- used plan first get the number
      -- of exections by plan
      -- for each of the active sqls
      drop table plan_executions;
      create table plan_executions as
      select
      ss.sql_id,
      ss.plan_hash_value,
      sum(ss.executions_delta) total_executions
      from
      dba_hist_sqlstat ss,
      active_sql a
      where
      a.sql_id=ss.sql_id
      group by
      ss.sql_id,
      ss.plan_hash_value;
      -- use the previous table to get
      -- the plans that are most frequently
      -- used. note that two plans could
      -- have the same number of executions
      -- but this is unlikely.
      drop table most_frequent_executions;
      create table most_frequent_executions as
      select
      pe1.sql_id,
      pe1.plan_hash_value
      from plan_executions pe1
      where
      pe1.total_executions =
      (select max(pe2.total_executions)
      from plan_executions pe2
      where
      pe1.sql_id=pe2.sql_id);
      -- handle special case of two plans with
      -- same number of executions.
      -- pick one with highest plan value
      -- just to eliminate dups.
      drop table most_frequent_nodups;
      create table most_frequent_nodups as
      select
      mfe1.sql_id,
      mfe1.plan_hash_value
      from most_frequent_executions mfe1
      where
      mfe1.plan_hash_value =
      (select max(mfe2.plan_hash_value)
      from most_frequent_executions mfe2
      where
      mfe1.sql_id=mfe2.sql_id);
      -- get list of active sql that
      -- are not running the most
      -- frequently executed plan
      drop table not_most_freq;
      create table not_most_freq as
      select * from active_sql
      where
      (sql_id,plan_hash_value) not in
      (select sql_id,plan_hash_value from most_frequent_nodups);
      -- working on this part of the logic:
      -- average this plan in v\$sql > 10 x average of
      -- most common plan
      -- get average elapsed of most
      -- frequently executed plans
      -- add 1 to handle case of 0 executions
      drop table avg_elapsed_most_f;
      create table avg_elapsed_most_f as
      select
      ss.sql_id,
      ss.plan_hash_value,
      sum(ss.ELAPSED_TIME_DELTA)/(sum(ss.executions_delta)+1) avg_elapsed
      from DBA_HIST_SQLSTAT ss,
      most_frequent_nodups nd
      where
      ss.sql_id = nd.sql_id and
      ss.plan_hash_value = nd.plan_hash_value
      group by
      ss.sql_id,
      ss.plan_hash_value;
      -- get list of the sqls that are running
      -- the plan that isn't most frequently
      -- executed and has an average elapsed
      -- more than 10 times the average of
      -- the most frequently executed plan
      -- add 1 to executions to prevent
      -- divide by zero
      drop table more_than_10x;
      create table more_than_10x as
      select
      n.sql_id,
      n.plan_hash_value
      from
      not_most_freq n,
      avg_elapsed_most_f m
      where
      (n.elapsed/(n.executions+1)) > 10 * m.avg_elapsed and
      n.sql_id=m.sql_id;
      spool planchangemonitor.log
      select name db_name from v\$database;
      -- The listed sql_id and plan_hash_value items correspond to
      -- sql statements that have plans that may be
      -- inefficient and need to be investigated.
      -- The session id and username are included if a
      -- session is currently running the sql with the plan.
      select
      'CHANGED '||'PLAN' flag,
      m.sql_id,
      m.plan_hash_value,
      s.sid,
      s.username
      from
      more_than_10x m,
      v\$session s,
      v\$sql q
      where
      m.sql_id=s.sql_id(+) and
      m.plan_hash_value=q.plan_hash_value(+) and
      s.sql_id=q.sql_id and
      s.SQL_CHILD_NUMBER=q.child_number and
      m.sql_id not in ('564jfmh27jw42','2216nkkfppytv','5k2b3qsy3b30r','21g9kxb9z02xb','g0v5p7v7cdjc5','1j2dz1yfp7xb5','8gtvzq6yjmhfa','5crdk2jjaw300') and
      s.username <> 'MYDBAUSER'
      order by
      m.sql_id,
      m.plan_hash_value,
      s.sid,
      s.username;
      spool off
      EOF
      #=====================================================================
      # Alert on failure
      #=====================================================================
      if cat planchangemonitor.log | grep -q "CHANGED PLAN"
      then
      cat planchangemonitor.mailheader planchangemonitor.log | mail myemail@mydomain.com
      fi

      Bobby

      • Anonymous says:

        Thanks bobby for the brilliant scripts. Can we use the script directly?
        We are not allowed to create any table in prod but we can use centralised server to create table using data via dblink and feed it

        In the script do we need to add ? As you said its portion of scripts correct ?

  2. Pingback: December 26 Plan Change | Bobby Durrett's DBA Blog

  3. Anonymous says:

    Thanks

  4. Murugan says:

    Bobby ,
    Thanks for the script. May i know why there is discrepancies with respect to monitor query above and zip file ?

    • Bobby says:

      The zip file contains a sqlplus script that I wrote do monitor for SQL plan changes. The shell script listed above is that same script converted to run as a shell script on HP Unix. The logic is the same. I try to find queries that are running a plan that is taking 10 times longer than the best plan it ran in the past.

  5. Krish says:

    Thanks for the scripts. Why there is discrepancies with scripts?
    Zip file scripts only taking live sql however posted scripts filtering and taking only specific days of interest.

    • Bobby says:

      The shell script which I posted in a comment in 2021 is based on the real script I ran as of that date. The original zip was from 2014 which I had just written but did not have experience using it in production. Since I have been using this in production for several years I have added filters to make it more useful by eliminating false positives. If someone wants to use it, try it without the filters and then add your own if you see it alert on plan changes that do not matter. What is interesting is that plans change all the time. I had no idea the system was constantly changing in this way until I started running this monitoring script.

      Bobby

  6. Krish says:

    Thanks. Do you have any scripts after identifying plan flip it will invoke coe_xfr_sql_profile automatically and pin the best plan with out manual intervention?

    • Bobby says:

      No. I do not want to make a change in production without first evaluating the plan change. There are some queries that have to change plans based on the values passed to them in bind variables. I just use the script for alerting. I’m not comfortable just letting is put in the profile automatically.

      Thanks for the question. It is a good one and I have thought about it but so far have rejected automatically putting in the profile.

      Bobby

  7. Krish says:

    Hi Bobby,
    Thank you for the response. Appreciate it.

    We are trying to target few databases where literals are in play, unfortunately application code can’t be modified. Thus causes plan flip which can be fixed with force_match=true using CoE script. However quite often they are coming up with different queries with literals and some times profiles are getting obsolete due to changes in underlying object or statistics. To arrest plan flip we are trying to automate this CoE. So thought of checking with you. Ideas and inputs are greatly appreciated. Anyway once again thanks for your stunning blog post and supporting oracle community. Keep hustling and doing awesome job.

    You are one of my inspirations. Especially yours PT skills and python scripts are outstanding. Kudos to you.

Leave a Reply to Ashwani Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.