Exadata flash cache latency: 0.4 milliseconds

The central point of my Exadata presentation is that, during a Smart Scan, data flows from the disks and into and out of memory caches differently on Exadata than it does in non-Exadata Oracle databases.  An interesting consequence of the way data is cached during Smart Scans is that it is cached in flash memory, which is slower than regular RAM, while non-Smart Scan table accesses may use the faster block buffer cache in the database server’s RAM.  My presentation says it takes about 1 millisecond to read from flash memory versus 10 nanoseconds for regular RAM.  I got those numbers from an Exadata class put on by Oracle and from memory statistics published on the internet.  But I couldn’t remember whether I had ever verified the 1 millisecond number for flash cache access experimentally, so I ran a simple test, which came back with about 0.4 milliseconds to read 32K from the flash cache.  That is still much slower than RAM, but faster than I thought.

Here are my test scripts, logs, and a spreadsheet calculating the result: zip

I slightly modified the script from my previous post to set autotrace on and timing on.
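
The relevant SQL*Plus settings look something like this (just a sketch of the change; the full script and log are in the zip above):

set autotrace on
set timing on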

SQL> select sum(BLOCKS) from test;

SUM(BLOCKS)
-----------
 1983309312

Elapsed: 00:00:00.73

The query ran in .73 seconds.

NAME                                 VALUE
------------------------------- ----------
physical read total IO requests       1841
cell flash cache read hits            1837
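
Those two numbers come from the session statistics; a query along these lines pulls them (a sketch mirroring the v$sysstat/v$mystat join I use in the next post):

select a.name,b.value from v$sysstat a,
v$mystat b
where a.statistic# = b.statistic# and
(a.name like '%flash cache read hits' or
 a.name ='physical read total IO requests');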

Almost every IO was satisfied from flash memory, so I’ll do the calculation as if all of the query’s execution time is accounted for by the flash cache reads.

.73 seconds X 1000 milliseconds per second = 730 ms

730 ms/1837 flash cache reads = 0.4 milliseconds per flash cache read (rounded up from 0.397)

So, this isn’t perfect, but it is one piece of evidence that flash cache reads take about 0.4 milliseconds on the Exadata V2 system this was tested on.

– Bobby

P.S.  I determined that the flash cache reads were 32K using the following information:

Statistics
----------------------------------------------------------
  7239  physical reads

NAME                                MB
--------------------------- ----------
physical read total bytes   56.5546875

NAME                                 VALUE
------------------------------- ----------
physical read total IO requests       1841

56.55 MB read/1841 physical IOs is about 31.5K, or roughly 32K per read.  Also, 7239 physical blocks read/1841 physical IOs is about four 8K blocks per read = 32K.  So, these tests appear to show that 32K flash cache reads take about 0.4 milliseconds.

 


Flash cache used when smart scans are disabled

I just wanted to double-check something on my Exadata slides.  I think I had verified this before but became unsure whether something I was saying was true: namely, that when you run a normal, non-smart scan query on Exadata, the cell storage servers still check the flash cache for cached disk blocks.  So, I tried a full scan with smart scans disabled and it appears, based on statistics kept by the database, that the flash cache was used.  Here is the (edited for clarity) output of my test script:

SQL> alter session set cell_offload_processing = FALSE;

SQL> select sum(BLOCKS) from test;

SUM(BLOCKS)
-----------
 2213273152

SQL> select a.name,b.value/1024/1024 MB from v$sysstat a,
  2  v$mystat b
  3  where a.statistic# = b.statistic# and
  4  (a.name in ('physical read total bytes',
  5  'physical write total bytes',
  6  'cell IO uncompressed bytes') or a.name like 'cell phy%');

NAME                                                              MB
--------------------------------------------------------------- ----
physical read total bytes                                       56.5
physical write total bytes                                         0
cell physical IO interconnect bytes                             56.5
cell physical IO bytes sent directly to DB node to balance CPU u    0
cell physical IO bytes saved during optimized file creation        0
cell physical IO bytes saved during optimized RMAN file restore    0
cell physical IO bytes eligible for predicate offload              0
cell physical IO bytes saved by storage index                      0
cell physical IO interconnect bytes returned by smart scan         0
cell IO uncompressed bytes                                         0

SQL> select a.name,b.value from v$sysstat a,
  2  v$mystat b
  3  where a.statistic# = b.statistic# and
  4  (a.name like '%flash cache read hits' or
  5   a.name ='physical read total IO requests');

NAME                                                           VALUE
-------------------------------------------------------------- -----
physical read total IO requests                                 1839
cell flash cache read hits                                      1805

The alter session command turned off smart scans.

The zero in the “cell physical IO interconnect bytes returned by smart scan” statistic indicates that smart scans were not used.

The “cell flash cache read hits” statistic being greater than zero indicates that the flash cache was used.

So, assuming we can trust these statistics, this test shows that non-smart scan reads of data blocks from the cell storage servers can be satisfied from the flash cache.
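
To put the session back to its default behavior, the parameter can simply be flipped back (or the session ended).  A minimal sketch, assuming the default value of TRUE:

SQL> alter session set cell_offload_processing = TRUE;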

– Bobby


Disabling cardinality feedback

I ran into a case today where I had tuned a query by putting a cardinality hint into a view, but the optimizer changed the plan anyway, overriding the cardinality hint with cardinality feedback.  So, I found out how to turn cardinality feedback off in case you want the cardinality hint to stick.  I built a simple test case for this post.

Here is the test query:

select /*+ cardinality(test,1) */ count(*) from test;

The first time it runs the plan shows that the optimizer thinks there is one row in the test table:

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |   292 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST |     1 |   292   (1)| 00:00:04 |
-------------------------------------------------------------------

But the second time cardinality feedback tells the optimizer the truth:

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |   292 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST | 31467 |   292   (1)| 00:00:04 |
-------------------------------------------------------------------

Note
-----
   - cardinality feedback used for this statement

How annoying!  It ignored my cardinality hint. But you can add this hint to turn off cardinality feedback:

opt_param('_optimizer_use_feedback' 'false')

and then you are back to the original plan with rows = 1.  This doesn’t prove that the change improves performance, just that the plan shows the cardinality I’m trying to make the optimizer use.
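
For example, in my test case the two hints can sit side by side; this is a sketch of how I would combine them (the actual test script may have done it differently):

select /*+ cardinality(test,1) opt_param('_optimizer_use_feedback' 'false') */ count(*) from test;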

– Bobby

 

 


Finding bind variable values using DBA_HIST_SQLBIND

Whenever I need to test a query that has bind variables, I usually go to the development team to ask what typical values are or what the values were the last time the query ran.  I’m pretty sure that in the past, when I went looking for a DBA_HIST view that held bind variables, I came up empty.  Today I’m working on tuning a query with a bind variable, and I tried to find the value of the bind variable last Sunday using the DBA_HIST_SQLBIND view.  Strangely enough, it had exactly what I wanted.  I’m not sure in which cases this view captures the variables and in which it doesn’t, but it worked for me today, so it may be useful to others in certain cases.

I had the sql_id of the query from an AWR report spanning the period of high CPU usage on Sunday: 40wpuup08vws6.  I ran this query to get the bind variable values for all captured executions of that sql_id.

select 
sn.END_INTERVAL_TIME,
sb.NAME,
sb.VALUE_STRING 
from 
DBA_HIST_SQLBIND sb,
DBA_HIST_SNAPSHOT sn
where 
sb.sql_id='40wpuup08vws6' and
sb.WAS_CAPTURED='YES' and
sn.snap_id=sb.snap_id
order by 
sb.snap_id,
sb.NAME;

It produced this output for the bind variable B1.

END_INTERVAL_TIME          NAM VALUE_STRING
-------------------------- --- -----------------
03-FEB-13 02.00.32.733 AM  :B1 02/02/13 00:00:00
03-FEB-13 03.00.36.316 AM  :B1 02/02/13 00:00:00
10-FEB-13 02.00.29.975 AM  :B1 02/09/13 00:00:00
10-FEB-13 03.00.23.292 AM  :B1 02/09/13 00:00:00
17-FEB-13 02.00.36.688 AM  :B1 02/16/13 00:00:00
17-FEB-13 03.00.06.374 AM  :B1 02/16/13 00:00:00
24-FEB-13 01.00.33.691 AM  :B1 02/23/13 00:00:00
24-FEB-13 02.00.20.269 AM  :B1 02/23/13 00:00:00
24-FEB-13 03.00.16.811 AM  :B1 02/23/13 00:00:00
03-MAR-13 02.00.17.974 AM  :B1 03/02/13 00:00:00
03-MAR-13 03.00.33.340 AM  :B1 03/02/13 00:00:00
10-MAR-13 10.00.10.356 PM  :B1 03/09/13 00:00:00
10-MAR-13 11.00.43.467 PM  :B1 03/09/13 00:00:00
11-MAR-13 12.00.12.898 AM  :B1 03/09/13 00:00:00

So, you can see what value the date-type bind variable B1 contained each weekend that the query ran.  The query runs for multiple hours, which is why it spans multiple AWR snapshot intervals.
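
If you only care about a recent window, a filter on the snapshot end time narrows the output; here is a sketch (the interval predicate is my own addition, not part of the original script):

select
sn.END_INTERVAL_TIME,
sb.NAME,
sb.VALUE_STRING
from
DBA_HIST_SQLBIND sb,
DBA_HIST_SNAPSHOT sn
where
sb.sql_id='40wpuup08vws6' and
sb.WAS_CAPTURED='YES' and
sn.snap_id=sb.snap_id and
sn.END_INTERVAL_TIME > systimestamp - interval '7' day
order by
sb.snap_id,
sb.NAME;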

Here is a zip of my test script and its log: zip.

– Bobby


New version of my Exadata presentation

Here is a link to my updated Exadata PowerPoint presentation.

I’ve been trying to improve my Exadata talk for the Collaborate 13 conference.  I’ve done this talk at work and at the ECO conference in October, but I’m not completely happy with it.  The criticism I’ve received boils down to two things:

  1. The slides don’t stand on their own
  2. I jump into the middle of the detail without enough introduction

So, I’ve added comments to many of the slides.  These will serve as a reminder of what I want to say and will make the slides more meaningful to someone who only has the PowerPoint.

I’ve also added several slides showing the execution plan of a sample query, to do a better job of setting up the existing slides that discuss how data flows through an Exadata server compared with a normal server.  The point is that when the table is accessed, blocks are read, certain rows are selected, and certain columns are projected.

Lastly, I added a slide on direct path reads, which shows how the buffer cache can be bypassed in some cases even on a normal Oracle database server.  This is also part of the introduction, since it provides background needed to understand how the Exadata Smart Scan bypasses the buffer cache.

If anyone has time to read through the slides and give me their feedback, I’d be happy to hear it.  Hopefully the net result will be a presentation that is useful both to me and my company and to those attending the conference.

– Bobby

P.S.  Edited again 3/22/2013


Implicit type conversion in where clause

I spent a lot of time yesterday digging into performance issues on a new database, only to find a simple issue: two tables were joined on a column that was a number on one table and a varchar2 on the other.  That column was a varchar2(4) on six or eight tables, but one table – and it had to be the one with the most rows – had the same column defined as number(4), and as a result the queries were running with inefficient plans.  All I had to do to find this was get a plan of the sample query I was working on and look for TO_NUMBER in the predicates section, but of course I spent hours looking at other things first.  So, I thought I would document how to make a quick check for this kind of type conversion.  It is “implicit” type conversion because there is no TO_NUMBER in the SQL itself; the optimizer has to add the TO_NUMBER so it can compare the character column to the number column.

Here is how I setup the example tables to mimic the situation I saw yesterday:

-- table with number column

create table ntest (div_nbr number(4));

-- table with character column

create table ctest (div_nbr varchar2(4));
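
To give the join something to work with, a couple of rows can be loaded like this (illustrative only; the script in the zip may populate the tables differently):

-- load one matching row into each table

insert into ntest values (1);
insert into ctest values ('1');
commit;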

Here is a sample query that joins these two tables on the columns of the same name but different types:

-- join on the column

select count(*)
from ntest,ctest
where ntest.div_nbr=ctest.div_nbr;

Here is how I get the plan, which includes the predicates section with the TO_NUMBER conversion function:

-- get plan

select * from table(dbms_xplan.display_cursor(null,null,'ALL'));

Finally, here is the plan with the predicates section with the TO_NUMBER function:

------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |       |       |     5 (100)|
|   1 |  SORT AGGREGATE     |       |     1 |    17 |            |
|*  2 |   HASH JOIN         |       |     1 |    17 |     5  (20)|
|   3 |    TABLE ACCESS FULL| NTEST |     1 |    13 |     2   (0)|
|   4 |    TABLE ACCESS FULL| CTEST |     1 |     4 |     2   (0)|
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("NTEST"."DIV_NBR"=TO_NUMBER("CTEST"."DIV_NBR"))

Step 2 of the plan is the hash join, and the predicate information section shows how the two columns are joined for this step: CTEST.DIV_NBR has to be converted to a number before it can be compared to NTEST.DIV_NBR.

So, I recommend adding a check for TO_NUMBER in the predicate information section of the plan to your toolkit of things to look for when tuning a query.  This kind of thing shows up more often than you would think, and it is hard to find because if you look at the query itself you just see a normal join on the DIV_NBR column.
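
If you want to sweep for the same problem across plans already in the shared pool, something along these lines can flag candidates (a rough sketch of my own, not part of the script in the zip below):

-- find cached plans whose predicates contain a TO_NUMBER conversion

select sql_id, id, operation, access_predicates, filter_predicates
from v$sql_plan
where access_predicates like '%TO_NUMBER(%'
   or filter_predicates like '%TO_NUMBER(%';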

Our resolution of this issue was to change the type of the column on the one table to varchar2(4).  This, combined with tuning optimizer_index_cost_adj, resulted in dramatic improvements in performance on our test system.

Here is a zip of the script I used to show how to find TO_NUMBER in the predicate section.

– Bobby


Found an archive of my geocities blog

OK.  This is really cool.  I found an archive of my old geocities blog: archive

I thought geocities was gone forever but it isn’t.  Quite cool.

– Bobby


Don’t focus on cost of execution plan

I don’t focus on the Oracle optimizer’s cost of a particular execution plan when I’m tuning a query, and I’m worried that many of the developers and DBAs I’ve talked with about query tuning are too focused on lowering that cost when attempting to tune a query.

I hear comments like this all the time when talking about making some change X to improve the performance of a query: “I changed X and the cost in the explain plan was so much lower.”  X could be adding an index or a hint, making a parameter change, etc.

I just cringe inside when I hear this, and I hope I am gracious about it, but all the while I’m afraid that the person I’m talking to is missing a key concept of Oracle query tuning: in many cases the optimizer’s estimated cost is far off from reality.  So, making a change and seeing the cost of the plan go down really doesn’t mean much.  It could correspond to a real improvement in the query’s run time, or it could mean just the opposite.  It is kind of like the buffer cache hit ratio: sometimes the ratio really means something and sometimes it doesn’t.

Instead of focusing on the cost, I focus on the plan itself.  Based on my investigation of the tables in the query and how many rows will be accessed from each, I come up with an idea of a plan that should be better than the one I’m improving.  So, my question about a proposed change X becomes “Does change X cause the plan to change to the one I determined to be better?”

I attempted to lay out this approach in my Intro to SQL Tuning presentation.  For me, query tuning is kind of like programming: I figure out the best way to do the steps of the plan based on my own study of the existing tables, and then I figure out what change to make to get the optimizer to run the query my way.  This is a time-consuming approach, but I only spend the time on queries that really need it.  Who has time to tune every query?

So, my recommended approach to query tuning is to figure out a good plan on your own and then to figure out how to get the database to run it your way.  I don’t recommend focusing on what the cost of the new plan is compared with the original plan.

– Bobby

 

 


Delphix First Month

My company is trying out a new product called Delphix.

We have had Delphix installed for about a month, and I wanted this post to record my first impressions for others who may be considering it.  Essentially, Delphix provides fast copies of databases, so a terabyte database can be refreshed in minutes.  My most recent example took three minutes for a development database that is just over one terabyte.

Delphix is an appliance running Open Solaris in a VMWare ESX virtual machine; at least, that is how we have it installed.  You license Delphix by the number of CPUs in the virtual machine, so you can use it as much as you want as long as the CPUs can handle the load.  Your database file systems are connected to the Delphix appliance using NFS, so the critical component is the network connection between your target database servers and the appliance.  In our case we put in new 10 gigabit Ethernet hardware to connect our HP-UX Itanium virtual machines (the target database servers) to the Delphix VM, and we made sure they were all on the same IP subnet with nothing but a high-speed switch between them.  After some initial testing, Delphix’s performance and support teams made some network configuration changes that helped with an initial performance/hang issue.  It appears to have had something to do with TCP/IP packet flow control, but it works fine now.

The Delphix appliance uses RMAN to pull data from a source database.  The source database has to be in archivelog mode, and for best performance you need to enable block change tracking.  Delphix does an initial full level 0 backup and then regular incremental level 1 backups.  To create a new clone of the original database, you pick one of these incremental backups – called snapshots – and point to the target machine, and Delphix automatically mounts the appropriate NFS filesystems with the datafiles, tempfiles, redo logs, etc. and brings up a clone of the source system with the new name you designate.

Also, Delphix takes snapshots of the clone databases – called VDBs – on a regular basis, so these can be used as backups of the clones themselves, and you can even clone the clones – sounds like Star Wars: The Clone Wars, doesn’t it?

One interesting challenge is space management.  Each new clone takes up very little space – until you start updating it.  I did some tests where I copied a 30 gigabyte table to a new table, and the VDB, which previously took less than 100 megabytes of space, then took gigabytes.  Everything is compressed, so I think it was less than half of the 30 gigabytes, but the point is that the more the copies get updated, the more disk space you need for them.  If you have an application that needs copies quickly but the copies aren’t heavily updated, you can make many copies with very little disk space.  I think this kind of capability opens up possibilities we haven’t considered before, since without Delphix each copy on normal disk storage takes up as much space as the original.  That is, before Delphix we had to minimize the number of production copies we used for development and testing.  Imagine how our processes might change if we can spin off a quick production copy, use it for a short while, and then get rid of it, all with minimal additional disk space and with the clone created in minutes.

Right now I’m working with a Delphix consultant on a script that the developers can use to refresh their own database from the most recent snapshot of its source.  Delphix has a couple of ways it can be controlled from a Unix shell script.  One way is with ssh and their command line interface (CLI).  You can set up ssh with a public and private key so you can log into the VM as the “delphix admin” user and then run a set of commands in Delphix’s proprietary language.  Delphix also has a GUI, but it is nice that you can run a script to do things like kick off a refresh or a clone.  There is also a web service, and they gave me a sample Python script that communicates with the VM through it, but I haven’t delved into that yet.  For one thing, we don’t have Python installed on our HP-UX servers.  You can access Delphix’s documentation online here, including the CLI commands.

Well, I’m not trying to sell Delphix to anyone, but I thought it would be good to put down some of my experiences.  The database cloning and refreshing is remarkably fast.  Time will tell how the NFS performance over the network holds up during heavy development and testing, but assuming we manage that performance and the disk space usage, I’m pleased with the clone times the Delphix VM delivers.

– Bobby


Collaborate 13 paper posted

My Exadata talk was accepted by the IOUG, and the deadline for the slides and paper is rapidly approaching.  I’ve had the slides for a while but have never written the presentation down as a paper before.  So, I’ve posted the paper along with the slides on this blog.  You can access them both in the zip file here.

My presentation is number 988, titled “Exadata Distinctives”.  Yes, I know that “distinctives” is not a word.  But I think the presentation will be helpful to someone who is new to Exadata and the basics of how it works.  There were some key things I had to learn through Oracle’s Exadata class and my own experience, and I’ve tried to communicate the most important ones in the presentation.  The paper isn’t perfect, but I’ve tried to capture the most important points from the slides and the spoken presentation.  If you have time to review it and give me feedback, I’d be happy to hear your critique.

Also, the IOUG would like the speakers to encourage people to attend Collaborate 13.  You can help me win a T-shirt or an iPad by registering and saying I sent you.  Not that I really care about the T-shirt, but I do think the conference is worthwhile.  What I like is hearing from top people and getting ideas that I can follow up on later.  I usually go to most of the performance tuning talks, and each hour-long session gives me ideas about things I haven’t seen before and need to investigate after I get home.

Here is how the IOUG describes the conference:

Jointly hosted by the Independent Oracle Users Group (IOUG), Oracle Applications Users Group (OAUG) and Quest International Users Group (Quest), COLLABORATE 13 will provide all the real-world technical training you need – not sales pitches. The IOUG Forum presents hundreds of educational sessions on Oracle technology, led by the most informed and accomplished Oracle users and experts in the world, bringing 5,500+ Oracle technology and applications professionals to one venue for Oracle education, customer exchange and networking.

In addition to the expansive educational offerings, you’ll be able to test-drive the latest technology solutions in the COLLABORATE 13 Exhibitor Showcase. Packed with hundreds of Oracle vendors and partners, the Exhibitor Showcase is the one place to find innovative new products to maximize your business efficiencies and discover solutions for your existing Oracle environment.

I’d certainly encourage you to go if you can.  One key concept from one talk could easily be worth the cost of the entire trip for your company.

If you want to help me win the T-shirt, be sure to register with the code AD03 and write in “Bobby Durrett” when asked “How did you hear about Collaborate 13?”

– Bobby

