Exadata flash cache latency: 0.4 milliseconds

The central point of my Exadata presentation is that, during a Smart Scan, data flows from the disks and into and out of memory caches differently on Exadata than it does in non-Exadata Oracle databases.  An interesting consequence of the way data is cached during Smart Scans is that it is cached in flash memory, which is slower than regular RAM, while non-Smart Scan table accesses may use the faster block buffer cache in the database server’s RAM.  My presentation says it takes about 1 millisecond to read from flash memory versus 10 nanoseconds for regular RAM.  I got those numbers from an Exadata class put on by Oracle and from memory statistics published on the internet.  But I couldn’t remember whether I had ever verified the 1 millisecond number for flash cache access experimentally, so I ran a simple test, which came back with about 0.4 milliseconds to read 32K from the flash cache.  That is still much slower than RAM, but faster than I thought.

Here are my test scripts, logs, and a spreadsheet calculating the result: zip

I slightly modified the script from my previous post to set autotrace on and timing on.
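
The relevant SQL*Plus settings look something like this (just a sketch of the change; the full script and log are in the zip above):

set autotrace on
set timing on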

SQL> select sum(BLOCKS) from test;

SUM(BLOCKS)
-----------
 1983309312

Elapsed: 00:00:00.73

The query ran in .73 seconds.

NAME                                 VALUE
------------------------------- ----------
physical read total IO requests       1841
cell flash cache read hits            1837
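
Those two numbers come from the session statistics; a query along these lines pulls them (a sketch mirroring the v$sysstat/v$mystat join I use in the next post):

select a.name,b.value from v$sysstat a,
v$mystat b
where a.statistic# = b.statistic# and
(a.name like '%flash cache read hits' or
 a.name ='physical read total IO requests');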

Almost every IO was satisfied from flash memory, so I’ll do the calculation as if all of the query’s execution time is accounted for by the flash cache reads.

.73 seconds X 1000 milliseconds per second = 730 ms

730 ms/1837 flash cache reads = 0.4 milliseconds per flash cache read (rounded up from 0.397)

So, this isn’t perfect, but it is one piece of evidence that flash cache reads take about 0.4 milliseconds on the Exadata V2 system this was tested on.

– Bobby

P.S.  I determined that the flash cache reads were 32K using the following information:

Statistics
----------------------------------------------------------
  7239  physical reads

NAME                                MB
--------------------------- ----------
physical read total bytes   56.5546875

NAME                                 VALUE
------------------------------- ----------
physical read total IO requests       1841

56.55 MB read/1841 physical IOs is about 31.5K, or roughly 32K per read.  Also, 7239 physical blocks read/1841 physical IOs is about four 8K blocks per read = 32K.  So, these tests appear to show that 32K flash cache reads take about 0.4 milliseconds.

 


Flash cache used when smart scans are disabled

I just wanted to double-check something on my Exadata slides.  I think I had verified this before but became unsure whether something I was saying was true: namely, that when you run a normal, non-smart scan query on Exadata, the cell storage servers still check the flash cache for cached disk blocks.  So, I tried a full scan with smart scans disabled and it appears, based on statistics kept by the database, that the flash cache was used.  Here is the (edited for clarity) output of my test script:

SQL> alter session set cell_offload_processing = FALSE;

SQL> select sum(BLOCKS) from test;

SUM(BLOCKS)
-----------
 2213273152

SQL> select a.name,b.value/1024/1024 MB from v$sysstat a,
  2  v$mystat b
  3  where a.statistic# = b.statistic# and
  4  (a.name in ('physical read total bytes',
  5  'physical write total bytes',
  6  'cell IO uncompressed bytes') or a.name like 'cell phy%');

NAME                                                              MB
--------------------------------------------------------------- ----
physical read total bytes                                       56.5
physical write total bytes                                         0
cell physical IO interconnect bytes                             56.5
cell physical IO bytes sent directly to DB node to balance CPU u    0
cell physical IO bytes saved during optimized file creation        0
cell physical IO bytes saved during optimized RMAN file restore    0
cell physical IO bytes eligible for predicate offload              0
cell physical IO bytes saved by storage index                      0
cell physical IO interconnect bytes returned by smart scan         0
cell IO uncompressed bytes                                         0

SQL> select a.name,b.value from v$sysstat a,
  2  v$mystat b
  3  where a.statistic# = b.statistic# and
  4  (a.name like '%flash cache read hits' or
  5   a.name ='physical read total IO requests');

NAME                                                           VALUE
-------------------------------------------------------------- -----
physical read total IO requests                                 1839
cell flash cache read hits                                      1805

The alter session command turned off smart scans.

The zero in the “cell physical IO interconnect bytes returned by smart scan” statistic indicates that smart scans were not used.

The “cell flash cache read hits” statistic being greater than zero indicates that the flash cache was used.

So, assuming we can trust these statistics, this test shows that non-smart scan reads of data blocks from the cell storage servers can be satisfied from the flash cache.
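
To put the session back to its default behavior, the parameter can simply be flipped back (or the session ended).  A minimal sketch, assuming the default value of TRUE:

SQL> alter session set cell_offload_processing = TRUE;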

– Bobby


Disabling cardinality feedback

I ran into a case today where I had tuned a query by putting a cardinality hint into a view, but the optimizer changed the plan anyway, overriding the cardinality hint with cardinality feedback.  So, I found out how to turn cardinality feedback off in case you want the cardinality hint to stick.  I built a simple test case for this post.

Here is the test query:

select /*+ cardinality(test,1) */ count(*) from test;

The first time it runs the plan shows that the optimizer thinks there is one row in the test table:

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |   292 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST |     1 |   292   (1)| 00:00:04 |
-------------------------------------------------------------------

But the second time cardinality feedback tells the optimizer the truth:

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |   292 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST | 31467 |   292   (1)| 00:00:04 |
-------------------------------------------------------------------

Note
-----
   - cardinality feedback used for this statement

How annoying!  It ignored my cardinality hint. But you can add this hint to turn off cardinality feedback:

opt_param('_optimizer_use_feedback' 'false')

and then you are back to the original plan with rows = 1.  This doesn’t prove that the change improves performance, just that the plan shows the cardinality I’m trying to make the optimizer use.
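
For example, in my test case the two hints can sit side by side; this is a sketch of how I would combine them (the actual test script may have done it differently):

select /*+ cardinality(test,1) opt_param('_optimizer_use_feedback' 'false') */ count(*) from test;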

– Bobby

 

 


Finding bind variable values using DBA_HIST_SQLBIND

Whenever I need to test a query that has bind variables, I usually go to the development team to ask what typical values are or what the values were the last time the query ran.  I’m pretty sure that in the past, when I went looking for a DBA_HIST view that held bind variables, I came up empty.  Today I’m working on tuning a query with a bind variable, and I tried to find the value of the bind variable last Sunday using the DBA_HIST_SQLBIND view.  Strangely enough, it had exactly what I wanted.  I’m not sure in which cases this view captures the variables and in which it doesn’t, but it worked for me today, so it may be useful to others in certain cases.

I had the sql_id of the query from an AWR report spanning the period of high CPU usage on Sunday: 40wpuup08vws6.  I ran this query to get the bind variable values for all captured executions of that sql_id.

select 
sn.END_INTERVAL_TIME,
sb.NAME,
sb.VALUE_STRING 
from 
DBA_HIST_SQLBIND sb,
DBA_HIST_SNAPSHOT sn
where 
sb.sql_id='40wpuup08vws6' and
sb.WAS_CAPTURED='YES' and
sn.snap_id=sb.snap_id
order by 
sb.snap_id,
sb.NAME;

It produced this output for the bind variable B1.

END_INTERVAL_TIME          NAM VALUE_STRING
-------------------------- --- -----------------
03-FEB-13 02.00.32.733 AM  :B1 02/02/13 00:00:00
03-FEB-13 03.00.36.316 AM  :B1 02/02/13 00:00:00
10-FEB-13 02.00.29.975 AM  :B1 02/09/13 00:00:00
10-FEB-13 03.00.23.292 AM  :B1 02/09/13 00:00:00
17-FEB-13 02.00.36.688 AM  :B1 02/16/13 00:00:00
17-FEB-13 03.00.06.374 AM  :B1 02/16/13 00:00:00
24-FEB-13 01.00.33.691 AM  :B1 02/23/13 00:00:00
24-FEB-13 02.00.20.269 AM  :B1 02/23/13 00:00:00
24-FEB-13 03.00.16.811 AM  :B1 02/23/13 00:00:00
03-MAR-13 02.00.17.974 AM  :B1 03/02/13 00:00:00
03-MAR-13 03.00.33.340 AM  :B1 03/02/13 00:00:00
10-MAR-13 10.00.10.356 PM  :B1 03/09/13 00:00:00
10-MAR-13 11.00.43.467 PM  :B1 03/09/13 00:00:00
11-MAR-13 12.00.12.898 AM  :B1 03/09/13 00:00:00

So, you can see what value the date-type bind variable B1 contained each weekend that the query ran.  The query runs for multiple hours, which is why it spans multiple AWR snapshot intervals.
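
If you only care about a recent window, a filter on the snapshot end time narrows the output; here is a sketch (the interval predicate is my own addition, not part of the original script):

select
sn.END_INTERVAL_TIME,
sb.NAME,
sb.VALUE_STRING
from
DBA_HIST_SQLBIND sb,
DBA_HIST_SNAPSHOT sn
where
sb.sql_id='40wpuup08vws6' and
sb.WAS_CAPTURED='YES' and
sn.snap_id=sb.snap_id and
sn.END_INTERVAL_TIME > systimestamp - interval '7' day
order by
sb.snap_id,
sb.NAME;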

Here is a zip of my test script and its log: zip.

– Bobby


New version of my Exadata presentation

Here is a link to my updated Exadata PowerPoint presentation.

I’ve been trying to improve my Exadata talk for the Collaborate 13 conference.  I’ve done this talk at work and at the ECO conference in October, but I’m not completely happy with it.  The criticism I’ve received boils down to two things:

  1. The slides don’t stand on their own
  2. I jump into the middle of the detail without enough introduction

So, I’ve added comments to many of the slides.  These will serve as a reminder of what I want to say and will make the slides more meaningful to someone who only has the PowerPoint.

I’ve also added several slides showing the execution plan of a sample query, to do a better job of setting up the existing slides that discuss how data flows through an Exadata server compared with a normal server.  The point is that when the table is accessed, blocks are read, certain rows are selected, and certain columns are projected.

Lastly, I added a slide on direct path reads, which shows how the buffer cache can be bypassed in some cases even on a normal Oracle database server.  This is also part of the introduction, since it provides background needed to understand how the Exadata Smart Scan bypasses the buffer cache.

If anyone has time to read through the slides and give me their feedback, I’d be happy to hear it.  Hopefully the net result will be a presentation that is useful both to me and my company and to those attending the conference.

– Bobby

P.S.  Edited again 3/22/2013


Implicit type conversion in where clause

I spent a lot of time yesterday digging into performance issues on a new database, only to find a simple issue: two tables were joined on a column that was a number on one table and a varchar2 on the other.  That column was a varchar2(4) on six or eight tables, but one table – and it had to be the one with the most rows – had the same column defined as number(4), and as a result the queries were running with inefficient plans.  All I had to do to find this was get a plan of the sample query I was working on and look for TO_NUMBER in the predicates section, but of course I spent hours looking at other things first.  So, I thought I would document how to make a quick check for this kind of type conversion.  It is “implicit” type conversion because there is no TO_NUMBER in the SQL itself; the optimizer has to add the TO_NUMBER so it can compare the character column to the number column.

Here is how I setup the example tables to mimic the situation I saw yesterday:

-- table with number column

create table ntest (div_nbr number(4));

-- table with character column

create table ctest (div_nbr varchar2(4));
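
To give the join something to work with, a couple of rows can be loaded like this (illustrative only; the script in the zip may populate the tables differently):

-- load one matching row into each table

insert into ntest values (1);
insert into ctest values ('1');
commit;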

Here is a sample query that joins these two tables on the columns of the same name but different types:

-- join on the column

select count(*)
from ntest,ctest
where ntest.div_nbr=ctest.div_nbr;

Here is how I get the plan, which includes the predicates section with the TO_NUMBER conversion function:

-- get plan

select * from table(dbms_xplan.display_cursor(null,null,'ALL'));

Finally, here is the plan with the predicates section with the TO_NUMBER function:

------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |       |       |     5 (100)|
|   1 |  SORT AGGREGATE     |       |     1 |    17 |            |
|*  2 |   HASH JOIN         |       |     1 |    17 |     5  (20)|
|   3 |    TABLE ACCESS FULL| NTEST |     1 |    13 |     2   (0)|
|   4 |    TABLE ACCESS FULL| CTEST |     1 |     4 |     2   (0)|
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("NTEST"."DIV_NBR"=TO_NUMBER("CTEST"."DIV_NBR"))

Step 2 of the plan is the hash join, and the predicate information section shows how the two columns are joined for this step: CTEST.DIV_NBR has to be converted to a number before it can be compared to NTEST.DIV_NBR.

So, I recommend adding a check for TO_NUMBER in the predicate information section of the plan to your toolkit of things to look for when tuning a query.  This kind of thing shows up more often than you would think, and it is hard to find because if you look at the query itself you just see a normal join on the DIV_NBR column.
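
If you want to sweep for the same problem across plans already in the shared pool, something along these lines can flag candidates (a rough sketch of my own, not part of the script in the zip below):

-- find cached plans whose predicates contain a TO_NUMBER conversion

select sql_id, id, operation, access_predicates, filter_predicates
from v$sql_plan
where access_predicates like '%TO_NUMBER(%'
   or filter_predicates like '%TO_NUMBER(%';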

Our resolution of this issue was to change the type of the column on the one table to varchar2(4).  This, combined with tuning optimizer_index_cost_adj, resulted in dramatic improvements in performance on our test system.

Here is a zip of the script I used to show how to find TO_NUMBER in the predicate section.

– Bobby


Found an archive of my geocities blog

OK.  This is really cool.  I found an archive of my old geocities blog: archive

I thought geocities was gone forever but it isn’t.  Quite cool.

– Bobby


Don’t focus on cost of execution plan

I don’t focus on the Oracle optimizer’s cost of a particular execution plan when I’m tuning a query, and I’m worried that many of the developers and DBAs I’ve talked with about query tuning are too focused on lowering that cost when attempting to tune a query.

I hear comments like this all the time when talking about making some change X to improve the performance of a query: “I changed X and the cost in the explain plan was so much lower.”  X could be adding an index or a hint, making a parameter change, etc.

I just cringe inside when I hear this, and I hope I am gracious about it, but all the while I’m afraid that the person I’m talking to is missing a key concept of Oracle query tuning: in many cases the optimizer’s estimated cost is far off from reality.  So, making a change and seeing the cost of the plan go down really doesn’t mean much.  It could correspond to a real improvement in the query’s run time, or it could mean just the opposite.  It is kind of like the buffer cache hit ratio: sometimes the ratio really means something and sometimes it doesn’t.

Instead of focusing on the cost, I focus on the plan itself.  Based on my investigation of the tables in the query and how many rows will be accessed from each, I come up with an idea of a plan that should be better than the one I’m improving.  So, my question about a proposed change X becomes “Does change X cause the plan to change to the one I determined to be better?”

I attempted to lay out this approach in my Intro to SQL Tuning presentation.  For me, query tuning is kind of like programming: I figure out the best way to do the steps of the plan based on my own study of the existing tables, and then I figure out what change to make to get the optimizer to run the query my way.  This is a time-consuming approach, but I only spend the time on queries that really need it.  Who has time to tune every query?

So, my recommended approach to query tuning is to figure out a good plan on your own and then to figure out how to get the database to run it your way.  I don’t recommend focusing on what the cost of the new plan is compared with the original plan.

– Bobby

 

 


Delphix First Month

My company is trying out a new product called Delphix.

We have had Delphix installed for about a month, and I wanted this post to record my first impressions for others who may be considering it.  Essentially, Delphix provides fast copies of databases, so a terabyte database can be refreshed in minutes.  My most recent example took three minutes for a development database that is just over one terabyte.

Delphix is an appliance running Open Solaris in a VMWare ESX virtual machine; at least, that is how we have it installed.  You license Delphix by the number of CPUs in the virtual machine, so you can use it as much as you want as long as the CPUs can handle the load.  Your database file systems are connected to the Delphix appliance using NFS, so the critical component is the network connection between your target database servers and the appliance.  In our case we put in new 10 gigabit Ethernet hardware to connect our HP-UX Itanium virtual machines (the target database servers) to the Delphix VM, and we made sure they were all on the same IP subnet with nothing but a high-speed switch between them.  After some initial testing, Delphix’s performance and support teams made some network configuration changes that helped with an initial performance/hang issue.  It appears to have had something to do with TCP/IP packet flow control, but it works fine now.

The Delphix appliance uses RMAN to pull data from a source database.  The source database has to be in archivelog mode, and for best performance you need to enable block change tracking.  Delphix does an initial full level 0 backup and then regular incremental level 1 backups.  To create a new clone of the original database, you pick one of these incremental backups – called snapshots – and point to the target machine, and Delphix automatically mounts the appropriate NFS filesystems with the datafiles, tempfiles, redo logs, etc. and brings up a clone of the source system with the new name you designate.

Also, Delphix takes snapshots of the clone databases – called VDBs – on a regular basis, so these can be used as backups of the clones themselves, and you can even clone the clones – sounds like Star Wars: The Clone Wars, doesn’t it?

One interesting challenge is space management.  Each new clone takes up very little space – until you start updating it.  I did some tests where I copied a 30 gigabyte table to a new table, and the VDB, which previously took less than 100 megabytes of space, then took gigabytes.  Everything is compressed, so I think it was less than half of the 30 gigabytes, but the point is that the more the copies get updated, the more disk space you need for them.  If you have an application that needs copies quickly but the copies aren’t heavily updated, you can make many copies with very little disk space.  I think this kind of capability opens up possibilities we haven’t considered before, since without Delphix each copy on normal disk storage takes up as much space as the original.  That is, before Delphix we had to minimize the number of production copies we used for development and testing.  Imagine how our processes might change if we can spin off a quick production copy, use it for a short while, and then get rid of it, all with minimal additional disk space and with the clone created in minutes.

Right now I’m working with a Delphix consultant on a script that the developers can use to refresh their own database from the most recent snapshot of its source.  Delphix has a couple of ways it can be controlled from a Unix shell script.  One way is with ssh and their command line interface (CLI).  You can set up ssh with a public and private key so you can log into the VM as the “delphix admin” user and then run a set of commands in Delphix’s proprietary language.  Delphix also has a GUI, but it is nice that you can run a script to do things like kick off a refresh or a clone.  There is also a web service, and they gave me a sample Python script that communicates with the VM through it, but I haven’t delved into that yet.  For one thing, we don’t have Python installed on our HP-UX servers.  You can access Delphix’s documentation online here, including the CLI commands.

Well, I’m not trying to sell Delphix to anyone, but I thought it would be good to put down some of my experiences.  The database cloning and refreshing is remarkably fast.  Time will tell how the NFS performance over the network holds up during heavy development and testing, but assuming we manage that performance and the disk space usage, I’m pleased with the clone times the Delphix VM delivers.

– Bobby


Collaborate 13 paper posted

My Exadata talk was accepted by the IOUG, and the deadline for the slides and paper is rapidly approaching.  I’ve had the slides for a while but have never written the presentation down as a paper before.  So, I’ve posted the paper along with the slides on this blog.  You can access them both in the zip file here.

My presentation is number 988, titled “Exadata Distinctives”.  Yes, I know that “distinctives” is not a word.  But I think the presentation will be helpful to someone who is new to Exadata and the basics of how it works.  There were some key things I had to learn through Oracle’s Exadata class and my own experience, and I’ve tried to communicate the most important ones in the presentation.  The paper isn’t perfect, but I’ve tried to capture the most important points from the slides and the spoken presentation.  If you have time to review it and give me feedback, I’d be happy to hear your critique.

Also, the IOUG would like the speakers to encourage people to attend Collaborate 13.  You can help me win a T-shirt or an iPad by registering and saying I sent you.  Not that I really care about the T-shirt, but I do think the conference is worthwhile.  What I like is hearing from top people and getting ideas that I can follow up on later.  I usually go to most of the performance tuning talks, and each hour-long session gives me ideas about things I haven’t seen before and need to investigate after I get home.

Here is how the IOUG describes the conference:

Jointly hosted by the Independent Oracle Users Group (IOUG), Oracle Applications Users Group (OAUG) and Quest International Users Group (Quest), COLLABORATE 13 will provide all the real-world technical training you need – not sales pitches. The IOUG Forum presents hundreds of educational sessions on Oracle technology, led by the most informed and accomplished Oracle users and experts in the world, bringing 5,500+ Oracle technology and applications professionals to one venue for Oracle education, customer exchange and networking.

In addition to the expansive educational offerings, you’ll be able to test-drive the latest technology solutions in the COLLABORATE 13 Exhibitor Showcase. Packed with hundreds of Oracle vendors and partners, the Exhibitor Showcase is the one place to find innovative new products to maximize your business efficiencies and discover solutions for your existing Oracle environment.

I’d certainly encourage you to go if you can.  One key concept from one talk could easily be worth the cost of the entire trip for your company.

If you want to help me win the T-shirt, be sure to register with the code AD03 and write in “Bobby Durrett” when asked “How did you hear about Collaborate 13?”

– Bobby

