Commit scalability

I am learning about how well commits scale on Oracle 11.2 and so far they seem to scale surprisingly well.

I’ve looked at two waits – log file parallel write and log file sync. Based on documents I’ve read on Oracle’s support site, log file parallel write represents the time it takes to do one write to the redo logs. For mirrored redo logs, the log file parallel write time includes the time to write to both copies. Log file sync represents the time it takes for a session to complete a commit; it should include all the time measured for the write to the redo logs plus the CPU time to process the commit. So, the log file sync time should equal or exceed the log file parallel write time.

Looking at AWR data, I found that at peak times one of our databases had 1 millisecond log file parallel write waits and about 1.2 million waits per hour. Since there are 3.6 million milliseconds in an hour, it seemed to me that during this peak hour the redo logs were about 33% utilized: writes to the redo logs were occurring during 1.2 million of the available 3.6 million milliseconds. I decided to look at a simple queuing theory model that I had read about in Craig Shallahamer’s Forecasting Oracle Performance book to get a basic idea of how queuing might impact redo log write time as redo log utilization approached 100%.
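
To make the model concrete, a simple single-queue approximation along the lines of the book’s models is R = S / (1 - U), where S is the 1 millisecond write time and U is utilization – writes per hour divided by the 3.6 million milliseconds available per hour. This is only a sketch of the kind of formula the book develops, but it is easy to tabulate in SQL:

-- modeled write time R = S / (1 - U), with S = 1 ms and
-- U = writes per hour / 3,600,000 ms available per hour
select
writes_per_hour,
round(writes_per_hour/3600000,2) utilization,
round(1/(1 - writes_per_hour/3600000),2) predicted_write_ms
from (select rownum*600000 writes_per_hour
from dual connect by rownum <= 5);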

[Chart: modeled redo log write time vs. writes per hour]

This model predicts that the redo log write time will go through the roof as the number of writes per hour approaches 3.6 million, assuming a constant 1 millisecond write time.

To attempt to confirm the predictions made by this graph, I built some scripts that run a bunch of commits on a test database, trying to max out writes to the redo log so I could graph the results and compare them to the theoretical model. For the test I had twenty tables named TEST1, TEST2, …, TEST20, each with one row and one column. I ran tests with 1, 2, 5, 10, and 20 updating processes, each designed to generate commits as quickly as possible. Each process ran 100,000 updates and commits like these:

update test1 set a=a+1;
commit;
update test1 set a=a+1;
commit;
update test1 set a=a+1;
commit;

Each process had its own table – i.e. process 15 acted on table TEST15.
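
The actual scripts are in the zip linked below; a minimal PL/SQL sketch of what one updating process does (hypothetical – the real scripts may differ in detail) looks like this:

begin
for i in 1 .. 100000 loop
update test1 set a=a+1;
-- each commit makes the session wait on log file sync
-- while LGWR writes the redo
commit;
end loop;
end;
/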

My hope was that, as I ran tests with increasing numbers of processes in parallel, I would eventually max out the writes to the redo log and see increasing log file parallel write wait times. But, surprisingly, as the commit rate increased the redo log write rate actually went down.

[Chart: redo log writes per hour vs. commits per hour]

This is just one test, but it makes me wonder if I can max out the writes to the redo log.  I believe that as the commit rate increases the database batches the commits together in some efficient way which makes commits more scalable than I realized.

I think an Oracle database must have some limit to commit scalability that relates more to the CPU used to process the commits than to the writes to the redo logs. In these same tests the log file sync, or commit, time did increase slowly as the number of commits ramped up.

[Chart: average log file sync (commit) time vs. commits per hour]

It started around half a millisecond at 3 million commits per hour and slowly grew to almost 2 milliseconds at 10 million commits per hour. So, commit time grew, but nothing like the original response time graph, which went through the roof at around 3.6 million redo log writes per hour.

Here is a zip of the scripts I used to generate the data, the spreadsheets I used to make the charts, and the results from the tests: zip

To run the tests yourself, enter your own username and password at the top of setup.sql, commits.sql, and report.sql. Then run one of the five test scripts – run1.sh, run2.sh, run5.sh, run10.sh, or run20.sh – like this:

./run5.sh

Wait for the last process to finish writing its output, then run the report:

./report.sh

Results look like this:

WAIT_EVENT              ELAPSED_SECS WAITS_PER_HOUR WAITS_PER_SECOND AVG_WAIT_MS AVG_WAIT_SEC
----------------------- ------------ -------------- ---------------- ----------- ------------
log file parallel write          542     3008922.51       835.811808  .648076577   .000648077
log file sync                    542     5306207.38       1473.94649   1.1727226   .001172723
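
The real calculations are in report.sql in the zip; a rough sketch of how averages like these can be derived from v$system_event (ignoring the differencing of two snapshots needed to get rates over the test interval) is:

select
event wait_event,
total_waits,
time_waited_micro/total_waits/1000 avg_wait_ms
from v$system_event
where event in ('log file parallel write','log file sync');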

I’m really just getting started understanding how commits scale, but it was surprising how hard it was to drive the rate of redo log writes high enough to cause the write time to increase due to queuing. I assume this is because the database batches commits together more efficiently than I expected, which makes commits more scalable than I realized.

– Bobby

P.S.  This is on HP-UX 11.31, Itanium, Oracle 11.2.0.3.0

Passed OCP 12c test

Well, I finally passed the Oracle Certified Professional (OCP) 12c upgrade exam, 1Z0-060. I got 86% right on the new features section and 78% right on the general DBA section. That means I missed roughly seven questions in each section, which was more than I expected, because I felt good about all of my answers.

I’m happy with the three resources I used which are:

  1. Self Test software – $99
  2. Matthew Morris book – $10
  3. Sam Alapati book – $44

I spent a lot of time in the 12c manuals. Be careful, though: the 12.1.0.2 manuals just came out, and some things in 12.1.0.2 differ from the test because the test designers based it on 12.1.0.1.

I built two Linux virtual machines for test preparation.  On one I installed the 12.1.0.1 database software and created a CDB and a non-CDB for testing.  On the other I installed the grid software and database software and created a CDB on ASM disks.  I spent many hours testing 12c features on these two environments and three databases.

I learned a ton both about new 12c features and features that exist in earlier versions, but it was a long and tedious process and I’m glad to finally be done.

– Bobby

Query with new plan

I came up with a simple query that shows a running SQL executing a different plan than what it had in the past.  Here is the query:

-- show currently executing sqls that have history
-- but who have never run with the current plan
-- joins v$session to v$sql to get plan_hash_value of 
-- executing sql.
-- queries dba_hist_sqlstat for previously used values 
-- of plan_hash_value.
-- only reports queries that have an older plan that is 
-- different from the new one.

select
vs.sid,
vs.sql_id,
vs.last_call_et,
sq.plan_hash_value
from
v$session vs,
v$sql sq
where
vs.sql_id=sq.sql_id and
vs.SQL_CHILD_NUMBER=sq.child_number and
sq.plan_hash_value not in 
(select ss.plan_hash_value
from dba_hist_sqlstat ss
where 
ss.sql_id=sq.sql_id) and 
0 < 
(select count(ss.plan_hash_value)
from dba_hist_sqlstat ss
where 
ss.sql_id=sq.sql_id);

Example output:

       SID SQL_ID        LAST_CALL_ET PLAN_HASH_VALUE
---------- ------------- ------------ ---------------
       229 cq8bhsxbbf9k7            0      3467505462

This was a test query.  I ran it a bunch of times with an index and then dropped the index after creating an AWR snapshot.  The query executed with a different plan when I ran it without the index.  The same type of plan change could happen in production if an index were accidentally dropped.

I’m hoping to use this query to show production queries that have run in the past but whose current plan differs from any they have used before. Of course, a new plan doesn’t necessarily mean there is a problem, but it might be helpful to spot plans that are new and different as soon as they show up.

– Bobby

12c Threaded Execution Test

I did a quick check of some facts I’m studying about Oracle 12c and its new threaded execution mode.  I set this parameter:

alter system set THREADED_EXECUTION=true scope=spfile;

I had to connect SYS as SYSDBA with a password to get the instance to bounce.
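
As I understand it, OS authentication (connecting with / as sysdba) is not supported with threaded_execution=true, so the bounce went something like this (a sketch – supply your own connection details):

$ sqlplus sys as sysdba     (prompts for the SYS password)
SQL> shutdown immediate
SQL> startup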

Then it had these processes only:

oracle    1854     1  0 09:17 ?        00:00:00 ora_pmon_orcl
oracle    1856     1  0 09:17 ?        00:00:00 ora_psp0_orcl
oracle    1858     1  2 09:17 ?        00:00:00 ora_vktm_orcl
oracle    1862     1  3 09:17 ?        00:00:00 ora_u004_orcl
oracle    1868     1 99 09:17 ?        00:00:17 ora_u005_orcl
oracle    1874     1  0 09:17 ?        00:00:00 ora_dbw0_orcl

This differs from some of my 12c OCP study material but agrees with the manuals.  Only pmon, psp, vktm, and dbw have dedicated processes.

Also, I found that I needed this value in the listener.ora:

dedicated_through_broker_listener=on

I needed that value to connect using a thread. Before I put it in, the database spawned a dedicated server process when I connected over the network.
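
One way to verify is the EXECUTION_TYPE column that, as I understand it, was added to V$PROCESS in 12c; this sketch shows whether your own connection got a thread or a dedicated process:

select p.pname, p.execution_type
from v$process p, v$session s
where p.addr = s.paddr
and s.sid = sys_context('userenv','sid');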

Lastly, contrary to what I had read, I didn’t need to set the local_listener parameter to get new connections to use a thread:

SQL> show parameter local_listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ---------------------
local_listener                       string

– Bobby

Finished first pass through Alapati 12c OCP upgrade book

I just finished reading Sam Alapati’s 12c OCP upgrade book for the first time, and I really like it because it covers content that I hadn’t discovered through my study of the Oracle manuals. It also did a good job explaining some things that Oracle’s manuals left unclear.

After reading each chapter I took the end-of-chapter test and got between 60% and 75% of the questions right. Next I plan to take the computer-based test on the CD that came with the book, which covers both parts of the upgrade exam.

I did find minor errors throughout the book, but I still found it very useful, especially after having already studied the same topics on my own without a study guide like this one to direct me. The author’s insights into the test and the material it covers add value because they guide me to the areas I need to focus on.

– Bobby

Used Delphix to quickly recover ten production tables

Yesterday I used Delphix to quickly recover ten production tables that had accidentally been emptied over the weekend. We knew that at a certain time on Saturday the tables were fully populated and that after that some batch processing wrecked them, so we created a new virtual database that was a clone of production as of the date and time just before the problem occurred. We could have accomplished the same task using RMAN to clone production, but Delphix spun up the new copy more quickly than RMAN would have.

The source database is 5.4 terabytes and there were about 50 gigabytes of archive logs that we needed to apply to recover to the needed date and time.  It took about 15 minutes to complete the clone including applying all the redo.  The resulting database occupies only 10 gigabytes of disk storage.

If we had used RMAN we would first have had to add more disk storage, because we don’t have a system with enough free space to hold a copy of the needed tablespaces. Then, after waiting for our storage and Unix teams to add that storage, we would have had to do the restore and recovery. All these manual steps take time and are prone to human error, while the Delphix process is point and click, done through a graphical user interface (GUI).

Lastly, during the recovery we ran into Oracle bug 7373196, which caused our first attempted recovery to fail with an ORA-00600 [krr_init_lbufs_1] error. After researching this bug I had to rerun the restore and recovery with the parameter _max_io_size set to 33554432, which is the workaround for the bug. Had we been using RMAN we would probably have had to run the recovery at least twice to resolve this bug – maybe we could have restarted at the point it failed, but I’m not sure. With Delphix it was just a matter of setting the _max_io_size parameter and starting from scratch, since I knew the process only took 15 minutes. Actually, it took me two or three attempts to figure out how to set the parameter, but once I figured it out it was so simple I’m not sure why I didn’t do it right the first time.

So, at the end of the day, it was just under 3 hours from my first contact about this issue until they had the database up and were able to funnel off the data they needed to resolve the production issue. Had I been doing an RMAN recovery, I don’t doubt that I would have worked late into the night yesterday accomplishing the same thing.
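
For reference, outside of Delphix the workaround amounts to something like this before rerunning the recovery (a sketch – underscore parameters must be double-quoted, and the new value takes effect on restart):

alter system set "_max_io_size"=33554432 scope=spfile;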

– Bobby

P.S. These databases are on HP-UX 11.31 on IA64, Oracle version 11.1.0.7.0.

Sam Alapati’s 12c OCP upgrade book includes test software with key DBA skills

Sweet! I just installed the software from the CD that came with Sam Alapati’s book for the OCP 12c upgrade exam and found that it includes questions on the key DBA skills section of the test. Now I feel good about my test preparation materials. The first time I took the test I didn’t have this degree of guidance on what to study, especially for the key DBA skills section.

Now I have questions on key DBA skills from the testing software included with Sam Alapati’s book, questions on key DBA skills in the Kaplan Selftest software, and a large section on key DBA skills in Matthew Morris’ book.

I’ve read Matthew’s book already, and I’ve read the key DBA skills part twice. Now I’m going to start reading Sam’s book and then move on to the software tests. All this to buy me one more correct answer on the test, but I’d rather over-prepare this time than leave it to chance.

– Bobby

Bought Matthew Morris’ OCP 12c upgrade book for Key DBA Skills section

I haven’t even started on Sam Alapati’s OCP book, but I realized today that Matthew Morris’ OCP 12c upgrade book has an outline of the Key DBA Skills section of the OCP exam, and that is what I need most. I was two questions away on that section when I took the test the first time, and I need to make sure I’m studying the right things. After I got back to the office following my first, failed attempt (a $245 lesson), I made notes about what I remembered being unsure of, and that gave me some ideas of what to study. But it is nice to get advice from someone else who has passed the exam. I saved a couple of bucks by getting the Kindle version for $9.99, so it seemed like a safe bet. Hopefully I can hit it hard with both books and my practice test software, which also covers the key DBA skills, and be able to take the test again by the end of August.

Limits of SQL Optimization Toastmasters Talk

I think I’m done with the script for my Toastmasters talk: pdf

It’s Toastmasters talk 3, “Get to the Point,” so I’m trying to do a 5-7 minute talk that focuses on how Oracle’s SQL optimizer sometimes chooses a slow plan. It’s tough to say anything meaningful in 7 minutes, and I’m not using any kind of slides, so I have to describe everything verbally.

But, it’s a topic I’m passionate about so I will enjoy giving it.  I kind of got bogged down thinking about ways to relate this information to my non-technical audience’s life and I decided to give up on that.  I would probably need another 7 minutes or more to explain why they should care that there are limits to SQL optimization so I decided to focus on convincing them that the limits exist.

– Bobby

Adding an outline hint to a PS/Query

I just finished working with a coworker on speeding up a PS/Query.  It was running well in a development environment but not in production.  I tried to find a simple hint or two to make production run like development but didn’t have any luck.  Then I remembered that my script to get a query’s plan prints out a full set of outline hints like this:

  /*+
      BEGIN_OUTLINE_DATA
      USE_HASH_AGGREGATION(@"SEL$E903463E")
      USE_NL(@"SEL$E903463E" "CLS"@"SEL$4")
      USE_NL(@"SEL$E903463E" "SEC"@"SEL$3")
      USE_NL(@"SEL$E903463E" "SOC"@"SEL$4")

... lines removed to make this post shorter ...

      OUTLINE_LEAF(@"SEL$F5BB74E1")
      PUSH_PRED(@"SEL$F5BB74E1" "A1"@"SEL$1" 1)
      OUTLINE_LEAF(@"SEL$E903463E")
      ALL_ROWS
      OPT_PARAM('_unnest_subquery' 'false')
      DB_VERSION('11.2.0.3')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

A coworker edited the PS/Query and tried to add this hint but got errors because of the quotation marks. So I simplified the hint by removing all the double quotes and taking out the lines with single quotes, because I knew those weren’t needed – they only related to parameters that were already set in production.

Here is the new quote-less hint:

  /*+
      BEGIN_OUTLINE_DATA
      USE_HASH_AGGREGATION(@SEL$E903463E)
      USE_NL(@SEL$E903463E CLS@SEL$4)
      USE_NL(@SEL$E903463E SEC@SEL$3)
      USE_NL(@SEL$E903463E SOC@SEL$4)

... lines removed to make this post shorter ...

      OUTLINE_LEAF(@SEL$F5BB74E1)
      PUSH_PRED(@SEL$F5BB74E1 A1@SEL$1 1)
      OUTLINE_LEAF(@SEL$E903463E)
      ALL_ROWS
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

We ran the modified PS/Query in production and I verified that it was running the correct plan. The only weird thing was that, because the query has a GROUP BY, PS/Query stuck the hint in both the GROUP BY clause and the SELECT clause.

SELECT /*+ BEGIN_OUTLINE_DATA USE_HASH_AGGREGATION(@SEL$E90346...

... middle of select statement ...

GROUP BY /*+ BEGIN_OUTLINE_DATA USE_HASH_AGGREGATION(@SEL$E90346...

... end of select statement ...

At first this put us off, but the hint in the GROUP BY clause is really just a comment; syntactically it is insignificant, even though it is ugly.

By the way, here are the options I used to output the outline hint:

select * from 
table(dbms_xplan.display('PLAN_TABLE',NULL,'ADVANCED'));

I get all my plans this way now. It generates a lot of output that I don’t usually use, but seeing it reminds me that it is there if I need it.
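
For completeness, the whole flow looks like this (a minimal example against a made-up query):

explain plan for
select * from dual;

select * from
table(dbms_xplan.display('PLAN_TABLE',NULL,'ADVANCED'));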

– Bobby
