In my previous post I explained that setting db_writer_processes=1, dbwr_io_slaves=32 made a 2-3 times reduction in run time of my test of 15 concurrent updates.
Further testing has shown that an even greater improvement – really 10 times – can be made by switching to direct IO and maxing out the db writer processes.
To switch my test database to direct IO I had to do two things:
- Ask one of our Unix administrators to remount the filesystem that contains the datafile being tested using these options: mincache=direct,convosync=direct
- Change this parameter: filesystemio_options=directIO
Then I switched to what the documentation says is the maximum number of db writers:
- db_writer_processes=36
- dbwr_io_slaves=0
I had setup a test that generated free buffer waits by changing my update statements to update more blocks than could be held in the buffer cache and I had set log_checkpoint_interval back to its default of 0 so we wouldn’t get frequent checkpoints. I also increased the redo logs to 2 gig so they wouldn’t switch and checkpoint frequently. So, my test was getting plenty of free buffer waits and it took roughly 30 minutes for my 15 concurrent update statements to update 1 million rows each. This was with my current production settings of db_writer_processes=4 and dbwr_io_slaves=0.
Here is a profile of the time spent by one update statement with no direct io and db_writer_processes=4 and dbwr_io_slaves=0:
TIMESOURCE SECONDS PERCENTAGE -------------------------- ---------- ---------- TOTAL_TIME 2503 100 free buffer waits 2097 84 db file scattered read 253 10 CPU 61 2 UNACCOUNTED_TIME 57 2 db file sequential read 26 1 latch: redo copy 5 0 events in waitclass Other 2 0 log buffer space 1 0
Here is a profile with the direct io options and db_writer_processes=36, dbwr_io_slaves=0:
TIMESOURCE SECONDS PERCENTAGE -------------------------- ---------- ---------- TOTAL_TIME 171 100 free buffer waits 51 30 UNACCOUNTED_TIME 41 24 db file scattered read 34 20 CPU 23 13 log buffer space 16 9 events in waitclass Other 3 2 latch: redo copy 2 1
Incredible. Thanks to Jonathan Lewis and Mark Powell for all of their patient discussion of this issue with me on our forum thread.
So, I guess the bottom line is that if you can’t get your filesystems mounted with direct IO options then the IO slaves may be the way to go in certain scenarios. But, with direct IO it appears that upping the number of db writers is better than using IO slaves, at least in a scenario like mine were you have many concurrent updates filling the buffer cache with updated blocks and waiting on free buffer waits.
– Bobby
Pingback: Two configuration changes with 11.2.0.4 upgrade | Bobby Durrett's DBA Blog