Oracle support says we have hit bug 13914613. Here is what our wait events looked like in an AWR report:
Top 5 Timed Foreground Events
Event | Waits | Time(s) | Avg wait (ms) | % DB time | Wait Class |
---|---|---|---|---|---|
latch: shared pool | 3,497 | 17,482 | 4999 | 38.83 | Concurrency |
latch: row cache objects | 885 | 12,834 | 14502 | 28.51 | Concurrency |
db file sequential read | 1,517,968 | 8,206 | 5 | 18.23 | User I/O |
DB CPU | | 4,443 | | 9.87 |
library cache: mutex X | 7,124 | 2,639 | 370 | 5.86 | Concurrency |
What really struck me about these latch waits was that the average wait time was several thousand milliseconds, which means several seconds (for example, 17,482 s ÷ 3,497 waits ≈ 5 seconds, or about 5,000 ms, per "latch: shared pool" wait). That's a long time to wait for a latch.
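If you catch this kind of thing live instead of after the fact in an AWR report, a quick look at V$SESSION shows whether sessions are stacking up on latch waits. This is just a sketch of the sort of query I mean, not output from this incident:

```
-- Sketch: sessions currently waiting on any latch, longest waiters first
SELECT sid, event, state, seconds_in_wait
FROM v$session
WHERE state = 'WAITING'
AND event LIKE 'latch%'
ORDER BY seconds_in_wait DESC;
```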
Oracle pointed to the Latch Miss Sources section of the AWR report. This is all gibberish to me. I guess these are the names of internal kernel code locations where the latch misses occurred.
Latch Miss Sources
Latch Name | Where | NoWait Misses | Sleeps | Waiter Sleeps |
---|---|---|---|---|
shared pool | kghfrunp: clatch: wait | 0 | 1,987 | 1,956 |
shared pool | kghfrunp: alloc: session dur | 0 | 1,704 | 1,364 |
The bug description says "Excessive time holding shared pool latch in kghfrunp with auto memory management", so I guess the "kghfrunp" latch miss sources were what told Oracle support that this was my issue.
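By the way, the same kind of breakdown is visible outside of AWR in V$LATCH_MISSES. Here is a sketch of a query along those lines (not from my actual troubleshooting) that shows which code locations the shared pool latch misses are sleeping in:

```
-- Sketch: where shared pool latch misses are sleeping (the "Where" column in AWR)
SELECT parent_name, "WHERE", nwfail_count, sleep_count, wtr_slp_count
FROM v$latch_misses
WHERE parent_name = 'shared pool'
AND sleep_count > 0
ORDER BY sleep_count DESC;
```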
I did this query to look for resize operations:
```
SELECT COMPONENT, OPER_TYPE, FINAL_SIZE Final,
       to_char(start_time,'dd-mon hh24:mi:ss') Started,
       to_char(end_time,'dd-mon hh24:mi:ss') Ended
FROM V$SGA_RESIZE_OPS;
```

COMPONENT | OPER_TYPE | FINAL | STARTED | ENDED |
---|---|---|---|---|
DEFAULT 2K buffer cache | STATIC | 0 | 12-may 04:33:01 | 12-may 04:33:01
streams pool | STATIC | 134,217,728 | 12-may 04:33:01 | 12-may 04:33:01
ASM Buffer Cache | STATIC | 0 | 12-may 04:33:01 | 12-may 04:33:01
DEFAULT buffer cache | INITIALIZING | 10,401,873,920 | 12-may 04:33:01 | 12-may 04:33:08
DEFAULT 32K buffer cache | STATIC | 0 | 12-may 04:33:01 | 12-may 04:33:01
KEEP buffer cache | STATIC | 2,147,483,648 | 12-may 04:33:01 | 12-may 04:33:01
shared pool | STATIC | 13,958,643,712 | 12-may 04:33:01 | 12-may 04:33:01
large pool | STATIC | 2,147,483,648 | 12-may 04:33:01 | 12-may 04:33:01
java pool | STATIC | 1,073,741,824 | 12-may 04:33:01 | 12-may 04:33:01
DEFAULT buffer cache | STATIC | 10,401,873,920 | 12-may 04:33:01 | 12-may 04:33:01
DEFAULT 16K buffer cache | STATIC | 0 | 12-may 04:33:01 | 12-may 04:33:01
DEFAULT 8K buffer cache | STATIC | 0 | 12-may 04:33:01 | 12-may 04:33:01
DEFAULT 4K buffer cache | STATIC | 0 | 12-may 04:33:01 | 12-may 04:33:01
RECYCLE buffer cache | STATIC | 0 | 12-may 04:33:01 | 12-may 04:33:01
KEEP buffer cache | INITIALIZING | 2,147,483,648 | 12-may 04:33:02 | 12-may 04:33:04
DEFAULT buffer cache | SHRINK | 10,334,765,056 | 20-may 21:00:12 | 20-may 21:00:12
shared pool | GROW | 14,025,752,576 | 20-may 21:00:12 | 20-may 21:00:12
shared pool | GROW | 14,092,861,440 | 27-may 18:06:12 | 27-may 18:06:12
DEFAULT buffer cache | SHRINK | 10,267,656,192 | 27-may 18:06:12 | 27-may 18:06:12
shared pool | GROW | 14,159,970,304 | 01-jun 09:07:35 | 01-jun 09:07:36
DEFAULT buffer cache | SHRINK | 10,200,547,328 | 01-jun 09:07:35 | 01-jun 09:07:36
DEFAULT buffer cache | SHRINK | 10,133,438,464 | 05-jun 03:00:33 | 05-jun 03:00:33
shared pool | GROW | 14,227,079,168 | 05-jun 03:00:33 | 05-jun 03:00:33
DEFAULT buffer cache | SHRINK | 10,066,329,600 | 08-jun 11:06:06 | 08-jun 11:06:07
shared pool | GROW | 14,294,188,032 | 08-jun 11:06:06 | 08-jun 11:06:07
The interesting thing is that our problem ended right about the time the last shared pool expansion supposedly started. The latch waits hosed up our database for several minutes, and they ended right about 11:06. I suspect that the system was hung up with the bug and that once the bug finished, the normal expansion work started. Or, at least, the time didn't get recorded until after the bug finished slowing us down.
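Another view that is handy for this, V$SGA_DYNAMIC_COMPONENTS, shows the current size and the cumulative count of resize operations for each component. Just a sketch, not output from this system:

```
-- Sketch: current sizes and how many resize operations each component has seen
SELECT component, current_size, min_size, max_size,
       oper_count, last_oper_type, last_oper_time
FROM v$sga_dynamic_components
WHERE component IN ('shared pool', 'DEFAULT buffer cache');
```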
So, I guess it’s just a bug. This is on 11.2.0.3 on HP-UX Itanium. I believe there is a patch set with the fix for this bug.
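This next part isn't something the bug text or Oracle support gave me, but the workaround that usually gets mentioned for shared pool resize churn under automatic memory management is to set explicit minimums so the automatic resizing has less to do. Something like the following, with made-up values, and obviously check with Oracle support before relying on it instead of the patch:

```
-- Hypothetical values: with sga_target/memory_target set, these act as floors,
-- which cuts down on grow/shrink operations between the shared pool and buffer cache
ALTER SYSTEM SET shared_pool_size = 14G SCOPE = BOTH;
ALTER SYSTEM SET db_cache_size = 10G SCOPE = BOTH;
```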
Maybe it will be helpful for someone to see an example.
– Bobby