I saw a load of 44 on a node of our production Exadata and it worried me. The AWR report looks like this:
Host CPU Load Average CPUs Begin End %User %System %WIO %Idle ----- --------- --------- --------- --------- --------- --------- 16 10.66 44.73 68.3 4.3 0.0 26.8
So, why is the load average 44 and yet the CPU is 26% idle?
I started looking at ASH data and found samples with 128 processes active on the CPU:
select 2 sample_time,count(*) 3 from DBA_HIST_ACTIVE_SESS_HISTORY a 4 where 5 session_state='ON CPU' and 6 instance_number=3 and 7 sample_time 8 between 9 to_date('05-MAR-2015 01:00:00','DD-MON-YYYY HH24:MI:SS') 10 and 11 to_date('05-MAR-2015 02:00:00','DD-MON-YYYY HH24:MI:SS') 12 group by sample_time 13 order by sample_time; SAMPLE_TIME COUNT(*) ---------------------------- ---------- 05-MAR-15 01.35.31.451 AM 128 ... lines removed for brevity
Then I dumped out the ASH data for one sample and found all the sessions on the CPU were running the same parallel query:
select /*+ parallel(t,128) parallel_index(t,128) dbms_stats ...
So, for some reason we are gathering stats on a table with a degree of 128 and that spikes the load. But, why does the CPU idle percentage sit at 26.8% when the load starts at 10.66 and ends at 44.73? Best I can tell load in DBA_HIST_OSSTAT is a point measurement of load. It isn’t an average over a long period. The 11.2 manual describes load in v$osstat in this way:
Current number of processes that are either running or in the ready state, waiting to be selected by the operating-system scheduler to run. On many platforms, this statistic reflects the average load over the past minute.
So, load could spike at the end of an hour-long AWR report interval and still CPU could average 26% idle for the entire hour? So it seems.
– Bobby