Finding query with high temp space usage using ASH views

We are having challenges with queries that spend most of their time on these waits:

enq: TS – contention

This wait indicates that a query on one instance of a RAC system is causing its instance to add temp space to its sort segment and remove space from another instance’s sort segment.  For some reason this process is extremely slow and totally bogs down any query waiting on this type of wait.  I worked an SR with Oracle on this issue and Oracle development claims this is not a bug.  I’ve duplicated the problem on both 10.2 and 11.2 RAC.

Anyway, so now we periodically see queries spending most of their time on this wait and this occurs when some other query is eating up all the temp space or has done so recently.  So how do I go back in time and figure out what query was using temp space?  If I wanted to see the session ids of the current users of temp space on a RAC system I would just join gv$tempseg_usage to gv$session like this:

select se.inst_id,se.sid,sum(tu.blocks)*8192
from
gv$tempseg_usage tu,
gv$session se
where
tu.inst_id=se.inst_id and
tu.session_addr=se.saddr and
tu.session_num=se.serial#
group by se.inst_id,se.sid
order by se.inst_id,se.sid;

This assumes an 8K blocksize.  But to go back in time you can use the TEMP_SPACE_ALLOCATED column of V$ACTIVE_SESSION_HISTORY or DBA_HIST_ACTIVE_SESS_HISTORY to identify a query that consumed a lot of temp space.

Here is an example:

select sql_id,max(TEMP_SPACE_ALLOCATED)/(1024*1024*1024) gig 
from DBA_HIST_ACTIVE_SESS_HISTORY 
where 
sample_time > sysdate-2 and 
TEMP_SPACE_ALLOCATED > (50*1024*1024*1024) 
group by sql_id order by sql_id;

This gives the sql_id and maximum allocated temp space of any queries that ran in the past two days and exceeded 50 gigabytes of temp space.

This is a great feature of these ASH views.  Now we can go back in time and find the queries that were using all the temp space and tune them.

– Bobby

Posted in Uncategorized | 22 Comments

sar -d output during disk issue

I thought it might be helpful to show what sar -d returns during a disk issue.  Sorry, I only have a screenshot I took during the problem.  Click on the smaller image to see the full size jpeg.

This was a “sar -d 5 2” –  two five second intervals.  Here is what this looks like now that the disk subsystem is functioning again:

 

When I first checked on the problem database the sessions were waiting on log file sync.  I did an insert into a table and the insert ran quickly, but the commit took like 27 seconds.  I believe this is because the block I inserted into wasn’t immediately written to disk but the commit forces a write to the redo log.  So, anything doing a commit was taking forever.  One thing I’m confused about is why the long time is avwait while the avserv is short.  Maybe this is because the current request is taking forever and isn’t included in avserv whereas the wait time for the next request is included in avwait, but that is just a guess.  Maybe if I did a longer interval for sar -d then the avserv would have been higher, but I’ve only got the screenshot I took at the time.

– Bobby

Posted in Uncategorized | Leave a comment

Session profile using ASH

Worked on the scripts in this zip today: https://www.bobbydurrettdba.com/uploads/sessionprofiledbahist.zip

I wanted to see a session profile from the DBA_HIST_ACTIVE_SESS_HISTORY view.  My idea was to include the time the session was inactive as well as the time it was waiting on an event or using the CPU.

On both these scripts you have to manually update the date range.

ashsessionlist.sql – this just gives you a list of the sessions for a given date range.  Sorts them from most to least active.  Also shows the total timeframe so you can tell how much inactive time there was compared with active.

ashsessionprofile.sql – takes a given instance, SID, and sequence number and produces a nice session profile.  The key to any session profile is to include some comparison of the waits and cpu with the overall elapsed time.

– Bobby

Posted in Uncategorized | Leave a comment

Exadata presentation

Slides for a presentation on Exadata that I did at work:

https://www.bobbydurrettdba.com/uploads/ExadataDistinctives.ppt

Hopefully I balanced out the good and bad.  Exadata has some good features but we have hit a large number of bugs.  Given adequate testing you could benefit from the new features.

– Bobby

Posted in Uncategorized | Leave a comment

Autotask clients fall into OTHER_GROUPS on custom plans

The autotask clients run in their own resource consumer groups.  So, stats, space, and SQL tuning tasks have some consumer groups that you have to include in your own plans if your plan will be active while the window for these tasks is open.

Here are the Oracle supplied consumer groups for the autotask clients:

SQL> select client_name,consumer_group
   2 from DBA_AUTOTASK_CLIENT;

CLIENT_NAME                      CONSUMER_GROUP
-------------------------------- ------------------------------
auto optimizer stats collection  ORA$AUTOTASK_STATS_GROUP
auto space advisor               ORA$AUTOTASK_SPACE_GROUP
sql tuning advisor               ORA$AUTOTASK_SQL_GROUP

In a case I was working on our custom resource plan was active but didn’t include any of the consumer groups listed above.  So, when stats ran it fell through to the other_groups part of the plan.  Sadly, other_groups was set to 0% which should never be the case and this caused the stats job to hang when the CPU use was high.

So, don’t make other_groups 0% or include these special autotask consumer groups in your plan if your plan will be active during the maintenance windows.

– Bobby

Posted in Uncategorized | Leave a comment

Resource Manager wait events

You many wonder why you should care about Oracle Resource Manager wait events.  You should care because every Oracle 11g database uses Resource Manager by default during the maintenance windows.  So, even if you have not intentionally configured RM you may see waits that start with resmgr:  which are RM waits.  It is helpful to understand what the two main RM waits are and what they mean.

The default maintenance windows are M-F 22:00 for four hours and 06:00 for 20 hours on the weekend.  Several times we have had performance complaints and had people say that one of the main waits during the slow time was “resmgr:cpu quantum“.

The resmgr:cpu quantum wait event means that your database server’s CPU is high enough that resource manager has kicked in and is dividing up the CPU cycles among the various sessions that need it.  The plan that runs by default during the maintenance window is called DEFAULT_MAINTENANCE_PLAN.  This plan allocates most of the CPU resources to tasks other than the maintenance tasks.  This is really a helpful feature because it prevents the maintenance tasks like gathering optimizer statistics from eating up a lot of CPU resources that are needed by user processes.  But, if you have a busy system and see a bunch of resmgr:cpu quantum waits and don’t know what they are it can be disconcerting.  The solution is to look at what is using so much CPU and tune that.  Resource Manager actually does a great job of dialing back the CPU of the maintenance jobs so there is no need to disable it.

I’ve seen one situation where resmgr:cpu quantum indicated a real problem.  The active plan had a 0% CPU allocation.  In this case the session was waiting on a resmgr:cpu quantum wait but never got a CPU slice.  This is essentially a hang and needs to be addressed by giving the user associated with the session a > 0% CPU allocation.

The second RM wait that I have seen is “resmgr:become active“.  You will only see this wait if you have configured RM yourself to have limits on the number of active sessions.  When the number of active sessions that are in a given consumer group reaches the maximum then any other sessions that want to become active will wait on resmgr:become active.  I really don’t like active session limits and encourage you to carefully consider whether these limits make sense in your situation.  The sessions waiting on resmgr:become active are essentially hung and eventually will timout.  If you have a hang like situation and see a bunch of resmgr:become active waits then there is cause for concern and you may need to increase or eliminate your active session limits.

So, to summarize:

resmgr:cpu quantum – sessions are sharing CPU, not necessarily a problem

resmgr:become active – sessions are hung , can be a big problem

– Bobby

Posted in Uncategorized | Leave a comment

Oracle blog aggregators

Found a couple of Oracle blog aggregators.  These sites combine the many Oracle DBA blogs into single sites.

http://www.orafaq.com/

http://orana.info/ – NO LONGER EXISTS

It looks like Oracle has an aggregator for their own employees:

https://blogs.oracle.com/

Pretty cool.

– Bobby

Posted in Uncategorized | Leave a comment

DBA_HIST_ACTIVE_SESS_HISTORY

I’ve been using the view DBA_HIST_ACTIVE_SESS_HISTORY to investigate problems lately.  I’ve found this useful for both performance problems and hangs.  I’ve done some more complicated things, but I find that just dumping out every column of the view for a very narrow window of time can help explain what was happening in a way that other tools do not. 

DBA_HIST_ACTIVE_SESS_HISTORY records the list of active sessions every 10 seconds.  It is like looking at your active sessions in Toad except going back in time.  We have been keeping six weeks of history on the AWR so I can go back to any ten second interval over the past six weeks and see what was running.

In the case of a hang or locking you can see the blocking sessions and what they are spending time on.  Here is a recent example I used to debug a hang caused by sessions on two different instances – 1 and 12.  I picked a 10 second interval that I knew was during the hang:

select 
*
from DBA_HIST_ACTIVE_SESS_HISTORY
where 
sample_time 
between 
to_date('18-APR-2012 10:40:00','DD-MON-YYYY HH24:MI:SS')
and 
to_date('18-APR-2012 10:40:10','DD-MON-YYYY HH24:MI:SS')
and instance_number in (1,12)
order by sample_time,instance_number,SESSION_ID;

One key point if you have a hang or other issue is to run DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT before bouncing the database if you have to bounce.  Otherwise you will lose all of the active session history since the last snapshot when the database bounces.

– Bobby

Posted in Uncategorized | 2 Comments

How to show currently active events

Found a helpful PL/SQL block in this presentation:

http://www.sagelogix.com/idc/groups/public/documents/sagelogix-presentation/sage016048.pdf – NO LONGER EXISTS

This shows you which events are set – i.e. by an alter system set events command.  I modified this to have a larger range of event values – to 29999.

declare
event_level number;
begin
for i in 10000..29999 loop
sys.dbms_system.read_ev(i,event_level);
-- note: requires exec permission for DBMS_SYSTEM
if (event_level > 0) then
dbms_output.put_line
('Event '||i||' set at level '|| event_level);
end if;
end loop;
end;
/

This shows the events that were set dynamically as well as any set in the init parameter event.

Run:

alter system set events
'10503 trace name context forever, level 4000';

Then run the PL/SQL block and you get this:

Event 10503 set at level 4000

Also, Oracle note 160178.1 has a good explanation of setting events dynamically and in an spfile.

– Bobby

Posted in Uncategorized | Leave a comment

DBMS_SPACE.SPACE_USAGE

Found a cool package and procedure while working on a problem today.

DBMS_SPACE.SPACE_USAGE

This takes a segment and shows how many blocks are free or partially filled.

In our case we hit a bug which resulting in tons of unformatted blocks and this was the recommended fix.

Oracle note 729149.1 describes the issue and shows an example use of DBMS_SPACE.SPACE_USAGE.

Here is a simple script and log:  https://www.bobbydurrettdba.com/uploads/space_usage.zip

Here is the output of the PL/SQL block:

Total number of blocks that are unformatted: 0
Number of blocks that has at least 0 to 25% free space: 0
Number of blocks that has at least 25 to 50% free space: 0
Number of blocks that has at least 50 to 75% free space: 0
Number of blocks that has at least 75 to 100% free space: 5
Total number of blocks that are full in the segment: 0

The example is a small table with the minimum 5 blocks and only 1 row.   The output says that the 5 blocks all have at least 75% of their space free.

– Bobby

Posted in Uncategorized | Leave a comment