I’m still doing state dumps as described in my previous post.
I wanted to collect the state dumps (diag process trace files) from all nodes of my Exadata database and decided to write a couple of scripts to automate it because I anticipate doing more state dumps in the future. So, I thought I’d share these on the blog.
Both scripts take two arguments: TARGET_HOST and TARGET_SID. TARGET_HOST is the node in your Exadata cluster that you want to retrieve the state dump from. TARGET_SID is the instance name on that node. For example, if your database name is xyz, your TARGET_SID on the third node would be xyz3.
The first script gzips the state dump trace file on the target host and scp's it back to the host you run the script from. Within the script you define a local directory where the gzipped dump file will land. Just be sure there is enough room so you don't fill up / or some other important filesystem. Here is the script:
export TARGET_HOST=$1
export TARGET_SID=$2
export DBNAME=`echo $ORACLE_SID|awk '{ print substr($1,1,length($1)-1) }'`
export DIAG_PID=`ssh ${TARGET_HOST} "ps -ef" | grep diag | grep ${TARGET_SID} | awk '{print $2;}'`
export TRACEDIR=/u01/app/oracle/diag/rdbms/$DBNAME/$TARGET_SID/trace
export TRACEFILE=${TRACEDIR}/${TARGET_SID}_diag_${DIAG_PID}.trc
export GZTRACEFILE=${TRACEFILE}.gz
export DUMPDEST=YOURPREFERREDDIRECTORYHERE
ssh ${TARGET_HOST} "gzip < $TRACEFILE > $GZTRACEFILE"
scp ${TARGET_HOST}:${GZTRACEFILE} ${DUMPDEST}
ssh ${TARGET_HOST} "rm $GZTRACEFILE"
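If the script were saved as get_statedump.sh and the third node were reachable as exahost03 (both names are mine, just for illustration), running it from the first node would look something like this:

# hypothetical names - pull the state dump from instance xyz3 on the third node
./get_statedump.sh exahost03 xyz3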
So, replace YOURPREFERREDDIRECTORYHERE with the directory on your current host where you want the state dumps to land. I put this script on the first node and ran it for every node in the cluster with the SIDs for the one database that I had gotten the state dumps on. Note that the script will only work if you run it from one of the first 9 nodes, because it derives DBNAME by stripping just the last character from your local ORACLE_SID and so assumes your current instance number has only one digit.
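If you ever need to run it from a node with a two-digit instance number, one option (not part of the script above) would be to strip all trailing digits from ORACLE_SID instead of just the last character, for example:

# strip every trailing digit from the local ORACLE_SID to get the database name
export DBNAME=`echo $ORACLE_SID | sed 's/[0-9]*$//'`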
The second script goes back and truncates the source trace file. As I mentioned in the previous post, the diag process never closes its trace file, so to clear out old state dumps you need to cat /dev/null over it. I didn't want to put this in the first script because I wanted to make sure the state dumps all got copied over correctly. Once I have them in my target directory I run this script for all the target hosts and SIDs:
export TARGET_HOST=$1
export TARGET_SID=$2
export DBNAME=`echo $ORACLE_SID|awk '{ print substr($1,1,length($1)-1) }'`
export DIAG_PID=`ssh ${TARGET_HOST} "ps -ef" | grep diag | grep ${TARGET_SID} | awk '{print $2;}'`
export TRACEDIR=/u01/app/oracle/diag/rdbms/$DBNAME/$TARGET_SID/trace
export TRACEFILE=${TRACEDIR}/${TARGET_SID}_diag_${DIAG_PID}.trc
ssh ${TARGET_HOST} "cat /dev/null > ${TRACEFILE}"
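One extra check I could add before truncating, just a sketch and not part of the script above, is to test the local gzipped copy, assuming DUMPDEST, TARGET_SID, and DIAG_PID are set the same way as in the first script:

# hypothetical check - test the local gzipped copy before clearing the source trace file
export LOCALCOPY=${DUMPDEST}/${TARGET_SID}_diag_${DIAG_PID}.trc.gz
if gzip -t ${LOCALCOPY}
then
  echo "${LOCALCOPY} is intact - safe to truncate on ${TARGET_HOST}"
else
  echo "${LOCALCOPY} is missing or corrupt - do not truncate"
fi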
Now that I have two higher-level scripts set up to run these two for every node of my production database, I can easily collect the state dumps into one directory and clear out the diag trace files, all with two scripts.
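I haven't included those higher-level scripts here, but a minimal sketch of one would just loop over the nodes, assuming hypothetical node names exahost01 through exahost04, a database named xyz, and the get_statedump.sh name from the example above:

# hypothetical wrapper - collect the state dump from each node's instance in turn
export DBNAME=xyz
for NODE_NUM in 1 2 3 4
do
  ./get_statedump.sh exahost0${NODE_NUM} ${DBNAME}${NODE_NUM}
done

A second loop calling the truncation script the same way would handle the cleanup side.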
– Bobby