Friday, June 29, 2012

Oracle数据库监控脚本


  1. 检查实例的可用性
  2. 检查监听器的可用性
  3. 检查alert日志文件中的错误信息
  4. 在存放log文件的地方满以前清空旧的log文件
  5. 分析table和index以获得更好的性能
  6. 检查表空间的使用情况
  7. 找出无效的对象
  8. 监控用户和事务

显示服务器上的可用实例:

$ ps -ef | grep smon
oracle 21832 1 0 Feb 24 ? 19:05 ora_smon_oradb1
oracle 898 1 0 Feb 15 ? 0:00 ora_smon_oradb2
dliu 25199 19038 0 10:48:57 pts/6 0:00 grep smon
oracle 27798 1 0 05:43:54 ? 0:00 ora_smon_oradb3
oracle 28781 1 0 Mar 03 ? 0:01 ora_smon_oradb4

显示服务器上的可用监听器:

$ ps -ef | grep listener | grep -v grep
oracle 23879 1 0 Feb 24 ? 33:36 /8.1.7/bin/tnslsnr listener_db1 -inherit
oracle 27939 1 0 05:44:02 ? 0:00 /8.1.7/bin/tnslsnr listener_db2 -inherit
oracle 23536 1 0 Feb 12 ? 4:19 /8.1.7/bin/tnslsnr listener_db3 -inherit
oracle 28891 1 0 Mar 03 ? 0:01 /8.1.7/bin/tnslsnr listener_db4 -inherit

查看Oracle存档目录的文件系统使用情况

$ df -k | grep oraarch
/dev/vx/dsk/proddg/oraarch 71123968 4754872 65850768 7% /u09/oraarch

列出alert.log文件中的全部Oracle错误信息:

$ grep ORA- alert.log
ORA-00600: internal error code, arguments: [kcrrrfswda.1], [], [], [], [], []
ORA-00600: internal error code, arguments: [1881], [25860496], [25857716], []

检查Oracle实例的可用性

oratab文件中列出了服务器上的所有数据库
$ cat /var/opt/oracle/oratab
############################################################
## /var/opt/oracle/oratab                                 ##
############################################################
oradb1:/u01/app/oracle/product/8.1.7:Y
oradb2:/u01/app/oracle/product/8.1.7:Y
oradb3:/u01/app/oracle/product/8.1.7:N
oradb4:/u01/app/oracle/product/8.1.7:Y


以下的脚本检查oratab文件中列出的所有数据库,并且找出该数据库的状态(启动还是关闭)

## ckinstance.ksh ##
ORATAB=/var/opt/oracle/oratab
echo `date`
echo Oracle Database(s) Status `hostname` :
db=`egrep -i :Y|:N $ORATAB | cut -d: -f1 | grep -v # | grep -v *`
pslist=`ps -ef | grep pmon`
for i in $db ; do
echo $pslist | grep ora_pmon_$i > /dev/null 2>$1
if (( $? )); then
echo Oracle Instance - $i: Down
else
echo Oracle Instance - $i: Up
fi
done


使用以下的命令来确认该脚本是可以执行的:

$ chmod 744 ckinstance.ksh
$ ls -l ckinstance.ksh
-rwxr--r-- 1 oracle dba 657 Mar 5 22:59 ckinstance.ksh*


以下是实例可用性的报表:

$ ckinstance.ksh
Mon Mar 4 10:44:12 PST 2002
Oracle Database(s) Status for DBHOST server:
Oracle Instance - oradb1: Up
Oracle Instance - oradb2: Up
Oracle Instance - oradb3: Down
Oracle Instance - oradb4: Up

检查Oracle监听器的可用性

以下有一个类似的脚本检查Oracle监听器。如果监听器停了,该脚本将会重新启动监听器:
#!/bin/ksh
## cklsnr.sh ##
DBALIST=primary.dba@company.com,another.dba@company.com;export DBALIST
cd /var/opt/oracle
rm -f lsnr.exist
ps -ef | grep mylsnr | grep -v grep > lsnr.exist
if [ -s lsnr.exist ]
then
echo
else
echo Alert | mailx -s Listener ‘mylsnr‘ on `hostname` is down $DBALIST
TNS_ADMIN=/var/opt/oracle; export TNS_ADMIN
ORACLE_SID=db1; export ORACLE_SID
ORAENV_ASK=NO; export ORAENV_ASK
PATH=$PATH:/bin:/usr/local/bin; export PATH
. oraenv
LD_LIBRARY_PATH=${ORACLE_HOME}/lib;export LD_LIBRARY_PATH
lsnrctl start mylsnr
fi

检查Alert日志(ORA-XXXXX)

每个脚本所使用的一些环境变量可以放到一个profile中:
## oracle.profile ##
EDITOR=vi;export EDITOR ORACLE_BASE=/u01/app/oracle; export
ORACLE_BASE ORACLE_HOME=$ORACLE_BASE/product/8.1.7; export
ORACLE_HOME LD_LIBRARY_PATH=$ORACLE_HOME/lib; export
LD_LIBRARY_PATH TNS_ADMIN=/var/opt/oracle;export
TNS_ADMIN NLS_LANG=american; export
NLS_LANG NLS_DATE_FORMAT=‘Mon DD YYYY HH24:MI:SS‘; export
NLS_DATE_FORMAT ORATAB=/var/opt/oracle/oratab;export
ORATAB PATH=$PATH:$ORACLE_HOME:$ORACLE_HOME/bin:/usr/ccs/bin:/bin:/usr/bin:/usr/sbin:/
sbin:/usr/openwin/bin:/opt/bin:.; export
PATH DBALIST=primary.dba@company.com,another.dba@company.com;export
DBALIST


以下的脚本首先调用oracle.profile来设置全部的环境变量。如果发现任何的Oracle错误,该脚本还会给DBA发送一个警告的email。

## ckalertlog.sh ##
#!/bin/ksh
.. /etc/oracle.profile
for SID in `cat $ORACLE_HOME/sidlist`
do
cd $ORACLE_BASE/admin/$SID/bdump
if [ -f alert_${SID}.log ]
then
mv alert_${SID}.log alert_work.log
touch alert_${SID}.log
cat alert_work.log >> alert_${SID}.hist
grep ORA- alert_work.log > alert.err
fi
if [ `cat alert.err|wc -l` -gt 0 ]
then
mailx -s ${SID} ORACLE ALERT ERRORS $DBALIST < alert.err
fi
rm -f alert.err
rm -f alert_work.log
done

清除旧的归档文件

以下的脚本将会在log文件达到90%容量的时候清空旧的归档文件:
$ df -k | grep arch
Filesystem kbytes used avail capacity Mounted on
/dev/vx/dsk/proddg/archive 71123968 30210248 40594232 43% /u08/archive
#######################################################################
## clean_arch.ksh                                                    ##
#######################################################################
#!/bin/ksh
df -k | grep arch > dfk.result
archive_filesystem=`awk -F ‘{ print $6 }‘ dfk.result`
archive_capacity=`awk -F ‘{ print $5 }‘ dfk.result`
if [[ $archive_capacity > 90% ]]
then
echo Filesystem ${archive_filesystem} is ${archive_capacity} filled
# try one of the following option depend on your need
find $archive_filesystem -type f -mtime +2 -exec rm -r {} ;
tar
rman
fi

分析表和索引(以得到更好的性能)

以下我将展示如何传送参数到一个脚本中:
## analyze_table.sh ##
#!/bin/ksh
# input parameter: 1: password # 2: SID
if (($# fi
if (($# fi

要传入参数以执行该脚本,输入:

$ analyze_table.sh manager oradb1

脚本的第一部分产生了一个analyze.sql文件,里面包含了分析表用的语句。脚本的第二部分分析全部的表:
## analyze_table.sh ##
#####################################################################
sqlplus -s < oracle/$1@$2
set heading off
set feed off
set pagesize 200
set linesize 100
spool analyze_table.sql
select ANALYZE TABLE || owner || . || segment_name ||
ESTIMATE STATISTICS SAMPLE 10 PERCENT;
from dba_segments
where segment_type = TABLE
and owner not in (SYS, SYSTEM);
spool off
exit
!
sqlplus -s < oracle/$1@$2
@./analyze_table.sql
exit
!


以下是analyze.sql的一个例子:

ANALYZE TABLE HIRWIN.JANUSAGE_SUMMARY ESTIMATE STATISTICS SAMPLE 10 PERCENT;
ANALYZE TABLE HIRWIN.JANUSER_PROFILE ESTIMATE STATISTICS SAMPLE 10 PERCENT;
ANALYZE TABLE APPSSYS.HIST_SYSTEM_ACTIVITY ESTIMATE STATISTICS SAMPLE 10 PERCENT;
ANALYZE TABLE HTOMEH.QUEST_IM_VERSION ESTIMATE STATISTICS SAMPLE 10 PERCENT;
ANALYZE TABLE JSTENZEL.HIST_SYS_ACT_0615 ESTIMATE STATISTICS SAMPLE 10 PERCENT;
ANALYZE TABLE JSTENZEL.HISTORY_SYSTEM_0614 ESTIMATE STATISTICS SAMPLE 10 PERCENT;
ANALYZE TABLE JSTENZEL.CALC_SUMMARY3 ESTIMATE STATISTICS SAMPLE 10 PERCENT;
ANALYZE TABLE IMON.QUEST_IM_LOCK_TREE ESTIMATE STATISTICS SAMPLE 10 PERCENT;
ANALYZE TABLE APPSSYS.HIST_USAGE_SUMMARY ESTIMATE STATISTICS SAMPLE 10 PERCENT;
ANALYZE TABLE PATROL.P$LOCKCONFLICTTX ESTIMATE STATISTICS SAMPLE 10 PERCENT;

检查表空间的使用

以下的脚本检测表空间的使用。如果表空间只剩下10%,它将会发送一个警告email。

## ck_tbsp.sh ##

#####################################################################
#!/bin/ksh
sqlplus -s < oracle/$1@$2
set feed off
set linesize 100
set pagesize 200
spool tablespace.alert
SELECT F.TABLESPACE_NAME,

TO_CHAR ((T.TOTAL_SPACE - F.FREE_SPACE),999,999) "USED (MB)",
TO_CHAR (F.FREE_SPACE, 999,999) "FREE (MB)",
TO_CHAR (T.TOTAL_SPACE, 999,999) "TOTAL (MB)",
TO_CHAR ((ROUND ((F.FREE_SPACE/T.TOTAL_SPACE)*100)),999)|| % PER_FREE
FROM (
SELECT TABLESPACE_NAME,
ROUND (SUM (BLOCKS*(SELECT VALUE/1024
FROM V\$PARAMETER
WHERE NAME = db_block_size)/1024)
) FREE_SPACE
FROM DBA_FREE_SPACE
GROUP BY TABLESPACE_NAME
) F,
(
SELECT TABLESPACE_NAME,
ROUND (SUM (BYTES/1048576)) TOTAL_SPACE
FROM DBA_DATA_FILES
GROUP BY TABLESPACE_NAME
) T
WHERE F.TABLESPACE_NAME = T.TABLESPACE_NAME
AND (ROUND ((F.FREE_SPACE/T.TOTAL_SPACE)*100)) < 10; spool off exit ! if [ `cat tablespace.alert|wc -l` -gt 0 ] then cat tablespace.alert -l tablespace.alert > tablespace.tmp
mailx -s "TABLESPACE ALERT for ${2}" $DBALIST < tablespace.tmp
fi
警告email输出的例子如下:
ABLESPACE_NAME USED (MB) FREE (MB) TOTAL (MB) PER_FREE
------------------- --------- ----------- ------------------- ------------------
SYSTEM 2,047 203 2,250 9 %
STBS01 302 25 327 8 %
STBS02 241 11 252 4 %
STBS03 233 19 252 8 %

查找出无效的数据库对象

以下查找出无效的数据库对象:
## invalid_object_alert.sh ##
#####################################################################
#!/bin/ksh . /etc/oracle.profile
sqlplus -s < oracle/$1@$2
set feed off
set heading off column object_name format a30
spool invalid_object.alert
SELECT OWNER, OBJECT_NAME, OBJECT_TYPE,
STATUS FROM DBA_OBJECTS WHERE STATUS =
INVALID ORDER BY OWNER, OBJECT_TYPE, OBJECT_NAME;
spool off
exit ! if [ `cat invalid_object.alert|wc -l` -gt 0 ] then
mailx -s "INVALID OBJECTS for ${2}" $DBALIST < invalid_object.alert
fi
查看结果:
$ cat invalid_object.alert
OWNER OBJECT_NAME OBJECT_TYPE STATUS
--------------------------------------------
HTOMEH DBMS_SHARED_POOL PACKAGE BODY INVALID
HTOMEH X_$KCBFWAIT VIEW INVALID
IMON IW_MON PACKAGE INVALID
IMON IW_MON PACKAGE BODY INVALID
IMON IW_ARCHIVED_LOG VIEW INVALID
IMON IW_FILESTAT VIEW INVALID
IMON IW_SQL_FULL_TEXT VIEW INVALID
IMON IW_SYSTEM_EVENT1 VIEW INVALID
IMON IW_SYSTEM_EVENT_CAT VIEW INVALIDLBAILEY CHECK_TABLESPACE_USAGE PROCEDURE INVALID
PATROL P$AUTO_EXTEND_TBSP VIEW INVALID
SYS DBMS_CRYPTO_TOOLKIT PACKAGE INVALID
SYS DBMS_CRYPTO_TOOLKIT PACKAGE BODY INVALID
SYS UPGRADE_SYSTEM_TYPES_TO_816 PROCEDURE INVALID
SYS AQ$_DEQUEUE_HISTORY_T TYPE INVALID
SYS HS_CLASS_CAPS VIEW INVALID SYS HS_CLASS_DD VIEW INVALID

监视用户和事务(死锁等)

以下的脚本在死锁发生的时候发送一个警告e-mail:
##!/bin/ksh.. /etc/oracle.profilesqlplus -s < oracle/$1@$2 set feed off set heading off spool deadlock.alert SELECT SID, DECODE(BLOCK, 0, NO, YES ) BLOCKER, DECODE(REQUEST, 0, NO,YES ) WAITER FROM V$LOCK WHERE REQUEST > 0 OR BLOCK > 0ORDER BY block DESC;spool offexit!if [ `cat deadlock.alert|wc -l` -gt 0 ]thenmailx -s "DEADLOCK ALERT for ${2}" $DBALIST < deadlock.alert fi  放入crontab 

 0,20,40 7-17 * * 1-5 /dba/scripts/ckinstance.sh > /dev/null 2>&10,20,40 7-17 * * 1-5 /dba/scripts/cklsnr.sh > /dev/null 2>&10,20,40 7-17 * * 1-5 /dba/scripts/ckalertlog.sh > /dev/null 2>&130 * * * 0-6 /dba/scripts/clean_arch.sh > /dev/null 2>&1* 5 * * 1,3 /dba/scripts/analyze_table.sh > /dev/null 2>&1* 5 * * 0-6 /dba/scripts/ck_tbsp.sh > /dev/null 2>&1* 5 * * 0-6 /dba/scripts/invalid_object_alert.sh > /dev/null 2>&10,20,40 7-17 * * 1-5 /dba/scripts/deadlock_alert.sh > /dev/null 2>&1

1 comment:

  1. Did you know that you can generate cash by locking premium pages of your blog / site?
    Simply open an account on Mgcash and run their Content Locking widget.

    ReplyDelete