Setting up Nagios monitoring on CentOS

2013-04-14    teddy.sun    Ops notes -> System administration    centos nagios

A. Nagios server
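1. Install packages (gcc and make are needed for the source builds below, openssl-devel for nrpe's --enable-ssl)
yum install -y httpd gcc make openssl-devel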
2. Download Nagios, the plugins and nrpe
wget  http://syslab.comsenz.com/downloads/linux/nagios-3.0.5.tar.gz
wget  http://syslab.comsenz.com/downloads/linux/nagios-plugins-1.4.13.tar.gz
wget  http://syslab.comsenz.com/downloads/linux/nrpe-2.12.tar.gz
3. Add the nagios user
useradd nagios
4. Build and install Nagios
mkdir -p /opt/hadoop/
tar -xzvf nagios-3.0.5.tar.gz
cd nagios-3.0.5
./configure --prefix=/opt/hadoop/nagios
make all
make fullinstall
mkdir -p /opt/hadoop/nagios/etc/objects
cp ./sample-config/cgi.cfg /opt/hadoop/nagios/etc/
cp ./sample-config/nagios.cfg /opt/hadoop/nagios/etc/
cp ./sample-config/resource.cfg /opt/hadoop/nagios/etc/
cp ./sample-config/template-object/commands.cfg /opt/hadoop/nagios/etc/objects/
cp ./sample-config/template-object/contacts.cfg /opt/hadoop/nagios/etc/objects/
cp ./sample-config/template-object/timeperiods.cfg /opt/hadoop/nagios/etc/objects/
cp ./sample-config/template-object/templates.cfg /opt/hadoop/nagios/etc/objects/
cp ./sample-config/template-object/localhost.cfg /opt/hadoop/nagios/etc/objects/
touch /opt/hadoop/nagios/var/nagios.log
chmod -R 755 /opt/hadoop/nagios/etc/
chown -R nagios:nagios /opt/hadoop/nagios
5. Build and install nagios-plugins
cd ..
tar zxvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure --prefix=/opt/hadoop/nagios --with-nagios-user=nagios --with-nagios-group=nagios
make && make install
Check that the installation succeeded: the plugin files should now be in this directory
ls /opt/hadoop/nagios/libexec/
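Listing the directory is a quick check. A slightly stricter sketch, assuming bash, confirms that specific plugins used later in this guide exist and are executable. The function name check_plugins is hypothetical, and the plugin list you pass it is up to you.

```bash
#!/bin/bash
# Sketch: report whether each named plugin exists in DIR and is executable.
# Usage (hypothetical helper): check_plugins DIR PLUGIN...
# Returns 0 if all plugins are present, 1 if any is missing.
check_plugins() {
    local dir="$1" p missing=0
    shift
    for p in "$@"; do
        if [ -x "$dir/$p" ]; then
            echo "found:   $dir/$p"
        else
            echo "MISSING: $dir/$p"
            missing=1
        fi
    done
    return "$missing"
}
```

On the server at this point, check_plugins /opt/hadoop/nagios/libexec check_ping check_ssh check_load should report all three as found; check_nrpe only appears after the nrpe step.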
6. Build and install nrpe
cd ..
tar zxvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure --prefix=/opt/hadoop/nagios --enable-ssl --enable-command-args 
make all
make install-plugin
make install-daemon
make install-daemon-config
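7. Configure httpd
Create the web login account (htpasswd prompts for its password):
htpasswd -c /opt/hadoop/nagios/etc/htpasswd.users nagiosadmin
Then restart httpd and start Nagios:
service httpd restart
service nagios start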
B. Nagios client (monitored host)
1. Download the packages
wget  http://syslab.comsenz.com/downloads/linux/nagios-plugins-1.4.13.tar.gz
wget  http://syslab.comsenz.com/downloads/linux/nrpe-2.12.tar.gz
2. Add the nagios user and create the install directory
mkdir -p /opt/hadoop/nagios
useradd nagios
3. Build and install nrpe
tar -xzvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure --prefix=/opt/hadoop/nagios --enable-ssl --enable-command-args
make all
make install-plugin
make install-daemon
make install-daemon-config
4. Build and install nagios-plugins
cd ..
tar -xzvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure --prefix=/opt/hadoop/nagios --with-nagios-user=nagios --with-nagios-group=nagios
make && make install
Check that the installation succeeded: the plugin files should now be in this directory
ls /opt/hadoop/nagios/libexec/
5. Configure nrpe
vim /opt/hadoop/nagios/etc/nrpe.cfg
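Change "allowed_hosts=127.0.0.1" to "allowed_hosts=127.0.0.1,10.130.2.72", where the added IP is the Nagios server's address.
Change "dont_blame_nrpe=0" to "dont_blame_nrpe=1" so that check_nrpe may pass command arguments (this also requires the --enable-command-args build option used above).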
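The same two edits can be scripted with GNU sed, which helps when rolling nrpe out to many hosts. This is a sketch that assumes both keys sit at the start of a line in key=value form, as they do in the sample nrpe.cfg; configure_nrpe is a hypothetical helper name. Keep in mind that dont_blame_nrpe=1 lets clients pass arguments to commands, so enable it only if you need that.

```bash
#!/bin/bash
# Sketch: apply the two nrpe.cfg changes non-interactively (GNU sed -i).
# Usage (hypothetical helper): configure_nrpe CFG_FILE NAGIOS_SERVER_IP
configure_nrpe() {
    local cfg="$1" server_ip="$2"
    # allowed_hosts: keep localhost and add the Nagios server
    sed -i "s/^allowed_hosts=.*/allowed_hosts=127.0.0.1,${server_ip}/" "$cfg"
    # dont_blame_nrpe=1: accept command arguments sent by check_nrpe
    sed -i 's/^dont_blame_nrpe=.*/dont_blame_nrpe=1/' "$cfg"
}
```

For this article's client that would be configure_nrpe /opt/hadoop/nagios/etc/nrpe.cfg 10.130.2.72.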
6. Save the following nrpe start/stop script as /etc/init.d/nrpe
#!/bin/bash
#
# chkconfig: 2345 55 25
# description: NRPE Daemon
#

# source function library
. /etc/rc.d/init.d/functions

RETVAL=0

prog='nrpe'
NRPE_CFG='/opt/hadoop/nagios/etc/nrpe.cfg'
NRPE_PRG='/opt/hadoop/nagios/bin/nrpe'
NRPE_OPT='-d'
PID_FILE='/var/run/nrpe.pid'

start()
{
    echo -n $"Starting $prog: "
    [ -f "$PID_FILE" ] && rm -f "$PID_FILE"
    $NRPE_PRG -c "$NRPE_CFG" $NRPE_OPT
    # nrpe -d forks into the background; give it a moment, then record the
    # daemon's PID (first match only, so the PID file holds a single PID)
    sleep 1
    pid=$(pgrep -f "$NRPE_PRG -c $NRPE_CFG" | head -n 1)
    if [ -n "$pid" ] ; then
        echo "$pid" > "$PID_FILE"
        RETVAL=0
        success
    else
        RETVAL=1
        failure
    fi
    echo
}

stop()
{
    echo -n $"Stopping $prog: "
    # killproc (from the functions library) sends SIGTERM, escalates to
    # SIGKILL only if needed, prints OK/FAILED and removes the PID file
    killproc -p "$PID_FILE" "$NRPE_PRG"
    RETVAL=$?
    echo
}

case "$1" in
        start)
                start
                ;;
        stop)
                stop
                ;;
        restart)
                stop
                start
                ;;
        status)
                status -p $PID_FILE $prog
                RETVAL=$?
                ;;
        *)
                echo $"Usage: $0 {start|stop|restart|status}"
                RETVAL=1
esac
exit $RETVAL
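The status action relies on status -p from the functions library, which reads the PID file. When a PID file looks stale (for example after nrpe was killed by hand), a minimal sketch of the same liveness test can help with debugging. pid_alive is a hypothetical helper; kill -0 sends no signal, it only reports whether one could be sent.

```bash
#!/bin/bash
# Sketch: succeed only if PIDFILE holds the numeric PID of a running process.
# Usage (hypothetical helper): pid_alive /var/run/nrpe.pid
pid_alive() {
    local pidfile="$1" pid
    [ -s "$pidfile" ] || return 1
    read -r pid < "$pidfile"
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac
    # Signal 0 is never delivered; kill only checks existence and permission
    kill -0 "$pid" 2>/dev/null
}
```

Run it as root, as the init script is: for a live process owned by another user, kill -0 fails when the caller lacks permission to signal it.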
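7. Start nrpe
chmod +x /etc/init.d/nrpe
/etc/init.d/nrpe start
To also start nrpe at boot, register the script with chkconfig (its header already declares runlevels 2345):
chkconfig --add nrpe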
C. Add monitored hosts on the Nagios server
1. Create a directory for per-host configuration files
mkdir /opt/hadoop/nagios/etc/servers
vim /opt/hadoop/nagios/etc/nagios.cfg
Append this line:
cfg_dir=/opt/hadoop/nagios/etc/servers
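Appending by hand is fine once. If the setup is scripted and may run more than once, a small sketch like the following (add_cfg_dir is a hypothetical name) appends the line only when it is missing, and first adds a newline if the file does not end with one.

```bash
#!/bin/bash
# Sketch: append "cfg_dir=DIR" to FILE unless that exact line is already present.
# Usage (hypothetical helper): add_cfg_dir NAGIOS_CFG DIR
add_cfg_dir() {
    local cfg="$1" line="cfg_dir=$2"
    grep -qxF "$line" "$cfg" && return 0
    # Avoid gluing the new entry onto an unterminated last line
    if [ -s "$cfg" ] && [ -n "$(tail -c 1 "$cfg")" ]; then
        echo >> "$cfg"
    fi
    printf '%s\n' "$line" >> "$cfg"
}
```

Usage: add_cfg_dir /opt/hadoop/nagios/etc/nagios.cfg /opt/hadoop/nagios/etc/servers can be run repeatedly without duplicating the entry.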
2. Add a configuration file for the host
vim /opt/hadoop/nagios/etc/servers/10.130.2.22.cfg
define host{
       use                     linux-server
       host_name               10.130.2.22
       alias                   10.130.2.22
       address                 10.130.2.22
}
define service{
       use                     generic-service
       host_name               10.130.2.22
       service_description     check_ping
       check_command           check_ping!100.0,20%!200.0,50%
       max_check_attempts      5
       normal_check_interval   1
}
define service{
       use                     generic-service
       host_name               10.130.2.22
       service_description     check_ssh
       check_command           check_ssh
       max_check_attempts      5
       normal_check_interval   1
}
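Every monitored machine gets the same three definitions, with only the address changing. A sketch of a generator for these files, assuming bash and the linux-server and generic-service templates from the sample templates.cfg; gen_host_cfg and the example IP 10.130.2.23 are illustrative.

```bash
#!/bin/bash
# Sketch: print a host definition plus check_ping and check_ssh services
# for one IP, in the same layout as the file above.
# Usage (hypothetical helper): gen_host_cfg IP > servers/IP.cfg
gen_host_cfg() {
    local ip="$1"
    cat <<EOF
define host{
       use                     linux-server
       host_name               $ip
       alias                   $ip
       address                 $ip
}
define service{
       use                     generic-service
       host_name               $ip
       service_description     check_ping
       check_command           check_ping!100.0,20%!200.0,50%
       max_check_attempts      5
       normal_check_interval   1
}
define service{
       use                     generic-service
       host_name               $ip
       service_description     check_ssh
       check_command           check_ssh
       max_check_attempts      5
       normal_check_interval   1
}
EOF
}
```

For example, gen_host_cfg 10.130.2.23 > /opt/hadoop/nagios/etc/servers/10.130.2.23.cfg, followed by the reload below.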
3. Reload Nagios on the server so the configuration takes effect
service nagios reload
After the reload, the new host shows up in the Nagios web interface.
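A typo in an object file can leave Nagios unable to load its configuration, so it is worth running the built-in pre-flight check (nagios -v with the main config file) before reloading. A sketch, assuming this article's paths; safe_reload and the NAGIOS_BIN, NAGIOS_CFG and RELOAD_CMD overrides are hypothetical additions to make it easy to adapt.

```bash
#!/bin/bash
# Sketch: verify the Nagios configuration and reload only if it passes.
# NAGIOS_BIN, NAGIOS_CFG and RELOAD_CMD are hypothetical overrides.
safe_reload() {
    local bin="${NAGIOS_BIN:-/opt/hadoop/nagios/bin/nagios}"
    local cfg="${NAGIOS_CFG:-/opt/hadoop/nagios/etc/nagios.cfg}"
    local out
    # "nagios -v <main config>" checks the configuration without starting Nagios
    if out=$("$bin" -v "$cfg" 2>&1); then
        ${RELOAD_CMD:-service nagios reload}
    else
        printf '%s\n' "$out" >&2
        echo "Configuration check failed; Nagios was not reloaded." >&2
        return 1
    fi
}
```

Calling safe_reload in place of service nagios reload prints the verifier's error report and leaves the running instance alone when the check fails.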
4. Add checks that run through nrpe
Add the following to /opt/hadoop/nagios/etc/objects/commands.cfg:
define command{
       command_name    check_nrpe
       command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
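Add the following service to the host's file (/opt/hadoop/nagios/etc/servers/10.130.2.22.cfg). The nrpe daemon must be running on the monitored host, and check_load must be defined in its nrpe.cfg (the sample nrpe.cfg already defines it):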
define service{
       use                     generic-service
       host_name               10.130.2.22
       service_description     check_load
       check_command           check_nrpe!check_load
       max_check_attempts      5
       normal_check_interval   1
}
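check_command values are split on "!": the first field names a command from commands.cfg and the remaining fields fill $ARG1$, $ARG2$ and so on. The following is an illustration only, not how Nagios itself is implemented: a bash function (expand_check_command, hypothetical) that performs just the $USER1$, $HOSTADDRESS$ and $ARGn$ substitutions, to show what the server ends up running. $USER1$ is set in resource.cfg, normally to the plugin directory.

```bash
#!/bin/bash
# Illustration only (not Nagios code): expand the three macros this guide uses.
#   $1 command_line template from commands.cfg
#   $2 value for $HOSTADDRESS$
#   $3 value for $USER1$
#   $4 check_command value, e.g. "check_nrpe!check_load"
expand_check_command() {
    local cmd="$1" hostaddr="$2" user1="$3" check="$4"
    local host_pat='$HOSTADDRESS$' user_pat='$USER1$' arg_pat i
    local -a parts
    # Split on "!": parts[0] is the command name, parts[1..] fill $ARG1$, $ARG2$, ...
    IFS='!' read -r -a parts <<< "$check"
    cmd=${cmd//"$host_pat"/"$hostaddr"}
    cmd=${cmd//"$user_pat"/"$user1"}
    for ((i = 1; i < ${#parts[@]}; i++)); do
        arg_pat="\$ARG${i}\$"
        cmd=${cmd//"$arg_pat"/"${parts[i]}"}
    done
    printf '%s\n' "$cmd"
}
```

With the check_nrpe definition above, expand_check_command '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' 10.130.2.22 /opt/hadoop/nagios/libexec 'check_nrpe!check_load' prints /opt/hadoop/nagios/libexec/check_nrpe -H 10.130.2.22 -c check_load, which is also a handy command to run by hand on the server when debugging nrpe.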
Reload Nagios to apply the configuration:
service nagios reload
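5. Custom check script
On the monitored host, write check_diskmount.sh. It reports OK only when exactly 12 filesystems matching /disk are mounted: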
vim /opt/hadoop/nagios/libexec/check_diskmount.sh
#!/bin/bash
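# Count the /proc/mounts entries that contain "/disk"
num=$(grep -c '/disk' /proc/mounts)
if [ "$num" -eq 12 ] ; then
   echo "OK - mount disk is $num"
   exit 0
else
   # Nagios maps exit code 2 to CRITICAL (exit 1 would only be WARNING)
   echo "CRITICAL - mount disk is $num"
   exit 2
fi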
Make it executable:
chmod +x /opt/hadoop/nagios/libexec/check_diskmount.sh
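Nagios reads a plugin's result from its exit code: 0 is OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. A more general sketch of the same check, under a hypothetical interface (expected count, mount-point prefix, and an optional mounts file so it can be tried against a sample file). Unlike the grep above, it only matches the mount-point column of /proc/mounts.

```bash
#!/bin/bash
# Sketch: count filesystems whose mount point starts with PREFIX and compare
# with EXPECTED, using the standard plugin exit codes.
# Hypothetical interface: check_diskmount EXPECTED [PREFIX] [MOUNTS_FILE]
check_diskmount() {
    local expected="$1" prefix="${2:-/disk}" mounts="${3:-/proc/mounts}"
    local num
    case $expected in
        ''|*[!0-9]*)
            echo "UNKNOWN - expected count must be a number"
            return 3 ;;
    esac
    if [ ! -r "$mounts" ]; then
        echo "UNKNOWN - cannot read $mounts"
        return 3
    fi
    # Field 2 of each mounts line is the mount point; match it by prefix
    num=$(awk -v p="$prefix" 'index($2, p) == 1 { n++ } END { print n + 0 }' "$mounts")
    if [ "$num" -eq "$expected" ]; then
        echo "OK - mount disk is $num"
        return 0
    fi
    echo "CRITICAL - mount disk is $num (expected $expected)"
    return 2
}

# As a standalone plugin file, end it with:
#   check_diskmount "$@"; exit $?
```

Used that way, the nrpe.cfg entry would pass the arguments itself, e.g. command[check_diskmount]=/opt/hadoop/nagios/libexec/check_diskmount.sh 12 /disk.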
Register the script in nrpe.cfg on the monitored host:
vim /opt/hadoop/nagios/etc/nrpe.cfg
command[check_diskmount]=/opt/hadoop/nagios/libexec/check_diskmount.sh
Restart nrpe:
/etc/init.d/nrpe restart
Add the service on the Nagios server:
vim /opt/hadoop/nagios/etc/servers/10.130.2.22.cfg
define service{
       use                     generic-service
       host_name               10.130.2.22
       service_description     check_diskmount
       check_command           check_nrpe!check_diskmount
       max_check_attempts      3
       normal_check_interval   1
}
Reload Nagios so the configuration takes effect:
service nagios reload