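Nagios Installation, Configuration, and Usage Guide
1. Overview
This manual describes how to install and configure Nagios Core, Nagios Plugins, NRPE, and NDOUtils, and how Horizon uses Nagios to monitor the hardware resources [1] and services [2] of OpenStack controller and compute nodes.
Notes:
[1]: CPU, memory, disk, network
[2]: keystone, glance-api, glance-registry, nova-api, nova-compute, nova-network, nova-scheduler, nova-volume, nova-objectstore, mysql, dnsmasq, rabbitmq, etc.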
2. References
Nagios official documentation: http://www.nagios.org/documentation
Reference manuals: http://library.nagios.com/library/products/nagioscore/manuals
Plugin resources: http://exchange.nagios.org/
Source tarballs: http://sourceforge.net/projects/nagios/files/?source=navbar
3. Environment Preparation
Operating system: Ubuntu 12.04 LTS 64-bit server
Nagios Core version: nagios-3.4.4
NRPE version: nrpe-2.14
NDOUtils version: ndoutils-1.5.2
Dependencies:
apache2
libapache2-mod-php5
build-essential
libgd2-xpm-dev
make
gcc
xinetd
-d DEVICE            DEVICE must be without /dev (ex: -d sda)
-w/c TPS,READ,WRITE  TPS means transfer per seconds (aka IO/s)
                     READ and WRITE are in sectors per seconds
Example:
[Local environment]
$ sudo /usr/local/nagios/check_diskstat.sh -d vda -w 200,100000,100000 -c 300,200000,200000
> summary: 0 io/s, read 8 sectors (0kB/s), write 56 sectors (4kB/s) in 6 seconds | tps=0io/s;;; read=682b/s;;; write=4778b/s;;;
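The sample output above reports both sector counts and byte rates. The relationship between them can be sketched as follows, assuming 512-byte sectors and integer division (this matches the figures shown: 8 sectors in 6 seconds gives 682 B/s). The function name `sectors_to_bytes_per_sec` is hypothetical and is not part of check_diskstat.sh.

```shell
#!/bin/sh
# Convert a sector count observed over an interval into bytes per second.
# Assumption: one sector is 512 bytes; the result is truncated to an integer.
sectors_to_bytes_per_sec() {
    sectors=$1
    seconds=$2
    echo $(( sectors * 512 / seconds ))
}

sectors_to_bytes_per_sec 8 6     # read:  8 sectors in 6 seconds
sectors_to_bytes_per_sec 56 6    # write: 56 sectors in 6 seconds
```

This is useful when choosing the READ and WRITE thresholds, which are given in sectors per second rather than bytes.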
[Remote environment]
Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_diskstat]=/usr/local/nagios/libexec/check_diskstat.sh -d vda -w 200,100000,100000 -c 300,200000,200000
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_diskstat
> Same output as above
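When a check is run through check_nrpe, the remote plugin's exit status is passed back to the caller, following the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal sketch of mapping that status to a state name; `state_name` is a hypothetical helper, not part of Nagios or NRPE.

```shell
#!/bin/sh
# Map a Nagios plugin exit status to its state name.
# Any status other than 0, 1, or 2 is treated as UNKNOWN.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Usage (commented out because it needs a reachable NRPE host):
# /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_diskstat
# state_name $?
```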
---------------------------------------------------
Plugin name: check_disk
http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_disk--2D-%25-used-space/details
Plugin description: Based on the df command; -d must be set to a value from the "Mounted on" column of the df output.
Plugin parameters:
This plugin shows the % of used space of a mounted partition, using the 'df' utility
./check_disk:
-c <integer>  If the % of used space is above <integer>, returns CRITICAL state
-w <integer>  If the % of used space is below CRITICAL and above <integer>, returns WARNING state
-d <device>   The partition or mountpoint to be checked. eg. /dev/sda1, /home, /
Example:
[Local environment]
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 9.9G 1.7G 7.8G 18% /
udev 998M 12K 998M 1% /dev
tmpfs 401M 224K 401M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 1002M 0 1002M 0% /run/shm
/dev/vdb 20G 173M 19G 1% /mnt
$ /usr/local/nagios/check_disk -d /mnt -c 80 -w 10
> OK - /mnt space used=1% | '/mnt usage'=1%;10;80;
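The threshold logic described in the plugin parameters can be sketched as follows. This is an illustration of the documented behaviour, not the plugin's actual source: it reads df output on standard input, finds the row whose last column equals the mount point, and compares the Use% value with the warning and critical thresholds. The function name `check_used_pct` is hypothetical, and only mount points (not device names) are matched.

```shell
#!/bin/sh
# Classify the Use% of one mount point from df output read on stdin.
# Arguments: mount point, warning threshold, critical threshold (percent).
check_used_pct() {
    mount=$1; warn=$2; crit=$3
    # Take the Use% column (second to last) of the first row whose
    # last column is the requested mount point, and strip the % sign.
    used=$(awk -v m="$mount" '$NF == m { sub(/%/, "", $(NF-1)); print $(NF-1); exit }')
    if [ -z "$used" ]; then
        echo "UNKNOWN - $mount not found"; return 3
    elif [ "$used" -gt "$crit" ]; then
        echo "CRITICAL - $mount space used=${used}%"; return 2
    elif [ "$used" -gt "$warn" ]; then
        echo "WARNING - $mount space used=${used}%"; return 1
    fi
    echo "OK - $mount space used=${used}%"; return 0
}
```

On a live system it could be fed with `df -P | check_used_pct /mnt 10 80`; the `-P` option keeps each filesystem on a single line.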
[Remote environment]
Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_disk]=/usr/local/nagios/libexec/check_disk -d /mnt -c 80 -w 10
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_disk
> Same output as above
---------------------------------------------------
Plugin name: check_lvm
http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_lvm/details
Plugin description: Only works when a volume group (VG) exists.
Plugin parameters:
NOTE - This script only works on _mounted_ volumes!
Usage: ./check_lvm -w -c
Description:
This plugin finds all LVM logical volumes, checks their used space, and compares against the supplied thresholds.
Example:
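---------------------------------------------------
Plugin name: check_procs
Plugin description: Based on ps; can be used to check whether the processes of a given service exist.
Plugin parameters: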
check_procs -w <range> -c <range> [-m metric] [-s state] [-p ppid]
  [-u user] [-r rss] [-z vsz] [-P %cpu] [-a argument-array]
  [-C command] [-t timeout] [-v]
Options:
-h, --help
    Print detailed help screen
-V, --version
    Print version information
-w, --warning=RANGE
    Generate warning state if metric is outside this range
-c, --critical=RANGE
    Generate critical state if metric is outside this range
-m, --metric=TYPE
    Check thresholds against metric. Valid types:
    PROCS   - number of processes (default)
    VSZ     - virtual memory size
    RSS     - resident set memory size
    CPU     - percentage CPU
    ELAPSED - time elapsed in seconds
-t, --timeout=INTEGER
    Seconds before connection times out (default: 10)
-v, --verbose
    Extra information. Up to 3 verbosity levels
Filters:
-s, --state=STATUSFLAGS
    Only scan for processes that have, in the output of `ps`, one or
    more of the status flags you specify (for example R, Z, S, RS,
    RSZDT, plus others based on the output of your 'ps' command).
-p, --ppid=PPID
    Only scan for children of the parent process ID indicated.
-z, --vsz=VSZ
    Only scan for processes with VSZ higher than indicated.
-r, --rss=RSS
    Only scan for processes with RSS higher than indicated.
-P, --pcpu=PCPU
    Only scan for processes with PCPU higher than indicated.
-u, --user=USER
    Only scan for processes with user name or ID indicated.
-a, --argument-array=STRING
    Only scan for processes with args that contain STRING.
--ereg-argument-array=STRING
    Only scan for processes with args that contain the regex STRING.
-C, --command=COMMAND
    Only scan for exact matches of COMMAND (without path).
Example:
$ /usr/local/nagios/check_procs -w 3 -c 5 -a nagios
> PROCS OK: 2 processes with args 'nagios'
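In the example above, `-w 3 -c 5` are bare integers, which the Nagios range syntax reads as "0 to N": a count above 3 is a warning and a count above 5 is critical. A minimal sketch of that comparison follows. It handles only bare integer thresholds, not the full range syntax, and `classify_count` is a hypothetical name rather than anything provided by check_procs.

```shell
#!/bin/sh
# Classify a process count against bare-integer warning/critical thresholds.
# Arguments: count, warning threshold, critical threshold.
classify_count() {
    count=$1; warn=$2; crit=$3
    if [ "$count" -gt "$crit" ]; then
        echo "PROCS CRITICAL: $count processes"; return 2
    elif [ "$count" -gt "$warn" ]; then
        echo "PROCS WARNING: $count processes"; return 1
    fi
    echo "PROCS OK: $count processes"; return 0
}
```

As a rough stand-in for `-a nagios`, the count could be taken from ps, for example `classify_count "$(ps -eo args= | grep -c '[n]agios')" 3 5`; the bracket in the pattern keeps grep from counting itself.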
[LOG]
http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_log-2Esh/details
http://exchange.nagios.org/directory/Plugins/Log-Files
[DNS]
http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_dig/details
[DHCP]
http://exchange.nagios.org/directory/Plugins/Network-Protocols/DHCP-and-BOOTP
[AMQP]
http://exchange.nagios.org/directory/Plugins/Software/check_rabbitmq/details
[MYSQL]
http://exchange.nagios.org/directory/Plugins/Databases/MySQL
[ROUTE]
http://exchange.nagios.org/directory/Plugins/Network-Protocols/%2A-Routing
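Nagios ships with its own web interface. The web interface obtains information by interacting with the Nagios Core process, while Nagios Core collects information through plugins and, through NDOUtils, stores the data in a MySQL database.
Because the current environment only needs the Nagios plugins to collect monitoring information from the nodes, this manual does not describe Nagios Core, NDOUtils, or the Nagios web interface in depth. For details, see References.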