Monitoring Hadoop with Munin


This document is based on Debian; some details may differ on other distributions.

Features

Munin is a very flexible and powerful monitoring tool and framework written mainly in Perl for analyzing resources like CPU, memory, hard disks, networks, and services. Munin has two components:

  1. Munin - the server (aka Grapher/Gatherer), which collects all the data and generates graphs.
  2. Munin-Node - the client, which tracks the data from the machine and sends it to the server.

Munin runs on any machine that supports Perl. Its database is based on RRD (Round-Robin Database).

Requirements

  • Both
    • A POSIX-compatible operating system with Perl installed
  • Server
    • A configured web server or web application framework that serves the directory where Munin stores the graphs and web interface.

Installation

On many Linux distributions, Munin can be installed through a package manager such as apt, yum, yast, or zypper. This section explains the installation on a Debian 5 (Lenny) machine.

Munin (the server)

  1. Install the server where the Datameer X distribution is located using the command:

    % sudo apt-get install munin

    or on Fedora:

    % sudo yum install munin

    This installs the server and the client. If you don't want the client on the monitoring machine, remove munin-node from the autostart configuration, such as inittab or rc.d (depending on the distribution). Some distributions provide no prebuilt packages for Munin; there you need to install it manually.

  2. After installation, Munin creates the graphs and web interface in htmldir, which can be configured in /etc/munin/munin.conf. The default htmldir is /var/www/munin. If this folder hasn't been created yet, wait a while: the Munin server is not a daemon that runs as a listening process; it is a script that is executed by a cron job. You can also trigger the cron job run manually as the root user using the following command:

    % su -c /usr/bin/munin-cron munin

    If the script prints nothing to stdout, then everything should be OK.

  3. Now you can access Munin through any browser:

    http://SERVERADDRESS/munin

    You should see "Munin :: Overview" with a list of the monitored machines. In this example, you should see the machine you configured as the Munin server itself.
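
The check for the generated web interface can also be scripted. The following is a minimal sketch, assuming that Munin writes static HTML with an index.html at the top of htmldir and that htmldir, if set, appears as an uncommented "htmldir <path>" line in munin.conf. The function names munin_htmldir and munin_html_ready are hypothetical helpers, not part of Munin.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Munin) for checking the generated web interface.

# Print the htmldir configured in a munin.conf file, or the documented
# default /var/www/munin if no uncommented htmldir line is present.
munin_htmldir() {
    conf="${1:-/etc/munin/munin.conf}"
    dir=""
    if [ -r "$conf" ]; then
        # Take the value of the last uncommented "htmldir <path>" line.
        dir=$(awk '$1 == "htmldir" { d = $2 } END { print d }' "$conf")
    fi
    printf '%s\n' "${dir:-/var/www/munin}"
}

# Succeed if the given directory already contains a generated index.html.
munin_html_ready() {
    [ -f "$1/index.html" ]
}
```

For example, munin_html_ready "$(munin_htmldir)" && echo "web interface generated" prints the message only once the cron job has produced its first output.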

Munin-Node (the client)

  1. Install the client where the Datameer X distribution is located, using the command:

    % sudo apt-get install munin-node

    or on Fedora:

    % sudo yum install munin-node

  2. Next, start the munin-node process. Munin-Node listens on port 4949 by default. If you installed the server and the client on the same machine, you should see graphs in the web interface, which are updated every five minutes.
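
Whether a node is answering on its port can be checked from the shell. The sketch below assumes that a netcat binary named nc is installed and that it exits once the node closes the connection. munin_node_alive is a hypothetical helper name, not a Munin command. It relies on the greeting line "# munin node at <hostname>" that munin-node sends on each new connection.

```shell
#!/bin/sh
# Hypothetical helper (not part of Munin): succeed if a munin-node answers
# on the given host and port (defaults: localhost, 4949).
munin_node_alive() {
    host="${1:-localhost}"
    port="${2:-4949}"
    # munin-node greets each connection with "# munin node at <hostname>";
    # sending "quit" makes the node close the connection again.
    printf 'quit\n' | nc -w 5 "$host" "$port" | grep -q '^# munin node at '
}
```

For example, munin_node_alive localhost 4949 && echo "node reachable". When checking from another machine, the node must permit that machine's address in its configuration, otherwise the connection is refused.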

Resources

Locations of the data and configuration, based on the default configuration.

File/Folder                          Description
/etc/cron.d/munin                    Munin-Cronjob (Server)
/etc/cron.d/munin-node               Munin-Node-Cronjob (Client)
/etc/init.d/munin-node               Control-Script for Munin-Node (Client)
/etc/logrotate.d/munin               Logrotator-Script for Munin (Server)
/etc/logrotate.d/munin-node          Logrotator-Script for Munin-Node (Client)
/etc/munin                           Configuration-Folder
/etc/munin/munin.conf                Munin-Configuration (Server)
/etc/munin/munin-node.conf           Munin-Node-Configuration (Client)
/etc/munin/plugin-conf.d             Plugin-Configurations
/etc/munin/plugins                   The Plugins
/etc/munin/templates                 Templates for the Web-Interface
/etc/rc0.d/K20munin-node             Autorun: Stop Munin-Node (Shutdown)
/etc/rc1.d/K20munin-node             Autorun: Stop Munin-Node (Localmode)
/etc/rc2.d/S98munin-node             Autorun: Start Munin-Node (Runlevel 2)
/etc/rc3.d/S98munin-node             Autorun: Start Munin-Node (Runlevel 3)
/etc/rc4.d/S98munin-node             Autorun: Start Munin-Node (Runlevel 4)
/etc/rc5.d/S98munin-node             Autorun: Start Munin-Node (Runlevel 5)
/etc/rc6.d/K20munin-node             Autorun: Stop Munin-Node (Restart)
/usr/bin/munin-cron                  The Munin-Cronjob-Script (Server)
/usr/bin/munindoc                    Shows POD-Documentation for the Plugins of Munin
/usr/lib/cgi-bin/munin-cgi-graph     CGI-Script which creates the graphs
/usr/sbin/munin-node                 Munin-Node-Program (Client)
/usr/sbin/munin-node-configure       Munin-Node-Configurator
/usr/sbin/munin-node-configure-snmp  Munin-Node-Configurator for SNMP
/var/lib/munin                       Location where the data are stored (RRD)
/var/log/munin                       Location where the log files of Munin are stored

Manage Service

Action          Debian                              Fedora
Node Start      % /etc/init.d/munin-node start      % /sbin/service munin-node start
Node Status     % /etc/init.d/munin-node status     % /sbin/service munin-node status
Node Stop       % /etc/init.d/munin-node stop       % /sbin/service munin-node stop
Node Restart    % /etc/init.d/munin-node restart    % /sbin/service munin-node restart

Add Node Autostart

  • Debian: Move
    '$HOME/removed-rcd/rc2.d/S98munin-node',
    '$HOME/removed-rcd/rc3.d/S98munin-node',
    '$HOME/removed-rcd/rc4.d/S98munin-node' and
    '$HOME/removed-rcd/rc5.d/S98munin-node'
    back to the runlevel directories ('/etc/rc*.d').
  • Fedora: Execute % ntsysv, then check 'munin-node' and press 'OK'.

Remove Node Autostart

  • Debian: Move
    '/etc/rc2.d/S98munin-node',
    '/etc/rc3.d/S98munin-node',
    '/etc/rc4.d/S98munin-node' and
    '/etc/rc5.d/S98munin-node'
    to the matching subdirectories of '$HOME/removed-rcd' ('rc2.d' to 'rc5.d'). Separate subdirectories are needed because all four links have the same file name.
  • Fedora: Execute % ntsysv, then uncheck 'munin-node' and press 'OK'.
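
On Debian, whether the node starts at boot can be read from the start links in the runlevel directories. The following is a minimal sketch. munin_node_start_links is a hypothetical helper name, and the optional base directory argument exists only so the function can be tried on a scratch directory instead of /etc.

```shell
#!/bin/sh
# Hypothetical helper (not part of Munin): list the munin-node start links
# in runlevels 2 to 5. The optional argument replaces /etc as the base
# directory, which is convenient for trying the function out safely.
munin_node_start_links() {
    base="${1:-/etc}"
    for level in 2 3 4 5; do
        for link in "$base/rc$level.d"/S*munin-node; do
            # An unmatched glob stays literal, so keep only existing entries.
            if [ -e "$link" ] || [ -L "$link" ]; then
                printf '%s\n' "$link"
            fi
        done
    done
}
```

If the function prints nothing, no start links are present in runlevels 2 to 5, so the node is not started automatically at boot.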

Configuration

Munin (the server)

By default, you can find the configuration at /etc/munin/munin.conf

Parameter  Default value           Possible values      Description
dbdir      /var/lib/munin          Filesystem-Folder    Location of the RRD-Database
htmldir    /var/www/munin          Filesystem-Folder    Location where the web interface is stored. (Should be accessible through HTTP)
logdir     /var/log/munin          Filesystem-Folder    Location of the log files
rundir     /var/run/munin          Filesystem-Folder    Location of Process-State-Files, such as PID
tmpldir    /etc/munin/templates    Filesystem-Folder    Location of the templates which are used by the web interface
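
In munin.conf, each of these parameters is written on its own line as a name followed by its value. As a sketch, the defaults from the table above would look like this:

```
dbdir    /var/lib/munin
htmldir  /var/www/munin
logdir   /var/log/munin
rundir   /var/run/munin
tmpldir  /etc/munin/templates
```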

You can define the structure of monitored machines for the web interface. (The format looks similar to INI-Configuration-sections.)
For example:

[localhost.localdomain]
    address 127.0.0.1
    use_node_name yes

This example defines a group called localdomain, associates it with the machine localhost, and uses localhost as the name instead of the address in the web interface.
To add a machine, copy this section, change localhost to the name of the machine, and change the address to the IP address of the machine you want to monitor. For more detailed instructions, see Possible Configuration-Parameters for Munin (Server).
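
As an illustration, a second machine could be added with a section like the one below. The host name datanode1 and the address 192.168.1.10 are hypothetical placeholders; substitute the values of the machine you want to monitor.

```
[localhost.localdomain]
    address 127.0.0.1
    use_node_name yes

# Hypothetical additional machine in the same group
[datanode1.localdomain]
    address 192.168.1.10
    use_node_name yes
```

The node on the added machine must also accept connections from the Munin server, which is controlled by the allow parameter in its munin-node.conf.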

Munin-Node (the client)

By default you can find the configuration at /etc/munin/munin-node.conf. Additional information about this configuration can be found at Possible Configuration-Parameters for Munin-Node (Client).

Parameter    Default value                    Possible values                   Description
log_level    2                                0..4                              0 = Off, 4 = Maximal Verbose
log_file     /var/log/munin/munin-node.log    Logfile-Location                  Where the log file should be stored
pid_file     /var/run/munin/munin-node.pid    Pidfile-Location                  Where the PID file should be stored
background   1                                1 or 0                            Set to 1 to run in the background. To run in the foreground, set to 0 or comment it out, and set setsid to 0.
user         root                             User                              Run the node as this user
group        root                             Group                             Run the node as this group
setsid       yes                              yes/1 or no/0                     Whether to fork after bind in order to daemonize
ignore_file  ~$                               Expression for excluding Files    Regular expression; files that match it are excluded (this parameter can be repeated)
allow        ^127\.0\.0\.1$                   Expression for IP-Address         Regular expression for IP addresses that are allowed to access the node (this parameter can be repeated)
host         *                                IP-Address or * for all           The address on which the node listens
port         4949                             1..65534                          The port on which the node listens
cidr_allow   -                                CIDR for IP-Address               Allows access for addresses given in CIDR notation. See http://en.wikipedia.org/wiki/CIDR_notation
cidr_deny    -                                CIDR for IP-Address               Cancels (or negates) cidr_allow. See http://en.wikipedia.org/wiki/CIDR_notation

If the cidr_* parameters do not work in your configuration, use allow instead; whether they are supported depends on the installed version of the Perl module Net::Server.
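
As a sketch, a munin-node.conf that keeps the defaults listed above and additionally admits one Munin server could look like the following. The server address 192.168.1.5 and the subnet 192.168.1.0/24 are hypothetical placeholders.

```
log_level 2
log_file /var/log/munin/munin-node.log
pid_file /var/run/munin/munin-node.pid
background 1
setsid 1
user root
group root
ignore_file ~$
host *
port 4949

# Always allow the local machine.
allow ^127\.0\.0\.1$
# Hypothetical address of the Munin server, written as an anchored regular expression.
allow ^192\.168\.1\.5$
# CIDR alternative covering the whole hypothetical subnet; use it only if the
# installed Net::Server supports it.
# cidr_allow 192.168.1.0/24
```

Restart munin-node after changing this file so that the new access rules take effect.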