Cleaning Up Old RRD Data (clean-rrd-data) August 16, 2010

Issue

OpenNMS can collect a tremendous amount of SNMP and response data. It stores that data in jrb files under $OPENNMSBASE/share/rrd/. Once created, these files are normally never deleted by OpenNMS, even when they are no longer being written to. In my experience, once a file has not been updated in a week it normally will not be written to again. Here are a few examples of how this can occur:

  1. The OpenNMS administrator does not tick the “delete data” option when a node is deleted.
  2. A hardware component of a node is replaced (e.g. a stack member of a switch stack or a disk in a server). In some cases the SNMP name of such a device includes its hardware address or serial number. The SNMP data for the replacement hardware is therefore written to new jrb files, and the old files, carrying the old hardware's name, are left unused. (In this case you may want to copy the old data files over the new files so the history is not lost; however, I rarely find a need to do this.)
  3. The OpenNMS config files are changed so that some data is no longer collected and therefore jrb files are left unused.

Assuming you don’t need it for reporting, this data only uses up disk space and slows down backups. It also shows up in the Resource Graphs page of the OpenNMS web interface alongside the current data, which can lead to confusion. Users may assume data is not being collected correctly because some graphs show data and others are empty: when they look only at the last few days, they do not see the old data held in the unused files.

In terms of size, 100,000 jrb files (one for each datum collected) with data stored at 5-minute intervals and kept for 2 years would require about 200GB of disk space. Your numbers may differ depending on your jrb file configuration. This is the jrb configuration I use:

<rrd step="300">
<rra>RRA:AVERAGE:0.5:1:211000</rra>
<rra>RRA:MAX:0.5:288:732</rra>
<rra>RRA:MIN:0.5:288:732</rra>
</rrd>
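
As a rough cross-check of those numbers, the retention and disk usage can be worked out with shell arithmetic. This is only a sketch: the per-file size of 1700132 bytes is taken from the file listing in the example session in this article, and the variable names are made up for illustration.

```shell
#!/bin/bash
# Values taken from the <rrd> configuration above.
step=300            # seconds between samples
avg_rows=211000     # rows in RRA:AVERAGE:0.5:1:211000
daily_steps=288     # samples consolidated into one MAX/MIN row
daily_rows=732      # rows in each of the MAX and MIN archives

# Assumption: size of one jrb file, as reported in the example listing.
file_bytes=1700132
files=100000

avg_days=$(( avg_rows * step / 86400 ))          # whole days of 5-minute data
daily_row_seconds=$(( daily_steps * step ))      # time span of one MAX/MIN row
total_gb=$(( files * file_bytes / 1000000000 ))  # decimal gigabytes, rounded down

echo "5-minute archive retention: ${avg_days} days"
echo "MAX/MIN row width: ${daily_row_seconds} seconds, kept for ${daily_rows} rows"
echo "Disk for ${files} files: about ${total_gb} GB"
```

With these inputs the 5-minute archive covers 732 days (about two years), each MAX/MIN row covers exactly one day, and 100,000 files come to about 170 GB, which is the same order as the rough figure quoted above.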

Solution

As long as you don’t need this data for reporting, it may be helpful to remove it. OpenNMS does not provide a means to do this, so I created the clean-rrd-data script.

Synopsis

$NOCBASE/bin/clean-rrd-data

Must be run as root (or using sudo) since this OpenNMS data is writable only by root.
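
A wrapper or cron job could verify this before calling the script. The guard below is a minimal sketch, not taken from clean-rrd-data itself; the function name is_root is hypothetical.

```shell
#!/bin/bash
# Hypothetical guard: succeeds only when the given UID is 0.
# With no argument it checks the UID of the current user.
is_root() {
    local uid="${1:-$(id -u)}"
    [ "$uid" -eq 0 ]
}

if ! is_root; then
    # A real wrapper would probably stop here with "exit 1".
    echo "clean-rrd-data must be run as root, for example via sudo" >&2
fi
```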

Description

The script searches $OPENNMSBASE/share/rrd for the following items and asks if each should be deleted:

  1. SNMP directories ($OPENNMSBASE/share/rrd/snmp/*) where the nodeid has data but the node no longer exists in the opennms database;
  2. SNMP jrb files that have not been updated in the last 7 days. (Only a warning is given if none of the SNMP jrb files for a node have been updated in 7 days, since the problem is then likely to be with data collection for that node rather than with anything that needs to be cleaned up. The node may simply have been offline for more than 7 days.);
  3. strings.properties in SNMP directories that have no jrb files. This would result if the previous check removed all the jrb files;
  4. SNMP directories that are empty as a result of the previous two steps;
  5. response jrb files that have not been updated in 7 days;
  6. response directories ($OPENNMSBASE/share/rrd/response/*) that are empty as a result of the previous step.
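
The searches in the list above can be approximated with find. The functions below are a sketch under stated assumptions, not the code of clean-rrd-data: all function names are hypothetical, the list of valid node ids is assumed to have been exported from the database into a text file beforehand, and -maxdepth, -mindepth and -empty are extensions found in GNU and BSD find rather than POSIX.

```shell
#!/bin/bash
# Hypothetical helpers; names and arguments are illustrative only.

# Items 2 and 5: print .jrb files under $1 not modified in the last $2 days
# (default 7). find discards fractional days, so "-mtime +6" matches files
# that are 7 whole days old or older.
stale_jrb() {
    local dir="$1" days="${2:-7}"
    find "$dir" -type f -name '*.jrb' -mtime +"$(( days - 1 ))"
}

# Item 3: print strings.properties files whose own directory holds no .jrb.
# Assumption: only the same directory is checked, not its subdirectories.
orphan_strings() {
    local dir="$1" f
    find "$dir" -type f -name 'strings.properties' | while IFS= read -r f; do
        if [ -z "$(find "$(dirname "$f")" -maxdepth 1 -type f -name '*.jrb')" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Items 4 and 6: print directories below $1 that contain nothing at all.
empty_dirs() {
    find "$1" -mindepth 1 -type d -empty
}

# Item 1: print numeric directory names under $1 that are not listed in the
# file $2 (one node id per line). How that file is produced is an assumption,
# e.g. a query such as "SELECT nodeid FROM node" run with psql -At.
orphan_node_dirs() {
    local snmp_dir="$1" ids_file="$2" d id
    for d in "$snmp_dir"/*/; do
        [ -d "$d" ] || continue
        id="$(basename "$d")"
        case "$id" in *[!0-9]*) continue ;; esac   # skip non-numeric names
        grep -qxF -- "$id" "$ids_file" || printf '%s\n' "$id"
    done
}
```

For example, stale_jrb "$OPENNMSBASE/share/rrd/response" would list the response files covered by item 5. None of these functions delete anything; they only print candidates, which keeps the decision with the operator.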

Implementation

Prerequisites

Setting up Scripts

Install

Download the clean-rrd-data file and copy it to $NOCBASE/bin/clean-rrd-data. Make sure to enable the execute bit with chmod as shown below.

/bin/bash
source /etc/noc.conf
cd $NOCBASE/bin/
wget http://opennms.dougbakewell.ca/downloads/bin/clean-rrd-data
chmod a+x $NOCBASE/bin/clean-rrd-data

Example

[root~]# $NOCBASE/bin/clean-rrd-data

============================================
NODE: 126
LABEL: Server1 ( M-Servers S-Production T-Switch )
TOTAL SNMP DATA FILES: 227
WARNING: no data collection in last 7 days

============================================
NODE: 26
LABEL: Server2 ( M-Servers S-Production T-Server )
TOTAL SNMP DATA FILES: 52
Checking SNMP Data for any datums older than 7 days.
WARNING: 1700132 Aug  3 11:47 26/cim5MinCpuUtilPct.jrb
WARNING: 1700132 Aug  3 11:47 26/cimPctHddUsed2.jrb
WARNING: 1700132 Aug  6 02:09 26/hrDeviceEntry/10/hrProcessorLoad.jrb
WARNING: 1700132 Jul 20 19:50 26/hrDeviceEntry/3/hrProcessorLoad.jrb
WARNING: 1700132 Aug  6 02:09 26/hrDeviceEntry/13/hrProcessorLoad.jrb
WARNING: 1700132 Aug  6 02:09 26/hrDeviceEntry/11/hrProcessorLoad.jrb
WARNING: 1700132 Aug  6 02:09 26/hrDeviceEntry/12/hrProcessorLoad.jrb
Delete? (y/n): y

============================================
NODE: 613
LABEL: Switch1 ( A-WAN M-Networking P-StrafePing S-Production T-Switch )
TOTAL SNMP DATA FILES: 1295
Checking SNMP Data for any datums older than 7 days.
WARNING: 1700132 Jul 22 11:25 613/Vl1-0234c3d464c0/ifHCOutOctets.jrb
WARNING: 1700132 Jul 22 11:25 613/Vl1-0234c3d464c0/ifInOctets.jrb
WARNING: 1700132 Jul 22 11:25 613/Vl1-0234c3d464c0/ifHCInOctets.jrb
WARNING: 1700132 Jul 22 11:25 613/Vl1-0234c3d464c0/ifOutOctets.jrb
Delete? (y/n): y

============================================
strings.properties in dirs that have no jrb file:
./613/StackSub_St2_1/strings.properties
./613/StackSub_St4_2/strings.properties
./613/StackSub_St5_2/strings.properties
./613/Vl1-0234c3d464c0/strings.properties
./613/StackSub_St1_1/strings.properties
./613/StackSub_St3_1/strings.properties
./613/StackSub_St5_1/strings.properties
./613/StackSub_St2_2/strings.properties
./613/StackSub_St6_2/strings.properties
./613/StackSub_St3_2/strings.properties
./613/StackSub_St1_2/strings.properties
./613/StackSub_St4_1/strings.properties
./613/StackSub_St6_1/strings.properties

Delete strings.properties from dirs that have no jrb file? (y/n): y
./613/StackSub_St2_1/strings.properties
./613/StackSub_St4_2/strings.properties
./613/StackSub_St5_2/strings.properties
./613/Vl1-0234c3d464c0/strings.properties
./613/StackSub_St1_1/strings.properties
./613/StackSub_St3_1/strings.properties
./613/StackSub_St5_1/strings.properties
./613/StackSub_St2_2/strings.properties
./613/StackSub_St6_2/strings.properties
./613/StackSub_St3_2/strings.properties
./613/StackSub_St1_2/strings.properties
./613/StackSub_St4_1/strings.properties
./613/StackSub_St6_1/strings.properties

============================================
SNMP Dirs that are now empty:
./613/StackSub_St2_1
./613/StackSub_St4_2
./613/StackSub_St5_2
./613/Vl1-0234c3d464c0
./613/StackSub_St1_1
./613/StackSub_St3_1
./613/StackSub_St5_1
./613/StackSub_St2_2
./613/StackSub_St6_2
./613/StackSub_St3_2
./613/StackSub_St1_2
./613/StackSub_St4_1
./613/StackSub_St6_1

Delete SNMP Dirs that are now empty? (y/n): y

============================================
Response Data not updated in 7 days:
192.168.154.105/strafeping.jrb
192.168.37.249/icmp.jrb
192.168.37.249/ssh.jrb

Delete Response Data not updated in 7 days? (y/n): y

============================================
Response Dirs that are now empty:
./192.168.154.105
./192.168.37.249
Delete Response Dirs that are now empty? (y/n): y
#
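
The y/n prompts in the session above could be produced along the following lines. This is a sketch of the general pattern only; the function name confirm is hypothetical and the real script may handle input differently.

```shell
#!/bin/bash
# Hypothetical helper: ask a yes/no question on the terminal and
# succeed only when the answer is "y" or "Y".
confirm() {
    local answer
    read -r -p "$1 (y/n): " answer
    [ "$answer" = "y" ] || [ "$answer" = "Y" ]
}

# Typical use (left commented out so that nothing is deleted by accident):
#   if confirm "Delete?"; then rm -f -- "$file"; fi
```

Treating anything other than an explicit "y" as a refusal means that an empty line or a typo never triggers a deletion.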