Cleaning Up Old RRD Data (clean-rrd-data) August 16, 2010
Issue
OpenNMS can collect a tremendous amount of SNMP and response data. It stores that data in jrb files under $OPENNMSBASE/share/rrd/. Once a file is created, OpenNMS will normally not delete it, even after it is no longer being written to. In my experience, once a file has gone a week without an update, it will normally never be written to again. Here are a few examples of how this can occur:
- The OpenNMS administrator does not tick the “delete data” option when deleting a node.
- A hardware component of a node is replaced (e.g. a stack member of a switch stack or a disk in a server). In some cases these devices' SNMP names include their hardware address or serial number, so the SNMP data for the replacement hardware is put into new jrb files, and the old files, named after the old hardware, are left unused. (In this case you may want to copy the old data files over the new files so the history is not lost; however, I rarely find a need to do this.)
- The OpenNMS config files are changed so that some data is no longer collected and therefore jrb files are left unused.
Assuming you don’t need it for reporting, this data only uses up disk space and slows down backups. It also shows up in the Resource Graphs page of the OpenNMS web interface alongside the current data, which can lead to confusion. Users may assume data is not being collected correctly because some graphs show data and others are empty; if they are only looking at the last few days, they won't see the older data in the stale files.
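The clean-rrd-data script itself is not shown here, so as a rough illustration of the one-week rule above, here is a minimal Python sketch (not the script from this post) that lists .jrb files that have not been modified in more than seven days. The function name find_stale_jrb_files, reading OPENNMSBASE from the environment, and the /opt/opennms fallback are assumptions for illustration. It only reports candidates and deletes nothing.

```python
import os
import sys
import time


def find_stale_jrb_files(rrd_dir, max_age_days=7, now=None):
    """Return sorted (path, age_in_days) tuples for .jrb files under rrd_dir
    whose modification time is older than max_age_days. Nothing is deleted."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 24 * 60 * 60
    stale = []
    # Walk the whole RRD tree; OpenNMS nests files in per-node/per-resource directories.
    for dirpath, _dirnames, filenames in os.walk(rrd_dir):
        for name in filenames:
            if not name.endswith(".jrb"):
                continue  # skip non-JRobin files (e.g. .properties metadata)
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if mtime < cutoff:
                # Age in days since the file was last written.
                stale.append((path, (now - mtime) / 86400))
    return sorted(stale)


if __name__ == "__main__":
    # Assumption: OPENNMSBASE is an environment variable; /opt/opennms is a
    # hypothetical fallback. A directory given on the command line wins.
    base = os.environ.get("OPENNMSBASE", "/opt/opennms")
    rrd_dir = sys.argv[1] if len(sys.argv) > 1 else os.path.join(base, "share", "rrd")
    for path, age in find_stale_jrb_files(rrd_dir):
        print(f"{age:7.1f} days  {path}")
```

You might run it as, for example, python find_stale_jrb.py /opt/opennms/share/rrd (the filename is hypothetical) and review the list before removing anything. Because a node that has merely been unreachable for a while also stops updating its files, the age threshold is a heuristic rather than a guarantee.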
Policies that reduce the workload!
Issue
Any OpenNMS implementation (or any monitoring system, for that matter) needs to ensure the correct nodes and services are monitored. Creating and maintaining a large-scale configuration with hundreds of nodes and thousands of services is daunting. If nodes are missing, you risk being unaware of an important outage. If you mistakenly monitor a node that will be taken offline without notice, you risk a false alarm.
Regardless of the effort expended, a configuration of this size is unlikely ever to be perfect, so the question is: can it be made manageable? Can the workload be kept to something reasonable while still keeping false alarms low and ensuring important outages are detected? Until this issue is resolved to a reasonable degree, an organization is unlikely to trust and benefit from OpenNMS.
Setting up Scripts August 14, 2010
Everything in the post below, except for verifying database access, can be completed by the noc-install-upgrade.sh script. See the post Install/Upgrade Scripts for more information.
All the scripts on this site rely on a common configuration so they can find the resources they need. They are designed to be run on the same system as OpenNMS. If you’ve already installed OpenNMS then you should be able to follow the steps below and be ready to run the scripts. A future post will talk about setting up CGIs and SHTML pages which require the Apache web server to be installed.
Welcome to OpenNMS Patterns and Scripts July 9, 2010
There is a lot of information I could post right away, but I will try to make sure each post is clean and as generally applicable as possible. So things may come in bits and pieces, but once complete, I hope they will provide a set of tools that will help anyone implementing OpenNMS.
If you have not yet done so, please read through the overview page.
- Doug