Small Update to Node Details (nodedetails.cgi) December 31, 2010

Someone pointed out that the Node Details (nodedetails.cgi) report would be more useful if it displayed only the services that are monitored on a node, rather than every service OpenNMS has discovered. After running this by a few of my users, all agreed that it was better to hide services that are not monitored. A small update has been made to the base function file (base.func) and the Node Details (nodedetails.cgi) script. The noc-install-upgrade.sh script will perform the upgrade, or you can do it manually.
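
The filtering idea can be sketched as a query that keeps only services whose status marks them as monitored. This is not the actual base.func or nodedetails.cgi code. It is a minimal Python sketch against an in-memory SQLite stand-in, and the table and column names (service, ifservices, status), along with the assumption that status 'A' means "actively monitored", reflect my understanding of the OpenNMS schema and should be checked against your version.

```python
import sqlite3

# Simplified, hypothetical stand-in for two OpenNMS tables; the real schema has
# many more columns. Status 'A' is assumed to mean "actively monitored".
SCHEMA = """
CREATE TABLE service    (serviceid INTEGER PRIMARY KEY, servicename TEXT);
CREATE TABLE ifservices (nodeid INTEGER, ipaddr TEXT, serviceid INTEGER, status TEXT);
"""


def build_demo_db():
    """Create an in-memory database with one node that has three discovered services."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO service VALUES (?, ?)",
                     [(1, "ICMP"), (2, "SNMP"), (3, "HTTP")])
    conn.executemany("INSERT INTO ifservices VALUES (?, ?, ?, ?)",
                     [(7, "192.168.1.20", 1, "A"),   # monitored
                      (7, "192.168.1.20", 2, "N"),   # discovered but not monitored
                      (7, "192.168.1.20", 3, "A")])  # monitored
    return conn


def monitored_services(conn, nodeid):
    """Return the sorted names of services that are actively monitored on one node."""
    rows = conn.execute(
        """SELECT DISTINCT s.servicename
             FROM ifservices i JOIN service s ON s.serviceid = i.serviceid
            WHERE i.nodeid = ? AND i.status = 'A'
            ORDER BY s.servicename""",
        (nodeid,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(monitored_services(build_demo_db(), 7))  # ['HTTP', 'ICMP']
```

SNMP was discovered on the node but is not monitored, so it is left out of the report.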

As you may have noticed, not much has been updated on the blog lately. More will come as time allows. Keep the ideas coming!

Cheers
Doug

Auditing Node Configuration November 15, 2010

Issue

In a large-scale deployment it becomes difficult to know if nodes are configured correctly in OpenNMS. As with any large configuration, things get missed and small mistakes can lead to critical holes in monitoring. The scripts on this site help maintain node configurations, but some aspects of a node cannot be checked or updated programmatically. Someone who knows the node must audit the configuration to ensure it’s correct. Often this requires input from staff other than the OpenNMS administrator. Staff find this tedious at best and want to spend as little time on it as possible. A well-defined process is needed to audit the nodes in OpenNMS effectively and in a timely manner.

Read more / Read the solution »

Node Details (nodedetails.cgi) October 7, 2010

Issue

I often want to review important aspects of nodes in OpenNMS including: SNMP status, monitored services, IP addresses, group membership, data links, and critical path settings. The OpenNMS interface does not provide an easy way to quickly review these items in bulk. Some of this information can be viewed on the OpenNMS node page but a list view would be more useful when looking for configuration errors.

Read more / Read the solution »

Script Speed vs. Impacting OpenNMS October 5, 2010

I’ve been asked a similar question by two different people, so I figured a public response was in order. After looking at the scripts on this site, they noticed that some of them could be changed to run much faster. Most of the speed increase would come from performing a single large SQL statement and parsing the result, as opposed to many smaller SQL statements. So the question is: why are the scripts built the way they are?

In fact, the scripts were first written as these folks have suggested. The problem was that on a busy OpenNMS deployment the single large SQL query could cause hiccups. While the impact was not major, it was noticeable.

One solution would have been to optimize the hardware and database but there is no guarantee the hiccups would not come back when OpenNMS is upgraded or more nodes are added. Since the slow CGI scripts contain information only used intermittently, it made sense to simply break their workload down into smaller SQL statements. This causes them to run slower but does not put a single large hit on the system. The results are slow but safe.

Specifically, knownnames.cgi and commentsforced.cgi can take minutes to run if there are 1000+ nodes in your OpenNMS database. Shortly, I will also post a nodedetails.cgi script which will have similar performance. I agree that these three scripts feel slow even when run on a high-performance system. Although these three scripts currently report on all nodes at once, I rarely use them this way. Future posts will show how to separate an OpenNMS installation into views where nodes can be reported on in smaller groups. With smaller groups the speed difference is not noticeable.
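
To illustrate the trade-off, here is a minimal Python sketch of the two approaches against a simplified, hypothetical two-table schema in SQLite (not the real OpenNMS database, and not the site’s scripts). The first function issues one large join; the second issues one small query per node and can sleep between queries via a hypothetical pause parameter, spreading the load over time.

```python
import sqlite3
import time


def build_demo_db():
    """Hypothetical two-table stand-in for the OpenNMS database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node        (nodeid INTEGER PRIMARY KEY, nodelabel TEXT);
        CREATE TABLE ipinterface (nodeid INTEGER, ipaddr TEXT);
    """)
    conn.executemany("INSERT INTO node VALUES (?, ?)",
                     [(1, "router1"), (2, "server20")])
    conn.executemany("INSERT INTO ipinterface VALUES (?, ?)",
                     [(1, "10.0.0.1"), (2, "192.168.1.10"), (2, "192.168.1.20")])
    return conn


def interfaces_single_query(conn):
    """Fast: one large join. On a busy server this is one large hit."""
    result = {}
    rows = conn.execute(
        "SELECT n.nodelabel, i.ipaddr FROM node n "
        "JOIN ipinterface i ON i.nodeid = n.nodeid ORDER BY n.nodeid, i.ipaddr")
    for label, ipaddr in rows:
        result.setdefault(label, []).append(ipaddr)
    return result


def interfaces_per_node(conn, pause=0.0):
    """Slower but gentler: one small query per node, optionally sleeping between them."""
    result = {}
    nodes = conn.execute("SELECT nodeid, nodelabel FROM node ORDER BY nodeid").fetchall()
    for nodeid, label in nodes:
        rows = conn.execute(
            "SELECT ipaddr FROM ipinterface WHERE nodeid = ? ORDER BY ipaddr", (nodeid,))
        result[label] = [ipaddr for (ipaddr,) in rows]
        time.sleep(pause)  # hypothetical throttle; 0 means no pause
    return result


if __name__ == "__main__":
    conn = build_demo_db()
    print(interfaces_single_query(conn) == interfaces_per_node(conn, pause=0.01))  # True
```

Both functions return the same mapping for this data, although the per-node version would also list nodes that have no interfaces. Throttling trades total run time for a lower peak load on the database.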

Comments / Forced Unmanaged (commentsforced.cgi) September 24, 2010

Issue

The OpenNMS poller-configuration.xml file controls what services are monitored. Fine-grained exceptions to the poller-configuration.xml can be created using the OpenNMS web interface: node admin page -> Manage and Unmanage Interfaces and Services. This allows monitoring of a service or an entire interface to be turned off when the poller-configuration.xml would normally cause it to be monitored. These services and interfaces can be seen on the node page as “Forced Unmanaged”.

I recommend that Forced Unmanaged be used sparingly; however, it’s required in several situations (see note below). The problem is that while the poller-configuration.xml file provides a global view of what will be monitored, there is no way to view all the exceptions (Forced Unmanaged) in one place. Also, when the administrator marks a service or interface as Forced Unmanaged, there is no record of why.
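
A single list of every exception can be sketched as two queries, one for interfaces and one for services, with a recorded reason attached to each entry. This is a minimal Python sketch, not commentsforced.cgi itself: the SQLite tables are a simplified stand-in, the 'F' Forced Unmanaged codes in ipinterface.ismanaged and ifservices.status are assumptions about the OpenNMS schema, and the reasons dictionary stands in for a hypothetical store of administrator comments.

```python
import sqlite3


def build_demo_db():
    """Hypothetical stand-in: one forced-unmanaged service and one forced-unmanaged interface."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node        (nodeid INTEGER PRIMARY KEY, nodelabel TEXT);
        CREATE TABLE ipinterface (nodeid INTEGER, ipaddr TEXT, ismanaged TEXT);
        CREATE TABLE service     (serviceid INTEGER PRIMARY KEY, servicename TEXT);
        CREATE TABLE ifservices  (nodeid INTEGER, ipaddr TEXT, serviceid INTEGER, status TEXT);
    """)
    conn.executemany("INSERT INTO node VALUES (?, ?)", [(1, "server20"), (2, "printer5")])
    conn.executemany("INSERT INTO ipinterface VALUES (?, ?, ?)",
                     [(1, "192.168.1.20", "M"), (2, "192.168.1.50", "F")])
    conn.executemany("INSERT INTO service VALUES (?, ?)", [(1, "ICMP"), (2, "HTTP")])
    conn.executemany("INSERT INTO ifservices VALUES (?, ?, ?, ?)",
                     [(1, "192.168.1.20", 1, "A"), (1, "192.168.1.20", 2, "F")])
    return conn


def forced_unmanaged(conn):
    """List every Forced Unmanaged interface ('*') or service as (label, ipaddr, service)."""
    # 'F' is assumed to be the Forced Unmanaged code in both tables.
    interfaces = conn.execute(
        """SELECT n.nodelabel, i.ipaddr, '*' FROM ipinterface i
           JOIN node n ON n.nodeid = i.nodeid WHERE i.ismanaged = 'F'""").fetchall()
    services = conn.execute(
        """SELECT n.nodelabel, f.ipaddr, s.servicename FROM ifservices f
           JOIN node n ON n.nodeid = f.nodeid
           JOIN service s ON s.serviceid = f.serviceid WHERE f.status = 'F'""").fetchall()
    return sorted(interfaces + services)


def with_reasons(entries, reasons):
    """Attach the recorded reason (from a hypothetical comments store) to each entry."""
    return [(*entry, reasons.get(entry, "NO REASON RECORDED")) for entry in entries]
```

Entries with no recorded reason stand out immediately, which is the gap described above.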

Read more / Read the solution »

Known Names (knownnames.cgi) September 23, 2010

Issue

OpenNMS stores name information from several sources in its database but only makes some of it available in the web interface. It’s useful to see all this information in one place for auditing and searching. It’s also important to check which name (manually set by the administrator, DNS, NetBIOS (SMB), SNMP, or lowest-numbered IP) OpenNMS has chosen for a node’s label, and to have quick access to edit it.
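
The label-source check and the “all names in one place” idea can be sketched in a few lines of Python. The single-letter codes in LABEL_SOURCES are my assumption of how OpenNMS records the label source (they follow the precedence listed above), so verify them against your database; describe_label and known_names are hypothetical helpers, and known_names merges names case-insensitively.

```python
# Assumed meanings of OpenNMS's node label source codes (verify against your version).
LABEL_SOURCES = {
    "U": "Manually set by administrator",
    "H": "DNS",
    "S": "NetBIOS (SMB)",
    "N": "SNMP",
    "A": "Lowest numbered IP",
}


def describe_label(nodelabel, source_code):
    """Render a node label together with the source OpenNMS took it from."""
    source = LABEL_SOURCES.get(source_code, f"unknown source {source_code!r}")
    return f"{nodelabel} (from {source})"


def known_names(nodelabel, sysname, netbios, dns_names):
    """Collect every distinct, non-empty name known for a node, label first."""
    seen_lower = set()
    result = []
    for name in (nodelabel, sysname, netbios, *dns_names):
        if name and name.lower() not in seen_lower:
            seen_lower.add(name.lower())
            result.append(name)
    return result
```

For example, describe_label("server20", "H") gives "server20 (from DNS)", and a label taken from the lowest-numbered IP is usually a sign that no better name was found.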

Read more / Read the solution »

Install/Upgrade Scripts September 18, 2010

The noc-install-upgrade.sh script will install or upgrade all the scripts from this site onto a Linux system. Its only prerequisite is that OpenNMS be installed first. Also, I recommend reading the information on this site before installing.

It’s not required to install the scripts in this way. If you only want a small subset, the scripts can be downloaded and used individually. Refer to individual posts for details.

The noc-install-upgrade.sh script creates the /etc/noc.conf file and downloads the contents of $NOCBASE (/opt/noc by default). It does not test database access, configure Apache, or alter scripts to match your environment. After running noc-install-upgrade.sh for the first time you will need to do the following:

  1. Verify the scripts can access the OpenNMS database. Refer to the “Verify Database Access” section of Setting up Scripts.
  2. Optionally, enable the HTTP scripts as described in Setting up Custom Web Pages and Single Sign On. Don’t worry about the “Create Web Directories” section; the install script takes care of it.
  3. Refer to the Scripts page for information on individual scripts. Some HTTP scripts may not function correctly unless specific configurations are created inside OpenNMS. The scripts in $NOCBASE/bin/ should not be run unless you understand what they do. Some will alter the OpenNMS configuration, removing configurations you may want to keep. Others need to be modified to match your environment before they’ll produce useful output.

The noc-install-upgrade.sh script installs itself under $NOCBASE/bin/. It can be run later to upgrade or install new scripts. The upgrade process will not overwrite customizations you’ve made (but back up first anyway); instead, it saves the new files with a .new extension. See install and upgrade examples below.
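
The “don’t overwrite customizations” behavior can be sketched as follows. This is a minimal Python sketch, not the logic of noc-install-upgrade.sh: it assumes an installed file counts as customized whenever it differs from the incoming version, which is more conservative than necessary (an unmodified file from an older release also gets a .new copy).

```python
import tempfile
from pathlib import Path


def install_file(new_content: bytes, target: Path) -> str:
    """Install or upgrade one file without overwriting local changes.

    Returns "installed", "unchanged", or "saved-as-new".
    """
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(new_content)
        return "installed"
    if target.read_bytes() == new_content:
        return "unchanged"
    # The installed copy differs, so treat it as customized: keep it and
    # write the incoming version alongside it with a .new extension.
    target.with_name(target.name + ".new").write_bytes(new_content)
    return "saved-as-new"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "bin" / "example.sh"   # hypothetical file name
        print(install_file(b"v1\n", script))          # installed
        script.write_bytes(b"v1 plus local edits\n")
        print(install_file(b"v2\n", script))          # saved-as-new
```

After an upgrade you would compare each .new file with your customized copy and merge by hand.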

Read more / Read the solution »

Check DNS Reverse Lookups (check-dns) September 16, 2010

Issue

OpenNMS stores DNS reverse lookups for the IP addresses it discovers (iphostname column of the ipInterface table). It appears that OpenNMS only performs the DNS reverse lookup once and stores the result. Therefore, OpenNMS may never be aware of a DNS change. The DNS reverse lookup is visible in the OpenNMS web interface on the node page as “IP Host Name”. An outdated entry here can cause confusion. A way is needed to keep this information up to date.

More serious problems can occur when OpenNMS uses DNS reverse lookups for node labels (see How are node labels determined? for more information). In the worst case, a node can have the wrong name and outages will appear to be on a different node than they actually are. Here’s one way this can happen:

  1. OpenNMS discovers a node with IP address 192.168.1.20 and DNS reverse lookup of server20. It sets the node label to server20. This is correct and everything is fine.
  2. Now the IP address 192.168.1.10 is added to server20. 192.168.1.10 has the reverse DNS lookup of devserver10 since it was just removed from a server with that name. On the next rescan OpenNMS sees the new IP and changes the label of server20 to devserver10! This follows the node naming convention described in the link above. There are now two nodes named devserver10.
  3. Shortly after, the DNS administrator updates the DNS reverse lookup of 192.168.1.10 to server20. OpenNMS has already stored the old name and it never updates it. If the OpenNMS administrator was not involved then there is no reason anyone would know something is wrong.
  4. At some point in the future a failure on 192.168.1.10 or 192.168.1.20 will cause an outage to be created with the node label devserver10 but it should be server20!

The above scenario may seem obscure, but it happened to me twice before I found the solution below.
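
One way to catch this scenario is to compare each stored IP Host Name with a fresh reverse lookup. This minimal Python sketch is not the check-dns script: it takes the stored iphostname values as a plain dictionary (reading them from the ipInterface table is left out), skips addresses whose lookup fails, and does not show how a stale entry would be corrected. The resolver is a parameter so it can be swapped out for testing.

```python
import socket


def reverse_lookup(ipaddr):
    """Current PTR name for ipaddr, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ipaddr)[0]
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None


def stale_hostnames(stored, resolve=reverse_lookup):
    """Compare stored {ipaddr: iphostname} entries with fresh reverse lookups.

    Returns (ipaddr, stored_name, current_name) for each entry that no longer matches.
    """
    stale = []
    for ipaddr, stored_name in sorted(stored.items()):
        current = resolve(ipaddr)
        if current is not None and current.lower() != (stored_name or "").lower():
            stale.append((ipaddr, stored_name, current))
    return stale
```

In the scenario above, 192.168.1.10 would be reported as stored “devserver10” but currently resolving to “server20”, flagging the problem before an outage is filed under the wrong name.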

Read more / Read the solution »

Statistics (stats) September 14, 2010

Issue

As with any large system, statistics are a useful management tool. OpenNMS gives the number of nodes at the bottom of the Node List page but few other statistics are available. In particular, knowing how many interfaces and services have been discovered and are monitored would be useful. The number of active jrb files is also key to IO tuning.
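
A couple of these numbers can be sketched in Python. This is not the stats script: the RRD directory is passed in because its location varies by installation, and service_stats assumes a service status of 'A' means monitored, which should be checked against your OpenNMS version. On a real system the status codes would come from the database (for example, the status column of what I assume is the ifservices table).

```python
import os


def count_jrb_files(rrd_dir):
    """Count .jrb files anywhere below rrd_dir (0 if the directory does not exist)."""
    return sum(
        1
        for _root, _dirs, files in os.walk(rrd_dir)
        for name in files
        if name.endswith(".jrb")
    )


def service_stats(statuses):
    """Summarize service status codes; 'A' is assumed to mean monitored."""
    statuses = list(statuses)
    return {
        "discovered": len(statuses),
        "monitored": sum(1 for status in statuses if status == "A"),
    }
```

Tracking the .jrb count over time shows how much write load the collectors are adding as nodes are discovered.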

Read more / Read the solution »

Automating Critical Paths (set-critical-paths) September 10, 2010

Issue

If you’ve ever had a router go down and been flooded by outage notifications for devices behind the router then you probably looked into using OpenNMS’s critical path outage feature. Critical path outages stop notifications about a node if another IP address is also not available. If properly configured, a router failure will only produce one notification, the one about the router, and suppress notifications about nodes in the network behind the router.

The problem is that creating and maintaining critical path settings in OpenNMS requires significant effort. There are three methods to set critical paths:

  1. The first is from a node’s OpenNMS page by clicking on Admin -> Configure Path Outage. This allows you to change the critical path outage for a single node. This is not practical if you have more than a handful of nodes.
  2. The second option is from the main Admin page, click on Configure Notifications -> Configure Path Outages. This allows you to define a critical path IP and then define a rule which selects nodes behind the critical path IP. This method is more efficient but has a major shortcoming: it does not remember the rule. Future nodes added to OpenNMS will not have the critical path automatically assigned to them. You must record your rules outside OpenNMS and routinely copy and paste them into the interface to keep critical paths up to date for new nodes. Also, if nodes are moved between networks their critical path may be incorrect until they’re manually updated.
  3. The final method relates to provisioning groups. It’s a somewhat better solution but is complex. When mixed with other methods of adding nodes all the problems above still exist.

What’s needed is a rule-based method for managing critical path settings that remembers rules and can be run on a regular basis.
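
A rule-based approach can be sketched as a list of (network, critical path IP) rules where the most specific matching network wins. This minimal Python sketch is not set-critical-paths: the RULES list and the plan_updates helper are hypothetical, node addresses are passed in as a dictionary, and writing the changes back into OpenNMS is not shown. Because the rules live outside OpenNMS, they can be re-applied on a schedule so new or moved nodes pick up the right setting.

```python
import ipaddress

# Hypothetical rules: nodes inside each network sit behind the given router IP.
RULES = [
    ("10.1.0.0/16", "10.1.0.1"),
    ("10.1.20.0/24", "10.1.20.1"),
    ("192.168.1.0/24", "192.168.1.1"),
]


def critical_path_for(node_ip, rules=RULES):
    """Return the critical path IP from the most specific matching rule, or None."""
    addr = ipaddress.ip_address(node_ip)
    best_net, best_path = None, None
    for network, path_ip in rules:
        net = ipaddress.ip_network(network)
        # A router is not placed behind itself; it falls through to a broader rule.
        if addr in net and addr != ipaddress.ip_address(path_ip):
            if best_net is None or net.prefixlen > best_net.prefixlen:
                best_net, best_path = net, path_ip
    return best_path


def plan_updates(nodes, current, rules=RULES):
    """Return {nodeid: desired critical path} for nodes whose setting must change.

    nodes maps nodeid -> primary IP; current maps nodeid -> configured critical path IP.
    """
    changes = {}
    for nodeid, ip in nodes.items():
        desired = critical_path_for(ip, rules)
        if current.get(nodeid) != desired:
            changes[nodeid] = desired
    return changes
```

Since only differences are returned, a scheduled run touches nothing when every node already matches its rule.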

Read more / Read the solution »
