Duplicate IP Addresses (dupipaddr.cgi) September 10, 2010 No Comments
Issue
OpenNMS supports more than one node having the same IP address. This is required in many high-availability configurations, so it’s not unexpected that OpenNMS allows it. However, erroneous duplicate IP addresses on the same network are a common configuration problem that is often hard to identify.
Besides real cases of duplicate IP addresses on the network, OpenNMS may mistakenly have the same IP address associated with more than one node. This can occur when an IP address is first discovered on one node and then moved to a different node. OpenNMS discovers the IP address on the second node but does not automatically remove it from the first node. This can cause a serious error if an outage is detected but associated with the wrong node. This issue is also hard to identify.
For the reasons above, it’s important to have a clear view of what duplicate IP addresses are in the OpenNMS database. While OpenNMS allows searching for any single IP address and shows all the matching nodes, it does not directly identify duplicates. A simple list of all duplicates is needed.
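As a rough illustration of the idea (not the dupipaddr.cgi script itself), the sketch below finds addresses that appear on more than one node with a single GROUP BY / HAVING query. It runs against an in-memory SQLite table so it is self-contained; the table name ipinterface, the column names nodeid and ipaddr, and the sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical stand-in for the OpenNMS database. The table and column
# names (ipinterface, nodeid, ipaddr) are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipinterface (nodeid INTEGER, ipaddr TEXT)")
conn.executemany(
    "INSERT INTO ipinterface VALUES (?, ?)",
    [
        (1, "10.0.0.5"),
        (2, "10.0.0.5"),   # same address on a second node
        (3, "10.0.0.9"),
        (4, "10.0.1.7"),
        (5, "10.0.1.7"),   # same address on a second node
    ],
)


def find_duplicate_ips(conn):
    """Return {ipaddr: [node ids]} for addresses found on more than one node."""
    rows = conn.execute(
        """
        SELECT DISTINCT ipaddr, nodeid
        FROM ipinterface
        WHERE ipaddr IN (
            SELECT ipaddr
            FROM ipinterface
            GROUP BY ipaddr
            HAVING COUNT(DISTINCT nodeid) > 1
        )
        ORDER BY ipaddr, nodeid
        """
    ).fetchall()
    duplicates = {}
    for ipaddr, nodeid in rows:
        duplicates.setdefault(ipaddr, []).append(nodeid)
    return duplicates


for ipaddr, nodes in find_duplicate_ips(conn).items():
    print(ipaddr, nodes)
```

The inner query is the part worth ripping out: it is the "simple list of all duplicates" in SQL form, and the outer query only adds the node ids for each hit.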
SNMP Description Summary (descsummary.cgi) September 7, 2010 No Comments
Issue
OpenNMS captures the SNMP Description field which is a useful source of information, often containing device patch levels, vendor, etc. Searching for nodes based on their description or simply seeing a list of all the descriptions can be useful. OpenNMS does not directly provide a feature for this.
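A minimal sketch of such a summary, assuming the node labels and descriptions have already been pulled out of the database as (label, description) pairs. The function name, the sample data, and the optional search argument are all hypothetical, and this is not the descsummary.cgi script itself.

```python
from collections import Counter


def summarize_descriptions(nodes, search=None):
    """Count nodes per SNMP description.

    nodes  -- iterable of (label, description) pairs; description may be None
    search -- optional case-insensitive substring used to filter descriptions
    Returns a list of (description, count), most common first.
    """
    counts = Counter()
    for _label, description in nodes:
        description = (description or "").strip()
        if not description:
            continue  # node has no SNMP description
        if search and search.lower() not in description.lower():
            continue
        counts[description] += 1
    return counts.most_common()


# Hypothetical sample data.
nodes = [
    ("sw1", "Cisco IOS Software, Version 12.2"),
    ("sw2", "Cisco IOS Software, Version 12.2"),
    ("srv1", "Linux srv1 2.6.18"),
    ("ups1", None),
]

for description, count in summarize_descriptions(nodes):
    print(count, description)
```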
What’s to come… September 4, 2010 No Comments
There has been a lot of traffic over the last few days thanks to Tarus Balog, who posted a link to this site on his blog adventuresinoss.com. I’d like to thank Tarus and everyone who took the time to visit. I’ve enjoyed meeting several new people, such as Ronny Trommer, who maintains the German blog www.open-factory.org.
A few people sent messages asking what they could expect in the coming months. As it says in the overview, I plan to document what I’ve learned deploying OpenNMS and post scripts I’ve written. I hope to have most of that done by early December (2010). At a high level, this will include patterns and scripts to:
- manage RRD data and mine that data for reporting purposes,
- create a custom web site that runs beside the OpenNMS web interface,
- report on many aspects of nodes in OpenNMS,
- discover and audit nodes in manageable-sized groups,
- segment nodes into views for monitoring and notifications,
- and some other random stuff…
So stay tuned. I hope you will find something useful in the next few months (even if it’s just ripping some of the SQL out of my scripts to make your own). Feedback is always appreciated.
Cheers
- Doug
Setting up Custom Web Pages and Single Sign On September 3, 2010 4 Comments
Most of the scripts on this site output HTML so it’s easier to run them from a web server than on the command line. The following configuration is needed to serve the scripts through an Apache 2 web server.
The configuration below does the following:
- Proxies the OpenNMS web interface through Apache.
- Provides a document root to add custom .html, .shtml, and .cgi files.
- Provides single sign on for both the document root and the OpenNMS web interface.
- Provides an unauthenticated area for public content.
Small Update to Policies September 1, 2010 No Comments
An issue today reminded me of an important policy that was missing from Policies that reduce the workload! The following policy has been added:
Reduce monitoring system dependencies.
There’s no point in having a monitoring system if it goes down as part of another failure. Although it will not reduce your workload, running OpenNMS on a standalone system will help ensure it’s available during a failure. I use a real server (not a VM) that’s dedicated to OpenNMS and other monitoring. It does not depend on any other systems to authenticate or operate properly. As long as the switches are operational, OpenNMS will work correctly. Heavy OpenNMS users hard-code the server IP address on their desktop systems so DNS is not required to reach the OpenNMS web interface. Also, notifications are sent directly to cell phones in addition to the corporate email system in case the email system is not working.
Counting RRD Data (count-rrd-data) August 26, 2010 No Comments
Issue
Managing jrb files is an important factor in large OpenNMS deployments. OpenNMS can collect a tremendous number of jrb files, which contain SNMP and response data. SNMP jrb files are created as a result of alias definitions in $OPENNMSBASE/etc/datacollection-config.xml and response jrb files as a result of ds-name definitions in $OPENNMSBASE/etc/poller-configuration.xml. It’s useful to know how much data each definition is responsible for as part of managing disk space and IO load. OpenNMS does not provide a way to do this, and the directory structure under $OPENNMSBASE/share/rrd/* is too complex to do this manually.
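One way to get a rough count is to walk the rrd directory and group the files by name. The sketch below is not the count-rrd-data script; it assumes each jrb file is named after the definition that produced it (for example ifInOctets.jrb or icmp.jrb), and the function name and the path in the usage comment are hypothetical.

```python
import os
from collections import defaultdict


def count_rrd_files(root, extension=".jrb"):
    """Walk root and return {file stem: (file count, total bytes)}.

    Assumes the file stem matches the alias or ds-name that created it.
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext != extension:
                continue  # skip anything that is not a jrb file
            entry = totals[stem]
            entry[0] += 1
            entry[1] += os.path.getsize(os.path.join(dirpath, name))
    return {stem: (count, size) for stem, (count, size) in totals.items()}


def print_report(totals):
    """Print one line per definition, largest disk usage first."""
    for stem, (count, size) in sorted(totals.items(), key=lambda item: -item[1][1]):
        print(f"{stem:30} {count:8} files {size:14} bytes")


# Example usage (path is an assumption; adjust to your install):
# print_report(count_rrd_files("/opt/opennms/share/rrd"))
```

Sorting by total bytes puts the definitions that cost the most disk space at the top, which is usually where pruning pays off first.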
Node State – Controlling Outage Response August 24, 2010 No Comments
Issue
Let’s define the term Node State as: an indicator of the level of response required by an outage. If a node provides a business function to clients then we can say it’s in a Production state which implies OpenNMS should respond to outages in a timely manner. In most environments, the Production state is the most common state but it’s not the only state. For example, nodes may be in a Maintenance state which implies their outages are expected and no response is required. There could be another state to indicate a node is producing outages because of Monitoring Issues in OpenNMS or the network, as opposed to something related to the node. A node may move back and forth between these states. You might also have Other nodes that need to be monitored for data collection purposes but should never have notifications generated. Other nodes, such as test or development nodes, would never be in the Production state.
Node state is a common issue in enterprise IT processes, such as the ITIL Change Management process. It’s important that OpenNMS know what state a node is in and be able to use node state to determine what level of response is required by an outage. I can’t emphasize enough the power of having a clear solution for Node State. Any solution for node state will need to be flexible enough to meet future requirements. Although this post will concentrate on OpenNMS’s response to outages, node state can also play an important role in the creation of views and reporting. Later posts will explore node state for those purposes.
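To make the definition concrete, here is a small sketch of node state as a lookup from state to response level. The state names come from the description above; the response labels ("notify", "none", "review") and the choice to send Monitoring Issues to a review queue are my assumptions for illustration, not something OpenNMS provides.

```python
from enum import Enum


class NodeState(Enum):
    PRODUCTION = "Production"
    MAINTENANCE = "Maintenance"
    MONITORING_ISSUES = "Monitoring Issues"
    OTHER = "Other"


# Response level per state. The labels are hypothetical; the mapping for
# MONITORING_ISSUES in particular is an assumption, since the right
# response depends on who owns the monitoring problem.
RESPONSE_BY_STATE = {
    NodeState.PRODUCTION: "notify",        # respond in a timely manner
    NodeState.MAINTENANCE: "none",         # outages are expected
    NodeState.MONITORING_ISSUES: "review",  # assumption: flag for the OpenNMS admin
    NodeState.OTHER: "none",               # data collection only, never notify
}


def outage_response(state):
    """Return the response level an outage should get for a node in this state."""
    return RESPONSE_BY_STATE[state]


print(outage_response(NodeState.PRODUCTION))   # notify
print(outage_response(NodeState.MAINTENANCE))  # none
```

Keeping the mapping in one table is what makes it flexible: adding a new state or changing a response is a one-line change.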
Automating Node Type Categories (set-cats-from-snmp) August 18, 2010 No Comments
Issue
It’s often useful to filter nodes by type, such as servers, switches, UPS, and storage. You can set up categories and assign nodes manually but it would save time if node type was automatically detected, which OpenNMS does not do.
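A minimal sketch of the detection step, assuming the SNMP description is the only input. The patterns and category names below are illustrative guesses, not the rules used by set-cats-from-snmp, and real descriptions will need a longer, site-specific list.

```python
import re

# (pattern, category) pairs, checked in order; first match wins.
# These patterns are hypothetical examples only.
RULES = [
    (re.compile(r"cisco ios|catalyst|procurve", re.IGNORECASE), "Switches"),
    (re.compile(r"linux|windows|sunos", re.IGNORECASE), "Servers"),
    (re.compile(r"\bups\b", re.IGNORECASE), "UPS"),
    (re.compile(r"netapp|data ontap", re.IGNORECASE), "Storage"),
]


def node_type_from_description(description):
    """Return a node type category for an SNMP description, or None if unknown."""
    for pattern, category in RULES:
        if pattern.search(description or ""):
            return category
    return None


print(node_type_from_description("Cisco IOS Software, C3560 Software"))
print(node_type_from_description("APC Web/SNMP Management Card, Smart-UPS 3000"))
```

Nodes that return None are the ones still worth a manual look, so the workload shrinks to the exceptions.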
Testing Node SNMP (testsnmp) 4 Comments
Issue
It’s not uncommon for a large organization to use several different SNMP read-community strings for various nodes. Often the various strings do not correspond to specific network ranges which makes it harder to determine which string should be used for which node. To determine the correct string, the OpenNMS administrator must contact the node administrator or test each possible SNMP read-community and version, both of which are time consuming.
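The trial-and-error part is easy to automate. The sketch below (not the testsnmp script itself) tries each version and community in turn and stops at the first combination that answers. It assumes the Net-SNMP snmpget command is installed and on the PATH, and it only covers SNMP v1 and v2c; the function names are hypothetical. The probe is passed in as a parameter so the loop can be exercised without touching the network.

```python
import subprocess
from itertools import product

SYS_DESCR_OID = "1.3.6.1.2.1.1.1.0"  # sysDescr.0


def snmpget_command(host, community, version, timeout=1, retries=0):
    """Build a Net-SNMP snmpget command line for one community/version pair."""
    return [
        "snmpget", "-v", version, "-c", community,
        "-t", str(timeout), "-r", str(retries),
        host, SYS_DESCR_OID,
    ]


def probe_with_snmpget(host, community, version):
    """Return True if snmpget exits cleanly for this combination.

    Raises FileNotFoundError if snmpget is not installed.
    """
    result = subprocess.run(
        snmpget_command(host, community, version),
        capture_output=True, text=True,
    )
    return result.returncode == 0


def find_snmp_settings(host, communities, versions=("2c", "1"), probe=probe_with_snmpget):
    """Return the first (community, version) that works, or None."""
    for version, community in product(versions, communities):
        if probe(host, community, version):
            return community, version
    return None


# Example usage (address and strings are placeholders):
# print(find_snmp_settings("192.0.2.10", ["public", "private"]))
```

A short timeout with no retries keeps the run time reasonable when most combinations fail, at the cost of possibly missing a slow node.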
A System for Node Categories August 17, 2010 No Comments
Issue
In a large-scale OpenNMS deployment, nodes need to be put into categories. Categories allow for inclusion or exclusion from reports, web pages, notifications, etc. Categories may relate to many different node attributes, such as who manages a node, node type, monitoring frequency, etc. OpenNMS provides no specific structure for categories and has several distinct ways of storing and managing categories:
- From the OpenNMS node page, you can click on Asset Info and find the Category field. The default OpenNMS configuration lets you assign a node to one of several types, such as Server, Infrastructure, Telephony, etc…
- There are also categories defined in the $OPENNMSBASE/etc/categories.xml file. These categories can be used to define views on the OpenNMS home page with the help of the $OPENNMSBASE/etc/viewsdisplay.xml file. They also appear as a filter for the OpenNMS 1.6 Availability Report (OpenNMS 1.8 has changed its reporting structure).
- Finally, from the node page, you can edit a node’s Surveillance Category Memberships. Surveillance categories can be created/deleted from the Manage Surveillance Categories link under the Admin page. These categories are designed to filter Surveillance Views which can be viewed by clicking on the Surveillance link.
This raises several questions, such as:
- When to use each way of adding a node to a category?
- How to manage category membership rules, such as Exclusive OR (e.g. three categories, of which every node must be a member of exactly one)?
- How to ensure categories are not ambiguous? Does the category “Server” mean the node is a server or is managed by the server group?
- Can category membership be automated so the workload can be reduced?
- How do we implement a hierarchy of categories (sub-categories)?
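The Exclusive OR rule in the questions above is simple to check once category memberships are available as data. This is only a sketch under that assumption: the function name, the node labels, and the category names are hypothetical, and how the memberships are read from OpenNMS is left out.

```python
def check_exclusive_group(memberships, group):
    """Find nodes that are not in exactly one category of an exclusive group.

    memberships -- {node label: set of category names}
    group       -- the categories of which each node must have exactly one
    Returns {node label: sorted list of matching categories} for violators only.
    """
    group = set(group)
    violations = {}
    for node, categories in memberships.items():
        matched = sorted(group & set(categories))
        if len(matched) != 1:
            violations[node] = matched
    return violations


# Hypothetical sample data.
memberships = {
    "sw1": {"Switches", "Network Team"},
    "srv1": {"Servers", "Switches"},   # in two node type categories
    "ups1": {"Network Team"},          # in no node type category
}

for node, matched in check_exclusive_group(memberships, ["Servers", "Switches", "UPS"]).items():
    print(node, matched or "no category")
```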