NNMi Application Failover is a new feature introduced with NNMi/NNMi Advanced 8.11. It lets you set up two NNMi management stations: a Primary/Active node and a Standby/Backup node. The two NNMi systems monitor each other using a “heartbeat” signal over the network. If the Active node fails (loss of heartbeat), the Standby node automatically starts NNMi and resumes NNMi functions (trap handling, analysis, discovery, polling, etc.). Both the Active and the Standby server instances use the embedded Postgres database by default.
Note: The Application Failover feature monitors system availability, not the availability of the NNMi instance itself; i.e., if the NNMi instance crashes or is killed but the system remains online, Application Failover will not detect this.
How does this work?
- The Active node performs periodic database backups
- These backups are sent to the Standby node.
- Between backups, periodic database transaction logs are created and immediately sent to the Standby node.
- If the Active node crashes or is shut down, the Standby becomes the New-Active node.
- The New-Active node starts NNMi, including the database server.
- The database server imports all the transaction logs and is available to NNMi for requests.
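The steps above can be sketched as a small simulation. This is only an illustrative model, not NNMi code: the `StandbyNode` class, its method names, the 30-second timeout, and the dictionary used as a stand-in for the database are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyNode:
    """Toy model of a standby node; every name here is hypothetical."""
    heartbeat_timeout: float                          # seconds of silence before failover
    last_heartbeat: float = 0.0                       # time the last heartbeat arrived
    backup: dict = field(default_factory=dict)        # last full backup received
    pending_logs: list = field(default_factory=list)  # logs received since that backup
    database: dict = field(default_factory=dict)      # stand-in for the live database
    active: bool = False

    def receive_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def receive_backup(self, snapshot: dict) -> None:
        # A new full backup supersedes the logs shipped before it.
        self.backup = dict(snapshot)
        self.pending_logs.clear()

    def receive_transaction_log(self, log: list) -> None:
        self.pending_logs.append(list(log))

    def check(self, now: float) -> bool:
        """Promote this node if the heartbeat has been silent too long."""
        if not self.active and now - self.last_heartbeat > self.heartbeat_timeout:
            self._become_active()
        return self.active

    def _become_active(self) -> None:
        # Start from the last backup, then replay the logs in arrival order.
        self.database = dict(self.backup)
        for log in self.pending_logs:
            for key, value in log:
                self.database[key] = value
        self.pending_logs.clear()
        self.active = True


standby = StandbyNode(heartbeat_timeout=30.0)
standby.receive_heartbeat(now=0.0)
standby.receive_backup({"node-a": "up"})
standby.receive_transaction_log([("node-b", "up")])
standby.receive_transaction_log([("node-a", "down")])
standby.receive_heartbeat(now=20.0)
still_standby = standby.check(now=40.0)  # 20 s of silence, under the timeout
promoted = standby.check(now=60.0)       # 40 s of silence, over the timeout
```

In this model the standby stays passive while heartbeats keep arriving, and on promotion it rebuilds its state from the last backup plus the transaction logs received since then, which mirrors the sequence in the list.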
Does Application Failover support WAN?
The 8.11 release only supports LAN (same subnet). Going across a router is not supported at this time. Future releases will look into adding support for WAN.
What is the compatibility matrix for the active server and the standby server?
- OS consistency across the active and standby servers is enforced. Both servers need to run the same OS: HPUX-to-HPUX, Linux-to-Linux, etc. A mixed-mode OS configuration such as Windows-to-Linux is not supported.
- The NNMi version and patch level need to be the same on both servers.
- Licensing needs to be consistent. If the license (capacity or features) on the Standby is less than on the Active, nodes will become unmanaged on failover. For example, if the Standby node has a lower node count and/or a subset of features (iSPI-NET, iAdvance, etc.), then when that node becomes Active, those features will be disabled and/or nodes will become unmanaged.
- Both servers need to have the same “system” password
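As a rough illustration of the matrix above, the checks could be expressed as a pre-flight comparison of the two servers. This is a sketch under assumptions: `ServerInfo`, its fields, and `failover_pairing_problems` are hypothetical names, not part of NNMi, and the “system” password requirement is deliberately left out because it cannot be compared from inventory data like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerInfo:
    """Hypothetical inventory record for one NNMi server."""
    os_family: str                 # e.g. "HP-UX", "Linux", "Windows"
    nnmi_version: str
    patch_level: str
    licensed_nodes: int
    licensed_features: frozenset


def failover_pairing_problems(active: ServerInfo, standby: ServerInfo) -> list:
    """Return a list of human-readable mismatches; empty means the pair looks consistent."""
    problems = []
    if active.os_family != standby.os_family:
        problems.append("operating systems differ")
    if (active.nnmi_version, active.patch_level) != (standby.nnmi_version, standby.patch_level):
        problems.append("NNMi version or patch level differs")
    if standby.licensed_nodes < active.licensed_nodes:
        problems.append("standby is licensed for fewer nodes")
    missing = active.licensed_features - standby.licensed_features
    if missing:
        problems.append("standby lacks features: " + ", ".join(sorted(missing)))
    return problems


# Example values are made up for illustration.
active = ServerInfo("Linux", "8.11", "1", 1000, frozenset({"iSPI-NET", "iAdvance"}))
good_standby = ServerInfo("Linux", "8.11", "1", 1000, frozenset({"iSPI-NET", "iAdvance"}))
bad_standby = ServerInfo("Windows", "8.11", "1", 500, frozenset({"iAdvance"}))

ok_report = failover_pairing_problems(active, good_standby)
bad_report = failover_pairing_problems(active, bad_standby)
```

For the mismatched pair, the report lists the OS difference, the lower node count, and the missing feature, which are the conditions the list warns will disable features or leave nodes unmanaged after failover.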
How are the 2 servers licensed?
The main Active server needs a production license equivalent to the number of nodes that it needs to manage. The Standby uses a non-production license. The node counts for the Active server and Stand
What if I want to use another external database such as Oracle instead of Postgres?
Application Failover is supported only with the internal Postgres database, not with an external database such as Oracle. If you need to use an external database such as Oracle, other supported clustering technologies can be used instead: MC ServiceGuard for HPUX, Microsoft Clusters (MSCS) for Windows, or Veritas for Linux.
Is the application level failover supported for NNM iSPI products?
All the iSPIs (except iSPI Performance) get “free” support for the Application Failover feature as long as they meet the following two conditions:
- They use NNM’s Postgres database (then their data is replicated along with NNM’s data)
- They use ovstart/ovstop to start/stop their services
iSPI NET as well as the iSPI for Performance do not support the Application Failover feature.
To use NNMi Application Failover together with iSPI Performance, the following can be done:
- iSPI Performance must be installed on a 3rd node (same subnet), without NNMi (cannot co-exist on the same server)
- The two NNM nodes both run the script “nnmenableiSPI Performance.ovpl” to point to iSPI Performance station
- iSPI Performance is (initially) configured to point to currently-Active node
- iSPI Performance detects an NNM Cluster environment, and periodically polls “who-is-current-Active?” using a special command.
- When the Active node changes, iSPI Performance reconfigures itself to point to the new-Active node.
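The polling behaviour in the last two steps can be sketched as follows. This is a hedged illustration only: `ActiveNodeTracker` and the `query_active` callable are hypothetical stand-ins, and the real “who-is-current-Active?” command is not shown here because the post does not name it.

```python
class ActiveNodeTracker:
    """Toy model of a client that follows whichever NNMi node is currently Active."""

    def __init__(self, initial_active, query_active):
        self.target = initial_active       # node this client currently points to
        self._query_active = query_active  # hypothetical stand-in for the special command
        self.history = [initial_active]    # every node this client has pointed to

    def poll_once(self):
        """Ask who is Active; repoint and return True if the answer changed."""
        current = self._query_active()
        if current != self.target:
            self.target = current
            self.history.append(current)
            return True
        return False


# Simulated answers from four consecutive polls; host names are made up.
answers = iter(["nnmi-1", "nnmi-1", "nnmi-2", "nnmi-2"])
tracker = ActiveNodeTracker("nnmi-1", lambda: next(answers))
changes = [tracker.poll_once() for _ in range(4)]
```

Only the third poll triggers a reconfiguration, because that is the first time the reported Active node differs from the one the client already points to.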
Do we support Oracle RAC?
Oracle RAC support is not available at this time.
I am Aruna Ravichandran, the Product Marketing Manager for NNM/NNMi/iSPI products within the Network Management Center for HP Software. I have been with HP for 13 years. I started my career as an engineer in the HP-UX kernel lab, moved on to application development, and was an architect in the High Availability/Clustering lab for a couple of years. I then wanted to experiment with the “darker” side of the business and moved to product management/product marketing 5 years ago. I marketed Storage products (high-end XP disk arrays), followed by Security Marketing, where I created a secure appliance solution for enterprise log management and took it to market. I recently joined the Business Service Management (BSM) organization of HP Software. I have to say that I am still a “techy” at heart, though I totally love the “darker” side of the business.
We have two other conversations you can join: ITOpsBlog for Operations and the BSMblog for Business Service Management.
Posted 02-16-2009 3:41 AM by Michael_Procopio