Can I get away without using discovery?

When I was at our European HP Software event before Christmas / the holidays, I spent a good deal of time talking to people about our new product releases and the future of BSM. One customer looked a little worried and said, "Wow, you seem to rely on discovery a lot."

I guess there are two things to say in answer to that observation. The first is "yes - because we rely on the service hierarchy model a lot". And the second is, "but there are a number of different types of discovery - and a number of them you already have".

 

So, in a two-part post, I thought I'd answer that observation more comprehensively. Let's first look at how we use inventory and service hierarchy information in the management of service health (and thanks to Jon Haworth from the OperationsManager team for his significant help on this post):

 

  • It helps with administering the monitoring deployment of the managed environment. It tells us what is out there, what we need to manage, what has disappeared, and so on. This only requires discovery of the infrastructure inventory: "tell me what servers exist". The exception is when everything is virtualized, in which case it needs a lot more. The OperationsCenter team has posted recently on the new virtualization SPI at ITOpsBlog. This SPI discovers, and more importantly continues to discover, virtualized environments.

 

  • It helps OMi to understand the stream of events being detected in the infrastructure and applications. The hierarchy of the monitored items ("configuration items", or CIs) allows OMi to tell us which events are causal events and which are symptoms: what we need to work on and what we can ignore. I talked about how OMi does this in a post last year.

 

  • It allows all parts of the BSM stack to perform service impact analysis. This is where events are related to infrastructure and applications, and their impact or potential impact on the services above them in the hierarchy is established. We can then use this impact information to prioritize the events. Service impact analysis requires a model of the hierarchy of CIs and services. Maintaining the service hierarchy manually is untenable: things just change too rapidly for humans to keep up.

 

  • It lets us judge the true health of a CI. When a disk has a set of read/write errors, is that catastrophic? If it's a single disk, then yes: the infrastructure element is in trouble. If it's part of a RAID array, then no, provided the rest of the array is OK. If we know the type of CI that we are seeing events against, we can make better decisions about its true health.

 

This is also a new feature in OMi: when CIs are discovered, we know their type. OMi ships with a database of health indicators for each CI type. For example, for a single disk, read/write errors are a problem; for a RAID array, the same errors are not a serious problem provided a high percentage of the other disks are OK; and so on.

 

This feature makes calculating the true health of a CI much easier. You don't need to define a set of propagation rules. OMi uses the discovered CI type information and its lookup table to figure out propagation itself.
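To make the disk versus RAID example concrete, here is a minimal sketch of a health lookup keyed on CI type. It is only an illustration of the idea, not how OMi implements it: the `CI` class, the rule functions, the `HEALTH_RULES` table and the 75% threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class CI:
    """A hypothetical configuration item, as discovery might report it."""
    name: str
    ci_type: str                  # e.g. "disk" or "raid_array" (made-up type names)
    members_with_errors: int = 0  # disks currently reporting read/write errors
    member_count: int = 1         # 1 for a single disk, N for an array


def disk_health(ci: CI) -> str:
    # A single disk with read/write errors is in trouble.
    return "critical" if ci.members_with_errors > 0 else "ok"


def raid_health(ci: CI, min_healthy_fraction: float = 0.75) -> str:
    # An array tolerates some bad disks; the threshold here is an assumption.
    if ci.members_with_errors == 0:
        return "ok"
    healthy = (ci.member_count - ci.members_with_errors) / ci.member_count
    return "warning" if healthy >= min_healthy_fraction else "critical"


# The "lookup table": one health rule per discovered CI type.
HEALTH_RULES = {
    "disk": disk_health,
    "raid_array": raid_health,
}


def evaluate_health(ci: CI) -> str:
    """Pick the rule from the CI's type; no hand-written propagation rules."""
    rule = HEALTH_RULES.get(ci.ci_type)
    return rule(ci) if rule else "unknown"


if __name__ == "__main__":
    print(evaluate_health(CI("sda", "disk", members_with_errors=1)))
    print(evaluate_health(CI("array0", "raid_array", members_with_errors=1, member_count=8)))
```

The point of the design is that the same event (read/write errors) gets a different verdict purely because discovery told us the CI type.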

 

This all ties into a new feature in OMi called "Health Indicators". Jon Haworth has promised to post on this on the OperationsCenter blog.

 

  • Our top-down performance Problem Isolation software needs to understand the service hierarchy on which the end-user application rests. For example, if I have a web user interface, I need to understand what services that user interface depends on. As I discussed in a post last year, Problem Isolation uses statistical correlation analysis to suggest the likely cause of such top-down performance problems.

 

  • We need the service hierarchy for defining SLAs. I may define a compound SLA that depends on a number of OLAs and a top-level measured SLA. The modeling user interface for this and the subsequent off-line SLA calculation are both based on the service hierarchy.
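
As a rough illustration of why service impact analysis in the list above needs a hierarchy model, here is a small sketch that walks upward from a CI with an event to the services that depend on it, then orders events by the most important impacted service. The CI names, the `DEPENDENTS` map and the `SERVICE_PRIORITY` values are all invented for the example; a real model would come from discovery, not a hand-written dictionary.

```python
from collections import deque

# Hypothetical hierarchy: each CI maps to the CIs or services directly above it.
DEPENDENTS = {
    "disk01": ["db-server"],
    "db-server": ["orders-db"],
    "orders-db": ["online-ordering", "reporting"],
    "web-server": ["online-ordering"],
}

# Hypothetical business services; a lower number means more important.
SERVICE_PRIORITY = {"online-ordering": 1, "reporting": 3}


def impacted_services(ci: str) -> set:
    """Breadth-first walk up the hierarchy, collecting every service reached."""
    seen = {ci}
    queue = deque([ci])
    services = set()
    while queue:
        current = queue.popleft()
        if current in SERVICE_PRIORITY:
            services.add(current)
        for parent in DEPENDENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return services


def prioritize(event_cis: list) -> list:
    """Order the CIs that raised events by their most important impacted service."""
    def rank(ci: str) -> float:
        services = impacted_services(ci)
        # CIs that impact no known service sort last.
        return min((SERVICE_PRIORITY[s] for s in services), default=float("inf"))
    return sorted(event_cis, key=rank)


if __name__ == "__main__":
    print(impacted_services("disk01"))
    print(prioritize(["unused-host", "web-server"]))
```

Keeping a map like `DEPENDENTS` accurate by hand is exactly the part that becomes untenable as the environment changes, which is where discovery comes in.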

 

In the second part of the post, I'll talk about all the things that now populate the host inventory and service dependency map.  Hint: if you have SPIs, you'll like what we have to say :-)

 

Mike Shaw


Posted 01-16-2009 9:50 AM by adsey007
