Making the Best Use of the Tools You Have - Infrastructure Management Software Blog
Making the Best Use of the Tools You Have

I spent the day today at our executive briefing center with a customer that provides online spend management services. They have a number of our data center products, so we spent most of the day discussing integration. The agenda was simple. They wanted to learn how to:

  1. Make the best use of the tools they have
  2. Determine where they should be looking next

First, let’s examine their environment, which in turn drives their requirements.

Hardware
They have a variety of hardware, some HP servers and storage and some from other vendors. Some of this hardware is at the end of its lifecycle and in line for replacement. They want to dramatically reduce their power consumption with the replacement servers. Part of the savings will come from consolidation onto fewer, more powerful machines; part from newer hardware that is more energy efficient. We did not discuss replacement hardware explicitly today, but one topic of concern was that their current infrastructure management software must have the flexibility to manage future hardware purchases.

Virtualization
They are very interested in moving aggressively towards virtualizing most of their IT infrastructure. They want to be able to manage both physical and virtual servers and storage using a single set of instrumentation.

Software
Service Desk. They are using most of the modules within HP Service Manager. This allows them to manage their help desk efficiently and track changes throughout the enterprise.
Configuration Management Database. They have HP’s uCMDB (“u” for universal), which manages all the configuration items (CI) within their enterprise. In an effort to streamline their operations, they also purchased our Discovery and Dependency Mapping (DDM) software to automatically discover their IT infrastructure and populate the CMDB. The CMDB is the foundation layer that ties together all the components within our Business Technology Optimization suite. In addition to maintaining state and configuration information about individual CIs, it understands relationships among them, and how these align with business services.
Network Management. They use an open source network management tool.
End-User Monitoring. They use a commercial, non-HP product, purchased last year to replace home-grown scripts. It runs synthetic scripts, similar to our End User Management software.
Operations Manager. They just purchased Operations Manager, along with agents, but have not yet deployed it. The prospect of consolidating several existing management consoles was one of the main reasons driving the purchase.

Presenting a Complete View of the IT Infrastructure
We started with the usual slide presentations that showed all the nice relationships among the products. Of course, heads nodded in agreement when we mentioned self-inflicted IT problems, the finger-pointing among groups during troubleshooting, and the challenge of seeing everything through a single console.

The key problem that emerged is that they lack a holistic view of the entire environment. Fortunately, deploying Operations Manager will solve this. It provides a “single pane of glass” in which they can view events from across their entire infrastructure, including the non-HP servers, non-HP network management, and non-HP user monitoring, in addition to all their HP hardware.

Generate (Enriched) Service Tickets from Events
Operations Manager can also automatically open tickets in Service Manager. In addition to opening tickets based on events, it enriches the events with all the relevant information from the CMDB, including the affected business service. Once the incident is closed, either manually or automatically, Operations Manager clears the event in its console and then tells Service Manager to close the ticket. That wrapped up the section on making the most of what they already have.
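The event-to-ticket flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Operations Manager, Service Manager, or the CMDB are actually implemented or called: every class, function, and field name here (Event, Ticket, open_enriched_ticket, resolve) is hypothetical, and the CMDB is stood in for by a plain dictionary with made-up sample data.

```python
from dataclasses import dataclass

# All names below are hypothetical; none of them is a real HP product API.

@dataclass
class Event:
    ci_name: str          # configuration item that raised the event
    message: str
    cleared: bool = False

@dataclass
class Ticket:
    event: Event
    business_service: str  # enrichment taken from the CMDB stand-in
    ci_details: dict
    status: str = "open"

# Stand-in for the CMDB: CI name -> attributes, including the business service.
# The entries are invented sample data.
CMDB = {
    "db-server-01": {
        "business_service": "Spend Reporting",
        "owner": "DBA team",
    },
}

def open_enriched_ticket(event: Event, cmdb: dict) -> Ticket:
    """Open a ticket for an event, enriched with what the CMDB knows about the CI."""
    ci = cmdb.get(event.ci_name, {})
    details = {k: v for k, v in ci.items() if k != "business_service"}
    return Ticket(
        event=event,
        business_service=ci.get("business_service", "unknown"),
        ci_details=details,
    )

def resolve(ticket: Ticket) -> Ticket:
    """Clear the event in the console first, then close the ticket."""
    ticket.event.cleared = True
    ticket.status = "closed"
    return ticket

event = Event("db-server-01", "Tablespace nearly full")
ticket = open_enriched_ticket(event, CMDB)
print(ticket.business_service)       # → Spend Reporting
resolve(ticket)
print(event.cleared, ticket.status)  # → True closed
```

The point of the sketch is the ordering: enrichment happens when the ticket is opened, and closing the incident updates both the event and the ticket so the two consoles stay in step.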

Automation Cuts Costs
Then things got really interesting when we went to the whiteboard. We outlined how much money they can save by implementing Operations Orchestration, our runbook automation solution, to automate some of the routine actions an operator would perform using Operations Manager. We used the example of another customer who saved $400K per year just by automating a database fix that takes only one minute to perform. That problem occurs 400,000 times per year. At $1 per minute for support costs, that adds up to $400,000 per year.
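The arithmetic behind that figure can be spelled out in a few lines of Python. This is a minimal sketch using only the numbers quoted above; the function name annual_savings is ours, for illustration, and it assumes automation removes the full operator cost of every occurrence.

```python
def annual_savings(occurrences_per_year: int,
                   minutes_per_fix: float,
                   cost_per_minute: float) -> float:
    """Operator cost avoided per year if every occurrence is automated (assumption)."""
    return occurrences_per_year * minutes_per_fix * cost_per_minute

# Figures from the example: 400,000 occurrences, 1 minute each, $1 per minute.
savings = annual_savings(400_000, 1, 1.00)
print(f"${savings:,.0f} per year")  # → $400,000 per year
```

Plugging in your own event counts, fix times, and support costs gives a first-order estimate of what a given runbook is worth automating.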

This paints a clear picture of where they should be looking next. And, all the discussions were based on released technology that is available to anyone today.

Let us know how you are making the best use of the tools you have. We’ll give you some expert guidance about what steps to take next that will further increase the return on your investment in infrastructure management software.

For Operations Center, Peter Spielvogel.


Posted 02-26-2009 5:27 AM by pspielvogel

Comments

William Vambenepe wrote re: Making the Best Use of the Tools You Have
on 02-26-2009 6:48 PM

A problem that occurs 400,000 times per year? That means 45 times per hour, or every 78 seconds. If it takes one minute to fix, that means that before "operations orchestration" it was being fixed around the clock almost non-stop.

This invokes the image of an operator holding the server in his/her arms year-round instead of putting it in a rack.

pspielvogel wrote re: Making the Best Use of the Tools You Have
on 02-27-2009 1:05 AM

Thanks for the analysis.

For the example I used of 400,000 annual database events, the customer is a large enterprise with many thousands of servers under management. As you pointed out, this works out to about 45 occurrences per hour. But if they are managing 45,000 servers (their actual number is much higher, but this keeps the math simple), then the event is occurring once per hour per 1,000 servers. And the customer has a very large front-line staff to deal with events. So it is not as burdensome as it would seem. (Although I liked the image of the operator cradling the server.)
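For anyone who wants to check these rates, here is a short Python sketch. The variable names are ours, for illustration, and the 45,000-server figure is the simplified number used above, not the customer's actual count.

```python
HOURS_PER_YEAR = 365 * 24   # 8,760 hours

events_per_year = 400_000
servers = 45_000            # simplified figure, not the real server count

per_hour = events_per_year / HOURS_PER_YEAR
seconds_between_events = 3600 / per_hour
per_hour_per_1000_servers = per_hour / (servers / 1000)

print(f"{per_hour:.1f} events per hour")                           # → 45.7 events per hour
print(f"one event every {seconds_between_events:.1f} seconds")     # → one event every 78.8 seconds
print(f"{per_hour_per_1000_servers:.2f} per hour per 1,000 servers")  # → 1.01 per hour per 1,000 servers
```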

There is a deeper implication. For the error in question, the response is very simple, which is why it takes only a minute to fix. But if operators start to ignore the problem because it is usually benign, they run the risk of missing the few occurrences that are serious and will lead to a costly outage. Even if 99% of the time there is nothing going on, what organization can afford to ignore 4,000 alerts (1% of the 400,000) that have the potential to disrupt a core business service?

With this in mind, it really makes sense to implement a runbook automation solution that can address every event (after they have been consolidated and prioritized in an event correlation console) to minimize the risk of an operator ignoring a catastrophic alert.

Peter

Joannah wrote re: Making the Best Use of the Tools You Have
on 03-21-2009 8:26 AM

I recently came across your blog and have been reading along. I thought I would leave my first comment. I don't know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.

Joannah
