Availability Management


Availability Management and IT Service Continuity Management work closely together, as both pursue the same aim: the highest possible service stability. While Availability Management covers normal service delivery, IT Service Continuity Management takes over in disaster situations.

Scope

Availability Management ensures that services are up and running without impairment. IT Service Continuity Management does the same in disaster situations and also plans for these cases.

The availability of services is one of the most important assets of any ITSM organization, as availability is key to customer satisfaction. Without the agreed warranty for a service, its utility cannot be realized. If a service is interrupted, customer satisfaction can still be achieved through proper plans for a quick service recovery. Improvements to service availability can only be achieved by understanding how services support the business.

During its runtime, a service can be measured by the following metrics:

  1. Service Availability
    100 * (Agreed Service Time - Downtime) / Agreed Service Time, expressed in percent
  2. Maintainability
    Mean Time to Restore Service (MTRS)
  3. Reliability
    Mean Time Between Failures (MTBF)
  4. Serviceability
    Mean Time to Restore Service (MTRS), as agreed by contract
    • Measured by customers based on Service Level Agreements
    • Measured by the ITSM provider based on Operational Level Agreements
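
The measures above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the `Outage` record, the function names, and the choice to compute MTBF as total uptime divided by the number of failures are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One service interruption (hypothetical record for this sketch).

    Both values are hours since the start of the reporting period.
    """
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def availability_percent(agreed_service_time, outages):
    """100 * (Agreed Service Time - Downtime) / Agreed Service Time."""
    downtime = sum(o.duration for o in outages)
    return 100 * (agreed_service_time - downtime) / agreed_service_time


def mean_time_to_restore_service(outages):
    """Average outage duration (MTRS)."""
    if not outages:
        raise ValueError("MTRS is undefined without any outage")
    return sum(o.duration for o in outages) / len(outages)


def mean_time_between_failures(agreed_service_time, outages):
    """Average uptime per failure (MTBF), assumed here as uptime / failures."""
    if not outages:
        raise ValueError("MTBF is undefined without any outage")
    uptime = agreed_service_time - sum(o.duration for o in outages)
    return uptime / len(outages)


if __name__ == "__main__":
    # Example: 30 days of 24-hour service (720 h) with three outages.
    outages = [Outage(100, 102), Outage(400, 404), Outage(650, 653)]
    print(availability_percent(720, outages))        # 98.75
    print(mean_time_to_restore_service(outages))     # 3.0
    print(mean_time_between_failures(720, outages))  # 237.0
```

In practice the agreed service time and the outage records would come from the Service Level Agreement and the monitoring data, respectively.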

Activities

The following activities produce data, reports, and plans that are stored in an Availability Management Information System (AMIS):

  • Reactive activities
    • Monitor, measure, analyze, report, and review availability
    • Investigate any non-availability and derive actions
  • Proactive activities
    • Risk analysis and risk management
    • Cost-effective countermeasures, e.g. clustering
    • Plan, design, and review new or changed services
    • Test availability mechanisms
    • Continual Service Improvement
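
To illustrate why clustering is listed as a countermeasure, the following sketch compares the availability of components in series with that of a redundant cluster. It assumes that component failures are independent and that the cluster stays up as long as at least one node is up. The function names and the figures are hypothetical.

```python
from math import prod


def serial_availability(component_availabilities):
    """All components are required: the availabilities multiply."""
    return prod(component_availabilities)


def clustered_availability(node_availabilities):
    """The cluster fails only if every node fails at the same time.

    Assumes independent node failures, which real clusters only approximate.
    """
    return 1 - prod(1 - a for a in node_availabilities)


if __name__ == "__main__":
    # Two nodes with 99 % availability each (illustrative figures).
    print(f"in series: {serial_availability([0.99, 0.99]):.4%}")
    print(f"clustered: {clustered_availability([0.99, 0.99]):.4%}")
```

Whether the gain justifies the cost of the second node is the economic question behind this countermeasure.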

To assess the risk of service downtime, the following methods can be used:

  • Component Failure Impact Analysis (CFIA)
  • Fault Tree Analysis (FTA)
  • Service Failure Analysis (SFA)
  • Expanded Incident Lifecycle (retrospectively)
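
As an illustration of the first method, a Component Failure Impact Analysis can be sketched as a matrix of components against services. The dependency data, the function names, and the simplified marks are assumptions for this example: "X" means a failure interrupts the service, "A" means an alternative component is available, and an empty mark means no dependency.

```python
# Hypothetical data: service -> {component: True if a redundant alternative exists}
DEPENDENCIES = {
    "Webshop": {"Web server": True, "Database": False, "Core switch": False},
    "E-mail": {"Mail server": False, "Core switch": False},
    "Reporting": {"Database": False, "Core switch": False},
}


def cfia_matrix(dependencies):
    """Build a component-by-service matrix with the marks "X", "A", or ""."""
    components = sorted({c for deps in dependencies.values() for c in deps})
    matrix = {}
    for component in components:
        row = {}
        for service, deps in dependencies.items():
            if component not in deps:
                row[service] = ""   # service does not use this component
            elif deps[component]:
                row[service] = "A"  # alternative component available
            else:
                row[service] = "X"  # failure interrupts the service
        matrix[component] = row
    return matrix


def rank_by_impact(matrix):
    """Sort components by the number of services their failure interrupts."""
    impact = {
        component: sum(1 for mark in row.values() if mark == "X")
        for component, row in matrix.items()
    }
    return sorted(impact.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for component, count in rank_by_impact(cfia_matrix(DEPENDENCIES)):
        print(f"{component}: interrupts {count} service(s)")
```

Components at the top of the ranking are candidates for single points of failure and for the countermeasures named under the proactive activities.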

Critical Success Factors

The following critical success factors (CSFs) and key performance indicators (KPIs) are examples:

  • CSF: Management of the availability and reliability of IT services
    • KPI: Reduction of non-availability and its impact
    • KPI: Increase in reliability
  • CSF: Fulfillment of business needs
    • KPI: Reduced overtime on the business side caused by service downtime
    • KPI: Reduced service downtime during critical timeframes, e.g. Christmas sales
    • KPI: Increased business and customer satisfaction
  • CSF: Availability of infrastructure and applications – based on Service Level Agreements – is achieved economically
    • KPI: Reduced costs caused by service downtime