September 21, 2023

ITIL incident management process in 5 steps

IT incidents are more than just an inconvenience — they cost money and interfere with business operations. And tech teams are often asked to do more with less, facing both increasing technological complexity and budget constraints.

While IT incidents are inevitable, proper management can make a huge difference in maintaining uptime and retaining happy customers. With a well-thought-out ITIL incident management process, your support teams can streamline and automate incident resolution.

In this article, we’ll discuss the roles, best practices, and steps involved in the ITIL incident management lifecycle, including how to conduct a post-incident review. Let’s get started!

What is ITIL incident management?

IT incidents can impact your business’s critical services, undermining customer confidence and lowering profit margins. In fact, companies lose $75 billion annually when customers switch brands due to poor customer support.

IT incidents may include network failure, application downtime, or even software incompatibility. Such disruptions can have a negative impact on a single user or on the entire business. For instance, underestimating resources could create architecture bottlenecks that cause performance to suffer for the entire enterprise, especially during peak traffic events.

Defining ITIL incident management

To mitigate incidents like these and prevent unexpected outages, many businesses rely on resolution strategies defined by the Information Technology Infrastructure Library (ITIL) incident management framework.

The most efficient incident management teams use a formal ticketing system and a specialized process flow. But every organization may approach incident management differently. The ITIL framework does not have to be a rigid, one-size-fits-all approach. Rather, it provides a defined set of best practices that you can use to help build out the incident management processes that best serve your needs.

Remember that the ultimate goal of incident management is user-focused: minimize downtime and restore service operation in line with established service level agreements (SLAs) whenever possible.

Common roles involved in the ITIL incident management lifecycle

While you can define your support team’s roles and responsibilities, the following are the most common ITIL incident management roles.

End user or incident reporter

The end user experiences a service disruption and submits a ticket to convey the details of the problem to the service desk agent. They typically communicate issues via web forms, live chat, phone calls, emails, or SMS.

1st line support

The 1st level support role includes the service desk agents who serve as the first point of contact for the end user. They’re involved in incident identification, categorization, and prioritization and should have a working knowledge of incidents and processes.

They help to resolve common IT incidents like network failure, and can escalate unresolved issues to 2nd level support teams.

2nd line support

The 2nd level support role includes technicians with advanced knowledge of the relevant IT systems and ITIL incident management processes. They receive more complex requests as escalations from 1st level support agents. If they can’t resolve the incident, they escalate it to 3rd level support or the IT problem management team.

3rd line support

The 3rd level support role includes specialist technicians with advanced knowledge of specific areas of the IT infrastructure. For example, Network Operations Center (NOC) technicians check network infrastructure for anomalies or interruptions and handle performance-related incidents.

They receive escalations from 2nd level support agents and, if necessary, can escalate them to external support. External support options may come from specific vendors, or from third party support (3PS) providers like Spinnaker Support, who generally offer more cost-effective and comprehensive break/fix solutions than vendor support teams.

Incident manager

The incident manager’s role is to oversee incident management processes throughout an organization to keep ITIL processes functioning smoothly. They’re responsible for assessing proficiencies and guiding the right resources to the right places so incidents can be resolved effectively.

Process owner

The process owner is responsible for designing, analyzing, and improving specific ITIL processes. They assess processes using key performance indicators (KPIs) and manage improvements to ensure that processes best serve the business goals.

The ITIL incident management lifecycle

Incidents happen. Proactive monitoring and thoughtful system design can help prevent issues before they start, but sooner or later, you’re bound to encounter a bug, human error, or hardware failure. When that happens, you can limit the damage by following the five steps defined by the ITIL incident management lifecycle.

  1. Incident identification
    First, incidents are reported through phone calls, emails, web forms, SMS, live chat, or walk-ins. The service desk team determines if the report is an incident that requires technical help, or simply a service request such as password retrieval or opening a new account.
  2. Incident logging
    Next, the service team creates an incident log and records the incident details. Generally, the incident log should include:
      • Name and contact of the requester
      • Date and time of the report
      • Unique identification number
      • Detailed description of the incident
  3. Incident categorization
    In the third stage, the incident management team performs incident classification. The incident log is assigned a logical category or subcategory based on the area of business or IT it affects.
  4. Incident prioritization
    Depending on the urgency and severity of the incident, incident logs are sorted into high, medium, or low priority. This allows incidents to be addressed in order of importance.
  5. Incident response
    The final stage of the incident management lifecycle is the technical response. Because this stage is more complex and varied, it is broken down into substeps.

    5.1. Initial diagnosis

    At this stage, a service desk agent (1st level support) tries to quickly diagnose the problem before routing it to the relevant personnel.

    The service desk team is equipped with predefined troubleshooting templates, diagnostic manuals, knowledge bases, and flowcharts. They’ll try to fix any minor incident before escalating it to the relevant teams.

    5.2. Incident escalation

    When service desk employees can’t resolve an issue, they’ll notify a higher-level technician. The next team (2nd level support) uses logged data to diagnose the incident. If the incident is something they are qualified to handle, they will move on to the next steps themselves; for severe issues, they may choose to escalate immediately to the next team (3rd level support).

    5.3. Incident investigation and diagnosis

    At this stage, the designated support team investigates the nature of the issue, such as by verifying a service outage or reproducing a bug on a website. They check to see if their initial diagnosis is accurate before diving into potential resolutions.

    Following a thorough diagnosis, your team can formulate an approach to fixing the problem. For example, if the incident log noted a system outage, a deeper investigation might reveal that cyber attacks like data breaches or DDoS attacks were to blame for the outage.

    If a team encounters difficulties beyond their skillset, they can return to the escalation step and pass the issue to higher-level support personnel to diagnose and investigate.

    5.4. Incident resolution

    After you identify the issue, you’ll move forward to implement the solution. The service desk records all relevant details and verifies service restoration as part of incident reporting.

    5.5. Incident closure

    After incident resolution, the service desk is notified. The service desk double-checks with the end user to ensure normal operations have resumed, and then closes the ticket. Next, your support team reviews and analyzes the response documentation, including how the incident management process unfolded.
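To make the lifecycle above more concrete, here is a minimal Python sketch of an incident log that carries the fields listed under incident logging and moves through escalation, resolution, and closure. The names (`IncidentLog`, `SupportLevel`, `escalate`, and so on) are hypothetical and not part of ITIL or any ticketing product; a real service desk tool would persist this data rather than keep it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum
from itertools import count


class SupportLevel(IntEnum):
    """Support tiers described in this article (names are illustrative)."""
    FIRST = 1
    SECOND = 2
    THIRD = 3
    EXTERNAL = 4


_ticket_numbers = count(1)  # simple in-memory source of unique IDs


@dataclass
class IncidentLog:
    # Fields from the logging step: requester, time, unique ID, description.
    requester_name: str
    requester_contact: str
    description: str
    category: str = "uncategorized"   # set during categorization
    priority: str = "medium"          # "high", "medium", or "low"
    reported_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    incident_id: int = field(default_factory=lambda: next(_ticket_numbers))
    level: SupportLevel = SupportLevel.FIRST
    status: str = "open"
    history: list[str] = field(default_factory=list)

    def escalate(self, reason: str) -> None:
        """Hand the incident to the next support tier (step 5.2)."""
        if self.level is SupportLevel.EXTERNAL:
            raise ValueError("already at the highest support level")
        self.level = SupportLevel(self.level + 1)
        self.history.append(f"escalated to level {self.level.value}: {reason}")

    def resolve(self, note: str) -> None:
        """Record the fix that restored service (step 5.4)."""
        self.status = "resolved"
        self.history.append(f"resolved: {note}")

    def close(self, user_confirmed: bool) -> None:
        """Close only after resolution and end-user confirmation (step 5.5)."""
        if self.status != "resolved" or not user_confirmed:
            raise ValueError("close requires a resolved incident and user confirmation")
        self.status = "closed"


log = IncidentLog("Ada", "ada@example.com", "VPN drops every few minutes")
log.escalate("not covered by the knowledge base")
log.resolve("renewed expired VPN certificate")
log.close(user_confirmed=True)
```

Keeping a `history` list on the record is one simple way to support the post-closure review, since it preserves who escalated what and why.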

Post-incident review process

It’s important to conduct a post-incident review of your response strategies to learn what to improve. The review involves a rigorous assessment of internal processes and an evaluation of external user satisfaction.

Internal evaluation

Performing internal evaluations is a great way to plan and test your incident management workflow to improve the resolution of future incidents. About 51% of organizations want to spend more on security — including incident responses — and proper evaluations will help ensure these resources are allocated where they can best help.

You should conduct an internal evaluation of your IT support team to determine:

  • Mean time to acknowledge (MTTA): How quickly was the incident acknowledged after it was reported, and who discovered it?
  • First call resolution rate: How many incidents were resolved within the first call?
  • Average response time: How quickly were stakeholders notified of the incident, and what channel did the support team use to relay notifications?
  • Mean time to resolution (MTTR): How long did it take, on average, to resolve incidents? Were there any unresolved incidents?
  • Cost per ticket: How were incident resources employed? How much time and money does it cost to resolve each incident?
  • SLA compliance rate: How was the incident response team initially organized, and did they maintain the predefined incident management processes? Were the service level agreements adhered to? Were there any reports made to evaluate the incident response?
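As a rough illustration of how several of these internal metrics could be computed from ticket data, here is a short Python sketch. The `ResolvedTicket` fields and the `summarize` function are assumptions made for this example, and the definitions used (for instance, MTTA measured from report to acknowledgment) are common conventions rather than the only valid ones.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class ResolvedTicket:
    reported_at: datetime
    acknowledged_at: datetime
    resolved_at: datetime
    contacts_needed: int    # 1 means the issue was resolved on the first call
    cost: float             # labor and tooling cost attributed to the ticket
    sla_target: timedelta   # agreed maximum time from report to resolution


def summarize(tickets: list[ResolvedTicket]) -> dict[str, float]:
    """Compute review metrics over a non-empty list of resolved tickets."""
    if not tickets:
        raise ValueError("need at least one resolved ticket")
    n = len(tickets)
    ack_seconds = [(t.acknowledged_at - t.reported_at).total_seconds() for t in tickets]
    fix_seconds = [(t.resolved_at - t.reported_at).total_seconds() for t in tickets]
    return {
        "mtta_minutes": mean(ack_seconds) / 60,
        "mttr_minutes": mean(fix_seconds) / 60,
        "first_call_resolution_rate": sum(t.contacts_needed == 1 for t in tickets) / n,
        "cost_per_ticket": sum(t.cost for t in tickets) / n,
        "sla_compliance_rate": sum(
            (t.resolved_at - t.reported_at) <= t.sla_target for t in tickets) / n,
    }


start = datetime(2023, 9, 21, 9, 0)
sample = [
    ResolvedTicket(start, start + timedelta(minutes=5),
                   start + timedelta(minutes=60), 1, 20.0, timedelta(hours=2)),
    ResolvedTicket(start, start + timedelta(minutes=15),
                   start + timedelta(minutes=180), 2, 40.0, timedelta(hours=2)),
]
report = summarize(sample)
```

Unresolved incidents are deliberately left out of this sketch; in practice you would count them separately so they do not silently flatter the averages.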

External evaluation

You can perform an external evaluation by gathering user feedback to help you determine:

  • User satisfaction rate: Was the end user satisfied by the incident resolution process, and were they served on time?
  • Average resolution time: How long did your support team take to resolve an issue?
  • Average initial response time: How long did it take for your support team to respond to initial incident reports?
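The user satisfaction rate is usually derived from post-resolution surveys. The sketch below assumes a 1-to-5 rating scale and counts ratings of 4 or 5 as "satisfied"; both the scale and the threshold are assumptions for illustration, so adjust them to match your own survey.

```python
def satisfaction_rate(ratings: list[int], satisfied_threshold: int = 4) -> float:
    """Share of survey responses at or above the 'satisfied' threshold."""
    if not ratings:
        raise ValueError("no survey responses to evaluate")
    satisfied = sum(r >= satisfied_threshold for r in ratings)
    return satisfied / len(ratings)


# Five hypothetical survey responses on a 1-to-5 scale.
rate = satisfaction_rate([5, 4, 3, 2, 5])
```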

Proper post-incident review can help you identify opportunities to optimize your incident monitoring and response for better performance, which ultimately serves your business goals.

How to optimize your ITIL incident management strategies

Here are the best practices for ITIL incident management that’ll help you reduce costly downtime for your business services.

Tips for handling IT incidents

Set up multiple request and communication options

One of the best ways to improve your ITIL incident management processes is to provide several options for customers to submit requests for help. Some customers may prefer text over voice, for example.

It’s best if these options are integrated rather than siloed. By sharing records such as customer identity, you can remove friction from the process. For example, around 40% of consumers who use three or more conversation channels to contact customer service say they get frustrated with re-identifying themselves.

You can also streamline this step by using company-specific IT incident forms for efficient data collection.

Implement and test an ITIL incident workflow

You can define ticket criteria to classify incoming incidents and auto-assign them to appropriate endpoints based on an incident priority matrix. You can employ load balancing and round robin algorithms to automatically assign multiple tickets to multiple experts at the same level of expertise.
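A priority matrix and round-robin assignment might be sketched in Python as follows. The matrix values, the category names, and the `RoundRobinAssigner` class are all hypothetical; your own matrix should reflect your SLAs, and most ticketing tools offer built-in assignment rules that would replace this code.

```python
from itertools import cycle

# Hypothetical priority matrix: (urgency, severity) -> priority.
PRIORITY_MATRIX = {
    ("high", "high"): "high",
    ("high", "medium"): "high",
    ("medium", "high"): "high",
    ("high", "low"): "medium",
    ("low", "high"): "medium",
    ("medium", "medium"): "medium",
    ("medium", "low"): "low",
    ("low", "medium"): "low",
    ("low", "low"): "low",
}


def priority_for(urgency: str, severity: str) -> str:
    """Look up the priority for an incident's urgency and severity."""
    return PRIORITY_MATRIX[(urgency, severity)]


class RoundRobinAssigner:
    """Rotate tickets evenly among agents who share a category of expertise."""

    def __init__(self, agents_by_category: dict[str, list[str]]):
        # Each category needs at least one agent, or assign() has nobody to pick.
        self._rotations = {
            category: cycle(agents)
            for category, agents in agents_by_category.items()
        }

    def assign(self, category: str) -> str:
        """Return the next agent in the rotation for this category."""
        return next(self._rotations[category])


assigner = RoundRobinAssigner({"network": ["ana", "ben"], "database": ["chen"]})
first = assigner.assign("network")
second = assigner.assign("network")
third = assigner.assign("network")
```

Round robin spreads ticket counts evenly but ignores how busy each agent is; a load-balancing variant would instead pick the agent with the fewest open tickets.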

You’ll need to create specialized workflows to manage major incidents and incorporate IT service level agreements with ticket parameters such as priority.
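One way to tie SLAs to ticket priority is to derive a response deadline from the priority at the moment the ticket is logged. The Python sketch below uses placeholder targets that are purely illustrative and do not represent any provider's actual commitments; the function names are likewise hypothetical.

```python
from datetime import datetime, timedelta

# Placeholder response targets per priority; replace with your agreed SLAs.
SLA_RESPONSE_TARGETS = {
    "high": timedelta(minutes=30),
    "medium": timedelta(hours=4),
    "low": timedelta(days=1),
}


def response_deadline(reported_at: datetime, priority: str) -> datetime:
    """Latest time by which the first response is due for this ticket."""
    return reported_at + SLA_RESPONSE_TARGETS[priority]


def is_breached(reported_at: datetime, priority: str, now: datetime) -> bool:
    """True if the response deadline has already passed at time `now`."""
    return now > response_deadline(reported_at, priority)


reported = datetime(2023, 9, 21, 9, 0)
deadline = response_deadline(reported, "high")
late = is_breached(reported, "high", now=datetime(2023, 9, 21, 9, 45))
```

A major-incident workflow could use the same lookup to trigger notifications shortly before a deadline rather than only flagging breaches after the fact.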

Make sure you verify with the end user that the issue is fixed, and use the correct closure codes to mark incidents as resolved.

Maintain an information base

You can ensure faster resolution of major incidents through proper documentation. IT incident tickets should be linked to the appropriate documentation highlighting all the IT assets, previously resolved IT problems, and any asset changes.

Your support team should build and update a repository of commonly recurring IT incidents to ensure timely resolution and improve customer experience.

Opt for third party support and managed services

During incident resolution, escalations will sometimes require external technical assistance. You can resolve IT incidents through a third-party support provider like Spinnaker Support with swift responses and proven ISO-certified processes.

With Spinnaker Support, high-priority incidents — as determined by you, not the software vendor — are guaranteed to be answered within 15 minutes. Our SLAs guarantee equally fair response times for low-priority incidents, and issues are handled by our team of ITIL Level 2 and Level 3 engineers.

For an even more robust option, you can opt for Spinnaker’s managed services. This allows you to offload IT operations, freeing your internal teams for other mission-critical tasks. Managed services can cover both traditional IT ticketing as well as more proactive system monitoring and maintenance, like patching and upgrades to keep your critical systems running at peak performance.

Conclusion

Service outages can be expensive for a business and, in turn, affect employee productivity and daily IT operations. Through ITIL incident management processes, you can expedite the resolution of issues, giving your business operational value in the long run.

If you’re struggling to optimize your ITIL incident management, consider outsourcing to a managed services provider like Spinnaker Support.

You’ll get cost savings and support for your entire IT infrastructure backed by a global engineering team. To get started, contact a Spinnaker Support expert today.