How to measure the success of ITSM tools and processes?

Effective IT Service Management (ITSM) processes and tools are critical to delivering quality IT services that meet business objectives and satisfy end-users. On the other hand, inefficient, ineffective, or outdated ITSM processes and ITSM tools can lead to a range of problems, including poor customer satisfaction, high IT costs, IT security risks, and inaccurate or incomplete data.

In this article, we’ll explore the key metrics and KPIs to measure the success of ITSM processes and ITSM tools and discuss best practices for setting and analyzing those metrics. By the end, you’ll have a clear roadmap for measuring and improving the effectiveness of your ITSM processes and tools.

Importance of Measuring Success in ITSM

Measuring the success of ITSM processes and tools is crucial for several reasons:

Provides insight into how well the organization is meeting its service level agreements (SLAs) and customer expectations.

Tip: Monitor performance against SLAs and adjust to maintain customer satisfaction.

Enables the organization to identify trends and patterns in ITSM performance over time, which can inform strategic planning and decision-making.

Tip: Use data analytics tools to identify trends and patterns in ITSM performance. Use the findings for resource allocation and investment prioritization.

Helps quantify the value of ITSM investments and demonstrate the return on investment (ROI) to stakeholders. It also facilitates benchmarking against industry standards and best practices.

Tip: Communicate the ROI of ITSM investments to stakeholders clearly and stay up-to-date with ITSM industry standards and best practices.

Key Metrics and KPIs for Measuring ITSM Success

To measure the success of ITSM processes and tools effectively, it’s important to use metrics and key performance indicators (KPIs). These must be relevant, measurable, and aligned with organizational goals.

Here are some key metrics and KPIs that can be used to measure ITSM success:

Time to Resolve

Time to Resolve (TTR) measures the average time taken to resolve an IT incident once it is reported to the service desk.
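As a rough illustration of the definition above, the sketch below averages resolution times over a set of resolved tickets. The function name and the (reported, resolved) timestamp pairs are assumptions made for this example, not the data model of any particular ITSM tool.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(tickets):
    """Average time between report and resolution.

    `tickets` is a list of (reported_at, resolved_at) datetime pairs
    (a hypothetical structure chosen for this example).
    """
    if not tickets:
        raise ValueError("need at least one resolved ticket")
    # Sum the individual durations, starting from a zero-length timedelta.
    total = sum((resolved - reported for reported, resolved in tickets), timedelta())
    return total / len(tickets)


# Two illustrative tickets: one resolved in 2 hours, one in 4 hours.
sample = [
    (datetime(2024, 4, 1, 9, 0), datetime(2024, 4, 1, 11, 0)),
    (datetime(2024, 4, 1, 10, 0), datetime(2024, 4, 1, 14, 0)),
]
print(mean_time_to_resolve(sample))  # 3:00:00
```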

The faster an issue is resolved, the sooner the customer can resume work, so TTR is directly tied to customer satisfaction. TTR also indicates how effective and efficient ITSM processes are and can help organizations identify bottlenecks in the support process. A few ways to improve TTR:

Automating incident and problem management where possible.
Providing self-service options for end users.
Improving knowledge management.
Providing training to IT staff to improve their technical skills and knowledge.

Things to remember:

While TTR is an important metric, remember that it can easily be gamed by closing tickets too quickly, without confirming with the customer that everything is working as expected. This produces misleading TTR numbers that look good on the surface but ultimately lead to lower customer satisfaction. This phenomenon is known as “watermelon metrics”: the dashboard shows green (good performance), but the actual outcome is red (poor performance).

Service Availability

Service Availability refers to the extent to which an IT service is available for use as defined in the service level agreement (SLA).

This KPI is typically calculated as a percentage based on the agreed service time in the SLA and the downtime within that period. When a service is unavailable, the resulting downtime leads to lost productivity, lost revenue, and customer dissatisfaction. Because of its direct impact on end users, it’s an important metric for demonstrating the value of IT services to the business.

Things to remember:

While choosing this KPI, it’s important to consider how downtime is defined and reported. Many organizations report the total downtime for the month, which could hide the number of service disruptions experienced by customers. For instance, frequent interruptions in a critical service can be frustrating and result in productivity loss, even if each interruption is short.

Total downtime can be within the SLA limits and still leave customers dissatisfied. Therefore, reporting both the total downtime and the number of service disruptions is recommended to accurately represent service availability.
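A minimal sketch of this recommendation follows, assuming the commonly used formula availability = (agreed service time − downtime) / agreed service time × 100. The function name and the input shape (a list of outage lengths in minutes) are hypothetical and chosen only for illustration.

```python
def availability_report(agreed_minutes, outage_minutes):
    """Summarize availability for one reporting period.

    `agreed_minutes` is the agreed service time from the SLA and
    `outage_minutes` lists the length of each disruption
    (both are hypothetical inputs for this example).
    """
    if agreed_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    downtime = sum(outage_minutes)
    return {
        "availability_percent": 100 * (agreed_minutes - downtime) / agreed_minutes,
        "total_downtime_minutes": downtime,
        "disruption_count": len(outage_minutes),
    }


# A 30-day month of 24x7 service is 43,200 agreed minutes.
one_long_outage = availability_report(43_200, [40])
many_short_outages = availability_report(43_200, [5] * 8)
```

Both reports show the same availability percentage, but the second one records eight disruptions instead of one, which is exactly the detail a monthly downtime total would hide.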

Customer Satisfaction (CSAT)

This metric sheds light on how satisfied end-users are with the quality of IT services provided through ITSM processes. The usual way to assess it is through feedback mechanisms and periodic surveys. It provides insight into how well IT services are meeting end-user needs. The higher the CSAT score, the more satisfied end-users are with the service quality they receive. In one way or another, improving this metric is the ultimate objective of most ITSM processes.

As with the watermelon metrics discussed earlier, other metrics may look good (green) while, underneath, your customers are not happy.

Things to remember:

When choosing CSAT as a KPI, organizations should:

Choose the right survey methodology.
Establish a benchmark for comparison.
Design the right questions to capture feedback and produce actionable insights.

Bonus Tip:

Track a Net Promoter Score (NPS) to understand customer satisfaction levels. Place it alongside the CSAT value, and you’ll have a more accurate picture of user satisfaction.
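The two scores can be sketched as follows. This assumes two common conventions that the article does not spell out: CSAT as the share of ratings of 4 or 5 on a 5-point scale, and NPS as the percentage of promoters (9 or 10 on a 0 to 10 scale) minus the percentage of detractors (0 to 6). The function names and the threshold parameter are illustrative.

```python
def csat_percent(ratings, satisfied_threshold=4):
    """Share of survey ratings at or above the 'satisfied' threshold.

    Assumes a 1-5 scale where 4 and 5 count as satisfied (a common
    convention, not something mandated by any standard).
    """
    if not ratings:
        raise ValueError("need at least one rating")
    satisfied = sum(1 for rating in ratings if rating >= satisfied_threshold)
    return 100 * satisfied / len(ratings)


def net_promoter_score(scores):
    """NPS on a 0-10 scale: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("need at least one score")
    promoters = sum(1 for score in scores if score >= 9)
    detractors = sum(1 for score in scores if score <= 6)
    return 100 * (promoters - detractors) / len(scores)


csat = csat_percent([5, 4, 3, 5, 2])
nps = net_promoter_score([10, 9, 8, 7, 6, 3, 10, 9, 5, 10])
```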

Average First Response Time

Average First Response Time measures the time taken by an agent to provide an initial response to a customer who reports an incident or requests a service. The lower the average response time, the better it is for the customer and the IT organization.

According to industry best practices, the average first response time should stay under a defined target; for requests made through live chat, that target is typically a few minutes. This is because customers expect prompt and timely responses to their issues and queries, and a quick initial response can help establish trust and confidence in the IT organization.

Image: expected first response time after reviewing the benchmark data

Intelligent field suggestions, powered by machine learning (ML), can significantly reduce the average first response time because incoming service desk tickets are classified automatically. Agents don’t have to assign parameters and field values by hand and can start the conversation sooner.

Things to remember:

It’s important to factor in the complexity of issues that agents deal with, as some issues may require more time to resolve, which could affect the average response time.
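One hedged way to see the effect of a few slow cases on the average is to report the median next to the mean. The median is an addition for illustration here, not part of the metric as defined above, and the sample values are invented.

```python
from statistics import mean, median


def first_response_summary(response_minutes):
    """Mean and median first response time, in minutes.

    `response_minutes` holds one value per ticket: the time from the
    customer's request to the agent's first reply (hypothetical input).
    """
    if not response_minutes:
        raise ValueError("need at least one response time")
    return {
        "mean_minutes": mean(response_minutes),
        "median_minutes": median(response_minutes),
    }


# Four quick replies and one ticket that waited 48 minutes.
summary = first_response_summary([2, 3, 4, 3, 48])
```

In this sample the mean is 12 minutes while the median is 3, showing how a single slow response can dominate the average.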

First Contact Resolution (FCR)

First Contact Resolution (FCR) measures the percentage of issues resolved during the customer’s first interaction with the service desk, eliminating the need for further action. It results in a happy customer and saves resources and costs for the IT team, because escalating an issue to Level 2 or Level 3 support is far more costly than resolving it at the service desk.

However, not all tickets can be closed in the first interaction, so it’s important to set realistic expectations for FCR and ensure that issues are properly resolved rather than just closed quickly.

Things to remember:

Ensure that the definition and measurement methodology of FCR is consistent across all agents and teams.

Consider factors such as the complexity of incidents, types of service requests, and availability of resources while setting targets for FCR.
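Because the definition has to be consistent, it can help to write it down explicitly. The sketch below is one possible definition, not a standard one: a ticket counts toward FCR only if it took a single contact and was not reopened afterwards. The field names `contacts` and `reopened` are assumptions for this example.

```python
def first_contact_resolution_rate(tickets):
    """Percentage of tickets resolved in one contact and never reopened.

    Each ticket is a dict with hypothetical keys:
      'contacts' - number of customer interactions needed
      'reopened' - True if the ticket was reopened after closure
    """
    if not tickets:
        raise ValueError("need at least one ticket")
    resolved_first_time = sum(
        1 for ticket in tickets
        if ticket["contacts"] == 1 and not ticket["reopened"]
    )
    return 100 * resolved_first_time / len(tickets)


sample_tickets = [
    {"contacts": 1, "reopened": False},
    {"contacts": 1, "reopened": True},   # closed quickly, but not really resolved
    {"contacts": 2, "reopened": False},
    {"contacts": 1, "reopened": False},
]
fcr = first_contact_resolution_rate(sample_tickets)
```

Excluding reopened tickets is a design choice that keeps quick-but-incomplete closures from inflating the figure.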

SLA Breach Rate

SLA breach rate is a critical ITSM metric that measures the percentage of tickets that have broken or breached an SLA.

Let’s say that a company has an SLA in place that requires a high-priority incident to be resolved within 2 hours. If a high-priority incident takes 3 hours to resolve, it has breached the SLA.

The SLA breach rate is then calculated as the number of high-priority incidents that breached the SLA divided by the total number of high-priority incidents during a specific period (usually a month).

For example, let’s say that in April, the company had 100 high-priority incidents, and 10 of them breached the SLA. The SLA breach rate for April would be 10%, which indicates that 10% of high-priority incidents did not meet the agreed-upon SLA.
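The April example can be reproduced with a short sketch. The function name and the list of resolution times are assumptions for illustration; the 2-hour target comes from the example SLA above.

```python
def sla_breach_rate(resolution_hours, target_hours):
    """Percentage of incidents whose resolution time exceeded the SLA target.

    `resolution_hours` holds one resolution time per incident of the
    priority covered by the SLA (hypothetical input).
    """
    if not resolution_hours:
        raise ValueError("need at least one incident")
    breached = sum(1 for hours in resolution_hours if hours > target_hours)
    return 100 * breached / len(resolution_hours)


# 100 high-priority incidents: 90 resolved in 1.5 hours, 10 in 3 hours.
april = [1.5] * 90 + [3.0] * 10
print(sla_breach_rate(april, target_hours=2))  # 10.0
```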

The SLA breach rate provides insights into the company’s ability to meet customer commitments. A high SLA breach rate can indicate issues with the ITSM processes or the company’s ability to meet its service level commitments.

Things to remember:

Regular SLA breaches can indicate that the SLA is unrealistic and may require renegotiation with the customer or revisiting the underpinning operational level agreements (OLAs) to ensure they properly support the SLA agreed with the customer.

Cost Per Ticket

Cost per ticket (also known as cost per contact) measures the amount spent on each support ticket or contact. Divide the total cost of providing IT support by the number of tickets or contacts received, and you get the cost per ticket.

A high cost per ticket can signal that the organization is spending too much on support and might need to reduce or redeploy some resources. However, it’s essential to note that a high cost per ticket doesn’t necessarily mean the organization is performing poorly.

For example, suppose an organization has a high cost per ticket despite having a high customer satisfaction score. In that case, consider investing in more cost-effective tools, automation, or self-service options to reduce the number of tickets and ultimately reduce the cost per ticket.

Things to remember:

While calculating the cost, remember to include indirect costs such as training, maintenance, and upgrades along with direct costs such as personnel’s salary, telecommunication, hardware, and software.
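A small sketch of the calculation, with direct and indirect costs kept separate so neither is forgotten. All cost categories and amounts below are invented for illustration.

```python
def cost_per_ticket(direct_costs, indirect_costs, ticket_count):
    """Total support cost (direct + indirect) divided by ticket count.

    `direct_costs` and `indirect_costs` map a cost category to its
    amount for the period (hypothetical structure).
    """
    if ticket_count <= 0:
        raise ValueError("ticket count must be positive")
    total_cost = sum(direct_costs.values()) + sum(indirect_costs.values())
    return total_cost / ticket_count


# Illustrative monthly figures, in a single currency.
direct = {"salaries": 40_000, "telecommunication": 2_000, "hardware": 3_000, "software": 5_000}
indirect = {"training": 1_500, "maintenance": 2_500, "upgrades": 1_000}
monthly_cost_per_ticket = cost_per_ticket(direct, indirect, ticket_count=2_500)
```

With these made-up figures, leaving out the indirect costs would understate the cost per ticket by 2.00 (20.00 instead of 22.00).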

The above metrics and KPIs provide a performance overview of the ongoing ITSM processes. But to improve these processes, you must have specific metrics tied to the key areas of ITSM.

Measuring ITSM Process Success: Incident Management, Problem Management, Change Management, etc.

The success of ITSM processes depends on the performance of key areas such as incident management, problem management, and change management.

Incident Management Metrics

Incident volume:

This metric measures the number of incidents reported over a given time period. Over time, a decrease in incident volume can indicate that ITSM processes are becoming more stable and reliable.

Tips: While measuring incident volume

– Define what qualifies as an incident.

– Categorize those incidents for easier analysis and trend identification.

– Account for duplicates.
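The tips above can be sketched as a count of incidents per category that skips duplicates. The `category` and `duplicate_of` fields are hypothetical; real ITSM tools name and model these differently.

```python
from collections import Counter


def incident_volume_by_category(incidents):
    """Count incidents per category, ignoring records marked as duplicates.

    Each incident is a dict with hypothetical keys:
      'category'     - the category assigned to the incident
      'duplicate_of' - id of the original incident, or None if unique
    """
    unique = [i for i in incidents if i.get("duplicate_of") is None]
    return Counter(i["category"] for i in unique)


sample_incidents = [
    {"id": 1, "category": "network", "duplicate_of": None},
    {"id": 2, "category": "network", "duplicate_of": 1},   # same outage, reported twice
    {"id": 3, "category": "email", "duplicate_of": None},
    {"id": 4, "category": "network", "duplicate_of": None},
]
volume = incident_volume_by_category(sample_incidents)
```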

Incident resolution time:

This metric measures how quickly incidents are resolved. A shorter resolution time indicates that incidents are being addressed more quickly, which can improve customer satisfaction. However, a longer resolution time doesn’t necessarily point to faults in the incident management process. It may instead reflect a rise in the number of high-priority incidents and complex issues.

Tips: While measuring incident resolution time

Set clear guidelines for when an incident is considered resolved.
Decide when time tracking starts: when the customer reports the issue, or when the issue is assigned to a technician.
Set benchmark incident resolution times for different categories of incidents.

Problem Management Metrics

Problem volume:

This metric measures the number of problems reported over a given time period. A decrease in problem volume over time can indicate that the root causes of incidents are being addressed and resolved.

Tips:

Categorize problems based on their impact (high, medium, or low) and urgency to resolve.
Set a target to reduce problem volume based on historical data (say, a 10% reduction over the next quarter).
Analyze trends to identify particular systems or applications causing high problem volume.

Problem resolution time:

This metric measures how quickly problems are resolved. A shorter resolution time indicates the efficiency of the problem-solving process. However, suppose the SLA for resolving a problem is 2 hours, and the problem resolution time consistently exceeds this time frame. In that case, it may indicate a need to re-evaluate the current process, allocate additional resources, or improve communication channels between the team members.

Tips:

Categorize problems based on their complexity (simple, moderate, or complex) and the resources required to resolve them.
Similar to problem volume, use historical data to set targets for problem resolution time. For example, the target for resolving moderate problems could be four hours.

Change Management Metrics

Change success rate:

This metric measures the percentage of changes that are implemented without causing service disruptions. A higher success rate indicates effective implementation of change management processes.

Tips:

Clearly define who is responsible for which steps in the change management process.

Conduct post-implementation reviews.
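A minimal sketch of the change success rate as defined above, assuming each change record carries a flag that says whether it caused a service disruption. The field name `caused_disruption` is hypothetical.

```python
def change_success_rate(changes):
    """Percentage of implemented changes that caused no service disruption.

    Each change is a dict with a hypothetical boolean key
    'caused_disruption', typically set during the post-implementation review.
    """
    if not changes:
        raise ValueError("need at least one implemented change")
    successful = sum(1 for change in changes if not change["caused_disruption"])
    return 100 * successful / len(changes)


# 20 implemented changes, 1 of which caused a disruption.
implemented = [{"caused_disruption": False}] * 19 + [{"caused_disruption": True}]
print(change_success_rate(implemented))  # 95.0
```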

Change lead time:

This metric measures how much advance notice is required before a change can be implemented. Because different change types require different levels of approval and governance, the lead time varies accordingly. For example, standard changes, which are typically low impact and low risk, may have pre-approved workflows and lead times, while normal changes may have a stipulated lead time so that the CAB or Change Manager can review the change request in time. Irrespective of the change type, a shorter lead time indicates efficient and effective change management processes.

Tip: Account for holidays and non-business hours while monitoring change lead time.
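One way to follow this tip is to express lead time in business days rather than calendar days. The sketch below assumes a Monday to Friday working week and a caller-supplied set of holiday dates; both are simplifying assumptions, and the function name is illustrative.

```python
from datetime import date, timedelta


def business_days_between(start, end, holidays=frozenset()):
    """Count business days after `start`, up to and including `end`.

    Weekends (Saturday, Sunday) and any dates in `holidays` are skipped.
    """
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:
            days += 1
    return days


# Change requested on Friday 24 May 2024, implemented on Wednesday 29 May 2024,
# with Monday 27 May treated as a public holiday in this example.
lead_time = business_days_between(
    date(2024, 5, 24), date(2024, 5, 29), holidays={date(2024, 5, 27)}
)
print(lead_time)  # 2
```

Measured in calendar days the same change shows a lead time of five days, which would overstate how long the approval process actually took.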

Once you’ve covered the key ITSM processes, it’s time to look at how well ITSM tools such as the CMDB, ITSM platform, service desk, and self-service portal contribute to those processes. Whether the current tools perform well or not, the long-term cost of not refreshing them is far higher than the cost of changing to a new one.

Measuring ITSM Tool Success: Service Desk, CMDB, Self-Service Portal, etc.

Service Desk

First call resolution rate: As explained earlier, this metric measures the percentage of incidents or service requests resolved on the first call. A higher first-call resolution rate may indicate that service desk agents are effectively resolving customer issues, which can improve customer satisfaction.

Average handling time: This metric measures the time a service desk agent takes to resolve an incident or service request. A shorter average handling time indicates efficient service desk operations.

Other metrics include Ticket Volume, Escalation Rates, etc.

CMDB (Configuration Management Database)

Data accuracy:

This metric measures the accuracy of the data stored in the CMDB. A high data accuracy rate indicates that the CMDB is being effectively managed, which can improve incident and problem management processes.

If you’re using ServiceNow ITSM and CMDB, you can keep CMDB data accurate, up to date, and complete by integrating Virima Discovery and Service Mapping. Its agentless and client-based (for Windows) discovery automatically discovers CIs across an extensive array of asset types and supplies broad CI data to the CMDB, where ServiceNow ITSM can access it.

How to measure Data accuracy?

Automated discovery tools like Virima help keep CMDB data accurate. To check data accuracy in the CMDB, two common methods are sampling and reconciliation.

CMDB completeness:

This metric measures the percentage of configuration items (CIs) that are accurately represented in the CMDB. In other words, it depicts the percentage of Configuration Items (CIs) with all the required attributes or fields populated in the CMDB. A high completeness rate indicates that the CMDB is being effectively maintained, which can improve change management processes.

How to measure CMDB completeness?

If you’re using ServiceNow CMDB, you can use their CMDB health dashboard.
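Outside a dashboard, the same idea can be sketched in a few lines: a CI counts as complete when every required attribute is populated. The required field names below are hypothetical, and this is not how the ServiceNow dashboard computes its score.

```python
def cmdb_completeness(cis, required_fields):
    """Percentage of CIs with every required field populated.

    Each CI is a dict of attribute name to value; a field counts as
    missing if it is absent, None, or an empty string.
    """
    if not cis:
        raise ValueError("need at least one CI")

    def is_complete(ci):
        return all(ci.get(field) not in (None, "") for field in required_fields)

    complete = sum(1 for ci in cis if is_complete(ci))
    return 100 * complete / len(cis)


# Hypothetical required attributes for this example.
REQUIRED = ("name", "owner", "environment")
sample_cis = [
    {"name": "web-01", "owner": "team-a", "environment": "prod"},
    {"name": "web-02", "owner": "", "environment": "prod"},        # owner missing
    {"name": "db-01", "owner": "team-b", "environment": "prod"},
    {"name": "app-01", "owner": "team-a", "environment": "test"},
]
completeness = cmdb_completeness(sample_cis, REQUIRED)
```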

Self-Service Portal

Adoption rate:

This metric measures the percentage of customers using the self-service portal to submit incidents or service requests. A higher adoption rate indicates that customers effectively use the self-service portal, which can improve overall customer satisfaction. Achieving a higher adoption rate depends on how easy it is to find FAQs in the portal, how complete a picture it gives of issues currently being resolved, and how easy it is to log and track new service requests.

How to measure the adoption rate?

The number of unique visitors and percentage of self-service transactions can effectively measure the adoption rate of the self-service portal.

Self-service success rate:

This metric measures the percentage of incidents or service requests resolved via the self-service portal without the need for further assistance. In other words, it is the rate at which customers complete the desired transaction or find useful information in the knowledge base or customer community without the assistance of a support representative. A higher self-service success rate indicates that the self-service portal effectively meets customer needs.

How to measure self-service success rate?

The same metrics that are used in the case of adoption rate are equally applicable here.
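Both portal metrics reduce to simple ratios. The sketch below assumes adoption is measured as the share of all requests submitted through the portal, and success as the share of portal requests resolved without an agent. The function names and counts are illustrative, and other definitions are possible.

```python
def percentage(part, whole):
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return 100 * part / whole


def adoption_rate(self_service_requests, total_requests):
    """Share of all requests that were submitted through the portal."""
    return percentage(self_service_requests, total_requests)


def self_service_success_rate(resolved_without_agent, self_service_requests):
    """Share of portal requests resolved without agent assistance."""
    return percentage(resolved_without_agent, self_service_requests)


# Illustrative month: 1,200 requests in total, 300 through the portal,
# 180 of which were resolved without an agent.
adoption = adoption_rate(300, 1_200)
success = self_service_success_rate(180, 300)
```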

While you’re setting up metrics and KPIs to measure the success of your ITSM processes and seeing how well ITSM tools are contributing towards those processes, it’s important to adhere to some best practices.

Best Practices for Measuring ITSM Success and Improving Processes and Tools

Define clear goals and objectives:

Establish measurable targets for each key metric you set up and ensure they align with the overall ITSM strategy. For example, if the overall business objective is to reduce costs, a clear goal for ITSM might be to reduce the cost of incident resolution by 20% within the next year.

This goal can be measured using the cost-per-ticket metric mentioned earlier. Now ITSM teams can focus their efforts on finding ways to reduce costs, such as implementing automation, improving processes, or reducing the number of incidents.

Tip: Set goals that are achievable and realistic yet still challenging. Moreover, communicate goals clearly to all stakeholders involved in ITSM processes and in managing ITSM tools: business leaders, customers, and, most importantly, IT staff.

Use relevant metrics:

If one of your objectives is to improve incident response time, you might choose metrics such as total time for identifying, classifying, and resolving an incident. On the other hand, if you want to improve your ITSM processes’ overall efficiency, consider metrics such as the number of tickets resolved per hour or the average time it takes to resolve a ticket.

Regularly review and analyze data:

Monitor metrics on a regular basis and analyze the data to identify trends and the ITSM process areas that need improvement. You can use data analytics tools like Tableau or Power BI to collect and analyze data on key ITSM metrics such as incident resolution time, change success rate, and customer satisfaction scores. For example, you can integrate Tableau with your existing ServiceNow ITSM. The analytics can then be used to identify weaknesses in your ITSM processes and make changes accordingly.

However, ITSM tools like Virima come with robust built-in data analysis capabilities that identify future or recurring problems by finding patterns in the data generated by ITSM processes such as problem management and incident management.

Collaborate with stakeholders:

Apart from communicating goals with every stakeholder, such as end-users, service desk agents, IT support teams, and management, it is crucial to solicit feedback from them, involve them in the improvement process, and communicate successes and areas for improvement regularly.

Leveraging ITSM Reporting and Analytics for Better Business Outcomes

Setting up metrics and KPIs for measuring ITSM success will be no good if those aren’t analyzed and transformed into reports that could be used to optimize processes and improve the performance of ITSM tools supporting the processes. And this demands a comprehensive tool with reporting and analytics capabilities, such as Virima’s ITSM platform.

It offers easy-to-use KPI Dashboards and detailed reporting capabilities. This allows you to set up metrics and KPIs for every core area of your ITSM processes. As discussed before, Virima ITSM identifies future or recurrent problems by finding patterns from ITSM process data. This enables you to work towards improving various metrics, such as incident resolution time and problem resolution times.

To know more about Virima’s powerful ITSM platforms, book a demo today!
