Pillar 02 · ITSM AI Readiness

Service desk data foundations for AI that actually works.

The five questions every IT leader should answer before turning on AI in their service desk, and what the answers look like when you ask them honestly.

Last updated May 2026 · Based on analysis of 1,000+ tickets from a UK-based MSP and ServiceNow implementation partner · Reading time ~12 min

Most ITSM AI projects don't fail because the AI is bad. They fail because the data underneath it can't support what the AI is being asked to do. The vendor demo works; the production rollout doesn't. Six months later, the dashboards are quiet and nobody can quite explain why.

This page answers the five questions that separate AI projects that deliver from those that quietly stall. The answers draw on a recent diagnostic we ran on 1,000 closed tickets and 232 open tickets from a single UK-based MSP, a working ServiceNow implementation partner serving 22 client organisations. We've anonymised the source. The patterns are real.

1 Data Quality

Is our IT data clean and structured enough to train AI models?

Short answer: Probably not, but the more important question is whether it's clean enough to be used by AI. ITSM AI features rarely train custom models on your data. They retrieve from it, cluster it, and pattern-match against it in real time. That changes what "clean enough" means.

Most AI features shipping in ServiceNow Now Assist, Halo's AI suite, Freshservice Freddy, and Zendesk's AI tools don't fine-tune on customer data at scale. They use your data as a retrieval source. That distinction matters because the failure modes are different. A retrieval-based AI doesn't need millions of examples; it needs a few hundred coherent examples per pattern. The bottleneck isn't volume. It's consistency.

Three measurable conditions determine whether ITSM data is usable in this way:

Categorisation consistency

When tickets describing the same issue land in different categories, or default to a generic catch-all, AI clustering can't form coherent groups. In one MSP's open ticket data, 70% of categorised tickets resolved to a single leaf category called "Configuration Change." That's effectively no taxonomy.

Resolution note quality

Suggested-replies, knowledge suggestions, and FCR uplift all depend on past resolutions being readable. In the same dataset, 26% of ticket summaries were under 30 characters. Resolution notes were absent entirely from the standard export. AI cannot suggest a fix from a ticket closed with no record of how it was solved.

Signal-to-noise ratio

ITSM systems accumulate non-customer noise: scheduled checks, automation alerts, time-tracking entries, dev tickets cross-posted from project tools. Roughly 13% of one MSP's closed tickets weren't customer issues at all. The largest discoverable cluster was time-tracking entries from a single agent.

The honest assessment: Take a representative sample of closed tickets from the last 90 days and check what proportion have (a) a meaningful category beyond a default value, (b) resolution text describing what was actually done, and (c) any connection to a real customer issue. If any of those three is below 60%, AI features will noticeably underperform their demos.
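A minimal sketch of that spot check, assuming a CSV export with hypothetical column names ("category", "resolution_notes", "source"); rename them, and tune the default-category and noise lists, to match your own tool's export.

```python
import csv

# Hypothetical defaults: tune these to your own taxonomy and noise sources.
DEFAULT_CATEGORIES = {"", "general", "other", "configuration change"}
NOISE_SOURCES = {"monitoring", "automation", "time-tracking"}

def spot_check(path: str) -> dict:
    """Score a 90-day closed-ticket export on the three conditions (a)-(c)."""
    total = 0
    counts = {"meaningful_category": 0, "resolution_text": 0, "customer_issue": 0}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            total += 1
            # (a) a category beyond a default or catch-all value
            if (row.get("category") or "").strip().lower() not in DEFAULT_CATEGORIES:
                counts["meaningful_category"] += 1
            # (b) resolution text long enough to describe what was done
            if len((row.get("resolution_notes") or "").strip()) >= 30:
                counts["resolution_text"] += 1
            # (c) not obvious non-customer noise (alerts, time tracking, etc.)
            if (row.get("source") or "").strip().lower() not in NOISE_SOURCES:
                counts["customer_issue"] += 1
    return {k: round(100 * v / total, 1) for k, v in counts.items()} if total else {}

if __name__ == "__main__":
    for check, pct in spot_check("closed_tickets_90d.csv").items():
        flag = "OK" if pct >= 60 else "below 60% - expect AI to underperform"
        print(f"{check}: {pct}% ({flag})")
```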

The fix isn't AI training. It's data hygiene, and it produces value with or without AI.

2 Measuring Value

How do we measure whether our AI investment is actually paying off?

Short answer: By measuring the four agent-experience signals that tell you whether the data underneath your AI is actually supporting it. Vendor dashboards report usage; these four measure value. If usage is high but these signals are weak, you have a data quality problem dressed up as an adoption success.

Most ITSM platforms ship AI features with built-in dashboards showing how often the features are invoked: how many summaries generated, how many suggestions surfaced, how many sessions used the virtual agent. Those numbers are easy to grow and tell you almost nothing. A suggestion that's surfaced but ignored is worse than no suggestion at all: it costs the agent attention without delivering value.

The four signals that actually measure whether AI is paying off all sit downstream of the data quality work described above. Each maps to a specific failure mode that buyers should expect to see if their data isn't ready:

What was promised, and how to measure value (not usage):

AI-suggested next actions: suggestion acceptance rate by agents. Above 30% = the data is supporting it. Below 15% = upstream cleanup needed.

Ticket deflection / virtual agent: deflection rate against KB articles created in the last 12 months. New KB content driving deflection means the loop is closing.

Agent-assist suggested replies: reply acceptance rate and edit distance. High edits = the AI is starting from the wrong place.

First-contact resolution improvement: FCR rate movement on AI-active tickets vs. a control group, measured over 90 days minimum.

The chain matters. If clustering is incoherent because categorisation is fragmented, suggestions can't form. If suggestions can't form, replies aren't useful. If replies aren't useful, FCR doesn't move. One broken link breaks every metric downstream, which is why measuring usage in isolation produces false comfort, and why measuring all four together reveals the actual health of the AI deployment.
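To make the reply-acceptance and edit-distance signals concrete, here is a minimal sketch assuming a hypothetical export of suggestion events, each with the suggested text, the text the agent actually sent, and an accepted flag; difflib's ratio stands in for a proper edit-distance metric.

```python
from difflib import SequenceMatcher

def reply_signals(events: list[dict]) -> dict:
    """Acceptance rate plus a 0..1 edit-distance proxy over accepted replies."""
    accepted = [e for e in events if e["accepted"]]
    acceptance_rate = len(accepted) / len(events) if events else 0.0
    # 0.0 = sent verbatim, 1.0 = completely rewritten before sending.
    edits = [1 - SequenceMatcher(None, e["suggested"], e["sent"]).ratio()
             for e in accepted]
    mean_edit = sum(edits) / len(edits) if edits else 0.0
    return {"acceptance_rate": round(acceptance_rate, 2),
            "mean_edit_distance": round(mean_edit, 2)}

# Illustrative events: one sent verbatim, one heavily rewritten, one ignored.
events = [
    {"suggested": "Please restart the VPN client.",
     "sent": "Please restart the VPN client.", "accepted": True},
    {"suggested": "Clear your browser cache.",
     "sent": "Reset your SSO token first, then clear the cache.", "accepted": True},
    {"suggested": "Reboot the device.", "sent": "", "accepted": False},
]
print(reply_signals(events))
```

A high acceptance rate paired with a high mean edit distance is the "starting from the wrong place" signal from the list above, expressed numerically.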

A common misread: "Our AI suggestion volume is up 400% quarter-over-quarter" sounds like adoption success. It's actually evidence the AI is generating more output. Whether that output is useful is a different question, and the only one that determines ROI.

The teams that get this right run a baseline measurement before enabling AI, then measure the same signals 90 days post-enablement. Without the baseline, vendor dashboards become the only available frame, and they always look positive.
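A sketch of what that discipline looks like in practice; the signal names and numbers below are illustrative placeholders, not a vendor schema or real results.

```python
# Capture the four signals before enablement, again at 90 days, and report
# the movement on each signal rather than raw usage counts.
baseline = {"suggestion_acceptance": 0.00, "deflection_rate": 0.08,
            "reply_acceptance": 0.00, "fcr_rate": 0.41}
day_90 = {"suggestion_acceptance": 0.22, "deflection_rate": 0.11,
          "reply_acceptance": 0.18, "fcr_rate": 0.43}

for signal, before in baseline.items():
    after = day_90[signal]
    print(f"{signal}: {before:.2f} -> {after:.2f} ({after - before:+.2f})")
```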

3 Data Preparation

How much data preparation does AI in ITSM actually require?

Short answer: More than vendors suggest, less than consultancies sell. The realistic preparation work for most AI features is 20-60 hours of focused effort on a specific dataset, not a multi-month transformation programme. The risk isn't under-investing; it's the wrong kind of investment.

Vendor demos run on cleaned, curated, representative data. Vendor sales materials describe AI as "ready to use against your existing data." Consultancy proposals describe AI as requiring a 6-12 month "data foundation programme" before any value can be realised. Both are misleading. The realistic position sits between them.

For most mainstream ITSM AI features (agent-assist, knowledge suggestions, ticket clustering, basic predictive routing), the preparation work breaks down into three categories: stripping non-customer noise out of the ticket history, consolidating the category taxonomy so similar issues land together, and capturing resolution notes that record what was actually done.

What consultancies sell as "AI data preparation" often includes none of these and instead focuses on data warehousing, lake architectures, and governance frameworks. Those are valuable for analytics; they have minimal impact on whether ITSM AI features deliver. The teams getting fast value from AI are the ones doing the unglamorous triage and categorisation work first.

The fastest test: Pick a single AI feature your platform offers (suggested-replies is the easiest). Enable it, and measure how often agents accept the suggestions over a two-week period. Above 30% means the data is feeding it usefully. Below 15% means the upstream data needs the work described above before the feature is worth the licence cost.
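A sketch of that two-week test, assuming hypothetical suggestion records with a "shown_at" ISO timestamp and an "accepted" flag pulled from your platform's export:

```python
from datetime import datetime, timedelta

def two_week_verdict(records: list[dict], start: datetime) -> str:
    """Apply the 30%/15% thresholds to suggestions shown in a 14-day window."""
    window = [r for r in records
              if start <= datetime.fromisoformat(r["shown_at"])
              < start + timedelta(days=14)]
    if not window:
        return "no suggestions shown - the feature may not be firing at all"
    rate = sum(r["accepted"] for r in window) / len(window)
    if rate > 0.30:
        return f"{rate:.0%} accepted - the data is feeding the feature usefully"
    if rate < 0.15:
        return f"{rate:.0%} accepted - do the upstream data work first"
    return f"{rate:.0%} accepted - borderline; inspect what agents reject"

print(two_week_verdict(
    [{"shown_at": "2026-05-04T09:30:00", "accepted": True},
     {"shown_at": "2026-05-06T14:10:00", "accepted": False}],
    start=datetime(2026, 5, 1),
))
```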

4 Integration

How do we integrate AI with our legacy IT systems and existing workflows?

Short answer: Most integration failures aren't technical; they're workflow mismatches. The AI feature works, but the workflow it feeds output into wasn't designed to consume AI output. The integration question is really a workflow design question.

Vendor AI features generally produce three types of output: suggestions (agent-facing), automations (system-facing), and insights (manager-facing). Each requires different integration work, and most teams under-invest in the integration most likely to determine adoption.

Suggestions require the most workflow design and the least technical integration. The AI is already inside the agent's primary tool. The question is whether the suggestion appears at the moment the agent needs it, in a form they can act on without context-switching. If suggested-replies appear after the agent has already started typing, they get ignored. If knowledge articles surface in a sidebar the agent has minimised, they get ignored. The integration work is interface design, not API work.

Automations require the most technical integration and the most governance. AI-driven actions (auto-categorising, auto-routing, auto-resolving) must connect to legacy systems where the consequences land. A misrouted ticket in a workflow that sends to an email distribution list nobody monitors creates a longer outage than no routing at all. The integration risk is not the API call; it's the human or system at the receiving end of an action they didn't expect.
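The governance shape matters more than the specific tooling. A minimal guardrail sketch, assuming a hypothetical routing hook: act automatically only on high confidence and a monitored destination, and fall back to human triage otherwise.

```python
# Hypothetical guardrail: auto-route only on high confidence AND a destination
# a human demonstrably monitors; everything else falls back to manual triage.
MONITORED_QUEUES = {"network-team", "service-desk-l2"}  # reviewed weekly

def route(ticket_id: str, suggested_queue: str, confidence: float) -> str:
    if confidence >= 0.85 and suggested_queue in MONITORED_QUEUES:
        return f"auto-routed {ticket_id} to {suggested_queue}"
    # A misroute to an unmonitored queue costs more than a short manual
    # triage delay, so hand the ticket to a human instead.
    return f"queued {ticket_id} for manual triage (confidence {confidence:.2f})"

print(route("INC0012345", "network-team", 0.91))
print(route("INC0012346", "email-dl-legacy", 0.97))
```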

Insights require the least integration but the most cultural alignment. AI-generated trend reports, anomaly alerts, and capacity warnings only produce value if the recipient has authority and bandwidth to act on them. Most insight features get switched on, generate alerts that nobody actions, and quietly stop being read.

For legacy system integration specifically (older ITSM platforms, custom-built tools, on-premise systems), the practical question is whether the AI feature can be granted appropriate read/write access without exposing sensitive data. SaaS AI features often require data egress that legal or security teams will (correctly) block. The mitigation is usually one of: regional data residency commitments from the vendor, on-premise AI inference (rare but emerging), or scoping the AI feature to non-sensitive data only.

The sequence that works: (1) Start with suggestion-class features against a single team. (2) Measure adoption and acceptance over 60 days. (3) Expand to insights if managers will actually use them. (4) Introduce automations only after governance is demonstrably working. Teams that start with automation tend to roll back within 90 days.

5 Vendor vs. Custom

Do we choose a vendor-provided "AI-native" ITSM platform or build custom solutions?

Short answer: For 90% of organisations, vendor-provided is the right answer, but the question being asked is usually wrong. The real choice isn't between vendor AI and custom AI. It's between using vendor AI well and using vendor AI badly. Custom AI for ITSM rarely outperforms a properly prepared vendor solution.

The "AI-native platform" claim from ITSM vendors is overstated but increasingly meaningful. ServiceNow, Halo, Freshservice, Zendesk, and TOPdesk have shipped real AI features in 2025-2026 that meaningfully change the agent experience when the data underneath supports them. The features aren't differentiated enough to drive platform selection on their own, but they're capable enough that switching platforms purely to access AI rarely justifies the migration cost.

Three scenarios where custom AI development genuinely makes sense:

For everyone else (which is most readers of this page), the productive question is not vendor-vs-custom but how to extract maximum value from the vendor AI you've already paid for. The data preparation described in the previous questions is the highest-leverage activity available. Custom AI development is high-leverage in narrow circumstances and a costly distraction in most.

The trap to avoid: Deciding to build custom AI because vendor AI underperformed in a pilot, when the actual cause of underperformance was data quality. Custom AI fed the same data will underperform the same way, at higher cost, with longer time to value.

See where your service desk data sits.

Run our free assessment. Upload a CSV export from your ITSM tool (Halo, ServiceNow, Freshservice, Zendesk, TOPdesk), or any CSV. We score it across categorisation, resolution quality, completeness, and noise. The whole thing runs in your browser. Nothing leaves your machine.