MSc Data Science & AI — Capstone Project Proposal

Reliable AI-Assisted
Trade Channel Classification

A Three-Stage Hybrid Pipeline Designed Around the Known Limitations of Large Language Models

Kristina Hanxhara · 200052562 CSCK700 — University of Liverpool
Origin — where the idea came from

Replacing an outdated classification process in a B2B organisation

This idea stemmed from the need to replace an outdated classification process in a B2B commercial organisation with a better, more scalable alternative.

The Operational Context

The organisation maintains a large database of UK customer and prospect companies that must be sorted into trade channel categories (mobile phone specialists, tyre specialists, computer hardware resellers, stationers) to support sales, marketing, territory planning, and universe studies that estimate the market size of each channel.

The Broken Status Quo

  • Buying classifications from Experian / LDC is expensive and contractually restricted
  • Their channel definitions are incompatible with the organisation's own
  • Manual analyst research is accurate but slow and labour-intensive; results are needed faster and cheaper
Channel Classification → Market Universe → Market Share → Territory Planning → Marketing Strategy → Board Decisions

A wrong classification at step one flows into every decision built on top of it, making this task far more important than it first appears.

Historical Context — how this problem has been approached

Every approach so far is too costly, too rigid, or not scalable

Current approach — commercial data

Bought from Experian / LDC

Data has been bought at around £3,000 per channel definition from commercial providers. The classifications are:

  • Expensive and contractually locked
  • Based on SIC codes that do not align with the organisation's own channel definitions
  • Delivered as a static file with no flexibility to reclassify on demand
Expensive and misaligned
Current approach — manual research on top

Bought Data + Manual Reclassification

Because bought data does not match internal definitions, analysts manually research and reclassify companies on top of the purchased data. This means:

  • Accurate results but only for companies that get reviewed
  • Not scalable across thousands of records
  • Slow, labour-intensive, and a source of workload pressure
Accurate but not scalable
Why SIC codes fall short: companies filing under a single SIC code such as 47410 range from computer hardware specialists and software resellers to mobile phone retailers and IT managed services providers. These are commercially critical distinctions that are completely invisible to SIC-based approaches.
This is exactly why AI-assisted classification is needed.
Research Questions


Three questions guide the design, evaluation, and findings of this project.

1

Accuracy

Can a three-stage pipeline combining web retrieval, rule-based logic, and LLM synthesis classify UK businesses into trade channels at accuracy levels comparable to Experian and LDC, at a fraction of the cost?

2

Cost, Speed and Workload

What is the per-record cost and processing time, and how many analyst hours are displaced per 1,000 records, compared to manual research and commercial subscriptions?

3

LLM Contribution and Confidence

Does the LLM add measurable value over rules alone? How well-calibrated are its confidence scores, and what are the failure conditions of each pipeline stage that should trigger a flag for human review?

Research Hypothesis


What this project expects to find.

An application that keeps retrieval, rule-based classification, and LLM reasoning as three separate, focused stages is expected to match the accuracy of commercial providers such as Experian and LDC combined with human analyst review, at a much lower cost per record, while producing a clear explanation for every classification decision.

The Artefact — the three-stage pipeline

How can we use AI reliably for trade channel classification?

Each stage is constrained to the task it can perform dependably. The LLM is the reasoning layer.

Stage 1 — Retrieval

Company Data

Companies House API then Google Search API
  • Relevant UK-registered companies are pulled from Companies House, the official UK business registry, using SIC codes as filters
  • Using each company's name and SIC code as a starting point, the Google Search API retrieves live, current, public web information about what that company actually does
  • Every search call is logged with query, timestamp, and exact results returned
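The retrieval and audit-logging steps above could be sketched as follows. This is illustrative only: the endpoint path and `sic_codes` parameter follow the public Companies House advanced-search API but should be treated as assumptions, and `fetch_companies` / `log_search` are hypothetical names, not the project's actual code.

```python
import base64
import datetime
import json
import urllib.parse
import urllib.request

CH_BASE = "https://api.company-information.service.gov.uk"

def build_ch_url(sic_codes):
    """URL for the Companies House advanced company search, filtered by SIC code."""
    params = urllib.parse.urlencode(
        {"sic_codes": ",".join(sic_codes), "company_status": "active"}
    )
    return f"{CH_BASE}/advanced-search/companies?{params}"

def fetch_companies(api_key, sic_codes):
    """Pull matching UK-registered companies (Stage 1, step 1)."""
    req = urllib.request.Request(build_ch_url(sic_codes))
    # Companies House uses HTTP basic auth with the API key as the username.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("items", [])

def log_search(query, results):
    """Audit-log entry: query, timestamp, and the exact results returned."""
    return {
        "query": query,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "results": results,
    }
```

Keeping `log_search` as a separate, pure function makes the audit trail trivial to test and reuse for the Google Search API calls as well.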
Stage 2 — Rule Engine

Structured Rules

Python channel-specific logic

Applies a set of clear, channel-specific rules to the retrieved text. Produces a best-guess channel label and a confidence score. Organises the evidence into a clean, compact package so the LLM in Stage 3 receives focused, well-structured input rather than a large volume of raw text.
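A minimal sketch of what this channel-specific logic could look like. The channel names and keywords below are invented placeholders, not the organisation's real definitions, and the confidence score is a deliberately crude keyword ratio.

```python
# Placeholder rules: each channel maps to keywords expected in retrieved text.
CHANNEL_RULES = {
    "mobile_phone_specialist": ["mobile phone", "handset", "sim"],
    "computer_hardware_reseller": ["hardware", "pc", "server", "reseller"],
    "stationer": ["stationery", "office supplies"],
}

def classify_with_rules(text):
    """Apply channel-specific keyword rules to retrieved text.

    Returns a best-guess label, a crude confidence score, and a compact
    evidence package for the Stage 3 LLM."""
    text_lower = text.lower()
    hits = {
        channel: [kw for kw in keywords if kw in text_lower]
        for channel, keywords in CHANNEL_RULES.items()
    }
    best = max(hits, key=lambda c: len(hits[c]))
    matched = len(hits[best])
    total = len(CHANNEL_RULES[best])
    return {
        "label": best if matched else "unclassified",
        "confidence": matched / total,
        "evidence": {c: kws for c, kws in hits.items() if kws},
    }
```

The returned dictionary is the "clean, compact package" idea in miniature: only the matched keywords, not the raw scraped text, travel on to Stage 3.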

Stage 3 — LLM Synthesis

Grounded Reasoning

Claude API

Receives the full structured evidence package from Stages 1 and 2 combined. Reasons over all retrieved signals and the rule engine output together to confirm or adjust the final classification, producing a calibrated confidence score alongside the decision.
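The synthesis step could be sketched as below, assuming the Anthropic Python SDK's `messages.create` call. The prompt wording, model name, and JSON response format are all assumptions for illustration; the real channel definitions would replace the placeholders.

```python
import json

def build_synthesis_prompt(evidence_package, channel_definitions):
    """In-context prompt: the channel rules are written directly into the
    prompt, so the model needs no retraining to learn the organisation's
    definitions."""
    return (
        "Classify the company into exactly one trade channel.\n"
        f"Channel definitions:\n{json.dumps(channel_definitions, indent=2)}\n"
        "Retrieved evidence and rule-engine output:\n"
        f"{json.dumps(evidence_package, indent=2)}\n"
        'Respond as JSON: {"channel": ..., "confidence": 0-1, '
        '"reasoning": ...}.'
    )

def synthesise(client, evidence_package, channel_definitions):
    """Send the structured evidence to Claude and parse its decision."""
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # model name is an assumption
        max_tokens=500,
        messages=[{"role": "user",
                   "content": build_synthesis_prompt(evidence_package,
                                                     channel_definitions)}],
    )
    return json.loads(msg.content[0].text)
```

Because the LLM only ever sees the compact evidence package, every input it reasoned over is already in the audit log.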

Audit logs prove exactly what was checked, minimising the hallucinations that LLMs are known to produce when asked to retrieve and reason at the same time.
Literature Review — key topics the design is grounded in

Key areas of literature this project draws from

LLM Hallucination

How and why large language models generate confident but factually incorrect outputs, and the evidence that this is a structural property of how they work, not an occasional occurrence. This is the core reason retrieval and reasoning must be kept separate.

Long-Context Degradation

Research showing that LLM performance drops when key information is buried inside a long input, even in models built for long contexts. This is why Stage 2 condenses the evidence before passing it to the LLM.

Retrieval-Augmented Generation (RAG)

The principle that an AI system performs better when it is given retrieved facts to reason over, rather than being asked to retrieve and reason at the same time. This is the architectural foundation of the whole pipeline.

Combining Rules with AI Models

Evidence that using a set of fixed rules alongside an AI model works better than either on its own, especially for specific domains. Clear-cut cases are handled by the rules, and the LLM steps in for the ones that are harder to categorise.

Teaching the AI Without Retraining It

LLMs can follow a completely new set of classification rules when those rules are simply written into the prompt, with no retraining or extra data needed. This is how Stage 3 learns the organisation's specific channel definitions on the spot, without any prior preparation.

AI, Work Design and Analyst Workload

Research on when automating tasks genuinely helps knowledge workers versus simply adding new pressures. Automating high-volume, repetitive lookup work frees analysts for higher-value tasks.

Evaluation — how the artefact will be tested

Testing the pipeline against real-world conditions

The pipeline is evaluated on 200 to 300 labelled UK companies, compared across three conditions and measured across seven dimensions.

What we compare

A

Full Three-Stage Pipeline

The complete system, tested end-to-end to see whether the full design works

B

Rule Engine Only

How accurate are the rules on their own, without the LLM?

C

Manual Classification

How much does it cost in money and time for an analyst doing it manually?

Classification Accuracy — what percentage of labels are correct?

Accuracy Score — precision, recall, and F1 per channel

Cost Per Record — API costs plus analyst time vs. Experian and LDC

Processing Speed — time to classify one record end-to-end

Analyst Hours Freed — per 1,000 records classified

Statistical Significance — is the accuracy gain real?

Failure Analysis — where and why did each stage go wrong?
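The per-channel accuracy score (precision, recall, and F1) listed above can be computed in plain Python. The sketch below is illustrative, not the project's evaluation code, and the channel labels in the usage example are placeholders.

```python
from collections import Counter

def per_channel_prf(y_true, y_pred):
    """Precision, recall, and F1 per channel, plus overall accuracy."""
    channels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1          # correct label for this channel
        else:
            fp[pred] += 1          # predicted channel got a wrong hit
            fn[true] += 1          # true channel was missed
    scores = {}
    for c in channels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = {"precision": prec, "recall": rec, "f1": f1}
    accuracy = sum(tp.values()) / len(y_true)
    return scores, accuracy
```

Per-channel scores matter here because a pipeline can look accurate overall while silently failing on one commercially important channel.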

In summary

This project focuses on designing a system that uses AI reliably by grounding it in accurate, task-specific data, rather than relying on the general-purpose, do-everything systems we are used to seeing.

Kristina Hanxhara · 200052562

MSc Data Science and Artificial Intelligence · University of Liverpool

CSCK700 — Computer Science Capstone Project
