Support Audit overview

Bookbag's free Support Audit scores your existing customer-support transcripts for hallucinations, policy violations, answer quality, and resolution rate — no account required. Find out where your current support is leaking trust.

The Support Audit is a free tool that grades your existing customer-support conversations. You paste in real transcripts from your current chatbot, your help desk, or a competitor's bot, and Bookbag returns a scored report: the share of agent replies that contain unverifiable claims, the number of replies that break policy, an overall quality score, and an estimated resolution rate.

Free and no signup

Run an audit at app.bookbag.ai/audit without creating an account. Paste your transcripts, get a scored report, and share it via a link.

Who it's for

The audit is built for teams that already run support, whether with a chatbot, a help desk, or both, and want a fast, consistent read on quality before switching tools. If your bot invents order numbers or promises refunds it shouldn't, the audit measures how often that happens and recommends prioritized fixes.

Evaluating a switch from Chatbase, Intercom, or another tool: audit each tool's transcripts side by side.
Auditing your own bot to find where it invents facts or breaks policy.
Setting a baseline before deploying a Bookbag agent, so you can measure the improvement.

What it measures

The audit scores the agent's replies (not the customer's messages) across four dimensions:

Metric	What it means
Hallucination rate	The percentage of agent replies that contain unverifiable or likely-invented claims — fabricated order numbers, prices, or policies the customer never provided.
Policy violations	A count of risky replies: unauthorized refunds or guarantees, PII exposure, legal/medical advice, rude tone, or over-promised delivery dates.
Quality score	A composite 0–100 score for overall answer quality, penalized by hallucination and policy-violation density.
Resolution rate	The estimated share of conversations that reached a clear resolution rather than being left unresolved or deflected.

The report also includes the total number of agent turns and of conversations analyzed, plus specific findings (with excerpts) and a prioritized list of fixes.
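To make the relationship between the four metrics concrete, here is a minimal sketch of how per-reply findings could roll up into report-level numbers. Bookbag does not document its exact composite formula, so the `AgentReply`/`Conversation` types, the `summarize` function, and the penalty weights below are all hypothetical. The sketch also starts from a flat 100 rather than modeling the underlying answer-quality assessment; it only illustrates "penalized by hallucination and policy-violation density."

```python
from dataclasses import dataclass, field


@dataclass
class AgentReply:
    text: str
    hallucinated: bool = False  # judge flagged an unverifiable or invented claim
    violations: list[str] = field(default_factory=list)  # e.g. ["refund", "pii"]


@dataclass
class Conversation:
    replies: list[AgentReply]  # agent turns only; customer messages are not scored
    resolved: bool  # judge decided the conversation reached a clear resolution


def summarize(conversations: list[Conversation],
              halluc_weight: float = 60.0,      # hypothetical weight
              violation_weight: float = 40.0) -> dict:  # hypothetical weight
    """Aggregate per-reply findings into report-level metrics (illustrative only)."""
    turns = [r for c in conversations for r in c.replies]
    if not turns:
        raise ValueError("no agent replies to score")

    hallucination_rate = sum(r.hallucinated for r in turns) / len(turns)
    violation_count = sum(len(r.violations) for r in turns)
    # Violations per agent turn, capped at 1.0 so the violation penalty
    # never exceeds violation_weight.
    violation_density = min(violation_count / len(turns), 1.0)

    # Start from 100, subtract density-based penalties, then clamp to 0-100.
    quality = 100.0 - halluc_weight * hallucination_rate - violation_weight * violation_density
    quality = max(0.0, min(100.0, quality))

    resolved = sum(c.resolved for c in conversations) / len(conversations)
    return {
        "conversations": len(conversations),
        "agent_turns": len(turns),
        "hallucination_rate_pct": round(100 * hallucination_rate, 1),
        "policy_violations": violation_count,
        "quality_score": round(quality, 1),
        "resolution_rate_pct": round(100 * resolved, 1),
    }
```

Normalizing by agent turns rather than by conversations keeps the rates comparable between a transcript set of short chats and one of long, multi-turn threads.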

How scoring works

You choose the judge engine that scores your transcripts:

Engine	How it scores	When to use it
Heuristic	A deterministic, multi-signal scorer using pattern rules (invented order numbers, refund/promise/PII/tone regexes). Fast and runs with no model key.	A quick baseline, or when you want fully reproducible scores.
LLM judge	A language model reviews each transcript as a strict QA auditor and returns structured findings. The report labels its engine as `llm:<model>`.	A more nuanced read that catches subtler hallucinations and policy issues.
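The heuristic engine is described only as deterministic pattern rules. As a rough illustration, a minimal sketch of that style of scorer might look like the following. The regexes, rule names, and the `score_reply` function are hypothetical examples, not Bookbag's actual rules; the invented-order-number check assumes "invented" means the agent cites an ID that never appears in the customer's messages.

```python
import re

# Hypothetical rules for illustration; Bookbag's actual patterns are not published here.
ORDER_NUMBER = re.compile(r"\b[A-Z]{0,3}\d{5,}\b")
POLICY_RULES = {
    "refund": re.compile(r"\b(full refund|refund you|money back)\b", re.IGNORECASE),
    "promise": re.compile(r"\b(guaranteed?|promised?|will definitely)\b", re.IGNORECASE),
    "pii": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-shaped number
    "tone": re.compile(r"\b(calm down|not my problem)\b", re.IGNORECASE),
}


def score_reply(reply: str, customer_text: str) -> dict:
    """Flag one agent reply; identical input always yields identical output."""
    # An order number the agent cites that the customer never supplied counts as invented.
    invented = [m for m in ORDER_NUMBER.findall(reply) if m not in customer_text]
    # Record every policy rule whose pattern appears anywhere in the reply.
    violations = [name for name, rx in POLICY_RULES.items() if rx.search(reply)]
    return {
        "invented_order_numbers": invented,
        "violations": violations,
        "hallucinated": bool(invented),
    }
```

Because the rules are plain regexes with no model call, reruns produce identical flags, which is the reproducibility the table describes. The trade-off is that paraphrased promises or fabrications that don't match a pattern slip through, which is the gap the LLM judge targets.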

The heuristic engine is the default and is always available. The LLM judge is optional; to use it, select a model from the engine dropdown in the dashboard audit.

The report

Every audit produces a shareable, branded report. It carries a report token and a public URL, so you can send the results to a teammate or stakeholder without them logging in. See Running an audit for how to read each section.

From audit to fix

The audit is diagnostic. The fixes it recommends are what a grounded Bookbag agent is built to deliver: retrieval over your real data to curb hallucinations, runtime guardrails to stop policy violations, and next-step actions plus human handoff to raise resolution rates. See Response quality for how grounding produces trustworthy answers.

What's next

Running an audit

Step-by-step: prepare transcripts, run the audit, and read the report.

Response quality

How grounding, citations, and retrieval produce trustworthy answers.

Best practices

Techniques for consistently accurate, on-brand agents.