Support Audit overview
Bookbag's free Support Audit scores your existing customer-support transcripts for hallucinations, policy violations, answer quality, and resolution rate — no account required. Find out where your current support is leaking trust.
The Support Audit is a free tool that grades your existing customer-support conversations. You paste in real transcripts (from your current chatbot, your help desk, or a competitor's bot) and Bookbag returns a scored report: how often answers are unverifiable, how many replies break policy, an overall quality score, and an estimated resolution rate.
Run an audit at app.bookbag.ai/audit without creating an account. Paste your transcripts, get a scored report, and share it via a link.
Who it's for
The audit is built for teams that already run support — with a chatbot, a help desk, or both — and want a fast, objective read on quality before switching tools. If you have a bot that hallucinates order numbers or promises refunds it shouldn't, the audit quantifies the damage and tells you how to fix it.
- Evaluating a switch from Chatbase, Intercom, or another tool: audit transcripts from each tool side by side.
- Auditing your own bot to find where it invents facts or breaks policy.
- Setting a baseline before deploying a Bookbag agent, so you can measure the improvement.
What it measures
The audit scores the agent's replies (not the customer's messages) across four dimensions:
| Metric | What it means |
|---|---|
| Hallucination rate | The percentage of agent replies that contain unverifiable or likely-invented claims — fabricated order numbers, prices, or policies the customer never provided. |
| Policy violations | A count of risky replies: unauthorized refunds or guarantees, PII exposure, legal/medical advice, rude tone, or over-promised delivery dates. |
| Quality score | A composite 0–100 score for overall answer quality, penalized by hallucination and policy-violation density. |
| Resolution rate | The estimated share of conversations that reached a clear resolution rather than being left unresolved or deflected. |
The report also includes the total agent turns and the number of conversations analyzed, plus specific findings (with excerpts) and a prioritized list of fixes.
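As a rough illustration of how these headline numbers relate to one another, the sketch below aggregates per-reply flags into the four metrics. The `AgentTurn` fields, the `summarize` function, and especially the penalty weights in the quality score are assumptions made for this example; this page does not publish the actual formula.

```python
from dataclasses import dataclass


@dataclass
class AgentTurn:
    """One agent reply, already labeled by some judge (hypothetical shape)."""
    conversation_id: str
    hallucinated: bool   # reply contains an unverifiable or invented claim
    violations: int      # number of policy violations found in this reply


def summarize(turns, resolved_ids):
    """Aggregate per-turn flags into report-style metrics.

    `resolved_ids` is the set of conversation IDs judged to have reached
    a clear resolution.
    """
    if not turns:
        raise ValueError("no agent turns to score")

    total_turns = len(turns)
    conversations = {t.conversation_id for t in turns}

    # Percentage of agent replies flagged as hallucinated.
    hallucination_rate = 100.0 * sum(t.hallucinated for t in turns) / total_turns
    # Policy violations are reported as a count, not a rate.
    policy_violations = sum(t.violations for t in turns)
    # Share of conversations that reached a clear resolution.
    resolved = conversations & set(resolved_ids)
    resolution_rate = 100.0 * len(resolved) / len(conversations)

    # ASSUMPTION: illustrative weights only. The page says the score is
    # penalized by hallucination and violation density, not by how much.
    violation_density = policy_violations / total_turns
    quality_score = max(0.0, 100.0 - 0.5 * hallucination_rate - 25.0 * violation_density)

    return {
        "agent_turns": total_turns,
        "conversations": len(conversations),
        "hallucination_rate": hallucination_rate,
        "policy_violations": policy_violations,
        "quality_score": quality_score,
        "resolution_rate": resolution_rate,
    }
```

The one design point worth noting is that the rates use different denominators: hallucination rate is computed per agent reply, while resolution rate is computed per conversation.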
How scoring works
You choose the judge engine that scores your transcripts:
| Engine | How it scores | When to use it |
|---|---|---|
| Heuristic | A deterministic, multi-signal scorer using pattern rules (invented order numbers, refund/promise/PII/tone regexes). Fast and runs with no model key. | A quick baseline, or when you want fully reproducible scores. |
| LLM judge | A language model reviews each transcript as a strict QA auditor and returns structured findings. In the report, the engine is listed as `llm:<model>`. | A more nuanced read that catches subtler hallucinations and policy issues. |
The heuristic engine is the default and is always available. The LLM judge is optional: pick the model from the engine dropdown in the dashboard audit.
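To make the idea of a deterministic, pattern-based scorer concrete, here is a minimal sketch of what such rules could look like. The regexes, the `score_reply` function, and the finding labels are all hypothetical; Bookbag's actual rule set is not shown on this page and is likely broader.

```python
import re

# ASSUMPTION: illustrative patterns only, not Bookbag's real rules.
ORDER_NUMBER = re.compile(r"\border\s*(?:number\s*)?#?\s*(\d{4,})", re.IGNORECASE)
REFUND_PROMISE = re.compile(
    r"\b(?:issued|processed|approved)\s+(?:a|your|the)\s+(?:full\s+)?refund\b",
    re.IGNORECASE,
)
GUARANTEE = re.compile(r"\bguarantee[ds]?\b", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def score_reply(reply, customer_text):
    """Return a list of (category, detail) findings for one agent reply.

    `customer_text` is everything the customer has said so far; an order
    number in the reply counts as invented if the customer never gave it.
    """
    findings = []

    # Numbers the customer actually mentioned.
    known_numbers = set(re.findall(r"\d{4,}", customer_text))
    for number in ORDER_NUMBER.findall(reply):
        if number not in known_numbers:
            findings.append(
                ("hallucination", f"order number {number} not provided by customer")
            )

    # Crude signal for unauthorized refunds or guarantees.
    if REFUND_PROMISE.search(reply) or GUARANTEE.search(reply):
        findings.append(("policy_violation", "possible unauthorized refund or guarantee"))

    # Crude signal for PII exposure: any email address in the reply.
    if EMAIL.search(reply):
        findings.append(("policy_violation", "possible PII exposure (email address)"))

    return findings
```

Because the function depends only on its inputs and fixed patterns, the same transcript always yields the same findings, which is the reproducibility property the heuristic engine is described as offering. The trade-off is that rules like these miss anything phrased in a way the patterns did not anticipate.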
The report
Every audit produces a shareable, branded report. It carries a report token and a public URL, so you can send the results to a teammate or stakeholder without them logging in. See Running an audit for how to read each section.
From audit to fix
The audit is diagnostic — the fixes it recommends are exactly what a grounded Bookbag agent is built to deliver: retrieval over your real data to kill hallucinations, runtime guardrails to stop policy violations, and next-step actions plus human handoff to lift resolution. See Response quality for how grounding produces trustworthy answers.