Running an audit
A step-by-step guide to running a Support Audit: prepare your transcripts, choose a judge engine, run the audit, and read the scored report and recommended fixes.
This guide walks through running a Support Audit end to end, from gathering transcripts to acting on the report. You can run it for free at app.bookbag.ai/audit with no account, or from the Support Audit section of your dashboard if you want to keep a history.
Step 1 — Prepare your transcripts
The audit reads each support conversation as a sequence of customer and agent turns. The parser is tolerant: it accepts CSV, JSON, or pasted plain-text transcripts, so you can usually export from your current support tool and paste with minimal cleanup.
Paste a transcript with each turn labeled by role. Separate conversations with a blank line.
Customer: Where is my order #4471?
Agent: Your order shipped yesterday and arrives Thursday.
Customer: Can I get a refund if it's late?
Agent: Absolutely, I'll refund you in full no matter what.
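If your support tool exports one row per turn, a short script can rebuild the labeled plain-text format shown above. This is a minimal sketch assuming a hypothetical CSV layout with `conversation_id`, `role`, and `text` columns; rename the keys to match your real export. `to_paste_format` is an illustrative name, not part of Bookbag.

```python
import csv
import io
from itertools import groupby

# Hypothetical export layout: one row per turn with conversation_id, role, and text columns.
# Rename the keys below to match what your support tool actually exports.
EXPORT = """conversation_id,role,text
1001,customer,Where is my order #4471?
1001,agent,Your order shipped yesterday and arrives Thursday.
1002,customer,Do you ship to Canada?
1002,agent,"Yes, we ship to Canada."
"""

ROLE_LABELS = {"customer": "Customer", "agent": "Agent"}


def to_paste_format(csv_text: str) -> str:
    """Rebuild role-labeled turns, with a blank line between conversations."""
    rows = csv.DictReader(io.StringIO(csv_text))
    blocks = []
    # groupby only merges consecutive rows, so the export must be sorted by conversation.
    for _, turns in groupby(rows, key=lambda row: row["conversation_id"]):
        lines = [f"{ROLE_LABELS[t['role'].strip().lower()]}: {t['text'].strip()}" for t in turns]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


print(to_paste_format(EXPORT))
```

Paste the printed output straight into the audit. The `csv` module handles quoted fields, so a reply containing a comma (like the second agent turn) survives intact.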
Include a mix of routine and tricky conversations, such as refunds, shipping delays, and edge cases. The audit is most useful when the transcripts reflect the questions that give your support the most trouble.
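One way to assemble that mix from a large export is to bucket conversations by topic keyword and sample a few from each bucket. The sketch below assumes hypothetical keyword lists and function names (`bucket_of`, `pick_mix`); tune the keywords to your own hardest tickets.

```python
import random

# Hypothetical topic buckets; swap in the keywords behind your own hardest tickets.
# Plain substring matching is crude ("late" also matches "translate"), which is fine for a rough sample.
BUCKETS = {
    "refund": ("refund", "money back", "chargeback"),
    "shipping": ("shipping", "shipped", "delay", "late", "tracking"),
}


def bucket_of(conversation: str) -> str:
    """Return the first bucket whose keywords appear in the conversation, else 'routine'."""
    text = conversation.lower()
    for name, keywords in BUCKETS.items():
        if any(keyword in text for keyword in keywords):
            return name
    return "routine"


def pick_mix(conversations: list[str], per_bucket: int, seed: int = 0) -> list[str]:
    """Sample up to `per_bucket` conversations from every bucket, including 'routine'."""
    rng = random.Random(seed)  # fixed seed: the same input always yields the same sample
    grouped: dict[str, list[str]] = {}
    for conversation in conversations:
        grouped.setdefault(bucket_of(conversation), []).append(conversation)
    sample = []
    for bucket in grouped.values():
        sample.extend(rng.sample(bucket, min(per_bucket, len(bucket))))
    return sample
```

Keeping a "routine" bucket matters: a sample of only hard cases would make the scores look worse than your day-to-day traffic.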
Step 2 — Choose a judge engine
Pick how the transcripts are scored:
- Heuristic (default) — a deterministic, rule-based scorer. Fast, reproducible, no model required.
- LLM judge — a language model evaluates each transcript as a strict QA auditor. Pick the model in the engine dropdown for a more nuanced read.
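To make "deterministic, rule-based" concrete, here is a toy scorer in that style. The rules are invented for illustration; this guide does not describe Bookbag's actual heuristics. The point is the property: with no model and no randomness, the same reply always produces the same flags, which is what makes heuristic results reproducible.

```python
import re

# Hypothetical rules for illustration only; this guide does not document Bookbag's actual heuristics.
RULES = [
    ("unconditional_promise", re.compile(r"\b(no matter what|guarantee[ds]?)\b", re.IGNORECASE)),
    ("refund_commitment", re.compile(r"\brefund you\b", re.IGNORECASE)),
]


def score_reply(reply: str) -> list[str]:
    """Return the name of every rule the agent reply trips; no model, no randomness."""
    return [name for name, pattern in RULES if pattern.search(reply)]


print(score_reply("Absolutely, I'll refund you in full no matter what."))
# ['unconditional_promise', 'refund_commitment']
print(score_reply("Your order shipped yesterday and arrives Thursday."))
# []
```

Pattern rules like these are fast but literal: they cannot tell whether "arrives Thursday" is actually true, which is the kind of judgment the LLM judge is there for.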
Step 3 — Run the audit
1. Paste your transcripts. Drop in your CSV, JSON, or plain-text conversations.
2. Add your details. On the free public audit at app.bookbag.ai/audit, enter your email and company so we can send you the report link.
3. Pick the engine. Keep the default heuristic engine, or select an LLM judge model.
4. Run. Bookbag parses the conversations, scores each agent reply, and generates the report.
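The Run step can be pictured as a small pipeline: split the pasted text into conversations and turns, then score every agent reply. This is a hypothetical sketch (`parse`, `run_audit`, and the finding fields are made-up names, not Bookbag's code). The scorer is passed in as a function, so a rule-based or model-based judge could plug in, mirroring the engine choice in Step 2.

```python
import re
from typing import Callable

# A role label at a word boundary, whether turns share one line or sit on separate lines.
# Simplification: only "Customer" and "Agent" are recognized, and a label quoted inside a
# message would be misread as a new turn.
TURN = re.compile(r"\b(Customer|Agent):\s*")


def parse(text: str) -> list[list[tuple[str, str]]]:
    """Split pasted text into conversations (blank-line separated) of (role, message) turns."""
    conversations = []
    for block in re.split(r"\n\s*\n", text.strip()):
        parts = TURN.split(block)  # ['', 'Customer', 'msg', 'Agent', 'msg', ...]
        turns = [(role.lower(), msg.strip()) for role, msg in zip(parts[1::2], parts[2::2])]
        if turns:
            conversations.append(turns)
    return conversations


def run_audit(text: str, score_reply: Callable[[str], list[str]]) -> list[dict]:
    """Score every agent reply and emit one finding per issue the scorer reports."""
    findings = []
    for index, turns in enumerate(parse(text)):
        for role, message in turns:
            if role == "agent":
                for issue in score_reply(message):
                    findings.append({"conversation": index, "excerpt": message, "type": issue})
    return findings


SAMPLE = """Customer: Where is my order #4471? Agent: Your order shipped yesterday and arrives Thursday. Customer: Can I get a refund if it's late? Agent: Absolutely, I'll refund you in full no matter what.

Customer: Do you ship to Canada?
Agent: Yes, we ship to Canada."""


def stub(reply: str) -> list[str]:
    # Stand-in scorer; plug in a real rule-based or model-based judge here.
    return ["unconditional_promise"] if "no matter what" in reply else []


print(run_audit(SAMPLE, stub))
```

Only agent turns are scored; customer turns are kept because a judge needs them as context for what the agent was answering.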
Step 4 — Read the report
The report opens with the four headline metrics, then the supporting detail:
| Section | How to read it |
|---|---|
| Headline metrics | Hallucination rate, policy-violation count, quality score, and resolution rate — your at-a-glance health check. |
| Scope | Total agent turns and number of conversations analyzed, so you know the sample size behind the scores. |
| Findings | Specific flagged replies with an excerpt, a severity, and the type (hallucination or a named policy issue), plus a recommendation for each. |
| Recommended fixes | A prioritized, plain-language list of what to change first to lift the scores. |
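The guide does not spell out how the headline metrics are computed, so the sketch below is an assumption, not Bookbag's formula: it treats hallucination rate as the share of agent turns with at least one hallucination finding, and counts every other finding as one policy violation. Quality score and resolution rate are left out because their definitions aren't given here. `headline_metrics` is a hypothetical name.

```python
def headline_metrics(scored_replies: list[list[str]], conversations: int) -> dict:
    """Aggregate per-reply findings into two headline metrics plus the report's scope.

    scored_replies has one entry per agent turn: the finding types flagged on that reply,
    e.g. [] for a clean reply or ["hallucination", "refund_policy"].
    """
    agent_turns = len(scored_replies)
    # Assumption: a reply counts once toward the rate no matter how many hallucinations it has.
    hallucinated = sum(1 for types in scored_replies if "hallucination" in types)
    # Assumption: every non-hallucination finding is one policy violation.
    violations = sum(1 for types in scored_replies for t in types if t != "hallucination")
    return {
        "hallucination_rate": hallucinated / agent_turns if agent_turns else 0.0,
        "policy_violation_count": violations,
        "scope": {"conversations": conversations, "agent_turns": agent_turns},
    }


print(headline_metrics([[], ["hallucination"], ["refund_policy"], ["hallucination", "refund_policy"]], 2))
# {'hallucination_rate': 0.5, 'policy_violation_count': 2, 'scope': {'conversations': 2, 'agent_turns': 4}}
```

The example also shows why the Scope row matters: a 50% hallucination rate from four agent turns says far less than the same rate from four hundred.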
If 5% or more of replies contain unverifiable claims, the most impactful fix is grounding answers in your real order and catalog data with retrieval — see Response quality.
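As a worked check of that threshold: 6 unverifiable replies out of 120 is exactly 5% and should trigger the grounding fix, while 5 out of 120 (about 4.2%) should not. The function name below is hypothetical; only the 5% figure comes from this guide.

```python
def needs_grounding(unverifiable_replies: int, total_replies: int) -> bool:
    """True when unverifiable claims reach 5% of replies, the threshold stated in this guide."""
    # x / n >= 5 / 100 is cross-multiplied into 100 * x >= 5 * n so the comparison
    # stays in exact integer arithmetic.
    return total_replies > 0 and 100 * unverifiable_replies >= 5 * total_replies


print(needs_grounding(6, 120))  # True: exactly 5%
print(needs_grounding(5, 120))  # False: about 4.2%
```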
Step 5 — Share and act
Each report has a public URL backed by a report token, so you can share it with teammates or stakeholders without a login. Use the recommended fixes as a roadmap — most map directly to Bookbag features:
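Sharing without a login generally works through the capability-URL pattern: the URL embeds a long random token, and holding the URL is what grants access. The sketch below illustrates that general pattern with Python's `secrets` module; it is not Bookbag's implementation, and the base URL is a placeholder. The practical takeaway holds either way: anyone with a report link can open the report, so share it only with people who should see it.

```python
import secrets

# Hypothetical base URL for illustration; use the link your report page actually shows.
BASE_URL = "https://reports.example.com/r/"


def new_share_link() -> tuple[str, str]:
    """Mint an unguessable token and the URL that embeds it."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, base64url-encoded (43 characters)
    return token, BASE_URL + token


token, url = new_share_link()
print(url)
```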
- Hallucinations → ground the agent in your data with data sources and retrieval.
- Policy violations → add runtime guardrails and route risky cases to a human with Escalate to a human.
- Low resolution → add next-step actions like order tracking and returns, and hand off the rest.
- Low quality → tighten the system prompt and pin high-stakes answers as Q&A — see Best practices.
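The mapping above can be read as a simple rule table: each metric has a trigger, and each trigger points at a fix. This is a hypothetical sketch; only the 5% hallucination threshold comes from this guide, while the other cut-offs, the 0 to 1 scale for quality and resolution, and the priority order are placeholders to adjust.

```python
# Only the 5% hallucination threshold comes from this guide; the other cut-offs, the 0-1
# scale for quality and resolution, and the ordering are hypothetical placeholders.
FIX_RULES = [
    ("hallucination_rate", lambda v: v >= 0.05, "Ground the agent in your data with data sources and retrieval"),
    ("policy_violation_count", lambda v: v > 0, "Add runtime guardrails and escalate risky cases to a human"),
    ("resolution_rate", lambda v: v < 0.70, "Add next-step actions such as order tracking and returns"),
    ("quality_score", lambda v: v < 0.80, "Tighten the system prompt and pin high-stakes answers as Q&A"),
]


def recommend(metrics: dict) -> list[str]:
    """Return the fixes whose trigger fires, in FIX_RULES order (earlier = higher priority)."""
    return [fix for key, triggered, fix in FIX_RULES if triggered(metrics[key])]


print(recommend({"hallucination_rate": 0.08, "policy_violation_count": 0,
                 "resolution_rate": 0.62, "quality_score": 0.85}))
# ['Ground the agent in your data with data sources and retrieval', 'Add next-step actions such as order tracking and returns']
```

Putting grounding first follows this guide's note that it is the most impactful fix once unverifiable claims pass 5%; the rest of the order is a judgment call.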