Running an audit

A step-by-step guide to running a Support Audit: prepare your transcripts, choose a judge engine, run the audit, and read the scored report and recommended fixes.


This guide walks through running a Support Audit end to end — from gathering transcripts to acting on the report. You can run it for free at app.bookbag.ai/audit with no account, or from the Support Audit section of your dashboard if you want to keep a history.

Step 1 — Prepare your transcripts

The audit reads support conversations as agent / customer turns. The parser is tolerant — it accepts CSV, JSON, or pasted plain-text transcripts — so you can usually export from your current tool and paste with minimal cleanup.

Paste a transcript with each turn labeled by role. Separate conversations with a blank line.

Customer: Where is my order #4471?
Agent: Your order shipped yesterday and arrives Thursday.
Customer: Can I get a refund if it's late?
Agent: Absolutely, I'll refund you in full no matter what.
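
To make the expected shape concrete, here is a minimal sketch of how role-labeled text could be split into conversations and turns. It is not Bookbag's parser: the `parse_transcripts` function, the `TURN` pattern, and the two role labels are assumptions chosen to match the example above, and the sample splits that example into two conversations to show the blank-line separator.

```python
import re

# Assumed role labels, matching the example above; real exports may use others.
TURN = re.compile(r"^(Customer|Agent):\s*(.*)$")

def parse_transcripts(text):
    """Split pasted text into conversations, each a list of (role, message) turns."""
    conversations = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current conversation.
            if current:
                conversations.append(current)
                current = []
            continue
        match = TURN.match(line)
        if match:
            current.append((match.group(1).lower(), match.group(2)))
        elif current:
            # Unlabeled line: treat it as a continuation of the previous turn.
            role, message = current[-1]
            current[-1] = (role, message + " " + line)
    if current:
        conversations.append(current)
    return conversations

sample = """Customer: Where is my order #4471?
Agent: Your order shipped yesterday and arrives Thursday.

Customer: Can I get a refund if it's late?
Agent: Absolutely, I'll refund you in full no matter what."""

conversations = parse_transcripts(sample)
print(len(conversations))  # 2
```

Running a check like this on your own export before pasting is a quick way to spot turns that are missing a role label.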

Use a representative sample

Include a mix of routine and tricky conversations — refunds, shipping delays, edge cases. The audit is most useful when the transcripts reflect the questions that actually challenge your support.

Step 2 — Choose a judge engine

Pick how the transcripts are scored:

Heuristic (default) — a deterministic, rule-based scorer. Fast, reproducible, no model required.
LLM judge — a language model evaluates each transcript as a strict QA auditor. Pick the model in the engine dropdown for a more nuanced read.
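
To illustrate what "deterministic, rule-based" means in practice, the sketch below flags agent replies with a fixed list of patterns, so the same input always produces the same flags. The rule names and patterns in `RULES` are hypothetical and chosen only for illustration; the rules Bookbag's heuristic engine actually applies are not documented here.

```python
import re

# Hypothetical rules for illustration only; not Bookbag's actual heuristics.
RULES = [
    ("unconditional promise", re.compile(r"\bno matter what\b|\bguarantee[ds]?\b", re.IGNORECASE)),
    ("unqualified refund", re.compile(r"\brefund you in full\b", re.IGNORECASE)),
]

def score_reply(reply):
    """Return the names of every rule the agent reply triggers, in rule order."""
    return [name for name, pattern in RULES if pattern.search(reply)]

flags = score_reply("Absolutely, I'll refund you in full no matter what.")
clean = score_reply("Your order shipped yesterday and arrives Thursday.")
print(flags)  # ['unconditional promise', 'unqualified refund']
print(clean)  # []
```

A scorer of this kind is fast and reproducible but only catches what its rules anticipate, which is the trade-off against an LLM judge.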

Step 3 — Run the audit

1. Paste your transcripts. Drop in your CSV, JSON, or plain-text conversations.
2. Add your details. On the public funnel, enter your email and company so we can send you the report link.
3. Pick the engine. Keep the default heuristic engine, or select an LLM judge model.
4. Run. Bookbag parses the conversations, scores each agent reply, and generates the report.

Step 4 — Read the report

The report opens with the four headline metrics, then the supporting detail:

Section	How to read it
Headline metrics	Hallucination rate, policy-violation count, quality score, and resolution rate — your at-a-glance health check.
Scope	Total agent turns and number of conversations analyzed, so you know the sample size behind the scores.
Findings	Specific flagged replies with an excerpt, a severity, and the type (hallucination or a named policy issue), plus a recommendation for each.
Recommended fixes	A prioritized, plain-language list of what to change first to lift the scores.

High hallucination rate?

If 5% or more of replies contain unverifiable claims, the most impactful fix is grounding answers in your real order and catalog data with retrieval — see Response quality.
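
As a rough worked example of the 5% check, the sketch below assumes the hallucination rate is the number of flagged agent replies divided by the total agent turns shown under Scope. That formula is an assumption, since the report's exact calculation is not stated here, and the counts used are made up for illustration.

```python
HALLUCINATION_THRESHOLD = 0.05  # the 5% mark mentioned above

def hallucination_rate(flagged_replies, total_agent_turns):
    """Share of agent replies flagged as containing unverifiable claims (assumed formula)."""
    if total_agent_turns == 0:
        return 0.0
    return flagged_replies / total_agent_turns

# Illustrative counts only, not real audit results.
rate = hallucination_rate(flagged_replies=6, total_agent_turns=80)
needs_grounding = rate >= HALLUCINATION_THRESHOLD
print(f"{rate:.1%}", needs_grounding)  # 7.5% True
```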

Step 5 — Share and act

Each report has a public URL backed by a report token, so you can share it with teammates or stakeholders without a login. Use the recommended fixes as a roadmap; most map directly to Bookbag features:

Hallucinations → ground the agent in your data with data sources and retrieval.
Policy violations → add runtime guardrails and route risky cases to a human with Escalate to a human.
Low resolution → add next-step actions like order tracking and returns, and hand off the rest.
Low quality → tighten the system prompt and pin high-stakes answers as Q&A — see Best practices.
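
If you track findings outside the report, the mapping above can be kept as a small lookup so the most frequent issue type surfaces first. This is a sketch under stated assumptions: the dictionary keys and the `plan_fixes` helper are hypothetical labels, not part of any Bookbag export format, and the fix text is paraphrased from the list above.

```python
from collections import Counter

# Keys are hypothetical labels; fix text is paraphrased from the list above.
FIXES = {
    "hallucination": "Ground the agent in your data with data sources and retrieval.",
    "policy_violation": "Add runtime guardrails and route risky cases to a human.",
    "low_resolution": "Add next-step actions like order tracking and returns.",
    "low_quality": "Tighten the system prompt and pin high-stakes answers as Q&A.",
}

def plan_fixes(finding_types):
    """Return one fix per distinct finding type, most frequent type first."""
    counts = Counter(finding_types)
    return [FIXES[kind] for kind, _ in counts.most_common() if kind in FIXES]

plan = plan_fixes(["hallucination", "policy_violation", "hallucination"])
for step in plan:
    print(step)
```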

What's next

Response quality

How Bookbag grounds answers to eliminate the issues the audit flags.

Build your first agent

Turn the audit's fixes into a deployed, grounded agent.

Best practices

Get accurate, on-brand answers consistently.
