Running an audit

A step-by-step guide to running a Support Audit: prepare your transcripts, choose a judge engine, run the audit, and read the scored report and recommended fixes.

This guide walks through running a Support Audit end to end — from gathering transcripts to acting on the report. You can run it for free at app.bookbag.ai/audit with no account, or from the Support Audit section of your dashboard if you want to keep a history.

Step 1 — Prepare your transcripts

The audit reads support conversations as agent / customer turns. The parser is tolerant — it accepts CSV, JSON, or pasted plain-text transcripts — so you can usually export from your current tool and paste with minimal cleanup.

Paste a transcript with each turn labeled by role. Separate conversations with a blank line.

Customer: Where is my order #4471?
Agent: Your order shipped yesterday and arrives Thursday.
Customer: Can I get a refund if it's late?
Agent: Absolutely, I'll refund you in full no matter what.

Use a representative sample

Include a mix of routine and tricky conversations — refunds, shipping delays, edge cases. The audit is most useful when the transcripts reflect the questions that actually challenge your support.
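To sanity-check a paste before you submit it, you can reproduce the Step 1 format locally. The following is a minimal sketch, not Bookbag's parser: the `Turn` class and `parse_transcripts` function are hypothetical names, and it assumes only the plain-text layout shown above (one `Customer:` or `Agent:` label per turn, a blank line between conversations).

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "customer" or "agent"
    text: str


def parse_transcripts(raw: str) -> list[list[Turn]]:
    """Split pasted text into conversations, each a list of role-labeled turns."""
    conversations: list[list[Turn]] = []
    current: list[Turn] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the conversation in progress.
            if current:
                conversations.append(current)
                current = []
            continue
        label, sep, text = line.partition(":")
        role = label.strip().lower()
        if sep and role in ("customer", "agent"):
            current.append(Turn(role, text.strip()))
        elif current:
            # Unlabeled line: treat it as a continuation of the previous turn.
            current[-1].text += " " + line
        # Unlabeled lines before any labeled turn are ignored.
    if current:
        conversations.append(current)
    return conversations
```

Counting the conversations and agent turns this returns gives you a rough number to compare against the Scope figures in the report.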

Step 2 — Choose a judge engine

Pick how the transcripts are scored:

  • Heuristic (default) — a deterministic, rule-based scorer. Fast, reproducible, no model required.
  • LLM judge — a language model evaluates each transcript as a strict QA auditor. Pick the model in the engine dropdown for a more nuanced read.
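To make "deterministic, rule-based" concrete, here is an illustrative sketch of what such a scorer can look like. The rule names and patterns below are invented for this example and are not Bookbag's actual rule set. The point is only that the same reply always produces the same flags, with no model involved.

```python
import re

# Hypothetical rules for illustration only; not Bookbag's real heuristics.
RULES = [
    ("unconditional_promise", re.compile(r"\bno matter what\b|\bguarantee[sd]?\b", re.I)),
    ("full_refund_promise", re.compile(r"\brefund you in full\b|\bfull refund\b", re.I)),
]


def flag_reply(text: str) -> list[str]:
    """Return the names of every rule the agent reply triggers, in rule order."""
    return [name for name, pattern in RULES if pattern.search(text)]
```

Under these made-up rules, the last agent reply in the Step 1 sample ("I'll refund you in full no matter what") would trigger both flags, while the shipping update would trigger none.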

Step 3 — Run the audit

  1. Paste your transcripts
     Drop in your CSV, JSON, or plain-text conversations.
  2. Add your details
     On the public audit page, enter your email and company so we can send you the report link.
  3. Pick the engine
     Keep the default heuristic engine, or select an LLM judge model.
  4. Run
     Bookbag parses the conversations, scores each agent reply, and generates the report.

Step 4 — Read the report

The report opens with the four headline metrics, then the supporting detail:

| Section | How to read it |
| --- | --- |
| Headline metrics | Hallucination rate, policy-violation count, quality score, and resolution rate: your at-a-glance health check. |
| Scope | Total agent turns and number of conversations analyzed, so you know the sample size behind the scores. |
| Findings | Specific flagged replies with an excerpt, a severity, and the type (hallucination or a named policy issue), plus a recommendation for each. |
| Recommended fixes | A prioritized, plain-language list of what to change first to lift the scores. |

High hallucination rate?

If 5% or more of replies contain unverifiable claims, the most impactful fix is grounding answers in your real order and catalog data with retrieval — see Response quality.
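As a worked example of the 5% threshold, the sketch below computes a hallucination rate and checks it against that cutoff. It assumes the rate is the number of flagged replies divided by the total agent turns reported under Scope. The report does not spell out the exact denominator, so treat this as an approximation, and note that the function names are hypothetical.

```python
HIGH_HALLUCINATION_THRESHOLD = 0.05  # the "5% or more" cutoff from this guide


def hallucination_rate(flagged_replies: int, total_agent_turns: int) -> float:
    """Share of agent replies flagged as containing unverifiable claims (assumed definition)."""
    if total_agent_turns <= 0:
        raise ValueError("total_agent_turns must be positive")
    return flagged_replies / total_agent_turns


def is_high(rate: float) -> bool:
    """True when the rate meets or exceeds the 5% cutoff."""
    return rate >= HIGH_HALLUCINATION_THRESHOLD
```

For instance, 3 flagged replies out of 40 agent turns is a 7.5% rate, which crosses the threshold, whereas 1 out of 40 (2.5%) does not.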

Step 5 — Share and act

Each report has a public URL backed by a report token, so you can share it with teammates or stakeholders without a login. Use the recommended fixes as a roadmap — most map directly to Bookbag features:

  • Hallucinations → ground the agent in your data with data sources and retrieval.
  • Policy violations → add runtime guardrails and route risky cases to a human with Escalate to a human.
  • Low resolution → add next-step actions like order tracking and returns, and hand off the rest.
  • Low quality → tighten the system prompt and pin high-stakes answers as Q&A — see Best practices.
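If you export or copy the findings, the mapping above can be turned into a simple roadmap. This is a sketch under stated assumptions: the type keys (`hallucination`, `policy_violation`, and so on) are hypothetical labels, and ordering by how often each type appears is one reasonable choice, not necessarily how the report prioritizes its own recommendations.

```python
from collections import Counter

# Fix text is taken from the list above; the dictionary keys are hypothetical labels.
FIXES = {
    "hallucination": "Ground the agent in your data with data sources and retrieval.",
    "policy_violation": "Add runtime guardrails and route risky cases to a human.",
    "low_resolution": "Add next-step actions like order tracking and returns, and hand off the rest.",
    "low_quality": "Tighten the system prompt and pin high-stakes answers as Q&A.",
}


def roadmap(finding_types: list[str]) -> list[str]:
    """Order the fixes by how often each known finding type occurs, most frequent first."""
    counts = Counter(t for t in finding_types if t in FIXES)
    return [FIXES[t] for t, _ in counts.most_common()]
```

Calling `roadmap(["policy_violation", "hallucination", "hallucination"])` puts the grounding fix first and the guardrails fix second.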
