Start Here
The core concepts behind Bookbag's AI QA & Evaluation Platform.
AI QA & Evaluation Platform
A structured evaluation layer for AI-generated content that routes every AI output through human-authority review lanes — approved, needs fix, or blocked — producing audit trails and training data.
Safe to Deploy
A verdict in the AI QA & Evaluation Platform indicating that an AI-generated message meets all quality, compliance, and brand standards and is approved for delivery without human review.
Needs Fix
A verdict in the AI QA & Evaluation Platform indicating that an AI-generated message has quality or compliance issues that a QA reviewer can correct before the message is approved for delivery.
Blocked Verdict
A verdict in the AI QA & Evaluation Platform indicating that an AI-generated message has serious issues requiring SME (subject matter expert) review, rationale, and evidence before any decision is made.
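The three verdicts above amount to a routing decision over each AI output. A minimal sketch of that lane structure, assuming hypothetical inputs and thresholds (the function name, score cutoff, and flags are illustrative, not the platform's actual evaluation logic):

```python
from enum import Enum

class Verdict(Enum):
    SAFE_TO_DEPLOY = "safe_to_deploy"   # ships without human review
    NEEDS_FIX = "needs_fix"             # routed to a QA reviewer for correction
    BLOCKED = "blocked"                 # escalated to an SME; rationale and evidence required

def route_message(quality_score: float,
                  has_compliance_issue: bool,
                  has_serious_issue: bool) -> Verdict:
    """Illustrative three-lane routing; the inputs and the 0.8
    threshold are assumptions for the sketch."""
    if has_serious_issue:
        return Verdict.BLOCKED
    if has_compliance_issue or quality_score < 0.8:
        return Verdict.NEEDS_FIX
    return Verdict.SAFE_TO_DEPLOY
```

The key property is that every message receives exactly one verdict, so every lane produces a record.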
Compliance
AI Audit Trail
A complete, immutable record of every decision made about AI-generated content — including who reviewed it, when, which rubric applied, the verdict, rationale, and any corrections.
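One way to picture an immutable audit entry is a frozen record carrying the fields named above. A minimal sketch, with illustrative field names (not the platform's schema):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: fields cannot be mutated after creation
class AuditRecord:
    """One immutable entry in an AI audit trail; field names are illustrative."""
    message_id: str
    reviewer: str
    rubric_version: str
    verdict: str
    rationale: str
    corrections: tuple = ()   # immutable sequence of applied edits
    reviewed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

record = AuditRecord("msg-001", "qa_reviewer_7", "rubric-v3.2",
                     "needs_fix", "Unsupported pricing claim",
                     corrections=("Removed '50% guaranteed savings'",))
```

Freezing the record means corrections append new entries rather than rewriting history, which is what makes the trail auditable.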
AI Outbound Compliance
The practice of ensuring AI-generated outbound communications meet applicable legal, regulatory, and industry standards before delivery to recipients.
Audit-Ready Review
A review process designed to produce documentation that satisfies regulatory examination, enterprise procurement, and compliance audit requirements from the start.
CAN-SPAM for AI Messaging
The application of the CAN-SPAM Act requirements to AI-generated commercial email, including truthful headers, honest subject lines, physical address inclusion, and opt-out mechanisms.
FINRA AI Compliance
The application of FINRA advertising and communications rules (particularly Rule 2210) to AI-generated outbound messages in financial services, ensuring AI content meets the same regulatory standards as human-authored communications.
Outbound Deliverability Risk
The risk that AI-generated outbound messages damage sender reputation, trigger spam filters, or reduce inbox placement rates.
TCPA AI Compliance
Ensuring AI-generated text messages, calls, and voicemails comply with the Telephone Consumer Protection Act's consent, timing, and content requirements.
Training Data
AI Brand Safety
Protecting brand reputation by ensuring AI-generated communications maintain approved tone, messaging standards, and factual accuracy.
AI Hallucination Detection
The process of identifying factually incorrect, fabricated, or unsubstantiated claims in AI-generated content before it reaches recipients.
AI Message Quality
The overall standard of AI-generated outbound communications measured across dimensions including accuracy, tone, compliance, personalization relevance, and conversion effectiveness.
DPO Training Data
Direct Preference Optimization data — pairs of AI outputs where one version is human-preferred over another, used to align language models with human quality standards.
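DPO examples are commonly stored as prompt/chosen/rejected triples, one JSON object per line. A sketch of that convention (the triple shape is a common community format, not a Bookbag-specific schema, and the message text is invented):

```python
import json

# One DPO example: two outputs for the same prompt, with the
# human-preferred version marked "chosen".
dpo_example = {
    "prompt": "Write a follow-up email about the Q3 renewal.",
    "chosen": "Hi Sam, following up on your Q3 renewal...",   # reviewer-preferred
    "rejected": "URGENT!!! Your renewal EXPIRES TODAY...",    # original AI output
}

line = json.dumps(dpo_example)   # one JSON object per line (JSONL)
```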
Gold Standard Rewrites
Expert-corrected versions of AI-generated messages that serve as the approved reference examples for quality, tone, compliance, and effectiveness.
Preference Ranking Data
Ordered rankings of multiple AI output variations by human reviewers, used to train models on quality gradients rather than binary good/bad distinctions.
SFT Export
Supervised Fine-Tuning export — extracting human-corrected message pairs (original AI output + approved correction) in formats suitable for fine-tuning language models.
AI Decision Auditing
AI Decision Auditing
Evaluating AI-generated decisions — not just messages — against evidence, policy context, and regulation to produce structured verdicts and compliance-ready audit trails.
Evidence Payload
The structured data package submitted for AI decision auditing — including the AI's decision, the evidence it considered, policy context, model trace, and metadata.
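The payload's top-level fields can be pictured as a single structured object. A sketch with invented values; the exact schema, key names, and the policy identifiers are illustrative:

```python
# Top-level keys mirror the definition above; everything else is made up.
evidence_payload = {
    "decision": {"action": "deny_claim", "confidence": 0.91},
    "evidence": [{"doc_id": "policy-1138", "excerpt": "Coverage excludes..."}],
    "policy_context": ["internal-claims-policy-v7"],
    "model_trace": ["retrieved 3 documents",
                    "applied exclusion rule",
                    "scored 0.91"],
    "metadata": {"model": "claims-model-2",
                 "submitted_at": "2025-01-15T09:30:00Z"},
}
```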
Evidence Sufficiency
Whether the submitted evidence meets the threshold required for a given AI decision — ranging from complete documentation to critical evidence missing.
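The range from complete documentation to critical evidence missing can be sketched as a graded check of submitted items against a required set (the tier names and set-based model are assumptions for illustration):

```python
def evidence_sufficiency(required: set, submitted: set) -> str:
    """Grade submitted evidence against a required set.
    Tier names ('complete', 'partial', 'critical_missing') are illustrative."""
    missing = required - submitted
    if not missing:
        return "complete"
    if len(missing) < len(required):
        return "partial"
    return "critical_missing"   # nothing required was submitted
```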
Model Trace
The reasoning chain an AI model used to reach a decision — documenting the sequence of steps, data transformations, and rule applications from input to output.
Policy Context
The regulatory rules, internal policies, and compliance frameworks that an AI decision is evaluated against — providing the 'should' against which the AI's actual decision is compared.
Taxonomy Template
An industry-specific set of failure categories, business impact ratings, and evaluation criteria used to structure AI decision auditing for a particular regulated domain.
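A template for one regulated domain might bundle failure categories, impact ratings, and criteria like this. The domain, category names, and ratings below are examples, not a shipped taxonomy:

```python
# Illustrative template for a financial-services domain.
financial_services_template = {
    "domain": "financial_services",
    "failure_categories": [
        "unsubstantiated_performance_claim",
        "missing_required_disclosure",
        "unbalanced_risk_presentation",
    ],
    "business_impact": {
        "regulatory_fine": "high",
        "brand_damage": "medium",
    },
    "evaluation_criteria": [
        "FINRA Rule 2210 fair-and-balanced standard",
    ],
}
```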
Verdicts
AI QA & Evaluation Platform
A structured evaluation layer for AI-generated content that routes every AI output through human-authority review lanes — approved, needs fix, or blocked — producing audit trails and training data.
Blocked Verdict
A verdict in the AI QA & Evaluation Platform indicating that an AI-generated message has serious issues requiring SME (subject matter expert) review, rationale, and evidence before any decision is made.
Message Gating
The process of evaluating and routing AI-generated messages through defined review lanes, producing structured verdicts and training data for every message.
Needs Fix
A verdict in the AI QA & Evaluation Platform indicating that an AI-generated message has quality or compliance issues that a QA reviewer can correct before the message is approved for delivery.
Safe to Deploy
A verdict in the AI QA & Evaluation Platform indicating that an AI-generated message meets all quality, compliance, and brand standards and is approved for delivery without human review.
SME Escalation
The process of routing high-risk AI-generated content to designated subject matter experts who have final authority to approve, correct, or reject the content with documented rationale.
Governance
AI QA & Evaluation Platform for Customer-Facing AI
A structured evaluation layer for AI-generated customer-facing content — routing every AI-generated message through human review lanes (approved, needs fix, or blocked) to enforce policy, safety, brand standards, and escalation handling while producing audit trails and training data.
Compliance Verdict
A documented decision — approved, needs fix, or blocked — rendered on every AI-generated message after evaluation against regulatory guardrails, PHI detection, hallucination checks, and quality standards.
Conversation Logging
The systematic recording of every AI-generated message with full provenance — including the model that generated it, the prompt used, the context provided, and the governance verdict applied — before delivery.
PHI Detection
Automated scanning of AI-generated communications for protected health information (PHI) as defined by HIPAA — including the 18 identifiers that constitute PHI — before messages reach patients or customers.
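The scanning idea can be sketched with pattern matching over a message, though a deliberately naive version: the sketch below covers only two of HIPAA's 18 identifier types (US-style phone numbers and SSNs) with regexes, and production PHI detection requires far more than pattern matching:

```python
import re

# Naive patterns for two HIPAA identifier types; real detection also
# needs names, dates, addresses, MRNs, and context-aware NLP.
PHI_PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_phi(text: str) -> list:
    """Return the identifier types found in the message, if any."""
    return [label for label, pat in PHI_PATTERNS.items() if pat.search(text)]
```

A non-empty result would typically force the message into the blocked lane before delivery.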
Workflows
Annotator Calibration
The process of training and aligning human reviewers to apply rubrics consistently, measured through gold set evaluation and inter-annotator agreement metrics.
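A standard inter-annotator agreement metric for two reviewers labeling the same items is Cohen's kappa: observed agreement corrected for the agreement expected by chance. A minimal stdlib sketch:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items.
    1.0 = perfect agreement; 0.0 = no better than chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0   # degenerate case: both annotators used a single label
    return (observed - expected) / (1 - expected)
```

Calibration programs typically track this alongside accuracy on a gold set until reviewers converge.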
Human-in-the-Loop AI
An AI system design in which human reviewers participate in the decision-making process, supplying oversight, corrections, and final authority that the AI alone cannot provide.
QA Review Workflow
A structured process where quality assurance reviewers evaluate, correct, and approve AI-generated content using defined rubrics and authority levels.
Rubric Versioning
The practice of version-stamping review rubrics so every verdict can be traced to the specific rules that applied at the time of the decision.
Taxonomy Config
A configurable schema that defines the rubrics, label definitions, scoring criteria, and review rules for a specific project or compliance domain.
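A per-project config of this kind might look like the following. The keys, labels, and routing rules are assumptions for illustration, not the platform's schema:

```python
# Illustrative taxonomy config for one project/compliance domain.
taxonomy_config = {
    "project": "healthcare-outreach",
    "rubric_version": "v2.1",
    "labels": {
        "phi_exposure": "Message contains a HIPAA identifier",
        "tone_violation": "Message departs from approved brand voice",
    },
    "scoring": {"scale": [1, 2, 3, 4, 5], "pass_threshold": 4},
    # Which review lane each label routes to:
    "review_rules": {"phi_exposure": "blocked", "tone_violation": "needs_fix"},
}
```

Pairing the config with a rubric version is what lets each verdict be traced back to the exact rules that applied.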
See how these concepts work in practice
Join the teams shipping safer AI with real-time evaluation, audit trails, and continuous improvement.