Ship Safer AI with Three Lines of Code
The Gate API evaluates every AI output against your taxonomy in real time. Python and Node.js SDKs with zero external dependencies. Advisory or enforced mode. Fail-open or fail-closed. Returns a decision in 1–4 seconds.
View full API documentation

Python

    from bookbag import BookbagClient

    client = BookbagClient(api_key="bk_gate_xxx")

    user_input = "What is my refund policy?"
    ai_output = "Full refund within 90 days."

    result = client.gate.evaluate(
        input=user_input,
        output=ai_output,
    )

    if result.policy_action == "block":
        fallback_response()  # Critical issue
    else:
        send_response(ai_output)  # Safe to ship

Node.js

    const { BookbagClient } = require('@bookbag/sdk')

    const client = new BookbagClient({ apiKey: 'bk_gate_xxx' })

    // Run inside an async function: CommonJS modules do not allow top-level await.
    const result = await client.gate.evaluate({
      input: 'What is my refund policy?',
      output: 'Full refund within 90 days.'
    })

    if (result.policy_action === 'block') fallbackResponse()

One Platform. Every AI Output Covered.
From real-time API gates to batch analysis, from AI-only evaluation to full human review — the infrastructure to evaluate, annotate, improve, and audit every AI output your organization produces.
Real-Time Quality Gates
Gate API + SDK. Every AI response evaluated in 1–4 seconds. Allow, flag, or block before it reaches users.
Multi-Stage Evaluation Pipeline
Fast (1-pass), Standard (2-pass), Deep (3-pass). Per-stage model selection — cheap model for triage, smart model for edge cases.
Three Review Modes
AI-only for speed. Human-only for gold standard. Hybrid for production with continuous improvement.
Customizable Taxonomies
Define what matters — hallucination, compliance, tone, safety. Rubric templates for any domain. Version-stamped for audit.
Training Data Generation
Every correction becomes SFT, DPO, or ranking data. Export to fine-tune your models. Close the feedback loop.
Complete Audit Trail
Every evaluation logged with full provenance. Who reviewed, when, which rubric, what decision. Compliance-ready from day one.
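To make the multi-stage idea from the list above concrete, here is a minimal sketch of a pipeline that runs a cheap triage stage first and escalates only when that stage is unsure. Everything in it is an assumption for illustration: the Stage class, the run_pipeline function, the confidence-based escalation rule, and the stand-in scorers are hypothetical and are not Bookbag's SDK or its actual pipeline logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical scorer signature: returns (score, confidence), both in [0, 1].
Scorer = Callable[[str, str], Tuple[float, float]]


@dataclass
class Stage:
    name: str
    scorer: Scorer
    min_confidence: float  # escalate when the scorer is less confident than this


def run_pipeline(stages: List[Stage], user_input: str, ai_output: str) -> dict:
    """Run stages in order and stop at the first one that is confident enough."""
    stages_run = []
    score, confidence = 0.0, 0.0
    for stage in stages:
        score, confidence = stage.scorer(user_input, ai_output)
        stages_run.append(stage.name)
        if confidence >= stage.min_confidence:
            break  # confident enough, so the more expensive stages are skipped
    return {"score": score, "confidence": confidence, "stages_run": stages_run}


# Stand-in scorers; a real setup would call a model of your choice per stage.
def cheap_triage(user_input: str, ai_output: str) -> Tuple[float, float]:
    # Toy rule: short answers are treated as easy, long ones as uncertain.
    return (0.9, 0.95) if len(ai_output) < 40 else (0.5, 0.4)


def careful_review(user_input: str, ai_output: str) -> Tuple[float, float]:
    return (0.7, 0.9)


stages = [
    Stage("triage", cheap_triage, min_confidence=0.8),
    Stage("qa", careful_review, min_confidence=0.8),
]

easy = run_pipeline(stages, "What is my refund policy?", "Full refund within 90 days.")
hard = run_pipeline(stages, "What is my refund policy?", "x" * 80)
```

In this sketch the short answer is settled by the triage stage alone, while the long one falls through to the second stage. That is the cost trade-off behind choosing a cheaper model for triage and a stronger one for edge cases.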
How It Works
From API call to decision in seconds. Your AI generates → Bookbag evaluates → Your app enforces.
Your AI generates
Chatbot response, copilot suggestion, agent action, content draft — any AI output headed to users.
SDK evaluates
One API call sends input + output to the evaluation pipeline. Python, Node.js, or REST.
Multi-stage scoring
Stage 1: fast triage. Stage 2: QA verification. Stage 3: expert review. Each stage uses the model you choose.
Policy decides
Your rules translate scores into actions: allow, review, block, or require subject matter expert (SME) review. Advisory or enforced.
Your app enforces
Act on the decision. Full audit trail persisted automatically. Every evaluation searchable and exportable.
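The last two steps above can be sketched in a few lines: a policy function that maps a score to an action, and an enforcement wrapper that honors the advisory/enforced and fail-open/fail-closed settings. This is a sketch under stated assumptions, not the SDK's behavior. The thresholds, the policy_action and enforce functions, and their parameters are hypothetical, and only three of the four actions are modeled.

```python
from typing import Callable


def policy_action(score: float) -> str:
    """Translate a quality score in [0, 1] into an action (hypothetical thresholds)."""
    if score >= 0.8:
        return "allow"
    if score >= 0.5:
        return "review"
    return "block"


def enforce(
    evaluate: Callable[[], float],
    ai_output: str,
    fallback: str,
    enforced: bool = True,
    fail_open: bool = True,
) -> str:
    """Return the text that should actually reach the user."""
    try:
        action = policy_action(evaluate())
    except Exception:
        # Evaluation unavailable: fail-open ships the output, fail-closed withholds it.
        return ai_output if fail_open else fallback
    if action == "block" and enforced:
        return fallback
    # Advisory mode, or a non-blocking action: the output ships unchanged.
    return ai_output


def unavailable() -> float:
    """Stand-in for an evaluation call that times out."""
    raise TimeoutError("evaluation timed out")


blocked = enforce(lambda: 0.2, "risky answer", "Please contact support.")
advisory = enforce(lambda: 0.2, "risky answer", "Please contact support.", enforced=False)
open_result = enforce(unavailable, "risky answer", "Please contact support.", fail_open=True)
closed_result = enforce(unavailable, "risky answer", "Please contact support.", fail_open=False)
```

Fail-open favors availability, since users still get an answer when evaluation is down. Fail-closed favors safety, since nothing unevaluated ships. Which one fits depends on how costly a bad answer is compared with no answer.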
Three Ways to Evaluate
Choose the review mode that fits your risk profile, volume, and quality requirements. Switch modes per project.
Automated
Full AI evaluation, real-time. Decision returned synchronously via Gate API. No human involvement. Results in 1–4 seconds.
Assisted
AI evaluates and returns a decision immediately. Flagged items are queued for human review in the background. Best of both worlds.
Human
Expert human review on every item. Three-tier workflow: annotator, QA reviewer, subject matter expert. Gold-standard quality.
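As an illustration of the Assisted mode described above, the sketch below returns the AI decision to the caller immediately and places anything that was not allowed on a backlog for later human review. The in-process queue, the assisted_evaluate function, and the toy decision rule are assumptions made for the example. A production system would more likely use a durable queue, and this is not how the Gate API is documented to work internally.

```python
import queue
from typing import Callable

# In-process stand-in for a review backlog (hypothetical).
review_queue: queue.Queue = queue.Queue()


def assisted_evaluate(
    decide: Callable[[str, str], str], user_input: str, ai_output: str
) -> str:
    """Return the AI decision right away; queue non-allowed items for humans."""
    action = decide(user_input, ai_output)
    if action != "allow":
        review_queue.put(
            {"input": user_input, "output": ai_output, "ai_action": action}
        )
    return action


def toy_decide(user_input: str, ai_output: str) -> str:
    # Toy rule for illustration only: flag answers that promise a guarantee.
    return "flag" if "guarantee" in ai_output.lower() else "allow"


first = assisted_evaluate(toy_decide, "Can I get a refund?", "Full refund within 90 days.")
second = assisted_evaluate(toy_decide, "Will this work?", "We guarantee it will.")
```

The caller never waits on a human: the queue is drained separately by reviewers, whose corrections can then feed back into rubrics or training data.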
Built For Teams Deploying AI
Whether you're shipping a chatbot, operating in a regulated industry, or building AI products — Bookbag provides the evaluation infrastructure you need.
AI-First Companies
Deploying chatbots, copilots, or AI agents? Gate every response. Catch hallucinations before users do. Build trust with systematic evaluation.
Regulated Industries
Healthcare, finance, legal, government. Every AI decision needs an audit trail. Every evaluation needs documented human oversight.
AI Vendors & Platforms
Build quality into your product. Ship evaluation as a feature. Unblock enterprise deals with audit trails and governance built in.
Enterprise ML Teams
Systematic evaluation across models. Generate SFT, DPO, and ranking datasets from every correction. Close the feedback loop between production and training.
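To show what turning a correction into training data can look like, here is a minimal sketch that converts one reviewed item into an SFT record and a DPO preference record, then serializes the latter as a JSONL line. The field names (prompt, completion, chosen, rejected) follow a common convention in fine-tuning tooling. They are an assumption here, not a confirmed Bookbag export format, and the helper names and the corrected answer text are hypothetical.

```python
import json


def to_sft(prompt: str, corrected: str) -> dict:
    """Supervised fine-tuning record: the corrected answer is the target."""
    return {"prompt": prompt, "completion": corrected}


def to_dpo(prompt: str, original: str, corrected: str) -> dict:
    """Preference record: the corrected answer is preferred over the original."""
    return {"prompt": prompt, "chosen": corrected, "rejected": original}


prompt = "What is my refund policy?"
original = "Full refund within 90 days."
corrected = "Full refund within 30 days of purchase."  # illustrative reviewer correction

sft_record = to_sft(prompt, corrected)
dpo_record = to_dpo(prompt, original, corrected)

# One JSON object per line is the usual JSONL layout for training files.
jsonl_line = json.dumps(dpo_record)
```

The same correction yields both record types, which is why a single review step can feed more than one training method.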
Solutions
Bookbag adapts to how your organization deploys AI.
AI-First Companies
Gate every chatbot, copilot, and agent response. Catch hallucinations, enforce policies, build user trust at scale.
Regulated Industries
Healthcare, finance, legal, government. Audit trails, human oversight, and compliance-ready evaluation for every AI decision.
AI Vendors & Platforms
Build evaluation into your product. Ship with audit trails and governance. Unblock enterprise deals.
ML & AI Teams
Systematic model evaluation. Generate training data from corrections. Compare models side by side. Close the feedback loop.
Credit-Based Pricing That Scales
Developer tier: Free — 100 credits/month with Gate API access.
Paid plans from $6,000/month with advanced evaluation depth, workforce management, and enterprise features.
Ready to evaluate your AI?
Join the teams shipping safer AI with real-time evaluation, audit trails, and continuous improvement.