Can You Catch AI Mistakes?

Response Correctness Test

Can you spot factual errors, hallucinations, and unsupported claims in AI responses?

Tone Scoring Test

Can you detect when AI tone misses the mark — too cold, too casual, or off-brand?

Personalization Quality Test

Can you tell when AI personalization is genuine versus surface-level or completely off-target?

Completeness Test

Can you spot when AI answers only part of the question and quietly ignores the rest?

Conciseness Test

Can you spot when AI responses are bloated, padded, or unnecessarily long?

Gold Standard Rewrite Test

Can you pick the best rewrite when multiple options all sound reasonable?

Outbound Sales & Marketing

Cold Email Quality Gate

Would you send these AI-generated cold emails? Most contain issues you'd miss.

LinkedIn Outreach Audit

Can you spot the problems in AI-generated LinkedIn messages before they damage your professional brand?

Follow-Up Sequence Audit

AI follow-up sequences often contradict themselves across steps. Can you catch the inconsistencies?

Subject Line Risk Scanner

AI-generated subject lines are often the first thing spam filters and prospects judge. Can you spot the risky ones?

AI SDR Output Audit

AI SDR tools hallucinate prospect data more than you think. Can you catch the fabricated details?

Overpromising Risk Detector

AI loves to make bold claims. Can you spot the guarantees, inflated metrics, and unrealistic promises?

Response Quality Ranker

Given multiple AI-generated versions of the same email, can you consistently pick the best one?

Lead Qualification Audit

AI lead scoring looks confident, but can you spot when it gets BANT, firmographics, and intent signals wrong?

Compliance & Policy

Can You Spot AI Policy Violations?

AI systems routinely violate messaging policies, brand guidelines, and opt-out requirements. Can you catch them before they reach your customers?

Can You Catch Non-Compliant AI Claims?

AI loves to make bold claims about financial returns, medical outcomes, and legal entitlements. Can you identify the ones that cross the line?

Can You Spot Missing AI Disclaimers?

Regulated industries require specific disclaimers in communications. AI systems routinely omit them. Can you identify what's missing?

Can You Identify AI Regulatory Violations?

AI systems generate content that violates FINRA, SEC, FDA, FTC, CAN-SPAM, GDPR, and TCPA regulations daily. Can you catch the violations?

Can You Spot PII Leaks in AI Responses?

AI systems accidentally expose Social Security numbers, email addresses, phone numbers, and account numbers in seemingly innocent responses. Can you catch the leaks?
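As a rough illustration of the kind of leak this test covers, a pattern-based scan might look like the sketch below. The `find_pii` function and its patterns are illustrative assumptions, not part of any test here; they cover only simple US-style formats and would miss many real leaks, such as account numbers or reformatted SSNs.

```python
import re

# Hypothetical, deliberately simple patterns (US-style formats only).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
}


def find_pii(text):
    """Return a dict mapping each PII kind to the matches found in text."""
    hits = {}
    for kind, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[kind] = matches
    return hits
```

A scan like this is only a first pass. Leaks that look innocent in context still need a human reviewer.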

Can You Catch HIPAA Violations in AI Content?

AI systems in healthcare expose protected health information in alarming ways. Can you identify the violations before they become breach notifications?

Can You Spot AI Giving Unauthorized Legal Advice?

AI systems confidently deliver legal strategy, tax instructions, and immigration guidance they have no authority to give. Can you tell the difference between information and advice?

Factuality & Hallucination

Hard

Hallucination Detection Test

Can you spot when AI fabricates statistics, dates, product features, and company history that sound completely plausible?

Citation Verification Test

Can you tell when AI invents research papers, court cases, regulatory guidance, and academic sources that don't exist?

Hard

Entity Hallucination Test

Can you spot when AI invents companies, people, products, conferences, and awards that don't exist?

Attribute Hallucination Test

Can you spot when AI assigns wrong facts to real companies, people, and products?

Context Grounding Test

Given a source document, can you tell where the AI's response stays grounded vs. where it drifts into unsupported territory?

Knowledge Base Adherence Test

When AI has RAG context, can you tell if it actually used the provided documents or fell back on its training data?

Structured Output Validation

JSON Correction Test

Can you spot syntax errors, structural mistakes, and invalid values in AI-generated JSON?
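For the syntax-error part of this test, a parser can do the first pass. The sketch below uses Python's standard `json` module; the `check_json` helper is a hypothetical name introduced for illustration. It catches syntax errors only, not structural mistakes or invalid values.

```python
import json


def check_json(raw):
    """Return (ok, message) describing whether raw parses as JSON."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as err:
        # lineno, colno, and msg locate and describe the first syntax error.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, "valid JSON"
```

For example, `check_json('{"name": "Ada", "age": 36,}')` reports a failure because of the trailing comma, a mistake that is easy to miss by eye.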

Schema Compliance Test

Can you tell when AI output drifts from the expected schema?

Field Validation Test

Can you spot invalid, out-of-range, or badly formatted data in AI-extracted fields?

Field Extraction Test

Can you catch when AI extracts the wrong CRM data from emails and conversations?

Classification Accuracy Test

Can you spot when AI assigns the wrong category, sentiment, urgency, or intent?

Conversation & Customer Support

AI Support Quality Test

Can you spot when AI gives customers wrong, outdated, or incomplete support answers?

Conversation Outcome Test

Can you correctly classify how AI support conversations actually ended?

Customer Satisfaction Prediction Test

Can you predict how a customer actually feels based on what they write, and what they don't?

Escalation Judgment Test

The AI handled it, but should it have? Test your judgment on when AI support should hand off to a human.

AI Empathy Scoring Test

How well does AI respond to customers in emotional crisis? Rate the empathy quality.

AI Clarity Scoring Test

Can you spot when AI instructions are confusing, circular, or impossible to follow?

Multi-Turn Coherence Test

Can you catch when AI loses track of what the customer already told it?

Bias & Governance

Bias Severity Classification Test

Can you accurately classify the severity of bias in AI outputs — from critical immediate harm to minor stylistic issues?

Protected Class Bias Detection Test

Can you identify subtle bias against protected classes in realistic AI outputs — racial, gender, age, disability, and more?

Disparate Impact Detection Test

Can you spot when facially neutral AI criteria create discriminatory outcomes across protected groups?

Plain Language Compliance Test

Can you identify when AI-generated government and enterprise communications fail plain language and accessibility standards?

Unauthorized AI Commitment Test

Can you spot when AI makes promises, guarantees, or commitments that it has no authority to make?

AI Decision Auditing

AI Decision Correctness Test

Can you tell when an AI eligibility engine reaches the wrong conclusion? Review approval decisions across benefits, lending, insurance, and hiring.

AI Calculation Error Detection Test

AI systems make math mistakes, and humans trust the results blindly. Can you catch arithmetic errors, wrong thresholds, formula mistakes, and insufficient-evidence decisions?

AI Error Business Impact Classification Test

When an AI gets it wrong, what is actually at stake? Classify the real-world impact of AI decision errors across financial, legal, human, and reputational dimensions.

Model & Prompt Evaluation

Prompt Comparison Evaluation Test

Two prompts, same input, different outputs. Can you consistently pick the better prompt? Test your evaluation skills across customer support, content generation, data extraction, and summarization.

AI Confidence Calibration Test

AI models report confidence scores, but are they trustworthy? Learn to spot high-confidence wrong answers, low-confidence correct answers, and dangerous calibration gaps.
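One simple way to make a calibration gap concrete is to compare average reported confidence with actual accuracy, and to list the high-confidence wrong answers. The sketch below assumes each prediction is a `(confidence, is_correct)` pair; the `calibration_report` name and the 0.9 threshold are illustrative choices, not part of the test.

```python
def calibration_report(predictions, high=0.9):
    """Summarize calibration for a list of (confidence, is_correct) pairs."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    n = len(predictions)
    mean_confidence = sum(conf for conf, _ in predictions) / n
    accuracy = sum(1 for _, ok in predictions if ok) / n
    # High-confidence wrong answers are the most dangerous cases.
    confident_wrong = [conf for conf, ok in predictions if conf >= high and not ok]
    return {
        "mean_confidence": mean_confidence,
        "accuracy": accuracy,
        "gap": mean_confidence - accuracy,  # positive means overconfident
        "high_confidence_wrong": confident_wrong,
    }
```

A positive `gap` suggests the model is overconfident on this sample and a negative one suggests it is underconfident. With only a few predictions, neither number is reliable.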

Production Gating

Production Gating Decision Test

Every AI output needs a gating decision before it reaches a real person: allow, flag, block, or escalate to an expert. Can you make the right call?