Start with what you actually need
Most AI customer support buying decisions go wrong at the start: the buyer evaluates features before defining what problem they are solving. Before you look at a single product, write down three things: the ticket categories driving the most volume, the current pain (response time, after-hours gaps, agent burnout, peak-season scaling), and the primary outcome you want — deflection rate, response time, cost per ticket, or CSAT.
For ecommerce specifically, the non-negotiable is live order data access. A tool that can only answer from a knowledge base, and cannot connect to your order management system to look up a specific customer's order, cannot resolve order tracking, return eligibility, or refund status: the three highest-volume ticket types in ecommerce. Generic AI knowledge is not enough.
Core evaluation criteria
Evaluate AI customer support tools against these criteria in roughly this order of importance for ecommerce:
| Criterion | What to look for | Why it matters |
|---|---|---|
| Shopify / OMS integration | Native, not webhook-only | Order data quality determines answer quality |
| Resolution rate | What percentage of contacts resolve without a human? | This is the primary ROI driver |
| Human handoff quality | Full context transferred to agent | Bad handoffs destroy CSAT |
| Channel coverage | Chat, email, social in one system | Fragmented tools create gaps |
| Time to deploy | Hours to days, not months | Ecommerce moves fast |
| Accuracy and hallucination rate | Grounded answers, not fabricated ones | Wrong answers are worse than no answers |
| CSAT on AI-handled tickets | Benchmark against your current human CSAT | The bar is surprisingly achievable |
| Reporting and analytics | Deflection, volume by type, CSAT | Needed to improve over time |
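One way to turn the table above into a comparable number per vendor is a simple weighted scorecard. The sketch below is illustrative only: the criterion keys, the weights (which just mirror the table's rough order of importance), the 1-to-5 scoring scale, and the minimum integration score are all assumptions to adjust for your own store.

```python
# Hypothetical weights that follow the table's rough order of importance.
CRITERIA_WEIGHTS = {
    "oms_integration": 8,
    "resolution_rate": 7,
    "handoff_quality": 6,
    "channel_coverage": 5,
    "time_to_deploy": 4,
    "accuracy": 3,
    "ai_csat": 2,
    "reporting": 1,
}

# Assumed gate: a vendor scoring below this on order data access is disqualified.
MIN_INTEGRATION_SCORE = 3


def weighted_score(scores: dict[str, int]) -> float:
    """Return a 0-5 weighted average of 1-5 scores, or 0.0 if disqualified."""
    missing = CRITERIA_WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    if scores["oms_integration"] < MIN_INTEGRATION_SCORE:
        return 0.0  # live order data access is treated as non-negotiable
    total_weight = sum(CRITERIA_WEIGHTS.values())
    weighted_sum = sum(w * scores[name] for name, w in CRITERIA_WEIGHTS.items())
    return weighted_sum / total_weight


# Example: strong everywhere except order data access, so it is disqualified.
vendor_without_order_data = {name: 5 for name in CRITERIA_WEIGHTS}
vendor_without_order_data["oms_integration"] = 1
print(weighted_score(vendor_without_order_data))
```

The hard gate is a deliberate design choice: without it, a vendor with no order data access could still outscore an integrated one on the strength of the lower-priority criteria.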
Pricing models compared
Pricing structure is one of the most consequential decisions in AI support buying — and the most commonly overlooked. There are four dominant models, and they have very different incentive structures:
Per-resolution pricing
You pay for every ticket the AI resolves. This sounds attractive until you realize the vendor benefits financially when your volume is high — and they define what counts as a "resolution." Stores that automate well and grow fast end up with bills that scale directly with their success. WISMO spikes during peak season can generate invoices you did not budget for.
Per-seat pricing (inherited from live chat tools)
You pay per human agent using the platform. This made sense before AI but creates an odd structure: the more you automate, the less you use the seats you are paying for. If your goal is deflection, per-seat pricing actively works against you.
Flat monthly pricing
A single monthly fee regardless of ticket volume or resolutions. This aligns incentives: the vendor wants you to automate as much as possible because it demonstrates value and you stay a customer. Budget predictability is high. Bookbag uses flat pricing for exactly this reason — a BFCM volume spike should not generate a surprise bill.
Usage-based (API / compute costs)
You pay for the underlying API calls or compute. This can be very cheap at low volume and very expensive at high volume. Requires engineering involvement to manage and is typically not appropriate for non-technical ecommerce teams.
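To see how the four models behave when volume triples, it helps to run the arithmetic with your own numbers. The sketch below uses entirely hypothetical prices, volumes, and a hypothetical 60% resolution rate; none of these figures come from a real vendor.

```python
# All prices and volumes are hypothetical, chosen only to show each model's shape.
NORMAL_TICKETS = 2_000
BFCM_TICKETS = NORMAL_TICKETS * 3  # the "volume triples" scenario
RESOLUTION_RATE = 0.60             # share of tickets the AI resolves
PRICE_PER_RESOLUTION = 1.00
SEATS = 5
PRICE_PER_SEAT = 100.00
FLAT_MONTHLY_FEE = 500.00
API_CALLS_PER_TICKET = 4
PRICE_PER_API_CALL = 0.02


def monthly_bills(tickets: int) -> dict[str, float]:
    """Estimate one month's bill under each pricing model for a ticket volume."""
    resolved = round(tickets * RESOLUTION_RATE)
    return {
        "per_resolution": resolved * PRICE_PER_RESOLUTION,   # scales with resolutions
        "per_seat": SEATS * PRICE_PER_SEAT,                  # fixed by headcount
        "flat": FLAT_MONTHLY_FEE,                            # fixed regardless of volume
        "usage_based": tickets * API_CALLS_PER_TICKET * PRICE_PER_API_CALL,
    }


for label, tickets in [("normal month", NORMAL_TICKETS), ("BFCM month", BFCM_TICKETS)]:
    print(label, monthly_bills(tickets))
```

With these made-up inputs, the per-resolution and usage-based bills triple along with volume, while the per-seat and flat bills do not move. Swap in a vendor's quoted prices and your own ticket counts before drawing conclusions.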
Red flags to avoid
- No live order data access — the tool can only answer generic FAQ questions, not questions about a specific customer's order. This is disqualifying for ecommerce.
- Per-resolution pricing with an included-volume cap or overage fees: once a peak-season spike pushes you past the included volume, your bill becomes unpredictable and potentially very large.
- "AI" that is actually a rule-based flow builder — look for vendors that let you ask the bot an off-script question during a demo. If it falls over, it is not a real AI agent.
- Multi-month implementation timelines — modern ecommerce AI tools with Shopify integration should deploy in hours or days. Months-long implementations indicate legacy architecture.
- No CSAT measurement on AI-handled tickets — if the vendor cannot show you satisfaction scores on automated resolutions, they are hiding something.
- Escalation paths that require the customer to start over — any tool that loses conversation context on handoff to a human is creating the worst possible customer experience.
Questions to ask vendors
1. Show me the agent resolving a return request for a specific order number. What does the customer experience look like end to end?
2. What is your median deflection rate across customers of our size and type? Can you show me a case study, not just a headline number?
3. How does your pricing model work during BFCM when our volume triples? Walk me through a specific example.
4. What happens when the AI is not confident in an answer? Walk me through the escalation flow and what the human agent sees.
5. How long does implementation take? What does it require from our technical team?
6. How do we update the agent when our policies change, for example a new return window or a holiday shipping delay?
7. What is your CSAT data on AI-handled tickets vs. human-handled tickets across your customer base?
How to run a proper trial
A trial that does not test the real use case teaches you nothing. Run your pilot like this:
1. Connect real data: link your actual Shopify store and import your real help content. Dummy data produces unrealistic results.
2. Run in shadow mode for one week: let the AI generate responses without sending them. Have a human agent review and score them for accuracy and tone.
3. Measure accuracy by ticket type: do not average accuracy across all ticket types. Order tracking should be very high; edge cases will be lower. Know the breakdown.
4. Test failure mode: ask the agent a question it should not know the answer to. Does it escalate gracefully, or does it fabricate a confident wrong answer?
5. Test the handoff: trigger an escalation and experience the human agent side. Is the context there? Is the conversation readable?
6. Run it live for two weeks on a single channel: measure deflection rate, CSAT, and escalation rate before deciding to expand.
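The measurement steps in this pilot can be tallied with a few lines of code once reviewers have scored the shadow-mode responses. The sketch below is a minimal illustration: the `ShadowReview` record, the ticket type labels, and the definitions of deflection rate (AI-resolved contacts divided by all contacts) and escalation rate (escalated contacts divided by all contacts) are assumptions, so match them to however your vendor and team define these terms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ShadowReview:
    """One human-reviewed AI response from the shadow-mode week (hypothetical record)."""
    ticket_type: str
    accurate: bool


def accuracy_by_type(reviews: list[ShadowReview]) -> dict[str, float]:
    """Return the share of accurate responses per ticket type, not one blended average."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for review in reviews:
        totals[review.ticket_type] += 1
        if review.accurate:
            correct[review.ticket_type] += 1
    return {ticket_type: correct[ticket_type] / totals[ticket_type] for ticket_type in totals}


def live_rates(total_contacts: int, resolved_by_ai: int, escalated: int) -> dict[str, float]:
    """Compute deflection and escalation rates for the live phase (assumed definitions)."""
    if total_contacts <= 0:
        raise ValueError("total_contacts must be positive")
    return {
        "deflection_rate": resolved_by_ai / total_contacts,
        "escalation_rate": escalated / total_contacts,
    }


# Made-up sample: tracking answers are all accurate, return answers are not.
sample_reviews = [ShadowReview("order_tracking", True) for _ in range(4)] + [
    ShadowReview("return_eligibility", True),
    ShadowReview("return_eligibility", False),
]
print(accuracy_by_type(sample_reviews))
print(live_rates(total_contacts=200, resolved_by_ai=120, escalated=50))
```

In this made-up sample the blended accuracy is five out of six, which hides the fact that half of the return eligibility answers were wrong. That is the reason to keep the breakdown by ticket type.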
Key takeaways
- Define your top three problem ticket types before evaluating any tool.
- Live order data access is non-negotiable for ecommerce AI support.
- Flat pricing aligns vendor incentives with yours; per-resolution pricing does not.
- A proper trial uses real data and tests failure modes, not just happy-path demos.
- Ask vendors for CSAT data on AI-handled tickets — it is the most revealing single metric.