Does Observe require changes to my agent code?

No, if you use the Gateway (a transparent MCP proxy). Minimal, if you use the SDK: one start_run call, and Observe captures every subsequent tool_call, output, and end_run event on that run.

What does Observe capture?

AgentRuns (one per agent turn), AgentSpans (LLM turns, tool calls, retrievals), AgentToolCalls (intercepted call + decision + reason), AgentSpanEvents (detector hits, reviewer actions, webhook deliveries). All keyed by organization_id + run_uid + session_id.

How long are traces kept?

Cloud Starter: 30 days. Cloud Pro: 365 days. Enterprise: configurable with data-retention policies per org. Archived traces remain queryable for audit.

Is Observe the same as Datadog / Langfuse / Arize?

Similar on the visibility surface — different on the enforcement surface. Observe records what happened. Guardrails enforces what should happen. Evaluation scores what's good. Governance packages the evidence. One platform.

Observe

See every AI action. Before you govern it.

Observe is the system of record. Every agent, every tool call, every LLM turn. Full causal history so when something looks wrong you can see what led the model there — not just what it did.

The AI activity layer

Five entities. One data layer. Zero hand-wiring.

Inventory

Applications, agents, MCP servers, datasets. Agents auto-register when they first hit the gate. Apps, MCP servers, and datasets are declarative.

Live activity feed

SSE-driven stream of every intercepted call. Filter by decision, tool, agent, session, or run. Expand a row to see the full policy trace.
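
As a rough sketch of what consuming such a stream could look like, the snippet below parses Server-Sent Events text and keeps only events matching a filter. The payload fields (decision, tool, agent_id), the issue_refund tool, and the sample data are assumptions for illustration, not Observe's documented wire format.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE event (an event ends at a blank line)."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
        elif line == "" and data_parts:
            yield json.loads("\n".join(data_parts))
            data_parts = []
    if data_parts:  # stream ended without a trailing blank line
        yield json.loads("\n".join(data_parts))


def filter_events(events, **criteria):
    """Keep events whose fields equal every given criterion."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


# Hypothetical sample of two intercepted calls.
SAMPLE_STREAM = [
    'data: {"decision": "allow", "tool": "lookup_customer", "agent_id": "refund-bot"}',
    "",
    'data: {"decision": "block", "tool": "issue_refund", "agent_id": "refund-bot"}',
    "",
]

blocked = filter_events(parse_sse(SAMPLE_STREAM), decision="block")
```

The same filter_events helper accepts any combination of fields, for example decision="block", tool="issue_refund".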

Causal trace

Every run is a span tree — prompts, LLM turns, prior tool results, detector flags — so you can answer 'why did the agent try that?' not just 'what did it try?'
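
To make the "why did the agent try that?" question concrete, here is a minimal sketch that walks a span tree from a tool call back to its root. The Span fields and the kind labels are hypothetical stand-ins, not the AgentSpan schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    kind: str   # hypothetical labels: "prompt", "llm_turn", "tool_call"
    name: str


def causal_path(spans: List[Span], span_id: str) -> List[Span]:
    """Return the chain of spans from the root down to `span_id`."""
    by_id = {s.span_id: s for s in spans}
    path = []
    current = by_id.get(span_id)
    while current is not None:
        path.append(current)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return list(reversed(path))


spans = [
    Span("s1", None, "prompt", "user asks for refund"),
    Span("s2", "s1", "llm_turn", "plan"),
    Span("s3", "s2", "tool_call", "lookup_customer"),
    Span("s4", "s2", "tool_call", "issue_refund"),
]

path_names = [s.name for s in causal_path(spans, "s4")]
```

Reading the path top-down gives the causal history for a single call: the prompt, the LLM turn that planned it, then the call itself.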

Session replay

Tag a run with session_id to group an entire conversation. Replay a failing session to reproduce the misbehavior.
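
Grouping by session_id can be pictured as follows. This is an illustrative sketch over plain dictionaries; the started_at field and the run records are assumptions, and a real replay would read runs from Observe rather than from a local list.

```python
from collections import defaultdict


def group_by_session(runs):
    """Group run records by session_id, ordered by start time within a session."""
    sessions = defaultdict(list)
    for run in runs:
        sessions[run["session_id"]].append(run)
    for session_runs in sessions.values():
        session_runs.sort(key=lambda r: r["started_at"])
    return dict(sessions)


# Hypothetical run records; started_at is an assumed ordering field.
runs = [
    {"run_uid": "run_b", "session_id": "conv_8f3a2c", "started_at": 2},
    {"run_uid": "run_a", "session_id": "conv_8f3a2c", "started_at": 1},
    {"run_uid": "run_c", "session_id": "conv_other", "started_at": 3},
]

replay_order = [r["run_uid"] for r in group_by_session(runs)["conv_8f3a2c"]]
```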

Cross-product insights

Top agents by block count, top uncovered tools, top detector flags — queries that only work when Observe, Guardrails, and Evaluation share a data layer.
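
A query such as "top agents by block count" reduces to a filtered count over intercepted calls. The sketch below shows the idea on in-memory records; the field names and sample data are assumptions, not the AgentToolCalls schema.

```python
from collections import Counter


def top_agents_by_blocks(calls, n=3):
    """Count blocked calls per agent and return the n most frequent."""
    counts = Counter(c["agent_id"] for c in calls if c["decision"] == "block")
    return counts.most_common(n)


# Hypothetical intercepted calls.
tool_calls = [
    {"agent_id": "refund-bot", "tool": "issue_refund", "decision": "block"},
    {"agent_id": "refund-bot", "tool": "lookup_customer", "decision": "allow"},
    {"agent_id": "refund-bot", "tool": "issue_refund", "decision": "block"},
    {"agent_id": "support-agent", "tool": "delete_account", "decision": "block"},
]

top = top_agents_by_blocks(tool_calls)
```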

OpenTelemetry-shaped

AgentSpans match OTLP semantics. Import existing OTLP traces via /api/v1/traces, export to your SIEM if you need to.
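
For orientation, this sketch builds a small trace in the standard OTLP/JSON shape (resourceSpans, scopeSpans, spans, hex-encoded IDs). Whether /api/v1/traces expects exactly this encoding is an assumption, and the span names, scope name, and attributes are invented for the example. Nothing is sent over the network.

```python
import json
import secrets
import time


def otlp_span(name, trace_id, start_ns, end_ns, parent_span_id=None, attributes=None):
    """Build one span in OTLP/JSON shape (IDs are lowercase hex strings)."""
    span = {
        "traceId": trace_id,
        "spanId": secrets.token_hex(8),       # 8 bytes -> 16 hex chars
        "name": name,
        "startTimeUnixNano": str(start_ns),   # 64-bit ints are sent as strings
        "endTimeUnixNano": str(end_ns),
        "attributes": [
            {"key": k, "value": {"stringValue": str(v)}}
            for k, v in (attributes or {}).items()
        ],
    }
    if parent_span_id:
        span["parentSpanId"] = parent_span_id
    return span


def otlp_payload(service_name, spans):
    """Wrap spans in the resourceSpans / scopeSpans envelope."""
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}}
            ]},
            "scopeSpans": [{"scope": {"name": "example-exporter"}, "spans": spans}],
        }]
    }


trace_id = secrets.token_hex(16)              # 16 bytes -> 32 hex chars
now = time.time_ns()
root = otlp_span("agent_run", trace_id, now, now + 2_000_000)
child = otlp_span("tool_call lookup_customer", trace_id, now + 500_000, now + 1_500_000,
                  parent_span_id=root["spanId"],
                  attributes={"session_id": "conv_8f3a2c"})
body = json.dumps(otlp_payload("refund-bot", [root, child]))
```

The serialized body could then be POSTed to your deployment's /api/v1/traces endpoint; the required content type and authentication are not stated here, so confirm them in the docs first.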

From one line of Python to a live dashboard

The SDK sends events. The dispatcher writes them to Observe. The UI shows them in real time.

agent.py

import os

from bookbag import BookbagClient

client = BookbagClient(api_key=os.environ["BOOKBAG_API_KEY"])

# start_run registers the agent on first use and opens the run
run = client.agent.start_run(
    agent_id="refund-bot",
    metadata={"application_id": 7, "session_id": "conv_8f3a2c"},
)

# every tool_call and output under this run_uid feeds the Activity feed
client.agent.tool_call(run_uid=run["run_uid"], tool="lookup_customer",
                       arguments={"email": "jane@example.com"})
client.agent.output(run_uid=run["run_uid"], text="I've pulled your order history...")
client.agent.end_run(run_uid=run["run_uid"], outcome="success")
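
One way to make sure end_run is always sent, even when the agent raises, is to wrap the run in a context manager. This is a sketch under stated assumptions: observed_run is a hypothetical helper, the "error" outcome value is assumed (only "success" appears above), and _StubAgentAPI stands in for client.agent so the example runs without the SDK.

```python
from contextlib import contextmanager


class _StubAgentAPI:
    """Stand-in for client.agent that records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def start_run(self, agent_id, metadata=None):
        self.calls.append(("start_run", agent_id))
        return {"run_uid": "run_test"}

    def end_run(self, run_uid, outcome):
        self.calls.append(("end_run", run_uid, outcome))


@contextmanager
def observed_run(agent_api, agent_id, metadata=None):
    """Open a run, yield it, and close it with an outcome on exit."""
    run = agent_api.start_run(agent_id=agent_id, metadata=metadata or {})
    try:
        yield run
    except Exception:
        # "error" is an assumed outcome value, not confirmed by the SDK docs.
        agent_api.end_run(run_uid=run["run_uid"], outcome="error")
        raise
    else:
        agent_api.end_run(run_uid=run["run_uid"], outcome="success")


api = _StubAgentAPI()

with observed_run(api, "refund-bot", {"session_id": "conv_8f3a2c"}) as run:
    pass  # tool_call / output calls would go here

try:
    with observed_run(api, "refund-bot"):
        raise RuntimeError("tool failed")
except RuntimeError:
    pass
```

With the real SDK, passing client.agent in place of the stub should work as long as start_run and end_run keep the signatures shown above.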
