Control of the conversation
Hermes is one of the few voice stacks designed for outbound operator use, not consumer assistants. Latency budgets are workflow-friendly, and the audio path is observable end-to-end.
How to wire Hermes voice and messaging into a production operator that books, collects, follows up, and escalates — without a swivel-chair human in the middle.
The stack
Updated · 2026-05-21
Hermes carries the outbound voice and SMS conversation. ElevenLabs synthesizes the voice to match the brand it represents.
Anthropic Claude plans the call, decides what to say, and routes the next action — with explicit guardrails per workflow.
Glama presents every downstream tool (CRM, ticketing, scheduling, payments) over MCP, so the operator calls one uniform interface.
Stripe issues the payment intents and invoices that the operator can open and reconcile under its signed authority.
Supabase holds account state and conversation history. MongoDB stores the append-only action log with the model reasoning attached.
Anthropic Claude with a tightly scoped system prompt and tool surface produces fewer hallucinations on a known workflow than open-ended chat models. We pin the model version per workflow so behavior never drifts silently.
Glama as the tool gateway means the operator only learns one calling convention. New tools become one MCP entry — no per-vendor SDK churn inside the prompt.
Stripe payment intents leave a third-party receipt for every action the operator takes. The internal MongoDB log carries the same intent IDs, so finance reconciliation is two joins on the intent ID, not an investigation.
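As a rough illustration of that reconciliation, the sketch below matches action-log entries against Stripe records by payment intent ID and reports what disagrees. The record shapes, field names, and the `reconcile` helper are assumptions made for this example, not the actual MongoDB schema or Stripe response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical shape of one action-log entry."""
    intent_id: str
    amount_cents: int
    account_id: str


@dataclass(frozen=True)
class StripeRecord:
    """Hypothetical shape of one exported Stripe payment intent."""
    intent_id: str
    amount_cents: int
    status: str


def reconcile(log_entries, stripe_records):
    """Match both sources on intent ID and list the disagreements."""
    stripe_by_id = {r.intent_id: r for r in stripe_records}
    log_ids = {e.intent_id for e in log_entries}

    missing_in_stripe = []
    amount_mismatch = []
    for entry in log_entries:
        record = stripe_by_id.get(entry.intent_id)
        if record is None:
            # The operator logged an intent that Stripe does not know about.
            missing_in_stripe.append(entry.intent_id)
        elif record.amount_cents != entry.amount_cents:
            amount_mismatch.append(entry.intent_id)

    # Intents that exist in Stripe but never reached the action log.
    missing_in_log = [
        r.intent_id for r in stripe_records if r.intent_id not in log_ids
    ]
    return {
        "missing_in_stripe": missing_in_stripe,
        "amount_mismatch": amount_mismatch,
        "missing_in_log": missing_in_log,
    }
```

An empty result in all three lists means the two sources agree; anything else is a short, named list for finance to look at.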
Accounts-receivable follow-up where outbound voice closes faster than email-only sequences
Insurance, healthcare, and field-service scheduling where humans are dropping the phone on the floor
Quote follow-up in B2B distribution and construction where a five-minute call recovers a margin a chase email never does
After-hours intake for law firms and brokerages where the lead would otherwise go cold to a competitor by morning
Pin model version per workflow. Behavior drift is the single biggest risk in production voice operators; freeze the model and only roll the next version through your eval set.
Capture the full audio plus the reasoning trace for every escalation. Operators are only trusted because we can show why they did what they did.
Run Hermes against a tier ladder. Tier A (high-trust accounts) gets one cadence; Tier C (broken-promise accounts) gets another. The operator pulls tier from Supabase memory; the cadence lives in code, not in the model.
Use Stripe's idempotency keys for every payment intent. The operator must be safe to retry a tool call without doubling a charge.
Wire Glama's MCP audit feed straight into the MongoDB log. Every tool the operator touches lands in one immutable record.
Set hard escalation rules — exceeded retries, ambiguous intent, sensitive language — that route to a human operator with the full conversation context attached.
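Three of the practices above (cadence in code, retry-safe payment calls, hard escalation rules) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the cadence values, the Tier B row, the sensitive-term list, the thresholds, and every function name here are hypothetical, and the derived key is simply a value that could be supplied as the idempotency key on a Stripe request.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical cadence table. The tier comes from account state;
# the cadence itself is fixed here, outside the model's control.
CADENCE_BY_TIER = {
    "A": {"days_between_attempts": 7, "max_attempts": 2},
    "B": {"days_between_attempts": 3, "max_attempts": 4},
    "C": {"days_between_attempts": 1, "max_attempts": 6},
}


def cadence_for(tier: str) -> dict:
    """Look up the cadence for a tier; an unknown tier is an error."""
    try:
        return CADENCE_BY_TIER[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None


def idempotency_key(workflow: str, account_id: str,
                    invoice_id: str, amount_cents: int) -> str:
    """Derive the same key for the same logical charge, so a retried
    tool call reuses the key instead of creating a second charge."""
    raw = f"{workflow}|{account_id}|{invoice_id}|{amount_cents}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Hypothetical examples of language that should always reach a human.
SENSITIVE_TERMS = ("lawyer", "bankruptcy", "deceased")


@dataclass
class CallState:
    retries: int
    intent_confidence: float  # assumed 0.0 to 1.0 score from upstream
    transcript: str


def escalation_reason(state: CallState, max_retries: int = 3,
                      min_confidence: float = 0.6) -> Optional[str]:
    """Return why the call must go to a human, or None to continue."""
    if state.retries > max_retries:
        return "exceeded_retries"
    if state.intent_confidence < min_confidence:
        return "ambiguous_intent"
    lowered = state.transcript.lower()
    if any(term in lowered for term in SENSITIVE_TERMS):
        return "sensitive_language"
    return None
```

Keeping these checks as plain functions means they behave the same way regardless of which model version is pinned for the workflow.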
For a tightly scoped workflow — accounts-receivable follow-up, appointment confirmation, intake triage — yes. The operator handles the standard conversation. The five percent of calls that need a human get escalated with the full context attached, so the human picks up where the operator left off.
Claude's tool-use behavior is more predictable on long, multi-turn workflows where the operator needs to refuse out-of-scope requests. We use both depending on the workflow, but for production voice the AIMOCS default is Claude.
Hermes records and stores audio with explicit consent at call start. We hold transcripts and the reasoning trace in MongoDB inside the customer's region. For regulated industries we add a redaction pass before any data leaves the operator's container.
Every operator runs with a signed authority bar — a list of actions it is allowed to take without a human. If the model wants to step outside, it escalates instead. The audit log shows both the attempted action and the reason it was blocked.
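One way such an authority check could work is sketched below: the allowed-action list is signed with an HMAC, every requested action is checked against it, and both the attempt and the reason for any block are appended to an audit log. The signing scheme, the action names, and the `authorize` helper are assumptions for illustration, not a description of the production mechanism.

```python
import hashlib
import hmac
import json


def sign_authority(actions, secret: bytes) -> str:
    """Sign the allowed-action list so later tampering is detectable."""
    payload = json.dumps(sorted(actions)).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def authorize(action, actions, signature, secret: bytes, audit_log) -> str:
    """Return "allow" or "escalate" and record the attempt either way."""
    expected = sign_authority(actions, secret)
    if not hmac.compare_digest(expected, signature):
        # The list does not match what was signed; trust nothing on it.
        decision, reason = "escalate", "authority_signature_invalid"
    elif action not in actions:
        decision, reason = "escalate", "action_outside_authority"
    else:
        decision, reason = "allow", None
    audit_log.append(
        {"attempted_action": action, "decision": decision, "reason": reason}
    )
    return decision
```

Because the log entry is written before the decision is returned, a blocked action still leaves a record of what was attempted and why it was refused.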
Four to six weeks from kickoff to first money. Week one is workflow mapping, weeks two and three are integration, week four runs the operator in shadow mode against your real data, and weeks five and six are a graduated handover.
We don't advise on AI. We run it for you.