Conversational AI & Voice Agents

Where conversational AI actually works

Voice is the interface most businesses still solve with people, scripts, and shift rotas. That is where a well designed voice AI agent creates immediate leverage. I design and ship voice systems that handle real conversations end to end, from the first ring to the structured record that lands in a database once the call is over.

The stack I default to is ElevenLabs for natural, low latency speech; Twilio for telephony, SMS fallback, and DTMF handling; and n8n as the orchestration layer that connects the agent to the rest of the business. For workloads where cost per minute matters more than voice quality, I move parts of the stack to Vonage or Vapi. The choice is a design decision, not a preference.

Why the market is finally ready

Voice AI latency dropped below the 800 millisecond conversational threshold across the major providers during 2024. That is the single change that moved voice AI out of demo territory and into production use for real customer facing conversations.

The addressable market is very large. Analyst estimates put global outbound calling operations at over 250 billion dollars annually, and inbound support at similar scale. A voice agent that reliably handles even 20 percent of that volume is a substantial commercial system.

Conversational design as engineering

A voice agent lives or dies on how it handles the boring middle of a call. Interruptions, silence, voicemail detection, transfer, escalation, dead air, callback logic. I treat each of these as a first class engineering concern with clear branching, retries, and logged outcomes, not as prompt tricks.
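
As a rough illustration of what "clear branching, retries, and logged outcomes" can look like, here is a minimal sketch of a per call turn policy. The event names, action names, and the retry limit of two are all hypothetical choices made for this example, not the configuration of any shipped system.

```python
from dataclasses import dataclass, field
from enum import Enum


class Event(Enum):
    """Things the agent can observe on a turn (illustrative names)."""
    SPEECH = "speech"
    SILENCE = "silence"
    VOICEMAIL = "voicemail"
    HUMAN_REQUESTED = "human_requested"


class Action(Enum):
    """What the agent does next (illustrative names)."""
    CONTINUE = "continue"
    REPROMPT = "reprompt"
    LEAVE_MESSAGE_AND_SMS = "leave_message_and_sms"
    TRANSFER = "transfer"
    SCHEDULE_CALLBACK = "schedule_callback"


@dataclass
class TurnPolicy:
    # Assumed limit: reprompt twice on dead air, then give up and call back.
    max_silence_retries: int = 2
    silence_count: int = 0
    log: list = field(default_factory=list)

    def decide(self, event: Event) -> Action:
        """Map one observed event to one action and log the pair."""
        if event is Event.VOICEMAIL:
            action = Action.LEAVE_MESSAGE_AND_SMS
        elif event is Event.HUMAN_REQUESTED:
            action = Action.TRANSFER
        elif event is Event.SILENCE:
            self.silence_count += 1
            if self.silence_count > self.max_silence_retries:
                action = Action.SCHEDULE_CALLBACK
            else:
                action = Action.REPROMPT
        else:
            # The caller spoke, so the silence counter starts over.
            self.silence_count = 0
            action = Action.CONTINUE
        self.log.append((event.value, action.value))
        return action
```

The point of the sketch is that every branch is explicit and every decision leaves a log entry, so an odd call can be replayed from its log rather than guessed at from the transcript.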

Every conversation ends with a structured record. Call duration, intent captured, action taken, payment collected, transfer target, and any human review flags all get written back to Supabase so the business can see the operation, not just the transcript.
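
A structured record of this kind might be modelled as follows. This is a sketch only: the field names and types are assumptions chosen to mirror the list above, not the actual table schema, and the step that sends the row to Supabase is deliberately left out.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class CallRecord:
    """One row per finished call. All field names are hypothetical."""
    call_id: str
    duration_seconds: int
    intent: str
    action_taken: str
    payment_collected: bool = False
    transfer_target: Optional[str] = None
    review_flags: List[str] = field(default_factory=list)

    def to_row(self) -> dict:
        """Validate and return a plain dict ready for a database insert."""
        if self.duration_seconds < 0:
            raise ValueError("duration_seconds must not be negative")
        return asdict(self)


# Example: an outbound call that ended in a payment and needs no review.
record = CallRecord(
    call_id="example-call-001",
    duration_seconds=212,
    intent="settle_balance",
    action_taken="payment_captured",
    payment_collected=True,
)
row = record.to_row()
```

Keeping the record as a typed object rather than free text is what makes it possible to report on the operation (outcomes per day, transfers per department) without parsing transcripts.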

What clients bring me in for

Most engagements start with a specific problem. Outbound collections that need to scale without a bigger call floor. Inbound routing across brands and departments. Voice ordering that has to talk to a live commerce backend. Multilingual qualification for a subsidy or benefit programme. After hours and weekend coverage for a local services business. I have shipped each of these into production.

If you are evaluating voice AI and want a candid view of what to build in house, what to buy, and where the real risk sits, that is the kind of consulting call I take.

Recently shipped

Outbound debt collection for a telecom provider. Voicemail detection, SMS fallback, DTMF payment capture on the call itself, multi brand routing, full call logging back to Supabase.

Voice ordering for a retail brand talking directly to the Shopify GraphQL API. Orders placed by phone land in the same order management flow as web orders, with no re-entry step.

A conversational receptionist replacing a traditional IVR menu across seven departments, with an embedded knowledge layer that resolves simple queries before any transfer.

A multilingual qualification agent for an energy subsidy programme, built and tested across two languages.

How engagements are priced

Voice AI builds are usually fixed price at the pilot stage, then move to a per minute usage model once the agent is live, calculated with a healthy margin over telephony and AI cost. For local services and small businesses I run a low setup fee plus a per minute model that scales with real usage.

Frequently asked

What does a production voice AI agent actually cost to run?

Cost is dominated by voice synthesis and telephony per minute, not by the LLM. ElevenLabs plus Twilio typically lands between $0.10 and $0.25 per minute at production quality. Cheaper Vonage based stacks come in lower when voice fidelity matters less than volume.

How long does it take to ship a voice agent to production?

A focused single use case agent (inbound routing, outbound qualification, one payment flow) is usually 3 to 6 weeks from scope to live traffic. Multi brand or multi intent systems take longer, mostly on conversation design and edge cases.

Can a voice AI agent take phone payments securely?

Yes. With Twilio Pay and DTMF capture, the caller keys in card details directly on the phone keypad. Those digits never touch the LLM or your logs, which keeps the flow PCI friendly by design.
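
To make the hand off concrete, the sketch below builds the kind of TwiML response a webhook could return when the agent passes the call to a payment step. It uses only the Python standard library. The attribute names (`chargeAmount`, `paymentConnector`, `action`) follow Twilio's `<Pay>` verb as I understand it and should be checked against the current Twilio documentation; the amount, connector name, and callback URL are placeholders, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_pay_twiml(amount: str, action_url: str, connector: str = "Default") -> str:
    """Return a TwiML document that asks Twilio to collect a card payment.

    The LLM only triggers this step; it never sees the digits the caller enters.
    """
    response = ET.Element("Response")
    ET.SubElement(
        response,
        "Pay",
        {
            "chargeAmount": amount,         # placeholder amount, as a string
            "paymentConnector": connector,  # assumed name of a configured Pay Connector
            "action": action_url,           # placeholder URL called when capture finishes
        },
    )
    return ET.tostring(response, encoding="unicode")


twiml = build_pay_twiml("20.45", "https://example.com/payment-complete")
```

The design choice worth noting is the boundary: the agent decides that a payment should happen and for how much, while card capture runs entirely inside the telephony provider.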

Do you build in house or use platforms like Vapi and Retell?

Both. For fast validation or low risk workflows, a platform like Vapi is a fair choice. For production systems where you own the data model, cost structure, and telephony behaviour, an ElevenLabs plus Twilio plus n8n stack is more durable.

Estimate the impact

A first pass calculator for this practice area. Useful for the first budget conversation, not a quote.

Estimate the run rate

A rough view of what a production voice agent costs to keep on the phone.

Two levers dominate voice AI economics: minutes and the voice engine you choose. The example below shows the blended monthly run rate for one set of inputs.

Call minutes per month: 5,000 min (range 200 to 40,000)

Voice engine cost: £0.12 / min (range £0.05 to £0.35)

Calls that reach a valuable outcome: 22 % (range 5 to 60 %)

Results for these inputs

Voice engine: £600 / mo (ElevenLabs, priced per minute)
Telephony: £70 / mo (Twilio inbound plus outbound blend)
Orchestration and storage: £95 / mo (n8n plus Supabase)
Blended monthly run rate: £765
Valuable outcomes captured: 275 / mo (payment, transfer, qualification, callback)

Useful for the first budget conversation. Real quotes tune the voice engine, telephony route, and orchestration layer to the specific workload.
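
The arithmetic behind the example figures can be sketched in a few lines. Three numbers here are assumptions back derived from the displayed results rather than stated rates: telephony at £0.014 per minute (£70 over 5,000 minutes), orchestration and storage as a flat £95, and an average call length of 4 minutes (275 outcomes at 22 % implies 1,250 calls). The function name is hypothetical and the real calculator may scale these differently.

```python
def monthly_run_rate(
    minutes: float,
    voice_cost_per_min: float,
    outcome_rate: float,
    telephony_per_min: float = 0.014,  # assumed: 70 / 5,000 from the example
    orchestration_flat: float = 95.0,  # assumed flat monthly cost
    avg_call_minutes: float = 4.0,     # assumed: inferred from 275 outcomes at 22 %
) -> dict:
    """Estimate monthly cost in GBP and the number of valuable outcomes."""
    voice = minutes * voice_cost_per_min
    telephony = minutes * telephony_per_min
    total = voice + telephony + orchestration_flat
    calls = minutes / avg_call_minutes
    return {
        "voice": round(voice, 2),
        "telephony": round(telephony, 2),
        "orchestration": round(orchestration_flat, 2),
        "total": round(total, 2),
        "outcomes": round(calls * outcome_rate),
    }


# The inputs from the example: 5,000 minutes, £0.12 per minute, 22 % outcome rate.
estimate = monthly_run_rate(5000, 0.12, 0.22)
```

With these assumptions the sketch reproduces the example: £600 voice, £70 telephony, £95 orchestration, £765 in total, and 275 valuable outcomes a month.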

Other pillars

Continue reading

AI Automation Systems

Self hosted n8n and Supabase systems that quietly run the parts of a business no one wants to touch again.

AI Product & SaaS Development

Shipping AI powered products from prompt to pricing page, with a founder's judgement on what to build and what to skip.

AI Support & Ops Systems

Ticket triage, agentic ops, and internal tools that let a small team behave like a much larger one.