Where conversational AI actually works
Voice is the interface most businesses still solve with people, scripts, and shift rotas. That is where a well-designed voice AI agent creates immediate leverage. I design and ship voice systems that handle real conversations end to end, from the first ring to the structured record that lands in a database once the call is over.
The stack I default to is ElevenLabs for natural, low-latency speech; Twilio for telephony, SMS fallback, and DTMF handling; and n8n as the orchestration layer that connects the agent to the rest of the business. For workloads where cost per minute matters more than voice quality, I move parts of the stack to Vonage. The choice is a design decision, not a preference.
Conversational design as engineering
A voice agent lives or dies on how it handles the boring middle of a call: interruptions, silence, voicemail detection, transfer, escalation, dead air, and callback logic. I treat each of these as a first-class engineering concern with clear branching, retries, and logged outcomes, not as prompt tricks.
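As an illustration of what "clear branching, retries, and logged outcomes" can mean in practice, here is a minimal sketch in Python. The event names, action labels, and the retry budget of three attempts are all hypothetical, chosen for the example rather than taken from any production system.

```python
from dataclasses import dataclass
from enum import Enum


class CallEvent(Enum):
    """Hypothetical mid-call events an orchestrator might receive."""
    VOICEMAIL = "voicemail"
    NO_ANSWER = "no_answer"
    DEAD_AIR = "dead_air"
    ESCALATION_REQUESTED = "escalation_requested"
    COMPLETED = "completed"


@dataclass(frozen=True)
class Decision:
    action: str       # what the orchestrator should do next
    retry: bool       # whether another attempt is allowed
    log_outcome: str  # label written to the call log


MAX_ATTEMPTS = 3  # assumed retry budget, not a recommendation


def decide(event: CallEvent, attempt: int) -> Decision:
    """Map an event and a 1-based attempt number to an explicit decision."""
    can_retry = attempt < MAX_ATTEMPTS
    if event is CallEvent.COMPLETED:
        return Decision("close_call", False, "completed")
    if event is CallEvent.ESCALATION_REQUESTED:
        return Decision("transfer_to_human", False, "escalated")
    if event is CallEvent.DEAD_AIR:
        action = "reprompt_caller" if can_retry else "hang_up"
        return Decision(action, can_retry, "dead_air")
    # Voicemail and no-answer share the same callback policy in this sketch.
    action = "schedule_callback" if can_retry else "flag_for_review"
    return Decision(action, can_retry, event.value)
```

Every branch returns a decision with a log label, so no outcome is left implicit. A real system would likely add backoff timing and per-campaign limits on top of this.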
Every conversation ends with a structured record. Call duration, intent captured, action taken, payment collected, transfer target, and any human review flags all get written back to Supabase so the business can see the operation, not just the transcript.
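A structured record of this kind could be modelled roughly as follows. This is a sketch only: the field names, the cents-based payment amount, and the derived `needs_human_review` column are assumptions for illustration, not the actual schema.

```python
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class CallRecord:
    """Hypothetical shape of the row written after each call."""
    call_id: str
    duration_seconds: int
    intent: str
    action_taken: str
    payment_collected_cents: int = 0       # assumed unit: minor currency units
    transfer_target: Optional[str] = None  # None when the call was not transferred
    review_flags: List[str] = field(default_factory=list)

    def to_row(self) -> dict:
        """Flatten to a plain dict suitable for a database insert."""
        row = asdict(self)
        # Derived column so reviewers can filter without inspecting the flag list.
        row["needs_human_review"] = bool(self.review_flags)
        return row


record = CallRecord(
    call_id="call-001",
    duration_seconds=184,
    intent="pay_invoice",
    action_taken="payment_link_sent",
    review_flags=["caller_disputed_amount"],
)
row = record.to_row()
# With the supabase-py client, a dict like this would typically be written with
# client.table("call_records").insert(row).execute(); the table name is assumed.
```

Keeping the record as a typed object until the last step makes it harder for a call to end without every field being accounted for.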
What clients bring me in for
Most engagements start with a specific problem. Outbound collections that need to scale without a bigger call floor. Inbound routing across brands and departments. Voice ordering that has to talk to a live commerce backend. Multilingual qualification for a subsidy or benefit programme. I have shipped each of these into production.
If you are evaluating voice AI and want a candid view of what to build in-house, what to buy, and where the real risk sits, that is the kind of consulting call I take.
What does a production voice AI agent actually cost to run?
Cost is dominated by per-minute voice synthesis and telephony charges, not by the LLM. ElevenLabs plus Twilio typically lands between $0.10 and $0.25 per minute at production quality. Cheaper Vonage-based stacks come in lower when voice fidelity matters less than volume.
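To turn the per-minute range into a monthly figure, a quick back-of-the-envelope calculation helps. The call volume and average call length below are made-up inputs; only the $0.10 to $0.25 range comes from the answer above.

```python
LOW_RATE_USD = 0.10   # lower bound of the per-minute range quoted above
HIGH_RATE_USD = 0.25  # upper bound of the per-minute range quoted above


def monthly_cost_range(calls_per_month: int, avg_minutes_per_call: float) -> tuple:
    """Return (low, high) estimated monthly spend in USD."""
    minutes = calls_per_month * avg_minutes_per_call
    return round(minutes * LOW_RATE_USD, 2), round(minutes * HIGH_RATE_USD, 2)


# Hypothetical workload: 5,000 calls a month averaging 3 minutes each.
low, high = monthly_cost_range(5000, 3)
print(f"Estimated monthly cost: ${low:,.2f} to ${high:,.2f}")
# prints "Estimated monthly cost: $1,500.00 to $3,750.00"
```

The estimate ignores LLM usage, phone number rental, and any platform fees, so it is a floor for budgeting rather than a quote.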
How long does it take to ship a voice agent to production?
A focused, single-use-case agent (inbound routing, outbound qualification, one payment flow) usually takes 3 to 6 weeks from scope to live traffic. Multi-brand or multi-intent systems take longer, with most of the extra time going to conversation design and edge cases.
Can a voice AI agent take phone payments securely?
Yes. With Twilio Pay and DTMF capture, the caller enters card details directly on the keypad, so they never touch the LLM or your logs. That keeps the flow PCI-friendly by design.
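For a sense of what the hand-off to Twilio looks like, the sketch below builds a TwiML response containing a `<Pay>` verb using only the Python standard library. The amount, the callback URL, and the helper name `build_pay_twiml` are placeholders, and the payment connector name depends on what is configured in the Twilio account, so treat every value here as an assumption.

```python
import xml.etree.ElementTree as ET


def build_pay_twiml(charge_amount: str, action_url: str, connector: str = "Default") -> str:
    """Return TwiML that asks Twilio to collect card details over DTMF.

    The agent only hands the call to <Pay>; the keypad digits are captured
    by Twilio and are not passed through the conversation layer.
    """
    response = ET.Element("Response")
    ET.SubElement(
        response,
        "Pay",
        {
            "input": "dtmf",                 # keypad entry rather than speech
            "chargeAmount": charge_amount,   # placeholder amount
            "paymentConnector": connector,   # assumed connector name
            "action": action_url,            # placeholder webhook for the result
        },
    )
    return ET.tostring(response, encoding="unicode")


twiml = build_pay_twiml("49.99", "https://example.com/payment-complete")
```

The webhook at `action` would then receive the payment result, which is the point where an outcome can be written to the call record without any card data attached.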
Do you build in-house or use platforms like Vapi and Retell?
Both. For fast validation or low-risk workflows, a platform like Vapi is a fair choice. For production systems where you own the data model, cost structure, and telephony behaviour, an ElevenLabs plus Twilio plus n8n stack is more durable.