Play 04
Call Center Voice AI
High✅ Ready
Voice-enabled customer service with real-time STT→LLM→TTS streaming.
Build a phone-answering AI agent. Azure Communication Services handles the call, Speech Service converts audio to text, GPT-4o processes the intent and generates a response, then TTS speaks it back — all streaming in real time. Includes escalation triggers, PII redaction, and call recording consent flows.
Architecture Pattern
STT→LLM→TTS streaming pipeline, intent detection, escalation
Azure Services
Communication ServicesAI Speech (STT + TTS)Azure OpenAI (gpt-4o)Container AppsContent Safety
DevKit (.github Agentic OS)
- agent.md — root orchestrator with builder→reviewer→tuner handoffs
- 3 agents — Voice AI Builder (gpt-4o), Reviewer (gpt-4o-mini), Tuner (gpt-4o-mini)
- 3 skills — deploy (107 lines), evaluate (102 lines), tune (114 lines)
- 4 prompts — /deploy, /test, /review, /evaluate with agent routing
- .vscode/mcp.json — FrootAI MCP with Speech + Communication keys + envFile
TuneKit (AI Config)
- config/openai.json — model selection, temperature for voice
- config/guardrails.json — PII redaction, consent, profanity filter
- config/agents.json — escalation triggers, hold music
- config/model-comparison.json — GPT-4o vs GPT-4o-mini latency
Tuning Parameters
Speech config (language, speed)Grounding promptsFallback chainsResponse latency targetsAudio encoding
Estimated Cost
Dev/Test
$200–400/mo
Production
$2.5K–10K/mo