Play 04

Call Center Voice AI

High✅ Ready

Voice-enabled customer service with real-time STT→LLM→TTS streaming.

Build a phone-answering AI agent. Azure Communication Services handles the call, Speech Service converts audio to text, GPT-4o processes the intent and generates a response, then TTS speaks it back — all streaming in real time. Includes escalation triggers, PII redaction, and call recording consent flows.

Architecture Pattern

STT→LLM→TTS streaming pipeline, intent detection, escalation

Azure Services

Communication ServicesAI Speech (STT + TTS)Azure OpenAI (gpt-4o)Container AppsContent Safety

DevKit (.github Agentic OS)

agent.md — root orchestrator with builder→reviewer→tuner handoffs
3 agents — Voice AI Builder (gpt-4o), Reviewer (gpt-4o-mini), Tuner (gpt-4o-mini)
3 skills — deploy (107 lines), evaluate (102 lines), tune (114 lines)
4 prompts — /deploy, /test, /review, /evaluate with agent routing
.vscode/mcp.json — FrootAI MCP with Speech + Communication keys + envFile

TuneKit (AI Config)

config/openai.json — model selection, temperature for voice
config/guardrails.json — PII redaction, consent, profanity filter
config/agents.json — escalation triggers, hold music
config/model-comparison.json — GPT-4o vs GPT-4o-mini latency

Tuning Parameters

Speech config (language, speed)Grounding promptsFallback chainsResponse latency targetsAudio encoding

Estimated Cost

Dev/Test

$200–400/mo

Production

$2.5K–10K/mo

User Guide Open in VS Code View on GitHub Setup Guide Configurator Ask Agent FAI Back to FrootAI