Play 96
Real-Time Voice Agent v2
Very High
✅ Ready
Next-gen bidirectional voice agent with sub-200ms latency and MCP tool integration.
Next-generation bidirectional WebSocket voice agent built on the Azure AI Voice Live SDK, with MCP tool integration, voice activity detection (VAD), function calling during conversation, avatar rendering, and real-time transcription. It targets sub-200 ms response latency so that conversation feels natural.
Architecture Pattern
Voice agent loop: audio capture → VAD → STT → LLM reasoning → function calling → TTS → avatar rendering → transcription
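The loop above can be sketched as a chain of pluggable stages. This is a minimal illustration only, not the Voice Live SDK: the `Turn` structure, the `run_turn` function, the toy byte-threshold VAD, and the `CALL <name>` function-calling convention are all assumptions made for the example. Avatar rendering is left out, and the two text fields stand in for the transcript.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Turn:
    """State carried through one pass of the loop (hypothetical structure)."""
    audio: bytes
    user_text: str = ""
    reply_text: str = ""
    reply_audio: bytes = b""
    trace: List[str] = field(default_factory=list)


def is_speech(frame: bytes, threshold: int = 10) -> bool:
    """Toy VAD: a frame counts as speech if any byte exceeds the threshold."""
    return any(b > threshold for b in frame)


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
    tools: Optional[Dict[str, Callable[[], str]]] = None,
) -> Optional[Turn]:
    """Run one VAD -> STT -> LLM -> function call -> TTS pass.

    Returns None when the VAD finds no speech, so silence never reaches the LLM.
    """
    if not is_speech(audio):
        return None
    turn = Turn(audio=audio, trace=["vad"])

    turn.user_text = stt(audio)
    turn.trace.append("stt")

    reply = llm(turn.user_text)
    turn.trace.append("llm")

    # Assumed convention: a reply of the form "CALL <name>" requests a tool.
    if tools and reply.startswith("CALL "):
        name = reply[len("CALL "):].strip()
        if name in tools:
            # Feed the tool result back to the model for the final answer.
            reply = llm(f"tool {name} returned: {tools[name]()}")
        else:
            reply = f"Unknown tool: {name}"
        turn.trace.append("tool")

    turn.reply_text = reply
    turn.reply_audio = tts(reply)
    turn.trace.append("tts")
    return turn
```

In a real deployment each callable would wrap a streaming service call, and the stages would overlap rather than run strictly in sequence, which is how a sub-200 ms budget becomes plausible.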
Azure Services
- Azure AI Voice Live
- Azure OpenAI
- Azure Container Apps
- Azure Functions
- Azure Cosmos DB
DevKit (.github Agentic OS)
- agent.md — root orchestrator with builder→reviewer→tuner handoffs
- 3 agents — Voice V2 Builder (gpt-4o), Reviewer (gpt-4o-mini), Tuner (gpt-4o-mini)
- 3 skills — deploy (248 lines), evaluate (120 lines), tune (235 lines)
- 4 prompts — /deploy, /test, /review, /evaluate with agent routing
- .vscode/mcp.json — FrootAI MCP with OpenAI + Speech inputs + envFile
TuneKit (AI Config)
- config/openai.json - conversational prompts and function schemas
- config/voice.json - VAD mode, latency targets, avatar quality
- config/guardrails.json - response latency SLA, safety thresholds
- evaluation/eval.py - P95 latency < 200 ms, user satisfaction > 4.2
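The two evaluation targets can be checked with a few lines of Python. The sketch below is not the contents of evaluation/eval.py. It assumes latencies are collected in milliseconds, that satisfaction is a mean of per-session ratings (the rating scale is not stated in the source), and that the nearest-rank method is an acceptable way to compute P95. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence

LATENCY_P95_MS = 200.0    # target taken from the list above
MIN_SATISFACTION = 4.2    # target taken from the list above; scale is assumed


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Smallest sample such that at least pct percent of samples are <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def evaluate(latencies_ms: Sequence[float], ratings: Sequence[float]) -> Dict[str, object]:
    """Compare measured P95 latency and mean rating against both targets."""
    p95 = percentile_nearest_rank(latencies_ms, 95)
    mean_rating = sum(ratings) / len(ratings)
    return {
        "p95_ms": p95,
        "mean_rating": mean_rating,
        "passed": p95 < LATENCY_P95_MS and mean_rating > MIN_SATISFACTION,
    }
```

Checking P95 rather than the mean means a run fails once more than 5% of responses are slow, even if the average still looks healthy.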
Tuning Parameters
- Voice activity detection mode
- Response latency target
- Function calling timeout
- Avatar rendering quality
- Transcription language
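One way to keep these five parameters honest is to load them into a validated, typed object before the agent starts. The sketch below is illustrative only: the field names, the default values, and the allowed choices for VAD mode and avatar quality are assumptions, not the actual schema of config/voice.json.

```python
from dataclasses import dataclass, fields

# Placeholder choices; the real config files may use different names.
VAD_MODES = {"server_vad", "semantic_vad"}
AVATAR_QUALITIES = {"low", "medium", "high"}


@dataclass(frozen=True)
class VoiceTuning:
    """The five tuning parameters, with assumed names and defaults."""
    vad_mode: str = "server_vad"
    response_latency_target_ms: int = 200
    function_call_timeout_s: float = 5.0
    avatar_quality: str = "medium"
    transcription_language: str = "en-US"

    def __post_init__(self) -> None:
        if self.vad_mode not in VAD_MODES:
            raise ValueError(f"unknown vad_mode: {self.vad_mode!r}")
        if self.avatar_quality not in AVATAR_QUALITIES:
            raise ValueError(f"unknown avatar_quality: {self.avatar_quality!r}")
        if self.response_latency_target_ms <= 0:
            raise ValueError("response_latency_target_ms must be positive")
        if self.function_call_timeout_s <= 0:
            raise ValueError("function_call_timeout_s must be positive")
        if not self.transcription_language:
            raise ValueError("transcription_language must not be empty")

    @classmethod
    def from_dict(cls, raw: dict) -> "VoiceTuning":
        """Build from a parsed JSON object, rejecting unrecognized keys."""
        unknown = set(raw) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown keys: {sorted(unknown)}")
        return cls(**raw)
```

Rejecting unknown keys catches typos such as a misspelled parameter name, which would otherwise fall back silently to a default.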
Estimated Cost
Dev/Test: $150-350/mo
Production: $5K-15K/mo