Play 44
Foundry Local On-Device
High · ✅ Ready
On-device LLM inference for air-gapped environments with cloud escalation.
On-device LLM inference for air-gapped and data-sovereign environments: a local model handles routine queries while complex reasoning escalates to the cloud, with automatic fallback, sync, and fleet management via IoT Hub. Supports fully disconnected operation with queued sync. Ideal for manufacturing floors, field operations, and government classified environments.
Architecture Pattern
Hybrid local-cloud inference: confidence-based escalation, offline queue, fleet sync
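The escalation flow above can be sketched as follows. This is a minimal illustration, not the play's actual implementation: the threshold value, the `local_infer`/`cloud_infer` stubs, and the queue handling are all hypothetical placeholders for the real Foundry Local and Azure OpenAI calls.

```python
from collections import deque

ESCALATION_THRESHOLD = 0.75  # hypothetical tuning value (see Tuning Parameters)

offline_queue = deque()  # queries held for later sync while the cloud is unreachable

def local_infer(query):
    # Stub for the on-device model; returns (answer, confidence).
    # Here, short queries are treated as "routine" (high confidence).
    return f"local:{query}", 0.9 if len(query) < 40 else 0.5

def cloud_infer(query):
    # Stub for the Azure OpenAI call; a real client would raise on network failure.
    return f"cloud:{query}"

def answer(query, online=True):
    """Confidence-based escalation with an offline queue."""
    result, confidence = local_infer(query)
    if confidence >= ESCALATION_THRESHOLD:
        return result                  # routine query: stay on-device
    if online:
        try:
            return cloud_infer(query)  # complex query: escalate to cloud
        except ConnectionError:
            pass                       # fall through to offline handling
    offline_queue.append(query)        # disconnected: queue for later sync
    return result                      # serve the best local answer meanwhile
```

When disconnected, the device keeps answering from the local model and drains the queue once connectivity returns, which is what makes the pattern viable for air-gapped floors.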
Azure Services
Azure OpenAIAzure IoT HubAzure MonitorKey Vault
DevKit (.github Agentic OS)
- agent.md — root orchestrator with builder→reviewer→tuner handoffs
- 3 agents — Foundry Local Builder (gpt-4o), Reviewer (gpt-4o-mini), Tuner (gpt-4o-mini)
- 3 skills — deploy (246 lines), evaluate (178 lines), tune (233 lines)
- 4 prompts — /deploy, /test, /review, /evaluate with agent routing
- .vscode/mcp.json — FrootAI MCP with OpenAI key + cache path inputs + envFile
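A `.vscode/mcp.json` along these lines could wire up the inputs and envFile mentioned above. The server command, package name, and input ids are hypothetical; only the overall `servers`/`inputs` shape follows VS Code's MCP configuration format.

```json
{
  "inputs": [
    {
      "id": "openai-api-key",
      "type": "promptString",
      "description": "OpenAI API key",
      "password": true
    },
    {
      "id": "cache-path",
      "type": "promptString",
      "description": "Local model cache path"
    }
  ],
  "servers": {
    "frootai": {
      "command": "npx",
      "args": ["-y", "frootai-mcp"],
      "env": {
        "OPENAI_API_KEY": "${input:openai-api-key}",
        "MODEL_CACHE_PATH": "${input:cache-path}"
      },
      "envFile": "${workspaceFolder}/.env"
    }
  }
}
```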
TuneKit (AI Config)
- config/openai.json — gpt-4o for cloud, local model config
- config/edge.json — escalation threshold, sync interval, memory budget
- config/guardrails.json — model validation, inference safety
- evaluation/eval.py — Local accuracy >80%, Escalation rate <20%
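A harness enforcing those two gates might look like the sketch below. This is not the actual `evaluation/eval.py`; the record shape and function name are assumptions, but the pass criteria mirror the stated thresholds (local accuracy >80%, escalation rate <20%).

```python
def evaluate(records):
    """records: list of (handled_locally: bool, correct: bool), one per query."""
    local = [correct for handled_locally, correct in records if handled_locally]
    local_accuracy = sum(local) / max(len(local), 1)
    escalation_rate = 1 - len(local) / len(records)
    return {
        "local_accuracy": local_accuracy,
        "escalation_rate": escalation_rate,
        "passed": local_accuracy > 0.80 and escalation_rate < 0.20,
    }
```

For example, a run where 9 of 10 queries stay local and 8 of those 9 are correct passes both gates (accuracy ≈ 0.89, escalation 0.10).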
Tuning Parameters
- Local model threshold
- Cloud escalation confidence
- Sync interval
- Device memory budget
- Queue retention policy
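As a rough illustration, these knobs could surface in `config/edge.json` roughly as follows. The field names and values are hypothetical, shown only to make the parameters concrete.

```json
{
  "local_model_threshold": 0.75,
  "cloud_escalation_confidence": 0.60,
  "sync_interval_seconds": 300,
  "device_memory_budget_mb": 4096,
  "queue_retention_hours": 72
}
```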
Estimated Cost
- Dev/Test: $50–120/mo
- Production: $1K–4K/mo