Play 19
Edge AI Phi-4
High · 🔧 Skeleton
Deploy Phi-4 SLM on edge devices with ONNX quantization and offline inference.
Run a small language model (SLM) on edge devices without cloud connectivity. Phi-4, quantized to INT4 via ONNX Runtime, runs on devices with 4 GB+ RAM. IoT Hub manages the device fleet, syncs model updates, and collects telemetry. The system supports fully offline inference, with periodic cloud sync for model updates.
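To see why INT4 quantization is what makes a 4 GB RAM budget workable, here is a rough memory-footprint estimator. A minimal sketch: the ~3.8B parameter count (Phi-4-mini) and the 1.3× runtime-overhead factor for KV cache and buffers are illustrative assumptions, not figures from this play.

```python
# Rough RAM estimate for a quantized SLM on an edge device.
# Assumptions (illustrative): ~3.8B parameters (Phi-4-mini class model),
# 1.3x overhead for KV cache, activations, and runtime buffers.

def model_memory_gb(params_billion: float, bits_per_weight: int,
                    overhead: float = 1.3) -> float:
    """Estimated RAM in GB: weights at the given bit width times an overhead factor."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# Compare 4-bit, 8-bit, and 16-bit weights for a ~3.8B-parameter model.
for bits in (4, 8, 16):
    print(f"{bits:>2}-bit: {model_memory_gb(3.8, bits):.1f} GB")
```

Under these assumptions only the 4-bit variant (~2.5 GB) fits comfortably on a 4 GB device; 8-bit (~4.9 GB) already exceeds it.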
Architecture Pattern
Edge AI, SLM, ONNX quantization, offline inference, device sync
Azure Services
IoT Hub · Container Instances · ONNX Runtime · Azure Storage
DevKit (.github Agentic OS)
- agent.md — root orchestrator with builder→reviewer→tuner handoffs
- 3 agents — Edge AI Builder (gpt-4o), Reviewer (gpt-4o-mini), Tuner (gpt-4o-mini)
- 3 skills — deploy (115 lines), evaluate (100 lines), tune (112 lines)
- 4 prompts — /deploy, /test, /review, /evaluate with agent routing
- .vscode/mcp.json — FrootAI MCP with IoT Hub + HuggingFace inputs + envFile
TuneKit (AI Config)
- config/edge.json — quantization level, model config, memory constraints
- config/sync.json — update schedule, rollback rules
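The two TuneKit files above might look like the following sketch (merged here for brevity). All field names are hypothetical; the play only specifies which concerns each file owns, not a schema.

```json
{
  "edge": {
    "quantization": "int4",
    "model": { "name": "phi-4-mini", "context_length": 4096 },
    "memory_budget_mb": 3072
  },
  "sync": {
    "schedule_cron": "0 3 * * *",
    "rollback": { "max_error_rate": 0.05, "min_healthy_inferences": 100 }
  }
}
```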
Tuning Parameters
Quantization level (INT4/INT8) · Model config · Sync schedule · Device memory budget
Estimated Cost
Dev/Test
$20–50/mo
Production
$100–500/mo