Play 06
Document Intelligence
Medium🔧 Skeleton
Extract, classify, and structure document data with OCR + LLM.
Feed PDFs, invoices, receipts, and forms into Azure Document Intelligence for OCR, then GPT-4o extracts structured fields into typed JSON. Cosmos DB stores results. The pipeline handles multi-page documents, handwriting, tables, and stamps. PII masking built in.
Architecture Pattern
OCR+LLM extraction, structured output, form recognition
Azure Services
Document IntelligenceAzure OpenAI (gpt-4o)Blob StorageCosmos DB
DevKit (.github Agentic OS)
- agent.md — root orchestrator with builder→reviewer→tuner handoffs
- 3 agents — DocIntel Builder (gpt-4o), Reviewer (gpt-4o-mini), Tuner (gpt-4o-mini)
- 3 skills — deploy (117 lines), evaluate (110 lines), tune (105 lines)
- 4 prompts — /deploy, /test, /review, /evaluate with agent routing
- .vscode/mcp.json — FrootAI MCP with Document Intelligence key + envFile
TuneKit (AI Config)
- config/openai.json — extraction prompts, gpt-4o multimodal
- config/extraction-schema.json — field definitions per doc type
- config/guardrails.json — PII masking
- evaluation/test-set.jsonl — doc samples per type
Tuning Parameters
Extraction promptsConfidence thresholdsField schemasDocument classification rules
Estimated Cost
Dev/Test
$80–200/mo
Production
$1.2K–3K/mo