Play 06

Document Intelligence

Medium🔧 Skeleton

Extract, classify, and structure document data with OCR + LLM.

Feed PDFs, invoices, receipts, and forms into Azure Document Intelligence for OCR, then GPT-4o extracts structured fields into typed JSON. Cosmos DB stores results. The pipeline handles multi-page documents, handwriting, tables, and stamps. PII masking built in.

Architecture Pattern

OCR+LLM extraction, structured output, form recognition

Azure Services

Document IntelligenceAzure OpenAI (gpt-4o)Blob StorageCosmos DB

DevKit (.github Agentic OS)

agent.md — root orchestrator with builder→reviewer→tuner handoffs
3 agents — DocIntel Builder (gpt-4o), Reviewer (gpt-4o-mini), Tuner (gpt-4o-mini)
3 skills — deploy (117 lines), evaluate (110 lines), tune (105 lines)
4 prompts — /deploy, /test, /review, /evaluate with agent routing
.vscode/mcp.json — FrootAI MCP with Document Intelligence key + envFile

TuneKit (AI Config)

config/openai.json — extraction prompts, gpt-4o multimodal
config/extraction-schema.json — field definitions per doc type
config/guardrails.json — PII masking
evaluation/test-set.jsonl — doc samples per type

Tuning Parameters

Extraction promptsConfidence thresholdsField schemasDocument classification rules

Estimated Cost

Dev/Test

$80–200/mo

Production

$1.2K–3K/mo

User Guide Open in VS Code View on GitHub Setup Guide Configurator Ask Agent FAI Back to FrootAI