Play 27
AI Data Pipeline
High · ✅ Ready
ETL with LLM augmentation — classify, enrich, and redact at scale.
An ETL pipeline enhanced with LLM intelligence. Data flows through Azure Data Factory; at each stage, GPT-4o-mini (chosen for cost efficiency on high-volume processing) classifies records, extracts entities, scores quality, and redacts PII. Schema detection auto-maps incoming formats, Event Hubs handles real-time ingestion, and Cosmos DB stores the enriched output. Batch processing handles millions of records with automatic retries and dead-letter queues.
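The per-record flow above — enrich, retry with backoff, dead-letter on exhaustion — can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `classify_record` stands in for the real Azure OpenAI call, and all names are hypothetical.

```python
import time

DEAD_LETTER = []  # records that exhaust their retries land here

def classify_record(record: dict) -> dict:
    """Stand-in for the gpt-4o-mini call that classifies, extracts
    entities, scores quality, and redacts PII for one record."""
    if record.get("body") is None:
        raise ValueError("empty record")
    return {**record, "category": "invoice", "quality": 0.97}

def process_batch(records, max_retries=3, backoff=0.01):
    """Enrich a batch; retry transient failures, then dead-letter."""
    enriched = []
    for rec in records:
        for attempt in range(max_retries):
            try:
                enriched.append(classify_record(rec))
                break
            except Exception:
                time.sleep(backoff * 2 ** attempt)  # exponential backoff
        else:
            DEAD_LETTER.append(rec)  # retries exhausted → dead-letter queue
    return enriched

ok = process_batch([{"id": 1, "body": "..."}, {"id": 2, "body": None}])
```

In the real pipeline the batch would arrive from Event Hubs or Data Factory and the enriched output would be written to Cosmos DB; the retry/dead-letter shape stays the same.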
Architecture Pattern
LLM-augmented ETL: classify, extract, enrich, redact, lakehouse integration
Azure Services
Azure OpenAI (gpt-4o-mini) · Data Factory · Blob Storage · Cosmos DB · Event Hubs
DevKit (.github Agentic OS)
- agent.md — root orchestrator with builder→reviewer→tuner handoffs
- 3 agents — Data Pipeline Builder (gpt-4o), Reviewer (gpt-4o-mini), Tuner (gpt-4o-mini)
- 3 skills — deploy (104 lines), evaluate (105 lines), tune (103 lines)
- 4 prompts — /deploy, /test, /review, /evaluate with agent routing
- .vscode/mcp.json — FrootAI MCP with Storage + OpenAI inputs + envFile
TuneKit (AI Config)
- config/openai.json — gpt-4o-mini for cost efficiency, batch mode
- config/pipeline.json — stage definitions, batch size, retry rules
- config/guardrails.json — PII redaction rules, quality thresholds
- evaluation/eval.py — Classification accuracy >90%, PII recall >95%
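The threshold gates in evaluation/eval.py might look like the sketch below (the actual script isn't shown here, so function names, data, and structure are illustrative — only the >90% accuracy and >95% PII-recall bars come from the config above):

```python
def accuracy(preds, labels):
    """Fraction of records classified into the correct category."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def pii_recall(detected: set, actual: set) -> float:
    """Fraction of true PII spans the redactor actually caught."""
    return len(detected & actual) / len(actual) if actual else 1.0

def gate(preds, labels, detected, actual, acc_min=0.90, recall_min=0.95):
    """Fail the evaluation if either quality bar is missed."""
    acc = accuracy(preds, labels)
    rec = pii_recall(detected, actual)
    return {"accuracy": acc, "pii_recall": rec,
            "passed": acc > acc_min and rec > recall_min}

result = gate(
    preds=["invoice", "po", "invoice", "receipt"],
    labels=["invoice", "po", "receipt", "receipt"],
    detected={"ssn:123-45-6789", "email:a@b.com"},
    actual={"ssn:123-45-6789", "email:a@b.com"},
)
```

Here accuracy is 0.75, so the gate fails even though PII recall is perfect — both bars must clear for the pipeline to pass.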
Tuning Parameters
- Classification prompts per data type
- PII detection rules (GDPR/HIPAA)
- Quality score thresholds
- Batch size (100→10K)
- Dead-letter retry policy
- Schema mapping rules
Estimated Cost
Dev/Test
$50–150/mo
Production
$800–3K/mo