FAI Development Workflows
Azure AI-specific development patterns — RAG debugging, agent lifecycle, evaluation-driven development, cost tracking.
RAG Development Cycle
Building a production RAG pipeline follows a 5-phase cycle. Each phase has specific validation criteria before moving to the next:
RAG Debugging Workflow
When RAG quality is low, debug each layer independently. Start from retrieval and work forward:
# 1. Test retrieval directly — bypass the LLM
curl -X POST "$SEARCH_ENDPOINT/indexes/$INDEX/docs/search" \
-H "api-key: $SEARCH_KEY" \
-H "Content-Type: application/json" \
-d '{
"search": "How do I configure RBAC?",
"queryType": "semantic",
"semanticConfiguration": "default",
"top": 5,
"select": "title,content,chunk_id"
}'
# 2. Check: Are the top-5 results relevant?
# If not → chunking strategy or embedding model is wrong
# If yes → problem is in the prompt or generation layer# Compare embedding similarity between query and expected doc
import numpy as np
from openai import AzureOpenAI
client = AzureOpenAI(...)
q_emb = client.embeddings.create(
input="How do I configure RBAC?",
model="text-embedding-3-large"
).data[0].embedding
doc_emb = client.embeddings.create(
input="RBAC configuration requires assigning roles...",
model="text-embedding-3-large"
).data[0].embedding
similarity = np.dot(q_emb, doc_emb)
print(f"Cosine similarity: {similarity:.4f}")
# Expected: > 0.80 for relevant pairs# 3. Test generation with known-good context
node engine/index.js \
solution-plays/01-enterprise-rag/fai-manifest.json \
--eval --query "How do I configure RBAC?" \
--context "RBAC configuration requires assigning roles..."
# Check groundedness score
# < 3.5 → system prompt needs stronger grounding instructions
# ≥ 4.0 → generation is properly groundedAgent Development Cycle
FAI agents follow the Build → Review → Tune chain. Each phase has a dedicated agent pattern:
Build Phase
Create the agent file with frontmatter, implement using config/ values, wire into the target play. The builder agent generates initial structure.
# Scaffold a new agent
node scripts/scaffold-primitive.js agent
# Creates: agents/fai-<name>.agent.md
# With frontmatter: description, model, tools, waf, playsReview Phase
Self-review against security, quality, and WAF compliance. Check tool access follows least privilege, description is accurate, and pillar alignment is correct.
# Validate the new agent
node scripts/validate-primitives.js --verbose
# Check: description ≥ 10 chars?
# Check: tool names are valid?
# Check: WAF pillars are from the 6-pillar set?
# Check: filename is lowercase-hyphen?Tune Phase
Verify config values are production-appropriate. Test the agent in Copilot Chat with real scenarios. Measure response quality and iterate.
# Load the play that uses this agent
node engine/index.js \
solution-plays/01-enterprise-rag/fai-manifest.json \
--status
# Open Copilot Chat and invoke:
# @fai-rag-architect "Review my search index config"
# Verify: agent stays in scope, uses allowed tools onlyEvaluation-Driven Development
Like test-driven development but for AI quality. Write evaluation criteria before building, then iterate until all metrics pass:
{
"evaluation": {
"metrics": {
"groundedness": { "threshold": 4.0, "weight": 0.3 },
"relevance": { "threshold": 3.5, "weight": 0.25 },
"coherence": { "threshold": 4.0, "weight": 0.2 },
"fluency": { "threshold": 4.0, "weight": 0.15 },
"safety": { "threshold": 1.0, "weight": 0.1 }
},
"min_weighted_score": 3.8,
"test_queries": 50,
"fail_fast": true
}
}The cycle: define thresholds → build pipeline → run eval → check scores → tune config → re-eval. Never ship a play where any metric is below threshold. The FAI Engine blocks deployment if fail_fast: true and any metric fails.
Cost Tracking Workflow
AI costs can spiral without monitoring. Track token usage, model routing efficiency, and caching hit rates:
# Estimate monthly cost for a play
npx frootai cost 01 --scale prod
# Sample output:
# Azure OpenAI (GPT-4o): $340/mo (850K tokens/day)
# Azure AI Search (S1): $250/mo (1 index, 50GB)
# Azure Container Apps: $45/mo (2 replicas, 0.5 vCPU)
# Total estimated: $635/mo
# Track token usage over time
node engine/index.js \
solution-plays/01-enterprise-rag/fai-manifest.json \
--cost --period 7dRoute simple queries to GPT-4o-mini ($0.15/1M tokens), escalate complex ones to GPT-4o ($2.50/1M tokens). A 70/30 split can save 60% on model costs.
Cache responses keyed by embedding similarity. A 40% cache hit rate eliminates 40% of LLM calls. Use Azure Cache for Redis with vector similarity.
Set max_tokens per request and daily token caps per play. Alert at 80% of budget. The FAI Engine enforces these limits at runtime.
Prompt Iteration Workflow
Systematic prompt improvement using versioned configs and A/B evaluation:
# config/openai.json — version your system prompts
{
"model": "gpt-4o",
"temperature": 0.3,
"max_tokens": 2048,
"system_prompt_version": "v3",
"system_prompt": "You are an Azure architecture expert. Answer ONLY from the provided context. If the context does not contain the answer, say 'I don't have information about that.' Never speculate beyond the provided documents."
}
# Iterate: change prompt → run eval → compare scores
# v1: Basic prompt → groundedness: 3.2
# v2: + "answer ONLY from" → groundedness: 3.8
# v3: + "never speculate" → groundedness: 4.3 ✓Debugging LLM Calls
When responses are unexpected, inspect the actual API calls:
# Set environment variable for verbose logging
$env:FAI_DEBUG = "true"
# Run the engine — all API calls are logged
node engine/index.js \
solution-plays/01-enterprise-rag/fai-manifest.json \
--eval --verbose
# Log output includes:
# → Request: model, temperature, max_tokens, messages[]
# → Response: finish_reason, usage.total_tokens, content
# → Timing: latency_ms, tokens_per_secondfinish_reason: length — Response was truncated. Increase max_tokens.
finish_reason: content_filter — Content safety blocked the response. Check input for policy violations.
429 Too Many Requests — Rate limited. Implement retry with exponential backoff or increase TPM quota.
High latency (>5s) — Large context window. Reduce retrieved chunks or use streaming.
Security Audit Workflow
Run before every deployment. Covers secrets, identity, content safety, and prompt injection resistance:
# 1. Scan for secrets in the codebase
node hooks/fai-secrets-scanner/scan.js
# 2. Verify Managed Identity configuration
az identity show --name "$MI_NAME" --resource-group "$RG"
# 3. Check content safety filters are active
az cognitiveservices account show \
--name "$AOAI_NAME" --resource-group "$RG" \
--query "properties.contentFilter"
# 4. Test prompt injection resistance
node engine/index.js \
solution-plays/01-enterprise-rag/fai-manifest.json \
--eval --test-set "evaluation/adversarial-prompts.jsonl"