FAI Before/After Showcase
See the transformation: generic AI code → WAF-aligned, production-ready, FAI-wired solutions.
Why Before/After Matters
Most AI projects start with a quick prototype — a raw OpenAI call, a manual deployment, no evaluation. These prototypes work in demos but fail in production. They leak secrets, hallucinate without guardrails, cost 10x more than they should, and break under load.
The FAI Protocol transforms these prototypes into production-grade architectures by wiring in the Well-Architected Framework, automated evaluation, infrastructure-as-code, and the builder/reviewer/tuner quality chain. Below are 5 real transformations showing the gap between generic and FAI-aligned solutions.
1. Raw OpenAI Call → Guarded RAG Pipeline
The problem: A direct OpenAI call with no retrieval, no safety filters, hardcoded API keys, and no evaluation. It hallucinates freely and leaks credentials.
Before — Generic
import openai

# Hardcoded API key (security vulnerability)
openai.api_key = "sk-abc123..."

def ask(question: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
        temperature=1.0,  # Maximum randomness
    )
    return response.choices[0].message.content

# No retrieval, no grounding, no safety filters
# No retry logic, no cost tracking, no evaluation
print(ask("What is our refund policy?"))

After — FAI-Aligned
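The pipeline below calls a `get_embedding` helper it never defines. A minimal sketch of what that helper might look like, with the client passed in explicitly so it is testable; the function shape and the embedding model name are assumptions, not something this showcase specifies:

```python
def get_embedding(question: str, client, model: str = "text-embedding-3-small") -> list[float]:
    """Embed the question for the vector leg of the hybrid search.

    `client` is the AzureOpenAI instance; the model name is an assumption.
    """
    response = client.embeddings.create(model=model, input=question)
    return response.data[0].embedding
```

In practice the `vector_queries` parameter of `search.search` expects a `VectorizedQuery` (from `azure.search.documents.models`) wrapping this vector; the main snippet abbreviates that step.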
import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from openai import AzureOpenAI
from tenacity import retry, stop_after_attempt, wait_exponential

credential = DefaultAzureCredential()  # Managed Identity

search = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="knowledge-base",
    credential=credential,
)

client = AzureOpenAI(
    azure_endpoint=os.environ["OPENAI_ENDPOINT"],
    azure_ad_token_provider=get_bearer_token_provider(
        credential, "https://cognitiveservices.azure.com/.default"
    ),
    api_version="2024-06-01",
)

@retry(stop=stop_after_attempt(3), wait=wait_exponential())
def ask(question: str) -> dict:
    # 1. Hybrid search with semantic reranker
    results = search.search(
        search_text=question,
        vector_queries=[get_embedding(question)],
        query_type="semantic",
        top=5,
    )
    context = "\n".join([r["content"] for r in results])

    # 2. Grounded completion with safety
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.1,  # Near-deterministic
        max_tokens=2000,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQ: {question}"},
        ],
    )

    # 3. Groundedness check (guardrail >= 0.85)
    return {"answer": response.choices[0].message.content, "sources": [...]}

2. Manual Deployment → Bicep Infrastructure-as-Code
The problem: Azure resources created via portal clicks. No reproducibility, no version control, secrets pasted into app settings, no diagnostic settings.
Before — Portal Clicks
1. Open Azure Portal
2. Create Resource Group (click, click, click...)
3. Create OpenAI resource (copy API key to notepad)
4. Create AI Search (paste key into app settings)
5. Create Container App (upload code via ZIP deploy)
6. Pray nothing breaks
# Problems:
# - No version control for infrastructure
# - Secrets in app settings (not Key Vault)
# - No diagnostic settings or alerts
# - Cannot reproduce in staging/prod

After — FAI Bicep
@description('Deploy Enterprise RAG infrastructure')
param location string = resourceGroup().location

@secure()
param openAiEndpoint string

// Managed Identity — no API keys anywhere
resource identity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'id-rag-${uniqueString(resourceGroup().id)}'
  location: location
}

// Key Vault for secrets
resource vault 'Microsoft.KeyVault/vaults@2023-07-01' = {
  name: 'kv-rag-${uniqueString(resourceGroup().id)}'
  location: location
  properties: {
    sku: { family: 'A', name: 'standard' }
    tenantId: subscription().tenantId
    enableRbacAuthorization: true
    enableSoftDelete: true
  }
}

// AI Search (diagnostic settings omitted here for brevity)
resource search 'Microsoft.Search/searchServices@2024-03-01-preview' = {
  name: 'srch-rag-${uniqueString(resourceGroup().id)}'
  location: location
  sku: { name: 'standard' }
}

// Deploy: az deployment group create -g rg-rag -f main.bicep
// Reproducible, version-controlled, secure-by-default

3. No Testing → FAI Evaluation Pipeline
The problem: AI outputs are never evaluated. No groundedness checks, no relevance scores, no regression detection. “Does it look right?” is the only QA.
Before — No Evaluation
# "Testing" the AI:
response = ask("What is our refund policy?")
print(response) # Read it and hope it looks correct
# No metrics, no thresholds, no CI/CD gates
# Hallucinations ship to production undetected
# Quality degrades silently after model updates

After — FAI Eval Pipeline
from azure.ai.evaluation import (
    GroundednessEvaluator,
    RelevanceEvaluator,
    evaluate,
)

# Define evaluators matching fai-manifest guardrails
evaluators = {
    "groundedness": GroundednessEvaluator(model_config),
    "relevance": RelevanceEvaluator(model_config),
}

# Run against test dataset
results = evaluate(
    data="evaluation/test-cases.jsonl",
    evaluators=evaluators,
    target=rag_pipeline,  # Your RAG function
)

# FAI guardrail gates (from fai-manifest.json)
thresholds = {"groundedness": 0.85, "relevance": 0.80}
for metric, threshold in thresholds.items():
    score = results["metrics"][f"{metric}.mean"]
    status = "PASS" if score >= threshold else "FAIL"
    print(f"{metric}: {score:.2f} (threshold: {threshold}) [{status}]")
    if status == "FAIL":
        raise SystemExit(f"Quality gate failed: {metric}")

4. Scattered Config Files → FAI Manifest
The problem: Agent prompts, environment variables, model configs, and deployment scripts scattered across 15 files with no shared context. Agents duplicate rules, configs drift between environments.
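Because everything lives in one JSON document, drift checks are easy to automate. A hedged sketch of the kind of validator a CI hook might run; the key names mirror the example manifest in this section, but FAI's actual schema may differ:

```python
import json

# Sections the example manifest carries; an assumption about the real schema.
REQUIRED_SECTIONS = ("play", "version", "context", "primitives", "guardrails", "infrastructure")

def validate_manifest(text: str) -> list[str]:
    """Return a list of problems; an empty list means the manifest looks sane."""
    manifest = json.loads(text)
    problems = [f"missing section: {key}" for key in REQUIRED_SECTIONS if key not in manifest]
    # Guardrail scores are fractions of 1.0, so anything outside [0, 1] is a typo.
    for metric, threshold in manifest.get("guardrails", {}).items():
        if not 0.0 <= threshold <= 1.0:
            problems.append(f"guardrail out of range: {metric}={threshold}")
    return problems
```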
Before — Scattered
.env # API keys (committed to git!)
agent-prompt.txt # System prompt (no WAF alignment)
config.yaml # Model settings (gpt-4, temp 0.7)
deploy.sh # Manual deployment script
requirements.txt # Dependencies (no lock file)
# Problems:
# - No shared context between components
# - Agent doesn't know about security rules
# - Config drifts between dev/staging/prod
# - No guardrail thresholds defined anywhere

After — FAI Manifest
{
  "play": "01-enterprise-rag",
  "version": "1.0.0",
  "context": {
    "knowledge": ["F1", "F2", "R2", "O4"],
    "waf": ["security", "reliability", "cost-optimization"]
  },
  "primitives": {
    "agents": ["fai-rag-architect", "fai-azure-ai-search-expert"],
    "instructions": ["python-waf", "bicep-waf"],
    "skills": ["fai-play-initializer", "deploy-01-enterprise-rag"],
    "hooks": ["fai-secrets-scanner", "fai-cost-guardian"]
  },
  "guardrails": {
    "groundedness": 0.85,
    "relevance": 0.80,
    "coherence": 0.90,
    "safety": 0.95
  },
  "infrastructure": {
    "services": ["azure-openai", "azure-ai-search", "azure-cosmos-db"],
    "compute": "azure-container-apps"
  }
}

// One file wires: context + agents + instructions + skills
// + hooks + guardrails + infrastructure

5. Manual Code Review → Builder/Reviewer/Tuner Chain
The problem: A single AI agent writes code with no quality checks. No WAF compliance review, no config validation, no evaluation gates. The code works but violates security policies and overspends on API calls.
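The cost complaint is concrete: sending every query to the largest model multiplies spend. A toy router of the kind @tuner pushes toward; the heuristics and model names here are illustrative assumptions, not FAI's actual routing policy:

```python
def route_model(question: str) -> str:
    """Pick a model tier by rough query complexity.

    Thresholds and model names are illustrative assumptions.
    """
    complex_markers = ("compare", "explain", "why", "analyze", "summarize")
    is_complex = len(question.split()) > 30 or any(
        marker in question.lower() for marker in complex_markers
    )
    return "gpt-4o" if is_complex else "gpt-4o-mini"

print(route_model("What is our refund policy?"))              # gpt-4o-mini
print(route_model("Compare our refund and warranty terms."))  # gpt-4o
```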
Before — Single Agent
# User: "Build me a RAG pipeline"
# Agent: *generates code*
# User: *deploys to production*
# No security review → hardcoded secrets
# No cost review → gpt-4 for simple lookups ($$$)
# No eval review → hallucinations in production
# No WAF alignment → no retry, no monitoring

After — FAI Agent Chain
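The three-step flow described below can be pictured as function composition, where any gate may reject. A toy sketch only: the real agents are LLM-driven, and these stand-ins show nothing but the control flow and gate values:

```python
def builder(task: str) -> dict:
    # Stand-in for @builder: produce an artifact plus metadata reviewers check.
    return {"task": task, "code": "def ask(q): ...", "uses_managed_identity": True}

def reviewer(artifact: dict) -> dict:
    # Stand-in for @reviewer: reject on the most basic security smell.
    if not artifact["uses_managed_identity"]:
        raise ValueError("reviewer: hardcoded credentials detected")
    return artifact

def tuner(artifact: dict, scores: dict) -> dict:
    # Stand-in for @tuner: enforce the manifest guardrails.
    gates = {"groundedness": 0.85, "safety": 0.95}
    failed = [m for m, t in gates.items() if scores[m] < t]
    if failed:
        raise ValueError(f"tuner: gates failed: {failed}")
    return artifact

artifact = tuner(reviewer(builder("Build a RAG pipeline")),
                 scores={"groundedness": 0.91, "safety": 0.97})
print("passed all 3 gates")
```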
# Step 1: @builder implements the RAG pipeline
# → Generates code using config/openai.json values
# → Follows python-waf.instructions.md automatically
# → Uses Managed Identity (no hardcoded secrets)
# Step 2: @reviewer checks the implementation
# → Scans for WAF compliance (6 pillars)
# → Verifies retry logic on external calls
# → Checks for secret leaks, missing error handling
# → Validates Azure best practices
# Step 3: @tuner optimizes for production
# → Validates config/openai.json thresholds
# → Runs evaluation against guardrail scores
# → Checks cost efficiency (model routing)
# → Confirms groundedness >= 0.85, safety >= 0.95
# Result: Production-ready code that passed 3 quality gates

Transformation Summary
| Scenario | Before | After (FAI) | WAF Pillars |
|---|---|---|---|
| RAG Pipeline | Hardcoded keys, no grounding | Managed Identity, hybrid search, guardrails | Security, Reliability |
| Deployment | Portal clicks, no IaC | Bicep modules, @secure params, diagnostics | Operational, Security |
| Testing | Manual “looks right” check | Eval pipeline with CI/CD quality gates | Responsible AI |
| Config | 15 scattered files, no shared context | One fai-manifest.json wires everything | All 6 Pillars |
| Review | Single agent, no checks | 3-agent chain with eval gates | All 6 Pillars |
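The eval pipeline (scenario 3) and the manifest (scenario 4) compose naturally: a CI step can read guardrail thresholds straight from fai-manifest.json instead of hardcoding them. A minimal sketch under that assumption, with the manifest inlined so the snippet is self-contained and the scores standing in for a real evaluation run:

```python
import json

# In CI this would be: manifest = json.load(open("fai-manifest.json"))
# Inlined here so the sketch is self-contained.
manifest = json.loads("""{
  "guardrails": {"groundedness": 0.85, "relevance": 0.80,
                 "coherence": 0.90, "safety": 0.95}
}""")

def check_guardrails(scores: dict, thresholds: dict) -> list[str]:
    """Return human-readable failures; an empty list means all gates pass."""
    return [
        f"{metric}: {scores.get(metric, 0.0):.2f} < {threshold}"
        for metric, threshold in thresholds.items()
        if scores.get(metric, 0.0) < threshold
    ]

# Scores from an (assumed) evaluation run
scores = {"groundedness": 0.91, "relevance": 0.84, "coherence": 0.93, "safety": 0.97}
failures = check_guardrails(scores, manifest["guardrails"])
print(failures or "All guardrails passed")
```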