FAI Before/After Showcase
See the transformation: generic AI code → WAF-aligned, production-ready, FAI-wired solutions.
Why Before/After Matters
Most AI projects start with a quick prototype — a raw OpenAI call, a manual deployment, no evaluation. These prototypes work in demos but fail in production. They leak secrets, hallucinate without guardrails, cost 10x more than they should, and break under load.
The FAI Protocol transforms these prototypes into production-grade architectures by wiring in the Well-Architected Framework, automated evaluation, infrastructure-as-code, and the builder/reviewer/tuner quality chain. Below are 5 real transformations showing the gap between generic and FAI-aligned solutions.
1. Raw OpenAI Call → Guarded RAG Pipeline
The problem: A direct OpenAI call with no retrieval, no safety filters, hardcoded API keys, and no evaluation. It hallucinates freely and leaks credentials.
Before — Generic
import openai

# Hardcoded API key (security vulnerability)
openai.api_key = "sk-abc123..."

def ask(question: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
        temperature=1.0,  # Maximum randomness
    )
    return response.choices[0].message.content

# No retrieval, no grounding, no safety filters
# No retry logic, no cost tracking, no evaluation
print(ask("What is our refund policy?"))

After — FAI-Aligned
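The pipeline below calls a `get_embedding` helper it never defines. A minimal sketch of what that helper might look like, with the client passed in explicitly so it is testable; the function shape and the embedding model name are assumptions, not something this showcase specifies:

```python
def get_embedding(question: str, client, model: str = "text-embedding-3-small") -> list[float]:
    """Embed the question for the vector leg of the hybrid search.

    `client` is the AzureOpenAI instance; the model name is an assumption.
    """
    response = client.embeddings.create(model=model, input=question)
    return response.data[0].embedding
```

In practice the `vector_queries` parameter of `search.search` expects a `VectorizedQuery` (from `azure.search.documents.models`) wrapping this vector; the main snippet abbreviates that step.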
import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from openai import AzureOpenAI
from tenacity import retry, stop_after_attempt, wait_exponential

credential = DefaultAzureCredential()  # Managed Identity

search = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="knowledge-base",
    credential=credential,
)

client = AzureOpenAI(
    azure_endpoint=os.environ["OPENAI_ENDPOINT"],
    azure_ad_token_provider=get_bearer_token_provider(
        credential, "https://cognitiveservices.azure.com/.default"
    ),
    api_version="2024-06-01",
)

@retry(stop=stop_after_attempt(3), wait=wait_exponential())
def ask(question: str) -> dict:
    # 1. Hybrid search with semantic reranker
    results = search.search(
        search_text=question,
        vector_queries=[get_embedding(question)],
        query_type="semantic",
        top=5,
    )
    context = "\n".join([r["content"] for r in results])

    # 2. Grounded completion with safety
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.1,  # Near-deterministic
        max_tokens=2000,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQ: {question}"},
        ],
    )

    # 3. Groundedness check (guardrail >= 0.85)
    return {"answer": response.choices[0].message.content, "sources": [...]}

2. Manual Deployment → Bicep Infrastructure-as-Code
The problem: Azure resources created via portal clicks. No reproducibility, no version control, secrets pasted into app settings, no diagnostic settings.
Before — Portal Clicks
1. Open Azure Portal
2. Create Resource Group (click, click, click...)
3. Create OpenAI resource (copy API key to notepad)
4. Create AI Search (paste key into app settings)
5. Create Container App (upload code via ZIP deploy)
6. Pray nothing breaks
# Problems:
# - No version control for infrastructure
# - Secrets in app settings (not Key Vault)
# - No diagnostic settings or alerts
# - Cannot reproduce in staging/prod

After — FAI Bicep
@description('Deploy Enterprise RAG infrastructure')
param location string = resourceGroup().location

@secure()
param openAiEndpoint string

// Managed Identity — no API keys anywhere
resource identity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'id-rag-${uniqueString(resourceGroup().id)}'
  location: location
}

// Key Vault for secrets
resource vault 'Microsoft.KeyVault/vaults@2023-07-01' = {
  name: 'kv-rag-${uniqueString(resourceGroup().id)}'
  location: location
  properties: {
    sku: { family: 'A', name: 'standard' }
    tenantId: subscription().tenantId
    enableRbacAuthorization: true
    enableSoftDelete: true
  }
}

// AI Search (diagnostic settings omitted here for brevity)
resource search 'Microsoft.Search/searchServices@2024-03-01-preview' = {
  name: 'srch-rag-${uniqueString(resourceGroup().id)}'
  location: location
  sku: { name: 'standard' }
}

// Deploy: az deployment group create -g rg-rag -f main.bicep
// Reproducible, version-controlled, secure-by-default

3. No Testing → FAI Evaluation Pipeline
The problem: AI outputs are never evaluated. No groundedness checks, no relevance scores, no regression detection. “Does it look right?” is the only QA.
Before — No Evaluation
# "Testing" the AI:
response = ask("What is our refund policy?")
print(response) # Read it and hope it looks correct
# No metrics, no thresholds, no CI/CD gates
# Hallucinations ship to production undetected
# Quality degrades silently after model updates

After — FAI Eval Pipeline
from azure.ai.evaluation import (
    GroundednessEvaluator,
    RelevanceEvaluator,
    evaluate,
)

# Define evaluators matching fai-manifest guardrails
evaluators = {
    "groundedness": GroundednessEvaluator(model_config),
    "relevance": RelevanceEvaluator(model_config),
}

# Run against test dataset
results = evaluate(
    data="evaluation/test-cases.jsonl",
    evaluators=evaluators,
    target=rag_pipeline,  # Your RAG function
)

# FAI guardrail gates (from fai-manifest.json)
thresholds = {"groundedness": 0.85, "relevance": 0.80}
for metric, threshold in thresholds.items():
    score = results["metrics"][f"{metric}.mean"]
    status = "PASS" if score >= threshold else "FAIL"
    print(f"{metric}: {score:.2f} (threshold: {threshold}) [{status}]")
    if status == "FAIL":
        raise SystemExit(f"Quality gate failed: {metric}")

4. Scattered Config Files → FAI Manifest
The problem: Agent prompts, environment variables, model configs, and deployment scripts scattered across 15 files with no shared context. Agents duplicate rules, configs drift between environments.
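Because everything lives in one JSON document, drift checks are easy to automate. A hedged sketch of the kind of validator a CI hook might run; the key names mirror the example manifest in this section, but FAI's actual schema may differ:

```python
import json

# Sections the example manifest carries; an assumption about the real schema.
REQUIRED_SECTIONS = ("play", "version", "context", "primitives", "guardrails", "infrastructure")

def validate_manifest(text: str) -> list[str]:
    """Return a list of problems; an empty list means the manifest looks sane."""
    manifest = json.loads(text)
    problems = [f"missing section: {key}" for key in REQUIRED_SECTIONS if key not in manifest]
    # Guardrail scores are fractions of 1.0, so anything outside [0, 1] is a typo.
    for metric, threshold in manifest.get("guardrails", {}).items():
        if not 0.0 <= threshold <= 1.0:
            problems.append(f"guardrail out of range: {metric}={threshold}")
    return problems
```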
Before — Scattered
.env # API keys (committed to git!)
agent-prompt.txt # System prompt (no WAF alignment)
config.yaml # Model settings (gpt-4, temp 0.7)
deploy.sh # Manual deployment script
requirements.txt # Dependencies (no lock file)
# Problems:
# - No shared context between components
# - Agent doesn't know about security rules
# - Config drifts between dev/staging/prod
# - No guardrail thresholds defined anywhere

After — FAI Manifest
{
  "play": "01-enterprise-rag",
  "version": "1.0.0",
  "context": {
    "knowledge": ["F1", "F2", "R2", "O4"],
    "waf": ["security", "reliability", "cost-optimization"]
  },
  "primitives": {
    "agents": ["fai-rag-architect", "fai-azure-ai-search-expert"],
    "instructions": ["python-waf", "bicep-waf"],
    "skills": ["fai-play-initializer", "deploy-01-enterprise-rag"],
    "hooks": ["fai-secrets-scanner", "fai-cost-guardian"]
  },
  "guardrails": {
    "groundedness": 0.85,
    "relevance": 0.80,
    "coherence": 0.90,
    "safety": 0.95
  },
  "infrastructure": {
    "services": ["azure-openai", "azure-ai-search", "azure-cosmos-db"],
    "compute": "azure-container-apps"
  }
}

// One file wires: context + agents + instructions + skills
// + hooks + guardrails + infrastructure

5. Manual Code Review → Builder/Reviewer/Tuner Chain
The problem: A single AI agent writes code with no quality checks. No WAF compliance review, no config validation, no evaluation gates. The code works but violates security policies and overspends on API calls.
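The cost complaint is concrete: sending every query to the largest model multiplies spend. A toy router of the kind @tuner pushes toward; the heuristics and model names here are illustrative assumptions, not FAI's actual routing policy:

```python
def route_model(question: str) -> str:
    """Pick a model tier by rough query complexity.

    Thresholds and model names are illustrative assumptions.
    """
    complex_markers = ("compare", "explain", "why", "analyze", "summarize")
    is_complex = len(question.split()) > 30 or any(
        marker in question.lower() for marker in complex_markers
    )
    return "gpt-4o" if is_complex else "gpt-4o-mini"

print(route_model("What is our refund policy?"))              # gpt-4o-mini
print(route_model("Compare our refund and warranty terms."))  # gpt-4o
```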
Before — Single Agent
# User: "Build me a RAG pipeline"
# Agent: *generates code*
# User: *deploys to production*
# No security review → hardcoded secrets
# No cost review → gpt-4 for simple lookups ($$$)
# No eval review → hallucinations in production
# No WAF alignment → no retry, no monitoring

After — FAI Agent Chain
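The three-step flow described below can be pictured as function composition, where any gate may reject. A toy sketch only: the real agents are LLM-driven, and these stand-ins show nothing but the control flow and gate values:

```python
def builder(task: str) -> dict:
    # Stand-in for @builder: produce an artifact plus metadata reviewers check.
    return {"task": task, "code": "def ask(q): ...", "uses_managed_identity": True}

def reviewer(artifact: dict) -> dict:
    # Stand-in for @reviewer: reject on the most basic security smell.
    if not artifact["uses_managed_identity"]:
        raise ValueError("reviewer: hardcoded credentials detected")
    return artifact

def tuner(artifact: dict, scores: dict) -> dict:
    # Stand-in for @tuner: enforce the manifest guardrails.
    gates = {"groundedness": 0.85, "safety": 0.95}
    failed = [m for m, t in gates.items() if scores[m] < t]
    if failed:
        raise ValueError(f"tuner: gates failed: {failed}")
    return artifact

artifact = tuner(reviewer(builder("Build a RAG pipeline")),
                 scores={"groundedness": 0.91, "safety": 0.97})
print("passed all 3 gates")
```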
# Step 1: @builder implements the RAG pipeline
# → Generates code using config/openai.json values
# → Follows python-waf.instructions.md automatically
# → Uses Managed Identity (no hardcoded secrets)
# Step 2: @reviewer checks the implementation
# → Scans for WAF compliance (6 pillars)
# → Verifies retry logic on external calls
# → Checks for secret leaks, missing error handling
# → Validates Azure best practices
# Step 3: @tuner optimizes for production
# → Validates config/openai.json thresholds
# → Runs evaluation against guardrail scores
# → Checks cost efficiency (model routing)
# → Confirms groundedness >= 0.85, safety >= 0.95
# Result: Production-ready code that passed 3 quality gates

Transformation Summary
| Scenario | Before | After (FAI) | WAF Pillars |
|---|---|---|---|
| RAG Pipeline | Hardcoded keys, no grounding | Managed Identity, hybrid search, guardrails | Security, Reliability |
| Deployment | Portal clicks, no IaC | Bicep modules, @secure params, diagnostics | Operational, Security |
| Testing | Manual “looks right” check | Eval pipeline with CI/CD quality gates | Responsible AI |
| Config | 15 scattered files, no shared context | One fai-manifest.json wires everything | All 6 Pillars |
| Review | Single agent, no checks | 3-agent chain with eval gates | All 6 Pillars |
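The eval pipeline (scenario 3) and the manifest (scenario 4) compose naturally: a CI step can read guardrail thresholds straight from fai-manifest.json instead of hardcoding them. A minimal sketch under that assumption, with the manifest inlined so the snippet is self-contained and the scores standing in for a real evaluation run:

```python
import json

# In CI this would be: manifest = json.load(open("fai-manifest.json"))
# Inlined here so the sketch is self-contained.
manifest = json.loads("""{
  "guardrails": {"groundedness": 0.85, "relevance": 0.80,
                 "coherence": 0.90, "safety": 0.95}
}""")

def check_guardrails(scores: dict, thresholds: dict) -> list[str]:
    """Return human-readable failures; an empty list means all gates pass."""
    return [
        f"{metric}: {scores.get(metric, 0.0):.2f} < {threshold}"
        for metric, threshold in thresholds.items()
        if scores.get(metric, 0.0) < threshold
    ]

# Scores from an (assumed) evaluation run
scores = {"groundedness": 0.91, "relevance": 0.84, "coherence": 0.93, "safety": 0.97}
failures = check_guardrails(scores, manifest["guardrails"])
print(failures or "All guardrails passed")
```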