FrootAI — AmpliFAI your AI Ecosystem

Lab

Research-quality data from a production engine. Reproducible. Cite-able. Every number is verifiable — every benchmark ships a runnable script.

3 Benchmarks · 3 Datasets · 5 Experiments

AVM vs Hand-Authored Cost

Monthly cost delta across 100 enterprise workloads — AVM-composed Bicep vs hand-authored.

100 workloads · Read study →

Carbon per Region

Carbon footprint across 200 AVM compositions by Azure region — grid-intensity aware.

200 compositions · Read study →

Pipeline Speed

Harvest pipeline throughput by implementation: Node vs Python vs azd template ingest.

Featured Benchmarks

Comparison studies with reproducible scripts. Every number is verifiable.

AVM-Composed vs Hand-Authored Bicep: Cost Across 100 Workloads

Monthly Azure cost comparison between AVM-composed infrastructure and hand-authored Bicep templates across 100 enterprise workloads. AVM compositions average 12% lower cost due to right-sized SKU selection.

Avg. savings: 12% · Read full study
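The 12% figure is described as an average across workloads. As a rough sketch of how such an average could be derived, the snippet below takes the mean of per-workload relative savings. The function name, the choice of a per-workload mean (rather than a ratio of totals), and the sample costs are all assumptions for illustration; they are not taken from the study's script.

```python
from statistics import mean

def average_savings_pct(pairs):
    """Mean per-workload savings of AVM-composed vs hand-authored cost, in percent.

    pairs: list of (hand_authored_cost, avm_cost) monthly figures.
    Assumption: the average is a mean of per-workload percentages.
    """
    savings = [(hand - avm) / hand * 100.0 for hand, avm in pairs if hand > 0]
    if not savings:
        raise ValueError("no workloads with a positive hand-authored cost")
    return mean(savings)

# Hypothetical monthly costs in USD, for illustration only.
sample = [(1000.0, 900.0), (500.0, 450.0), (2000.0, 1700.0)]
print(round(average_savings_pct(sample), 2))
```

A mean of per-workload percentages weights every workload equally; dividing total savings by total cost would instead weight large workloads more heavily, so the two can differ.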

Carbon Footprint per Region Across 200 AVM Compositions

Grid-intensity-aware carbon estimates for 200 AVM-composed templates deployed across 12 Azure regions. Sweden Central and France Central show 60-70% lower emissions than East US, owing to their low-carbon grid mix.

Lowest region: Sweden Central · Read full study
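A grid-intensity-aware estimate is commonly formed by multiplying energy use by the carbon intensity of the regional grid. The sketch below shows that arithmetic and the resulting percentage reduction between two regions. It assumes this simple model; the function names and every number are placeholders, not measured grid data or values from the study.

```python
def monthly_emissions_kg(energy_kwh, grid_intensity_g_per_kwh):
    """Estimated emissions in kg CO2e: energy times grid carbon intensity."""
    return energy_kwh * grid_intensity_g_per_kwh / 1000.0

def reduction_pct(baseline_kg, candidate_kg):
    """Percent by which candidate emissions fall below the baseline."""
    if baseline_kg <= 0:
        raise ValueError("baseline emissions must be positive")
    return (baseline_kg - candidate_kg) / baseline_kg * 100.0

# Placeholder inputs for illustration only (not real regional figures).
energy = 1200.0  # kWh per month for one composition
baseline = monthly_emissions_kg(energy, 400.0)   # higher-intensity grid
candidate = monthly_emissions_kg(energy, 140.0)  # lower-intensity grid
print(round(reduction_pct(baseline, candidate), 1))
```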

Harvest Pipeline Speed: Node vs Python vs azd Ingest

End-to-end run time of the 7-stage harvest pipeline across 50 repos. The Node implementation runs stages S1-S7 in a median of 4.2 s, Python in 6.8 s, and the azd template baseline in 12.1 s.

Fastest: Node 4.2s · Read full study
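A median run time like the ones above can be measured by timing each run and taking the middle value. This is a minimal sketch under the assumption that one timed run is performed per repo; `median_runtime` and the `run` callable are hypothetical names, not part of the harvest pipeline.

```python
import time
from statistics import median

def median_runtime(run, repos):
    """Return the median wall-clock seconds of run(repo) over all repos."""
    durations = []
    for repo in repos:
        start = time.perf_counter()
        run(repo)  # hypothetical stand-in for one end-to-end pipeline run
        durations.append(time.perf_counter() - start)
    return median(durations)

# Usage with a no-op stand-in for the pipeline.
print(median_runtime(lambda repo: None, ["repo-a", "repo-b", "repo-c"]) >= 0.0)
```

The median is less sensitive than the mean to a few unusually slow repos, which is one reason to report it for timing data.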

Public Datasets

Downloadable Parquet datasets — CC0-licensed, schema-documented, weekly-refreshed.

Harvested Plays Catalog

Every harvested Solution Play with provenance fields — source repo, pipeline version, confidence scores, LLM steps, composition metadata. Updated weekly as new plays are harvested.

10 rows · Updated

0 downloads · CC0-1.0

AVM Module Taxonomy

Full Azure Verified Module catalog with classification, WAF pillar coverage flags, cost-band estimates, and popularity ranking. Covers both Bicep and Terraform modules. Weekly-refreshed from upstream registries.

20 rows · Updated

0 downloads · CC0-1.0

WAF Pillar Results

Per-play WAF compliance scores across all 5 Well-Architected Framework pillars, plus per-CAF-domain scores. A score is null where no applicable checks exist. Weekly-refreshed.

20 rows · Updated

0 downloads · CC0-1.0
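The nullable-score rule can be illustrated with a small helper that returns no value, rather than zero, when a pillar has no applicable checks. The sketch assumes a score is the share of applicable checks that passed; that definition and the function name are assumptions, since the dataset schema is not shown here.

```python
from typing import Optional

def pillar_score(passed: int, applicable: int) -> Optional[float]:
    """Share of applicable checks that passed, or None when none apply."""
    if applicable < 0 or passed < 0 or passed > applicable:
        raise ValueError("expected 0 <= passed <= applicable")
    if applicable == 0:
        return None  # no applicable checks: the score is undefined, not 0
    return passed / applicable

print(pillar_score(3, 4))  # 0.75
print(pillar_score(0, 0))  # None
```

Keeping the null distinct from 0.0 avoids penalizing a play for a pillar that simply does not apply to it.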

Latest Experiments

Research-quality posts from the engine team. Each is tagged shipped, archived, or referenced.

AVM Resolver Confidence Calibration: How We Tuned the V2 Module Scorer

shipped

The V2 Resolver assigns a confidence score to each candidate AVM module. This experiment documents how we calibrated the scorer against 500 hand-labeled module selections to achieve 92% top-1 accuracy.

View source
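Top-1 accuracy, the metric quoted for the V2 Resolver, is the fraction of cases in which the highest-scored candidate matches the hand label. The sketch below computes it under the assumption that each case is a mapping from module name to confidence; the function name and module names are hypothetical.

```python
def top1_accuracy(candidates_per_case, labels):
    """Fraction of cases where the highest-scored module matches the label.

    candidates_per_case: list of dicts mapping module name -> confidence.
    labels: list of hand-labeled module names, one per case.
    """
    if not labels or len(candidates_per_case) != len(labels):
        raise ValueError("need one non-empty candidate set per label")
    hits = 0
    for scores, label in zip(candidates_per_case, labels):
        best = max(scores, key=scores.get)  # module with the top confidence
        if best == label:
            hits += 1
    return hits / len(labels)

# Hypothetical candidates and labels, for illustration only.
cases = [
    {"module-a": 0.91, "module-b": 0.40},
    {"module-a": 0.35, "module-c": 0.62},
    {"module-b": 0.55, "module-c": 0.50},
]
labels = ["module-a", "module-c", "module-c"]
print(round(top1_accuracy(cases, labels), 3))
```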

LLM Extraction Prompt Ablation: Which System Message Fields Matter Most?

shipped

We ablated the S3 Extract system message, removing one section at a time, to measure which prompt components contribute most to RepoFacts accuracy. Removing the domain and services fields drops accuracy by 15-20%.

View source

Retrieval NDCG: text-embedding-3-large vs ada-002 vs E5-large

shipped

We compared three embedding models on the S4 Retrieve stage across 50 repos. text-embedding-3-large achieves NDCG@5 = 0.84, a relative 12% above ada-002 (0.75) and 8% above E5-large (0.78).

View source
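For readers unfamiliar with the metric, NDCG@k divides the discounted cumulative gain of a ranking by that of the ideal ordering. The sketch below uses the common linear-gain form and assumes the relevance list contains every judged item for the query; the experiment's own gain function and judgments are not shown here, so treat this as one standard variant rather than the exact script.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked relevances."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=5):
    """NDCG@k for relevances listed in ranked order (linear gain)."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 1, 0, 0]))      # already ideally ordered: 1.0
print(round(ndcg_at_k([0, 1]), 4))     # relevant item ranked second
```

The percentages quoted for the three models are relative differences between NDCG@5 values, for example (0.84 - 0.75) / 0.75 = 0.12.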

Policy Overlay Strictness vs Deployment Success Rate

shipped

Tested 5 strictness levels of the S7 policy overlay across 100 workloads. 'Strict' mode blocks 18% of deployments but catches 100% of non-compliant SKUs. 'Permissive' mode blocks 2% but misses 35% of violations.

View source
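The trade-off reported above involves two rates: how many deployments a strictness level blocks, and how many violations it lets through. A minimal sketch of both, assuming each deployment is recorded as a (blocked, compliant) pair; the experiment counts non-compliant SKUs, so this deployment-level version is a simplification, and the function name and sample data are hypothetical.

```python
def overlay_metrics(deployments):
    """Return (block_rate, miss_rate) for one strictness level.

    deployments: list of (blocked, compliant) boolean pairs.
    block_rate: share of all deployments the overlay blocked.
    miss_rate: share of non-compliant deployments the overlay let through.
    """
    if not deployments:
        raise ValueError("need at least one deployment")
    block_rate = sum(1 for blocked, _ in deployments if blocked) / len(deployments)
    violations = [blocked for blocked, compliant in deployments if not compliant]
    missed = sum(1 for blocked in violations if not blocked)
    miss_rate = missed / len(violations) if violations else 0.0
    return block_rate, miss_rate

# Hypothetical outcomes, for illustration only.
sample_runs = [(True, False), (False, False), (False, True), (True, True), (False, True)]
print(overlay_metrics(sample_runs))
```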

WAF Check Coverage Gap Analysis: Which Pillars Have Blind Spots?

referenced

Mapped all 80+ WAF checks against the Azure Well-Architected review criteria. Found 12 criteria with zero automated checks — all in the Cost Optimization pillar. Prioritized for V1.20+ implementation.

View source

Experiment Status

Per §0.5 principle #8: every experiment is tagged shipped, archived, or referenced from production. Quarterly review culls stale entries.

Shipped: 4

Experiment results integrated into production engine.

Referenced: 1

Cited by production code or external publications.

Archived: 0

Superseded or no longer applicable. Retained for reproducibility.

Quarterly Review History

Quarter | Before | Culled | Archived | Promoted | Cull Rate
2026-Q2 (2026-06-14) | 5 | 0 | 0 | 0 | 0%

Latest: Initial lab launch — all experiments are new. No culls needed.

Quarterly review policy: Every quarter, all experiments are reviewed. Stale experiments with no production reference are moved to archived. This prevents sandbox-rot — per §0.5 principle #8. Next review scheduled for W11.28.
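The review policy can be sketched as a pass over all experiments that archives stale entries lacking a production reference. The `Experiment` fields, the `quarterly_review` name, and the definition of cull rate as "archived in this review divided by the count before review" are assumptions; the page does not define how the Culled and Archived columns differ.

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    title: str
    status: str  # "shipped", "archived", or "referenced"
    stale: bool
    has_production_reference: bool

def quarterly_review(experiments):
    """Archive stale experiments with no production reference.

    Returns the assumed cull rate: newly archived / total before review.
    """
    before = len(experiments)
    culled = 0
    for exp in experiments:
        if exp.status != "archived" and exp.stale and not exp.has_production_reference:
            exp.status = "archived"
            culled += 1
    return culled / before if before else 0.0

# Hypothetical entries mirroring a launch state where nothing is stale.
lab = [Experiment(f"exp-{i}", "shipped", False, True) for i in range(5)]
print(quarterly_review(lab))  # 0.0
```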