📊 Agent Evaluation Dashboard

Quality scores, evaluation thresholds, and WAF alignment status per solution play.

🎯 Quality Metrics

| Metric | Threshold | Question |
|--------|-----------|----------|
| Groundedness | ≥ 4 | Are responses grounded in provided context? |
| Relevance | ≥ 4 | Do responses address the user's query? |
| Coherence | ≥ 4 | Are responses logically consistent? |
| Fluency | ≥ 4 | Is the language natural and readable? |
| Safety | ≥ 4.5 | Are responses free of harmful content? |
| Latency | ≤ 2 s | Response time in seconds (lower is better) |
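These thresholds can be checked programmatically. A minimal sketch, assuming the threshold values shown above; the Play 01 quality scores come from the status table on this page, while the safety score and latency value are hypothetical, since the table does not report them:

```python
# Thresholds from the dashboard. Quality metrics must meet or exceed
# their threshold; latency must stay at or below its budget (lower is better).
THRESHOLDS = {
    "groundedness": 4.0,
    "relevance": 4.0,
    "coherence": 4.0,
    "fluency": 4.0,
    "safety": 4.5,
}
MAX_LATENCY_SECONDS = 2.0

def passes(scores: dict, latency_seconds: float) -> bool:
    """True when every quality score meets its threshold and latency is in budget."""
    quality_ok = all(scores.get(m, 0.0) >= t for m, t in THRESHOLDS.items())
    return quality_ok and latency_seconds <= MAX_LATENCY_SECONDS

# Play 01 scores from the status table; safety (4.6) and latency (1.8 s)
# are illustrative placeholders, not reported values.
play_01 = {
    "groundedness": 4.2,
    "relevance": 4.1,
    "coherence": 4.3,
    "fluency": 4.5,
    "safety": 4.6,
}
print(passes(play_01, latency_seconds=1.8))  # True
```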

📋 Play Evaluation Status

| Play | Groundedness | Relevance | Coherence | Fluency | Status |
|------|--------------|-----------|-----------|---------|--------|
| 01 — Enterprise RAG | 4.2 | 4.1 | 4.3 | 4.5 | ✅ Evaluated |
| 02 — AI Landing Zone | N/A | N/A | N/A | N/A | ✅ Evaluated |
| 03 — Deterministic Agent | 4.5 | 4.3 | 4.6 | 4.4 | ✅ Evaluated |
| 04 — Play 4 | | | | | ⏳ Skeleton |
| 05 — Play 5 | | | | | ⏳ Skeleton |
| 06 — Play 6 | | | | | ⏳ Skeleton |
| 07 — Play 7 | | | | | ⏳ Skeleton |
| 08 — Play 8 | | | | | ⏳ Skeleton |
| 09 — Play 9 | | | | | ⏳ Skeleton |
| 10 — Play 10 | | | | | ⏳ Skeleton |
| 11 — Play 11 | | | | | ⏳ Skeleton |
| 12 — Play 12 | | | | | ⏳ Skeleton |
| 13 — Play 13 | | | | | ⏳ Skeleton |
| 14 — Play 14 | | | | | ⏳ Skeleton |
| 15 — Play 15 | | | | | ⏳ Skeleton |
| 16 — Play 16 | | | | | ⏳ Skeleton |
| 17 — Play 17 | | | | | ⏳ Skeleton |
| 18 — Play 18 | | | | | ⏳ Skeleton |
| 19 — Play 19 | | | | | ⏳ Skeleton |
| 20 — Play 20 | | | | | ⏳ Skeleton |

🔧 How to Run Evaluations

1. CLI

npx frootai validate --waf

WAF scorecard: 6 pillars, 17 checks

2. VS Code

Ctrl+Shift+P → FrootAI: Run Evaluation

Visual dashboard in VS Code panel

3. Python

python evaluation/eval.py

Score against golden dataset
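The golden-dataset approach behind the Python option can be sketched as follows. This is an illustrative outline only, not the contents of `evaluation/eval.py`: the sample queries, the `stub_agent` function, and the keyword-overlap scoring are all hypothetical stand-ins for the real dataset and metrics:

```python
# Hypothetical golden dataset: each entry pairs a query with expected content.
GOLDEN = [
    {"query": "What is RAG?", "expected": "retrieval augmented generation"},
    {"query": "Name one WAF pillar.", "expected": "reliability"},
]

def keyword_score(answer: str, expected: str) -> float:
    """Fraction of expected keywords found in the answer (a crude relevance proxy)."""
    keywords = expected.lower().split()
    hits = sum(1 for k in keywords if k in answer.lower())
    return hits / len(keywords)

def evaluate(agent, dataset) -> float:
    """Average keyword score of the agent's answers across the golden dataset."""
    total = sum(
        keyword_score(agent(item["query"]), item["expected"]) for item in dataset
    )
    return total / len(dataset)

# Stand-in agent for demonstration; a real run would call the deployed agent.
def stub_agent(query: str) -> str:
    return "RAG is retrieval augmented generation; reliability is a WAF pillar."

print(round(evaluate(stub_agent, GOLDEN), 2))  # 1.0
```

A production evaluator would replace the keyword proxy with the graded metrics listed above (groundedness, relevance, coherence, fluency, safety), but the loop structure is the same: run each golden query through the agent, score the response, and aggregate.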

Solution Plays → · CLI Docs → · Ecosystem →