Play 41
AI Red Teaming
High✅ Ready
Automated adversarial testing — prompt injection, jailbreak simulation, safety scorecards.
Automated adversarial testing of AI systems — prompt injection attacks, jailbreak simulations, harmful output detection, and compliance-ready safety scorecards for EU AI Act and NIST AI RMF. Uses Azure AI Foundry and Content Safety to systematically probe vulnerabilities. Produces audit-ready reports for enterprise governance teams and regulatory bodies.
Architecture Pattern
Adversarial agent orchestration with multi-vector attack simulation and safety scoring
Azure Services
Azure AI FoundryAzure Content SafetyAzure OpenAI (gpt-4o)Azure MonitorKey Vault
DevKit (.github Agentic OS)
- agent.md — root orchestrator with builder→reviewer→tuner handoffs
- 3 agents — Red Team Builder (gpt-4o), Reviewer (gpt-4o-mini), Tuner (gpt-4o-mini)
- 3 skills — deploy (206 lines), evaluate (173 lines), tune (266 lines)
- 4 prompts — /deploy, /test, /review, /evaluate with agent routing
- .vscode/mcp.json — FrootAI MCP with Content Safety + OpenAI key inputs + envFile
TuneKit (AI Config)
- config/openai.json — gpt-4o, temp=0.1, seed=42
- config/security.json — attack vectors, severity thresholds
- config/guardrails.json — content safety, PII redaction
- evaluation/eval.py — Attack coverage >90%, Detection accuracy >95%
Tuning Parameters
Attack diversitySeverity thresholdsJailbreak detection thresholdReport formatCompliance frameworks (EU AI Act, NIST)
Estimated Cost
Dev/Test
$80–150/mo
Production
$2K–5K/mo