Play 23
Browser Automation
High
✅ Ready
AI-driven web navigation using vision + Playwright MCP.
Uses AI to navigate websites, fill forms, extract data, take screenshots, and execute multi-step web workflows, all driven by natural-language instructions. It combines three pieces: the Playwright MCP Server for browser control (navigate, click, type, screenshot), GPT-4o Vision for understanding page content and choosing the next navigation step, and structured task planning that breaks complex web tasks into executable steps. A domain allowlist restricts navigation to approved sites, preventing arbitrary browsing.
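A minimal sketch of how a domain allowlist check might work, assuming the allowlist is a set of bare domain names and that subdomains of an allowed domain are also permitted (both are assumptions; the actual format of config/browser.json is not shown here). The domain names and the function name `is_allowed` are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real one would be loaded from config/browser.json.
ALLOWED_DOMAINS = {"example.com", "docs.example.org"}


def is_allowed(url: str, allowed: set[str] = ALLOWED_DOMAINS) -> bool:
    """Return True if the URL uses http(s) and its host is an allowed domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False  # reject file:, javascript:, data:, etc.
    host = (parts.hostname or "").lower().rstrip(".")
    # Compare on the parsed hostname, never on a substring of the raw URL.
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Matching on the parsed hostname is the important design choice: a naive substring check such as `"example.com" in url` would also accept `https://example.com.evil.net/`.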
Architecture Pattern
Browser automation: vision model + Playwright, task planning, domain-restricted
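The vision + Playwright pattern can be pictured as an observe, decide, act loop. The sketch below assumes a `decide` callable standing in for the GPT-4o Vision call and a `FakeBrowser` standing in for the Playwright MCP session, so it runs without either. `Action`, `FakeBrowser`, `run_task`, and the `max_steps` budget (analogous to the max navigation depth parameter) are hypothetical names, not the play's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str          # "navigate", "click", "type", "screenshot", or "done"
    target: str = ""   # URL for navigate, selector for click/type
    value: str = ""    # text to enter when kind == "type"


@dataclass
class FakeBrowser:
    """Stand-in for a Playwright MCP session: records actions instead of driving a real browser."""
    log: list = field(default_factory=list)

    def screenshot(self) -> bytes:
        return b""  # placeholder; a real session would return PNG bytes

    def perform(self, action: Action) -> None:
        self.log.append((action.kind, action.target, action.value))


def run_task(goal: str,
             decide: Callable[[str, bytes, int], Action],
             browser: FakeBrowser,
             is_url_allowed: Callable[[str], bool],
             max_steps: int = 10) -> bool:
    """Loop observe -> decide -> act until the model says "done" or the step budget runs out."""
    for step in range(max_steps):
        image = browser.screenshot()            # observe the current page
        action = decide(goal, image, step)      # decide (vision model in the real system)
        if action.kind == "done":
            return True
        if action.kind == "navigate" and not is_url_allowed(action.target):
            return False                        # domain restriction: abort on an off-allowlist URL
        browser.perform(action)                 # act
    return False                                # step budget exhausted without completion
```

Keeping the allowlist check inside the loop, rather than trusting the model's plan, means every navigation the model proposes is validated before it reaches the browser.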
Azure Services
- Azure OpenAI (gpt-4o Vision)
- Container Apps
- Playwright MCP Server
DevKit (.github Agentic OS)
- agent.md — root orchestrator with builder→reviewer→tuner handoffs
- 3 agents — Browser Builder (gpt-4o), Reviewer (gpt-4o-mini), Tuner (gpt-4o-mini)
- 3 skills — deploy (102 lines), evaluate (100 lines), tune (103 lines)
- 4 prompts — /deploy, /test, /review, /evaluate with agent routing
- .vscode/mcp.json — FrootAI MCP with OpenAI + target URL inputs + envFile
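One way to picture the builder→reviewer→tuner handoff is as a sequential pipeline in which each agent receives the previous agent's output. `run_agent` and `orchestrate` are hypothetical placeholders for however the root orchestrator actually invokes agents; only the order and the model assignments come from the list above.

```python
from typing import Callable

# Order and models as listed in the DevKit; everything else here is illustrative.
AGENTS = [
    ("builder", "gpt-4o"),
    ("reviewer", "gpt-4o-mini"),
    ("tuner", "gpt-4o-mini"),
]


def orchestrate(task: str,
                run_agent: Callable[[str, str, str], str]) -> list[tuple[str, str]]:
    """Hand each agent's output to the next agent, in builder -> reviewer -> tuner order."""
    artifact = task
    trail = []
    for name, model in AGENTS:
        artifact = run_agent(name, model, artifact)  # hypothetical agent call
        trail.append((name, artifact))
    return trail
```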
TuneKit (AI Config)
- config/openai.json — gpt-4o vision model, temp=0.1
- config/browser.json — domain allowlist, timeouts, viewport config
- config/guardrails.json — no credential entry, screenshot PII redaction
- evaluation/eval.py — Task completion >85%, Error rate <10%
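A minimal sketch of checking the two evaluation targets, assuming task completion is the fraction of tasks finished and error rate is failed actions divided by attempted actions. The per-action definition of error rate is an assumption (eval.py may count errors per task), and `TaskResult` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    completed: bool   # did the agent finish the task?
    steps: int        # actions attempted
    errors: int       # actions that failed (timeouts, missing elements, ...)


def evaluate(results: list[TaskResult],
             min_completion: float = 0.85,
             max_error_rate: float = 0.10) -> dict:
    """Compute completion and error rate, and check them against the >85% / <10% targets."""
    if not results:
        raise ValueError("no results to evaluate")
    completion = sum(r.completed for r in results) / len(results)
    total_steps = sum(r.steps for r in results)
    error_rate = sum(r.errors for r in results) / total_steps if total_steps else 0.0
    return {
        "task_completion": completion,
        "error_rate": error_rate,
        # Strict comparisons mirror ">85%" and "<10%".
        "passed": completion > min_completion and error_rate < max_error_rate,
    }
```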
Tuning Parameters
- Domain allowlist
- Vision prompts for page understanding
- Action timeout per step
- Retry config on navigation failure
- Max navigation depth
- Screenshot resolution
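The retry-on-navigation-failure parameter could be implemented as exponential backoff around each step. This is a sketch with hypothetical names (`with_retries`, `max_attempts`, `base_delay_s`); it retries on Python's built-in `TimeoutError` by default, whereas with Playwright you would pass its own exception class (`playwright.sync_api.TimeoutError`) in `retry_on`. The per-step action timeout is a separate setting applied to the browser call itself.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T],
                 max_attempts: int = 3,
                 base_delay_s: float = 0.5,
                 retry_on: tuple[type[BaseException], ...] = (TimeoutError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action, retrying listed exceptions with exponential backoff (0.5 s, 1 s, 2 s, ...)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Injecting `sleep` keeps the function testable without real delays; exceptions not listed in `retry_on` propagate immediately instead of being retried.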
Estimated Cost
- Dev/Test: $100–200/mo
- Production: $1K–3K/mo