Play 23

Browser Automation

High✅ Ready

AI-driven web navigation using vision + Playwright MCP.

Uses AI to navigate websites, fill forms, extract data, take screenshots, and execute multi-step web workflows — entirely driven by natural language instructions. Combines Playwright MCP Server for browser control (navigate, click, type, screenshot), GPT-4o Vision for understanding page content and making navigation decisions, and structured task planning for breaking complex web tasks into executable steps. Domain allowlist prevents arbitrary browsing.

Architecture Pattern

Browser automation: vision model + Playwright, task planning, domain-restricted

Azure Services

Azure OpenAI (gpt-4o Vision)Container AppsPlaywright MCP Server

DevKit (.github Agentic OS)

agent.md — root orchestrator with builder→reviewer→tuner handoffs
3 agents — Browser Builder (gpt-4o), Reviewer (gpt-4o-mini), Tuner (gpt-4o-mini)
3 skills — deploy (102 lines), evaluate (100 lines), tune (103 lines)
4 prompts — /deploy, /test, /review, /evaluate with agent routing
.vscode/mcp.json — FrootAI MCP with OpenAI + target URL inputs + envFile

TuneKit (AI Config)

config/openai.json — gpt-4o vision model, temp=0.1
config/browser.json — domain allowlist, timeouts, viewport config
config/guardrails.json — no credential entry, screenshot PII redaction
evaluation/eval.py — Task completion >85%, Error rate <10%

Tuning Parameters

Domain allowlistVision prompts for page understandingAction timeout per stepRetry config on navigation failureMax navigation depthScreenshot resolution

Estimated Cost

Dev/Test

$100–200/mo

Production

$1K–3K/mo

User Guide Open in VS Code View on GitHub Setup Guide Configurator Ask Agent FAI Back to FrootAI