Play 95
Multimodal Search Engine v2
Very High · ✅ Ready
Unified search across images, text, code, and audio with cross-modal reasoning. A user can search by uploading an image and get related code snippets, documentation, and audio explanations. Combines Azure AI Search vector indexes across modalities with GPT-4o for cross-modal synthesis.
Architecture Pattern
Cross-modal search: query decomposition → modality-specific indexing → vector fusion → cross-modal synthesis → relevance feedback
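The vector fusion step can be sketched as weighted late fusion: each modality index scores the same candidate set, and a weighted sum produces the final ranking. The weights and function names below are illustrative assumptions, not the play's actual implementation.

```python
import numpy as np

# Hypothetical per-modality weights (the play tunes these; values here are
# placeholders for illustration only).
MODALITY_WEIGHTS = {"image": 0.35, "text": 0.35, "code": 0.20, "audio": 0.10}

def fuse_scores(per_modality_scores: dict) -> np.ndarray:
    """Late fusion: weighted sum of per-modality similarity scores."""
    fused = np.zeros_like(next(iter(per_modality_scores.values())), dtype=float)
    for modality, scores in per_modality_scores.items():
        fused += MODALITY_WEIGHTS.get(modality, 0.0) * scores
    return fused

# Three candidates scored by two modality indexes (cosine similarities).
scores = {
    "image": np.array([0.9, 0.2, 0.5]),
    "text":  np.array([0.1, 0.8, 0.6]),
}
ranking = np.argsort(-fuse_scores(scores))  # indices, best candidate first
```

A candidate that is merely decent in every modality can outrank one that is excellent in a single modality, which is usually the desired behavior for cross-modal queries.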
Azure Services
Azure AI Search · Azure AI Vision · Azure AI Speech · Azure OpenAI · Azure Container Apps
DevKit (.github Agentic OS)
- agent.md — root orchestrator with builder→reviewer→tuner handoffs
- 3 agents — Multimodal Builder (gpt-4o), Reviewer (gpt-4o-mini), Tuner (gpt-4o-mini)
- 3 skills — deploy (254 lines), evaluate (101 lines), tune (227 lines)
- 4 prompts — /deploy, /test, /review, /evaluate with agent routing
- .vscode/mcp.json — FrootAI MCP with OpenAI + AI Search inputs + envFile
TuneKit (AI Config)
- config/openai.json — cross-modal synthesis and query expansion prompts
- config/search.json — fusion weights, index configs, diversity scores
- config/guardrails.json — relevance minimums, latency budgets
- evaluation/eval.py — cross-modal NDCG >0.75, latency <500ms
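The contents of evaluation/eval.py are not shown here, but the NDCG gate it enforces can be sketched as follows; the function names and sample relevance grades are assumptions for illustration.

```python
import math

def dcg(relevances: list) -> float:
    """Discounted cumulative gain: graded relevance discounted by log rank."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances: list, k: int = 10) -> float:
    """NDCG@k: DCG of the system ranking over the ideal (sorted) DCG."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Gate from the card (threshold real, relevance grades illustrative):
assert ndcg([3, 2, 3, 0, 1, 2]) > 0.75
```

NDCG rewards placing highly relevant results near the top, which is why it suits a fused ranking where per-modality scores are not directly comparable.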
Tuning Parameters
Cross-modal fusion weights · Modality-specific index config · Result diversity score · Query expansion depth · Relevance feedback loop
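One common way to implement a result diversity score is maximal marginal relevance (MMR), which penalizes results redundant with those already selected. Whether this play uses MMR is an assumption; the sketch below shows the general technique with hypothetical names.

```python
def mmr_rerank(relevance: dict, similarity: dict, lambda_div: float = 0.7,
               k: int = 3) -> list:
    """Maximal Marginal Relevance re-ranking.
    relevance:  doc_id -> query relevance score
    similarity: (doc_a, doc_b) -> pairwise doc similarity (symmetric)
    lambda_div: 1.0 = pure relevance, 0.0 = pure diversity
    """
    selected, candidates = [], set(relevance)
    while candidates and len(selected) < k:
        def mmr(doc):
            # Redundancy = max similarity to anything already selected.
            redundancy = max(
                (similarity.get((doc, s), similarity.get((s, doc), 0.0))
                 for s in selected), default=0.0)
            return lambda_div * relevance[doc] - (1 - lambda_div) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Lowering lambda_div trades raw relevance for variety, e.g. surfacing a code snippet and a diagram instead of two near-identical documentation pages.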
Estimated Cost
Dev/Test
$120-300/mo
Production
$4K-12K/mo