Voice search & voice chat

FrootAI.dev ships browser-native voice input across every search bar and chat surface. Tap the mic icon, speak naturally, and the engine turns your sentence into keywords that find the right primitive, play, or recipe. For Agent FAI it goes one step further: the keywords are expanded into a grounded-context block that is injected into the LLM prompt, so the model answers from our source of truth (SoT) instead of guessing.

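🔒 Privacy: transcription is performed by your browser's Web Speech API, and no audio is ever sent to FrootAI servers. Depending on the browser, recognition runs on-device (Safari) or in the browser vendor's cloud service (Chrome, Edge); see §2 and §4. We log only aggregate counters (mic on/off, transcript length bucket, language tag) via cookie-free Plausible analytics. See Data protection §2.5 for the full story.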


§1 — Where the mic appears

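| Surface | Auto-search on final | Continuous mode | Settings popover |
| --- | --- | --- | --- |
| Orchard search | | — | — |
| Primitives index | | — | — |
| Primitives / [category] | | — | — |
| Solution Plays | | — | — |
| Marketplace | | — | — |
| Registry-site | | — | — |
| Playground | | — | — |
| Workflows | | — | — |
| Cookbook | | — | — |
| Chatbot (Agent FAI full) | ✅ auto-send | ✅ | ✅ |
| AgentFaiWidget (mini chat) | ✅ auto-send | ✅ | ✅ |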

The 9 catalog search bars use single-utterance mode in English by default — they're short-form lookup interfaces and the gear-icon settings popover would add visual noise. Both chat surfaces expose the gear icon next to the mic so you can pick a different language and flip hands-free mode on for a long brainstorming session.


§2 — Supported languages

The settings popover offers eleven BCP-47 language tags out of the box:

`en-US`, `en-GB`, `es-ES`, `fr-FR`, `de-DE`, `pt-BR`, `hi-IN`, `ja-JP`, `ko-KR`, `zh-CN`, `ar-SA`.

Your choice is saved in `localStorage` (`frootai-voice-prefs-v1`) so it persists across visits. The default on first load is whatever your browser reports via `navigator.language`.
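
As a rough illustration of that persistence, here is a minimal sketch. Only the storage key `frootai-voice-prefs-v1` and the `navigator.language` fallback come from this page; the `VoicePrefs` shape, the `StorageLike` interface, and the function names are assumptions made for the example, not the shipped code.

```typescript
// Hypothetical shape of the saved preferences (an assumption for this sketch).
interface VoicePrefs {
  lang: string;        // BCP-47 tag, e.g. "en-US"
  continuous: boolean; // hands-free mode on/off
}

// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// The key documented above.
const PREFS_KEY = "frootai-voice-prefs-v1";

// Read saved preferences, falling back to the browser-reported language
// when nothing is stored or the stored value cannot be parsed.
function loadVoicePrefs(storage: StorageLike, browserLang: string): VoicePrefs {
  const fallback: VoicePrefs = { lang: browserLang, continuous: false };
  const raw = storage.getItem(PREFS_KEY);
  if (raw === null) return fallback;
  try {
    const parsed = JSON.parse(raw) as Partial<VoicePrefs>;
    return {
      lang: typeof parsed.lang === "string" ? parsed.lang : fallback.lang,
      continuous: parsed.continuous === true,
    };
  } catch (_err) {
    return fallback; // corrupted or non-object JSON
  }
}

// Persist the current preferences as JSON under the documented key.
function saveVoicePrefs(storage: StorageLike, prefs: VoicePrefs): void {
  storage.setItem(PREFS_KEY, JSON.stringify(prefs));
}
```

In a browser you would call `loadVoicePrefs(window.localStorage, navigator.language)`.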

Actual transcription quality depends on your browser engine — Chrome and Edge use Google's cloud recognition, Safari uses Apple's on-device model. We don't choose for you; we just hand the BCP-47 tag to the browser.


§3 — Hands-free (continuous) mode

By default the mic listens for one utterance, fires the final transcript on pause, and turns itself off. That's the right behaviour for one-shot questions.

For long brainstorming sessions, open the gear popover next to the mic and toggle Hands-free mode on. The recognition session stays open across pauses; each pause fires a fresh final transcript and Agent FAI replies. Tap the mic again to stop.
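
The difference between the two modes maps onto the Web Speech API's `continuous` flag. The sketch below shows one way to wire it up; `RecognitionLike` lists only the members of the browser's `SpeechRecognition` object that the example touches, and `attachVoice` and its option names are hypothetical.

```typescript
// One recognition result: a final/interim flag plus ranked alternatives.
interface ResultLike {
  readonly isFinal: boolean;
  readonly [index: number]: { transcript: string };
}

interface ResultEventLike {
  resultIndex: number;            // index of the first result that changed
  results: ArrayLike<ResultLike>;
}

// Subset of the SpeechRecognition interface used by this sketch.
interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: ResultEventLike) => void) | null;
  start(): void;
  stop(): void;
}

// Configure a recognition object and forward every final transcript.
// With handsFree = false the browser ends the session after the first
// final result; with handsFree = true it stays open until stop() is called.
function attachVoice(
  rec: RecognitionLike,
  opts: { lang: string; handsFree: boolean },
  onFinal: (text: string) => void,
): void {
  rec.lang = opts.lang;
  rec.continuous = opts.handsFree;
  rec.interimResults = false; // only final transcripts are of interest here
  rec.onresult = (event) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (!result || !result.isFinal) continue;
      const best = result[0]; // highest-ranked alternative
      if (best) onFinal(best.transcript.trim());
    }
  };
}
```

In a browser, `rec` would be an instance created from `window.SpeechRecognition` or `window.webkitSpeechRecognition`, and tapping the mic would call `rec.start()` or `rec.stop()`.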


§4 — Browser support matrix

The Web Speech API is widely deployed but not universal. The mic button auto-hides itself entirely when the API is unavailable, so the search bar / chat input never shows a broken control.
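
A feature check of this kind usually looks for the standard constructor and the `webkit`-prefixed one that Chromium and Safari expose. The following is a minimal sketch with hypothetical function names, not the site's actual component code.

```typescript
// Return the speech-recognition constructor exposed on the given global
// object, or null when neither the standard nor the prefixed name exists.
function getSpeechRecognitionCtor(scope: Record<string, unknown>): unknown {
  return scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"] ?? null;
}

// The mic button should render only when a constructor is available.
function shouldShowMic(scope: Record<string, unknown>): boolean {
  return typeof getSpeechRecognitionCtor(scope) === "function";
}
```

In a page this would be called as `shouldShowMic(window as unknown as Record<string, unknown>)`; passing the global object in as a parameter keeps the check testable outside a browser.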

| Browser | Status | Notes |
| --- | --- | --- |
| Chrome desktop | ✅ full | Uses Google cloud recognition |
| Chrome Android | ✅ full | Uses Google cloud recognition |
| Edge desktop | ✅ full | Same engine as Chrome |
| Safari macOS 14.1+ | ✅ full | On-device recognition |
| Safari iOS 14.5+ | ✅ full | On-device recognition |
| Brave / Vivaldi / Arc | ✅ full | Chromium-based |
| Firefox desktop | ❌ off by default | Requires `media.webspeech.recognition.enable` flag in `about:config`. Until flipped, mic auto-hides. |
| Firefox Android | ❌ unsupported | Same as desktop |
| Tor Browser | ❌ unsupported | API blocked by privacy hardening |
| Older browsers | ❌ unsupported | Mic hides automatically |

You'll also need to grant the microphone permission the first time the page asks. If you previously denied it, the mic button will appear but tapping it will silently fail — re-enable it in your browser's site settings.
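
If you are debugging this yourself, the Web Speech API reports a denied permission through the recognition object's `error` event, whose `error` field is `"not-allowed"` (or `"service-not-allowed"`). The sketch below groups the documented error codes into a few buckets; the `MicFailure` type and the function name are made up for illustration and do not describe how the site handles the event.

```typescript
// Coarse failure categories (hypothetical, for illustration only).
type MicFailure = "permission-denied" | "no-microphone" | "network" | "other";

// Map a SpeechRecognition error code to a coarse category.
function classifyRecognitionError(code: string): MicFailure {
  switch (code) {
    case "not-allowed":
    case "service-not-allowed":
      return "permission-denied"; // user or browser policy blocked the mic
    case "audio-capture":
      return "no-microphone";     // no usable input device
    case "network":
      return "network";           // cloud recognition could not be reached
    default:
      return "other";             // e.g. "no-speech", "aborted"
  }
}
```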


§5 — Agent FAI grounding (what makes voice answers smart)

When you speak (or type) a sentence at Agent FAI, we don't just send the raw text to the LLM. We run a deterministic grounding step first:

  1. Tokenize the sentence, strip voice fillers ("could", "please", "show me", "how do I", etc.).
  2. Expand the remaining tokens via our shared synonym pack — so "rag" pulls in "retrieval", "vector", "search"; "chatbot" pulls in "agent", "assistant"; "infra" pulls in "infrastructure", "bicep", "terraform".
  3. Run smartSearch across five primitive catalogs (agents, skills, instructions, hooks, plugins) using the same per-catalog presets the visual search uses.
  4. Inject the top matches as a `[GROUNDED CONTEXT]` block at the end of the user message before it goes to the LLM. The model sees the user's question plus the canonical names/descriptions of the primitives we already shipped that solve it.
  5. Show the extracted keywords + matched primitives as chips above the input, so you can see what we matched on (and click through directly to the primitive doc if you don't even need the LLM answer).
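
The steps above can be sketched roughly as follows. This is not the contents of `ground-query.ts`: the filler and synonym lists contain only the examples quoted on this page, `naiveSearch` is a simple substring stand-in for smartSearch, the tokenizer assumes English (ASCII) input, and the layout inside the `[GROUNDED CONTEXT]` block is an assumption.

```typescript
// Example fillers and synonyms taken from this page; the real packs are larger.
const FILLERS = ["show me", "how do i", "could", "please"];
const SYNONYMS: Record<string, string[]> = {
  rag: ["retrieval", "vector", "search"],
  chatbot: ["agent", "assistant"],
  infra: ["infrastructure", "bicep", "terraform"],
};

interface Primitive { name: string; description: string; }

// Step 1: normalise, strip filler phrases, split into tokens.
function extractKeywords(sentence: string): string[] {
  const clean = sentence.toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  const fillerPattern = new RegExp("\\b(?:" + FILLERS.join("|") + ")\\b", "g");
  return clean.replace(fillerPattern, " ").split(/\s+/).filter((t) => t.length > 0);
}

// Step 2: add synonyms after each token, without duplicates.
function expandKeywords(tokens: string[]): string[] {
  const out: string[] = [];
  for (const token of tokens) {
    for (const term of [token, ...(SYNONYMS[token] ?? [])]) {
      if (out.indexOf(term) === -1) out.push(term);
    }
  }
  return out;
}

// Step 3 (stand-in): rank primitives by how many keywords they mention.
function naiveSearch(catalog: Primitive[], keywords: string[], limit = 3): Primitive[] {
  const scored = catalog
    .map((p) => {
      const haystack = (p.name + " " + p.description).toLowerCase();
      const score = keywords.filter((k) => haystack.indexOf(k) !== -1).length;
      return { p, score };
    })
    .filter((s) => s.score > 0);
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, limit).map((s) => s.p);
}

// Step 4: append the matches to the end of the user message.
function groundMessage(userMessage: string, matches: Primitive[]): string {
  if (matches.length === 0) return userMessage;
  const lines = matches.map((m) => `- ${m.name}: ${m.description}`);
  return `${userMessage}\n\n[GROUNDED CONTEXT]\n${lines.join("\n")}`;
}
```

Because every step is a pure function of the sentence and the catalogs, the same input produces the same keywords and the same context block, which is what makes the grounding step deterministic.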

This is the maturity layer: voice plus grounding turns a casual sentence into a deterministic query, so the same sentence always yields the same keywords and the same matched SoT entries. See `ground-query.ts` for the source.


§6 — Privacy and analytics

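We track four cookie-free events via Plausible. None carries audio or a full transcript; the only query text recorded is the two-word hint in `search_no_results`: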

  • `voice_start` { surface, lang, continuous } — mic turned on
  • `voice_final` { surface, transcript_length, lang } — final transcript received (NO content)
  • `search_quality` { bucket, query_length } — applies to all searches, not just voice
  • `search_no_results` { query_hint } — first two words only, lowercased, for documentation-gap analysis

No audio, no full transcript, no user identifier. The browser does the recognition; we only count that it happened. See Data protection §2.5 for the legal text.
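
To make the "counts, not content" idea concrete, here is a small sketch of how such events could be derived before they are handed to Plausible's custom-event function (`plausible(eventName, { props })`). The bucket boundaries, the helper names, and the choice to send the bucket label as `transcript_length` are assumptions for illustration; only the event and property names come from the list above.

```typescript
// Signature of Plausible's custom-event function, passed in so the sketch
// can run without the analytics script loaded.
type PlausibleFn = (
  event: string,
  options?: { props?: Record<string, string | number | boolean> },
) => void;

// Reduce a transcript length to a coarse bucket (boundaries are assumed).
function lengthBucket(chars: number): string {
  if (chars <= 20) return "0-20";
  if (chars <= 80) return "21-80";
  return "81+";
}

// Keep only the first two words of a query, lowercased.
function queryHint(query: string): string {
  return query.toLowerCase().trim().split(/\s+/).slice(0, 2).join(" ");
}

// Report that a final transcript arrived, without sending its content.
function trackVoiceFinal(
  plausible: PlausibleFn,
  surface: string,
  transcript: string,
  lang: string,
): void {
  plausible("voice_final", {
    props: { surface, transcript_length: lengthBucket(transcript.length), lang },
  });
}

// Report a search that returned nothing, with only the two-word hint.
function trackNoResults(plausible: PlausibleFn, query: string): void {
  plausible("search_no_results", { props: { query_hint: queryHint(query) } });
}
```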


§7 — Troubleshooting

The mic icon doesn't appear at all. Your browser doesn't expose the Web Speech API. On Firefox, enable the flag in `about:config` (see browser matrix above). Otherwise, switch to a Chromium-based browser or Safari.

The mic appears but tapping it does nothing. You probably denied microphone permission earlier. Re-enable it in your browser's site settings for frootai.dev, then reload.

The transcript is wrong / in the wrong language. Open the gear icon next to the mic and pick the right BCP-47 language tag. Your choice persists across sessions.

Agent FAI didn't use the grounded context. Look at the chips above the input — if they show keywords and primitives, the grounding block was injected. If the LLM ignored it anyway, that's a model limitation; rephrase more directly ("using <primitive name>") to nudge it.