Voice search & voice chat
FrootAI.dev ships browser-native voice input across every search bar and chat surface. Tap the mic icon, speak naturally, and the engine turns your sentence into keywords that find the right primitive, play, or recipe. For Agent FAI it goes one step further: keywords are expanded into a grounded-context block that's injected into the LLM prompt, so the model answers from our source of truth (SoT) instead of guessing.
🔒 Privacy: transcription is handled by your browser's Web Speech API, and FrootAI servers never receive audio. Depending on the browser, recognition runs on-device or through the browser vendor's own speech service (see §2). We log only aggregate counters (mic on/off, transcript length bucket, language tag) via cookie-free Plausible analytics. See Data protection §2.5 for the full story.
§1 — Where the mic appears
| Surface | Auto-search on final | Continuous mode | Settings popover |
|---|---|---|---|
| Orchard search | ✅ | ❌ | ❌ |
| Primitives index | ✅ | ❌ | ❌ |
| Primitives / [category] | ✅ | ❌ | ❌ |
| Solution Plays | ✅ | ❌ | ❌ |
| Marketplace | ✅ | ❌ | ❌ |
| Registry-site | ✅ | ❌ | ❌ |
| Playground | ✅ | ❌ | ❌ |
| Workflows | ✅ | ❌ | ❌ |
| Cookbook | ✅ | ❌ | ❌ |
| Chatbot (Agent FAI full) | ✅ auto-send | ✅ | ✅ |
| AgentFaiWidget (mini chat) | ✅ auto-send | ✅ | ✅ |
The 9 catalog search bars use single-utterance mode in English by default — they're short-form lookup interfaces and the gear-icon settings popover would add visual noise. Both chat surfaces expose the gear icon next to the mic so you can pick a different language and flip hands-free mode on for a long brainstorming session.
§2 — Supported languages
The settings popover offers eleven BCP-47 language tags out of the box:
`en-US`, `en-GB`, `es-ES`, `fr-FR`, `de-DE`, `pt-BR`, `hi-IN`, `ja-JP`, `ko-KR`, `zh-CN`, `ar-SA`.
Your choice is saved in `localStorage` (`frootai-voice-prefs-v1`) so it persists across visits. The default on first load is whatever your browser reports via `navigator.language`.
Actual transcription quality depends on your browser engine — Chrome and Edge use Google's cloud recognition, Safari uses Apple's on-device model. We don't choose for you; we just hand the BCP-47 tag to the browser.
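The preference handling described above could be sketched roughly as follows. Only the storage key and the eleven language tags come from this page; the `VoicePrefs` shape, the function names, and the fallback rule for browser languages outside the list are assumptions made for illustration.

```typescript
// Storage key and language list are taken from the text above.
const PREFS_KEY = "frootai-voice-prefs-v1";

const SUPPORTED_LANGS = [
  "en-US", "en-GB", "es-ES", "fr-FR", "de-DE", "pt-BR",
  "hi-IN", "ja-JP", "ko-KR", "zh-CN", "ar-SA",
] as const;

type VoiceLang = (typeof SUPPORTED_LANGS)[number];

// Hypothetical shape of the saved preferences.
interface VoicePrefs {
  lang: VoiceLang;
  continuous: boolean;
}

// Minimal subset of the Web Storage interface, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function isSupportedLang(tag: string): tag is VoiceLang {
  return (SUPPORTED_LANGS as readonly string[]).indexOf(tag) !== -1;
}

// Assumption: if the browser language is not one of the eleven tags, use the
// first tag with the same primary subtag, and en-US if there is none.
function defaultLang(browserLang: string): VoiceLang {
  if (isSupportedLang(browserLang)) return browserLang;
  const [primary = ""] = browserLang.toLowerCase().split("-");
  const sameLanguage = SUPPORTED_LANGS.filter(
    (tag) => tag.toLowerCase().split("-")[0] === primary,
  )[0];
  return sameLanguage ?? "en-US";
}

function loadPrefs(store: KeyValueStore, browserLang: string): VoicePrefs {
  const fallback: VoicePrefs = { lang: defaultLang(browserLang), continuous: false };
  const raw = store.getItem(PREFS_KEY);
  if (raw === null) return fallback; // first visit
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return fallback;
    const candidate = parsed as { lang?: unknown; continuous?: unknown };
    return {
      lang:
        typeof candidate.lang === "string" && isSupportedLang(candidate.lang)
          ? candidate.lang
          : fallback.lang,
      continuous: candidate.continuous === true,
    };
  } catch {
    return fallback; // corrupted JSON: behave like a first visit
  }
}

function savePrefs(store: KeyValueStore, prefs: VoicePrefs): void {
  store.setItem(PREFS_KEY, JSON.stringify(prefs));
}
```

In a browser, `store` would be `window.localStorage` and `browserLang` would be `navigator.language`. Passing them in as arguments keeps the logic testable without a DOM.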
§3 — Hands-free (continuous) mode
By default the mic listens for one utterance, fires the final transcript on pause, and turns itself off. That's the right behaviour for one-shot questions.
For long brainstorming sessions, open the gear popover next to the mic and toggle Hands-free mode on. The recognition session stays open across pauses; each pause fires a fresh final transcript and Agent FAI replies. Tap the mic again to stop.
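As a rough illustration of the difference between the two modes, the sketch below wires a recognizer for either single-utterance or hands-free use. The property and method names on `Recognizer` (`lang`, `continuous`, `interimResults`, `onresult`, `start`, `stop`) mirror the Web Speech API's `SpeechRecognition` object; `startMic`, `MicOptions`, and the small structural interfaces are hypothetical names introduced here, not FrootAI's actual code.

```typescript
// Minimal structural types for the parts of the Web Speech API used here.
interface RecognitionAlternative {
  transcript: string;
}
interface RecognitionResult {
  isFinal: boolean;
  length: number;
  [index: number]: RecognitionAlternative;
}
interface RecognitionResultList {
  length: number;
  [index: number]: RecognitionResult;
}
interface RecognitionEvent {
  resultIndex: number;
  results: RecognitionResultList;
}
interface Recognizer {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: RecognitionEvent) => void) | null;
  start(): void;
  stop(): void;
}

// Hypothetical options object for this sketch.
interface MicOptions {
  lang: string; // BCP-47 tag handed to the browser
  handsFree: boolean; // true keeps the session open across pauses
  onFinal: (transcript: string) => void;
}

// Configures the recognizer, starts it, and returns a function that stops it.
function startMic(recognizer: Recognizer, options: MicOptions): () => void {
  recognizer.lang = options.lang;
  // With continuous = false the browser ends the session after one final
  // result; with continuous = true every pause yields a new final result.
  recognizer.continuous = options.handsFree;
  recognizer.interimResults = false;
  recognizer.onresult = (event) => {
    // Only results from resultIndex onward are new in this event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (!result || !result.isFinal) continue;
      const best = result[0];
      const text = best ? best.transcript.trim() : "";
      if (text !== "") options.onFinal(text);
    }
  };
  recognizer.start();
  return () => recognizer.stop();
}
```

In a browser the `recognizer` argument would be an instance created from `SpeechRecognition` (or the prefixed `webkitSpeechRecognition`), and the returned function is what a second tap on the mic would call.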
§4 — Browser support matrix
The Web Speech API is widely deployed but not universal. The mic button hides itself when the API is unavailable, so the search bar / chat input never shows a broken control.
| Browser | Status | Notes |
|---|---|---|
| Chrome desktop | ✅ full | Uses Google cloud recognition |
| Chrome Android | ✅ full | Uses Google cloud recognition |
| Edge desktop | ✅ full | Same engine as Chrome |
| Safari 14.1+ (macOS) | ✅ full | On-device recognition |
| Safari on iOS 14.5+ | ✅ full | On-device recognition |
| Brave / Vivaldi / Arc | ✅ full | Chromium-based |
| Firefox desktop | ❌ off by default | Requires `media.webspeech.recognition.enable` flag in `about:config`. Until flipped, mic auto-hides. |
| Firefox Android | ❌ unsupported | Same as desktop |
| Tor Browser | ❌ unsupported | API blocked by privacy hardening |
| Older browsers | ❌ unsupported | Mic hides automatically |
You'll also need to grant the microphone permission the first time the page asks. If you previously denied it, the mic button will appear but tapping it will silently fail — re-enable it in your browser's site settings.
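One way to tell the "hidden" and "permission denied" cases apart is sketched below. The global names `SpeechRecognition` and `webkitSpeechRecognition`, and the error codes `not-allowed` and `service-not-allowed`, are part of the Web Speech API; `micState`, `MicState`, and `SpeechWindow` are hypothetical names used only for this example, and the page's real logic may differ.

```typescript
// The two global names under which browsers expose the recognition constructor.
interface SpeechWindow {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

function hasSpeechRecognition(win: SpeechWindow): boolean {
  return (
    typeof win.SpeechRecognition === "function" ||
    typeof win.webkitSpeechRecognition === "function"
  );
}

// Hypothetical UI states: no button at all, button shown but permission
// blocked, or button ready to use.
type MicState = "hidden" | "blocked" | "ready";

// lastErrorCode is the `error` field of the most recent recognition error
// event, or null if no error has occurred yet.
function micState(win: SpeechWindow, lastErrorCode: string | null): MicState {
  if (!hasSpeechRecognition(win)) return "hidden";
  if (lastErrorCode === "not-allowed" || lastErrorCode === "service-not-allowed") {
    return "blocked";
  }
  return "ready";
}
```

In a browser, `win` would be the global `window` object (possibly with a type cast). A "blocked" state is the point where a page could prompt the user to revisit site settings.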
§5 — Agent FAI grounding (what makes voice answers smart)
When you speak (or type) a sentence to Agent FAI, we don't just send the raw text to the LLM. We run a deterministic grounding step first:
- Tokenize the sentence, strip voice fillers ("could", "please", "show me", "how do I", etc.).
- Expand the remaining tokens via our shared synonym pack — so "rag" pulls in "retrieval", "vector", "search"; "chatbot" pulls in "agent", "assistant"; "infra" pulls in "infrastructure", "bicep", "terraform".
- Run smartSearch across five primitive catalogs (agents, skills, instructions, hooks, plugins) using the same per-catalog presets the visual search uses.
- Inject the top matches as a `[GROUNDED CONTEXT]` block at the end of the user message before it goes to the LLM. The model sees the user's question plus the canonical names/descriptions of the primitives we already shipped that solve it.
- Show the extracted keywords + matched primitives as chips above the input, so you can see what we matched on (and click through directly to the primitive doc if you don't even need the LLM answer).
This is the maturity layer: voice plus grounding turn a casual sentence into a query that consistently lands on the right SoT entry. See ground-query.ts for the source.
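The grounding steps listed above can be sketched in a few functions. This is an illustrative approximation, not the contents of ground-query.ts: the filler list and synonym pack contain only the examples quoted on this page, `naiveSearch` is a simple stand-in for smartSearch, and the `Primitive` shape, the minimum token length, and the line format inside the `[GROUNDED CONTEXT]` block are assumptions.

```typescript
// Only the examples quoted in the text; the real lists are presumably longer.
const FILLERS = ["how do i", "show me", "could", "please"];
const FILLER_PATTERN = new RegExp("\\b(?:" + FILLERS.join("|") + ")\\b", "g");

const SYNONYMS: { [term: string]: string[] } = {
  rag: ["retrieval", "vector", "search"],
  chatbot: ["agent", "assistant"],
  infra: ["infrastructure", "bicep", "terraform"],
};

// Hypothetical catalog entry shape.
interface Primitive {
  catalog: string;
  name: string;
  description: string;
}

// Step 1: normalise, strip fillers, tokenize, de-duplicate.
function extractKeywords(sentence: string): string[] {
  const normalised = sentence.toLowerCase().replace(/[^a-z0-9]+/g, " ");
  return normalised
    .replace(FILLER_PATTERN, " ")
    .split(" ")
    .filter((token) => token.length > 1) // assumption: drop one-letter tokens
    .filter((token, i, all) => all.indexOf(token) === i);
}

// Step 2: add synonyms after each keyword, keeping first occurrences only.
function expandKeywords(keywords: string[]): string[] {
  const expanded: string[] = [];
  for (const word of keywords) {
    const known = Object.prototype.hasOwnProperty.call(SYNONYMS, word);
    const extra = (known ? SYNONYMS[word] : undefined) ?? [];
    for (const term of [word].concat(extra)) {
      if (expanded.indexOf(term) === -1) expanded.push(term);
    }
  }
  return expanded;
}

// Step 3 (stand-in for smartSearch): score by number of whole-word keyword hits.
function naiveSearch(catalog: Primitive[], keywords: string[], limit: number): Primitive[] {
  return catalog
    .map((primitive) => {
      const words = (primitive.name + " " + primitive.description)
        .toLowerCase()
        .split(/[^a-z0-9]+/);
      const score = keywords.filter((k) => words.indexOf(k) !== -1).length;
      return { primitive, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.primitive);
}

// Step 4: append the block to the end of the user message.
function groundMessage(sentence: string, catalog: Primitive[], limit = 3) {
  const keywords = expandKeywords(extractKeywords(sentence));
  const matches = naiveSearch(catalog, keywords, limit);
  if (matches.length === 0) return { keywords, matches, prompt: sentence };
  const lines = matches.map((m) => `- ${m.catalog}/${m.name}: ${m.description}`);
  const prompt = `${sentence}\n\n[GROUNDED CONTEXT]\n${lines.join("\n")}`;
  return { keywords, matches, prompt };
}
```

The returned `keywords` and `matches` are the values a UI could render as chips, while `prompt` is what would go to the LLM. Because every step is a pure function of the sentence and the catalog, the same input always produces the same block.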
§6 — Privacy and analytics
We track four cookie-free events via Plausible, none of which carries audio or a full transcript:
- `voice_start` { surface, lang, continuous } — mic turned on
- `voice_final` { surface, transcript_length, lang } — final transcript received (NO content)
- `search_quality` { bucket, query_length } — applies to all searches, not just voice
- `search_no_results` { query_hint } — first two words only, lowercased, for documentation-gap analysis
No audio, no user identifier, and no query text beyond the two-word `query_hint`. The browser does the recognition; we only count that it happened. See Data protection §2.5 for the legal text.
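The four events can be modelled as a small typed union, sketched below. The event names and property names are the ones listed above, and the `(eventName, { props })` call shape is how Plausible's custom-event function is invoked; the helper names `queryHint`, `voiceFinalEvent`, and `track` are hypothetical. The privacy note at the top of this page mentions a transcript length bucket, but the bucket boundaries are not given here, so this sketch sends the plain length rather than guessing them.

```typescript
type VoiceEvent =
  | { name: "voice_start"; props: { surface: string; lang: string; continuous: boolean } }
  | { name: "voice_final"; props: { surface: string; transcript_length: number; lang: string } }
  | { name: "search_quality"; props: { bucket: string; query_length: number } }
  | { name: "search_no_results"; props: { query_hint: string } };

// Call shape of Plausible's custom-event function.
type PlausibleFn = (
  eventName: string,
  options: { props: { [key: string]: string | number | boolean } },
) => void;

// First two words only, lowercased, as described for `search_no_results`.
function queryHint(query: string): string {
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word !== "")
    .slice(0, 2)
    .join(" ");
}

// Builds a `voice_final` event that records the length of the transcript
// but never its content.
function voiceFinalEvent(surface: string, transcript: string, lang: string): VoiceEvent {
  return {
    name: "voice_final",
    props: { surface, transcript_length: transcript.length, lang },
  };
}

function track(plausible: PlausibleFn, event: VoiceEvent): void {
  plausible(event.name, { props: event.props });
}
```

Typing the events as a union means a payload with an unexpected field, such as a raw transcript, fails at compile time instead of reaching analytics.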
§7 — Troubleshooting
The mic icon doesn't appear at all. Your browser doesn't expose the Web Speech API. On Firefox, enable the flag in `about:config` (see browser matrix above). Otherwise, switch to a Chromium-based browser or Safari.
The mic appears but tapping it does nothing. You probably denied microphone permission earlier. Re-enable it in your browser's site settings for frootai.dev, then reload.
The transcript is wrong / in the wrong language. On the chat surfaces, open the gear icon next to the mic and pick the right BCP-47 language tag; your choice persists across sessions. The catalog search bars have no language picker and default to English.
Agent FAI didn't use the grounded context. Look at the chips above the input — if they show keywords and primitives, the grounding block was injected. If the LLM ignored it anyway, that's a model limitation; rephrase more directly ("using <primitive name>") to nudge it.