Using Voice
- 1 Overview
- 1.1 Who it’s for
- 1.2 Benefits
- 1.3 Example Use cases
- 1.4 Step-by-Step Guide
- 2 Tips & Tricks
- 3 Limitations
Overview
Voice gives users a seamless way to convert spoken audio into near-real-time transcriptions. This capability lets users speak their prompts instead of typing, enhancing accessibility and interaction within AI-powered experiences.
If you’re unable to access certain features or sections of this article, it’s possible that your firm doesn’t have access or hasn’t upgraded to the latest version. Please reach out to your internal support team for further assistance.
Who it’s for
Mobile users who are on the go
Users who prefer speaking over typing
Accessibility-conscious product teams
Enterprises needing secure and compliant transcription
Benefits
Speed & Efficiency
Speaking is often faster than typing, allowing users to input prompts more quickly — especially helpful for long or complex queries.
Hands-Free Operation
Ideal for multitasking or mobile workflows where typing may be impractical or disruptive.
Natural Language Input
Encourages more conversational, free-form prompts, which align well with how GPT-based systems interpret intent.
Accessibility
Enables users with motor impairments, temporary injuries, or vision difficulties to interact more easily with AI systems.
Reduced Cognitive Load
Users can focus on articulating their ideas instead of editing text as they type.
Example Use cases
Financial Analysts on the Move
Analysts commuting or walking between meetings can speak questions like “What’s the impact of today’s Fed announcement on bond yields?” into the prompt box without stopping to type.
Executive Summarization
A manager uses voice to say: “Summarize Q1 performance highlights from our internal reports and flag any anomalies.”
Customer Service Triage
Agents can dictate incoming customer questions directly into the system — useful for hands-busy environments or for real-time escalation.
Want to contribute a use case? Contact us.
Is there functionality your business use case needs that is not covered? Get in touch.
Step-by-Step Guide
Locate the Microphone Icon
Visit a chat in the app that includes a text input.
Look for the microphone icon on the right side of the input.
Click to Activate Voice Input
Click the microphone icon to begin recording.
You may be prompted to grant microphone access if it’s your first time using the feature.
Speak Your Prompt
Clearly speak your message or prompt.
You’ll see the transcribed text appear in the input field as you talk, in near real time.
Privacy Note: Voice data is not stored. It is transcribed in real time and discarded immediately after transcription.
Stick to one language
Do not switch between languages within a single prompt, as the models cannot yet handle these transitions well. An occasional anglicism is usually fine, but avoid speaking successive sentences in different languages.
Finish and Submit
Once you’ve finished speaking, click the “Stop” button to stop recording.
Note: The system does not automatically stop listening; it will continue to capture input until you manually stop.
Review Before Sending
After speaking, always check the transcribed text in the prompt box. You can edit anything before submitting it.
Press Enter or click “Send” to submit your prompt.
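The steps above can be read as a small state machine: idle, recording, review, sent. The sketch below is only an illustration of that flow, not the product's implementation. The class name, method names, and the rule that editing and sending happen only after recording has stopped are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    REVIEW = "review"
    SENT = "sent"


@dataclass
class VoicePromptSession:
    """Hypothetical model of the voice input flow (names are illustrative)."""

    state: State = State.IDLE
    text: str = ""

    def start(self) -> None:
        # Clicking the microphone icon begins recording.
        if self.state is not State.IDLE:
            raise RuntimeError("recording can only start from the idle state")
        self.state = State.RECORDING

    def on_transcript(self, chunk: str) -> None:
        # Transcribed text is appended to the input field while recording.
        if self.state is not State.RECORDING:
            raise RuntimeError("not recording")
        self.text = f"{self.text} {chunk}".strip()

    def stop(self) -> None:
        # Recording never ends on its own; the user has to stop it.
        if self.state is not State.RECORDING:
            raise RuntimeError("not recording")
        self.state = State.REVIEW

    def edit(self, new_text: str) -> None:
        # Assumption: edits happen after recording has stopped.
        if self.state is not State.REVIEW:
            raise RuntimeError("stop recording before editing")
        self.text = new_text

    def send(self) -> str:
        # Pressing Enter or clicking Send submits the reviewed text.
        if self.state is not State.REVIEW:
            raise RuntimeError("stop recording before sending")
        self.state = State.SENT
        return self.text
```

In this model, calling send() while the session is still recording raises an error, which mirrors the advice to stop and review the transcription before submitting.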
If no microphone toggle is present, refer to the guide at https://unique-ch.atlassian.net/wiki/spaces/PUBDOC/pages/1385071607 to set up the landscape correctly.
Documentation for Infrastructure is available at https://unique-ch.atlassian.net/wiki/spaces/PUBDOC/pages/1385367154.
Tips & Tricks
Tap, Speak, Send
Use the microphone icon to quickly input long or complex prompts hands-free. Especially useful for mobile workflows or when referencing multiple variables aloud.
Speak Naturally
The system is optimized for natural language. Don’t worry about being overly formal — just talk as if you're asking a colleague.
Pause for Clarity
Brief pauses help improve transcription accuracy and let the system segment your speech correctly (useful for multi-part prompts).
Combine With Keyboard if Necessary
Speak the main part of your prompt, then use the keyboard to fine-tune or add parameters (e.g., “... but exclude any data before 2020”).
If you don’t see the microphone toggle, please double-check the following:
You’re using a supported browser.
You’ve granted microphone permissions.
Your device has a working microphone.
The feature has been set up successfully as outlined in https://unique-ch.atlassian.net/wiki/spaces/PUBDOC/pages/1385071607.
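For support teams that want to walk through this checklist in a consistent order, it could be sketched as a small helper. This is a hedged illustration only: the VoiceEnvironment fields and the function name are hypothetical and do not correspond to any product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceEnvironment:
    """Hypothetical snapshot of a user's setup; field names are illustrative."""

    supported_browser: bool
    microphone_permission_granted: bool
    working_microphone: bool
    feature_configured: bool


def missing_prerequisites(env: VoiceEnvironment) -> list[str]:
    """Return the checklist items that still need attention, in checklist order."""
    checks = [
        (env.supported_browser, "Use a supported browser."),
        (env.microphone_permission_granted, "Grant microphone permissions."),
        (env.working_microphone, "Connect a working microphone."),
        (env.feature_configured, "Ask your administrator to complete the feature setup."),
    ]
    # Keep only the messages whose check failed.
    return [message for ok, message in checks if not ok]
```

An empty result means every item on the checklist is satisfied, so the missing toggle has another cause and is worth raising with your internal support team.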
Limitations
We currently support four languages: English, German, French, and Italian. English and German work better, with fewer mistakes.
French and Italian have some challenges, like:
Emotion and Tone: Sometimes the feeling or tone isn't captured correctly.
Regional Dialects: Different accents can cause errors.
Symbols and Numbers: These can be tricky to transcribe.
Shouted or Slow Speech: Loud or slow talking might lead to mistakes.
Homophones and Ambiguity: Words that sound alike or unclear meanings can be confusing.
Context and Conversation: Understanding the flow of conversation can be difficult.
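Teams that want to warn users before they dictate in a less reliable language could encode the language support described above as a simple lookup. The sketch below is an assumption-laden illustration: the tier labels "best" and "limited" and the function name are invented for the example, and the keys are the standard ISO 639-1 codes for the four languages.

```python
# ISO 639-1 codes for the four supported languages.
# The tier labels are illustrative names, not product terminology.
SUPPORT_TIERS = {
    "en": "best",
    "de": "best",
    "fr": "limited",
    "it": "limited",
}


def support_tier(language_code: str) -> str:
    """Return 'best', 'limited', or 'unsupported' for a code such as 'de' or 'fr-CH'."""
    # Use only the primary language subtag, so 'fr-CH' is treated as 'fr'.
    primary = language_code.strip().lower().split("-")[0]
    return SUPPORT_TIERS.get(primary, "unsupported")
```

A result of "limited" would be a reasonable trigger for reminding the user to review the transcription carefully before sending.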
Our solution uses Microsoft’s Azure Speech-to-Text service, so we're limited by what it can do. We'll update and improve as new features become available.
True real-time streaming is not possible because of network latency and the processing time of the AI services. The average response time is 1 to 5 seconds (on 5G or Wi-Fi), depending on the user's network connection.
Author: @Dana Ritter
© 2026 Unique AG. All rights reserved. Privacy Policy – Terms of Service