Voice Administration
This feature (although implemented with generic WebSockets) is only supported and available for clients with access to Azure AI Speech Services.
Overview
This page outlines the administration options for the Unique AI Voice feature.
Purpose and scope of feature
The feature documentation for end-users can be found at Voice.
Dependencies or integrations
Documentation for architecture is available at Voice Infrastructure.
Configuration Guide
Head over to Voice Infrastructure to learn how to deploy and enable the feature for your Unique AI.
User Management
Permissions and roles
There are no specific Unique AI permissions needed.
End users see the toggle as soon as the infrastructure is provisioned and wired up correctly.
Technical administrators currently have no options to configure the feature; once enabled, it becomes available in all spaces.
Security Considerations
Private networking
Depending on the Voice Infrastructure and the design choices made there, the voice stream is routed (or even hairpin-routed) either over the internet or at least over the Azure backbone.
To ensure encapsulation within your network, refer to the Voice Infrastructure guidelines.
Limitations
Some voices or audio inputs may not be recognized correctly by Microsoft STT due to frequency range limitations. If your voice is unusually high or low in pitch, recognition accuracy may be reduced. Check Voice | Tips & Tricks to improve recognition.
Sporadically on iPhone™, voice is not recognized because microphone access is denied or the microphone cannot be accessed properly. If this happens, please get in touch with your Unique SPOC and provide as many device configuration details as you can gather (iOS version, a screenshot of the microphone privacy settings, iOS device model, etc.).
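Microphone-access failures like the one above typically surface in the browser as standard `DOMException` names from the `getUserMedia` API. A minimal TypeScript sketch (the helper name and messages are illustrative, not part of the Unique AI client) that maps those error names to actionable guidance:

```typescript
// Map standard getUserMedia rejection names (Media Capture and Streams spec)
// to user-facing guidance. Helper name and wording are hypothetical examples.
function describeMicError(errorName: string): string {
  switch (errorName) {
    case "NotAllowedError": // user or OS privacy policy denied access
      return "Microphone access was denied. Check the browser and iOS microphone privacy settings.";
    case "NotFoundError": // no audio capture device available
      return "No microphone was found on this device.";
    case "NotReadableError": // hardware busy or blocked at the OS level
      return "The microphone is in use by another app or blocked by the operating system.";
    default:
      return "Unexpected microphone error: " + errorName;
  }
}

// In a browser client, the check itself would look like this (not executed here):
// navigator.mediaDevices.getUserMedia({ audio: true })
//   .catch((err) => console.warn(describeMicError(err.name)));

console.log(describeMicError("NotAllowedError"));
```

Capturing the error name this way is also useful when reporting the issue to your Unique SPOC, since it distinguishes a privacy-settings denial from a hardware or OS problem.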
Occasionally, the system may not detect the spoken language correctly, especially if you switch between multiple languages during a session or conversation. Check Voice | Tips & Tricks.
There may be a limit to the duration of continuous speech that can be processed in a single recognition session. Long, uninterrupted speech may be truncated or fail to process completely. Exact limits depend on Microsoft’s STT API and can be checked in their documentation.
The system may struggle to differentiate between multiple speakers or voices in a recording.
Accents and dialects: different accents and dialects can be challenging for the system to transcribe accurately, especially for non-native speakers.
Noise and audio quality: the system may struggle with background noise, overlapping speech, or low-quality microphones, which can lead to transcription errors.
Author: PTFCHAT
© 2025 Unique AG. All rights reserved.