Voice Feature - Architecture
This article outlines the streamlined architecture of a voice feature that integrates Microsoft Speech-to-Text services, emphasizing secure and efficient communication without introducing new protocols or security requirements.
System Architecture
User Interaction: The user speaks into the microphone, and the browser records the audio natively.
Communication Flow:
The browser transmits the audio data to the backend over a WebSocket connection secured by HTTPS (TLS).
This process uses existing protocols, so no new communication patterns are introduced.
WebSockets over HTTPS are already used in the browser to stream LLM answers, so this is not a new requirement.
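The flow above can be illustrated with a minimal sketch of how recorded audio might be split into messages for a single WebSocket connection. The wire format shown here (a JSON `start` message, binary audio chunks, a JSON `stop` message) and the names `frame_audio` and `CHUNK_BYTES` are assumptions for illustration only; the article does not specify the actual message layout.

```python
import json
from typing import Iterator, Union

# Hypothetical chunk size; a real value depends on the audio format
# the browser produces.
CHUNK_BYTES = 4096

Message = Union[str, bytes]


def frame_audio(audio: bytes, session_id: str,
                chunk_bytes: int = CHUNK_BYTES) -> Iterator[Message]:
    """Yield the messages a client could send over one WebSocket connection.

    Text messages carry JSON control data; binary messages carry raw audio.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    # Announce the start of a recording (assumed control message).
    yield json.dumps({"type": "start", "sessionId": session_id})
    # Send the audio in fixed-size binary chunks; the last one may be shorter.
    for offset in range(0, len(audio), chunk_bytes):
        yield audio[offset:offset + chunk_bytes]
    # Signal that no more audio follows (assumed control message).
    yield json.dumps({"type": "stop", "sessionId": session_id})
```

Because control and audio messages share one connection, a sketch like this stays within the WebSocket channel that the browser already uses.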
Backend Integration:
The backend connects to Microsoft Speech-to-Text services for transcription. It authenticates through workload identity, in the same way as the existing connections to other Microsoft services such as OpenAI.
The service is deployed internally within the Azure subscription, maintaining security and efficiency.
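To make the workload identity step more concrete, the sketch below builds (but does not send) the token request that a pod could use to exchange its federated token for a Microsoft Entra access token. It assumes the backend runs on Kubernetes with Azure workload identity enabled, which injects the `AZURE_*` environment variables used here; the function name `build_token_request` is hypothetical. In practice an Azure identity SDK would normally perform this exchange, so treat this as an illustration of the mechanism rather than the actual implementation.

```python
import os
from pathlib import Path
from typing import Mapping
from urllib.parse import urlencode

# Scope for Azure AI services (Cognitive Services) tokens.
SPEECH_SCOPE = "https://cognitiveservices.azure.com/.default"


def build_token_request(env: Mapping[str, str] = os.environ) -> tuple[str, bytes]:
    """Return (url, form_body) for the federated-credential token exchange.

    Reads the variables that Azure workload identity injects into a pod
    (assumed deployment model). Nothing is sent from this function.
    """
    authority = env.get("AZURE_AUTHORITY_HOST", "https://login.microsoftonline.com/")
    tenant_id = env["AZURE_TENANT_ID"]
    client_id = env["AZURE_CLIENT_ID"]
    # Short-lived service account token projected into the pod.
    assertion = Path(env["AZURE_FEDERATED_TOKEN_FILE"]).read_text().strip()

    url = f"{authority.rstrip('/')}/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "scope": SPEECH_SCOPE,
        "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": assertion,
    }
    return url, urlencode(form).encode("ascii")
```

The point of this design is that no API key or client secret for the speech service has to be stored in the backend's configuration.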
Security and Protocols:
No new ports need to be opened, and security requirements remain unchanged.
WebSocket is already utilized for communication with large language models, ensuring consistency.
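As a small illustration of the "no new ports" claim, a deployment check could verify that the voice endpoint uses the same kind of channel as the existing streaming connection. The sketch assumes the existing channel is TLS-secured WebSocket (`wss`) on the standard HTTPS port 443, which the article implies but does not state; the function name and example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def reuses_existing_channel(url: str) -> bool:
    """Return True if a WebSocket URL uses TLS on the standard HTTPS port."""
    parts = urlsplit(url)
    # urlsplit reports the port as None when the URL does not name one;
    # for the wss scheme that implies the default port 443.
    return parts.scheme == "wss" and parts.port in (None, 443)
```

For example, `reuses_existing_channel("wss://app.example.com/voice")` is accepted, while a plain `ws://` URL or a `wss://` URL on port 8443 is rejected.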
What's new:
Deployment of Microsoft Speech-to-Text.
Inside Unique's backend, a new microservice handles the communication.
What is not changing:
No ports or communication patterns beyond those already in use.
No internal Microsoft services are directly exposed.
Author: @Andreas Hauri
© 2025 Unique AG. All rights reserved.