Voice Feature - Architecture

This article outlines the architecture of a voice feature that integrates Microsoft Speech-to-Text. The design keeps communication secure and efficient without introducing new protocols or security requirements.

System Architecture

  • User Interaction: The user speaks into the microphone, and the browser records the audio natively.

  • Communication Flow:

    • The browser streams the audio data to the backend via a WebSocket over HTTPS.

    • This uses protocols that are already in place, so no new communication patterns are introduced.

    • WebSockets over HTTPS are already used in the browser to stream the LLM answers, so they are not a new requirement.

  • Backend Integration:

    • The backend connects to Microsoft Speech-to-Text for transcription. It authenticates through workload identity, the same way the other Microsoft services in use, such as OpenAI, already do.

    • The Speech-to-Text service is deployed internally within our Azure subscription rather than used as an external service, which keeps it inside the existing security boundary.

  • Security and Protocols:

    • No new ports need to be opened, and security requirements remain unchanged.

    • WebSocket is already used for communication with the large language models, so voice streaming follows the same pattern.
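The workload-identity connection from the Backend Integration bullets can be illustrated with a startup check in the backend service. The following is a minimal TypeScript sketch, assuming the backend runs on AKS with Microsoft Entra Workload ID enabled, where the workload identity webhook injects `AZURE_CLIENT_ID`, `AZURE_TENANT_ID` and `AZURE_FEDERATED_TOKEN_FILE` into the pod. The names `readWorkloadIdentity` and `WorkloadIdentityConfig` are hypothetical. In a real service, `WorkloadIdentityCredential` or `DefaultAzureCredential` from `@azure/identity` reads these same variables and exchanges the federated token for an access token, so no Speech API key has to be stored.

```typescript
// Environment variables injected by the AKS workload identity webhook
// (assumption: the microservice runs on AKS with Workload ID enabled).
const REQUIRED_VARS = [
  "AZURE_CLIENT_ID",
  "AZURE_TENANT_ID",
  "AZURE_FEDERATED_TOKEN_FILE",
] as const;

// Token scope for Azure AI services, which include Speech.
const COGNITIVE_SERVICES_SCOPE = "https://cognitiveservices.azure.com/.default";

// Hypothetical shape of the settings the service needs to request a token.
interface WorkloadIdentityConfig {
  clientId: string;
  tenantId: string;
  federatedTokenFile: string;
  scope: string;
}

// Hypothetical helper: fail fast at startup if the pod was not set up for
// workload identity, instead of failing on the first transcription request.
function readWorkloadIdentity(
  env: Record<string, string | undefined>,
): WorkloadIdentityConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`workload identity not configured; missing: ${missing.join(", ")}`);
  }
  return {
    clientId: env["AZURE_CLIENT_ID"]!,
    tenantId: env["AZURE_TENANT_ID"]!,
    federatedTokenFile: env["AZURE_FEDERATED_TOKEN_FILE"]!,
    scope: COGNITIVE_SERVICES_SCOPE,
  };
}
```

At startup the service would call `readWorkloadIdentity(process.env)` and refuse to start if the pod is not configured, which surfaces a missing identity setup at deployment time rather than on the first voice request.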

What's new:

  • Deployment of Microsoft Speech-to-Text.

  • Inside Unique's backend, a new microservice handles the communication with Speech-to-Text.
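The role of the new microservice can be sketched as a small relay: audio chunks arriving on the browser's WebSocket are forwarded to a Speech-to-Text session, and only the recognized text is sent back. This TypeScript sketch is illustrative only; the interfaces `BrowserSocket` and `SpeechSession`, the class `VoiceRelay`, and the JSON message shape are hypothetical and are not the actual service or the Azure Speech SDK API. A real implementation would adapt the SDK's streaming recognizer to the `SpeechSession` shape.

```typescript
// Hypothetical message sent back to the browser. Only recognized text
// crosses this boundary; Azure endpoints and tokens stay in the backend.
interface TranscriptMessage {
  type: "transcript";
  text: string;
  final: boolean;
}

// Minimal view of the browser-facing WebSocket (the existing wss connection).
interface BrowserSocket {
  send(data: string): void;
}

// Minimal view of an upstream Speech-to-Text streaming session.
// The shape is an assumption; real SDK objects would be adapted to it.
interface SpeechSession {
  write(chunk: Uint8Array): void;
  close(): void;
}

class VoiceRelay {
  private readonly browser: BrowserSocket;
  private readonly speech: SpeechSession;

  constructor(browser: BrowserSocket, speech: SpeechSession) {
    this.browser = browser;
    this.speech = speech;
  }

  // Audio from the browser is forwarded unchanged; empty chunks are skipped.
  onAudioChunk(chunk: Uint8Array): void {
    if (chunk.byteLength > 0) this.speech.write(chunk);
  }

  // Recognition results are reduced to plain text before going back out.
  onRecognized(text: string, final: boolean): void {
    const message: TranscriptMessage = { type: "transcript", text, final };
    this.browser.send(JSON.stringify(message));
  }

  // When the browser disconnects, the upstream session is released.
  onBrowserClosed(): void {
    this.speech.close();
  }
}
```

Keeping the relay behind two narrow interfaces means the browser never receives an Azure endpoint or credential, and the Speech SDK can be swapped or mocked without touching the WebSocket handling.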

What's not changing:

  • No new ports or communication patterns beyond those we already use.

  • No internal services of Microsoft are directly exposed.
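Both points above can be made concrete on the browser side: the voice socket is derived from the page's own HTTPS origin, so it uses the same host and port as the existing LLM stream and never points at a Microsoft endpoint. This is a minimal TypeScript sketch; the path `/api/voice/stream` and the function name `voiceStreamUrl` are hypothetical placeholders.

```typescript
// Hypothetical path of the voice endpoint on the existing backend.
const VOICE_STREAM_PATH = "/api/voice/stream";

// Build the WebSocket URL for the voice stream from the page's own origin.
// The socket keeps the host and port of the HTTPS page (wss:// and https://
// share default port 443), so no additional port has to be opened, and the
// browser only ever connects to the backend, never to a Speech endpoint.
function voiceStreamUrl(pageUrl: string, path: string = VOICE_STREAM_PATH): string {
  const url = new URL(path, pageUrl);
  if (url.protocol !== "https:") {
    throw new Error(`voice streaming requires an HTTPS page, got ${url.protocol}`);
  }
  url.protocol = "wss:";
  return url.toString();
}
```

In the browser this would be used as `new WebSocket(voiceStreamUrl(window.location.href))`, with the audio chunks from the native `MediaRecorder` (its `dataavailable` events) sent over that socket.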

Author

@Andreas Hauri

© 2025 Unique AG. All rights reserved.