
Overview of Unique FinanceGPT architecture and basic concepts.

...

Table of Contents

...

Overview Architecture

[Architecture diagram (draw.io): Untitled Diagram-1725246947769.drawio]

Unique components

Ingestion Service

...

The ingestion service is responsible for taking in files from various sources, such as web pages, SharePoint, or Atlassian products, and bringing them into the system. It handles different file types, including PDFs, Word documents, Excel files, PowerPoint presentations, text files, CSV files, and Markdown files. The service converts these files into Markdown, preserving the document structure and extracting important information such as titles, subtitles, and tables. It then splits the documents into chunks and saves them into a vector database and a Postgres database. The ingestion service is also designed for scalability and performance, since it must handle large volumes of documents being ingested into the system.
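
As a rough illustration of this flow, the sketch below turns an already-converted Markdown document into chunks. The Chunk type, the naive heading-based splitter, and the chunk size are placeholders, not the actual Unique implementation, which also embeds each chunk and writes it to the vector database and Postgres.

# Illustrative sketch of the ingestion flow; all names are placeholders.
from dataclasses import dataclass

@dataclass
class Chunk:
    source: str
    order: int
    text: str

def split_into_chunks(markdown: str, max_chars: int = 2000) -> list[str]:
    """Naive chunker: split on second-level headings, then cap chunk size."""
    sections = [s for s in markdown.split("\n## ") if s.strip()]
    return [s[:max_chars] for s in sections]

def ingest(markdown: str, source: str) -> list[Chunk]:
    """Turn one Markdown document into ordered chunks ready for storage.
    In the real service each chunk would also be embedded and written to the
    vector database and to Postgres."""
    return [Chunk(source, i, text) for i, text in enumerate(split_into_chunks(markdown))]

chunks = ingest("# Annual Report\n## Revenue\n...\n## Outlook\n...", source="sharepoint")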

ℹ️ More details: Ingestion

Ingestion Workers

The ingestion workers are responsible for processing the different types of files that are ingested into the system, such as PDFs, Word documents, Excel files, PowerPoint presentations, text files, CSV files, and Markdown files. Each file type needs to be processed in its own way to extract the necessary information; for PDFs, for example, the workers extract titles, subtitles, tables, and other relevant content. The workers convert the files into Markdown, which preserves the document structure and allows the models to better understand the content and generate results from it. They also split the documents into chunks, which are then saved into a vector database and a Postgres database, allowing efficient storage and retrieval of the ingested documents.
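
The sketch below illustrates the idea of dispatching each file type to its own worker; the converter functions and the mapping are hypothetical stand-ins, not Unique's actual workers.

# Illustrative dispatch of files to type-specific converters; placeholders only.
from pathlib import Path

def pdf_to_markdown(path: Path) -> str:
    return f"# {path.stem}\n\n(extracted titles, subtitles and tables would go here)"

def docx_to_markdown(path: Path) -> str:
    return f"# {path.stem}\n\n(extracted paragraphs and tables would go here)"

CONVERTERS = {
    ".pdf": pdf_to_markdown,
    ".docx": docx_to_markdown,
    # .xlsx, .pptx, .txt, .csv and .md would each get their own converter
}

def process(path: Path) -> str:
    """Route a file to the converter for its type and return Markdown."""
    converter = CONVERTERS.get(path.suffix.lower())
    if converter is None:
        raise ValueError(f"Unsupported file type: {path.suffix}")
    return converter(path)

print(process(Path("annual_report.pdf")))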

...

The theme service is responsible for allowing users to customize the appearance of the Unique FinanceGPT platform according to their branding preferences. It enables users to set their own colours, logos, and other visual elements to create a personalized and branded experience within the platform.
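
For illustration only, a theme configuration might look like the following; the keys and values are hypothetical and do not reflect the actual Unique theme schema.

# Hypothetical theme configuration; key names are illustrative, not the actual schema.
theme = {
    "primary_color": "#0A2540",
    "secondary_color": "#E6FCFF",
    "logo_url": "https://example.com/assets/logo.svg",
    "favicon_url": "https://example.com/assets/favicon.ico",
    "font_family": "Inter, sans-serif",
}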

ℹ️ More details: Style Unique FinanceGPT to your Corporate Identity

Anonymization Service

Ensures data privacy by anonymizing sensitive information in user prompts and de-anonymizing model responses.
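
A minimal sketch of the idea, assuming the sensitive entities have already been detected (the real service performs proper entity detection; the placeholder format below is illustrative):

# Sketch of anonymization / de-anonymization with a reversible placeholder mapping.
def anonymize(prompt: str, entities: list[str]) -> tuple[str, dict[str, str]]:
    mapping: dict[str, str] = {}
    for i, entity in enumerate(entities):
        placeholder = f"<PERSON_{i}>"
        mapping[placeholder] = entity
        prompt = prompt.replace(entity, placeholder)
    return prompt, mapping

def deanonymize(response: str, mapping: dict[str, str]) -> str:
    for placeholder, entity in mapping.items():
        response = response.replace(placeholder, entity)
    return response

masked, mapping = anonymize("Summarize the portfolio of Jane Doe.", ["Jane Doe"])
# masked is sent to the model; the model's answer is passed through deanonymize().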

...

Benchmarking enables the client to test prompts at scale to ensure high quality (accuracy) of the output (answers). Answers are automatically compared to the ground truth and scored using LLMs and vector distance, and hallucinations are detected, so that data and model drift is caught early on.
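
A minimal sketch of the vector-distance part of such a score; the embedding step is omitted and the function names are illustrative, not the actual benchmarking implementation.

# Cosine similarity between an answer embedding and a ground-truth embedding.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def score_answer(answer_vec: list[float], ground_truth_vec: list[float]) -> float:
    """Higher is better; in the real benchmark this is combined with an
    LLM-based judgement and hallucination checks."""
    return cosine_similarity(answer_vec, ground_truth_vec)

print(score_answer([0.1, 0.9, 0.2], [0.1, 0.8, 0.3]))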

ℹ️ More details: Benchmarking

Analytics

Analytics reports (e.g., user engagement) are available via the API as well as in the Unique UI.

ℹ️ More details: Analytics

Tokenizers

A tokenizer is a crucial component that processes input text to be understood by the model. It segments text into tokens, which can be words, subwords, or characters. Each token is then matched with a unique integer from a pre-established vocabulary on which the model was trained. For words not in the vocabulary, the tokenizer uses special strategies, such as breaking them down into known subwords or using a placeholder for unknown tokens. Additionally, tokenizers may encode extra information, such as text format and token positions, to aid FinanceGPT's comprehension. Once tokenized, the sequence of integers is ready for the model to process. After FinanceGPT generates its output, a reverse process, known as detokenization, converts the token IDs back into readable text.
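
For example, with the open-source tiktoken library (the cl100k_base encoding is an assumption about the underlying model, chosen here purely for illustration):

# Encode text into token IDs and decode them back (detokenization).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode("FinanceGPT tokenizes text before the model sees it.")
print(token_ids)              # one integer per token from the vocabulary
print(enc.decode(token_ids))  # detokenization restores the original text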

...

Unique offers an SDK specifically designed for FinanceGPT via a public API.
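
For illustration only, a call against such a public API might look like the sketch below; the base URL, endpoint path, headers, and payload shape are assumptions, not the documented Unique API.

# Hypothetical API call; endpoint, headers and payload are illustrative assumptions.
import requests

resp = requests.post(
    "https://api.example-unique-instance.com/public/chat",  # placeholder base URL
    headers={"Authorization": "Bearer <API_KEY>"},
    json={"message": "Summarize the attached annual report."},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())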

Video Explainer

https://youtu.be/nTB3I6QcUcY

Resources

...

ℹ️ Please read more here:

...

Software Development Kit (SDK)

Logs

Unique produces these types of logs:

  • Application logs (no CID data): Sent to stdout and can be collected by log scrapers.

  • Audit logs (include prompts and CID data): Sent to an encrypted and secured write-only storage account for compliance and investigation purposes.

  • DLP logs (contain CID data): Available via API for data leakage prevention purposes and analysis.

  • Kubernetes logs: Available for collection via log scrapers.

Monitoring:

  • Unique provides standard Prometheus metrics per service that can be collected (see the sketch below).
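
A minimal sketch of scraping such a metrics endpoint ad hoc; in practice a Prometheus server scrapes the endpoints directly, and the service URL below is a placeholder.

# Fetch and parse a service's Prometheus metrics endpoint.
import requests
from prometheus_client.parser import text_string_to_metric_families

raw = requests.get("http://ingestion-service:9090/metrics", timeout=10).text  # placeholder URL
for family in text_string_to_metric_families(raw):
    for sample in family.samples:
        print(sample.name, sample.labels, sample.value)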

...