Overview of Unique FinanceGPT architecture and basic concepts.

Overview Architecture

Unique components

Ingestion Service

Responsible for taking in files from various sources and converting them into Markdown format for storage and retrieval.

The ingestion service is responsible for taking in files from various sources, such as web pages, SharePoint, or Atlassian products, and bringing them into the system. It handles different file types, including PDFs, Word documents, Excel files, PowerPoint presentations, text files, CSV files, and Markdown files. The service converts these files into Markdown format, preserving the structure and extracting important information like titles, subtitles, and tables. It then creates chunks out of these documents and saves them into a vector database and a Postgres database. The ingestion service also handles scalability and performance, as it needs to be able to handle large volumes of documents being ingested into the system.

More details: Ingestion

Ingestion Workers

The ingestion workers are responsible for processing different types of files that are ingested into the system. They take in files such as PDFs, Word documents, Excel files, PowerPoint presentations, text files, CSV files, and Markdown files. Each type of file needs to be processed in a different way to extract the necessary information. For example, in the case of PDFs, the ingestion workers need to extract titles, subtitles, tables, and other relevant information. The workers convert the files into Markdown format, which helps preserve the structure of the document and allows the models to better understand and generate results based on the content. The ingestion workers also create chunks out of the documents, which are then saved into a vector database and a Postgres database. This process allows for efficient storage and retrieval of the ingested documents.
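
The chunking step can be illustrated with a minimal sketch. It assumes a document has already been converted to Markdown and splits it at headings so that each chunk keeps its section title. The function name `chunk_markdown` and the `max_chars` limit are hypothetical and chosen only for this example; the actual worker logic may differ.

```python
import re

def chunk_markdown(markdown: str, max_chars: int = 500) -> list[dict]:
    """Split Markdown into chunks at headings, then by size (illustrative only)."""
    chunks: list[dict] = []
    title = ""
    buffer: list[str] = []

    def flush() -> None:
        text = "\n".join(buffer).strip()
        # Oversized sections are cut into pieces of at most max_chars characters.
        for start in range(0, len(text), max_chars):
            chunks.append({"title": title, "text": text[start:start + max_chars]})
        buffer.clear()

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()  # close the previous section before starting a new one
            title = match.group(2).strip()
        else:
            buffer.append(line)
    flush()
    return chunks
```

Each returned chunk could then be embedded and written to the vector database, with its text and metadata stored in Postgres.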

Postgres and VectorDB

In the Unique FinanceGPT platform, Postgres and VectorDB are used to store and retrieve data. Postgres stores the Markdown and metadata of the ingested documents, as well as the history of user interactions and chat logs. The metadata includes the scope of access to documents, allowing for access control based on user roles and permissions.

On the other hand, VectorDB is used to store the embeddings of the documents. Embeddings are numerical representations of the documents that capture their semantic meaning. By storing the embeddings in VectorDB, the system can perform efficient vector searches to retrieve relevant documents based on user queries. The vector search is combined with full-text search to provide high-quality search results.
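
As a rough sketch of how vector search and full-text search can be combined, the example below blends a cosine similarity score with a simple keyword overlap score. The weighting parameter `alpha`, the function names, and the in-memory document list are assumptions for illustration; the platform's actual ranking logic is not described here.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0

def hybrid_search(query, query_vec, docs, alpha=0.5, top_k=3):
    """docs: list of dicts with 'text' and 'embedding'; alpha weights the vector score."""
    scored = []
    for doc in docs:
        score = (alpha * cosine_similarity(query_vec, doc["embedding"])
                 + (1 - alpha) * keyword_score(query, doc["text"]))
        scored.append((score, doc["text"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```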

Both Postgres and VectorDB play crucial roles in the architecture of the Unique FinanceGPT platform, enabling secure storage and retrieval of documents and supporting functionalities such as access control, search, and auditability.

Org Structure Service

Handles the organizational structure of an enterprise, allowing for the creation of departments, sub-departments, and user groups for access control.

The org structure service in the Unique FinanceGPT platform is responsible for creating and managing the organizational structure within an enterprise. It allows the platform to mirror the departments and access rights structure of the company. This service ensures that only authorized users have access to specific documents based on their roles and permissions within the organization. It helps define the scope of access for each user and enables better quality control and data security within the platform.
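
A minimal sketch of group-based access scoping is shown below. The `User` and `Document` classes and the rule "a user may read a document if they share at least one group with it" are assumptions for illustration, not the platform's actual permission model.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    groups: set[str] = field(default_factory=set)

@dataclass
class Document:
    title: str
    allowed_groups: set[str] = field(default_factory=set)

def can_access(user: User, document: Document) -> bool:
    """Hypothetical rule: access requires at least one shared group."""
    return bool(user.groups & document.allowed_groups)

def visible_documents(user: User, documents: list[Document]) -> list[Document]:
    """Filter a document list down to what the user is allowed to see."""
    return [doc for doc in documents if can_access(user, doc)]
```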

Chat Service

Enables users to interact with the system and chat against ingested documents, supporting login, chat history storage, and chat assistants.

The chat service in the Unique FinanceGPT platform allows users to interact with the system by asking questions or making requests. It provides a chat interface where users can enter their queries and receive responses from the system. The chat service also retrieves relevant documents from the knowledge center based on the user's query and presents the information in a conversational format. Additionally, it keeps track of the chat history, including the prompts, responses, and any streamed information, and saves this data for auditing purposes. It also applies theming to the chat interface to match the branding and colours of the organization using the platform.

Theme Service

Allows users to customize the appearance and branding of the chat interface to align with their organization's identity.

The theme service is responsible for allowing users to customize the appearance of the Unique FinanceGPT platform according to their branding preferences. It enables users to set their own colours, logos, and other visual elements to create a personalized and branded experience within the platform.

More details: Style Unique FinanceGPT to your Corporate Identity

Anonymization Service

Ensures data privacy by anonymizing sensitive information in user prompts and de-anonymizing model responses.

The anonymization service in the Unique FinanceGPT platform is responsible for ensuring that sensitive information, such as customer identification data (CID), is not presented to the models during chat interactions. It takes the user's prompt, which may contain CID data or other sensitive tokens, and replaces those tokens with anonymized placeholders. The service then sends the anonymized prompt for processing, so the models never have access to the original sensitive information. Once the models generate a response, the anonymization service replaces the placeholders with the original sensitive tokens, so the user receives a complete, readable response. This ensures that the chat can be conducted securely while protecting sensitive data.
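
The round trip described above can be sketched as follows. The `Anonymizer` class, the placeholder format, and the example account pattern are hypothetical; a real service would use far more robust detection of sensitive tokens.

```python
import re

class Anonymizer:
    """Replaces sensitive tokens with placeholders and restores them later."""

    def __init__(self, patterns: dict[str, str]):
        # patterns maps a label (for example "ACCOUNT") to a regular expression.
        self.patterns = {label: re.compile(rx) for label, rx in patterns.items()}
        self.mapping: dict[str, str] = {}

    def anonymize(self, prompt: str) -> str:
        for label, rx in self.patterns.items():
            def replace(match, label=label):
                placeholder = f"<{label}_{len(self.mapping) + 1}>"
                self.mapping[placeholder] = match.group(0)
                return placeholder
            prompt = rx.sub(replace, prompt)
        return prompt

    def deanonymize(self, response: str) -> str:
        # Put the original values back into the model's response.
        for placeholder, original in self.mapping.items():
            response = response.replace(placeholder, original)
        return response
```

Only the anonymized prompt would be sent to the model; the mapping stays inside the service and is applied to the response before it reaches the user.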

Models

The Unique FinanceGPT platform can connect to and use different models, such as GPT-4, GPT-3.5 Turbo, and LLaMA, as well as other open-source or custom models. To use a model, the system sends the user's prompt, along with the relevant documents, to the chosen model. The appropriate model can be selected for each assistant or prompt. Multiple instances of the same model, even in different data centers, can be used to increase throughput and handle rate limits. To ensure a good user experience, the system automatically retries with exponential backoff if a model call fails.
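
The retry behaviour can be sketched as below. The function `call_with_backoff`, its parameters, and the idea of passing model instances as plain callables are assumptions for this example only.

```python
import time

def call_with_backoff(instances, prompt, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Try model instances in rotation, doubling the wait after each failure."""
    last_error = None
    for attempt in range(max_attempts):
        instance = instances[attempt % len(instances)]
        try:
            return instance(prompt)
        except Exception as error:  # for example a rate limit or timeout
            last_error = error
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise RuntimeError("all attempts failed") from last_error
```

The `sleep` parameter is injectable so the delays can be observed or skipped in tests.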

Embeddings

Any embedding model can be used. By default, OpenAI's Ada model is used, but this is not mandatory.

Audit Logs

Maintains comprehensive logs of system interactions, API calls, and user activities for security, accountability, and compliance purposes.

The audit log service is responsible for recording and storing all the relevant information about the interactions and activities that occur within the Unique FinanceGPT platform. It captures and logs various events such as API calls, responses, WebSocket streams, and other relevant information. The audit logs are written into a secure container that can only be accessed by authorized auditors. These logs provide a detailed record of the system's activities, allowing for security monitoring, compliance, and auditing purposes.

Benchmarking (Prompt Testing)

Benchmarking enables the client to test prompts at scale to ensure high-quality (accurate) answers. Answers are automatically compared with the ground truth and scored using LLMs and vector distance. Hallucination detection is also applied, so that data and model drift are detected early.
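
The vector-distance part of such a score can be sketched as follows. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and `run_benchmark` with its `threshold` parameter is hypothetical; LLM-based scoring and hallucination detection are not shown.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def run_benchmark(cases, answer_fn, threshold=0.8):
    """cases: list of (prompt, ground_truth). Returns the mean score and failing prompts."""
    scores, failures = [], []
    for prompt, truth in cases:
        score = cosine(embed(answer_fn(prompt)), embed(truth))
        scores.append(score)
        if score < threshold:
            failures.append(prompt)
    mean = sum(scores) / len(scores) if scores else 0.0
    return mean, failures
```

Running the same cases regularly and tracking the mean score over time is one simple way to notice drift.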

More details: Benchmarking

Analytics

Analytics reports (e.g., user engagement) are available via the API or the Unique UI.

More details: Analytics

Tokenizers

A tokenizer is a crucial component that processes input text so that it can be understood by the model. It segments text into tokens, which can be words, subwords, or characters. Each token is then matched with a unique integer from a pre-established vocabulary on which the model was trained. For words not in the vocabulary, the tokenizer uses special strategies, such as breaking them down into known subwords or using a placeholder for unknown tokens. Additionally, tokenizers may encode extra information like text format and token positions to aid FinanceGPT's comprehension. Once tokenized, the sequence of integers is ready for the model to process. After FinanceGPT generates its output, a reverse process, known as detokenization, converts the token IDs back into readable text.
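
To make the tokenize and detokenize steps concrete, here is a toy greedy longest-match tokenizer. The `ToyTokenizer` class and its tiny vocabulary are invented for illustration; production tokenizers (for example byte-pair encoding) are considerably more involved.

```python
class ToyTokenizer:
    """Greedy longest-match subword tokenizer over a fixed vocabulary."""

    def __init__(self, vocab: list[str], unk: str = "<unk>"):
        self.unk = unk
        tokens = [unk] + list(vocab)  # ID 0 is reserved for unknown text
        self.id_of = {tok: i for i, tok in enumerate(tokens)}
        self.token_of = {i: tok for tok, i in self.id_of.items()}
        self.max_len = max((len(tok) for tok in vocab), default=1)

    def encode(self, text: str) -> list[int]:
        ids, start = [], 0
        while start < len(text):
            longest = min(len(text), start + self.max_len)
            # Try the longest candidate first, then shorter ones.
            for end in range(longest, start, -1):
                piece = text[start:end]
                if piece in self.id_of:
                    ids.append(self.id_of[piece])
                    start = end
                    break
            else:
                ids.append(self.id_of[self.unk])  # no known subword here
                start += 1
        return ids

    def decode(self, ids: list[int]) -> str:
        """Detokenization: map IDs back to their text pieces and join them."""
        return "".join(self.token_of[i] for i in ids)
```

With the vocabulary `["fin", "ance", "gpt", " ", "finance"]`, the text `"finance gpt"` is matched as the three pieces `finance`, a space, and `gpt`, while a character outside the vocabulary maps to the unknown-token ID.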

Embedded data pipelines

These are streamlined processes integrated within FinanceGPT's architecture that facilitate the seamless transformation of raw data into actionable insights. The pipelines are carefully designed to preprocess input text, manage data flow through the model's layers, and post-process the output to generate coherent and contextually appropriate responses. They handle tasks such as tokenization, embedding, and attention mechanism management.

Model Pre-training

FinanceGPT goes through a pre-training phase in which it is trained on a massive dataset of text or code. This extensive exposure allows the model to learn a wide range of linguistic patterns, syntactic structures, and semantic relationships. During pre-training, FinanceGPT identifies and extracts meaningful features from the text, developing an internal representation of language that captures the intricate relationships between words, phrases, and sentences. This comprehensive understanding forms the foundation for downstream tasks, enabling FinanceGPT to transfer its knowledge effectively to new domains and applications, enhancing its versatility and performance in various contexts.

Fine-tuning

This feature is currently only available for the On Premise Tenant deployment model.

FinanceGPT allows for further training on a specific dataset to adapt its knowledge and improve its performance on tasks relevant to that dataset. By fine-tuning FinanceGPT on a dataset that includes bilingual or multilingual financial texts, the model learns to translate domain-specific vocabulary more accurately. Furthermore, financial language is often nuanced and context-dependent. Fine-tuning helps the model grasp these subtleties in different languages, improving the quality of translation. Lastly, financial terms can have different meanings in different contexts. Fine-tuning on context-rich examples helps the model disambiguate terms more effectively during translation.

Fine-tuning shows significant improvements in RAG by honing the model's ability to fetch and integrate more accurate and contextually relevant data into its responses.

Unique can provide a dedicated API that allows our clients' developers to customize FinanceGPT for their specific tasks or datasets.

Training Playground

Unique offers clients interactive environments designed to help users experiment with, train, and fine-tune AI models, including experimental models, without needing deep technical expertise in machine learning. These playgrounds provide user-friendly interfaces and tools to facilitate various stages of model development, from data preparation to model deployment. Additionally, they allow users to test different model architectures, tweak training parameters, and visualize performance metrics, making it easy to iterate on and improve experimental models. These environments also include evaluation tools to assess model performance and ensure the models meet the customer's desired criteria before deployment. Furthermore, an integrated experiment registry helps track, organize, and manage all experiments to enhance reproducibility, collaboration, and efficiency in the model development process.

GenAI SDK

Unique offers an SDK specifically designed for FinanceGPT via an open API.

Read more here: https://unique-ch.atlassian.net/wiki/x/-wYFGg

Remediations

Model Remediation

Unique is constantly trying to make FinanceGPT as holistic and accurate as possible. With a robust model remediation process, we can adjust the model based on the output of its training.

Issue Identification: The first step involves detecting problems with the model's outputs, such as biases, inaccuracies, generation of harmful content, or lack of adherence to ethical guidelines
Root Cause Analysis: Once issues are identified, it's important to understand why they are occurring. This could be due to biases in the training data, flaws in the model architecture, or limitations in the training process
Model Retraining: If the underlying cause is related to the data the model was trained on, retraining the model with remediated or augmented datasets can help correct the issues
Architecture Adjustments: Sometimes, the architecture of FinanceGPT itself may need to be modified. This could involve changing the neural network layers, activation functions, or other architectural elements to improve the model's performance
Fine-tuning: In some cases, further fine-tuning of the model on curated datasets helps to correct specific issues without a full retraining cycle
Hyperparameter Optimization: Adjusting the hyperparameters, such as learning rate, batch size, or regularization terms, sometimes mitigates issues by altering the learning process
Reinforcement Learning from Human Feedback (RLHF): Feedback from our clients' experience of using FinanceGPT is highly valuable for guiding the model towards more desirable outputs. This can be an effective method for teaching the model the nuances of what is considered appropriate or high-quality content

Data Remediation

The quality and integrity of the data FinanceGPT is trained on are crucial for its performance. If the training data contains errors, biases, or problematic content, these issues can be reflected in the model's output. Especially for FSI clients, it is very important to mitigate these as much as possible.

We have, therefore, implemented steps to counter this:

Data Cleaning: Identifying and correcting errors or inconsistencies in the data, such as typos, formatting issues, and incorrect labels.
Bias Mitigation: Detecting and addressing biases in the data to prevent the model from perpetuating or amplifying these biases in its responses
Data Filtering: Removing low-quality, irrelevant, or harmful data from the training set to ensure the model learns from high-quality information
Data Enrichment: Easily adding new, high-quality data to the training set to improve the model's knowledge and capabilities, particularly in underrepresented domains
Data Anonymization: Ensuring that personally identifiable information (PII) is removed or obscured to protect privacy and comply with regulations
Data Validation: Using various techniques to validate the accuracy and relevance of the data before it's used in training (see also: https://unique-ch.atlassian.net/wiki/x/SQEFGg )
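
A few of the steps above (cleaning, filtering, and anonymization) can be sketched in a single pass over text records. The `remediate` function, the `min_words` threshold, and the simple email pattern are assumptions for illustration and cover only a small part of a real remediation pipeline.

```python
import re

# Deliberately simple email pattern, used only for this example.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def remediate(records: list[str], min_words: int = 3) -> list[str]:
    cleaned, seen = [], set()
    for record in records:
        text = " ".join(record.split())      # cleaning: normalize whitespace
        text = EMAIL.sub("<EMAIL>", text)    # anonymization: mask email addresses
        if len(text.split()) < min_words:    # filtering: drop very short records
            continue
        key = text.lower()
        if key in seen:                      # filtering: drop duplicates
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned
```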

Prompt Remediation

The quality of FinanceGPT's output is heavily influenced by the input. In general, high-quality prompts lead to significantly better output.

Clarifying Ambiguity: Rewriting prompts to be more specific and clear to reduce ambiguity and help the model generate more precise responses.
Reducing Bias: Adjusting prompts to minimize the potential for biased outputs, which may involve rephrasing or providing a more balanced context.
Ensuring Context: Providing sufficient background information in the prompt to guide the model in understanding the context and generating relevant responses.
Guiding Content: Including instructions or constraints within the prompt to direct the model away from generating undesirable content and towards producing outputs that adhere to ethical guidelines or content policies.
Iterative Refinement: Using an iterative approach where the initial output is assessed, and the prompt is refined based on the model's response to improve the result.
Incorporating User Feedback: Using feedback from users to understand how prompts can be misunderstood or misinterpreted by the model, and adjusting them accordingly.
Chain of Thought Prompting: Crafting prompts that include an explicit reasoning process, which can help the model generate more thoughtful and detailed responses.
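
Several of the techniques above (ensuring context, guiding content, and chain of thought prompting) amount to assembling a prompt from well-defined parts. The helper below is a hypothetical sketch of that idea; the section labels and wording are assumptions, not a prescribed template.

```python
def build_prompt(question, context=None, constraints=(), step_by_step=False):
    """Assemble a prompt from optional context, constraints, and a reasoning hint."""
    parts = []
    if context:
        parts.append(f"Context:\n{context}")
    if constraints:
        parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    parts.append(f"Question: {question}")
    if step_by_step:
        parts.append("Explain your reasoning step by step before giving the final answer.")
    return "\n\n".join(parts)
```

For iterative refinement, the same function can be called again with adjusted context or constraints after reviewing the model's previous answer.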

Adversarial Remediation

We try to identify and mitigate weaknesses or vulnerabilities that could be exploited by adversaries as much as possible. Adversarial attacks are techniques that attempt to fool machine learning models with deceptive input. By strengthening FinanceGPT against them, we can mitigate undesired or harmful output.

Adversarial Attack Identification: The first step is to identify potential adversarial attacks. This involves understanding how an adversarial example or input could be crafted to mislead the AI model
Model Assessment: FinanceGPT is tested using these adversarial examples to see if it can be tricked into making incorrect predictions or classifications
Vulnerability Analysis: If the model is found to be vulnerable, the specific weaknesses that allowed the adversarial attack to succeed are analyzed
Remediation Strategies: We dynamically adjust our strategy depending on the attack. This could involve retraining the model with a more diverse dataset that includes adversarial examples, altering the model architecture, applying defensive techniques like adversarial training, or implementing input validation mechanisms to detect and reject adversarial inputs.
Implementation: The chosen remediation strategies are implemented to enhance the model's robustness against adversarial attacks.
Continuous Monitoring and Testing: The model is continuously monitored and tested to ensure that the remediation is effective and to identify any new potential vulnerabilities.

Prompt Injection Remediation

Prompt injection attacks occur when a user intentionally crafts an input (or prompt) to manipulate the AI into performing actions or generating responses that it shouldn't, such as revealing sensitive information, generating harmful content, or behaving in unintended ways.

The remediation process for prompt injection typically involves several key steps:

Detection: The first step is to detect potential prompt injections. This can be done through monitoring for unusual patterns in user inputs, such as inputs that contain certain command-like structures or attempts to directly interface with the underlying model in an unauthorized way.
Analysis: Once a potential prompt injection is detected, the input is analyzed to understand how it is trying to manipulate the model. This might involve looking for hidden commands, unusual syntax, or other indicators that the input is not a standard user query.
Response Rules: Based on the analysis, rules or patterns are created to identify and block similar prompt injection attempts in the future. This could involve creating a list of banned phrases, patterns, or implementing more sophisticated natural language understanding algorithms to detect when the model is being prompted inappropriately.
Model Training: The language model may be retrained or fine-tuned to recognize and resist prompt injection attempts. This training can include exposure to adversarial examples so that the model learns to respond to them appropriately.
Policy Enforcement: Implementing and enforcing policies that define acceptable and unacceptable uses of the AI system can also help mitigate prompt injection attacks. Users can be informed of these policies, and actions can be taken against those who violate them.
User Input Sanitization: Before processing user inputs, they can be sanitized to remove or neutralize elements that could lead to prompt injection. This might involve stripping out certain characters, using input validation techniques, or rephrasing inputs in a way that reduces the risk of manipulation.
Continuous Improvement: As attackers develop new methods, we constantly monitor relevant security news to stay on top of new attack vectors.
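
The detection and sanitization steps can be sketched with simple pattern matching. The patterns, function names, and sanitization rules below are illustrative assumptions; pattern lists alone are easy to evade and would normally be combined with the other measures listed above.

```python
import re

# Example phrases only; a real deployment would maintain a much broader set.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"reveal (your |the )?system prompt", re.I),
    re.compile(r"you are now\b", re.I),
]

def detect_injection(user_input: str) -> list[str]:
    """Return the patterns that matched; an empty list means nothing was flagged."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(user_input)]

def sanitize(user_input: str) -> str:
    """Strip non-printable characters and collapse whitespace before processing."""
    kept = "".join(ch for ch in user_input if ch.isprintable() or ch in "\n\t")
    return " ".join(kept.split())
```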

Pre-Prod Fine-Tuning clusters

FinanceGPT leverages a dedicated AI cluster to enhance its performance and efficiency. The cluster's powerful computational resources allow FinanceGPT to undergo fine-tuning on a dedicated dataset of relevant documents, for example to adapt the pre-trained model to the nuances of financial language and concepts. This process is accelerated by the cluster's ability to parallelize tasks and handle extensive hyperparameter optimization, ensuring that FinanceGPT not only retains its general linguistic capabilities but also excels in generating industry-specific insights. The AI cluster further supports rapid iteration and robust evaluation, enabling FinanceGPT to evolve continuously through data-driven refinements and ultimately providing a reliable and high-performing model ready for deployment in various financial applications.

Pre-Prod Inference clusters

FinanceGPT capitalizes on the power of inference clusters to deliver real-time financial insights and analysis with exceptional speed and accuracy. By harnessing the distributed computing capabilities of inference clusters, FinanceGPT can efficiently process vast streams of financial data, offering instant responses to complex queries. This setup ensures that as FinanceGPT interprets market trends and generates forecasts, it scales seamlessly to meet the demands of multiple users simultaneously, making it an invaluable tool for financial professionals seeking a competitive edge in a data-driven industry.

Failover Inference Clusters

Our clusters are designed to ensure high availability and robustness of the AI services we provide. This is crucial in environments where continuous operation is essential, such as the financial services industry. We therefore have mechanisms in place to prevent major system outages, such as whole-system redundancy, load balancing, and automatic failover.
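
Load balancing with automatic failover can be sketched as a round-robin selector that skips clusters marked unhealthy. The `FailoverBalancer` class and the cluster names are hypothetical; how health is actually determined is outside the scope of this example.

```python
import itertools

class FailoverBalancer:
    """Round-robin over clusters, skipping any that are marked unhealthy."""

    def __init__(self, clusters: list[str]):
        self.clusters = list(clusters)
        self.healthy = set(clusters)
        self._cycle = itertools.cycle(self.clusters)

    def mark_down(self, name: str) -> None:
        self.healthy.discard(name)

    def mark_up(self, name: str) -> None:
        self.healthy.add(name)

    def next_cluster(self) -> str:
        # Look at each cluster at most once per call.
        for _ in range(len(self.clusters)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy cluster available")
```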

Observability Jobs

Unique goes beyond traditional monitoring with a focus on providing deep insights into the operation and interaction of AI systems in various environments. This includes detecting issues that could affect their performance, accuracy, or reliability. Recognizing that every customer has unique requirements for integrating FinanceGPT into their IT infrastructure, we are committed to providing tailored solutions. Our aim is to ensure that our AI model performs as expected post-deployment. We diagnose and resolve problems during the process, and continuously monitor the models to ensure they meet the operational goals discussed with the client over time.
Read here for more information about Infrastructure requirements: Infrastructure requirements

Observability Logs and Metrics

Observability logs and metrics are important for maintaining transparency and performance in our FinanceGPT system. Our logs record system behaviour, including errors, warnings, informational messages, and other relevant events. This information is critical for understanding the sequence of operations and debugging issues.

Metrics, on the other hand, measure various aspects of our system's performance and health, such as resource utilization and operational efficiency. They allow us to evaluate latency, throughput, and accuracy. They also enable us to monitor CPU usage and memory consumption, ensuring that FinanceGPT operates within the desired capacity.
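
As a small illustration of latency and error metrics, the sketch below wraps a call, records how long it took, and summarizes the results. The `RequestMetrics` class and the metric names are assumptions made for this example and are not tied to any particular monitoring stack.

```python
import time
from statistics import mean

class RequestMetrics:
    def __init__(self):
        self.latencies: list[float] = []
        self.errors = 0

    def record(self, func, *args, clock=time.perf_counter):
        """Run func, record its latency, and count failures."""
        start = clock()
        try:
            return func(*args)
        except Exception:
            self.errors += 1
            raise
        finally:
            self.latencies.append(clock() - start)

    def summary(self) -> dict:
        ordered = sorted(self.latencies)
        count = len(ordered)
        return {
            "requests": count,
            "errors": self.errors,
            "mean_latency_s": mean(ordered) if ordered else 0.0,
            "p95_latency_s": ordered[min(count - 1, int(0.95 * count))] if ordered else 0.0,
        }
```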

We actively involve our clients in this process as much as possible, which allows us to maximize the potential of FinanceGPT and tailor it to their specific needs and preferences.

Observability Control Tower

Observability jobs as well as observability logs and metrics are managed by a designated control tower, which serves as a command center that oversees all aspects of system observability and ensures that they run and work as intended.

Responsible AI (RAI) Logs and Metrics

Our RAI Logs are detailed records that capture decisions, behaviours, and operations of FinanceGPT, specifically focusing on aspects that pertain to ethical considerations, compliance with regulations, and interactions that could have social impacts. These logs are of great importance for us to ensure transparency, auditability - for third parties or regulatory bodies to review AI actions to ensure compliance with legal and ethical standards - and accountability to address issues related to bias, fairness, or errors.

We use RAI Metrics to assess and ensure that FinanceGPT is operating within defined ethical guidelines and performance standards. These metrics measure, for example, fairness (whether decisions are unbiased and equitable across different groups) and explainability (how simple the explanations of the output are).
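
One common way to quantify fairness across groups is the gap in positive-outcome rates, often called the demographic parity difference. The sketch below is a generic illustration of that metric; it is an assumption that such a measure is relevant here, and it is not a description of the specific RAI Metrics used by the platform.

```python
from collections import defaultdict

def positive_rate_by_group(records):
    """records: list of (group, decision) pairs, where decision is True or False."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, decision in records:
        totals[group] += 1
        positives[group] += int(decision)
    return {group: positives[group] / totals[group] for group in totals}

def demographic_parity_gap(records) -> float:
    """Difference between the highest and lowest positive rate across groups."""
    rates = positive_rate_by_group(records)
    return max(rates.values()) - min(rates.values()) if rates else 0.0
```

A gap close to zero indicates similar outcome rates across groups; a large gap is a signal to investigate further, not proof of bias on its own.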

Video Explainer

https://youtu.be/nTB3I6QcUcY

Resources

Unique FinanceGPT Overview: Figma Link (internal only, not publicly available)

Author	Andreas Hauri