Production Readiness Checklist

DISCLAIMER

These are best practices from the industry, Unique's experience, or both. This list is not exhaustive and does not guarantee a seamless rollout, go-live or production setup.


Scope

This document provides a production readiness checklist for preparing Unique FinanceGPT for live deployment. It includes advice on Documentation, Personnel, Infrastructure, Monitoring, Containers, Product Configuration and more.

“You” in this document is the person/team overseeing the rollout, go-live or production setup.

Aspects

Each aspect represents an area of expertise. The aspects are roughly ordered bottom to top, where the bottom is the pure infrastructure and the top is the configuration of Unique FinanceGPT.

General

  • Reduce manual interaction with your platform to an absolute minimum, even in dire situations: cleaning up after a manual change can cause worse impact than investing a few more minutes to apply the fix permanently

  • Ensure that corrective actions with a cost impact (e.g. scale-ups) can be taken at any time, even if this means a post-mortem afterwards to scale down again or to justify the increased spending

  • For all factors described below, ensure you have aligned internally on when, by whom and how often components get updated (version upgrades, patches by the cloud provider, etc.)

Access/Security

Infrastructure

The bullets in this section lean towards Azure as the hyperscaler of choice. Clients running on another provider should find the equivalent for each bullet.

  • Ensure you have read Essential Prerequisites for Customer Managed Tenant

  • It is strongly advised to use infrastructure-as-code tools such as Terraform to provision everything listed here

  • Review the Database Server: ensure you have visibility into CPU, memory and connections, and evaluate before production whether a scale-up is needed (workloads might need to be adjusted as well, see Reference Cases)

  • Review the Redis Cache: ensure you have visibility into CPU, memory and connections, and evaluate before production whether a scale-up is needed (workloads might need to be adjusted as well, see Reference Cases)

  • Revisit LLM capacity and tokens-per-minute quotas, and plan capacity increases well ahead of time

  • Revisit the Container Orchestrator setup and ensure you have enough headroom to scale up or out

    • Compute and Memory

    • Network (especially subnet sizes)

  • Ensure you do not rely on Unique's container registry for pulling or scaling; images must be pulled from your local mirror. Also, refrain from using imagePullPolicy: Always (Kubernetes best practices)
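The subnet-size bullet above can be checked with simple arithmetic before go-live. The following is a minimal sketch, assuming a flat networking model in which every pod consumes one subnet IP (as with Azure CNI without overlay). The function names and the node and pod counts are hypothetical; Azure reserving 5 addresses per subnet is documented Azure behaviour.

```python
import ipaddress

# Azure reserves 5 addresses in every subnet
# (network address, default gateway, 2x Azure DNS, broadcast).
AZURE_RESERVED_IPS = 5


def usable_ips(cidr: str) -> int:
    """Return the number of assignable IP addresses in an Azure subnet."""
    network = ipaddress.ip_network(cidr, strict=True)
    return max(network.num_addresses - AZURE_RESERVED_IPS, 0)


def fits(cidr: str, max_nodes: int, max_pods_per_node: int) -> bool:
    """Check whether a subnet can hold all nodes and pods at full scale-out.

    Assumption: each node and each pod takes exactly one IP from the subnet.
    """
    needed = max_nodes * (1 + max_pods_per_node)
    return needed <= usable_ips(cidr)


if __name__ == "__main__":
    # Hypothetical cluster: up to 10 nodes with up to 30 pods each.
    for cidr in ("10.0.0.0/24", "10.0.0.0/22"):
        print(cidr, usable_ips(cidr), fits(cidr, max_nodes=10, max_pods_per_node=30))
```

Extra nodes created temporarily during rolling upgrades also need addresses, so in practice the node count passed in should include that surge.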

Scaling the database in production will result in downtime (unless heavy HA investments were made). In reference cases the downtime was ~15 min, but this depends purely on the database provider, not on Unique FinanceGPT.

Monitoring

  • Ensure visibility over the ingress traffic (namely the Application Gateway), especially the rate of errors (HTTP status code >499)

  • Ensure log access, meaning stdout of Unique workloads, 3rd party workloads (like the Vector Database or the IDP) and Azure services themselves (e.g. Kubernetes Pod Inventory or Events)

  • Ensure proper visibility on the Container Orchestrator, its nodes, its events, the pods as well as all other Resources and Custom Resources used by Unique FinanceGPT

    • It is strongly advised to have better means of access than kubectl or a similar command-line tool (using kubectl works but increases incident duration due to the amount of repetitive typing)
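To make the ingress error rate from the first bullet concrete, here is a minimal sketch of how the share of server errors (HTTP status code >499) could be computed from a window of gateway responses. The function names and the 5% alert threshold are hypothetical and not taken from Unique's tooling; in practice this calculation usually lives in the monitoring stack's query language.

```python
from typing import Iterable, List


def server_error_rate(status_codes: Iterable[int]) -> float:
    """Return the fraction of responses with an HTTP status code above 499."""
    codes: List[int] = list(status_codes)
    if not codes:
        # No traffic in the window: report no errors instead of dividing by zero.
        return 0.0
    errors = sum(1 for code in codes if code > 499)
    return errors / len(codes)


def should_alert(status_codes: Iterable[int], threshold: float = 0.05) -> bool:
    """Return True when the error rate exceeds the (hypothetical) threshold."""
    return server_error_rate(status_codes) > threshold


if __name__ == "__main__":
    window = [200, 200, 502, 404]
    print(server_error_rate(window), should_alert(window))
```

Client errors (4xx) are deliberately not counted, matching the ">499" criterion in the checklist.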

Product

  • For all services, adhere to best practices for production workloads

    • Set requests and limits

    • Use zone-redundant storage if the infrastructure spans multiple availability zones (AZs)

    • Container processes should not run as root

    • Ensure temporary data is written to ephemeral volumes (emptyDir)

    • Put Pod Disruption Budgets in place where possible

    • etc.

  • For critical-path services (chat, ingestion, ingestion-workers)

    • choose a proper sizing for compute, memory and potentially storage

    • choose a production-ready scaling strategy

      • Enable horizontal scaling choosing realistic scale targets (see Reference Case)

      • Enable event-driven scaling where needed (see Reference Case)
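When choosing realistic scale targets for horizontal scaling, it helps to know how the Kubernetes Horizontal Pod Autoscaler derives a replica count. The sketch below follows the formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), including the default tolerance of 0.1. The function name and the sample values are illustrative assumptions, and the real controller applies further rules (stabilisation windows, pod readiness) that are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the HPA skips scaling.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(desired, max_replicas))


if __name__ == "__main__":
    # Hypothetical: 3 replicas at 90% average CPU with a 60% target, capped at 6.
    print(desired_replicas(3, 90, 60, min_replicas=1, max_replicas=6))
```

A target that is set too high leaves little time to react before pods saturate, while a very low target keeps the deployment pinned at its maximum replica count.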

Personnel

  • In early production phases, ensure trained and experienced personnel are available who can cover Essential Prerequisites for Customer Managed Tenant

  • Ensure the personnel present have practised investigating logs, telemetry and other infrastructure KPIs on a test system

  • Ensure personnel have the soft and hard skills required to take the necessary actions

  • Ensure personnel can decide and take action themselves, or define a clear escalation path with people who are available to decide

Knowledge

  • Ensure you have written internal documentation (or any other form of persisted knowledge) about your setup, covering all bullets mentioned in Essential Prerequisites for Customer Managed Tenant

    • Each client setup is different because all landing zones are different. Unique's documentation does not reflect the customisations made in your specific setup, and this knowledge is crucial in production

    • Focus especially on knowledge about networking, DNS, the general flow of ingress traffic and potential hurdles on the way out (egress, e.g. firewalls or SSL interceptors)

  • Ensure you have built enough internal knowledge to provide a first-level response to incidents without relying on engineers joining calls (engineers will support, but waiting for them prolongs the unavailability)

  • Ensure potential worst-case scenarios are internally known and documented (e.g. restoring a database backup, reindexing all documents or similar). You are advised to perform a risk assessment on your own and then consult Unique to help you mitigate your top 3 risks (each client has different critical-path risks)


Reference Cases

Gringotts Wizarding Bank

Gringotts Wizarding Bank (GWB) uses Azure as its provider of choice. It has provisioned Unique FinanceGPT as code and set everything up magically.

Ballpark Numbers

The numbers for GWB's production setup look approximately as follows:

  • ~3700 monthly active users

  • “business day” usage pattern, meaning a lot of traffic between 9am and 6pm, with peaks around 10am and 2pm and fluctuating traffic in between

Use Cases

GWB invested in these use cases:

  • Private ChatGPT

  • Directives

  • Knowledge Search

  • Coding and Development

  • Document Analysis

  • Investment

  • Prospectus Extraction

  • Contract Analysis

  • IT Support

Infrastructure Configuration

  • Azure Postgres Server Size: GP_Standard_D8ds_v5

    • This server is used for all Unique-related workloads, including 3rd party services (e.g. IDP)

    • This SKU allows setting max_connections to 3200

  • Requested and approved quota increases for the most heavily used Azure OpenAI models

  • GWB also attached a PTU (provisioned throughput unit) via Connect custom LXM (LLM) with Unique Finance GPT so as not to rely purely on pay-as-you-go models

Product Configuration

Backing Services

  • Tyk Application Gateway runs with 3 replicas, 600m CPU and 1GiB memory each

  • Zitadel IDP runs with 2 replicas, 1 CPU and 1500MiB memory each

  • Qdrant Vector Database runs with 2 replicas, 100m CPU, 3.8Gi requested and 4Gi limited memory each

Unique Services

  • The use cases rely heavily on the application repository, which runs with
    MAX_HEAP_MB: 750, 500m CPU, 850Mi requested and 900Mi limited memory

  • The chat backend is scaled up to
    MAX_HEAP_MB: 1850, 1 CPU, 1900Mi requested and 2Gi limited memory

    • DATABASE_CONNECTION_POOL_LIMIT: "30"

    • Horizontal Pod Autoscaling is enabled allowing up to 6 replicas

  • For upload and chat cases, 4 chat ingestion workers run permanently from 8am to 7pm and are limited to a maximum of 8 replicas using a KEDA ScaledObject

  • Basic knowledge base ingestion scales between a minimum of 0 and a maximum of 8 replicas using a KEDA ScaledObject
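The chat backend figures above can be cross-checked against the database limit from the Infrastructure Configuration. This is a rough sketch, assuming that DATABASE_CONNECTION_POOL_LIMIT applies per replica and that every replica can fill its pool at peak. The function names are hypothetical, and the other services sharing the same Postgres server (their pool sizes are not listed here) would need to be added to get a full picture.

```python
from typing import Dict, Tuple


def peak_connections(max_replicas: int, pool_limit: int) -> int:
    """Worst-case connections of one service: every replica fills its pool."""
    return max_replicas * pool_limit


def budget_share(services: Dict[str, Tuple[int, int]], max_connections: int) -> float:
    """Return the fraction of max_connections consumed at peak.

    `services` maps a service name to (max_replicas, pool_limit per replica).
    """
    total = sum(peak_connections(replicas, pool) for replicas, pool in services.values())
    return total / max_connections


if __name__ == "__main__":
    # Values from the GWB reference case: up to 6 chat replicas with a pool
    # limit of 30 each, against a server allowing max_connections = 3200.
    services = {"chat": (6, 30)}
    print(peak_connections(6, 30))         # 180 connections at peak
    print(budget_share(services, 3200))    # share of the connection budget
```

Under these assumptions the chat backend alone uses at most 180 of the 3200 available connections, which leaves room for the remaining Unique and 3rd party workloads on the same server.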


Author

Solution Engineering

 

© 2024 Unique AG. All rights reserved.