Production Readiness Checklist
DISCLAIMER
These are best practices drawn from the industry, Unique's experience, or both. This list is non-exhaustive and does not guarantee a seamless rollout, go-live or production setup.
- 1 Scope
- 2 Aspects
- 2.1 General
- 2.2 Access/Security
- 2.3 Infrastructure
- 2.4 Monitoring
- 2.5 Product
- 2.6 Personnel
- 2.7 Knowledge
- 3 Reference Cases
- 3.1 Gringotts Wizarding Bank
- 3.1.1 Ballpark Numbers
- 3.1.2 Use Cases
- 3.1.3 Infrastructure Configuration
- 3.1.4 Product Configuration
- 3.1.4.1 Backing Services
- 3.1.4.2 Unique Services
Scope
This document provides a production readiness checklist for preparing Unique FinanceGPT for live deployment. It includes advice on documentation, personnel, infrastructure, monitoring, containers, product configuration and more.
“You” in this document refers to the person or team overseeing the rollout, go-live or production setup.
Aspects
Each aspect represents an area of expertise. The aspects are roughly structured bottom to top, where the bottom is the raw infrastructure and the top is the configuration of Unique FinanceGPT.
General
Reduce manual interaction with your platform to an absolute minimum, even in dire situations: a quick manual clean-up might cause worse impact than investing a few more minutes in a permanent fix
Ensure that corrective actions with a cost impact (e.g. scale-ups) can be taken at any time, even if it means a post-mortem to scale down again or justify the increased spending
For all factors described below, ensure you have aligned internally on when, by whom and how often components get updated (version upgrades, patches by the cloud provider, etc.)
Access/Security
Ensure you have access (permanent or just-in-time) to all components relating to Unique FinanceGPT
Infrastructure requirements outlines which infrastructure components are needed
Ensure you have access to logs and telemetry of the mentioned services
Ensure you are sufficiently empowered (permanently or just-in-time) to modify resources
It is strongly advised to use a secrets manager to store your confidential information
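On Azure, one common pattern for the secrets-manager advice above is the Secrets Store CSI driver backed by Azure Key Vault. A minimal sketch, assuming the driver and its Azure provider are installed; the vault name, tenant ID and secret names are placeholders, not part of any Unique setup:

```yaml
# Hedged sketch: mount Key Vault secrets via the Secrets Store CSI driver.
# keyvaultName, tenantId and objectName are illustrative placeholders.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: unique-secrets
spec:
  provider: azure
  parameters:
    keyvaultName: my-keyvault                          # placeholder vault
    tenantId: 00000000-0000-0000-0000-000000000000     # placeholder tenant
    objects: |
      array:
        - |
          objectName: database-password                # placeholder secret
          objectType: secret
```

Pods then reference this class through a CSI volume, so confidential values never live in plain manifests or container images.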
Infrastructure
The bullets herein lean towards Azure as the hyper-scaler of choice. Clients running on another provider should map each bullet to its equivalent.
Ensure you have read Essential Prerequisites for Customer Managed Tenant
It is strongly advised to use configuration management tools such as Terraform to provision everything described herein
Review the Database Server, ensure you have visibility on CPU, memory and connections, evaluate before production if a scale-increase is needed (workloads might need to be adjusted as well, see Reference Case)
Review the Redis Cache, ensure you have visibility on CPU, memory and connections, evaluate before production if a scale-increase is needed (workloads might need to be adjusted as well, see Reference Case)
Revisit LLM scales and tokens-per-minute quotas; plan capacity increases ahead of time
Revisit the Container Orchestrator setup, ensuring you have enough space to scale up or out
Compute and Memory
Network (especially subnet sizes)
Ensure you do not rely on Unique's container registry for pulling or scaling; images must be pulled from your local mirror. Also, refrain from using `imagePullPolicy: Always` (Kubernetes best practice)
Scaling the database in production will result in downtime (unless heavy HA investments were made). In reference cases the downtime was ~15 min, but this depends purely on the database provider, not on Unique FinanceGPT.
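The image-pull guidance above can be sketched as a container spec. Registry host, image name and tag are illustrative placeholders:

```yaml
# Illustrative fragment — registry host, image name and tag are placeholders.
spec:
  containers:
    - name: app
      # Pull from your local mirror, never from Unique's registry directly.
      image: myregistry.example.com/unique/app:1.2.3
      # IfNotPresent avoids re-pulling on every pod start, so scale-outs do
      # not depend on registry availability (works reliably with pinned tags).
      imagePullPolicy: IfNotPresent
```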
Monitoring
Ensure visibility over the ingress traffic (namely the Application Gateway), especially the rate of errors (HTTP status codes `>499`)
Ensure log access, meaning `stdout` of Unique workloads, 3rd-party workloads (like the Vector Database or the IDP) and Azure services themselves (e.g. Kubernetes Pod Inventory or Events)
Ensure proper visibility on the Container Orchestrator, its nodes, its events, the pods, as well as all other Resources and Custom Resources used by Unique FinanceGPT
It is strongly advised to have better means of access than `kubectl` or similar command-line tools (using `kubectl` works but increases incident duration due to the amount of repetitive commands to type)
Product
For all services, adhere to best practices for production workloads
Set requests and limits
Zone-redundant storage should be used if the infrastructure spans multiple availability zones as well
Container processes should not run as root
Ensure temporary data is written to ephemeral volumes (`emptyDir`)
Pod Disruption Budgets are in place where possible
For critical-path services (chat, ingestion, ingestion-workers)
choose a proper sizing for compute, memory and potentially storage
choose a production-ready scaling strategy
Enable horizontal scaling choosing realistic scale targets (see Reference Case)
Enable event-driven scaling where needed (see Reference Case)
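As a sketch of several of the bullets above (requests/limits, non-root containers, `emptyDir` for temporary data, a Pod Disruption Budget), assuming a generic service; all names and values are placeholders, not Unique defaults:

```yaml
# Minimal sketch — names and sizing are placeholders, not Unique defaults.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service
spec:
  replicas: 2
  selector:
    matchLabels: { app: example-service }
  template:
    metadata:
      labels: { app: example-service }
    spec:
      securityContext:
        runAsNonRoot: true        # container processes must not run as root
        runAsUser: 1000
      containers:
        - name: app
          image: myregistry.example.com/example:1.0.0
          resources:
            requests: { cpu: 500m, memory: 512Mi }   # scheduling guarantees
            limits:   { cpu: "1",  memory: 1Gi }     # hard ceilings
          volumeMounts:
            - name: tmp
              mountPath: /tmp     # temporary data goes to an ephemeral volume
      volumes:
        - name: tmp
          emptyDir: {}
---
# Keep at least one replica available during voluntary disruptions
# (node drains, upgrades).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-service
spec:
  minAvailable: 1
  selector:
    matchLabels: { app: example-service }
```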
Personnel
In early production phases, ensure trained and knowledgeable personnel are present who can cover Essential Prerequisites for Customer Managed Tenant
Ensure the personnel present have trained on a test system to investigate logs, telemetry and other infrastructure KPIs
Ensure personnel has necessary soft- and hard-skills to take necessary actions
Ensure personnel can decide and act, or can follow a clearly defined escalation path to those who can
Knowledge
Ensure you have written internal documentation, or some other form of knowledge persistence, about your setup covering all bullets mentioned in Essential Prerequisites for Customer Managed Tenant
Each client setup is different because all landing zones are different. Unique's documentation does not reflect the customisations made within your specific setup, and that knowledge is crucial in production
Special focus must be placed on knowledge about networking, DNS and the general flow of ingress traffic, as well as potential hurdles on the way out (egress, e.g. firewalls or SSL interceptors)
Ensure you have built enough internal knowledge to handle first-level incident response without relying on engineers joining calls (engineers will support, but this increases unavailability)
Ensure potential worst-case scenarios are internally known and documented (e.g. restoring a database backup, reindexing all documents, or similar). You are advised to perform your own risk modelling and then consult Unique to help mitigate your top three risks (each client has different critical-path risks)
Reference Cases
Gringotts Wizarding Bank
Gringotts Wizarding Bank (GWB) uses Azure as its provider of choice; it has provisioned Unique FinanceGPT as code and set everything up magically.
Ballpark Numbers
The numbers for GWB's production setup look approximately as follows:
~3700 monthly active users
“business day” usage pattern, meaning heavy traffic between 9 am and 6 pm with peaks around 10 am and 2 pm, and some fluctuating traffic in between
Use Cases
GWB invested into these use cases:
Private ChatGPT
Directives
Knowledge Search
Coding and Development
Document Analysis
Investment
Prospectus Extraction
Contract Analysis
IT Support
Infrastructure Configuration
Azure Postgres Server size: `GP_Standard_D8ds_v5`
This server is used for all Unique-related workloads, including 3rd-party services (e.g. the IDP)
This SKU allows setting `max_connections` to `3200`
Requested and approved quota increases for the most heavily used Azure OpenAI models
GWB also attached a PTU via Connect custom LXM (LLM) with Unique FinanceGPT so as not to rely purely on pay-as-you-go models
Product Configuration
Backing Services
Tyk Application Gateway runs with `3` replicas, `600m` CPU and `1GiB` memory each
Zitadel IDP runs with `2` replicas, `1` CPU and `1500MiB` memory each
Qdrant Vector Database runs with `2` replicas, `100m` CPU, `3.8Gi` requested and `4Gi` limited memory each
Unique Services
The use cases heavily rely on the application repository, which runs at `MAX_HEAP_MB: 750`, `500m` CPU, `850Mi` requested and `900Mi` limited memory
The chat backend is scaled up to `MAX_HEAP_MB: 1850`, `1` CPU, `1900Mi` requested and `2Gi` limited memory
`DATABASE_CONNECTION_POOL_LIMIT: "30"`
Horizontal Pod Autoscaling is enabled, allowing up to `6` replicas
For upload and chat use cases, `4` chat ingestion workers run permanently from 8 am to 7 pm and are limited to a maximum of `8` replicas using KEDA's scaled object
Basic knowledge base ingestion is limited from min `0` to max `8` replicas using KEDA's scaled object
Author: Solution Engineering
© 2024 Unique AG. All rights reserved.