Artefact Caching
Unique is built with scalability, security, and resilience in mind. This is why Unique consciously chooses not to release container images that contain large (AI) artefacts.
High-level rationale
Packing large language models (LLMs) directly into Docker images is not a scalable or efficient practice for several reasons, including potential security risks:
Image Size and Deployment: Including large models in Docker images drastically increases their size, leading to slower build, transfer, and deployment times. This delays updates and consumes unnecessary storage resources across environments.
Resource Wastage: Each time the model is updated, the entire image must be rebuilt and redeployed, even if the application code hasn’t changed. This is inefficient and unnecessarily resource-intensive.
Possible Security Risks
Exposure of Sensitive Data: Embedding models directly in images increases the risk of accidental exposure, especially if the image is pushed to a public or insecure registry. Sensitive proprietary or licensed models may be leaked.
Limited Model Access Control: Using a centralized storage system for model loading allows granular access controls (e.g., authentication, encryption, audit trails). When packed into an image, these controls are harder to enforce.
Immutable Images: Good practices for container security recommend keeping images immutable and limiting their contents to the application code and necessary dependencies. Packing models violates this principle, making images larger and harder to secure.
Flexibility and Maintainability: By externalising models and loading them via a volume or centralized storage, we can independently manage, update, or replace them without impacting the application container. This approach also allows us to apply versioning, secure storage, and controlled access to the models.
Scalability: Externalizing models supports shared storage and caching, reducing duplication across containers and enabling efficient scaling without bloating the infrastructure.
For these reasons, we follow industry good practices by loading immutable models from a volume or secure storage. This approach ensures better security, resource efficiency, and operational flexibility.
Additional complexity: AI innovation
The AI world changes fast, and the chance that a new use case is solved with yet another model is high. Given this, it is not unusual for a service to use not one but, say, a handful of models.
If each of, for instance, five such models is 6 GB, which is rather small for most models, the result is a 30 GB Docker image, a size most infrastructure and cluster administrators would neither allow nor support.
Artefacts Cache
This good practice mostly affects AI/Python workloads. Since version 1.1.0, the Unique ai-service Helm chart therefore supports side-loading or pre-loading models/artefacts.
The image below depicts the methods outlined in this section at a high level; HF Hub stands in symbolically for any artefact origin.
Contract
The only contract between the service and the underlying volume is the specification of a path at which the models (henceforth: artefacts) are expected. The agreed path can be specified in the caches configuration.
How the artefacts get there is a separate discussion, see below.
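As an illustration of such a contract, a caches configuration in the Helm values might look like the following. This is a hypothetical sketch: the key names (caches, artefacts, path, claimName) are assumptions and may differ from the actual ai-service chart schema.

```yaml
# Hypothetical values sketch; actual key names depend on the chart version.
caches:
  artefacts:
    # Path inside the container where the service expects the artefacts.
    path: /models
    # Volume backing that path, e.g. a PersistentVolumeClaim shared across pods.
    volume:
      persistentVolumeClaim:
        claimName: model-cache
```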
Loading Artefacts
Unique services do not natively (in the application code) pack or download models. This is because Unique is also offered as a Self-Hosted solution in isolated environments that have no access to the internet.
Via initContainer
The initContainer, whether using the ArtefactCache or a custom implementation, downloads the model from a source of the client's choice to the agreed path. This is the cleanest method, but not always feasible (e.g. in isolated networks).
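A minimal sketch of such an initContainer is shown below. The image, download URL, and volume name are illustrative assumptions, not part of the chart; the only fixed element is that the artefact must land at the agreed path.

```yaml
# Hypothetical initContainer sketch; image, URL, and names are illustrative.
initContainers:
  - name: artefact-cache
    image: curlimages/curl:8.7.1
    command:
      - sh
      - -c
      # Download the artefact into the agreed cache path.
      - curl -fsSL -o /models/model1 https://example.com/artefacts/model1
    volumeMounts:
      - name: model-cache
        mountPath: /models   # must match the path in the caches configuration
```

Because the volume is shared with the main container, the service finds the artefact at the agreed path once it starts.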
Via kubectl cp
This option involves providing the models to the workloads through either automation or human interaction. Using a command like

kubectl cp /model/model1 my-namespace/my-pod:/models/model1

an operator can copy the necessary models into the deployed workloads.
If a Persistent Volume is used (a practice the Unique ai-service chart implements as well), this action only has to be performed once per model.
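A Persistent Volume Claim backing such a shared cache could look like the sketch below. This is an assumption for illustration; the claim name, access mode, and size depend on the cluster and the chart's actual volume configuration.

```yaml
# Hypothetical PVC sketch; name, access mode, and size are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache
spec:
  accessModes:
    - ReadWriteMany   # lets multiple pods share the same artefacts
  resources:
    requests:
      storage: 50Gi   # size it for the sum of all cached artefacts
```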
Via custom docker builds
Unique clearly discourages this practice for the reasons outlined above in the high-level rationale.
If both of the above options are genuinely infeasible, self-hosting clients can, as a last resort, turn to custom Docker builds. Unique can advise on this practice pro rata. A potential implementation looks as follows:
FROM uniquecr.azurecr.io/ingestor:any-version
COPY downloaded-model1 /models/model1
COPY downloaded-model2 /models/model2
...
This bloated image can then be pushed into the Self-Hosted registry.
Author: Solution Engineering
© 2024 Unique AG. All rights reserved.