Artefact Caching
Unique is built with scalability, security, and resilience in mind. This is why Unique consciously chooses not to release container images that contain large (AI) artefacts.
High-level rationale
Packing large language models (LLMs) directly into Docker images is not a scalable or efficient practice for several reasons, including potential security risks:
Image Size and Deployment: Including large models in Docker images drastically increases their size, leading to slower build, transfer, and deployment times. This delays updates and consumes unnecessary storage resources across environments.
Resource Wastage: Each time the model is updated, the entire image must be rebuilt and redeployed, even if the application code hasn’t changed. This is inefficient and unnecessarily resource-intensive.
Possible Security Risks
Exposure of Sensitive Data: Embedding models directly in images increases the risk of accidental exposure, especially if the image is pushed to a public or insecure registry. Sensitive proprietary or licensed models may be leaked.
Limited Model Access Control: Using a centralized storage system for model loading allows granular access controls (e.g., authentication, encryption, audit trails). When packed into an image, these controls are harder to enforce.
Immutable Images: Good practices for container security recommend keeping images immutable and limiting their contents to the application code and necessary dependencies. Packing models violates this principle, making images larger and harder to secure.
Flexibility and Maintainability: By externalising models and loading them via a volume or centralized storage, we can independently manage, update, or replace them without impacting the application container. This approach also allows us to apply versioning, secure storage, and controlled access to the models.
Scalability: Externalizing models supports shared storage and caching, reducing duplication across containers and enabling efficient scaling without bloating the infrastructure.
For these reasons, we follow industry good practices by loading immutable models from a volume or secure storage. This approach ensures better security, resource efficiency, and operational flexibility.
Additional complexity: AI innovation
The AI world changes fast, and the chance that a new use case is solved with yet another model is high. Given this, it is not unusual for a service to use not one but, say, a handful of models.
If each of, for instance, five such models is 6 GB, which is rather small for most models, the result is a 30 GB Docker image, a size most infrastructure and cluster administrators would neither allow nor support.
Artefacts Cache
This good practice mostly affects AI/Python workloads. Since version 1.1.0, the Unique ai-service Helm chart therefore supports side-loading or pre-loading models/artefacts.
The image below depicts the methods outlined in this section at a high level; HF Hub stands in symbolically for any artefact origin.
Contract
The only contract between the service and the underlying volume is the specification of a path at which the models (henceforth: artefacts) are expected. The agreed path can be specified in the caches configuration.
How the artefacts get there is a separate discussion, see below.
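As an illustration of such a contract, a caches configuration in the Helm values might look like the following. This is a hypothetical sketch: the key names (caches, artefacts, path, claimName) are assumptions and may differ from the actual ai-service chart schema.

```yaml
# Hypothetical values sketch; actual key names depend on the chart version.
caches:
  artefacts:
    # Path inside the container where the service expects the artefacts.
    path: /models
    # Volume backing that path, e.g. a PersistentVolumeClaim shared across pods.
    volume:
      persistentVolumeClaim:
        claimName: model-cache
```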
Loading Artefacts
Unique services do not natively (in the application code) pack or download models. This is because Unique is also offered as a Self-Hosted solution in isolated environments that have no access to the internet.
Via initContainer
The initContainer, whether using the ArtefactCache or a custom implementation, downloads the model from a source of the client's choice to the agreed path. This is the cleanest method, but not always feasible (e.g. in isolated networks).
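A minimal sketch of such an initContainer is shown below. The image, download URL, and volume name are illustrative assumptions, not part of the chart; the only fixed element is that the artefact must land at the agreed path.

```yaml
# Hypothetical initContainer sketch; image, URL, and names are illustrative.
initContainers:
  - name: artefact-cache
    image: curlimages/curl:8.7.1
    command:
      - sh
      - -c
      # Download the artefact into the agreed cache path.
      - curl -fsSL -o /models/model1 https://example.com/artefacts/model1
    volumeMounts:
      - name: model-cache
        mountPath: /models   # must match the path in the caches configuration
```

Because the volume is shared with the main container, the service finds the artefact at the agreed path once it starts.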
Via kubectl cp
This option involves providing the models to the workloads through either automation or human interaction. Using a command like

kubectl cp /model/model1 my-namespace/my-pod:/models/model1

an operator can copy the necessary models into the deployed workloads.
If a Persistent Volume is used (a practice the Unique ai-service chart implements as well), this action only has to be performed once per model.
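A Persistent Volume Claim backing such a shared cache could look like the sketch below. This is an assumption for illustration; the claim name, access mode, and size depend on the cluster and the chart's actual volume configuration.

```yaml
# Hypothetical PVC sketch; name, access mode, and size are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache
spec:
  accessModes:
    - ReadWriteMany   # lets multiple pods share the same artefacts
  resources:
    requests:
      storage: 50Gi   # size it for the sum of all cached artefacts
```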
Via custom docker builds
Unique clearly discourages this practice for the reasons outlined above in the high-level rationale.
If both of the above options are genuinely infeasible, self-hosting clients can, as a last resort, turn to custom Docker builds. Unique can advise on this practice pro rata. A potential implementation looks as follows:
FROM uniquecr.azurecr.io/ingestor:any-version
COPY downloaded-model1 /models/model1
COPY downloaded-model2 /models/model2
...
This bloated image can then be pushed into the Self-Hosted registry.
Author: Solution Engineering
© 2024 Unique AG. All rights reserved.