Ingestion Configuration: Agentic Document Ingestion


This service is currently in beta. Please note that image content extraction is not always precise: the model sometimes hallucinates values or only approximates them.

Analogous to https://unique-ch.atlassian.net/wiki/spaces/SD/pages/353140836, this custom single-page ingestion service combines the latest GA version (2024-11-30) of Microsoft’s Document Intelligence layout service with the AZURE_GPT_4o_2024_0806 vision model to extract content from images and further optimize page content.

The service can run the extraction with three methods (MDI = Microsoft Document Intelligence):

  • MDI: Uses MDI to extract page content and optionally performs an optimization with the Vision model.

  • MDI + Vision: Uses MDI to extract page content and a Vision model to extract the content from each detected image in parallel.

  • Vision: Uses only the Vision model to extract page content.

Each extraction method can apply an additional Page Content Optimizer step that evaluates the extracted page content and further improves it using a Vision model.

[Figure: Agentic Document Ingestion Overview]

Key capabilities:

  • Leading document ingestion service

  • Extracts tabular data

  • Parses multi-column layouts

  • Enhances search results for complex documents

  • Can be deployed in Switzerland

  • Detects and extracts content from figures:

    • Charts and table-like images are transformed into a table and a searchable description is added

    • Logos are translated to the brand name / text

    • For other images a searchable description is added

  • Further optimizes extracted page content (optional)

Enable for Scope In Knowledge Base

To use this custom PDF page processing for a specific Scope or Content in the Knowledge Base, adjust the ingestion config of that content. The following curl requests are examples:

Single-Tenant

curl --location --request POST 'https://gateway.<baseUrl>/ingestion/v1/folder/<scopeId>/properties' \
  --header 'Authorization: Bearer <yourToken>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "properties": {
      "ingestionConfig": {
        "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
        "customApiOptions": [{
          "customisationType": "CUSTOM_SINGLE_PAGE_API",
          "apiIdentifier": "Unique Text and Image Extraction API",
          "apiPayload": "{}"
        }]
      }
    },
    "applyToSubScopes": true
  }'

By default, the MDI_VISION extraction method is used. See "Changing the extraction method with the apiPayload" below for how to change and further configure it.

Multi-Tenant

curl --location --request POST 'https://gateway.unique.app/ingestion-gen2/v1/folder/<scopeId>/properties' \
  --header 'Authorization: Bearer <yourToken>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "properties": {
      "ingestionConfig": {
        "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
        "customApiOptions": [{
          "customisationType": "CUSTOM_SINGLE_PAGE_API",
          "apiIdentifier": "Unique Text and Image Extraction API",
          "apiPayload": "{}"
        }]
      }
    },
    "applyToSubScopes": true
  }'

By default, the MDI_VISION extraction method is used. See "Changing the extraction method with the apiPayload" below for how to change and further configure it.
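The single-tenant and multi-tenant curl calls differ only in the gateway URL. As a minimal sketch, the same request could be assembled in Python with the standard library. The function name build_properties_request and its parameters are hypothetical and not part of any Unique SDK; the request is only constructed here, not sent, and the placeholders are kept exactly as in the curl examples.

```python
import json
import urllib.request

API_IDENTIFIER = "Unique Text and Image Extraction API"


def build_properties_request(
    folder_url: str, token: str, api_payload: str = "{}"
) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl examples.

    folder_url is the full .../folder/<scopeId>/properties URL.
    api_payload must be a JSON object serialized as a string.
    """
    body = {
        "properties": {
            "ingestionConfig": {
                "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
                "customApiOptions": [
                    {
                        "customisationType": "CUSTOM_SINGLE_PAGE_API",
                        "apiIdentifier": API_IDENTIFIER,
                        "apiPayload": api_payload,
                    }
                ],
            }
        },
        "applyToSubScopes": True,
    }
    return urllib.request.Request(
        folder_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values as in the multi-tenant curl example; replace before use.
request = build_properties_request(
    "https://gateway.unique.app/ingestion-gen2/v1/folder/<scopeId>/properties",
    "<yourToken>",
)
# Once real values are filled in, urllib.request.urlopen(request) would send it.
```

For a single-tenant deployment, only the first argument changes to the https://gateway.<baseUrl>/ingestion/v1/folder/<scopeId>/properties form.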

Enable for Upload in Chat

To use the custom PDF page processing in a specific space when uploading a document to the chat, change the ingestion config in the Advanced Settings of the space management as follows:

{
  ...
  "ingestionConfig": {
    "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
    "customApiOptions": [{
      "customisationType": "CUSTOM_SINGLE_PAGE_API",
      "apiIdentifier": "Unique Text and Image Extraction API",
      "apiPayload": "{}"
    }]
  }
  ...
}

By default, the MDI_VISION extraction method is used. See "Changing the extraction method with the apiPayload" below for how to change and further configure it.

Changing the extraction method with the apiPayload

The extraction method is configured through the optional apiPayload string parameter. By default, the MDI_VISION extraction method is used. To change the extractionMethod, set the payload to one of the following values:

  • "{ \"extractionMethod\": \"MDI\"}"

  • "{ \"extractionMethod\": \"MDI_VISION\"}"

  • "{ \"extractionMethod\": \"VISION\"}"

The page content optimization step is disabled by default. In order to enable it, adapt the apiPayload as follows:

  • "{\"pageContentOptimizerConfig\": { \"apply\": true }, \"extractionMethod\": \"MDI_VISION\"}"

Each extraction method has further configuration options. In every case, make sure to provide the JSON object as a string for the apiPayload.
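Because the apiPayload must be a JSON object serialized as a string, escaping the quotes by hand is easy to get wrong. The following is a minimal Python sketch that builds the string with json.dumps. It assumes only the keys shown above (extractionMethod and pageContentOptimizerConfig.apply); the helper name build_api_payload is hypothetical.

```python
import json

EXTRACTION_METHODS = ("MDI", "MDI_VISION", "VISION")


def build_api_payload(extraction_method: str = "MDI_VISION", optimize: bool = False) -> str:
    """Return the apiPayload as a JSON string (not as a nested object)."""
    if extraction_method not in EXTRACTION_METHODS:
        raise ValueError(f"unknown extraction method: {extraction_method!r}")
    payload = {"extractionMethod": extraction_method}
    if optimize:
        # The page content optimization step is disabled unless requested.
        payload["pageContentOptimizerConfig"] = {"apply": True}
    return json.dumps(payload)


api_payload = build_api_payload("MDI_VISION", optimize=True)

# Embedding the string in the surrounding option and serializing again yields
# the escaped form (\"extractionMethod\" ...) used in the list above.
custom_api_option = {
    "customisationType": "CUSTOM_SINGLE_PAGE_API",
    "apiIdentifier": "Unique Text and Image Extraction API",
    "apiPayload": api_payload,
}
print(json.dumps(custom_api_option, indent=2))
```

Serializing twice (once for the payload, once for the enclosing config) is what produces the backslash-escaped quotes, so the inner object never has to be escaped manually.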

Limitations and Considerations

  • MDI costs approximately 1 cent per page and has limited throughput. Unique may charge these costs additionally, as they are not covered by the Ada Tokens.

  • MDI can be deployed globally, including in Switzerland.

  • MDI_VISION and VISION incur additional token costs of roughly 5k to 10k tokens, depending on the number of images per page, the page content, and the number of page optimization iterations.
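As a rough worked example of these figures, the sketch below estimates the cost of ingesting a batch of pages. It assumes that the 5k to 10k token range applies per page, which the text above does not state explicitly, and that the MDI page cost applies only to the MDI and MDI_VISION methods. The function name estimate_ingestion_cost is hypothetical, and actual billing may differ.

```python
def estimate_ingestion_cost(pages: int, extraction_method: str = "MDI_VISION") -> dict:
    """Rough estimate from the approximate figures in this section."""
    uses_mdi = extraction_method in ("MDI", "MDI_VISION")
    uses_vision = extraction_method in ("MDI_VISION", "VISION")
    return {
        # approx. 1 cent per page processed by MDI
        "mdi_cost_cents": pages * 1 if uses_mdi else 0,
        # Assumption: 5k-10k tokens per page for MDI_VISION and VISION
        "min_tokens": pages * 5_000 if uses_vision else 0,
        "max_tokens": pages * 10_000 if uses_vision else 0,
    }


estimate = estimate_ingestion_cost(200)
print(estimate)
```

Under these assumptions, a 200-page document ingested with MDI_VISION comes to about 200 cents of MDI cost plus between 1,000,000 and 2,000,000 tokens.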

Activation

Before MDI can be used, the service must be deployed within a tenant. Depending on your deployment model, one of the following processes applies.

 

|                | PaaS | Single Tenant | Customer Managed | On Premise |
| Config options | only via API for a scope | via API for a scope or via environment variable via Customer Success | Customer must manage it themselves | MDI is not available |
| Request        | already deployed | via Customer Success considering the impact described above | Customer must deploy the service by themselves | |

Authentication Methods

MS Document Intelligence supports two authentication modes:

  • Key-based authentication: the key is taken from the environment variables (see code); used in development

  • Workload Identity: used in production


Author

@Martin Fadler

 


© 2025 Unique AG. All rights reserved.