This service is currently in beta. Please note that the image content extraction is not always precise: it sometimes hallucinates values or only approximates them.

Analogous to https://unique-ch.atlassian.net/wiki/spaces/SD/pages/353140836, the following custom single page ingestion service combines the latest GA version 2024-11-30 of Microsoft’s Document Intelligence layout service with the AZURE_GPT_4o_2024_0806 vision model to extract content from images and further optimize page content.

The service can run the extraction with three methods (MDI = Microsoft Document Intelligence):

MDI: Uses MDI to extract page content and optionally performs an optimization with the Vision model.
MDI + Vision: Uses MDI to extract page content and a Vision model to extract the content from each detected image in parallel.
Vision: Uses only the Vision model to extract page content.

Each extraction method can apply an additional Page Content Optimizer step that evaluates the extracted page content and further improves it using a Vision model.

Agentic Document Ingestion Overview

Key capabilities:

Leading document ingestion service
Extracts tabular data
Parses multiple column layouts
Enhances search results for complex documents
Can be deployed in Switzerland
Detects and extracts content from figures:
- Charts and table-like images are transformed into a table and a searchable description is added
- Logos are translated to the brand name / text
- For other images a searchable description is added
Further optimizes extracted page content (optional)

Enable for Scope In Knowledge Base

To use this custom PDF page processing for a specific Scope or Content in the Knowledge Base, the ingestion config of the content needs to be adjusted.

Via Ingestion Config UI

Click “Configure File Ingestion” for the scope of interest
Select “Custom Single Page API” for PDF ingestion
Enter “Unique Text and Image Extraction API” in API Identifier
Enter an API Payload if you intend to change the default configuration (see below)

Via API Call

The ingestionBaseUrl is different depending on where your Unique instance is hosted.

Multitenant (Europe): gateway.unique.app/ingestion-gen2
Multitenant (US): gateway.us.unique.app/ingestion
Single Tenant: <backendBaseUrl>/ingestion - the backendBaseUrl part depends on the tenant configuration (contact Unique if unknown)
Customer Managed Tenant: <backendBaseUrl>/ingestion - the backendBaseUrl part depends on your tenant configuration

curl --location --request POST 'https://<ingestionBaseUrl>/v1/folder/<scopeId>/properties' \
--header 'Authorization: Bearer <yourToken>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "properties": {
        "ingestionConfig": {
            "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
            "customApiOptions": [{
                "customisationType": "CUSTOM_SINGLE_PAGE_API",
                "apiIdentifier": "Unique Text and Image Extraction API",
                "apiPayload": "{}"
            }]
        }
    },
    "applyToSubScopes": true
}'
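The same request can also be prepared from a script. The following is a minimal Python sketch using only the standard library; the function name `build_properties_request`, its parameters, and the example values are illustrative assumptions and not part of any Unique SDK. It mirrors the curl call above and serializes the apiPayload object to a string.

```python
import json
import urllib.request


def build_properties_request(ingestion_base_url, scope_id, token, api_payload=None):
    """Build (but do not send) the POST request from the curl example.

    Hypothetical helper for illustration only.
    """
    body = {
        "properties": {
            "ingestionConfig": {
                "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
                "customApiOptions": [{
                    "customisationType": "CUSTOM_SINGLE_PAGE_API",
                    "apiIdentifier": "Unique Text and Image Extraction API",
                    # apiPayload is a JSON object serialized to a string
                    "apiPayload": json.dumps(api_payload or {}),
                }],
            }
        },
        "applyToSubScopes": True,
    }
    return urllib.request.Request(
        url=f"https://{ingestion_base_url}/v1/folder/{scope_id}/properties",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace them with your own base URL, scope ID and token.
request = build_properties_request(
    "gateway.unique.app/ingestion-gen2", "my-scope-id", "my-token"
)
# urllib.request.urlopen(request) would send it; omitted here to avoid side effects.
```

Keeping request construction separate from sending makes it easy to inspect the body before it is submitted.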

By default, the MDI_VISION extraction method is used. See below for details on how to change and further configure the extraction method.

Enable for Upload in Chat

To use the custom PDF page processing in a specific space when uploading a document to the chat, change the ingestion config in the Advanced Settings of the space management as follows:

{
  ...
  "ingestionConfig": {
      "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
      "customApiOptions": [{
          "customisationType": "CUSTOM_SINGLE_PAGE_API",
          "apiIdentifier": "Unique Text and Image Extraction API",
          "apiPayload": "{}"
      }]
  }
  ...
}
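If the space settings are maintained programmatically, the fragment above can be merged into an existing settings object. This is a small Python sketch under stated assumptions: the helper `with_custom_pdf_ingestion` and the key `someOtherSetting` are hypothetical, and the sketch replaces any existing `ingestionConfig` entirely rather than merging it field by field.

```python
import copy
import json


def with_custom_pdf_ingestion(space_settings, api_payload=None):
    """Return a copy of the settings with the custom PDF ingestion config set.

    Hypothetical helper; an existing "ingestionConfig" entry is overwritten.
    """
    updated = copy.deepcopy(space_settings)
    updated["ingestionConfig"] = {
        "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
        "customApiOptions": [{
            "customisationType": "CUSTOM_SINGLE_PAGE_API",
            "apiIdentifier": "Unique Text and Image Extraction API",
            # apiPayload is a JSON object serialized to a string
            "apiPayload": json.dumps(api_payload or {}),
        }],
    }
    return updated


# "someOtherSetting" stands in for whatever else the space settings contain.
existing = {"someOtherSetting": True}
print(json.dumps(with_custom_pdf_ingestion(existing), indent=2))
```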

By default, the MDI_VISION extraction method is used. See below for details on how to change and further configure the extraction method.

Changing the extraction method with the apiPayload

The optional apiPayload string parameter configures the extraction method. By default, the MDI_VISION extraction method is used. To change the extractionMethod, set the payload to one of the following values:

"{ \"extractionMethod\": \"MDI\"}"
"{ \"extractionMethod\": \"MDI_VISION\"}"
"{ \"extractionMethod\": \"VISION\"}"

The page content optimization step is disabled by default. In order to enable it, adapt the apiPayload as follows:

"{\"pageContentOptimizerConfig\": { \"apply\": true }, \"extractionMethod\": \"MDI_VISION\"}"
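Hand-escaping these strings is error-prone, so it can help to generate them. The following Python sketch builds the apiPayload string from the options described above; the helper name `make_api_payload` and its parameters are assumptions made for illustration, and only the keys `extractionMethod` and `pageContentOptimizerConfig.apply` come from this page.

```python
import json

# The three extraction methods listed on this page.
EXTRACTION_METHODS = ("MDI", "MDI_VISION", "VISION")


def make_api_payload(extraction_method="MDI_VISION", apply_optimizer=False):
    """Return the apiPayload as a JSON string (hypothetical helper)."""
    if extraction_method not in EXTRACTION_METHODS:
        raise ValueError(f"unknown extraction method: {extraction_method!r}")
    payload = {"extractionMethod": extraction_method}
    if apply_optimizer:
        payload["pageContentOptimizerConfig"] = {"apply": True}
    return json.dumps(payload)


print(make_api_payload("VISION"))
# {"extractionMethod": "VISION"}

# Embedding the string in the outer JSON body escapes the quotes automatically.
print(json.dumps({"apiPayload": make_api_payload("MDI")}))
# {"apiPayload": "{\"extractionMethod\": \"MDI\"}"}
```

Serializing twice (once for the payload, once for the surrounding request body) is what produces the escaped form shown in the examples above.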

Each extraction method has further configuration options. Make sure to provide the JSON object as a string for the apiPayload.

Limitations and Considerations

MDI costs approx. 1 cent per page and has limited throughput. Unique might charge these costs additionally, as they are not covered by the Ada Tokens.
MDI can be deployed globally, including in Switzerland.
MDI_VISION and VISION come with additional costs for token usage, which range between 5k and 10k tokens depending on the number of images per page, the page content, and the number of page optimization iterations.
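To get a feeling for the order of magnitude, the figures above can be turned into a rough estimate. The Python sketch below is illustrative only and rests on assumptions not stated on this page: that the 5k-10k token range applies per page, that VISION incurs no MDI cost because it does not call MDI, and that MDI without the optimizer uses no Vision tokens. The function name `estimate_costs` is hypothetical.

```python
def estimate_costs(pages, method="MDI_VISION"):
    """Rough per-document estimate based on the figures on this page.

    Assumptions (not confirmed by the source): token range is per page,
    VISION has no MDI cost, MDI alone uses no Vision tokens.
    """
    mdi_cents_per_page = 1            # "approx. 1 cent per page"
    tokens_per_page = (5_000, 10_000)  # "5k-10k tokens"

    uses_mdi = method in ("MDI", "MDI_VISION")
    uses_vision = method in ("MDI_VISION", "VISION")
    return {
        "mdi_cost_cents": pages * mdi_cents_per_page if uses_mdi else 0,
        "tokens_min": pages * tokens_per_page[0] if uses_vision else 0,
        "tokens_max": pages * tokens_per_page[1] if uses_vision else 0,
    }


print(estimate_costs(200, "MDI_VISION"))
# {'mdi_cost_cents': 200, 'tokens_min': 1000000, 'tokens_max': 2000000}
```

Under these assumptions, a 200-page PDF processed with MDI_VISION would cost roughly 200 cents for MDI plus between 1 and 2 million tokens.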

Activation

Before MDI can be used, the service must be deployed within a tenant. Depending on your deployment model, one of the following processes applies.

	PaaS	Single Tenant	Customer Managed	On Premise
Config options	only via API for a scope	via API for a scope or via environment variable via Customer Success	Customer must manage it themselves	MDI is not available
Request	already deployed	via Customer Success considering the impact described above	Customer must deploy the service by themselves	MDI is not available

Authentication Methods

MS Document Intelligence supports two authentication methods:

Key-based authentication: the key is read from the environment variables (see code); used in development
Workload Identity: used in production
- Unique uses only Workload Identity on all its deployment models

Author	@Martin Fadler