This service is currently in beta. Please note that image content extraction is not always precise: the language model sometimes hallucinates values or only approximates them.

Analogous to /wiki/spaces/SD/pages/353140836, this custom single-page ingestion service uses the latest GA version (2024-11-30) of Microsoft’s Document Intelligence layout service. In addition, it detects and extracts searchable content from images and figures in PDF documents, e.g., charts, logos, or image-like tables. This additional step requires a language model with vision capabilities. By default, the service uses AZURE_GPT_4o_2024_0806.

Key capabilities:

Leading document ingestion service
Extracts tabular data
Parses multiple column layouts
Enhances search results for complex documents
Can be deployed in Switzerland
Detects and extracts content from figures:
- Charts and table-like images are transformed into a table, and a searchable description is added
- Logos are converted to the brand name or text they show
- For all other images, a searchable description is added

Enable for Scope in Knowledge Base

To use this custom PDF page processing for a specific Scope or Content in the Knowledge Base, adjust the ingestion config of that content. An example curl request:

curl --location --request POST 'https://gateway.<baseUrl>/ingestion/v1/folder/<scopeId>/properties' \
--header 'Authorization: Bearer <yourToken>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "properties": {
        "ingestionConfig": {
            "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
            "customApiOptions": [{
                "customsationType": "CUSTOM_SINGLE_PAGE_API",
                "apiIdentifier": "Unique Text and Image Extraction API",
                "apiPayload": "{\"languageModel\": \"AZURE_GPT_4o_2024_0806\",\"imagesInParallel\": 3}"
            }]
        }
    },
    "applyToSubScopes": true
}'

The apiPayload is optional. It contains the configuration for the service and must be passed as a JSON object serialized to a string. You can configure:

languageModel: the language model used for image extraction. It must support vision. Defaults to AZURE_GPT_4o_2024_0806.
imagesInParallel: how many of the images detected on a page are processed in parallel. Defaults to 3. Choose this value with performance and rate limits in mind.
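Because apiPayload is a JSON object serialized to a string inside a JSON body, the escaping is easy to get wrong by hand. The following is a minimal Python sketch that produces the same request body as the curl example above. The helper names build_api_payload and build_request_body are hypothetical and not part of the service, and the lower bound on imagesInParallel is an assumption. The sketch only builds the body and does not send it.

```python
import json


def build_api_payload(language_model="AZURE_GPT_4o_2024_0806", images_in_parallel=3):
    """Serialize the service options to the JSON string expected in apiPayload."""
    # Assumption: at least one image must be processed at a time.
    if images_in_parallel < 1:
        raise ValueError("imagesInParallel must be at least 1")
    return json.dumps(
        {"languageModel": language_model, "imagesInParallel": images_in_parallel}
    )


def build_request_body(api_payload=None, apply_to_sub_scopes=True):
    """Build the body of the folder properties request shown in the curl example."""
    option = {
        "customsationType": "CUSTOM_SINGLE_PAGE_API",
        "apiIdentifier": "Unique Text and Image Extraction API",
    }
    if api_payload is not None:  # apiPayload is optional
        option["apiPayload"] = api_payload
    return {
        "properties": {
            "ingestionConfig": {
                "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
                "customApiOptions": [option],
            }
        },
        "applyToSubScopes": apply_to_sub_scopes,
    }


body = build_request_body(build_api_payload(images_in_parallel=2))
print(json.dumps(body, indent=2))
```

Calling json.dumps on the whole body takes care of escaping the inner quotes of apiPayload, so the nested string never has to be written by hand.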

Enable for Upload in Chat

To use the custom PDF page processing in a specific space when uploading a document to the chat, change the ingestion config under Advanced Settings in the space management as follows:

{
  ...
  "ingestionConfig": {
      "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
      "customApiOptions": [{
          "customsationType": "CUSTOM_SINGLE_PAGE_API",
          "apiIdentifier": "Unique Text and Image Extraction API",
          "apiPayload": "{\"languageModel\": \"AZURE_GPT_4o_2024_0806\",\"imagesInParallel\": 3}"
      }]
  },
  ...
}
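The "..." entries in the config above stand for settings that must stay untouched. As a rough sketch of that merge, the Python snippet below adds the ingestionConfig entries to an existing settings object without dropping other keys. The function name with_custom_page_processing and the keys in the existing dictionary are hypothetical; the real Advanced Settings will contain different entries.

```python
import copy
import json


def with_custom_page_processing(settings, language_model="AZURE_GPT_4o_2024_0806",
                                images_in_parallel=3):
    """Return a copy of settings with the custom single page API enabled."""
    updated = copy.deepcopy(settings)  # leave the caller's dict untouched
    ingestion = updated.setdefault("ingestionConfig", {})
    ingestion["pdfReadMode"] = "CUSTOM_SINGLE_PAGE_API"
    ingestion["customApiOptions"] = [{
        "customsationType": "CUSTOM_SINGLE_PAGE_API",
        "apiIdentifier": "Unique Text and Image Extraction API",
        # apiPayload must be a JSON object serialized to a string
        "apiPayload": json.dumps(
            {"languageModel": language_model, "imagesInParallel": images_in_parallel}
        ),
    }]
    return updated


# Hypothetical existing settings, used only for illustration.
existing = {
    "someOtherSetting": True,
    "ingestionConfig": {"someIngestionOption": "keep-me"},
}
print(json.dumps(with_custom_page_processing(existing), indent=2))
```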

Limitations and Considerations

The MS Document Intelligence Service costs approx. 1 cent per page and has limited throughput. Unique may charge these costs additionally, as they are not covered by the Ada Tokens.
The MS Document Intelligence Service can be deployed in Switzerland.
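To get a feeling for the per-page cost before enabling the service on a large scope, a back-of-the-envelope estimate can help. The sketch below assumes that "approx. 1 cent per page" means 0.01 currency units per page. The function name estimate_mdi_cost is hypothetical, and the result covers only the Document Intelligence per-page figure, not any language model usage for image extraction.

```python
from decimal import Decimal

# Assumption: "approx. 1 cent per page" is taken as 0.01 currency units per page.
COST_PER_PAGE = Decimal("0.01")


def estimate_mdi_cost(page_counts):
    """Rough Document Intelligence cost for documents with the given page counts."""
    return COST_PER_PAGE * sum(page_counts)


# Three documents with 12, 250 and 38 pages: 300 pages in total.
print(estimate_mdi_cost([12, 250, 38]))  # prints 3.00
```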

Activation

Before MS Document Intelligence (MDI) can be used, the service must be deployed within the tenant. Which of the following processes applies depends on your deployment model.

	PaaS	Single Tenant	Customer Managed	On Premise
Config options	Only via API for a scope	Via API for a scope, or via environment variable through Customer Success	Customer manages it themselves	MDI is not available
Request	Already deployed	Via Customer Success, considering the impact described above	Customer deploys the service themselves	MDI is not available

Authentication Methods

MS Document Intelligence supports two authentication modes:

Key-based authentication: the key is read from environment variables (see code); used in development
Workload Identity: used in production
- Unique uses only Workload Identity across all its deployment models

Author	Martin Fadler

Ingestion Configuration: Image Content Extraction and MS Document Intelligence GA Version
