This service is currently in beta. Please note that image content extraction is not always precise: the vision model may hallucinate values or only approximate them.

Analogous to /wiki/spaces/SD/pages/353140836, the following custom single page ingestion service combines the latest GA version 2024-11-30 of Microsoft’s Document Intelligence layout service with the AZURE_GPT_4o_2024_0806 vision model to extract content from images and further optimize page content.

The service can run the extraction with three methods (MDI = Microsoft Document Intelligence):

Each extraction method can apply an additional Page Content Optimizer step that evaluates the extracted page content and further improves it using a vision model.


Key capabilities:

Enable for Scope In Knowledge Base

To use this custom PDF page processing for a specific Scope or Content in the Knowledge Base, adjust the ingestion config of the content. Example curl requests:

Single-Tenant

curl --location --request POST 'https://gateway.<baseUrl>/ingestion/v1/folder/<scopeId>/properties' \
--header 'Authorization: Bearer <yourToken>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "properties": {
        "ingestionConfig": {
            "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
            "customApiOptions": [{
                "customisationType": "CUSTOM_SINGLE_PAGE_API",
                "apiIdentifier": "Unique Text and Image Extraction API",
                "apiPayload": "{}"
            }]
        }
    },
    "applyToSubScopes": true
}'

By default, the MDI_VISION extraction method is used. See below for how to change and further configure the extraction method.

Multi-Tenant

curl --location --request POST 'https://gateway.unique.app/ingestion-gen2/v1/folder/<scopeId>/properties' \
--header 'Authorization: Bearer <yourToken>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "properties": {
        "ingestionConfig": {
            "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
            "customApiOptions": [{
                "customisationType": "CUSTOM_SINGLE_PAGE_API",
                "apiIdentifier": "Unique Text and Image Extraction API",
                "apiPayload": "{}"
            }]
        }
    },
    "applyToSubScopes": true
}'

By default, the MDI_VISION extraction method is used. See below for how to change and further configure the extraction method.
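
The single-tenant and multi-tenant curl requests differ only in the gateway host and the path prefix. As a minimal sketch, assuming Python's standard library, the same request could be assembled as follows. The helper name build_properties_request and the scope ID and token in the usage are placeholders for illustration; the request is built but not sent.

```python
import json
import urllib.request

API_IDENTIFIER = "Unique Text and Image Extraction API"


def build_properties_request(gateway_url, path_prefix, scope_id, token, api_payload="{}"):
    """Build (but do not send) the POST request that enables the custom page API.

    gateway_url and path_prefix follow the curl examples:
      single-tenant: "https://gateway.<baseUrl>" and "ingestion"
      multi-tenant:  "https://gateway.unique.app" and "ingestion-gen2"
    """
    body = {
        "properties": {
            "ingestionConfig": {
                "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
                "customApiOptions": [
                    {
                        "customisationType": "CUSTOM_SINGLE_PAGE_API",
                        "apiIdentifier": API_IDENTIFIER,
                        # apiPayload is a JSON object serialized as a string
                        "apiPayload": api_payload,
                    }
                ],
            }
        },
        "applyToSubScopes": True,
    }
    url = f"{gateway_url}/{path_prefix}/v1/folder/{scope_id}/properties"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; pass the result to urllib.request.urlopen to send it.
request = build_properties_request(
    "https://gateway.unique.app", "ingestion-gen2", "scope_123", "my-token"
)
```

Keeping the host and path prefix as parameters means the same helper covers both deployment variants without duplicating the request body.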

Enable for Upload in Chat

To use the custom PDF page processing in a specific space when uploading a document to the chat, change the ingestion config in the Advanced Settings of the space management as follows:

{
  ...
  "ingestionConfig": {
      "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
      "customApiOptions": [{
          "customisationType": "CUSTOM_SINGLE_PAGE_API",
          "apiIdentifier": "Unique Text and Image Extraction API",
          "apiPayload": "{}"
      }]
  }
  ...
}

By default, the MDI_VISION extraction method is used. See below for how to change and further configure the extraction method.
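
When the space settings already contain other entries, only the ingestionConfig block needs to change. A minimal sketch of such a merge in Python, assuming the settings are available as a dictionary; the helper name enable_custom_page_api and the key someOtherSetting are made up for illustration and stand in for the elided settings.

```python
import copy

# Settings block from the example above.
CUSTOM_PAGE_INGESTION_CONFIG = {
    "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
    "customApiOptions": [
        {
            "customisationType": "CUSTOM_SINGLE_PAGE_API",
            "apiIdentifier": "Unique Text and Image Extraction API",
            "apiPayload": "{}",
        }
    ],
}


def enable_custom_page_api(space_settings):
    """Return a copy of the space settings with the custom page API enabled.

    Keys outside ingestionConfig, and ingestionConfig keys not set here, are kept.
    """
    updated = copy.deepcopy(space_settings)
    ingestion_config = updated.setdefault("ingestionConfig", {})
    ingestion_config.update(copy.deepcopy(CUSTOM_PAGE_INGESTION_CONFIG))
    return updated


# "someOtherSetting" is a made-up key standing in for the elided settings.
current = {"someOtherSetting": True}
updated = enable_custom_page_api(current)
```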

Changing the extraction method with the apiPayload

The optional apiPayload string parameter configures the extraction method. By default, the MDI_VISION extraction method is used. To change the extractionMethod, set the payload to the corresponding values:

The page content optimization step is disabled by default. In order to enable it, adapt the apiPayload as follows:

Each extraction method has further configuration options (see below). Make sure to provide the JSON object as a string for the apiPayload:
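
Because apiPayload is a string field that itself holds JSON, the inner object is serialized once on its own and then again as part of the surrounding request body. A minimal sketch in Python, assuming that extractionMethod is the payload key and using the default value MDI_VISION; the helper name is hypothetical, and further option keys are left out because they are not listed here.

```python
import json


def make_custom_api_option(payload):
    """Return one customApiOptions entry with the payload encoded as a JSON string."""
    return {
        "customisationType": "CUSTOM_SINGLE_PAGE_API",
        "apiIdentifier": "Unique Text and Image Extraction API",
        # json.dumps turns the dict into the string form that apiPayload requires
        "apiPayload": json.dumps(payload),
    }


# Assumed payload key "extractionMethod" with the default method.
option = make_custom_api_option({"extractionMethod": "MDI_VISION"})
print(json.dumps(option, indent=2))
```

In the printed output the quotes inside apiPayload appear escaped, which is the expected result of embedding a JSON string inside a JSON document.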

Limitations and Considerations

Activation

Before MDI can be used, the service must be deployed within the tenant. Depending on your deployment model, one of the following processes applies.

 

| | PaaS | Single Tenant | Customer Managed | On Premise |
|---|---|---|---|---|
| Config options | only via API for a scope | via API for a scope or via environment variable via Customer Success | Customer must manage it themselves | MDI is not available |
| Request | already deployed | via Customer Success considering the impact described above | Customer must deploy the service by themselves | |

Authentication Methods

MS Document Intelligence can run in two modes:


Author

Martin Fadler