Note

This service is deprecated and will be removed in one of the next releases. Use our MS Document Intelligence default service (/wiki/spaces/SD/pages/353140836) instead, which now uses the latest GA version 2024-11-30.

Analogous to Ingestion Configuration: MS Document Intelligence (Layout) Ingestion (/wiki/spaces/SD/pages/353140836), the following custom single-page ingestion service enables the use of GA version 2023-07-31 of Microsoft’s Document Intelligence layout service, formerly called Form Recognizer.

Key capabilities:

  • Leading document ingestion service

  • Extracts tabular data

  • Parses multiple column layouts

  • Enhances search results for complex documents

  • Can be deployed in Switzerland

Enable for Scope In Knowledge Base

To use this custom PDF page processing for a specific Scope or Content in the Knowledge Base, adjust the ingestion config of that content. The following curl request is an example:

Code Block
curl --location --request POST 'https://gateway.<baseUrl>/ingestion/v1/folder/<scopeId>/properties' \
--header 'Authorization: Bearer <yourToken>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "properties": {
        "ingestionConfig": {
            "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
            "customApiOptions": [{
                "customisationType": "CUSTOM_SINGLE_PAGE_API",
                "apiIdentifier": "Unique Text Extraction API"
            }]
        }
    },
    "applyToSubScopes": true
}'
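
When the same configuration has to be applied to several scopes, the curl call above can be scripted. The following is a minimal Python sketch using only the standard library. The function name `build_ingestion_config_request` and its parameters are hypothetical; the URL, headers, and body simply mirror the curl example. The sketch only constructs the request and does not send it.

```python
import json
import urllib.request

# Value taken from the curl example above.
API_IDENTIFIER = "Unique Text Extraction API"


def build_ingestion_config_request(base_url, scope_id, token, apply_to_sub_scopes=True):
    """Build (but do not send) the POST request shown in the curl example.

    Hypothetical helper: the name and signature are illustrative only.
    """
    payload = {
        "properties": {
            "ingestionConfig": {
                "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
                "customApiOptions": [
                    {
                        "customisationType": "CUSTOM_SINGLE_PAGE_API",
                        "apiIdentifier": API_IDENTIFIER,
                    }
                ],
            }
        },
        "applyToSubScopes": apply_to_sub_scopes,
    }
    return urllib.request.Request(
        url=f"https://gateway.{base_url}/ingestion/v1/folder/{scope_id}/properties",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` once `<baseUrl>`, `<scopeId>`, and `<yourToken>` have been replaced with real values. Building the request separately from sending it makes the payload easy to inspect before anything is changed on the tenant.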

Enable for Upload in Chat

To use the custom PDF page processing in a specific space when uploading a document to the chat, change the ingestion config in the Advanced Settings of the space management as follows:

Code Block
{
  ...
  "ingestionConfig": {
      "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
      "customApiOptions": [{
          "customisationType": "CUSTOM_SINGLE_PAGE_API",
          "apiIdentifier": "Unique Text Extraction API"
      }]
  },
  ...
}
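
If the Advanced Settings JSON is edited programmatically, the fragment above has to be merged in without dropping the other settings that the `...` placeholders stand for. A minimal Python sketch of such a merge follows. The helper name `with_custom_page_ingestion` is hypothetical, and preserving any other keys already present under `ingestionConfig` is an assumption, not documented behaviour.

```python
import copy

# Values taken from the configuration fragment above.
CUSTOM_PAGE_INGESTION_CONFIG = {
    "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
    "customApiOptions": [
        {
            "customisationType": "CUSTOM_SINGLE_PAGE_API",
            "apiIdentifier": "Unique Text Extraction API",
        }
    ],
}


def with_custom_page_ingestion(settings):
    """Return a copy of `settings` with the ingestionConfig keys from above set.

    Hypothetical helper. Other top-level keys and any other ingestionConfig
    keys are kept as they are (an assumption); the input dict is not modified.
    """
    updated = copy.deepcopy(settings)
    ingestion = updated.setdefault("ingestionConfig", {})
    ingestion.update(copy.deepcopy(CUSTOM_PAGE_INGESTION_CONFIG))
    return updated
```

Working on a deep copy keeps the original settings available for comparison or rollback before the result is pasted back into the space management.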

Limitations and Considerations

  • The MS Document Intelligence Service costs approx. 1 cent per page and has limited throughput. Unique might charge these costs additionally, as they are not covered by the Ada Tokens.

  • The MS Document Intelligence Service can be deployed in Switzerland.
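
To get a feeling for the per-page cost mentioned above, a rough estimate can be calculated before ingesting a larger batch. The sketch below is illustrative only: the function name and the example page counts are hypothetical, the rate is the approximate figure quoted above, and the currency is not specified on this page.

```python
# Approximate figure quoted above; the currency is not specified on this page.
APPROX_COST_PER_PAGE_CENTS = 1


def estimate_ingestion_cost_cents(page_counts):
    """Sum the approximate cost, in cents, for documents with the given page counts."""
    return sum(page_counts) * APPROX_COST_PER_PAGE_CENTS


# Hypothetical batch: three PDFs with 12, 250 and 1000 pages.
print(estimate_ingestion_cost_cents([12, 250, 1000]))  # 1262
```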

Activation

Before MDI can be used, the service must be deployed within a tenant. Depending on your deployment model, one of the following processes applies.

 

| | PaaS | Single Tenant | Customer Managed | On Premise |
|---|---|---|---|---|
| Config options | only via API for a scope | via API for a scope or via environment variable via Customer Success | Customer must manage it themselves | MDI is not available |
| Request | already deployed | via Customer Success considering the impact described above | Customer must deploy the service by themselves | |

Authentication Methods

MS Document Intelligence can run in two modes:

...