Analogous to /wiki/spaces/SD/pages/353140836, this custom single-page ingestion service enables the use of the generally available (GA) version 2023-07-31 of Microsoft's Document Intelligence layout service, formerly called Form Recognizer.

Key capabilities:

Ingestion Config

To use this custom PDF page processing, the ingestion config of the content must be adjusted. The ingestion config can be set either on the scope level or directly on the content. The following example curl request sets it on a scope:

curl --location --request POST 'https://gateway.<baseUrl>/ingestion/v1/folder/<scopeId>/properties' \
--header 'Authorization: Bearer <yourToken>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "properties": {
        "ingestionConfig": {
            "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
            "customApiIdentifier": "Unique Text Extraction API"
        }
    },
    "applyToSubScopes": true
}'
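For scripting, the same request can be wrapped in a small shell helper. This is a sketch under assumptions. The function names build_ingestion_config_body and set_ingestion_config are hypothetical. The endpoint and payload are copied from the example above, not from an API reference. The values are inserted without JSON escaping, so they must not contain double quotes or backslashes.

```shell
#!/usr/bin/env sh
# Hypothetical helpers (not part of the product): they reproduce the
# documented curl request so it can be reused from scripts.

# Print the ingestion config JSON body.
# Args: pdfReadMode, customApiIdentifier, applyToSubScopes (true/false).
# Assumption: values contain no characters that need JSON escaping.
build_ingestion_config_body() {
  read_mode="$1"
  api_identifier="$2"
  apply_to_sub_scopes="$3"
  printf '{"properties":{"ingestionConfig":{"pdfReadMode":"%s","customApiIdentifier":"%s"}},"applyToSubScopes":%s}\n' \
    "$read_mode" "$api_identifier" "$apply_to_sub_scopes"
}

# Send the body to the scope properties endpoint from the example above.
# Args: baseUrl, scopeId, token, optional applyToSubScopes (default true).
set_ingestion_config() {
  base_url="$1"
  scope_id="$2"
  token="$3"
  apply="${4:-true}"
  build_ingestion_config_body "CUSTOM_SINGLE_PAGE_API" "Unique Text Extraction API" "$apply" |
    curl --location --fail --silent --show-error \
      --request POST "https://gateway.${base_url}/ingestion/v1/folder/${scope_id}/properties" \
      --header "Authorization: Bearer ${token}" \
      --header 'Content-Type: application/json' \
      --data-binary @-
}
```

A call such as set_ingestion_config "<baseUrl>" "<scopeId>" "$TOKEN" sends the same request as the curl example. The body is passed to curl on stdin with --data-binary @-, so build_ingestion_config_body can be checked on its own without contacting the gateway.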

Limitations and Considerations

Activation

Before Microsoft Document Intelligence (MDI) can be used, the service must be deployed within the tenant. Depending on your deployment model, one of the following processes applies.


Deployment model | Config options | Request
PaaS | Only via API for a scope | Already deployed
Single Tenant | Via API for a scope, or via environment variable through Customer Success | Via Customer Success, considering the impact described above
Customer Managed | Customer must manage it themselves | Customer must deploy the service themselves
On Premise | MDI is not available | MDI is not available

Authentication Methods

MS Document Intelligence can run in two modes:


Author

Martin Fadler