This service is currently in beta. Please note that the image content extraction is not always precise: it sometimes hallucinates values or only approximates them.
Analogous to /wiki/spaces/SD/pages/353140836, the following custom single page ingestion service combines the latest GA version (2024-11-30) of Microsoft’s Document Intelligence layout service with the AZURE_GPT_4o_2024_0806 vision model to extract content from images and further optimize page content.
The service can run the extraction with three methods (MDI = Microsoft Document Intelligence):
MDI: Uses MDI to extract page content and optionally performs an optimization with the Vision model.
MDI + Vision: Uses MDI to extract page content and a Vision model to extract the content from each detected image in parallel.
Vision: Uses only the Vision model to extract page content.
Each extraction method can apply an additional Page Content Optimizer step that evaluates the extracted page content and further improves it using a Vision model.
Key capabilities:
Leading document ingestion service
Extracts tabular data
Parses multiple column layouts
Enhances search results for complex documents
Can be deployed in Switzerland
Detects and extracts content from figures:
Charts and table-like images are transformed into a table and a searchable description is added
Logos are translated to the brand name / text
For other images a searchable description is added
Further optimizes extracted page content (optional)
To use this custom PDF page processing for a specific Scope or Content in the Knowledge Base, the ingestion config of the content needs to be adjusted. This is an example curl for that:
Single-Tenant
curl --location --request POST 'https://gateway.<baseUrl>/ingestion/v1/folder/<scopeId>/properties' \
  --header 'Authorization: Bearer <yourToken>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "properties": {
      "ingestionConfig": {
        "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
        "customApiOptions": [{
          "customisationType": "CUSTOM_SINGLE_PAGE_API",
          "apiIdentifier": "Unique Text and Image Extraction API",
          "apiPayload": "{}"
        }]
      }
    },
    "applyToSubScopes": true
  }'
By default, the MDI_VISION extraction method is used. See the details below on how to change and further configure the extraction method.
Multi-Tenant
curl --location --request POST 'https://gateway.unique.app/ingestion-gen2/v1/folder/<scopeId>/properties' \
  --header 'Authorization: Bearer <yourToken>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "properties": {
      "ingestionConfig": {
        "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
        "customApiOptions": [{
          "customisationType": "CUSTOM_SINGLE_PAGE_API",
          "apiIdentifier": "Unique Text and Image Extraction API",
          "apiPayload": "{}"
        }]
      }
    },
    "applyToSubScopes": true
  }'
By default, the MDI_VISION extraction method is used. See the details below on how to change and further configure the extraction method.
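The two curl requests differ only in the gateway URL, so the same call can be sketched in Python. This is a minimal illustration, not an official client: the helper name `build_properties_request` and its parameters are hypothetical, while the URLs, headers, and body fields are copied from the curl examples. The request is only built, not sent.

```python
import json
import urllib.request

API_IDENTIFIER = "Unique Text and Image Extraction API"


def build_properties_request(scope_id, token, base_url=None, api_payload="{}"):
    """Build (but do not send) the POST request that sets the ingestion config.

    Hypothetical helper: base_url=None targets the multi-tenant gateway,
    otherwise the single-tenant gateway at gateway.<base_url> is used.
    """
    if base_url is None:
        url = f"https://gateway.unique.app/ingestion-gen2/v1/folder/{scope_id}/properties"
    else:
        url = f"https://gateway.{base_url}/ingestion/v1/folder/{scope_id}/properties"

    body = {
        "properties": {
            "ingestionConfig": {
                "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
                "customApiOptions": [
                    {
                        "customisationType": "CUSTOM_SINGLE_PAGE_API",
                        "apiIdentifier": API_IDENTIFIER,
                        # apiPayload is a JSON object serialized to a string
                        "apiPayload": api_payload,
                    }
                ],
            }
        },
        "applyToSubScopes": True,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_properties_request("my-scope-id", "my-token")
    print(req.get_method(), req.full_url)
    # Sending is left to the caller, e.g. urllib.request.urlopen(req)
```

Keeping the build step separate from the send step makes it easy to inspect the URL and body before anything is changed on a scope.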
To use the custom PDF page processing in a specific space when uploading a document to the chat, the ingestion config in the Advanced Settings of the space management must be changed as follows:
{
  ...
  "ingestionConfig": {
    "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
    "customApiOptions": [{
      "customisationType": "CUSTOM_SINGLE_PAGE_API",
      "apiIdentifier": "Unique Text and Image Extraction API",
      "apiPayload": "{}"
    }]
  }
  ...
}
By default, the MDI_VISION extraction method is used. See the details below on how to change and further configure the extraction method.
Through the optional apiPayload string parameter, the different extraction methods can be configured. By default, the MDI_VISION extraction method is used. To change the extractionMethod, set the payload to one of the following values:
"{ \"extractionMethod\": \"MDI\"}"
"{ \"extractionMethod\": \"MDI_VISION\"}"
"{ \"extractionMethod\": \"VISION\"}"
The page content optimization step is disabled by default. To enable it, adapt the apiPayload as follows:
"{\"pageContentOptimizerConfig\": { \"apply\": true }, \"extractionMethod\": \"MDI_VISION\"}"
Each extraction method has further configuration options; see below. Make sure to provide the JSON object as a string for the apiPayload.
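Because apiPayload must be a JSON object serialized to a string, hand-escaping the quotes is error-prone. The sketch below builds the string with `json.dumps`. The function name `build_api_payload` and its parameters are hypothetical; only the `extractionMethod` values and the `pageContentOptimizerConfig.apply` flag are taken from the payloads shown above.

```python
import json

# The three extraction methods documented above.
EXTRACTION_METHODS = ("MDI", "MDI_VISION", "VISION")


def build_api_payload(extraction_method="MDI_VISION", optimize_page_content=False):
    """Return the apiPayload value: a JSON object serialized to a string."""
    if extraction_method not in EXTRACTION_METHODS:
        raise ValueError(f"unknown extraction method: {extraction_method!r}")
    payload = {"extractionMethod": extraction_method}
    if optimize_page_content:
        # The optimizer is off by default, so the key is only added on request.
        payload["pageContentOptimizerConfig"] = {"apply": True}
    return json.dumps(payload)


if __name__ == "__main__":
    api_payload = build_api_payload("MDI_VISION", optimize_page_content=True)
    # Embedding the string in the option and serializing again produces
    # the escaped form (\" ... \") expected in the ingestion config.
    option = {
        "customisationType": "CUSTOM_SINGLE_PAGE_API",
        "apiIdentifier": "Unique Text and Image Extraction API",
        "apiPayload": api_payload,
    }
    print(json.dumps(option, indent=2))
```

The double serialization is the point of the example: the inner `json.dumps` produces the payload string, and serializing the surrounding config escapes it automatically.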
MDI costs approx. 1 cent per page and has limited throughput. Unique might charge these costs additionally, as they are not covered by the Ada Tokens.
MDI can be deployed globally, including in Switzerland.
MDI_VISION and VISION come with additional costs for token usage, which range between 5k and 10k tokens depending on the number of images per page, the page content, and the number of page optimization iterations.
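For a rough feeling of what these figures mean for a whole document, the arithmetic can be sketched as follows. This is an illustrative estimate only: it assumes that the 5k to 10k token range applies per page, that MDI costs exactly 1 cent per page, and that the MDI method without the optimizer uses no tokens. The function name `estimate_usage` is hypothetical.

```python
# Assumptions (not confirmed by the service): figures are per page.
MDI_CENTS_PER_PAGE = 1
TOKENS_PER_PAGE_MIN = 5_000
TOKENS_PER_PAGE_MAX = 10_000


def estimate_usage(pages, extraction_method="MDI_VISION"):
    """Estimate MDI cost (in cents) and the token range for a document."""
    uses_mdi = extraction_method in ("MDI", "MDI_VISION")
    uses_vision = extraction_method in ("MDI_VISION", "VISION")
    return {
        "mdi_cents": pages * MDI_CENTS_PER_PAGE if uses_mdi else 0,
        "tokens_min": pages * TOKENS_PER_PAGE_MIN if uses_vision else 0,
        "tokens_max": pages * TOKENS_PER_PAGE_MAX if uses_vision else 0,
    }


if __name__ == "__main__":
    # A 200-page PDF with the default method: about 200 cents of MDI usage
    # plus between 1,000,000 and 2,000,000 tokens under the assumptions above.
    print(estimate_usage(200))
```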
Before MDI can be used, the service must be deployed within a tenant. Depending on your deployment model, one of the following processes applies.
| | PaaS | Single Tenant | Customer Managed | On Premise |
|---|---|---|---|---|
| Config options | only via API for a scope | via API for a scope or via environment variable via Customer Success | Customer must manage it themselves | MDI is not available |
| Request | already deployed | via Customer Success considering the impact described above | Customer must deploy the service by themselves | MDI is not available |
MS Document Intelligence supports two authentication modes:
Key-based authentication (the key is taken from the env variables (see code); used in dev)
Workload Identity (used in production)
Unique uses only Workload Identity on all its deployment models.