PDF Ingestion Services in Unique Platform

The Unique platform supports multiple services for ingesting PDF documents:

- Default Unique Ingestion Service
- Docling
- Microsoft Document Intelligence (referred to as MDI below)
- MDI with Image Content Extraction

Each service can parse structured PDFs with a single-column layout and extract simple tables. Their capabilities differ, however, for more complex documents and for deployment requirements:

- Image-based PDFs: Scanned or printed PDFs lack structured content, so extraction requires OCR.
- Multi-Column Layouts: PDFs that mix multiple columns, charts, tables, and text need pre-trained layout detection models to identify page elements and preserve the logical content flow.
- Complex Tables Detection: Extracting tables with merged cells, missing borders, or checkmarks requires specialized AI models that recognize the different table components.
- Image Content Extraction: Many PDFs contain unstructured visual elements such as charts, logos, or photos. AI models with image-to-text capabilities are needed to extract this content in a searchable form.
- On-Prem Deployment: The service can operate in a closed environment without internet access.

General recommendation:

- On-Prem Customers: Use Docling for PDF ingestion, as the Default Unique Ingestion Service lacks efficient support for multi-column layouts.
- Cloud Customers: Use MDI as the default ingestion service, as it provides higher accuracy than Docling, particularly for tables without grid lines.
  - If PDFs contain charts or table-like structures, MDI with Image Content Extraction is recommended to make all document content searchable and accessible to language models.
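
The recommendation above can be read as a small decision rule. The following is a minimal sketch of that rule, not part of the Unique platform: the names `IngestionService` and `recommend_service` are hypothetical, and the sketch assumes that the on-prem requirement takes precedence, since the recommendation names only Docling for on-prem customers.

```python
from enum import Enum


class IngestionService(Enum):
    # Hypothetical labels mirroring the services listed on this page.
    DEFAULT = "Default Unique Ingestion Service"
    DOCLING = "Docling"
    MDI = "Microsoft Document Intelligence"
    MDI_IMAGE = "MDI with Image Content Extraction"


def recommend_service(on_prem: bool, has_charts_or_table_like_structures: bool = False) -> IngestionService:
    """Return the service suggested by the general recommendation."""
    # Assumption: on-prem customers always get Docling, regardless of content.
    if on_prem:
        return IngestionService.DOCLING
    # Cloud customers: MDI by default, with image extraction for chart-heavy PDFs.
    if has_charts_or_table_like_structures:
        return IngestionService.MDI_IMAGE
    return IngestionService.MDI


if __name__ == "__main__":
    print(recommend_service(on_prem=False, has_charts_or_table_like_structures=True).value)
```

The Default Unique Ingestion Service is never returned here because the recommendation does not name it as the preferred choice for either customer group.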

- fully supported 🟡 - partially supported - not supported

Ingestion service	Capabilities					Performance	Additional costs
Ingestion service	Image-based PDFs	Multi-Column Layouts	Complex Tables Detection	Image Content Extraction	On-Prem deployment	Performance	Additional costs
Default						10-15s per page	None
Docling	🟡		🟡			10-20s per page	Azure infrastructure costs
MDI						10-20s per page	1.6 cents per page
MDI with Image Content Extraction						20-30s per page	3 cents per page (assumption: 1.6 cents for MDI plus 1.4 cents for 5k tokens of the GPT-4o vision model per image, assuming 1 image per page)
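
The cost assumption in the last row is simple arithmetic: 1.6 cents for MDI plus 1.4 cents per image gives 3 cents per page at one image per page. A minimal sketch for estimating costs at other image densities follows. The function name `estimate_cost_cents` and its parameters are hypothetical, and the per-unit prices are taken from the table rather than from a current price list.

```python
# Per-unit figures from the table above (assumptions, not a current price list).
MDI_CENTS_PER_PAGE = 1.6
VISION_CENTS_PER_IMAGE = 1.4  # roughly 5k GPT-4o tokens per image


def estimate_cost_cents(pages: int, images_per_page: float = 1.0,
                        image_extraction: bool = True) -> float:
    """Estimate the additional ingestion cost in cents for a document."""
    if pages < 0 or images_per_page < 0:
        raise ValueError("pages and images_per_page must be non-negative")
    # Every page is processed by MDI.
    cost = pages * MDI_CENTS_PER_PAGE
    # Image Content Extraction adds one vision-model call per image.
    if image_extraction:
        cost += pages * images_per_page * VISION_CENTS_PER_IMAGE
    return round(cost, 2)


if __name__ == "__main__":
    # 100 pages with one image each: 100 * (1.6 + 1.4) cents
    print(estimate_cost_cents(100))
```

Documents with fewer images than pages cost less than the 3 cents per page quoted in the table, and image-heavy documents cost more.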

Author	https://unique-ch.atlassian.net/people/team/3c7bf02a-21ab-4bbc-b79f-8aab67830365