PDF Ingestion Services in the Unique Platform
The Unique platform supports multiple services for ingesting PDF documents:
- Default Unique Ingestion Service
- Docling
- Microsoft Document Intelligence (referred to below as MDI)
- MDI with Image Content Extraction
Each service can parse structured PDFs with a single-column layout and extract simple tables. Their capabilities differ, however, for more complex documents and for deployment requirements:
- Image-based PDFs: Scanned or printed PDFs contain no structured content, so extracting their text requires OCR.
- Multi-Column Layout: PDFs that mix multiple columns, charts, tables, and text need pre-trained layout detection models to identify page elements and preserve the logical content flow.
- Complex Tables Detection: Extracting tables with merged cells, missing borders, or checkmarks requires specialized AI models that recognize the different table components.
- Image Content Extraction: Many PDFs contain unstructured visual elements such as charts, logos, or photos. AI models with image-to-text capabilities are needed to extract this content in a searchable form.
- On-Prem Deployment: The service can operate in a closed environment without internet access.
General recommendation:
- On-Prem Customers: Use Docling for PDF ingestion, as the Default Unique Ingestion Service lacks efficient support for multi-column layouts.
- Cloud Customers: Use MDI as the default ingestion service, as it provides higher accuracy than Docling, particularly for tables without grid lines.
If PDFs contain charts or table-like structures, MDI with Image Content Extraction is recommended for making all document content searchable and accessible to language models.
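The recommendation above can be read as a small decision rule. The following is a minimal sketch of that rule, not part of the Unique platform: `IngestionNeeds` and `recommend_service` are hypothetical names, and the sketch assumes that the on-prem requirement takes precedence over image content extraction, since Docling is the only service the text recommends for closed environments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestionNeeds:
    """Hypothetical description of a customer's PDF ingestion requirements."""
    on_prem: bool                           # closed environment without internet access
    has_charts_or_table_like_images: bool = False


def recommend_service(needs: IngestionNeeds) -> str:
    """Return the service name suggested by the general recommendation."""
    # Assumption: on-prem deployment overrides every other consideration.
    if needs.on_prem:
        return "Docling"
    # Cloud customers with charts or table-like structures.
    if needs.has_charts_or_table_like_images:
        return "MDI with Image Content Extraction"
    # Cloud default.
    return "MDI"


if __name__ == "__main__":
    print(recommend_service(IngestionNeeds(on_prem=False)))
```

The returned strings are the service names used on this page; how a service is actually selected in a deployment is not covered here.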
Legend: fully supported, partially supported, not supported

Ingestion service | Image-based PDFs | Multi-Column Layouts | Complex Tables Detection | Image Content Extraction | On-Prem Deployment | Performance | Additional costs
---|---|---|---|---|---|---|---
Default | | | | | | 10-15 s per page | None
Docling | | | | | | 10-20 s per page | Azure infrastructure costs
MDI | | | | | | 10-20 s per page | 1.6 cents per page
MDI with Image Content Extraction | | | | | | 20-30 s per page | 3 cents per page. Assumption:
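To make the performance and cost figures concrete, here is a rough sketch that scales the per-page values from the table to a whole document. The names `PER_PAGE` and `estimate` are hypothetical, and the sketch assumes pages are processed one after another and that cost grows linearly with page count. Docling's cost is left as `None` because the table lists Azure infrastructure costs rather than a per-page price.

```python
# Per-page figures copied from the table: (min seconds, max seconds) and cents per page.
PER_PAGE = {
    "Default": {"seconds": (10, 15), "cents": 0.0},
    "Docling": {"seconds": (10, 20), "cents": None},  # infrastructure cost, not per page
    "MDI": {"seconds": (10, 20), "cents": 1.6},
    "MDI with Image Content Extraction": {"seconds": (20, 30), "cents": 3.0},
}


def estimate(service: str, pages: int) -> dict:
    """Estimate the processing time range and additional cost for one document."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    row = PER_PAGE[service]
    min_s, max_s = row["seconds"]
    cents = row["cents"]
    return {
        "min_seconds": min_s * pages,  # assumes sequential page processing
        "max_seconds": max_s * pages,
        "cost_cents": None if cents is None else round(cents * pages, 2),
    }


if __name__ == "__main__":
    print(estimate("MDI", 100))
```

Under these assumptions, a 100-page PDF ingested with MDI takes between 1,000 and 2,000 seconds and adds 160 cents in cost; the same document with MDI with Image Content Extraction takes 2,000 to 3,000 seconds and adds 300 cents.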
© 2025 Unique AG. All rights reserved.