Making PDFs machine-processable remains challenging: the format comes in many variants and is optimized for printing, which strips out structure information and metadata. We evaluated a set of libraries as candidates for on-prem PDF ingestion. Of the solutions investigated, Docling performs best in most test scenarios and is therefore our recommended solution.
...