Additional ingestion configuration options

This page describes further ingestion configuration options.

Ingestion configuration

The complete ingestionConfig is listed below. Where a field lists several values, the first one is the default. These values can be adjusted as described in the following sections and can be set at different levels (space, scope or content).

{
  ingestionConfig: {
    chunkMaxTokens: 600,
    chunkMaxTokensOnePager: 1000,
    chunkMinTokens: 3,
    chunkStrategy: 'UNIQUE_DEFAULT_CHUNKING' | 'CUSTOM_CHUNKING_API',
    pdfReadMode: 'PDFTODOCX_ONLY' | 'DOC_INTELLIGENCE_DEFAULT' | 'DOC_INTELLIGENCE_ON_TABLE' | 'DOC_INTELLIGENCE_FALLBACK' | 'DOC_INTELLIGENCE_DISABLED' | 'CUSTOM_SINGLE_PAGE_API',
    jpgReadMode: 'NO_INGESTION' | 'DOC_INTELLIGENCE_DEFAULT',
    wordReadMode: 'MAMMOTH_ONLY' | 'DOC_INTELLIGENCE_DEFAULT',
    uniqueIngestionMode: 'INGESTION' | 'SKIP_INGESTION' | 'EXTERNAL_INGESTION',
    documentMinTokens: 25,
    customApiOptions: [] | Array<{
      customisationType: 'CUSTOM_SINGLE_PAGE_API' | 'CUSTOM_CHUNKING_API',
      apiIdentifier: 'YOUR IDENTIFIER',
      apiPayload?: '{"xxx": "yyyy"}'
    }>
  }
}
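For TypeScript integrations, the listing above can be expressed as a set of types plus a default object. This is a sketch derived only from the listing: the names `IngestionConfig`, `CustomApiOption` and `DEFAULT_INGESTION_CONFIG` are hypothetical and not part of a published Unique SDK, and `'YOUR IDENTIFIER'` stays a placeholder.

```typescript
// Hypothetical typings derived from the ingestionConfig listing above.
type ChunkStrategy = 'UNIQUE_DEFAULT_CHUNKING' | 'CUSTOM_CHUNKING_API';
type PdfReadMode =
  | 'PDFTODOCX_ONLY'
  | 'DOC_INTELLIGENCE_DEFAULT'
  | 'DOC_INTELLIGENCE_ON_TABLE'
  | 'DOC_INTELLIGENCE_FALLBACK'
  | 'DOC_INTELLIGENCE_DISABLED'
  | 'CUSTOM_SINGLE_PAGE_API';
type JpgReadMode = 'NO_INGESTION' | 'DOC_INTELLIGENCE_DEFAULT';
type WordReadMode = 'MAMMOTH_ONLY' | 'DOC_INTELLIGENCE_DEFAULT';
type UniqueIngestionMode = 'INGESTION' | 'SKIP_INGESTION' | 'EXTERNAL_INGESTION';

interface CustomApiOption {
  customisationType: 'CUSTOM_SINGLE_PAGE_API' | 'CUSTOM_CHUNKING_API';
  apiIdentifier: string;
  apiPayload?: string; // JSON-encoded string, e.g. '{"xxx": "yyyy"}'
}

interface IngestionConfig {
  chunkMaxTokens: number;
  chunkMaxTokensOnePager: number;
  chunkMinTokens: number;
  chunkStrategy: ChunkStrategy;
  pdfReadMode: PdfReadMode;
  jpgReadMode: JpgReadMode;
  wordReadMode: WordReadMode;
  uniqueIngestionMode: UniqueIngestionMode;
  documentMinTokens: number;
  customApiOptions: CustomApiOption[];
}

// Defaults: the first value of each field in the listing.
const DEFAULT_INGESTION_CONFIG: IngestionConfig = {
  chunkMaxTokens: 600,
  chunkMaxTokensOnePager: 1000,
  chunkMinTokens: 3,
  chunkStrategy: 'UNIQUE_DEFAULT_CHUNKING',
  pdfReadMode: 'PDFTODOCX_ONLY',
  jpgReadMode: 'NO_INGESTION',
  wordReadMode: 'MAMMOTH_ONLY',
  uniqueIngestionMode: 'INGESTION',
  documentMinTokens: 25,
  customApiOptions: [],
};

// Example override: hand chunking to a custom API (identifier is a placeholder).
const customChunkingConfig: IngestionConfig = {
  ...DEFAULT_INGESTION_CONFIG,
  chunkStrategy: 'CUSTOM_CHUNKING_API',
  customApiOptions: [
    {
      customisationType: 'CUSTOM_CHUNKING_API',
      apiIdentifier: 'YOUR IDENTIFIER',
      apiPayload: JSON.stringify({ xxx: 'yyyy' }), // payload is passed as a string
    },
  ],
};

console.log(JSON.stringify({ ingestionConfig: customChunkingConfig }, null, 2));
```

Building `apiPayload` with `JSON.stringify` keeps the payload valid JSON instead of relying on a hand-written string.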

 

The ingestion configuration can be set at different levels: on a space, on a scope, or on an individual content.
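This page does not state how values set at several levels are combined. A minimal sketch, assuming that each level stores only the fields it overrides and that the most specific level wins (content over scope over space), with the defaults as the final fallback. The precedence order and the `resolveIngestionConfig` helper are assumptions for illustration, not documented Unique behaviour.

```typescript
// Hypothetical resolution of ingestionConfig values set at several levels.
// ASSUMPTION: content overrides scope, scope overrides space, space overrides the defaults.
type UniqueIngestionMode = 'INGESTION' | 'SKIP_INGESTION' | 'EXTERNAL_INGESTION';

interface ResolvedIngestionConfig {
  chunkMaxTokens: number;
  chunkMaxTokensOnePager: number;
  chunkMinTokens: number;
  documentMinTokens: number;
  uniqueIngestionMode: UniqueIngestionMode;
}

// A level only stores the fields it wants to override.
type LevelConfig = Partial<ResolvedIngestionConfig>;

const DEFAULTS: ResolvedIngestionConfig = {
  chunkMaxTokens: 600,
  chunkMaxTokensOnePager: 1000,
  chunkMinTokens: 3,
  documentMinTokens: 25,
  uniqueIngestionMode: 'INGESTION',
};

function resolveIngestionConfig(
  space: LevelConfig = {},
  scope: LevelConfig = {},
  content: LevelConfig = {},
): ResolvedIngestionConfig {
  // `??` takes the first value that is not null/undefined, checking the most specific level first.
  return {
    chunkMaxTokens:
      content.chunkMaxTokens ?? scope.chunkMaxTokens ?? space.chunkMaxTokens ?? DEFAULTS.chunkMaxTokens,
    chunkMaxTokensOnePager:
      content.chunkMaxTokensOnePager ??
      scope.chunkMaxTokensOnePager ??
      space.chunkMaxTokensOnePager ??
      DEFAULTS.chunkMaxTokensOnePager,
    chunkMinTokens:
      content.chunkMinTokens ?? scope.chunkMinTokens ?? space.chunkMinTokens ?? DEFAULTS.chunkMinTokens,
    documentMinTokens:
      content.documentMinTokens ?? scope.documentMinTokens ?? space.documentMinTokens ?? DEFAULTS.documentMinTokens,
    uniqueIngestionMode:
      content.uniqueIngestionMode ??
      scope.uniqueIngestionMode ??
      space.uniqueIngestionMode ??
      DEFAULTS.uniqueIngestionMode,
  };
}

const resolved = resolveIngestionConfig(
  { chunkMaxTokens: 800 },                   // set on the space
  { documentMinTokens: 50 },                 // set on the scope
  { uniqueIngestionMode: 'SKIP_INGESTION' }, // set on the content
);
console.log(resolved);
```

Because `??` only skips `null` and `undefined`, an explicit `0` on a more specific level still overrides the value from a more general level.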

Setting the general Unique FinanceGPT ingestion mode

The uniqueIngestionMode parameter defines the overall behaviour of the ingestion. There are three possible values:

  1. INGESTION (default)

  2. SKIP_INGESTION

  3. EXTERNAL_INGESTION

Mode INGESTION

This is the default mode. Unique FinanceGPT queues the content, and Unique then ingests it. Some customisations are still possible during this ingestion flow, but Unique generally controls the ingestion process.

Mode SKIP_INGESTION

This mode sets the status of uploaded content directly to FINISHED, meaning that Unique FinanceGPT assumes the content needs no ingestion. A typical use case is uploading images or charts that are to be referenced in a chat message.

Mode EXTERNAL_INGESTION

The external ingestion mode tells Unique FinanceGPT that the entire ingestion process for this content is handled by an SDK integration. Unique therefore skips its own ingestion process but still sets the status of the content to QUEUED, on the assumption that the SDK will pick up this message, ingest the content, add the chunks, and update the status of the content once it has finished.
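The three modes differ in the status content receives on upload and in who is expected to produce its chunks. A minimal sketch of that behaviour, using only the statuses named above (QUEUED, FINISHED). The `onContentUploaded` and `completeExternalIngestion` functions and the `ContentState` shape are hypothetical; reporting the queued default mode as the status QUEUED is an assumption, and the real SDK calls for adding chunks and updating status are not shown here.

```typescript
// Hypothetical model of how uniqueIngestionMode affects newly uploaded content.
type UniqueIngestionMode = 'INGESTION' | 'SKIP_INGESTION' | 'EXTERNAL_INGESTION';
type ContentStatus = 'QUEUED' | 'FINISHED';

interface ContentState {
  status: ContentStatus;
  ingestedBy: 'UNIQUE' | 'SDK' | 'NONE'; // who is expected to produce the chunks
  chunks: string[];
}

function onContentUploaded(mode: UniqueIngestionMode): ContentState {
  switch (mode) {
    case 'INGESTION':
      // Default: the content is queued and Unique runs its own ingestion.
      return { status: 'QUEUED', ingestedBy: 'UNIQUE', chunks: [] };
    case 'SKIP_INGESTION':
      // No ingestion needed (e.g. images/charts referenced in chat): FINISHED immediately.
      return { status: 'FINISHED', ingestedBy: 'NONE', chunks: [] };
    case 'EXTERNAL_INGESTION':
      // Unique skips ingestion but marks the content QUEUED for an SDK integration.
      return { status: 'QUEUED', ingestedBy: 'SDK', chunks: [] };
    default: {
      const unreachable: never = mode; // compile-time exhaustiveness check
      throw new Error(`Unknown ingestion mode: ${String(unreachable)}`);
    }
  }
}

// Hypothetical SDK-side step for EXTERNAL_INGESTION: add chunks, then mark FINISHED.
function completeExternalIngestion(state: ContentState, chunks: string[]): ContentState {
  if (state.ingestedBy !== 'SDK' || state.status !== 'QUEUED') {
    throw new Error('Content is not waiting for external ingestion');
  }
  return { ...state, chunks: [...chunks], status: 'FINISHED' };
}

const external = onContentUploaded('EXTERNAL_INGESTION');
const done = completeExternalIngestion(external, ['chunk 1', 'chunk 2']);
console.log(external.status, '->', done.status); // QUEUED -> FINISHED
```

The guard in `completeExternalIngestion` reflects the division of labour described above: only content that Unique has left QUEUED for an SDK should be completed externally.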

 

Additional Parameters

The following ingestion parameters can also be set on the ingestion config of a space, scope or content.

  • chunkMaxTokens (number)
    Defines the maximum number of tokens a normal chunk is allowed to have.

  • chunkMaxTokensOnePager (number)
    Defines the maximum number of tokens for content that consists of a single page. The higher limit avoids splitting a one-page document into two chunks, at least one of which might be too small to carry any contextual meaning.

  • chunkMinTokens (number)
    Defines the minimum number of tokens a chunk must have.

  • documentMinTokens (number)
    Defines the minimum number of tokens a content must have. If a document falls below this threshold, this can indicate to its owner that either the document is not meaningful or the ingestion process could not parse it correctly (for example, a scanned document that contains only images and no text).
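These parameters describe limits, but this page does not specify the algorithm that applies them. A minimal sketch of one plausible interpretation, assuming whitespace-separated words as a stand-in for model tokens and a naive greedy splitter as a stand-in for Unique's chunking; all function names are hypothetical.

```typescript
// Hypothetical illustration of chunkMaxTokens, chunkMaxTokensOnePager,
// chunkMinTokens and documentMinTokens. Not Unique's actual ingestion logic.
interface TokenLimits {
  chunkMaxTokens: number;
  chunkMaxTokensOnePager: number;
  chunkMinTokens: number;
  documentMinTokens: number;
}

const DEFAULT_LIMITS: TokenLimits = {
  chunkMaxTokens: 600,
  chunkMaxTokensOnePager: 1000,
  chunkMinTokens: 3,
  documentMinTokens: 25,
};

// ASSUMPTION: whitespace-separated words stand in for model tokens.
function tokenize(text: string): string[] {
  const trimmed = text.trim();
  return trimmed === '' ? [] : trimmed.split(/\s+/);
}

// Content with a single page may use the larger one-pager limit.
function maxTokensFor(pageCount: number, limits: TokenLimits): number {
  return pageCount === 1 ? limits.chunkMaxTokensOnePager : limits.chunkMaxTokens;
}

// Naive greedy splitter (stand-in for real chunking): at most maxTokens words per chunk.
function splitIntoChunks(text: string, maxTokens: number): string[] {
  const words = tokenize(text);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxTokens) {
    chunks.push(words.slice(i, i + maxTokens).join(' '));
  }
  return chunks;
}

function isChunkTooSmall(chunk: string, limits: TokenLimits): boolean {
  return tokenize(chunk).length < limits.chunkMinTokens;
}

// Below documentMinTokens: possibly not meaningful, or not parsed correctly (e.g. image-only scans).
function isDocumentTooSmall(fullText: string, limits: TokenLimits): boolean {
  return tokenize(fullText).length < limits.documentMinTokens;
}

// A one-page text of 700 words.
let onePage = '';
for (let i = 0; i < 700; i++) onePage += `word${i} `;

console.log(splitIntoChunks(onePage, DEFAULT_LIMITS.chunkMaxTokens).length);   // 2
console.log(splitIntoChunks(onePage, maxTokensFor(1, DEFAULT_LIMITS)).length); // 1
console.log(isChunkTooSmall('ok', DEFAULT_LIMITS));                            // true
console.log(isDocumentTooSmall('', DEFAULT_LIMITS));                           // true
```

With the normal limit the 700-word page splits into a 600-word chunk and a 100-word remainder; the one-pager limit keeps it as a single chunk, which is the situation chunkMaxTokensOnePager is meant to address.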


 

Author

@Adrian Gugger

 

Related content

  • 3rd party APIs for customisation of ingestion
  • Ingestion Configuration: MS Document Intelligence (Layout) Ingestion
  • Ingestion API
  • Ingestion Configuration: MS Document Intelligence GA Version (deprecated)
  • Ingestion Events
  • Ingestion Configuration: Image Content Extraction and MS Document Intelligence GA Version

© 2025 Unique AG. All rights reserved.