1 Functionality
2 Input
3 Output
4 Configuration:
5 Migration to V2
- 5.1 What is new?
- 5.2 How to migrate the old configuration?

Functionality

The "Document Summarizer" module produces summaries of entire documents. The user specifies a document, and the module generates the key takeaways and a full summary, regardless of the document's length and in the summarization style the user requests.

The AI assistant (module) performs the following tasks:

Decides whether the user query is a summary task or a follow-up question:
- Depending on the user query, the assistant runs the summary, answers a follow-up question, or asks the user to clarify the intention, for instance when the user did not reference any document to summarize.
Extracts the file reference and the summary task description from the user query:
- The file reference can be a specific document, a general topic, or a vague reference such as “this” / “it” to a document uploaded to the chat window.
Finds the relevant file to summarize:
- When the search returns several candidate files or none, the user is asked to select the correct file from a list or to rephrase the query.
- Otherwise, the file is summarized.
Summarizes the file recursively:
- The found content is summarized in steps, with each step merging as many tokens into the context window as the maxTokensPerSummaryStep parameter allows. The partial summaries are then combined into a full summary.
Streams the summary back to the user.
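
The recursive summarization step can be pictured with the following minimal sketch. It is not the module's actual implementation: the function names `count_tokens`, `batch_by_budget` and `summarize_recursively` are hypothetical, tokens are approximated by whitespace-separated words, and the `summarize` callable stands in for the language model request.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real model tokens.
    return len(text.split())


def batch_by_budget(texts: List[str], max_tokens: int) -> List[List[str]]:
    """Greedily group consecutive texts so that each group fits the budget."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for text in texts:
        size = count_tokens(text)
        # Start a new batch when adding this text would exceed the budget.
        if current and used + size > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += size
    if current:
        batches.append(current)
    return batches


def summarize_recursively(
    chunks: List[str],
    max_tokens: int,
    summarize: Callable[[str], str],
) -> str:
    """Summarize batches of chunks, then summarize the summaries."""
    if not chunks:
        return ""
    batches = batch_by_budget(chunks, max_tokens)
    summaries = [summarize("\n\n".join(batch)) for batch in batches]
    if len(summaries) == 1:
        return summaries[0]
    if len(summaries) >= len(chunks):
        # No progress was made (the summaries are too long to be merged
        # within the budget), so do one final merge instead of looping.
        return summarize("\n\n".join(summaries))
    return summarize_recursively(summaries, max_tokens, summarize)
```

With ten chunks of 1000 tokens each and a budget of 5000, `batch_by_budget` yields two batches of five chunks, so the first pass issues two summary requests and the second pass merges the two partial summaries.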

Input

A user message requesting a summarization of a specific document.

Example scenario:

User query: "Please summarize the document I uploaded."
The user clicks on the correct document and sends the request. The system then summarizes parts of the document and combines these partial summaries into a full summary.

Output

A comprehensive summary of the specified document that captures the key takeaways and essential information in the summarization style the user requested.

Configuration:

AI assistant module configuration (see variable explanation below)

{
    "agentConfig": {
        "languageModel": "AZURE_GPT_4_32K_0613",
        "historyTokenLimit": 1000,
        "historyMaxMessages": 10,
        "toolDefinition": {
            "name": "DocumentSummarizerV2",
            "parameters": {
                "type": "object",
                "required": [
                    "language",
                    "task_description",
                    "document_name_referenced_by_topic",
                    "document_name_referenced_by_this"
                ],
                "properties": {
                    "language": {
                        "type": "string",
                        "description": "The language used by the user (e.g. english, french, german, etc.)"
                    },
                    "document_name": {
                        "type": "string",
                        "description": "The name of the latest document, e.g., Quarterlyreport.pdf, that the user wants to summarize."
                    },
                    "document_name_referenced_by_this": {
                        "type": "boolean",
                        "description": "A boolean value indicating whether the user referred to a document indirectly by 'this', 'this document', 'uploaded', 'the document', 'the file' or something similar."
                    },
                    "document_topic_referenced": {
                        "type": "string",
                        "description": "The topic of the document referenced by the user, e.g., 'data sharing' or 'help desk information'."
                    },
                    "task_description": {
                        "type": "string",
                        "description": "The latest summary task description given by the employee, e.g., 'summarize' or 'summarize using bullet points'."
                    }
                }
            },
            "description": "This function is triggered whenever the user intends to summarize a document. The document can either be referenced by a file name, a general topic or indirectly. When the user wants to summarize multiple documents, explain that this is currently not possible. Do not use this tool whenever a user wants to summarize or do something else of a previously given answer like 'summarize the last answer in two sentences'."
        }
    },
    "toolDocumentSummarizerConfig": {
        "languageModel": "AZURE_GPT_4_32K_0613",
        "systemMessage": "You are a helpful AI designed to summarize a document (delimited by <document>) using a given task description (delimited by <task_description>). First reason based on the task description whether the whole document or only parts of it are relevant for the summary, then proceed with the summary and output only the summary in markdown format. If none of the document is relevant for the summary, return an empty string.",
        "userMessage": "Summary task description:\n<task_description>\n$task_description\n</task_description>\n\nDocument:\n<document>\n$document_text\n</document>\n\nMarkdown output:",
        "maxTokensPerSummaryStep": 5000,
        "contentFinderConfig": {
            "scopeId": null,
            "limitScopeToChat": true,
            "limitOfOptionsToDisplay": 5,
            "limitContentChunkSearch": 50,
            "keepContentsNameEndsOn": [],
            "removeContentsNameEndsOn": []
        }
    }
}

Attribute	Description	Type
`agentConfig`
`languageModel`	Defines the default model for making the tool call.	Text enum (depending on availability): “AZURE_GPT_4o”, “AZURE_GPT_35”, …
`historyTokenLimit`	The maximum number of tokens from the history to consider when making the tool call.	Number
`historyMaxMessages`	The number of messages from the history to consider when making the tool call.	Number
`toolDefinition`	The `function` value from the function definition below.	Dictionary
`toolDocumentSummarizerConfig`
`languageModel`	Defines the default model for creating the summary.	Text enum (depending on availability): “AZURE_GPT_4o”, “AZURE_GPT_35”, …
`systemMessage`	The general task description for the summarization.	Text
`userMessage`	The user message that triggers the summarization task. When changed, it must still contain the string template parameters `task_description` and `document_text`.	Text
`maxTokensPerSummaryStep`	Defines the maximum number of tokens used for each summary generation request. For example, when the document comprises 10 chunks of 1000 tokens each and the maximum is set to 5000, 5 chunks are used per summary request.	Number
`contentFinderConfig`
`scopeId`	When set, it defines the scope in which to find documents. By default, this is the company scope.	Text | null
`limitScopeToChat`	Limits the scope in which to find documents to the chat.	Boolean
`limitOfOptionsToDisplay`	The maximum number of document names to display to the user whenever more than one document was found.	Number
`limitContentChunkSearch`	Limits the total number of chunks to retrieve whenever the content finder runs a chunk search based on the user query.	Number
`keepContentsNameEndsOn`	Filter that keeps only contents whose names end with one of the given strings.	Array[Text]
`removeContentsNameEndsOn`	Filter that removes contents whose names end with one of the given strings.	Array[Text]
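
How the two name filters of `contentFinderConfig` might interact can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the function `filter_content_names` is hypothetical, an empty list is assumed to disable the respective filter, matching is assumed to be case-sensitive, and the remove filter is assumed to be applied after the keep filter.

```python
from typing import List


def filter_content_names(
    names: List[str],
    keep_ends_on: List[str],
    remove_ends_on: List[str],
) -> List[str]:
    """Apply the keep filter first, then the remove filter (assumed order)."""
    kept: List[str] = []
    for name in names:
        # Assumption: an empty keep list means "keep everything".
        if keep_ends_on and not name.endswith(tuple(keep_ends_on)):
            continue
        # Assumption: an empty remove list means "remove nothing".
        if remove_ends_on and name.endswith(tuple(remove_ends_on)):
            continue
        kept.append(name)
    return kept
```

For example, keeping names that end on `.pdf` while removing names that end on `_draft.pdf` would leave only the final PDF versions as summary candidates.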

Prompts

Only adjust prompts if you are fully familiar with the code logic. Small changes can break the module or reduce the output quality.

Parameter	Description
`systemMessage`	The general task description for the summarization.
`userMessage`	The user message that triggers the summarization task. When changed, it must still contain the string template parameters `task_description` and `document_text` (written as `$task_description` and `$document_text` in the template).
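
The `$name` placeholders in `userMessage` follow the same syntax as Python's `string.Template`. Assuming the module fills them in a comparable way (the source does not confirm this), the following sketch shows how a changed prompt can be checked for the two required parameters before it is deployed. The helper names `missing_placeholders` and `render_user_message` are hypothetical.

```python
from string import Template
from typing import List

REQUIRED_PARAMETERS = ["task_description", "document_text"]

# The default userMessage from the configuration above.
USER_MESSAGE = (
    "Summary task description:\n<task_description>\n$task_description\n"
    "</task_description>\n\nDocument:\n<document>\n$document_text\n"
    "</document>\n\nMarkdown output:"
)


def missing_placeholders(template: str) -> List[str]:
    """Return the required parameters that the template does not reference."""
    found = set()
    for match in Template.pattern.finditer(template):
        # "named" matches $name, "braced" matches ${name}.
        name = match.group("named") or match.group("braced")
        if name:
            found.add(name)
    return [name for name in REQUIRED_PARAMETERS if name not in found]


def render_user_message(template: str, task_description: str, document_text: str) -> str:
    """Fill both parameters; a '$' inside the values is left untouched."""
    return Template(template).substitute(
        task_description=task_description,
        document_text=document_text,
    )
```

A prompt for which `missing_placeholders` returns a non-empty list would lose either the task description or the document text at summarization time.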

(Tool) Definition

Only adjust the function definition if you are fully familiar with the code logic. Small changes can break the module or reduce the output quality.

{
    "type": "function",
    "function": {
        "name": "DocumentSummarizerV2",
        "parameters": {
            "type": "object",
            "required": [
                "language",
                "task_description",
                "document_referenced_by_topic",
                "document_name_referenced_by_this"
            ],
            "properties": {
                "language": {
                    "type": "string",
                    "description": "The language used by the user (e.g. english, french, german, etc.)"
                },
                "document_name": {
                    "type": "string",
                    "description": "The name of the latest document, e.g., Quarterlyreport.pdf, that the user wants to summarize."
                },
                "document_name_referenced_by_this": {
                    "type": "boolean",
                    "description": "A boolean value indicating whether the user referred to a document indirectly by 'this', 'this document', 'uploaded', 'the document', 'the file' or something similar."
                },
                "document_topic_referenced": {
                    "type": "string",
                    "description": "The topic of the document referenced by the user, e.g., 'data sharing' or 'help desk information'."
                },
                "task_description": {
                    "type": "string",
                    "description": "The latest summary task description given by the employee, e.g., 'summarize' or 'summarize using bullet points'."
                }
            }
        },
        "description": "This function is triggered whenever the user intends to summarize a document. The document can either be referenced by a file name, a general topic or indirectly. When the user wants to summarize multiple documents, explain that this is currently not possible. Do not use this tool whenever a user wants to summarize or do something else of a previously given answer like 'summarize the last answer in two sentences'."
    }
}
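
The arguments that the model returns for this function describe how the user referenced the document: by file name, indirectly ("this"), or by topic. The sketch below illustrates one way such arguments could be routed to a lookup strategy. The function `resolve_reference`, the strategy labels and the precedence order are assumptions made for illustration; the module's real logic may differ.

```python
import json
from typing import Tuple


def resolve_reference(arguments_json: str) -> Tuple[str, str]:
    """Map tool-call arguments to a (strategy, search value) pair."""
    args = json.loads(arguments_json)
    # Assumed precedence: explicit file name, then indirect reference, then topic.
    if args.get("document_name"):
        return ("file_name", args["document_name"])
    if args.get("document_name_referenced_by_this"):
        return ("chat_upload", "")
    if args.get("document_topic_referenced"):
        return ("topic", args["document_topic_referenced"])
    # Nothing usable was extracted, so the user has to clarify the request.
    return ("clarify", "")


# Hypothetical arguments for "Please summarize the document I uploaded."
example_arguments = json.dumps(
    {
        "language": "english",
        "task_description": "summarize",
        "document_name_referenced_by_this": True,
    }
)
strategy, value = resolve_reference(example_arguments)
```

In this example the indirect reference leads to the `chat_upload` strategy, which corresponds to looking for the document among the files uploaded to the chat.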

Migration to V2

What is new?

Overall, the new version is more user-friendly and performs better when summarizing documents. Specifically, the new version comes with:

- an agentic approach that handles follow-up questions better and extracts key parameters with function calling
- a more sophisticated approach to finding the correct content by combining a search by file name with a semantic search
- a recursive approach to summarization that handles arbitrary document lengths
- the possibility to adjust the maximum number of tokens to summarize per summary step
- a streaming approach for the final summary

How to migrate the old configuration?

The new version makes the configuration of the module more fine-grained, based on its core components. To migrate to the Document Summarizer V2, translate the old configuration parameters as follows:

V1	V2 (see complete config above)
`languageModel: string`	`toolDocumentSummarizerConfig.languageModel`
`scopeIds: [string]`	`toolDocumentSummarizerConfig.contentFinderConfig.scopeId` (Please note that the scope can be limited to one folder only; if set to `null`, the whole company scope is considered.)
`scopeToChatOnUpload: boolean`	`toolDocumentSummarizerConfig.contentFinderConfig.limitScopeToChat` (This restricts the document search to the chat scope and replaces the workaround with `["UPLOAD_ONLY"]` and `scopeIds` in the old version.)
`systemPromptCombineSummaries: string`, `triggerPromptCombineSummaries: string`, `systemPromptPartialSummary: string`, `triggerPromptPartialSummary: string`	The new version requires only two prompts for the summarization (see details above): `toolDocumentSummarizerConfig.systemMessage` and `toolDocumentSummarizerConfig.userMessage`
`systemPromptNameExtraction: string`, `triggerPromptNameExtraction: string`	Obsolete; the name is extracted through function calling.
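
The mapping in the table can be expressed as a small helper. This is a sketch, not an official migration tool: the function `migrate_v1_to_v2` is hypothetical, treating an `"UPLOAD_ONLY"` entry in `scopeIds` as "limit to chat" is an interpretation of the old workaround, and rejecting more than one remaining scope ID is an assumption derived from V2 accepting a single `scopeId`. The prompts are not migrated, because V2 uses two differently structured prompts that have to be written by hand.

```python
from typing import Any, Dict


def migrate_v1_to_v2(v1: Dict[str, Any]) -> Dict[str, Any]:
    """Translate the V1 parameters that have a direct V2 counterpart."""
    raw_scope_ids = v1.get("scopeIds") or []
    # Assumption: "UPLOAD_ONLY" was the V1 marker for chat-only search.
    upload_only = "UPLOAD_ONLY" in raw_scope_ids
    scope_ids = [scope for scope in raw_scope_ids if scope != "UPLOAD_ONLY"]
    if len(scope_ids) > 1:
        # V2 takes a single scopeId, so this case needs a manual decision.
        raise ValueError("V2 supports only one scopeId; choose one manually.")

    summarizer_config: Dict[str, Any] = {
        "contentFinderConfig": {
            "scopeId": scope_ids[0] if scope_ids else None,
            "limitScopeToChat": bool(v1.get("scopeToChatOnUpload", False)) or upload_only,
        }
    }
    if "languageModel" in v1:
        summarizer_config["languageModel"] = v1["languageModel"]
    return {"toolDocumentSummarizerConfig": summarizer_config}
```

The result contains only the migrated keys and would still need to be merged with the remaining V2 defaults, such as `agentConfig`, `systemMessage`, `userMessage` and `maxTokensPerSummaryStep`.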

Author	@Martin Fadler
