Document Summarizer V2
Functionality
The "Document Summarizer" module produces summaries of entire documents. The user specifies a document, and the module generates key takeaways and a full summary in the style the user requests, regardless of the document's length.
The AI assistant (module) performs the following tasks:
1. Decides whether the user query is a summary task or a follow-up question. Depending on the query, the assistant runs the summary, answers the follow-up question, or asks the user to clarify the intention, for instance when the query does not reference any document to summarize.
2. Extracts the file reference and the summary task description from the user query. The file reference can be a specific document name, a general topic, or a vague reference such as “this” / “it” to a document uploaded to the chat window.
3. Finds the relevant file to summarize.
   - When the matching files are ambiguous or no file is found, the user is asked to select the correct file from a list of options or to rephrase the query.
   - Otherwise, the file is summarized.
4. Summarizes the file recursively. Each summary step merges as many tokens into the context window as the maxTokensPerSummaryStep parameter allows, and the partial summaries are then combined into the full summary.
5. Streams the summary back to the user.
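The recursive summarization step can be sketched as follows. This is a minimal illustration, not the module's actual code: `count_tokens`, `batch_by_token_budget`, `summarize_recursively`, and the `max_depth` guard are hypothetical names, tokens are approximated by whitespace-separated words, and the `summarize` callable stands in for the language model request.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def batch_by_token_budget(chunks: List[str], max_tokens: int) -> List[List[str]]:
    """Group consecutive chunks so that each batch stays within max_tokens."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for chunk in chunks:
        size = count_tokens(chunk)
        if current and used + size > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(chunk)  # an oversized chunk still gets a batch of its own
        used += size
    if current:
        batches.append(current)
    return batches


def summarize_recursively(
    chunks: List[str],
    max_tokens: int,
    summarize: Callable[[str], str],
    max_depth: int = 10,
) -> str:
    """Summarize each batch, then summarize the summaries until one remains."""
    if not chunks:
        return ""
    batches = batch_by_token_budget(chunks, max_tokens)
    summaries = [summarize("\n\n".join(batch)) for batch in batches]
    if len(summaries) == 1:
        return summaries[0]
    if max_depth <= 0:
        # Guard against summaries that never shrink enough to be merged.
        raise RuntimeError("summaries did not converge within the depth limit")
    return summarize_recursively(summaries, max_tokens, summarize, max_depth - 1)
```

With ten chunks of 1000 tokens each and a budget of 5000, the first pass issues two summary requests of five chunks each, and a second pass merges the two partial summaries into the final one.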
Input
A user message requesting a summarization of a specific document.
Example scenario:
User query: "Please summarize the document I uploaded."
The user selects the correct document and sends the request. The system then summarizes parts of the document and combines these partial summaries into a full summary.
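The selection step in this scenario can be sketched as a small decision function. The names `Resolution` and `resolve_document` and the three action labels are assumptions made for illustration; only the behavior (ask to rephrase, summarize directly, or offer a limited list of options) comes from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resolution:
    action: str  # "summarize", "ask_selection", or "ask_rephrase" (hypothetical labels)
    document: Optional[str] = None
    options: List[str] = field(default_factory=list)


def resolve_document(candidates: List[str], limit_of_options: int = 5) -> Resolution:
    """Decide what to do with the documents found for a user query."""
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    if not unique:
        # Nothing found: ask the user to rephrase the query.
        return Resolution("ask_rephrase")
    if len(unique) == 1:
        # Exactly one match: summarize it directly.
        return Resolution("summarize", document=unique[0])
    # Ambiguous: offer at most limit_of_options names to choose from.
    return Resolution("ask_selection", options=unique[:limit_of_options])
```

The `limit_of_options` default mirrors the `limitOfOptionsToDisplay` value of 5 in the configuration.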
Output
A comprehensive summary of the specified document, capturing key takeaways and essential information in the summarization style the user requested.
Configuration:
AI assistant module configuration (see variable explanation below)
{
  "agentConfig": {
    "languageModel": "AZURE_GPT_4_32K_0613",
    "historyTokenLimit": 1000,
    "historyMaxMessages": 10,
    "toolDefinition": {
      "name": "DocumentSummarizerV2",
      "parameters": {
        "type": "object",
        "required": [
          "language",
          "task_description",
          "document_name_referenced_by_topic",
          "document_name_referenced_by_this"
        ],
        "properties": {
          "language": {
            "type": "string",
            "description": "The language used by the user (e.g. english, french, german, etc.)"
          },
          "document_name": {
            "type": "string",
            "description": "The name of the latest document, e.g., Quaterlyreport.pdf, that the user wants to summarize."
          },
          "document_name_referenced_by_this": {
            "type": "boolean",
            "description": "A boolean value indicating whether the user referred to a document indirectly by 'this', 'this document', 'uploaded', 'the document', 'the file' or something similar."
          },
          "document_topic_referenced": {
            "type": "string",
            "description": "The topic of the document referenced by the user, e.g., 'data sharing' or 'help desk information'."
          },
          "task_description": {
            "type": "string",
            "description": "The latest summary task description given by the employee, e.g., 'summarize' or 'summarize using bullet points'."
          }
        }
      },
      "description": "This function is triggered whenever the user intends to summarize a document. The document can either be referenced by a file name, a general topic or indirectly. When the user wants to summarize multiple documents, explain that this is currently not possible. Do not use this tool whenever a user wants to summarize or do something else of a previously given answer like 'summarize the last answer in two sentences'."
    }
  },
  "toolDocumentSummarizerConfig": {
    "languageModel": "AZURE_GPT_4_32K_0613",
    "systemMessage": "You are a helpful AI designed to summarize a document (delimited by <document>) using a given task description (delimited by <task_description>). First reason based on the task description whether the whole document or only parts of it are relevant for the summary, then proceed with the summary and output only the summary in markdown format. If none of the document is relevant for the summary, return an empty string.",
    "userMessage": "Summary task description:\n<task_description>\n$task_description\n</task_description>\n\nDocument:\n<document>\n$document_text\n</document>\n\nMarkdown output:",
    "maxTokensPerSummaryStep": 5000,
    "contentFinderConfig": {
      "scopeId": null,
      "limitScopeToChat": true,
      "limitOfOptionsToDisplay": 5,
      "limitContentChunkSearch": 50,
      "keepContentsNameEndsOn": [],
      "removeContentsNameEndsOn": []
    }
  }
}
Attribute | Description | Type |
---|---|---|
agentConfig | | |
languageModel | Defines the default model for making the tool call. | Text enum (depending on availability), e.g. “AZURE_GPT_4o”, “AZURE_GPT_4_32K_0613” |
historyTokenLimit | The maximum number of tokens from the chat history to consider when making the tool call. | Number |
historyMaxMessages | The number of messages from the chat history to consider when making the tool call. | Number |
toolDefinition | The tool (function) definition used for the tool call; see (Tool) Definition below. | Dictionary |
toolDocumentSummarizerConfig | | |
languageModel | Defines the default model for creating the summary. | Text enum (depending on availability), e.g. “AZURE_GPT_4o”, “AZURE_GPT_4_32K_0613” |
systemMessage | The general task description for the summarization. | Text |
userMessage | The user message that triggers the summarization task. When changed, it must keep the string template parameters $task_description and $document_text. | Text |
maxTokensPerSummaryStep | Defines the maximum number of tokens used for each summary generation request. For example, when the document comprises 10 chunks of 1000 tokens each and the value is set to 5000, 5 chunks are used per summary request. | Number |
contentFinderConfig | | |
scopeId | When set, defines the scope in which to find documents. By default it is the company scope. | Text or null |
limitScopeToChat | Limits the scope in which to find documents to the chat. | Boolean |
limitOfOptionsToDisplay | The maximum number of document names to display to the user whenever more than one document was found. | Number |
limitContentChunkSearch | Limits the total number of chunks to retrieve whenever the content finder runs a chunk search based on the user query. | Number |
keepContentsNameEndsOn | Filter that keeps only contents whose name ends with one of the given strings. | Array[Text] |
removeContentsNameEndsOn | Filter that removes contents whose name ends with one of the given strings. | Array[Text] |
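The two name filters of the content finder can be sketched as below. This is an illustration under stated assumptions, not the module's code: the function name `filter_by_name_suffix` is hypothetical, an empty list is assumed to disable the corresponding filter (matching the empty defaults in the configuration), and the comparison is assumed to be case-insensitive.

```python
from typing import List, Sequence


def filter_by_name_suffix(
    names: Sequence[str],
    keep_endings: Sequence[str],
    remove_endings: Sequence[str],
) -> List[str]:
    """Apply keep/remove suffix filters to a list of content names."""
    keep = tuple(ending.lower() for ending in keep_endings)
    remove = tuple(ending.lower() for ending in remove_endings)
    kept: List[str] = []
    for name in names:
        lowered = name.lower()
        # Assumption: an empty keep list means "keep everything".
        if keep and not lowered.endswith(keep):
            continue
        # Assumption: an empty remove list means "remove nothing".
        if remove and lowered.endswith(remove):
            continue
        kept.append(name)
    return kept
```

For example, keeping `.pdf` while removing `_old.pdf` would retain `report.pdf` but drop both `notes.docx` and `draft_old.pdf`.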
Prompts
Only adjust prompts if you are fully familiar with the code logic. Small changes can break the module or reduce the output quality.
Parameter | Description |
---|---|
systemMessage | The general task description for the summarization. |
userMessage | The user message that triggers the summarization task. When changed, it must keep the string template parameters $task_description and $document_text. |
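The `$task_description` and `$document_text` placeholders in the user message happen to match the syntax of Python's `string.Template`, so the substitution can be illustrated with it. Whether the module itself uses this class is an assumption; `render_user_message` and `missing_placeholders` are hypothetical helper names. The second helper shows one way to check an edited prompt before deploying it.

```python
from string import Template
from typing import Iterable, Set

# Default user message, copied from the configuration above.
USER_MESSAGE = (
    "Summary task description:\n<task_description>\n$task_description\n"
    "</task_description>\n\nDocument:\n<document>\n$document_text\n</document>\n\n"
    "Markdown output:"
)


def render_user_message(template: str, task_description: str, document_text: str) -> str:
    """Fill both placeholders; raises KeyError if the template names any other one."""
    return Template(template).substitute(
        task_description=task_description,
        document_text=document_text,
    )


def missing_placeholders(
    template: str,
    required: Iterable[str] = ("task_description", "document_text"),
) -> Set[str]:
    """Return the required placeholder names that an edited template no longer contains."""
    found = {
        match.group("named") or match.group("braced")
        for match in Template.pattern.finditer(template)
    }
    return set(required) - found
```

A changed prompt such as `"Summarize: $document_text"` would be reported as missing `task_description`.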
(Tool) Definition
Only adjust the function definition if you are fully familiar with the code logic. Small changes can break the module or reduce the output quality.
{
  "type": "function",
  "function": {
    "name": "DocumentSummarizerV2",
    "parameters": {
      "type": "object",
      "required": [
        "language",
        "task_description",
        "document_referenced_by_topic",
        "document_name_referenced_by_this"
      ],
      "properties": {
        "language": {
          "type": "string",
          "description": "The language used by the user (e.g. english, french, german, etc.)"
        },
        "document_name": {
          "type": "string",
          "description": "The name of the latest document, e.g., Quaterlyreport.pdf, that the user wants to summarize."
        },
        "document_name_referenced_by_this": {
          "type": "boolean",
          "description": "A boolean value indicating whether the user referred to a document indirectly by 'this', 'this document', 'uploaded', 'the document', 'the file' or something similar."
        },
        "document_topic_referenced": {
          "type": "string",
          "description": "The topic of the document referenced by the user, e.g., 'data sharing' or 'help desk information'."
        },
        "task_description": {
          "type": "string",
          "description": "The latest summary task description given by the employee, e.g., 'summarize' or 'summarize using bullet points'."
        }
      }
    },
    "description": "This function is triggered whenever the user intends to summarize a document. The document can either be referenced by a file name, a general topic or indirectly. When the user wants to summarize multiple documents, explain that this is currently not possible. Do not use this tool whenever a user wants to summarize or do something else of a previously given answer like 'summarize the last answer in two sentences'."
  }
}
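Because small changes to the definition can break the module, a quick consistency check before deploying an edited definition may help. The sketch below is an assumption about how such a check could look; `undeclared_required` and `missing_arguments` are hypothetical helpers, and `PARAMETERS` repeats only the names and types from the definition above.

```python
from typing import Any, Dict, List

# Names copied from the definition above; descriptions omitted for brevity.
PARAMETERS: Dict[str, Any] = {
    "type": "object",
    "required": [
        "language",
        "task_description",
        "document_referenced_by_topic",
        "document_name_referenced_by_this",
    ],
    "properties": {
        "language": {"type": "string"},
        "document_name": {"type": "string"},
        "document_name_referenced_by_this": {"type": "boolean"},
        "document_topic_referenced": {"type": "string"},
        "task_description": {"type": "string"},
    },
}


def undeclared_required(parameters: Dict[str, Any]) -> List[str]:
    """Required names that have no matching entry under "properties"."""
    declared = parameters.get("properties", {})
    return [name for name in parameters.get("required", []) if name not in declared]


def missing_arguments(parameters: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """Required names that a tool call's arguments do not provide."""
    return [name for name in parameters.get("required", []) if name not in arguments]
```

With the names above, `undeclared_required(PARAMETERS)` returns `["document_referenced_by_topic"]`, because the topic property is declared as `document_topic_referenced`.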
Migration to V2
What is new?
Overall, the new version is more user-friendly and performs better when summarizing documents. Specifically, the new version comes with:
- an agentic approach that handles follow-up questions better and extracts key parameters with function calling
- a more sophisticated approach to finding the correct content by combining search by filename with a semantic search
- a recursive approach to summarization that handles arbitrary document lengths
- the possibility to adjust the maximum number of tokens to summarize per summary step
- a streaming approach for the final summary
How to migrate the old configuration?
The new version makes the configuration of the module more fine-grained, based on its core components. To migrate to the Document Summarizer V2, translate the old configuration parameters as follows:
V1 | V2 (see complete config above) |
---|---|
|
|
|
|
|
|
| The new version requires only two prompts for the summarization, systemMessage and userMessage; see the Prompts section above. |
| |
| Obsolete; the name is extracted through function calling. |
Author: Martin Fadler
© 2024 Unique AG. All rights reserved.