Functionality
The document translator agent that makes use of tool calls to find out when it is ready to translate a document. To start the translation it needs to find out what language is desired and what document should be translated and to find this information it an use predefined tools (see Capabilities). This approach allows that you can talk to it about other things than translation as well.
The document translator functionality takes a document and tries to extract all texts and to translate them into the target language. The text pieces are grouped into a single prompt of a maximum length and few shot learning with regard to the input and target language is applied to build up the prompt sent to the LLM. These request have a maximum token size and the number we send is limited by the maximum allowed token consumption per minute.
Input and Output
Simply talk to the agent, ask it its capability and it will tell you how to use it.
Capabilities
The Document Translator has the following capabilities:
Document Selection: If multiple documents are uploaded, the translator can ask the user to select a specific document by showing them as buttons. Clicking on these buttons injects a prompt into the chat window.
Language Selection: The translator can present the user with proposed languages as buttons if the user asks for languages to translate to. Clicking on these buttons injects a prompt into the chat window.
Time estimation: The translator can estimate the time it will take to translate documents.
Known Issues
Small Document Upload
Cannot upload small documents in chat for translation.
[2024-10-24 08:25:58.373 +0200] ERROR (28723): IngestionWorkerError - Content has too little information to be useful
Translated Document Reappears
After translation, the translated document appears in the uploaded documents list.
Sheet Names Not Translated
Sheet names in xlsx files are not translated.
Bullet points within tables*
Bullet points within tables are removed. Text is still there.
Configuration settings (technical)
Reference in Code
DocumentTranslator
Tool Definition
{ "type": "function", "function": { "name": "DocumentTranslator", "parameters": { "type": "object", "required": [ "outputLanguage" ], "properties": { "inputDocument": { "type": "string", "description": "The name of the document to translate" }, "output_language": { "type": "string", "description": "The language used to generate the translation of the document." } } }, "description": "The document translator helps you translate documents from one language to another. After translation you can download the document directly." } }
Configuration
The following is the default configuration for the module, it consists of four sections that will be discussed in detail below.
{ "glossaryConfig": { "scope_id": null, "filename": null, "active": false }, "textPostProcessingConfig": [], "agentConfig": { "historyTokenLimit": 400, "languageModelName": "AZURE_GPT_4_0613", "proposedLanguagesForTranslation": ["English", "Spanish", "French", "German", "Italian", "Portuguese", "Dutch", "Russian", "Swedish" ], "supportedDocEndings": [".docx", ".xlsx", ".pptx", ".pdf"] }, "agentPromptConfig": { "systemPrompt": "You are a friendly person that helps our customers to translated uploaded documents.\nTherefore, you must find out which uploaded documents they want to translate and to what language before starting the translation.\nIf you cannot find the information from the conversation, nicely ask them to provide it or to upload a document.\nYou have access to tools that can help you with these tasks, but use only one at the time elsewise it is complicated for the user.\nNote that at the moment, only pdf, word, powerpoint and excel documents can be translated, thus the files must end with an .pdf, .docx, .xlsx or .pptx file extension.\nMention this if the user uploads other files." }, "translatorPromptConfig": { "systemPromptInstruction": "You are a helpful AI designed to to translate text to a specified language.\nDo it even if the target language is the same as the source language.\nMake sure the translated text contains the same amount of carriage returns '\\n' as the original text block.\nTry to keep the translated text as close to the original as possible and having approximately the same lenght.", "inputPromptInstructionTemplate": "List of text blocks to be translated to ${language}:\n\n<INPUT>\n${input_block}\n</INPUT>", "outputPromptInstructionTemplate": "<OUTPUT>\n${output_block}\n</OUTPUT>" }, "translatorUserConfig": { "languageModelName": "AZURE_GPT_4_0613", "maxTokensPerTranlationRequest": 1000, "maxTokenPerMinute": 40000, "allowedInputLanguages": [ "Afrikaans", "Albanian", "Arabic", "Aragonese", "Armenian", "Azeri", "Bashkir", "Basque", "Belarusian", "Bengali", "Bislama", "Bosnian", "Breton", "Bulgarian", "Burmese", "Catalan", "Chamorro", "Chechen", "Chinese", "Cornish", "Corsican", "Croatian", "Czech", "Danish", "Dutch", "English", "Esperanto", "Estonian", "Ewe", "Faroese", "Fijian", "Finnish", "French", "Galician", "Georgian", "German", "Greek", "Greenlandic", "Guaran\u00ed", "Haitian Creole", "Hausa", "Hebrew", "Hindi", "Hungarian", "Icelandic", "Ido", "Indonesian", "Interlingua", "Interlingue", "Inuktitut", "Irish", "Italian", "Japanese", "Javanese", "Kannada", "Kazakh", "Khmer", "Korean", "Kurdish", "Kyrgyz", "Lao", "Latin", "Latvian", "Limburgish", "Lingala", "Lithuanian", "Luxembourgish", "Macedonian", "Malagasy", "Malay", "Malayalam", "Maltese", "Manx", "Maori", "Marathi", "Marshallese", "Mongolian", "Navajo", "Nepali", "Northern Sami", "Norwegian", "Norwegian Bokm\u00e5l", "Norwegian Nynorsk", "Occitan", "Ojibwe", "Old Church Slavonic", "Ossetian", "Pashto", "Persian", "Polish", "Portuguese", "Punjabi", "Quechua", "Romanian", "Romansch", "Russian", "Samoan", "Sanskrit", "Sardinian", "Scottish Gaelic", "Serbian", "Serbo-Croatian", "Sichuan Yi", "Sindhi", "Slovak", "Slovene", "Somali", "Spanish", "Sundanese", "Swahili", "Swedish", "Tagalog", "Tahitian", "Tajik", "Tamil", "Tatar", "Telugu", "Thai", "Tibetan", "Tongan", "Tswana", "Turkish", "Turkmen", "Ukrainian", "Urdu", "Uyghur", "Uzbek", "Vietnamese", "Volap\u00fck", "Walloon", "Welsh", "West Frisian", "Yiddish", "Yoruba", "Zhuang", "Zulu" ] } }
Parameter Description
GlossaryConfig
The glossary is expected to be in an .xlsx
file within the knowledge base.
Parameter | Description | Default Value |
---|---|---|
| The scope id within the knowledge base | "" |
| The filename of the | "" |
| If the glossary is used or not | False |
{ "scope_id": "", "filename": "", "active": false }
PostProcessorConfig
The post processors are a list of text processors that are applied to translations
Parameter | Description | Default Value |
---|---|---|
| Name of the processor |
|
| If the processor is used or not | False |
| The languages that will be processed by this processor. | [] |
At the moment there are two valid processors "Replace sharp s with ss"
and "American to British"
[{ "name": "Replace sharp s with ss", "active": true, "applied_to_languages": ["German"] }, { "name": "American to British", "active": true, "applied_to_languages": ["English"] }]
AgentConfig
Parameter | Description | Default |
---|---|---|
| The number of tokens used from the history when calling the LLM. | 400 |
| The name of the language model to use for the agent. | "AZURE_GPT_4_TURBO_2024_0409" |
| The languages that the agent proposes to translate to. | ["English", "Spanish", "French", "German", "Italian", "Portuguese","Dutch","Russian", "Swedish"] |
| The supported document endings for translation. | [".pdf", ".docx", ".xlsx", ".pptx"] |
Example
{ "historyTokenLimit": 400, "languageModelName": "AZURE_GPT_4_TURBO_2024_0409", "proposedLanguagesForTranslation": ["English", "Spanish", "French", "German", "Italian", "Portuguese", "Dutch", "Russian", "Swedish" ], "supportedDocEndings": [".pdf", ".docx", ".xlsx", ".pptx"] }
AgentPromptConfi
❗Only adjust prompts if you are fully familiar with the code logic. Small changes can break the module or reduce the output quality.
Parameter | Description |
---|---|
| System prompt for document translation agent. |
Default Value
You are a friendly person that helps our customers to translate uploaded documents. Therefore, you must find out which uploaded documents they want to translate and to what language before starting the translation. If you cannot find the information from the conversation, nicely ask them to provide it or to upload a document. You have access to tools that can help you with these tasks, but use only one at a time else it is complicated for the user. Note that at the moment, only Word and Excel documents can be translated, thus the files must end with a .docx or .xlsx file extension. Mention this if the user uploads other files.
Example
{ "systemPrompt": "You are a friendly person that helps our customers to translate uploaded documents. Therefore, you must find out which uploaded documents they want to translate and to what language before starting the translation. If you cannot find the information from the conversation, nicely ask them to provide it or to upload a document. You have access to tools that can help you with these tasks, but use only one at a time else it is complicated for the user. Note that at the moment, only Word and Excel documents can be translated, thus the files must end with a .docx or .xlsx file extension. Mention this if the user uploads other files." }
TranslatorPromptConfig
❗Only adjust prompts if you are fully familiar with the code logic. Small changes can break the module or reduce the output quality.
Parameter | Description | Default Value |
---|---|---|
| System prompt instruction for the document translation module. | See below |
| Input prompt for translating an array of text blocks with word formatting tags to a specified language. | See below |
| Output prompt instruction template. | See below |
Default Values
systemPromptInstruction:
You are a helpful AI designed to translate text to a specified language. Do it even if the target language is the same as the source language. Make sure the translated text contains the same amount of carriage returns '\\\\n' as the original text block and keep the number of characters per line approximately the same.
inputPromptInstructionTemplate:
List of text blocks to be translated to ${language}: <INPUT> ${input_block} </INPUT>
outputPromptInstructionTemplate:
<OUTPUT> ${output_block} </OUTPUT>
Example
{ "systemPromptInstruction": "You are a helpful AI designed to translate text to a specified language. Do it even if the target language is the same as the source language. Make sure the translated text contains the same amount of carriage returns '\\n' as the original text block and keep the number of characters per line approximately the same.", "inputPromptInstructionTemplate": "List of text blocks to be translated to ${language}:\n\n<INPUT>\n${input_block}\n</INPUT>", "outputPromptInstructionTemplate": "<OUTPUT>\n${output_block}\n</OUTPUT>", }
TranslatorUserConfig
Parameter | Description | Default Value |
---|---|---|
| The model that will be used to translate between languages. | "AZURE_GPT_4_0613" |
| The maximum number of tokens that will be translated at once. If the model cannot handle more than this many tokens in a single request then it will be split into multiple requests. | 1000 |
| The maximum number of tokens available for translation tasks per minute. | 40000 |
| Languages that can be recognized to use correspondingly configured few-shot examples, glossary for translation and postprocessing of text. | ["Afrikaans", ..,"Zulu"] |
Full Example
{ "languageModelName": "AZURE_GPT_4_0613", "maxTokensPerTranlationRequest": 1000, "maxTokenPerMinute": 40000, "allowedInputLanguages": ["German", "Italian", "Spanish", "Russian"] }
Author |
---|