Document Translator
Functionality
The Document Translator is an agent that uses tool calls to determine when it is ready to translate a document. Before it can start a translation, it needs to know the target language and which document to translate. To find this out, it can use predefined tools (see Capabilities). Because of this approach, you can also talk to it about topics other than translation.
The translation functionality takes a document, tries to extract all of its text, and translates it into the target language. The text pieces are grouped into prompts of a maximum length, and few-shot learning with regard to the input and target language is applied to build the prompt sent to the LLM. Each request has a maximum token size, and the number of requests sent is limited by the maximum allowed token consumption per minute.
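The grouping and rate limiting described above can be sketched roughly as follows. This is a minimal illustration and not the module's actual implementation: `count_tokens` is a hypothetical stand-in for a real tokenizer, and the greedy packing strategy is an assumption. The limits in the example mirror the default values of `maxTokensPerTranlationRequest` and `maxTokenPerMinute`.

```python
from typing import Iterable, List


def count_tokens(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def group_text_pieces(pieces: Iterable[str], max_tokens: int) -> List[List[str]]:
    """Greedily pack text pieces into batches that stay within max_tokens.

    A single piece larger than max_tokens still gets a batch of its own
    (assumption: the real module may split such pieces instead).
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_tokens = 0
    for piece in pieces:
        n = count_tokens(piece)
        # Close the current batch if adding this piece would exceed the limit.
        if current and current_tokens + n > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(piece)
        current_tokens += n
    if current:
        batches.append(current)
    return batches


def requests_per_minute(max_tokens_per_request: int, max_tokens_per_minute: int) -> int:
    """Upper bound on full-size requests that fit into the per-minute token budget."""
    return max(1, max_tokens_per_minute // max_tokens_per_request)


pieces = ["Hello world", "This is a short paragraph", "Goodbye"]
batches = group_text_pieces(pieces, max_tokens=6)
budget = requests_per_minute(500, 40000)
```

With the default limits, at most 80 full-size requests fit into one minute of token budget under these assumptions.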
Input and Output
Simply talk to the agent: ask it about its capabilities and it will tell you how to use it.
Capabilities
The Document Translator has the following capabilities:
Document Selection: If multiple documents are uploaded, the translator can ask the user to select a specific document by showing them as buttons. Clicking on these buttons injects a prompt into the chat window.
Language Selection: The translator can present the user with proposed languages as buttons if the user asks for languages to translate to. Clicking on these buttons injects a prompt into the chat window.
Time Estimation: The translator can estimate the time it will take to translate documents.
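How the time estimate is computed is not documented here. One plausible sketch, assuming the per-minute token budget is the dominant factor, divides the expected token consumption by that budget. The function name and the formula are illustrative assumptions, not the module's actual logic.

```python
def estimate_translation_seconds(total_tokens: int, max_tokens_per_minute: int) -> float:
    """Rough lower bound on translation time, in seconds.

    Assumption: throughput is limited only by the per-minute token budget;
    model latency and document parsing time are ignored.
    """
    if max_tokens_per_minute <= 0:
        raise ValueError("max_tokens_per_minute must be positive")
    return total_tokens / max_tokens_per_minute * 60


# A document expected to consume 100,000 tokens with a budget of 40,000 tokens per minute.
seconds = estimate_translation_seconds(100_000, 40_000)
```

Under these assumptions the example document would take about two and a half minutes.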
Known Issues
See here for limitations on the translated documents: https://unique-ch.atlassian.net/wiki/spaces/PUB/pages/1006403719
Configuration settings
Reference in Code: DocumentTranslator
Tool Definition
{
  "type": "function",
  "function": {
    "name": "DocumentTranslator",
    "parameters": {
      "type": "object",
      "required": [
        "outputLanguage"
      ],
      "properties": {
        "inputDocument": {
          "type": "string",
          "description": "The name of the document to translate"
        },
        "output_language": {
          "type": "string",
          "description": "The language used to generate the translation of the document."
        }
      }
    },
    "description": "The document translator helps you translate documents from one language to another. After translation you can download the document directly."
  }
}
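To illustrate how the arguments of such a tool call might be consumed, the sketch below parses the JSON arguments and reports which pieces of information are still missing before a translation can start. The argument keys `inputDocument` and `output_language` are taken from the `properties` of the tool definition above. The function names and the handling logic are hypothetical and are not the service's actual code.

```python
import json
from typing import List, Optional, Tuple


def parse_translator_call(arguments_json: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract the document name and target language from tool call arguments."""
    args = json.loads(arguments_json)
    return args.get("inputDocument"), args.get("output_language")


def missing_information(document: Optional[str], language: Optional[str]) -> List[str]:
    """List what the agent still has to ask the user for (hypothetical helper)."""
    missing = []
    if not document:
        missing.append("document")
    if not language:
        missing.append("language")
    return missing


# Example arguments as a model might emit them; the file name is made up.
document, language = parse_translator_call(
    '{"inputDocument": "report.docx", "output_language": "German"}'
)
ready = not missing_information(document, language)
```

If `missing_information` returns a non-empty list, the agent would continue the conversation, for example by offering the document or language buttons described under Capabilities.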
Default Configuration
The following is the default configuration for the module. It consists of six top-level sections: glossaryConfig, textPostProcessingConfig, agentConfig, agentPromptConfig, translatorPromptConfig, and translatorUserConfig.
{
"glossaryConfig": {
"scope_id": null,
"filename": null,
"active": false
},
"textPostProcessingConfig": [],
"agentConfig": {
"historyTokenLimit": 400,
"languageModelName": "AZURE_GPT_4_0613",
"proposedLanguagesForTranslation": ["English", "Spanish", "French", "German", "Italian", "Portuguese", "Dutch", "Russian", "Swedish" ]
},
"agentPromptConfig": {
"systemPromptTemplate": "You are a friendly person that helps our customers to translated uploaded documents.\n\n{% if file_types|length == 0 -%}\nUnfortunately at the moment the file translation feature has not been activated. \nIf asked for for file translation inform the user about this, otherwise have a normal conversation with the user.\n{%- endif -%}\n\n{%- if file_types|length > 0 -%}\nTherefore, you must find out which uploaded documents they want to translate and to what language before starting the translation.\nIf you cannot find the information from the conversation, nicely ask them to provide it or to upload a document.\nYou have access to tools that can help you with these tasks, but use only one at the time elsewise it is complicated for the user.\n\nAt the moment the following file types can be translated\n\n**Available file types for translation**\n{% for type in file_types -%}\n- {{ type }}\n{% endfor %}\n\nThus, the files must end with an\n{%- if extensions|length > 1 -%}\n {{ ' ' ~ extensions[:-1] | join(', ') }} or {{ extensions[-1] }}\n{%- else -%}\n {{ ' ' ~ extensions[0] }}\n{%- endif %}.\n\nMention the available file types if the user uploads other files.\n\n{% if uploaded_files|length == 0 -%}\n Currently the user has not uploaded any documents.\n{% elif uploaded_files|length == 1 -%}\n The only uploaded file is '{{ uploaded_files[0] }}'\n{% elif uploaded_files|length > 1 -%}\n The following list of documents was uploaded to the chat\n\n**Uploaded documents**\n {% for file in uploaded_files -%}\n - '{{ file }}'\n {% endfor %}\n{%- endif %}\n\n{%- endif -%}"
},
"translatorPromptConfig": {
"systemPromptInstruction": "You are a helpful AI designed to to translate text to a specified language.\nDo it even if the target language is the same as the source language.\nMake sure the translated text contains the same amount of carriage returns '\\n' as the original text block.\nTry to keep the translated text as close to the original as possible and having approximately the same lenght.",
"userMessageTemplate": "Please translate the following text pieces in {{format_style}} {% if input_language %}from {{input_language}} {% endif %}to {{output_language}}\n\n{% if glossary %}Use the following translation rules \n {{ glossary_text }}{% endif %}\n\n\n{{formatted_text_pieces}}"
},
"translatorUserConfig": {
"languageModelName": "AZURE_GPT_4_0613",
"maxTokensPerTranlationRequest": 500,
"maxTokenPerMinute": 40000,
"allowedInputLanguages": [
"Afrikaans", "Albanian", "Arabic", "Aragonese", "Armenian", "Azeri", "Bashkir", "Basque", "Belarusian", "Bengali", "Bislama",
"Bosnian", "Breton", "Bulgarian", "Burmese", "Catalan", "Chamorro", "Chechen", "Chinese", "Cornish", "Corsican", "Croatian",
"Czech", "Danish", "Dutch", "English", "Esperanto", "Estonian", "Ewe", "Faroese", "Fijian", "Finnish", "French", "Galician",
"Georgian", "German", "Greek", "Greenlandic", "Guaran\u00ed", "Haitian Creole", "Hausa", "Hebrew", "Hindi", "Hungarian", "Icelandic",
"Ido", "Indonesian", "Interlingua", "Interlingue", "Inuktitut", "Irish", "Italian", "Japanese", "Javanese", "Kannada", "Kazakh", "Khmer",
"Korean", "Kurdish", "Kyrgyz", "Lao", "Latin", "Latvian", "Limburgish", "Lingala", "Lithuanian", "Luxembourgish", "Macedonian", "Malagasy",
"Malay", "Malayalam", "Maltese", "Manx", "Maori", "Marathi", "Marshallese", "Mongolian", "Navajo", "Nepali", "Northern Sami", "Norwegian", "Norwegian Bokm\u00e5l",
"Norwegian Nynorsk", "Occitan", "Ojibwe", "Old Church Slavonic", "Ossetian", "Pashto", "Persian", "Polish", "Portuguese", "Punjabi", "Quechua", "Romanian",
"Romansch", "Russian", "Samoan", "Sanskrit", "Sardinian", "Scottish Gaelic", "Serbian", "Serbo-Croatian", "Sichuan Yi", "Sindhi", "Slovak", "Slovene",
"Somali", "Spanish", "Sundanese", "Swahili", "Swedish", "Tagalog", "Tahitian", "Tajik", "Tamil", "Tatar", "Telugu", "Thai", "Tibetan", "Tongan",
"Tswana", "Turkish", "Turkmen", "Ukrainian", "Urdu", "Uyghur", "Uzbek", "Vietnamese", "Volap\u00fck", "Walloon", "Welsh", "West Frisian", "Yiddish",
"Yoruba", "Zhuang", "Zulu" ]
}
}
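The `userMessageTemplate` above is a Jinja template. To make its conditional parts easier to follow, here is a plain-Python mirror of the same logic. It is a reading aid under stated assumptions, not the renderer the module uses: the separate `glossary` flag and `glossary_text` variable of the template are collapsed into one optional argument, and the values `"JSON"` and `"[...]"` in the example are made up.

```python
from typing import Optional


def build_user_message(
    format_style: str,
    output_language: str,
    formatted_text_pieces: str,
    input_language: Optional[str] = None,
    glossary_text: Optional[str] = None,
) -> str:
    """Mirror of the default userMessageTemplate in plain Python."""
    message = f"Please translate the following text pieces in {format_style} "
    # The source language is only mentioned when it is known.
    if input_language:
        message += f"from {input_language} "
    message += f"to {output_language}\n\n"
    # Glossary rules are only included when a glossary is present.
    if glossary_text:
        message += f"Use the following translation rules \n {glossary_text}"
    message += f"\n\n\n{formatted_text_pieces}"
    return message


# Hypothetical values for illustration only.
without_source = build_user_message("JSON", "German", "[...]")
with_source = build_user_message("JSON", "German", "[...]", input_language="English")
```

When `input_language` is omitted, the message asks only for a translation "to German", which matches the `{% if input_language %}` branch of the template.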
Configuration Parts
The document translator uses multiple configurations corresponding to different services. More information on each part can be found by following the links below:
Author: @Cedric Klinkert
© 2025 Unique AG. All rights reserved.