Motivation
The Upload & Ask interface allows users to upload documents that are not part of the existing knowledge base, or to limit the search scope to specific documents. This is useful whenever answers must be drawn exclusively from certain documents to keep retrieval precise and relevant. The primary motivation is to let users chat exclusively with the uploaded document(s), so that responses are based solely on the content of those documents. The feature is particularly useful for summarising large documents that exceed the context window of large language models, because the content can be processed in multiple rounds of summarisation.
Goal
The goal of the Upload & Ask assistant is to facilitate in-depth conversations and queries against specific documents uploaded by the user. By restricting the AI's responses to the content of these documents, users can obtain precise and relevant information, enhancing the accuracy and focus of their interactions.
Structure and Logic of Assistant
Document Upload:
Users can upload one or multiple documents directly through the chat interface.
These documents are then ingested by the system.
Ingestion and Processing:
The uploaded documents are split into chunks of a suitable size for processing.
Each chunk is semantically embedded so that it can be searched and summarised effectively (a minimal sketch of this ingestion step follows below).
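Chunking and embedding are handled by the platform during ingestion; the following minimal Python sketch only illustrates the idea. The chunk size, the overlap and the `embed_text` callable are assumptions for the sketch, not the platform's actual parameters.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Chunk:
    document_id: str
    text: str
    embedding: list[float]

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split a document into overlapping character windows."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def ingest(document_id: str, text: str, embed_text: Callable[[str], list[float]]) -> list[Chunk]:
    """Chunk an uploaded document and embed each chunk for semantic search."""
    return [Chunk(document_id, part, embed_text(part)) for part in chunk_text(text)]
```

In the real assistant the resulting embeddings live in the vector database that the SearchInVectorDB module queries, restricted to the current chat when scopeToChatOnUpload is enabled.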
Functionality:
Document Search: Users can perform a combined semantic and full-text search within the uploaded documents to find specific information. This works like Internal Knowledge Search, but the scope is limited to the uploaded documents.
Document Summarisation: Users can request summaries of the uploaded documents. Documents that are too long for a single model call are handled through an iterative summarisation process (a sketch of both functionalities follows this list).
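Both functionalities can be pictured with a short, purely illustrative Python sketch. `scoped_search` mimics a combined semantic and full-text search limited to the chunks of the uploaded documents, and `summarise_long_document` mirrors the map-reduce pattern used for iterative summarisation: summarise each part, then combine the partial summaries. The `Chunk` shape, the `embed_text` embedding callable and the `complete` chat-model callable are assumptions for the sketch, not the platform's actual API.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class Chunk:  # same shape as in the ingestion sketch above
    document_id: str
    text: str
    embedding: list[float]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def scoped_search(
    query: str,
    chunks: list[Chunk],                       # only chunks of the uploaded document(s)
    embed_text: Callable[[str], list[float]],  # hypothetical embedding callable
    top_k: int = 5,
) -> list[Chunk]:
    """Combined semantic and (naive) full-text search, scoped to the uploaded documents."""
    query_vector = embed_text(query)
    keywords = query.lower().split()

    def score(chunk: Chunk) -> float:
        semantic = cosine(query_vector, chunk.embedding)
        # Toy full-text component: fraction of query words that occur in the chunk.
        lexical = sum(word in chunk.text.lower() for word in keywords) / max(len(keywords), 1)
        return semantic + lexical

    return sorted(chunks, key=score, reverse=True)[:top_k]

def summarise_long_document(
    parts: list[str],
    request: str,
    complete: Callable[[str], str],  # hypothetical call to the language model
) -> str:
    """Iterative (map-reduce) summarisation for documents exceeding the context window."""
    # Map step: summarise each document part on its own.
    partial_summaries = [
        complete(f"Request: {request}\n\nDocument part:\n{part}") for part in parts
    ]
    # Reduce step: combine the partial summaries into one final summary.
    # If the combined text is still too long, this step can be repeated.
    joined = "\n\n".join(partial_summaries)
    return complete(f"Request: {request}\n\nSummaries:\n{joined}")
```

The trigger prompts of the DocumentSummarizer module in the example file below follow exactly this two-step pattern: a partial-summary prompt and a combine-summaries prompt.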
Possible Adaptation of Assistant
The Upload & Ask assistant can be adapted and customised:
Enhanced Prompting: The assistant can be tailored by adjusting the system and trigger prompts of its modules (see the example input file below).
Required and optional modules
The following modules are required/optional for this assistant:
| Required | Optional |
|---|---|
Example AI Assistant input file:
❗ The upload toggle in the space settings must be activated to allow documents to be uploaded to the chat ❗
{ "modules": [ { "name": "SearchInVectorDB", "configuration": { "languageModel": "AZURE_GPT_4_0613", "chunkedSources": true, "historyIncluded": false, "maxTokens": 7000, "scopeToChatOnUpload": true, "searchType": "COMBINED", "systemPromptSearch": "You are helping the employees with their questions. You will find below a question, some information sources and the past conversation (they are delimited with XML tags).\n\nAnswer the employee's question using ONLY facts from the sources or past conversation. Information helping the employee's question can also be added.\n\nIf not specified, format the answer using an introduction followed by a list of bullet points. The facts you add should ALWAYS help answering the question.\n\nSTRICTLY reference each fact you use. A fact is preferably referenced by ONLY ONE source e.g [sourceX]. If you use facts from past conversation, use [conversation] as a reference.\n\nHere is an example on how to reference sources (referenced facts must STRICTLY match the source number):\n- Some information retrieved from source N°X.[sourceX]\n- Some information retrieved from source N°Y and some information retrieved from source N°Z.[sourceY][sourceZ]\n- Some information retrieved from past conversation.[conversation]", "triggerPromptSearch": "new_question:\n```\nUSER_MESSAGE\n```\n\nsources:\n```\nSEARCH_CONTEXT\n```\n\npast_conversation:\n```\n<conversation>HISTORY</conversation>\n```\n\nnew_question:\n```\nUSER_MESSAGE\n```\n\nAnswer concisely in LANGUAGE and ALWAYS reference each of your facts:", "systemPromptChatUpload": "You are helping the employees with their questions. You will find below my question and an uploaded document (delimited with XML tags).\n\nYour task is to assist me, an employee, by providing me responses to my question, based on PURELY the information available in the uploded document as your only information source.\nSTRICTLY reference each fact you use. Here is an example on how to reference used facts:\n###\n- Information retrieved from source X.[sourceX]\n- Information retrieved from source Y.[sourceY]\n###\n\nYou are reluctant of making any claims unless they are stated by the uploaded document or past conversation. If there is a situation where you cannot provide an answer based solely on the available sources, please inform me accordingly.\n\nIf the question is talking about 'it', 'this document' or 'the document', the question is refering to all content in the uploaded document.\nIf the question is asking about the content of the document (e.g. 'What is it about?', 'What is the content of this document?'), provide a concise summary of one or two paragraphs.", "triggerPromptChatUpload": "question:\n```\nUSER_MESSAGE\n```\n\nuploaded document:\n```\nSEARCH_CONTEXT\n```\n\nquestion:\n```\nUSER_MESSAGE\n```\n\nAnswer in LANGUAGE.\nAnswer using ONLY information from the uploaded document and ALWAYS reference each of your facts:", "systemPromptSearchString": "Below is a history of the previous conversation and a question asked by the user (delimitated by XML tags). Follow these steps:\n\nStep 1: Translate the user question to english.\n\nStep 2: Verify if the new question relates with the previous conversation. If the new question does not relate then say for Step 2 '<not_a_follow_up>', otherwise say '<follow_up>'.\n\nStep 3: Generate a search query in English optimised for a vector database search by combining the english translation with relevant information from the previous conversation. 
The query must be a sentence, instruction or question and in English.\n\nStep 4: Output ALWAYS a JSON object structured like: {\"translation\": user question translated to english, \"relation\": <not_a_follow_up> or <follow_up>, \"search_query\": updated search query}\n\nExample:\n{\n\"translation\": \"How many live there?\",\n\"relation\": \"<follow_up>\",\n\"search_query\": \"How many Tweeka live in Columbia (South America)?\"\n}", "triggerPromptSearchString": "Previous conversation:\n```\nLAST_3_MESSAGES\n```\n\nUser question:\n<new_question>USER_MESSAGE</new_question>\n\nOutput in JSON format:" }, "isExternal": false, "toolDefinition": { "type": "function", "function": { "name": "SearchInVectorDB", "parameters": { "type": "object", "required": [ "instruction", "language" ], "properties": { "language": { "type": "string", "description": "The language used by the user in their prompt (e.g. English, French, German, etc.)." }, "instruction": { "type": "string", "description": "The semantic query to search. Should be the form of a semantic query containing all relevant information." } } }, "description": "Search in the employee knowledge base for information on policies, procedures, benefits, groups, or specific people. This should be your go-to tool if no other tools are applicable." } }, "weight": 10000 }, { "name": "DocumentSummarizer", "configuration": { "languageModel": "AZURE_GPT_35_TURBO_16K", "systemPromptNameExtraction": "You are a helpfull AI designed to extract the file name of a document, from a user message\nthe user Message will be delimited by periods ```. \n1. If you can not extract a filename and the user refers to \"this\", \"that\", \"it\", \"the file\", \"the file i uploaded\", \"the document\" or \"them\": Put the fileName as \"THIS\"\n2. if you can not extract a filename and the user does not refer to any document: Put the fileName as \"NONE\"\n\n\nthe expected out put is a JSON in the format: \n\n{\n \"explanation\": \"<explain what you did>\",\n \"fileName\": \"<filename of the document>\"\n}\n\nexamples: \n\n\"summarize document hello.txt\" \nlead to:\n{\n \"explanation\": \"I extracted the filename hello.txt\",\n \"fileName\": \"hello.txt\"\n}\n\n\n\"document help desk information.pdf summarization\" \nlead to:\n{\n \"explanation\": \"I extracted the filename 'help desk information.pdf'\",\n \"fileName\": \"help desk information.pdf\"\n}\n", "systemPromptPartialSummary": "Welcome to your specialized role as a Summary Maestro with a twist of intuition! Your expertise is not only invaluable in distilling complex information into concise, insightful executive summaries but also in sensing the unspoken needs of each user. As you receive inputs, your task extends beyond transforming them into clear, engaging summaries. You're also interpreting the subtleties within each request, reading between the lines to grasp and reflect the user's implied style preferences in your summaries. This means you're not just condensing information; you're weaving a narrative that resonates with the nuanced desires of each user. Your summaries should capture the essence of the content while subtly mirroring the tone, formality, and character suggested by the user's request. You're crafting not just a bridge between detailed information and its essential insights, but also aligning it with the unique stylistic undertones desired by the user. 
Let’s create every summary as a masterpiece of clarity, conciseness, and intuitive understanding!", "triggerPromptNameExtraction": "\nUser message:\n\n```\nUSER_MESSAGE\n```\n\nJSON output:\n", "triggerPromptPartialSummary": "Request: USER_MESSAGE\n\nDocument part:\nDOCUMENT_PARTS", "systemPromptCombineSummaries": "Welcome to your role as an Insightful Summary Artisan! In this specialized position, you are not just a creator of summaries but also a discerning interpreter of style. Your challenge is twofold: you will receive partial summaries based on user inputs, and your task is to weave these pieces into a complete, compelling executive summary. Here's where your intuitive prowess comes into play: you'll read between the lines of each user request to grasp the unspoken style preferences. Whether it's a formal tone, a casual narrative, or a specific thematic approach, your goal is to detect and embody these subtleties in the final summary. Consider yourself a detective of nuances, expertly blending the provided information with the inferred style to produce a summary that not only informs but resonates with the user's implicit expectations. You're crafting not just summaries, but personalized narratives that reflect the unique tone and essence requested by each user. Let's delve into this journey of creating nuanced, user-tailored summaries!", "triggerPromptCombineSummaries": "Request: USER_MESSAGE\n\nSummaries:\nSUMMARIES" }, "isExternal": false, "toolDefinition": { "type": "function", "function": { "name": "DocumentSummarizer", "parameters": { "type": "object", "required": [ "instruction", "language" ], "properties": { "language": { "type": "string", "description": "The language used by the user in their prompt (e.g. English, French, German, etc.)." }, "instruction": { "type": "string", "description": "The instruction given by the employee e.g., 'Summarize this document' or 'Summarize it'." } } }, "description": "This function is specifically designed for summarization tasks. It is triggered when the user directly requests for a summary of a specific document such as 'Summarize document Quaterlyreport.pdf', 'summarize this document', or 'summarize it'. However, this function is not to be used if the request for summarization pertains to a previously given answer or a text above. For instance, 'Summarize the last answer in two sentences' would not necessitate the use of this function." } }, "weight": 5500 } ] }
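Each module's toolDefinition above uses the common JSON-Schema function-calling format, which is what lets the orchestrating language model decide per message whether to route a request to SearchInVectorDB or DocumentSummarizer. The Python sketch below is only a conceptual illustration of that dispatch; `choose_tool` is a hypothetical stand-in for the platform's internal orchestration, not a real API.

```python
import json
from typing import Callable

def load_tool_definitions(path: str) -> list[dict]:
    """Collect the toolDefinition of every module from an assistant input file."""
    with open(path, encoding="utf-8") as fh:
        assistant = json.load(fh)
    return [module["toolDefinition"] for module in assistant["modules"]]

def handle_message(
    user_message: str,
    tools: list[dict],
    choose_tool: Callable[[str, list[dict]], dict],  # hypothetical: lets the LLM pick a tool
    handlers: dict[str, Callable[[dict], str]],      # e.g. {"SearchInVectorDB": ..., "DocumentSummarizer": ...}
) -> str:
    """Route the user message to the module selected by the language model."""
    call = choose_tool(user_message, tools)          # e.g. {"name": "DocumentSummarizer", "arguments": {...}}
    return handlers[call["name"]](call["arguments"])
```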
| Author |
|---|