MDI+Vision apiPayload Config - 03.03.2025

apiPayload must be provided as a JSON-compatible string, so the JSON object below must be serialized to a string before it is passed.
{
    "extractionMethod": "MDI_VISION",
    "languageModel": "AZURE_GPT_4o_2024_0806",
    "pageContentExtractorMdiConfig": {
        "useHighResolution": true
    },
    "imageContentExtractorConfig": {
        "imagesInParallel": 3,
        "classifierSystemPrompt": "You are an image classifier assistant and help to classify the contents of a cropped image of a document page.\n\nYou are given the whole document page as a reference and the cropped image that you should classify.\n",
        "classifierUserPrompt": "First locate the cropped image within the document page. Only then classify the cropped image into one of the following categories: \n- chart_with_numerical_values: A chart in which numerical text values are present (do not consider the axis values) and can be extracted with high accuracy.\n- chart_without_numerical_values: A chart in which numerical text values are not present (do not consider the axis values) and cannot be extracted with high accuracy.\n- table_structure: A structure that displays data in a tabular format with headers, rows, columns and cells.\n- mixed_content: A combination of different content types, e.g., charts and tables and logos.\n- logo: A logo of a company or brand.\n- icon: A single icon that is a symbol for a tool, product, service, etc, e.g., a tool icon.\n- illustrative_picture: An illustrative picture that only serves to illustrate the text and does not contain any useful, related information.\n\nIn addition to the category, explain your reasoning why you chose the category.\n\nExample output:\n{\n    \"reasoning\": \"Explain your decisions and reasoning on how you classify the image. Keep it short but complete.\",\n    \"category\": \"Here the category\"\n}\n",
        "documentReferencePrompt": "Here is the whole document page as a reference:\n",
        "extractorCategoryToSystemPrompts": {
            "chart_with_numerical_values": "You are an image content extractor and help to extract information in a structured form from a cropped image of a document page. The cropped image contains a chart with numerical values.\n",
            "chart_without_numerical_values": "You are an image content extractor and help to extract information in a structured form from a cropped image of a document page. The cropped image contains a chart without numerical values.\n",
            "default": "You are an image content extractor and help to extract information in a structured form from a cropped image of a document page.\n",
            "logo": "You are an image content extractor and help to extract information in a structured form from a cropped image of a document page. The cropped image contains a logo.\n",
            "mixed_content": "You are an image content extractor and help to extract information in a structured form from a cropped image of a document page. The cropped image contains mixed content, e.g., diagram and table.\n",
            "table_structure": "You are an image content extractor and help to extract information in a structured form from a cropped image of a document page. The cropped image contains a table like structure.\n"
        },
        "extractorCategoryToUserPrompts": {
            "chart_with_numerical_values": "Extract the chart data and structure from the image as a html table and explain your reasoning.\n\nFollow these steps:\n1. Clearly separate what belongs to the chart and what does not using the document image as a reference.\n2. Only consider what belongs to the chart and exclude any information that does not belong to it.\n3. Extract a maximum of ten question and answer pairs about the chart's content.\n4. Then combine the found answers to a description. Do not include the questions in the description. Only describe what the chart is about or describes, not the technical elements of the chart, e.g., \"the chart has a x-axis and a y-axis\".\n5. Represent the text and numerical values of the chart as a table. Do not approximate values or make assumptions.\n6. When color coded values are present in the chart, represent them in the table as text values.\n\nExample output:\n{\n    \"reasoning\": \"Explain your decisions and reasoning on how you create the html table and the description. Keep it short but complete.\",\n    \"image_content\": \"Here the html table and the description of the chart\"\n}\n",
            "chart_without_numerical_values": "Describe the chart in a meaningful way and describe your reasoning.\n\nFollow these steps:\n1. Clearly separate what belongs to the chart and what does not using the document image as a reference.\n2. Only consider what belongs to the chart and exclude any information that does not belong to it.\n3. Extract a maximum of ten question and answer pairs about the chart's content. Do not approximate values or make assumptions.\n4. Then combine the found answers to a description. Do not include the questions in the description. Only describe what the chart is about or describes, not the technical elements of the chart, e.g., \"the chart has a x-axis and a y-axis\".\n\nExample output:\n{\n    \"reasoning\": \"Explain your decisions and reasoning on how you create the description. Keep it short but complete.\",\n    \"image_content\": \"Here the description\"\n}\n",
            "default": "Extract a maximum of ten text question and answer pairs from the image. Then combine the found answers to a description. Do not include the questions in the description. \n\nExample output:\n{\n    \"reasoning\": \"Explain your decisions and reasoning on how you extract the content. Keep it short but complete.\",\n    \"image_content\": \"Here the description\"\n}\n",
            "logo": "Output the company or brand name from the image. Output only the name and nothing else. If the company or company name is unknown to you, then output only the text if possible otherwise nothing.\n\nExample output:\n{\n    \"reasoning\": \"Explain your decisions and reasoning on how you extract the logo. Keep it short but complete.\",\n    \"image_content\": \"Here the company or brand name\"\n}\n",
            "mixed_content": "First identify all the different elements, e.g., charts or diagrams. Then extract all content for each element in a structured way. Use html as structure where possible or use markdown. Make sure to preserve the information structure and the original text. Explain your reasoning for extracting the content.\n\nFollow these steps:\n1. Identify all elements in the image, e.g., charts, tables, logos, etc.\n2. Analyze which information belongs together and must be clustered.\n3. Then extract all content for each cluster in a structured way. \n4. Output the image content in html where possible, otherwise use markdown.\n\nExample output:\n{\n    \"reasoning\": \"Explain your decisions and reasoning on how you extract the content. Keep it short but complete.\",\n    \"image_content\": \"Here the extracted content\"\n}\n",
            "table_structure": "Extract the table like structure from the image as a html table and explain your reasoning.\n\nFollow these steps:\n1. Clearly separate what belongs to the table and what does not using the document image as a reference.\n2. Carefully think about the structure of the table. \n3. Extract the headers (columns/rows) first.\n4. Then assign the cells to the headers and make sure to merge cells whenever they span multiple columns/rows.\n5. Correctly extract the values in the cells and align them with the extracted structure.\n\nExample output:\n{\n    \"reasoning\": \"Explain your decisions and reasoning on how you create the html table. Keep it short but complete.\",\n    \"image_content\": \"Here the html table\"\n}\n"
        },
        "noExtractionForCategories": [
            "illustrative_picture",
            "icon"
        ]
    },
    "pageContentOptimizerConfig": {
        "apply": false,
        "maxLoops": 2,
        "scoreThreshold": 0.95,
        "evaluatorSystemPrompt": "\nYou are a helpful assistant that evaluates the quality of extracted content based on\na document image and the extracted content.\n",
        "evaluatorUserPrompt": "\nPlease evaluate the quality of the extracted information using the document image.\n\nExtracted information: ${current_response}\n\nYour tasks:    \n1. Give instructions on how to improve the extracted information. Be as specific as possible.\n2. Assess whether the extracted information meets the following evaluation criteria:\n    - Information has been completely extracted from the image\n    - Information is structured logically and coherently as in the image\n    - Information is accurate as represented in the image\n    - Numerical values are correct and have a unit of measurement (e.g., 30% CAGR instead of 30%)\n    - Charts have been converted into tables when numerical values have been extracted\n    - No numerical values have been approximated or rounded or interpolated\n    - No values have been added that are not represented in the image\n    - Color coded values have been converted into text\n    - Information from legends have been correctly assigned to the corresponding values\n3. Give a score between 0 and 1 for the quality of the extracted information (0 is bad, 1 is perfect).\n\nExample output:\n{\n    \"improvement_instructions\": \"Here your specific instructions on how to improve the extracted information. Only outline the changes to be made, do not include any other text.\",\n    \"meets_criteria\": false, # Assessment of the criteria listed above, only return true if all relevant criteria are met\n    \"score\": 0.5 # Here the score between 0 and 1\n}\n",
        "generatorSystemPrompt": "\nYou are a helpful assistant that improves content extracted from an image based on feedback\nand the original image.\n",
        "generatorUserPrompt": "\nOriginal extracted content: ${current_response}\nFeedback for improving the extracted content: ${feedback}\n\nAddress all the feedback and improve the extracted content.\nAlso explain how you addressed the feedback.\n\nExample output:\n{\n    \"reasoning\": \"Explain your decisions and reasoning on how you addressed the feedback\",\n    \"improved_content\": \"Here the improved extracted content\"\n}\n"
    }
}
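
The conversion to a string mentioned at the top can be sketched in Python as follows. This is a minimal illustration, not the service's own tooling: the variable names `config` and `api_payload` are hypothetical, the `config` dictionary is an abbreviated stand-in for the full object above, and how the resulting string is handed to the ingestion endpoint is not covered here.

```python
import json

# Abbreviated stand-in for the full configuration object above
# (hypothetical variable name; use the complete object in practice).
config = {
    "extractionMethod": "MDI_VISION",
    "languageModel": "AZURE_GPT_4o_2024_0806",
    "pageContentExtractorMdiConfig": {"useHighResolution": True},
    "imageContentExtractorConfig": {
        "imagesInParallel": 3,
        "documentReferencePrompt": "Here is the whole document page as a reference:\n",
        "noExtractionForCategories": ["illustrative_picture", "icon"],
    },
    "pageContentOptimizerConfig": {
        "apply": False,
        "maxLoops": 2,
        "scoreThreshold": 0.95,
    },
}

# json.dumps escapes quotes and newlines inside the prompt texts, so the
# result is one valid JSON string without raw line breaks.
api_payload = json.dumps(config)

# Sanity check: parsing the string yields the original object again.
assert json.loads(api_payload) == config
```

Serializing with a JSON library rather than by hand avoids escaping mistakes in the long prompt strings, which contain nested quotes and `\n` sequences.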
Author	@Martin Fadler