Vision apiPayload Config - 03.03.2025

apiPayload must be provided as a JSON-compatible string, so the JSON object below must be serialized to a string before it is passed.
{
    "extractionMethod": "VISION",
    "languageModel": "AZURE_GPT_4o_2024_0806",
    "pageContentExtractorVisionConfig": {
        "systemPrompt": "You are a helpful assistant that extracts content from an image.",
        "userPrompt": "First identify all the different elements, e.g., charts or diagrams. Then extract all content for each element in a structured way. Use HTML as the structure where possible, otherwise use Markdown. Make sure to preserve the information structure and the original text. Explain your reasoning for extracting the content.\n\nFollow these steps:\n1. Identify all elements in the image, e.g., charts, tables, logos, etc.\n2. Analyze which information belongs together and must be clustered.\n3. Then extract all content for each cluster in a structured way.\n4. Convert charts into tables when numerical values are present.\n5. Convert color-coded values into text.\n6. Extract information from legends and assign it to the corresponding values.\n7. Output the image content in HTML where possible, otherwise use Markdown.\n\nExample output:\n{\n    \"reasoning\": \"Explain your decisions and reasoning on how you extract the content. Keep it short but complete.\",\n    \"image_content\": \"Here the extracted content\"\n}\n"
    },
    "pageContentOptimizerConfig": {
        "apply": false,
        "maxLoops": 2,
        "scoreThreshold": 0.95,
        "evaluatorSystemPrompt": "You are a helpful assistant that evaluates the quality of extracted content based on\na document image and the extracted content.\n",
        "evaluatorUserPrompt": "Please evaluate the quality of the extracted information using the document image.\n\nExtracted information: ${current_response}\n\nYour tasks:\n1. Give instructions on how to improve the extracted information. Be as specific as possible.\n2. Assess whether the extracted information meets the following evaluation criteria:\n    - Information has been completely extracted from the image\n    - Information is structured logically and coherently as in the image\n    - Information is accurate as represented in the image\n    - Numerical values are correct and have a unit of measurement (e.g., 30% CAGR instead of 30%)\n    - Charts have been converted into tables when numerical values have been extracted\n    - No numerical values have been approximated or rounded or interpolated\n    - No values have been added that are not represented in the image\n    - Color-coded values have been converted into text\n    - Information from legends has been correctly assigned to the corresponding values\n3. Give a score between 0 and 1 for the quality of the extracted information (0 is bad, 1 is perfect).\n\nExample output:\n{\n    \"improvement_instructions\": \"Here your specific instructions on how to improve the extracted information. Only outline the changes to be made, do not include any other text.\",\n    \"meets_criteria\": false, # Assessment of the criteria listed above, only return true if all relevant criteria are met\n    \"score\": 0.5 # Here the score between 0 and 1\n}\n",
        "generatorSystemPrompt": "You are a helpful assistant that improves content extracted from an image based on feedback\nand the original image.\n",
        "generatorUserPrompt": "Original extracted content: ${current_response}\nFeedback for improving the extracted content: ${feedback}\n\nAddress all the feedback and improve the extracted content.\nAlso explain how you addressed the feedback.\n\nExample output:\n{\n    \"reasoning\": \"Explain your decisions and reasoning on how you addressed the feedback\",\n    \"improved_content\": \"Here the improved extracted content\"\n}\n"
    }
}
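
The conversion of the JSON object above into a string can be sketched in Python as follows. This is a minimal illustration, not the service's own tooling: the `config` dict is a shortened stand-in for the full object (prompt texts abbreviated, optimizer prompts omitted), and the helper name `to_api_payload` is hypothetical. Only the standard-library `json` module is used.

```python
import json

# Shortened stand-in for the full config above; prompt texts are
# abbreviated here purely for illustration.
config = {
    "extractionMethod": "VISION",
    "languageModel": "AZURE_GPT_4o_2024_0806",
    "pageContentExtractorVisionConfig": {
        "systemPrompt": "You are a helpful assistant that extracts content from an image.",
        "userPrompt": "First identify all the different elements ...\n\nFollow these steps:\n1. ...",
    },
    "pageContentOptimizerConfig": {
        "apply": False,
        "maxLoops": 2,
        "scoreThreshold": 0.95,
    },
}


def to_api_payload(config: dict) -> str:
    """Serialize the config dict into a single JSON string (hypothetical helper)."""
    # ensure_ascii=False keeps any non-ASCII prompt text readable in the string.
    return json.dumps(config, ensure_ascii=False)


api_payload = to_api_payload(config)

# Round-trip check: the string must parse back to the same object.
assert json.loads(api_payload) == config
```

Serializing with `json.dumps` rather than assembling the string by hand takes care of escaping the quotes and newlines inside the prompt texts, and Python's `False` is written as JSON `false`.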
Author: @Martin Fadler