Overview of Unique FinanceGPT architecture and basic concepts.
...
The ingestion workers process the files that are ingested into the system: PDFs, Word documents, Excel files, PowerPoint presentations, text files, CSV files, and Markdown files. Each file type is processed differently to extract the necessary information. For PDFs, for example, the workers extract titles, subtitles, tables, and other relevant information. The workers convert the files into Markdown, which preserves the structure of the document and helps the models understand the content and generate results from it. The workers also split the documents into chunks, which are saved into a vector database and a Postgres database for efficient storage and retrieval.
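To make the chunking step more concrete, here is a minimal Python sketch that splits a Markdown document into chunks along its headings. It is an illustration only, not the platform's actual chunking logic: the `Chunk` type, the `chunk_markdown` function, and the `max_chars` limit are all hypothetical names and values chosen for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # nearest heading above the chunk text ("" if none)
    text: str     # chunk body, without the heading line


# Matches ATX-style Markdown headings such as "# Title" or "## Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(markdown: str, max_chars: int = 500) -> list[Chunk]:
    """Split Markdown into chunks at headings and at an assumed size limit."""
    chunks: list[Chunk] = []
    heading = ""
    buffer: list[str] = []

    def flush() -> None:
        # Emit the buffered lines as one chunk, skipping empty buffers.
        text = "\n".join(buffer).strip()
        if text:
            chunks.append(Chunk(heading, text))
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the chunk that belongs to the previous heading
            heading = match.group(2).strip()
            continue
        # Start a new chunk when adding this line would exceed the limit.
        if buffer and sum(len(b) + 1 for b in buffer) + len(line) > max_chars:
            flush()
        buffer.append(line)
    flush()
    return chunks
```

Keeping the heading with each chunk is one way to carry document structure along with the text, which is the reason the conversion to Markdown is useful in the first place.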
Ingestion Workflow
The example below shows how a PDF can be parsed with our solution. Custom code can be executed at several points in the process to achieve the best parsing results. Because the solution parses each page separately, it scales well with additional compute. The limits of third-party tooling must be kept in mind, though.
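As a side illustration of the per-page scaling claim (this is not the actual parser), the following Python sketch parses pages concurrently and reassembles them in order. `parse_page` is a hypothetical stand-in for the real PDF-to-Markdown step, and a thread pool is used only to keep the example self-contained; a real deployment would presumably spread pages across worker processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_page(page_text: str) -> str:
    """Hypothetical per-page parser standing in for the real conversion step."""
    return page_text.strip()


def parse_document(pages: list[str], max_workers: int = 4) -> str:
    """Parse all pages concurrently and join them back in page order."""
    # Pages are independent of each other, so they can be parsed concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so page order is kept.
        parsed = list(pool.map(parse_page, pages))
    return "\n\n".join(parsed)
```

Because each page is an independent unit of work, adding workers shortens the wall-clock time until an external limit, such as the rate limit of a third-party tool, becomes the bottleneck.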
The models and methods used in each run can be configured at the following levels:
each file submission (via API)
each folder (Scope)
space
globally
This makes it possible to achieve the best results for each file and document layout.
Similar flows exist for the other file types.
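The configuration levels listed above can be pictured as a layered lookup. The Python sketch below assumes that the most specific level wins (file submission, then folder, then space, then global). That precedence is an assumption made for this example and is not stated in this document; the level names, the `resolve_setting` function, and the `pdf_parser` key are likewise hypothetical.

```python
from typing import Optional

# Assumed precedence, most specific first. The document lists these levels
# but does not state the order in which they override each other.
LEVELS = ("file", "folder", "space", "global")


def resolve_setting(key: str, configs: dict[str, dict[str, str]]) -> Optional[str]:
    """Return the value of `key` from the most specific level that defines it."""
    for level in LEVELS:
        value = configs.get(level, {}).get(key)
        if value is not None:
            return value
    return None  # the key is not configured at any level


# Usage: a folder-level setting overrides the global default.
example = {
    "global": {"pdf_parser": "default"},
    "folder": {"pdf_parser": "table_aware"},
}
chosen = resolve_setting("pdf_parser", example)
```

With this layout, a single file submission could override the folder setting by passing its own `file` entry, without changing anything for other documents.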
[Diagram: PDF ingestion workflow (Drawio)]
Postgres and VectorDB
In the Unique FinanceGPT platform, Postgres and VectorDB are used to store and retrieve data. Postgres stores the Markdown and metadata of the ingested documents, as well as the history of user interactions and chat logs. The metadata in Postgres includes the access scope of each document, which allows access control based on user roles and permissions.
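As a rough sketch of how scope metadata can drive access control, the Python example below filters document records by the scopes a user is allowed to see. The `DocumentRecord` fields and the `visible_documents` function are hypothetical and do not reflect the platform's actual schema; in practice this filter would more likely be part of the database query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentRecord:
    doc_id: str
    scope_id: str  # hypothetical: the scope (folder) the document belongs to
    markdown: str  # Markdown produced by the ingestion workers


def visible_documents(
    records: list[DocumentRecord], allowed_scopes: set[str]
) -> list[DocumentRecord]:
    """Keep only the documents whose scope the user is permitted to access."""
    return [record for record in records if record.scope_id in allowed_scopes]
```

Applying the filter before retrieval results reach the model keeps documents outside a user's scopes out of the generated answers.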
...