Confluence OnPrem Connector (COPC)
Solution Overview
The COPC is a standalone, dockerized NodeJS application that runs on a configurable schedule and synchronizes the Confluence OnPrem data with the Unique FinanceGPT service.
The COPC uses the Confluence REST API to fetch the data and the Unique Ingestion API to ingest the data into the FinanceGPT chat.
Confluence users can use the label functionality of Confluence to determine which pages should get ingested.
There are two labels to choose from that indicate whether a page should be synced with FinanceGPT:
ai-ingest: This label syncs the labeled page.
ai-ingest-all: This label syncs the labeled page and all of its sub-pages (recursively).
Pages whose label has been removed are deleted from the chat on the next sync.
The label names (ai-ingest and ai-ingest-all) can be changed via the env file.
The COPC uses a Confluence service user to make API requests. It is recommended to create this service user specifically for the COPC and to grant it the appropriate access rights to pages and spaces.
The COPC uses the following CQL (Confluence Query Language) query to get the pages that should be synced:
cql=(label="ai-ingest") OR (label="ai-ingest-all")&expand=metadata.labels,version&os_authType=basic&limit=${limit}&start=${start}
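Issued by hand against a local test instance, the query above corresponds to a request like the following. This is a sketch: the /rest/api/content/search endpoint path and the admin/admin credentials are assumptions based on the local setup described later in this document.

```shell
# Build the search URL from the connector's CQL (limit/start are paging
# parameters; the values here are illustrative).
CONFLUENCE_URL="http://localhost:1990/confluence"
limit=25
start=0
SEARCH_URL="${CONFLUENCE_URL}/rest/api/content/search?cql=(label=\"ai-ingest\") OR (label=\"ai-ingest-all\")&expand=metadata.labels,version&os_authType=basic&limit=${limit}&start=${start}"
echo "GET ${SEARCH_URL}"
# Live request (lets curl URL-encode the CQL), e.g.:
#   curl -s -u admin:admin -G "${CONFLUENCE_URL}/rest/api/content/search" \
#     --data-urlencode 'cql=(label="ai-ingest") OR (label="ai-ingest-all")' \
#     --data-urlencode 'expand=metadata.labels,version'
```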
The COPC runs through all the labeled pages twice: once to find the IDs of all pages that should be ingested, and once to ingest them.
First query run:
The COPC collects the labeled pages and sends them to Unique's file-diff endpoint to determine which files are new or updated (to ingest), which files were deleted, and which files were moved.
Second query run:
In the second run, the COPC goes through all pages (and their sub-pages) that need to be ingested one by one and ingests them via the Ingestion API.
Docker Image
The COPC is publicly available as a docker image:
docker pull ghcr.io/unique-ag/confluence-connector:latest
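A typical invocation might look as follows. This is a sketch: the port mapping assumes the default APP_PORT of 8083, and the .env file name is an assumption (see the ENV variable section below for the variables it should contain).

```shell
# Compose the run command; the env file carries the variables documented
# in the ENV section (CLIENT_ID, CLIENT_SECRET, CONFLUENCE_URL, ...).
IMAGE="ghcr.io/unique-ag/confluence-connector:latest"
RUN_CMD="docker run --rm -p 8083:8083 --env-file .env ${IMAGE}"
echo "${RUN_CMD}"
```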
General Recommendations
Use a PAT (Personal Access Token) for the Confluence service user with the necessary access rights for authentication.
Set TEST_MODE=true when running the connector for the first time to observe performance, duration, etc.
Use the COPC's GET endpoint /sync to manually trigger a synchronization. Example: localhost:8083/sync
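With the connector running locally on the default port, the manual trigger looks like this (a sketch; the curl form assumes the endpoint requires no authentication):

```shell
# Manually trigger a synchronization via the COPC's GET endpoint.
SYNC_URL="http://localhost:8083/sync"
echo "GET ${SYNC_URL}"
# e.g.: curl -s "${SYNC_URL}"
```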
Use the CRON_SCHEDULE only after the first initial real ingestion has finished. Once a night should suffice in most cases (0 1 * * *).
Be conservative with the CONFLUENCE_TOKENS_PER_MINUTE rate limiter setting so that you do not overload your OnPrem Confluence server.
Requirements
The connector must be able to reach the Confluence OnPrem installation and the Unique FinanceGPT.
The connector must have a user with read access to all spaces and pages that should be synced. Authentication can happen either through basic auth (username + password) or through a PAT (Personal Access Token) generated in Confluence.
The connector must use the service user provided by Zitadel to authenticate against the Unique Ingestion API.
The Confluence OnPrem server version must be 6.13.23 or higher.
ENV Variables for the COPC:
To configure the COPC, the following env variables are available:
APP_PORT
(required)
The port of the COPC. Default: 8083
CLIENT_ID
(required)
The Zitadel service user that has permission to ingest data into FinanceGPT
CLIENT_SECRET
(required)
The Zitadel service user's access token
CONFLUENCE_TOKENS_PER_MINUTE
Rate limiter for the API requests to Confluence. 1 request = 1 token. Default: 250
CONFLUENCE_URL
(required)
The URL of your Confluence server. On localhost this is http://localhost:1990/confluence
Important: Include the http / https prefix.
CONFLUENCE_PAT
(required unless CONFLUENCE_USERNAME and CONFLUENCE_PASSWORD are set)
Personal Access Token of the Confluence service user. The COPC will make the Confluence API requests with this user.
CONFLUENCE_USERNAME
CONFLUENCE_PASSWORD
Basic-auth alternative to CONFLUENCE_PAT, intended for testing purposes. On localhost, these are both "admin".
CRON_SCHEDULE
Defines how often the COPC should sync the Confluence data with FinanceGPT using the cron format: "* * * * *"
INGESTION_URL
(required)
The ingestion endpoint of FinanceGPT. Example: https://gateway.<baseUrl>/ingestion/v1/content
Important: Include the http / https prefix.
INGEST_ALL_LABEL
(required)
The confluence label that defines which page and its sub-pages will get ingested (recursively). Default: "ai-ingest-all"
INGEST_SINGLE_LABEL
(required)
The confluence label that defines which page will get ingested. Default: "ai-ingest"
TEST_MODE
When test mode is set to true, the COPC will run the process without ingesting. Default: false
OAUTH_TOKEN_URL
The Zitadel endpoint that generates a valid token for ingestion. Example: https://id.<baseUrl>/oauth/v2/token
Important: Include the http / https prefix.
PROJECT_ID
(required)
The FinanceGPT project ID from Zitadel for which the service user will generate a token
SCOPE_ID
The Knowledge Base scope in FinanceGPT into which the data will be ingested. If no scope ID is given, the connector auto-creates a scope for each Confluence space and ingests the documents into the respective scope.
DEBUG_MODE
When debug mode is set to true, all outputs are written into the log file. Default: false
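Putting the variables together, a minimal configuration might look like the following env file. This is a sketch: all values are placeholders, and <baseUrl>, the IDs, and the tokens must be replaced with your own.

```shell
# Minimal COPC configuration (placeholder values).
APP_PORT=8083
CLIENT_ID=<zitadel-service-user-id>
CLIENT_SECRET=<zitadel-service-user-secret>
CONFLUENCE_URL=https://confluence.example.com
CONFLUENCE_PAT=<confluence-personal-access-token>
INGESTION_URL=https://gateway.<baseUrl>/ingestion/v1/content
OAUTH_TOKEN_URL=https://id.<baseUrl>/oauth/v2/token
PROJECT_ID=<financegpt-project-id>
# Recommended: dry-run first, then schedule a nightly sync.
TEST_MODE=true
CRON_SCHEDULE=0 1 * * *
```

Such a file can be passed to the container, for example via docker run's --env-file flag.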
Delete and reset ingested files manually
If your /sync does not automatically delete ingested files, the cause might be a wrong configuration during testing, with the files being associated with the wrong space, Confluence instance, project, etc.
You can use the following DELETE endpoint of the COPC to manually trigger a reset. It deletes all ingested Confluence pages for a given scope ID so you can start again with a clean slate:
localhost:8083/reset/:scopeId
It is possible that /reset needs specific parameters to identify the files correctly. For this, you can provide a partialKey in the request body. This might be your Confluence URL (same as the env value) or the space prefix (spaceId_spaceKey).
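A reset call might then look like this (a sketch; the scope ID and the partialKey value are placeholders to be replaced with your own):

```shell
# Reset all ingested Confluence pages for one scope. The optional
# partialKey narrows down which files belong to this Confluence instance.
SCOPE_ID="your-scope-id"   # placeholder
RESET_URL="http://localhost:8083/reset/${SCOPE_ID}"
echo "DELETE ${RESET_URL}"
# e.g.: curl -s -X DELETE "${RESET_URL}" \
#         -H 'Content-Type: application/json' \
#         -d '{"partialKey": "https://confluence.example.com"}'
```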
Example Helmfiles
Example helmfiles can be found in the release repo: confluence-connector.yaml
Local Setup
Set up the Atlassian Plugin SDK to run a local confluence instance:
Follow this guide: https://developer.atlassian.com/server/framework/atlassian-sdk/set-up-the-atlassian-plugin-sdk-and-build-a-project/
Follow it up until the "create a plugin" step; there is no need to do that part.
Run Atlassian instance:
The `atlassian-confluence/server` folder contains a tutorial on how to make a macro. The macro itself is not needed; we only need the working server so that we can access Confluence locally.
From `atlassian-confluence/server` folder run the command `atlas-run`.
This will take some time on the first run. When it is done, you should be able to reach your local Confluence instance at `localhost:1990/confluence`.
Credentials for local login: username and password are both "admin".
Here is an example REST API URL that gets all pages with the `ai-ingest` and `ai-ingest-all` labels and expands their labels and version (so they appear in the JSON response):
Read more about cql here: https://developer.atlassian.com/server/confluence/advanced-searching-using-cql/
Properties listed under _expandable are returned as empty strings but can contain data once expanded. For example, _expandable.body is generally empty; if you add the query parameter &expand=body.value, you will see the body.
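For instance, the body can be added to the expand list of the connector's label query. This is a sketch: the /rest/api/content/search endpoint path and the admin/admin credentials are assumptions based on the local setup.

```shell
# Same label search as the connector uses, but additionally expanding
# the page body so it appears in the JSON response.
BASE="http://localhost:1990/confluence/rest/api/content/search"
EXPAND="metadata.labels,version,body.value"
echo "GET ${BASE}?cql=...&expand=${EXPAND}"
# e.g.: curl -s -u admin:admin -G "$BASE" \
#         --data-urlencode 'cql=(label="ai-ingest") OR (label="ai-ingest-all")' \
#         --data-urlencode "expand=${EXPAND}"
```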
Run the confluence scanner:
From add-ins/atlassian-confluence run
© 2024 Unique AG. All rights reserved.