
The SharePoint Connector connects Microsoft SharePoint™️ from Microsoft 365 (SharePoint Online, not on-premises or other deployments) to Unique FinanceGPT via Microsoft Power Automate™️. This document outlines the architecture, flows, requirements, setup and limitations.

Connector Variants

Currently, only one variant is supported by this concept: Microsoft SharePoint Online, the cloud version from Microsoft.

Versions without Power Automate are not supported at this time.

Other deployment models can be supported if they also offer Power Automate and Custom Connectors, or if Unique or the customer invests further in other platforms.

  • Power Automate & Microsoft SharePoint (Cloud / Microsoft 365) - ✅ supported

  • Power Automate & Microsoft SharePoint (on premise) - 🔍 to be investigated

  • Power Automate & on premise file system - 🔍 to be investigated

Architecture

The architecture detailed here is based on the Power Automate & Microsoft SharePoint (Cloud / Microsoft 365) variant.

The Unique SharePoint connector consists of 4 Power Automate flows (Automated Cloud Flows) that will be provided by Unique as an exported Power Automate Solution (ZIP file).

Clients need to import and set up the Solution in their own environment’s Power Automate.

The architecture of the SharePoint connector with the scheduled sync approach makes the SharePoint integration more scalable and more stable than an event-based approach. The reason is that event-based SharePoint triggers in Power Automate have a one-to-one relationship with a SharePoint library, whereas the scheduled approach uses the SharePoint API and can process multiple libraries with the same Power Automate Solution.

The benefits of the scheduled approach over an event-based approach (using SharePoint triggers) are:

  • Multiple libraries can be synced without the need to duplicate Power Automate flows.

  • More reliable because it does not depend on SharePoint trigger events

  • Scheduled operation allows control over the time and frequency of the sync

In the following sections, the logic of the SharePoint connector is shown in an overview diagram and each individual flow is explained in its dedicated section.

[Diagram: SharePoint Synchronisation Workflows]

Control what is available in the Knowledge Center

To enable files for ingestion into the Knowledge Center in Unique FinanceGPT, a custom column has to be added in SharePoint. This column allows you to control whether a file is enabled or disabled for use in Unique FinanceGPT.

A default value can be set for this column:

  • enabled: Files get ingested by default when they are added / created in the library. Knowledge can be removed from Unique by manually setting the column value for an entry to disabled.

  • disabled: Files do not get ingested by default when they are added / created in the library. Knowledge can be added to Unique by manually setting the column value for an entry to enabled; this triggers the ingestion process for that file. Knowledge can be removed again from Unique by setting the column value back to disabled.

More on how to set up the custom column in SharePoint follows in the “Setup” section.

File ingestion process

To ingest the content into the Knowledge Center, Unique needs to access the content of the file. We do this securely by temporarily uploading the content to an Azure Blob Storage and creating a SAS URI that provides time-limited (3 hours by default) read access to the specified file. This SAS URI is then passed to the Unique Ingestion API along with the file’s metadata and is used to read the file content and ingest it as knowledge into the Knowledge Center.

A Blob Storage within the Unique environment (Landing Zone) is used to temporarily store the files for the purpose of ingesting the content into the knowledge center.

The flow where the Blob Storage is used works like this:

  1. Power Automate Flow is run by scheduler

  2. Unique API is called with the file’s metadata

    1. Unique “reserves” a path on the Blob Storage where the file can get uploaded to

    2. Two SAS URIs are created for that path, one with read and one with write permission

    3. The Unique API returns both SAS URIs to the Power Automate flow

  3. The write SAS URI is used to upload the file to the Blob Storage.

  4. Unique API is called again with the read SAS URI to ingest the file

    1. Unique reads and ingests the file from the Blob Storage

    2. The file will expire (get deleted) on the Blob Storage after 1 day

The Blob Storage is set up so that files stored there expire (get deleted) 1 day after they have been created.
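
For illustration, the same sequence expressed outside of Power Automate could look like the following minimal Python sketch. The endpoint routes and payload field names (content/upload-registration, content/ingest, writeUrl, readUrl) are assumptions for readability only; the real routes and payloads are defined by the Unique Ingestion API and configured via the environment variables described in the “Setup” section.

import requests

UNIQUE_API = "https://api.example-unique-env.com"  # hypothetical base URL
ACCESS_TOKEN = "<token from the IDP Get Access Token flow>"
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

def ingest_file(local_path: str, metadata: dict) -> None:
    # 1. Register the file's metadata; Unique "reserves" a path on the Blob Storage
    #    and returns a read SAS URI and a write SAS URI for that path.
    registration = requests.post(
        f"{UNIQUE_API}/content/upload-registration",  # hypothetical route
        json=metadata,
        headers=HEADERS,
    )
    registration.raise_for_status()
    urls = registration.json()

    # 2. Upload the file content to the Blob Storage using the write SAS URI.
    #    A plain PUT to Azure Blob Storage needs the x-ms-blob-type header.
    with open(local_path, "rb") as f:
        upload = requests.put(
            urls["writeUrl"],  # hypothetical field name
            data=f,
            headers={"x-ms-blob-type": "BlockBlob"},
        )
    upload.raise_for_status()

    # 3. Pass the read SAS URI back to the Unique API to trigger ingestion.
    #    Unique reads the blob and ingests it; the blob expires after 1 day.
    ingest = requests.post(
        f"{UNIQUE_API}/content/ingest",  # hypothetical route
        json={"readUrl": urls["readUrl"], **metadata},
        headers=HEADERS,
    )
    ingest.raise_for_status()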

Permissions

The permission concept allows anyone who has edit permission for a file to enable or disable it for use in Unique FinanceGPT.

This approach requires no further setup. All users who have edit permission to change the custom column in SharePoint or to add / remove files in the SharePoint library can manage content for Unique FinanceGPT.

It is important to understand that everyone who can edit a file can also manage its use in Unique FinanceGPT.

This option requires educating all users with edit permissions about Unique FinanceGPT so that they know how to manage the content they have access to.

Authorization

In order to call the Unique API, the Power Automate flows need a valid access token to authorize themselves. The request to get this token is also made in the Power Automate flows; it uses Basic auth with client credentials to obtain the token.

The client credentials are provided to the customer by Unique. The customer will receive a clientId / clientSecret and is responsible for keeping those safe.

There are two options for how to store client credentials and access them in the flows:

  • Store in Azure Key Vault (recommended) and use the Azure Key Vault connector in Power Automate to fetch the values from there.

  • Store them in variables directly in the flow and access the values directly there.

We recommend the first option: store the client credentials in Azure Key Vault and access them from there using the Key Vault connector in the Power Automate flows. This is more secure, and the client credentials are never visible to anyone with access to the Power Automate flows. This option requires the Azure Key Vault connector in your flows.

The second option is less secure because the client credentials are stored as plain text in variables in the Power Automate flows. You can enable secure inputs / outputs on the variables so the secrets do not show up in the flow run outputs. However, they will still be visible to everyone who can edit the flows.

The “IDP Get Access Token” flow handles this part - more on this in the “Setup” section.
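
For reference, the token request made by that flow is a standard OAuth2 client-credentials call with Basic auth, roughly equivalent to the minimal sketch below. The endpoint URL and any additional parameters (such as a project id or scope) depend on the Unique / Zitadel setup and are assumptions here.

import requests

TOKEN_ENDPOINT = "https://id.example-unique-env.com/oauth/v2/token"  # value of uq_idp_token_endpoint
CLIENT_ID = "<clientId provided by Unique>"
CLIENT_SECRET = "<clientSecret provided by Unique>"

def get_access_token() -> str:
    # Basic auth carries the client credentials; the body requests the
    # client_credentials grant. Additional parameters may be required
    # depending on the Zitadel project setup.
    response = requests.post(
        TOKEN_ENDPOINT,
        auth=(CLIENT_ID, CLIENT_SECRET),
        data={"grant_type": "client_credentials"},
    )
    response.raise_for_status()
    return response.json()["access_token"]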

Scoping

The scoping feature is a way for the customer to define which users or groups have access to which files. This feature can also be enabled within the SharePoint connector (for this we created a separate Solution).

How does it work?

The scoping is based on the folder structure in the SharePoint library. One folder represents one scope.

Example:

Library

--- Folder A

-------- Folder B

--- Folder C

In this case we create 3 scopes on the Unique side (one scope for Folder A, one for Folder B, one for Folder C). Even though Folder B is nested in Folder A, it has its own scope, and on the Unique side the customer needs to specify who has access to it: a user who has access to the scope that represents Folder A does not automatically have access to the scope that represents Folder B (i.e. hierarchies are not taken into account).

When the customer wants to use the scoping feature, we provide the scoping feature Solution, which contains the following steps:

  • When a file is created or modified within a folder, we check if a scope already exists for that folder. If not, we create it on the Unique side and then ingest the file with that scope.

  • When a folder with files is deleted, the scope is also deleted on the Unique side.

  • Please note that each time a new scope is created, no user has access to it; which user has access to which scope needs to be set up manually on the Unique side.

Some limitations were discovered during development; they are listed in the “Known Limitations & Issues” section below.

Scope naming conventions

The scopes that are synced to Unique from SharePoint have an externalId attribute (that is how they can be differentiated from internal scopes) and use the following patterns:

  • externalId: ext_SharepointLibraryName_CurrentFolderId (example: ext_Documents_1)

  • name: absolute file path to current folder (example: https://uniqueapp.sharepoint.com/sites/QA/Documents/Private/)
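
A minimal sketch of how these two values can be composed, assuming the library name, folder id, and absolute folder URL have already been retrieved from the SharePoint API (this helper is illustrative and not part of the delivered Solution):

def build_scope(library_name: str, folder_id: str, folder_url: str) -> dict:
    # Follows the documented patterns: externalId = ext_<Library>_<FolderId>,
    # name = absolute path to the folder. Illustrative only.
    return {
        "externalId": f"ext_{library_name}_{folder_id}",  # e.g. ext_Documents_1
        "name": folder_url,  # e.g. https://uniqueapp.sharepoint.com/sites/QA/Documents/Private/
    }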

Power Automate Flows

“Scheduler” flow

The “Scheduler” flow in the solution is responsible for triggering the sync of the files from SharePoint to Unique’s knowledge base. This flow is straightforward and consists of only 2 actions.

  • Recurrence

    • Clients can set up the frequency with which they want to run the sync

  • Run a Child Flow

    • Triggers the sync process by calling the “Sharepoint Files Scan” flow

 

“Sharepoint Files Scan” flow

The “Sharepoint Files Scan” flow in the solution holds the main logic part and loops through the sites and their libraries. In the process it calls the child flows “IDP Get Access Token” and “Ingest Content” which are described in more detail in their dedicated sections.

A list of SharePoint Sites that contain content to sync in some or all of their libraries can be provided as an environment variable. The main logic in the flow uses this input and executes a series of nested loops.

  • For each provided SharePoint Site → get all libraries

  • For each found library

    • Check if FinanceGPT column exists (that means there is content to be synced)

    • Get properties of all files that are enabled for syncing

    • For each file → add it to the array of files to be synced

    • Call Unique “file sync” API with the list of files and compare with the already existing content

      • Deleted files → if content exists in the knowledge base but the matching file is no longer in the enabled files list, the content gets deleted from the knowledge base.

      • Modified / Moved files → if content exists in the knowledge base and the matching file has been updated, it will be added to the list for ingestion.

      • New files → if there are new enabled files that do not have matching content in the knowledge base, they will be added to the list for ingestion.

    • Call the child flow “Ingest Content” with the information about the Site, Library and the list of new and modified files returned from the Unique API.
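
To make the comparison step concrete, the categorization described above can be pictured with the following minimal sketch. The field names and the timestamp-based modification check are assumptions; in practice the comparison is performed by the Unique “file sync” API, and the flow only sends the list of enabled files and receives the categorized result.

def diff_files(existing_content: dict, enabled_files: dict) -> dict:
    # Keys are stable file identifiers; values carry a "modified" timestamp.
    deleted = [key for key in existing_content if key not in enabled_files]
    new = [key for key in enabled_files if key not in existing_content]
    modified = [
        key for key in enabled_files
        if key in existing_content
        and enabled_files[key]["modified"] > existing_content[key]["modified"]
    ]
    # Deleted content is removed from the knowledge base; new and modified
    # files are handed over to the "Ingest Content" flow.
    return {"deleted": deleted, "new": new, "modified": modified}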

“Ingest Content” flow

The “Ingest Content” flow is responsible for handling the calls to the Unique ingestion API and uploading the file content to the blob storage for ingestion. The “IDP Get Access Token” flow is also called in this flow to get a token for the API calls. The flow then loops through the provided list of new and modified file ids.

For each file id:

  • Get the file’s properties and metadata

  • Get the content of the file (condition for different ways to get content for files and SharePoint Pages)

  • Call Unique Ingestion API

    • 1. Call to create the content in the backend using the metadata information. Returns the read and write URLs for the blob storage.

    • 2. Call to upload the file to the blob storage using the returned write URL.

    • 3. Call to update the content with the read URL of the blob storage after the upload, to trigger ingestion of the uploaded content.

“IDP Get Access Token” flow

Azure Key Vault

The “IDP Get Access Token” flow is responsible for getting an access token that can be used to call the Unique APIs. Unique sets up a service user for each client that has the necessary permissions to call the ingestion APIs. The service user’s client credentials are provided to the clients and are used to get a token in this flow.

The recommended approach for handling the client credentials of the service user is to store them securely in Key Vault; Unique advises against storing them in Power Automate. By default, the flow uses the Azure Key Vault connector to safely use the client credentials when making the API call to get an access token.

If the client decides against using Key Vault for safely storing the client credentials, the flows need to be adapted to use the username + password flow (see next section).

Username + Password (not recommended)

If you are not using a key vault, you can use the IDP Get Access Token - username + password flow. You will need to configure the related environment variables and modify the child flow called in the Get Access Token actions in both the Sharepoint Files Scan and Ingest Content flows.

To do so, go to the action in the flows, and select the username + password flow in the drop-down menu.

Requirements

The following requirements must be met in order to use the SharePoint Connector provided by Unique:

  • SharePoint Online

    • The documents that should be synced into the knowledge base must be in a SharePoint Online library

  • Power Automate connectors needed

    • SharePoint (required)

      • Used to connect to SharePoint Online Sites / Libraries

    • HTTP - Premium connector (required)

      • Used to make HTTP calls to the Unique APIs

    • Azure Key Vault - Premium connector (optional, but strongly recommended)

      • Used to get the clientId / clientSecret stored in Key Vault

      • This is optional because Azure Key Vault does not have to be used to store the client credentials of the service user; they can also be stored directly in Power Automate. Unique advises using Azure Key Vault for securely storing the client credentials.

Setup

Add custom column in SharePoint

A custom column needs to be added to a SharePoint library to control the ingestion of the files. The custom column’s setup:

  1. In SharePoint, add a custom column by clicking the “+ Add column” button. Select the column type “Yes/No”.

  2. Name the custom column. This will be the name you need to use when setting up the Power Automate flow variables. Also set the default value for new files that get added to the library.

  3. After saving and creating the column, you can optionally format the column to make the selected values more obvious to the users in SharePoint. Select the column’s dropdown > Column settings > Format this column. There select “Format yes and no” (you can also choose the colors by editing the styles).

Setting up the SharePoint Connector’s Power Automate flows

The setup process for the Unique SharePoint Connector consists of the following steps:

  1. Import the Power Automate Solution provided by Unique

  2. Configure the environment variables while / after importing the solution

The steps are performed in Power Automate, which you can reach by navigating to the Microsoft Power Automate portal.

Import the Power Automate Solution

Unique provides the Unique SharePoint connector to Customers as an exported Power Automate solution, which is a ZIP file. Along with the ZIP file, the Customers receive client credentials and all necessary values for configuring the environment variables.

In Power Automate navigate to the “Solutions” tab on the left side. You should see an overview of all existing solutions. On the top, click the “Import solution” button and you will be prompted to provide a file. Upload and import the ZIP file that Unique provided containing the Power Automate solution for the Unique SharePoint connector.

Configure environment variables

There are two logical sets of environment variables; like the rest of the objects within the solution, they carry the global prefix ufgpt:

  • sp_xxx : related to the SharePoint setup

  • uq_xxx : related to the Unique setup

The SharePoint variables must be configured as follows:

  • sp_domain : the root Sharepoint domain

  • The Sharepoint sites to synchronise are stored in a Sharepoint list that needs to be accessible from the Power Automate Flow. The site hosting the list does not have to be synchronised. Access to the list is managed by 2 environment variables:

    • sp_list_hosting_site : the site where the list of sites to synchronise is hosted

    • sp_sites_list_name : the name of the list

      • the URLs of the sites listed MUST be stored in a column named exactly Url

      • you can pass the list display name in sp_sites_list_name, but it is recommended to pass the list id to prevent disruption should the list name be inadvertently modified. (Go to the list, then Settings > List settings; the list id can be found in the URL as List={the-list-id}.)

  • sp_sync_column_name : the name of the Unique column that controls file synchronisation
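
For illustration, a hypothetical set of values for these variables could look like the example below. The exact expected format for each value (for example bare domain versus full URL) is provided by Unique together with the configuration values.

  sp_domain : contoso.sharepoint.com
  sp_list_hosting_site : https://contoso.sharepoint.com/sites/FinanceGPT-Admin
  sp_sites_list_name : FinanceGPT Sites (or, preferably, the list id)
  sp_sync_column_name : FinanceGPT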

The Unique variables must be configured as follows:

  • uq_file_sync_endpoint : the URL of the /file-diff endpoint on the ingestion service

  • uq_idp_project_id: the Zitadel project id

  • uq_idp_token_endpoint : the Unique token issuer endpoint

  • uq_ingestion_graphql : the /graphql endpoint on the ingestion service

  • uq_owner_type & uq_scope : these two variables must be set up in accordance with each other. Possible values are:

    • uq_owner_type : SCOPE or COMPANY

    • uq_scope: COMPANY or PATH or the scope Id applicable to all synced files

    • Possible sets of values and behaviour (a small validation sketch follows at the end of this section):

      • If both variables are set to COMPANY, the files will be available to the whole company.

      • If uq_scope is set to PATH, uq_owner_type must be set to SCOPE: each folder generates its own permission scope, which is applied to the files directly in it but not to subfolders; subfolders get their own scopes, as the permission model flattens the folder structure.

      • If uq_scope is set to a specific scope id applicable to all synchronised files, uq_owner_type must be set to SCOPE.

⚠️ Any other set of values will fail the ingestion calls ⚠️

  • uq_store_internally: boolean value that defines whether or not the synced documents should also be stored internally at Unique. This allows users access to the file (e.g.: when clicking references in the chat) without the need to have access to the SharePoint library. Also needed to use the PDF preview / highlighting feature. Default value = false.

 

  • If you are using the username + password credential flow (not recommended), set up the following 2 variables:

    • uq_idp_client_id : Zitadel client id

    • uq_idp_client_secret : Zitadel client secret

Power Automate does not handle empty environment variables well, so if they are not used, these variables should be left at the default empty string value "".
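
As referenced above, the allowed combinations of uq_owner_type and uq_scope can be summarised in a small validation sketch; it simply encodes the rules listed in this section and is not part of the delivered Solution.

def validate_owner_and_scope(owner_type: str, scope: str) -> bool:
    if owner_type == "COMPANY" and scope == "COMPANY":
        return True  # files are available to the whole company
    if owner_type == "SCOPE" and scope == "PATH":
        return True  # one scope per folder, created by the connector
    if owner_type == "SCOPE" and scope not in ("COMPANY", "PATH"):
        return True  # a specific scope id applied to all synced files
    return False     # any other combination fails the ingestion calls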

Scopes Considerations

Scope types

3 types of scopes are available:

  • PATH : file access scopes are attributed per SharePoint folder. The folder hierarchy is flat, meaning that access to a folder grants access neither to subfolders nor to parent folders. Individual scopes must be attributed manually in the backend.

  • COMPANY : all ingested files are available at the company level

  • Scope Id: all ingested files are attributed to a single specific scope identified by its scope id (can be different from the company scope id).

SP Service User Scope Access

The connector is in charge of creating the needed scopes when the scope is set to PATH. The service user is automatically granted READ/WRITE permission on all the scopes it creates, and only on those.

Known Limitations & Issues

Ingestion of linked content in SharePoint Pages

What can be ingested from SharePoint pages has limitations.

What works:

  • All text content

What does not work:

  • Linked document libraries

  • Linked content in general

The limitation stems from the fact that we fetch the content of the SharePoint page from SharePoint’s API via the Power Automate flow, and what we receive there is the actual content of the page. Linked content, like an embedded Document Library widget, cannot be ingested because it is just a link / embedding / iFrame that shows content on the page but is loaded from elsewhere (it is not present in the content we fetch from the API).

Splitting the flows

In its current form, a failure to sync a single file shows the whole scheduled synchronisation as failed. Debugging can be cumbersome, as you have to go through each iteration to eventually find the culprit. This also means that the whole sync might be retried until cancellation or resolution.

One way to mitigate this side effect would be to split the ingestion flow and decouple the calls to Unique from the SharePoint calls, so that the actual file ingestion flow is triggered at the file level. This would create an unwanted side effect though: as Zitadel is not able to return the currently valid token, it creates a new token for each call. This means that we would rapidly hit the token issuance limit from the file ingestion flow.

This could be mitigated by scheduling token issuance, storing the token in the key vault, and having the single-file ingestion flow fetch the token from the key vault rather than from the token endpoint. For this, the service principal connecting to the key vault must have write access to the key vault.


Resources


Author

see Parent

 
