File Access Permissions

Introduction and Overview

Description

File Access Control is a new permission system that stores access control information directly on individual files/chunks rather than relying on folder-based permissions. This allows permissions to be adjusted at a fine-grained level and solves a scalability issue in the current folder-based system, where search queries become too long when a user has access to many folders (Elasticsearch queries cap out at around 1,000 folders).

Benefits

Current Folder-Based System Issues:

  • During search, the system enumerates all folders a user has access to

  • Builds queries with all folder IDs, making them extremely long

  • Elasticsearch queries cap out around 1,000 folders

  • Files/chunks only contain folder ownership information, not user access rights

  • Requires expensive detour: collect user folders → find files in those folders

New File-Based System Benefits:

  • Each file/chunk stores its own access control information

  • Search queries compare user groups against file access permissions

  • More efficient queries, since users typically belong to fewer groups than the number of folders they can access

  • Eliminates the 1,000 folder limit in Elasticsearch queries

  • Seamless switching between systems via feature flag

Functionality

Procedure

  1. Content Upload: When files are uploaded via contentUpsert, the system automatically determines and stores access permissions on each file/chunk

  2. Permission Format: Access permissions are stored as formatted strings like u:user123R (user), g:group456W (group)

  3. Search: Instead of enumerating folders, search compares the user's groups against the file access permissions

  4. Feature Flag: System can switch between folder-based and file-based access control seamlessly (see below).
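The search step can be pictured as a set intersection between the tokens a user holds and the entries stored on a file. The following is a minimal in-memory sketch of that idea, not the actual Elasticsearch query. The names principalTokens and canRead are hypothetical, and the sketch assumes that any of the access types R, W, or M makes a file visible in search.

```typescript
// Hypothetical sketch: in-memory version of the search-time access check.
// Assumption: R, W, and M all make a file visible to the holder.
const ACCESS_TYPES = ["R", "W", "M"] as const;

// Builds every fileAccess string that would grant this user access.
function principalTokens(userId: string, groupIds: string[]): Set<string> {
  const tokens = new Set<string>();
  for (const access of ACCESS_TYPES) {
    tokens.add(`u:${userId}${access}`);
    for (const groupId of groupIds) {
      tokens.add(`g:${groupId}${access}`);
    }
  }
  return tokens;
}

// A file is visible if any of its fileAccess entries matches a user token.
function canRead(fileAccess: string[], tokens: Set<string>): boolean {
  return fileAccess.some((entry) => tokens.has(entry));
}

const tokens = principalTokens("user123", ["group456"]);
console.log(canRead(["g:group456W"], tokens)); // true
console.log(canRead(["u:admin789M"], tokens)); // false
```

The number of tokens grows with the user's group count (three per user or group in this sketch), not with the number of folders the user can reach.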

The file access information is composed during the content upload process through the contentUpsert mutation, which stores the formatted permissions in the fileAccess field of the Content/Chunk records.

Permissions are stored as formatted strings with the pattern: {entityType}:{entityId}{accessType}

  • Entity Types:

    • u = User

    • g = Group

  • Access Types:

    • R = Read

    • W = Write

    • M = Manage

Examples:

  • u:user123R = User "user123" has Read access

  • g:group456W = Group "group456" has Write access

  • u:admin789M = User "admin789" has Manage access
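The pattern above is simple enough to parse with a single regular expression. The sketch below is illustrative only: the type and function names are hypothetical, and it assumes the entity ID is non-empty and that the access type is always the final character of the string.

```typescript
// Hypothetical parser for the {entityType}:{entityId}{accessType} pattern.
type EntityType = "u" | "g";
type AccessType = "R" | "W" | "M";

interface FileAccessEntry {
  entityType: EntityType;
  entityId: string;
  accessType: AccessType;
}

// Assumption: the entity ID is non-empty and the access type is the last character.
const FILE_ACCESS_PATTERN = /^([ug]):(.+)([RWM])$/;

// Returns the parsed entry, or null if the string is not well formed.
function parseFileAccess(value: string): FileAccessEntry | null {
  const match = FILE_ACCESS_PATTERN.exec(value);
  if (match === null) {
    return null;
  }
  return {
    entityType: match[1] as EntityType,
    entityId: match[2],
    accessType: match[3] as AccessType,
  };
}

// Inverse of parseFileAccess: renders an entry back into its string form.
function formatFileAccess(entry: FileAccessEntry): string {
  return `${entry.entityType}:${entry.entityId}${entry.accessType}`;
}

console.log(parseFileAccess("u:user123R"));
console.log(parseFileAccess("x:user123R")); // null
```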

Permission Management

Understanding Permission Inheritance:

  • When folder permissions change, file permissions are automatically updated

  • Changes propagate to all files within the affected scope

  • User and group permissions are inherited from scope access settings
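Inheritance can be thought of as a pure function from a scope's access settings to the fileAccess strings of the files in that scope. The sketch below illustrates the idea only; the ScopeAccessEntry shape and its USER/GROUP and READ/WRITE/MANAGE values are assumptions, since the real scope access model is not shown in this document.

```typescript
// Hypothetical shape; the real scope access model is not documented here.
interface ScopeAccessEntry {
  entityType: "USER" | "GROUP";
  entityId: string;
  accessType: "READ" | "WRITE" | "MANAGE";
}

const ENTITY_PREFIX = { USER: "u", GROUP: "g" } as const;
const ACCESS_SUFFIX = { READ: "R", WRITE: "W", MANAGE: "M" } as const;

// Derives the fileAccess strings a file would inherit from its scope.
function inheritFileAccess(scopeAccess: ScopeAccessEntry[]): string[] {
  const entries = scopeAccess.map(
    (a) => `${ENTITY_PREFIX[a.entityType]}:${a.entityId}${ACCESS_SUFFIX[a.accessType]}`,
  );
  // Deduplicate and sort so that repeated rebuilds yield identical output.
  return Array.from(new Set(entries)).sort();
}

console.log(
  inheritFileAccess([
    { entityType: "USER", entityId: "user123", accessType: "READ" },
    { entityType: "GROUP", entityId: "group456", accessType: "WRITE" },
  ]),
);
```

Under this model, a folder permission change means recomputing this list once and writing it to every file in the affected scope.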

Manual Permission Updates:

  • File permissions are automatically managed by the system

  • Manual intervention should not be necessary

  • Contact technical support if permission issues persist

Configuration and Management

As an admin user, you have control over the file access control feature:

Feature Flag Management

The feature is controlled by the FEATURE_FLAG_ENABLE_FILE_BASED_ACCESS_UN_12660 environment variable:

# Enable file-based access control
FEATURE_FLAG_ENABLE_FILE_BASED_ACCESS_UN_12660: "true"

# Disable file-based access control (use folder-based)
FEATURE_FLAG_ENABLE_FILE_BASED_ACCESS_UN_12660: "false"

The feature flag must be enabled in both the knowledge-upload node-ingestion and node-scope-management services.
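As an illustration of how a service might branch on this flag, the sketch below reads it from an environment map. The function name is hypothetical, and the parsing rule (only the exact string "true" enables the feature) is an assumption rather than the documented behaviour of the services.

```typescript
const FLAG_NAME = "FEATURE_FLAG_ENABLE_FILE_BASED_ACCESS_UN_12660";

type AccessMode = "file-based" | "folder-based";

// Assumption: only the exact string "true" enables the feature; anything else,
// including an unset variable, falls back to folder-based access control.
function resolveAccessMode(env: Record<string, string | undefined>): AccessMode {
  return env[FLAG_NAME] === "true" ? "file-based" : "folder-based";
}

console.log(resolveAccessMode({ [FLAG_NAME]: "true" })); // file-based
console.log(resolveAccessMode({})); // folder-based
```

In a Node service, the environment map would typically be process.env.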

Rollback

File-based access control can be disabled by setting the feature flag to "false".

# Revert to folder-based access control
FEATURE_FLAG_ENABLE_FILE_BASED_ACCESS_UN_12660: "false"

The feature flag must be disabled in both the knowledge-upload node-ingestion and node-scope-management services.

Update via API

GraphQL

ContentUpsert Mutation

The primary way to create or update content with file access permissions:

mutation ContentUpsert($input: ContentCreateInput!, $scopeId: String) {
  contentUpsert(input: $input, scopeId: $scopeId) {
    title
    key
    id
    fileAccess
  }
}

Example Request:

{
  "input": {
    "fileAccess": [
      "u:333174122332880907R"
    ],
    "id": "cont_wdld44kgwygm74oy6zx9erok",
    "key": "elasticserch array - Google Search.pdf",
    "mimeType": "application/pdf",
    "ownerType": "SCOPE"
  },
  "scopeId": "scope_poxirj5hyhn5w1gq7elfhuwt"
}

Example Response:

{
  "data": {
    "contentUpsert": {
      "title": null,
      "key": "elasticserch array - Google Search.pdf",
      "id": "cont_wdld44kgwygm74oy6zx9erok",
      "fileAccess": [
        "u:333174122332880907R"
      ]
    }
  }
}
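For callers that assemble the request programmatically, the sketch below builds the JSON body of a GraphQL-over-HTTP POST for this mutation. The ContentInput interface lists only the fields used in the example request and is an assumption about the input type; the function name is hypothetical, and the GraphQL endpoint URL is not given in this document, so sending the request is left out.

```typescript
// Query text copied from the ContentUpsert mutation in this section.
const CONTENT_UPSERT = `
  mutation ContentUpsert($input: ContentCreateInput!, $scopeId: String) {
    contentUpsert(input: $input, scopeId: $scopeId) {
      title
      key
      id
      fileAccess
    }
  }
`;

// Assumption: only the fields shown in the example request are modelled here.
interface ContentInput {
  key: string;
  mimeType: string;
  ownerType: string;
  fileAccess: string[];
  id?: string;
}

// Builds the JSON body of a GraphQL-over-HTTP POST request.
function buildContentUpsertBody(input: ContentInput, scopeId: string): string {
  return JSON.stringify({ query: CONTENT_UPSERT, variables: { input, scopeId } });
}

const requestBody = buildContentUpsertBody(
  {
    key: "elasticserch array - Google Search.pdf",
    mimeType: "application/pdf",
    ownerType: "SCOPE",
    fileAccess: ["u:333174122332880907R"],
  },
  "scope_poxirj5hyhn5w1gq7elfhuwt",
);
console.log(requestBody);
```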

REST

Upload Content with File Access

Endpoint: POST /api/v1/content

Headers:

Content-Type: application/json
Authorization: Bearer <access_token>

Example Request:

curl -X 'POST' \
  'http://localhost:8091/v1/content' \
  -H 'accept: */*' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "key": "elasticserch array - Google Search.pdf",
    "mimeType": "application/pdf",
    "ownerType": "SCOPE",
    "scopeId": "scope_poxirj5hyhn5w1gq7elfhuwt",
    "fileAccess": ["u:333174122332880907R"],
    "source": {
      "kind": "UNIQUE_BLOB_STORAGE"
    }
  }'

Example Response:

{
  "id": "cont_wdld44kgwygm74oy6zx9erok",
  "source": {
    "id": "src_bmpyl5l71zfobvu00vwod9ap",
    "kind": "UNIQUE_BLOB_STORAGE",
    "name": null,
    "ownerId": "333174122332880907",
    "ownerType": "USER",
    "createdAt": "2025-09-03T05:44:21.650Z",
    "updatedAt": "2025-09-03T05:44:21.650Z"
  },
  "mimeType": "application/pdf",
  "collectionName": "333174122332815371",
  "metadata": {
    "key": "elasticserch array - Google Search.pdf",
    "url": null,
    "title": null,
    "folderId": "scope_poxirj5hyhn5w1gq7elfhuwt",
    "mimeType": "application/pdf",
    "companyId": "333174122332815371",
    "contentId": "cont_wdld44kgwygm74oy6zx9erok",
    "validAsOf": "2025-09-09T09:48:55.469Z",
    "folderIdPath": "uniquepathid://scope_poxirj5hyhn5w1gq7elfhuwt",
    "externalFileOwner": null
  },
  "fileAccess": [
    "u:333174122332880907R"
  ]
}

Data Migration

The File Access Data Migration system rebuilds file access permissions for all content items in the Knowledge Base. It ensures all content has properly formatted file access permissions based on ownership and scope relationships.

The migration runs in two phases:

  1. Mark Phase: Identifies and marks content that needs file access rebuild

  2. Execute Phase: Processes marked content and rebuilds file access permissions

Migration Methods

This data migration rebuilds file-based access for companies that don’t have it yet. It uses FileAccessMaintenanceService.rebuildAllCompaniesFileAccess() so that content gets correct file-access records and search/access behavior is consistent.

Behaviour

  • Scope: Only companies that currently have no file access data (rebuildAll: false → targets “empty” file access).

  • Config used in the migration:

batchSize: 250, waitTimeMs: 50, rebuildAll: false, triggerReindexing: true, mergeExistingFileAccesses: false.

  • Execution: Runs once per deployment when the migration runs (see below). No scheduling inside the app.

When it runs: deployment (default path)

  • Automatically triggered on deployment.

The node-ingestion deployment hook runs, in order:

  1. prisma migrate deploy (schema migrations)

  2. RUNNING_MODE=data-migration node ... main.js (data migrations)

So whenever this migration is included in a release, it runs once as part of that deployment for each environment. This is the default and recommended way for companies to get (or refresh) file-based access: no extra steps.

Other ways to run file-access rebuild (exceptional only)

A Kubernetes CronJob runs the file-access rebuild for companies when the service is started with RUNNING_MODE=file-access-rebuild. It uses FileAccessMaintenanceService.rebuildSpecificCompaniesFileAccess() and, with the default config, rebuilds all companies.

Use this CronJob only when you need to:

  • Run a full rebuild for all companies

  • Rebuild for specific companies

Configuration

Environment variable | Value | Description
RUNNING_MODE | file-access-rebuild | Enters file-access-rebuild mode.
FILE_ACCESS_REBUILD_BATCH_SIZE | 250 | Batch size for processing.
FILE_ACCESS_REBUILD_WAIT_TIME_MS | 50 | Delay between batches (ms).
FILE_ACCESS_REBUILD_REBUILD_ALL | true | Rebuild all content (ignore existing file access).
FILE_ACCESS_REBUILD_TRIGGER_REINDEXING | true | Trigger reindex after rebuild.
FILE_ACCESS_REBUILD_MERGE_EXISTING_FILE_ACCESSES | true | Merge with existing file accesses.
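To make the mapping from environment variables to rebuild settings concrete, here is a small sketch of a config loader. The fallback values mirror the values listed above; the helper names and the parsing rules (base-10 integers, the literal string "true" for booleans) are assumptions, not the service's actual implementation.

```typescript
interface RebuildConfig {
  batchSize: number;
  waitTimeMs: number;
  rebuildAll: boolean;
  triggerReindexing: boolean;
  mergeExistingFileAccesses: boolean;
}

type Env = Record<string, string | undefined>;

// Uses the fallback when the variable is unset or not an integer.
function readInt(env: Env, key: string, fallback: number): number {
  const parsed = Number.parseInt(env[key] ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Uses the fallback when the variable is unset; otherwise only "true" is true.
function readBool(env: Env, key: string, fallback: boolean): boolean {
  const raw = env[key];
  return raw === undefined ? fallback : raw === "true";
}

// Fallbacks follow the values listed in this section (assumed to be defaults).
function loadRebuildConfig(env: Env): RebuildConfig {
  return {
    batchSize: readInt(env, "FILE_ACCESS_REBUILD_BATCH_SIZE", 250),
    waitTimeMs: readInt(env, "FILE_ACCESS_REBUILD_WAIT_TIME_MS", 50),
    rebuildAll: readBool(env, "FILE_ACCESS_REBUILD_REBUILD_ALL", true),
    triggerReindexing: readBool(env, "FILE_ACCESS_REBUILD_TRIGGER_REINDEXING", true),
    mergeExistingFileAccesses: readBool(
      env,
      "FILE_ACCESS_REBUILD_MERGE_EXISTING_FILE_ACCESSES",
      true,
    ),
  };
}

console.log(loadRebuildConfig({ FILE_ACCESS_REBUILD_REBUILD_ALL: "false" }));
```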

Migrations can also be triggered through HTTP endpoints. This method is well suited for:

  • Fine-grained control over the migration parameters

  • Testing and debugging

  • Manual intervention

  • Monitoring progress in real time

Configuration Parameters

Batch Size (batchSize)

  • Type: number

  • Default: 100

  • Description: Number of content items to process in each batch

  • Recommendation:

    • Start with 100 for most cases

    • Increase to 250-500 for faster processing if system can handle it

    • Decrease to 50 if experiencing database load issues

Wait Time (waitTimeMs)

  • Type: number

  • Default: 250

  • Description: Wait time in milliseconds between batches

  • Recommendation:

    • 250ms is a good balance for most systems

    • Increase to 500-1000ms if database is under heavy load

    • Decrease to 100ms for faster processing (if system can handle it)

Rebuild All (rebuildAll)

  • Type: boolean

  • Default: false

  • Description: If true, rebuilds all content regardless of existing fileAccess permissions. If false, only rebuilds content with empty fileAccess.

  • Recommendation:

    • Use false for normal migrations (only empty fileAccess)

    • Use true when you need to force rebuild all content (e.g., after schema changes)
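The interaction between batchSize and waitTimeMs can be sketched as a simple batching loop. This is a generic illustration rather than the service's code: the function names are hypothetical, and the sketch assumes the pause happens between batches only, with no pause after the last one.

```typescript
// Splits items into consecutive batches of at most batchSize elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Processes each batch in order, pausing waitTimeMs between batches
// (assumption: no pause after the final batch).
async function processInBatches<T>(
  items: T[],
  batchSize: number,
  waitTimeMs: number,
  handle: (batch: T[]) => Promise<void>,
): Promise<void> {
  const batches = toBatches(items, batchSize);
  for (let i = 0; i < batches.length; i++) {
    await handle(batches[i]);
    if (i < batches.length - 1) {
      await sleep(waitTimeMs);
    }
  }
}
```

Under these assumptions, 1,500 items with batchSize 100 and waitTimeMs 250 yield 15 batches and 14 pauses, so the run spends at least 3.5 seconds waiting in addition to the processing time itself.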

Migration Process Details

Phase 1: Mark Phase

  1. Content Selection:

    • If rebuildAll: false: Selects content with empty fileAccess

    • If rebuildAll: true: Selects all content for the company

  2. State Update:

    • Sets fileAccessState.status to PENDING

    • Creates a FileAccessStateHistory record with phase MARK

    • Processes content in batches with configurable wait times

  3. Completion:

    • Updates state history with total marked count

    • Sets status to COMPLETED or FAILED

Phase 2: Execute Phase

The execute phase processes content in three categories:

  1. Scope Content:

    • Processes content owned by scopes

    • Uses scope access permissions to build file access

  2. Non-Scope Content:

    • Processes content owned by users or chats

    • Uses owner-based permissions to build file access

  3. Orphaned Content:

    • Processes content that doesn't fit into the above categories

    • Handles edge cases and missing data

For each content item:

  • Retrieves appropriate file access permissions

  • Updates fileAccess field

  • Sets fileAccessState.status to COMPLETED or FAILED

  • Handles errors gracefully

After processing:

  • Optionally triggers reindexing if triggerReindexing: true

  • Cleans up any remaining PENDING content

  • Updates state history with final statistics
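The three execute-phase categories can be illustrated as a classification on the content owner. This is a sketch under assumptions: the ContentOwner shape and the "CHAT" owner type are hypothetical (only "SCOPE" and "USER" appear in this document's examples), and treating missing owner data as orphaned is a guess at what "edge cases and missing data" covers.

```typescript
type RebuildCategory = "scope" | "non-scope" | "orphaned";

// Hypothetical minimal view of a content record's ownership.
interface ContentOwner {
  ownerType: string | null;
  ownerId: string | null;
}

// Assumption: "SCOPE", "USER", and "CHAT" are the known owner types;
// anything else, or missing owner data, is treated as orphaned.
function classifyContent(content: ContentOwner): RebuildCategory {
  if (content.ownerId === null || content.ownerType === null) {
    return "orphaned";
  }
  switch (content.ownerType) {
    case "SCOPE":
      return "scope";
    case "USER":
    case "CHAT":
      return "non-scope";
    default:
      return "orphaned";
  }
}

console.log(
  classifyContent({ ownerType: "SCOPE", ownerId: "scope_poxirj5hyhn5w1gq7elfhuwt" }),
); // scope
```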

Method 1: HTTP API Requests

Prerequisites

  • Authentication token with CHAT_ADMIN_ALL role

  • Access to the node-ingestion service API

  • Base URL: https://your-domain.com/v1/maintenance/file-access

API Endpoints

1. Mark Content for Rebuild

Endpoint: POST /v1/maintenance/file-access/mark

Description: Phase 1 - Marks content that needs file access rebuild processing. Sets fileAccessState to PENDING for content with empty fileAccess.

Request Body:

{
  "batchSize": 100,
  "waitTimeMs": 250,
  "rebuildAll": false
}

Parameters:

  • batchSize (number, optional, default: 100): Number of items to process in each batch

  • waitTimeMs (number, optional, default: 250): Wait time in milliseconds between batches

  • rebuildAll (boolean, optional, default: false): If true, rebuilds all content regardless of existing fileAccess permissions. If false, only rebuilds content with empty fileAccess.

Response:

{
  "totalMarked": 1500,
  "stateId": "abc123-def456-ghi789"
}

Example:

curl -X POST https://your-domain.com/v1/maintenance/file-access/mark \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "batchSize": 100,
    "waitTimeMs": 250,
    "rebuildAll": false
  }'

2. Rebuild Marked Content

Endpoint: POST /v1/maintenance/file-access/rebuild-marked-content

Description: Phase 2 - Rebuilds content that was previously marked for file access rebuild processing.

Request Body:

{
  "batchSize": 100,
  "waitTimeMs": 250,
  "triggerReindexing": true
}

Parameters:

  • batchSize (number, optional, default: 100): Number of items to process in each batch

  • waitTimeMs (number, optional, default: 250): Wait time in milliseconds between batches

  • triggerReindexing (boolean, optional, default: true): If true, triggers reindexing after rebuild completes

Response:

{
  "processed": 1450,
  "failed": 50,
  "stateId": "abc123-def456-ghi789"
}

Example:

curl -X POST https://your-domain.com/v1/maintenance/file-access/rebuild-marked-content \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "batchSize": 100,
    "waitTimeMs": 250,
    "triggerReindexing": true
  }'

3. Complete Rebuild (Mark + Execute)

Endpoint: POST /v1/maintenance/file-access/rebuild

Description: Runs both mark and execute phases in sequence for convenience. This is the recommended endpoint for a complete rebuild.

Request Body:

{
  "batchSize": 100,
  "waitTimeMs": 250,
  "rebuildAll": false,
  "triggerReindexing": true
}

Parameters:

  • batchSize (number, optional, default: 100): Number of items to process in each batch

  • waitTimeMs (number, optional, default: 250): Wait time in milliseconds between batches

  • rebuildAll (boolean, optional, default: false): If true, rebuilds all content regardless of existing fileAccess permissions

  • triggerReindexing (boolean, optional, default: true): If true, triggers reindexing after rebuild completes

Response:

{
  "totalMarked": 1500,
  "processed": 1450,
  "failed": 50,
  "stateId": "abc123-def456-ghi789"
}

Example:

curl -X POST https://your-domain.com/v1/maintenance/file-access/rebuild \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "batchSize": 100,
    "waitTimeMs": 250,
    "rebuildAll": false,
    "triggerReindexing": true
  }'

4. Get Rebuild Status

Endpoint: GET /v1/maintenance/file-access/status

Description: Retrieves the current status of file access rebuild processing for the authenticated user's company.

Response:

{
  "id": "abc123-def456-ghi789",
  "companyId": "company-123",
  "status": "COMPLETED",
  "phase": "EXECUTE",
  "totalContent": 1500,
  "processedContent": 1450,
  "failedContent": 50,
  "metadata": {
    "batchSize": 100,
    "waitTimeMs": 250
  },
  "createdAt": "2024-01-15T10:30:00Z",
  "updatedAt": "2024-01-15T10:45:00Z"
}

Status Values:

  • PENDING: Migration is in progress

  • COMPLETED: Migration completed successfully

  • FAILED: Migration failed

Phase Values:

  • MARK: Mark phase

  • EXECUTE: Execute phase

Example:

curl -X GET https://your-domain.com/v1/maintenance/file-access/status \
  -H "Authorization: Bearer YOUR_TOKEN"
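A client that polls the status endpoint needs to decide when a run is finished and how far along it is. The sketch below models the subset of the response fields needed for that; the helper names are hypothetical, and counting failed items as handled is an assumption based on the sample response, where processedContent and failedContent add up to totalContent.

```typescript
// Subset of the status response fields used by the helpers below.
interface RebuildStatus {
  status: "PENDING" | "COMPLETED" | "FAILED";
  phase: "MARK" | "EXECUTE";
  totalContent: number;
  processedContent: number;
  failedContent: number;
}

// True once the run has reached a terminal status (PENDING means in progress).
function isFinished(s: RebuildStatus): boolean {
  return s.status !== "PENDING";
}

// Percentage of items handled so far.
// Assumption: failed items count as handled, as the sample response suggests.
function progressPercent(s: RebuildStatus): number {
  if (s.totalContent === 0) {
    return 100;
  }
  const handled = s.processedContent + s.failedContent;
  return Math.round((handled / s.totalContent) * 100);
}

const sampleStatus: RebuildStatus = {
  status: "COMPLETED",
  phase: "EXECUTE",
  totalContent: 1500,
  processedContent: 1450,
  failedContent: 50,
};
console.log(isFinished(sampleStatus), progressPercent(sampleStatus)); // true 100
```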

HTTP Workflow Example

Option A: Two-Step Process (More Control)

# Step 1: Mark content for rebuild
MARK_RESPONSE=$(curl -X POST https://your-domain.com/v1/maintenance/file-access/mark \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "batchSize": 100,
    "waitTimeMs": 250,
    "rebuildAll": false
  }')
echo "Marked content: $MARK_RESPONSE"

# Step 2: Rebuild marked content
REBUILD_RESPONSE=$(curl -X POST https://your-domain.com/v1/maintenance/file-access/rebuild-marked-content \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "batchSize": 100,
    "waitTimeMs": 250,
    "triggerReindexing": true
  }')
echo "Rebuild completed: $REBUILD_RESPONSE"

Option B: One-Step Process (Simpler)

# Complete rebuild in one request
curl -X POST https://your-domain.com/v1/maintenance/file-access/rebuild \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "batchSize": 100,
    "waitTimeMs": 250,
    "rebuildAll": false,
    "triggerReindexing": true
  }'

Method 2: Cron Job (Bootstrap Function)

Overview

The bootstrap function runs migrations for all companies automatically. It is designed to be executed manually from the Kubernetes cluster or through the ArgoCD UI.

Configuration

The bootstrap function is configured in src/main.ts and can be triggered by setting the RUNNING_MODE environment variable.

Running the Bootstrap Function

Using Environment Variable

Set the RUNNING_MODE environment variable to file-access-rebuild:

export RUNNING_MODE=file-access-rebuild
npm start
# or
node dist/main.js

Default Configuration

The bootstrap function uses the following default configuration:

{
  batchSize: 100,
  waitTimeMs: 250,
  rebuildAll: false
}

Note: The bootstrap function does not support the triggerReindexing parameter and will not trigger reindexing automatically.

Cron Job Setup

Kubernetes CronJob Example

apiVersion: batch/v1
kind: CronJob
metadata:
  name: node-ingestion-file-access-rebuild
spec:
  schedule: "0 2 * * *"  # Run daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: node-ingestion
            image: your-registry/node-ingestion:latest
            env:
            - name: RUNNING_MODE
              value: "file-access-rebuild"
            resources:
              requests:
                cpu: 200m
                memory: 2Gi
              limits:
                cpu: 200m
                memory: 2Gi
          restartPolicy: OnFailure

Docker Compose Example

services:
  file-access-rebuild:
    image: your-registry/node-ingestion:latest
    environment:
      - RUNNING_MODE=file-access-rebuild
    command: npm start

How It Works

  1. The bootstrap function creates an application context for FileAccessMaintenanceModule

  2. It calls rebuildAllCompaniesFileAccess() which:

    • Processes all companies with configurable concurrency

    • For each company, runs the complete rebuild process (mark + execute)

    • Performs cleanup of any remaining PENDING content before shutdown

  3. The function automatically handles:

    • Company-level concurrency (controlled by FILE_BASED_ACCESS_REBUILD_COMPANY_CONCURRENCY_LIMIT)

    • Error handling and logging

    • Cleanup of leftover PENDING states

Conclusion

The file access control feature represents a significant improvement in search scalability and performance. By moving access control information from folders to individual files, the system can handle organizations with thousands of folders without hitting Elasticsearch query limits.

The feature is designed to be:

  • Seamless: No changes to user workflows

  • Backward Compatible: Can switch between systems via feature flag

  • Scalable: Handles large folder structures efficiently

  • Reliable: Maintains all existing permission semantics

For questions or issues, contact your technical support team or refer to the monitoring dashboards for system health information.