File Access Permissions
Introduction and Overview
Description
File Access Control is a new permission system that stores access control information directly on individual files/chunks rather than relying on folder-based permissions. This approach allows permissions to be adjusted at a fine-grained level and solves a scalability issue in the current folder-based system, where search queries become too long when users have access to many folders (capping out around 1,000 folders in Elasticsearch).
Benefits
Current Folder-Based System Issues:
During search, the system enumerates all folders a user has access to
Builds queries with all folder IDs, making them extremely long
Elasticsearch queries cap out around 1,000 folders
Files/chunks only contain folder ownership information, not user access rights
Requires expensive detour: collect user folders → find files in those folders
New File-Based System Benefits:
Each file/chunk stores its own access control information
Search queries compare user groups against file access permissions
More efficient queries, since users typically belong to fewer groups than the number of folders they can access
Eliminates the 1,000 folder limit in Elasticsearch queries
Seamless switching between systems via feature flag
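The difference in query size can be illustrated with a minimal TypeScript sketch. The field names folderId and fileAccess, the use of an Elasticsearch terms clause, and the assumption that any access type (R, W or M) makes a file visible in search are illustrative assumptions; the actual query built by the search service may differ.

```typescript
// Illustrative only: field names and filter shape are assumptions.
interface TermsFilter {
  field: string;
  values: string[];
}

const ACCESS_TYPES = ["R", "W", "M"];

// Folder-based: one term per folder the user can access.
function folderFilter(folderIds: string[]): TermsFilter {
  return { field: "folderId", values: folderIds.slice() };
}

// File-based: one term per (principal, access type) pair, where the
// principals are the user and each of the user's groups.
function fileAccessFilter(userId: string, groupIds: string[]): TermsFilter {
  const principals = [`u:${userId}`].concat(groupIds.map((id) => `g:${id}`));
  const values: string[] = [];
  for (const principal of principals) {
    for (const accessType of ACCESS_TYPES) {
      values.push(`${principal}${accessType}`);
    }
  }
  return { field: "fileAccess", values };
}

// Render the filter as an Elasticsearch `terms` query clause.
function toTermsQuery(filter: TermsFilter): { terms: { [field: string]: string[] } } {
  return { terms: { [filter.field]: filter.values } };
}

// A user with access to 5,000 folders who belongs to 4 groups.
const manyFolders: string[] = [];
for (let i = 0; i < 5000; i++) {
  manyFolders.push(`scope_${i}`);
}
console.log(folderFilter(manyFolders).values.length); // 5000
console.log(fileAccessFilter("user123", ["a", "b", "c", "d"]).values.length); // 15
```

In this sketch the file-based filter grows with the number of groups rather than the number of folders, which is why it stays well below the folder limit described above.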
Functionality
Procedure
Content Upload: When files are uploaded via contentUpsert, the system automatically determines and stores access permissions on each file/chunk
Permission Format: Access permissions are stored as formatted strings like u:user123R (user), g:group456W (group)
Search: Instead of enumerating folders, search compares the user's groups against the file access permissions
Feature Flag: System can switch between folder-based and file-based access control seamlessly (see below).
The file access information is composed during the content upload process through the contentUpsert mutation, which stores the formatted permissions in the fileAccess field of the Content/Chunk records.
Permissions are stored as formatted strings with the pattern: {entityType}:{entityId}{accessType}
Entity Types:
u = User
g = Group
Access Types:
R = Read
W = Write
M = Manage
Examples:
u:user123R = User "user123" has Read access
g:group456W = Group "group456" has Write access
u:admin789M = User "admin789" has Manage access
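The pattern can be made concrete with a small parser and formatter. This is a minimal sketch: the names FileAccessEntry, parseFileAccess and formatFileAccess are hypothetical and not part of the product API, and the sketch assumes the last character of the string is always the access type.

```typescript
type EntityType = "u" | "g";
type AccessType = "R" | "W" | "M";

// Hypothetical in-memory representation of one fileAccess string.
interface FileAccessEntry {
  entityType: EntityType;
  entityId: string;
  accessType: AccessType;
}

// {entityType}:{entityId}{accessType}, e.g. "u:user123R".
const FILE_ACCESS_PATTERN = /^([ug]):(.+)([RWM])$/;

// Returns null when the string does not follow the documented pattern.
function parseFileAccess(value: string): FileAccessEntry | null {
  const match = FILE_ACCESS_PATTERN.exec(value);
  if (match === null) {
    return null;
  }
  return {
    entityType: match[1] as EntityType,
    entityId: match[2] as string,
    accessType: match[3] as AccessType,
  };
}

function formatFileAccess(entry: FileAccessEntry): string {
  return `${entry.entityType}:${entry.entityId}${entry.accessType}`;
}

console.log(parseFileAccess("g:group456W"));
// { entityType: 'g', entityId: 'group456', accessType: 'W' }
console.log(parseFileAccess("x:unknown"));
// null
```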
Permission Management
Understanding Permission Inheritance:
When folder permissions change, file permissions are automatically updated
Changes propagate to all files within the affected scope
User and group permissions are inherited from scope access settings
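The inheritance described above could look roughly like the following sketch. The ScopeAccess and FileRecord shapes and the function names are assumptions made for illustration; the sketch also treats a scope as flat and ignores nested scopes.

```typescript
// Hypothetical shape of one access setting on a scope (folder).
interface ScopeAccess {
  entityType: "USER" | "GROUP";
  entityId: string;
  type: "READ" | "WRITE" | "MANAGE";
}

// Hypothetical, reduced view of a stored file.
interface FileRecord {
  id: string;
  scopeId: string;
  fileAccess: string[];
}

const ENTITY_PREFIX = { USER: "u", GROUP: "g" };
const ACCESS_SUFFIX = { READ: "R", WRITE: "W", MANAGE: "M" };

// Translate scope access settings into fileAccess strings, without duplicates.
function buildFileAccess(scopeAccess: ScopeAccess[]): string[] {
  const seen: { [value: string]: boolean } = {};
  const result: string[] = [];
  for (const access of scopeAccess) {
    const value = `${ENTITY_PREFIX[access.entityType]}:${access.entityId}${ACCESS_SUFFIX[access.type]}`;
    if (!seen[value]) {
      seen[value] = true;
      result.push(value);
    }
  }
  return result;
}

// When a scope's permissions change, rewrite fileAccess on every file in it.
function propagateScopeAccess(
  files: FileRecord[],
  scopeId: string,
  scopeAccess: ScopeAccess[],
): FileRecord[] {
  const fileAccess = buildFileAccess(scopeAccess);
  return files.map((file) =>
    file.scopeId === scopeId ? { ...file, fileAccess: fileAccess.slice() } : file,
  );
}
```

For example, a scope with READ access for user "user123" and WRITE access for group "group456" would give every file in that scope the entries u:user123R and g:group456W.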
Manual Permission Updates:
File permissions are automatically managed by the system
Manual intervention should not be necessary
Contact technical support if permission issues persist
Configuration and Management
As an admin user, you have control over the file access control feature:
Feature Flag Management
The feature is controlled by the FEATURE_FLAG_ENABLE_FILE_BASED_ACCESS_UN_12660 environment variable:
# Enable file-based access control
FEATURE_FLAG_ENABLE_FILE_BASED_ACCESS_UN_12660: "true"
# Disable file-based access control (use folder-based)
FEATURE_FLAG_ENABLE_FILE_BASED_ACCESS_UN_12660: "false"
The feature flag must be enabled in knowledge-upload node-ingestion and node-scope-management services.
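A service could evaluate the flag along the following lines. This is only a sketch: the function name resolveAccessMode is hypothetical, and it assumes that only the exact string "true" enables file-based access control while any other value (or a missing variable) falls back to the folder-based system.

```typescript
const FILE_BASED_ACCESS_FLAG = "FEATURE_FLAG_ENABLE_FILE_BASED_ACCESS_UN_12660";

type AccessMode = "file-based" | "folder-based";

// Accepts any env-like object so it can be exercised without a real process.
function resolveAccessMode(env: { [name: string]: string | undefined }): AccessMode {
  return env[FILE_BASED_ACCESS_FLAG] === "true" ? "file-based" : "folder-based";
}

console.log(resolveAccessMode({ [FILE_BASED_ACCESS_FLAG]: "true" })); // file-based
console.log(resolveAccessMode({})); // folder-based
```

In a Node.js service, process.env would typically be passed as the env argument.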
Rollback
File-based access control can be disabled by setting the feature flag to false.
# Revert to folder-based access control
FEATURE_FLAG_ENABLE_FILE_BASED_ACCESS_UN_12660: "false"
The feature flag must be disabled in both knowledge-upload node-ingestion and node-scope-management services.
Update via API
GraphQL
ContentUpsert Mutation
The primary way to create or update content with file access permissions:
mutation ContentUpsert($input: ContentCreateInput!, $scopeId: String) {
contentUpsert(input: $input, scopeId: $scopeId) {
title
key
id
fileAccess
}
}
Example Request:
{
"input": {
"fileAccess": [
"u:333174122332880907R"
],
"id": "cont_wdld44kgwygm74oy6zx9erok",
"key": "elasticserch array - Google Search.pdf",
"mimeType": "application/pdf",
"ownerType": "SCOPE"
},
"scopeId": "scope_poxirj5hyhn5w1gq7elfhuwt"
}
Example Response:
{
"data": {
"contentUpsert": {
"title": null,
"key": "elasticserch array - Google Search.pdf",
"id": "cont_wdld44kgwygm74oy6zx9erok",
"fileAccess": [
"u:333174122332880907R"
]
}
}
}
REST
Upload Content with File Access
Endpoint: POST /api/v1/content
Headers:
Content-Type: application/json
Authorization: Bearer <access_token>
Example Request:
curl -X 'POST' \
'http://localhost:8091/v1/content' \
-H 'accept: */*' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"key": "elasticserch array - Google Search.pdf",
"mimeType": "application/pdf",
"ownerType": "SCOPE",
"scopeId": "scope_poxirj5hyhn5w1gq7elfhuwt",
"fileAccess": ["u:333174122332880907R"],
"source": {
"kind": "UNIQUE_BLOB_STORAGE"
}
}'
Example Response:
{
"id": "cont_wdld44kgwygm74oy6zx9erok",
"source": {
"id": "src_bmpyl5l71zfobvu00vwod9ap",
"kind": "UNIQUE_BLOB_STORAGE",
"name": null,
"ownerId": "333174122332880907",
"ownerType": "USER",
"createdAt": "2025-09-03T05:44:21.650Z",
"updatedAt": "2025-09-03T05:44:21.650Z"
},
"mimeType": "application/pdf",
"collectionName": "333174122332815371",
"metadata": {
"key": "elasticserch array - Google Search.pdf",
"url": null,
"title": null,
"folderId": "scope_poxirj5hyhn5w1gq7elfhuwt",
"mimeType": "application/pdf",
"companyId": "333174122332815371",
"contentId": "cont_wdld44kgwygm74oy6zx9erok",
"validAsOf": "2025-09-09T09:48:55.469Z",
"folderIdPath": "uniquepathid://scope_poxirj5hyhn5w1gq7elfhuwt",
"externalFileOwner": null
},
"fileAccess": [
"u:333174122332880907R"
]
}
Data Migration
The File Access Data Migration system rebuilds file access permissions for all content items in the Knowledge Base. It ensures all content has properly formatted file access permissions based on ownership and scope relationships.
Please note that the migration runs in two phases:
Mark Phase: Identifies and marks content that needs file access rebuild
Execute Phase: Processes marked content and rebuilds file access permissions
Migration Methods
Other ways to run file-access rebuild (exceptional only)
Configuration Parameters
Batch Size (batchSize)
Type: number
Default: 100
Description: Number of content items to process in each batch
Recommendation:
Start with 100 for most cases
Increase to 250-500 for faster processing if system can handle it
Decrease to 50 if experiencing database load issues
Wait Time (waitTimeMs)
Type: number
Default: 250
Description: Wait time in milliseconds between batches
Recommendation:
250ms is a good balance for most systems
Increase to 500-1000ms if database is under heavy load
Decrease to 100ms for faster processing (if system can handle it)
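To get a feeling for how batchSize and waitTimeMs interact, the following sketch splits items into batches and computes the minimum idle time a run spends waiting. It assumes the wait happens between batches and not after the last one; the helper names are hypothetical and the actual per-batch processing time comes on top.

```typescript
// Split items into consecutive batches of at most batchSize elements.
function chunk<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// Total time spent waiting between batches, excluding processing time.
function minimumIdleMs(totalItems: number, batchSize: number, waitTimeMs: number): number {
  const batchCount = Math.ceil(totalItems / batchSize);
  return Math.max(0, batchCount - 1) * waitTimeMs;
}

// 1,500 items with the defaults: 15 batches, 14 waits of 250 ms.
console.log(minimumIdleMs(1500, 100, 250)); // 3500
// Larger batches and a shorter wait: 3 batches, 2 waits of 100 ms.
console.log(minimumIdleMs(1500, 500, 100)); // 200
// Conservative settings under heavy load: 30 batches, 29 waits of 1000 ms.
console.log(minimumIdleMs(1500, 50, 1000)); // 29000
```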
Rebuild All (rebuildAll)
Type: boolean
Default: false
Description: If true, rebuilds all content regardless of existing fileAccess permissions. If false, only rebuilds content with empty fileAccess.
Recommendation:
Use false for normal migrations (only empty fileAccess)
Use true when you need to force rebuild all content (e.g., after schema changes)
Migration Process Details
Phase 1: Mark Phase
Content Selection:
If rebuildAll: false: Selects content with empty fileAccess
If rebuildAll: true: Selects all content for the company
State Update:
Sets fileAccessState.status to PENDING
Creates a FileAccessStateHistory record with phase MARK
Processes content in batches with configurable wait times
Completion:
Updates state history with total marked count
Sets status to COMPLETED or FAILED
Phase 2: Execute Phase
The execute phase processes content in three categories:
Scope Content:
Processes content owned by scopes
Uses scope access permissions to build file access
Non-Scope Content:
Processes content owned by users or chats
Uses owner-based permissions to build file access
Orphaned Content:
Processes content that doesn't fit into the above categories
Handles edge cases and missing data
For each content item:
Retrieves appropriate file access permissions
Updates fileAccess field
Sets fileAccessState.status to COMPLETED or FAILED
Handles errors gracefully
After processing:
Optionally triggers reindexing if triggerReindexing: true
Cleans up any remaining PENDING content
Updates state history with final statistics
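The two phases can be summarized with a simplified in-memory sketch. It is not the service implementation: the ContentItem shape, the function names and the resolver callback are hypothetical, and batching, wait times, state history and reindexing are left out.

```typescript
type FileAccessStatus = "PENDING" | "COMPLETED" | "FAILED";

// Hypothetical, reduced view of a content item.
interface ContentItem {
  id: string;
  ownerType: "SCOPE" | "USER" | "CHAT";
  ownerId: string | null;
  fileAccess: string[];
  status?: FileAccessStatus;
}

// Phase 1: mark items that need a rebuild and return how many were marked.
function markPhase(items: ContentItem[], rebuildAll: boolean): number {
  let totalMarked = 0;
  for (const item of items) {
    if (rebuildAll || item.fileAccess.length === 0) {
      item.status = "PENDING";
      totalMarked++;
    }
  }
  return totalMarked;
}

// Phase 2: rebuild fileAccess for every PENDING item. The resolver stands in
// for the scope-based or owner-based permission lookup and may throw.
function executePhase(
  items: ContentItem[],
  resolveFileAccess: (item: ContentItem) => string[],
): { processed: number; failed: number } {
  let processed = 0;
  let failed = 0;
  for (const item of items) {
    if (item.status !== "PENDING") {
      continue;
    }
    try {
      item.fileAccess = resolveFileAccess(item);
      item.status = "COMPLETED";
      processed++;
    } catch {
      // One failing item must not abort the whole run.
      item.status = "FAILED";
      failed++;
    }
  }
  return { processed, failed };
}
```

With rebuildAll set to false, items that already carry fileAccess entries are never marked and therefore never touched by the execute phase.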
Method 1: HTTP API Requests
Prerequisites
Authentication token with CHAT_ADMIN_ALL role
Access to the node-ingestion service API
Base URL: https://your-domain.com/v1/maintenance/file-access
API Endpoints
1. Mark Content for Rebuild
Endpoint: POST /v1/maintenance/file-access/mark
Description: Phase 1 - Marks content that needs file access rebuild processing. Sets fileAccessState to PENDING for content with empty fileAccess.
Request Body:
{
"batchSize": 100,
"waitTimeMs": 250,
"rebuildAll": false
}
Parameters:
batchSize (number, optional, default: 100): Number of items to process in each batch
waitTimeMs (number, optional, default: 250): Wait time in milliseconds between batches
rebuildAll (boolean, optional, default: false): If true, rebuilds all content regardless of existing fileAccess permissions. If false, only rebuilds content with empty fileAccess.
Response:
{
"totalMarked": 1500,
"stateId": "abc123-def456-ghi789"
}
Example:
curl -X POST https://your-domain.com/v1/maintenance/file-access/mark \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"batchSize": 100,
"waitTimeMs": 250,
"rebuildAll": false
}'
2. Rebuild Marked Content
Endpoint: POST /v1/maintenance/file-access/rebuild-marked-content
Description: Phase 2 - Rebuilds content that was previously marked for file access rebuild processing.
Request Body:
{
"batchSize": 100,
"waitTimeMs": 250,
"triggerReindexing": true
}
Parameters:
batchSize (number, optional, default: 100): Number of items to process in each batch
waitTimeMs (number, optional, default: 250): Wait time in milliseconds between batches
triggerReindexing (boolean, optional, default: true): If true, triggers reindexing after rebuild completes
Response:
{
"processed": 1450,
"failed": 50,
"stateId": "abc123-def456-ghi789"
}
Example:
curl -X POST https://your-domain.com/v1/maintenance/file-access/rebuild-marked-content \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"batchSize": 100,
"waitTimeMs": 250,
"triggerReindexing": true
}'
3. Complete Rebuild (Mark + Execute)
Endpoint: POST /v1/maintenance/file-access/rebuild
Description: Runs both mark and execute phases in sequence for convenience. This is the recommended endpoint for a complete rebuild.
Request Body:
{
"batchSize": 100,
"waitTimeMs": 250,
"rebuildAll": false,
"triggerReindexing": true
}
Parameters:
batchSize (number, optional, default: 100): Number of items to process in each batch
waitTimeMs (number, optional, default: 250): Wait time in milliseconds between batches
rebuildAll (boolean, optional, default: false): If true, rebuilds all content regardless of existing fileAccess permissions
triggerReindexing (boolean, optional, default: true): If true, triggers reindexing after rebuild completes
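For typed clients, the documented parameters and defaults can be captured as follows. This is a client-side sketch; the type and function names are hypothetical, and the server applies its own defaults regardless of what a client assumes.

```typescript
// All parameters of the complete rebuild endpoint are optional.
interface RebuildRequest {
  batchSize?: number;
  waitTimeMs?: number;
  rebuildAll?: boolean;
  triggerReindexing?: boolean;
}

interface ResolvedRebuildRequest {
  batchSize: number;
  waitTimeMs: number;
  rebuildAll: boolean;
  triggerReindexing: boolean;
}

// Defaults as listed in the parameter descriptions above.
const REBUILD_DEFAULTS: ResolvedRebuildRequest = {
  batchSize: 100,
  waitTimeMs: 250,
  rebuildAll: false,
  triggerReindexing: true,
};

// Fill in every omitted parameter with its documented default.
function withRebuildDefaults(request: RebuildRequest = {}): ResolvedRebuildRequest {
  return {
    batchSize: request.batchSize ?? REBUILD_DEFAULTS.batchSize,
    waitTimeMs: request.waitTimeMs ?? REBUILD_DEFAULTS.waitTimeMs,
    rebuildAll: request.rebuildAll ?? REBUILD_DEFAULTS.rebuildAll,
    triggerReindexing: request.triggerReindexing ?? REBUILD_DEFAULTS.triggerReindexing,
  };
}

console.log(JSON.stringify(withRebuildDefaults({ batchSize: 250 })));
// {"batchSize":250,"waitTimeMs":250,"rebuildAll":false,"triggerReindexing":true}
```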
Response:
{
"totalMarked": 1500,
"processed": 1450,
"failed": 50,
"stateId": "abc123-def456-ghi789"
}
Example:
curl -X POST https://your-domain.com/v1/maintenance/file-access/rebuild \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"batchSize": 100,
"waitTimeMs": 250,
"rebuildAll": false,
"triggerReindexing": true
}'
4. Get Rebuild Status
Endpoint: GET /v1/maintenance/file-access/status
Description: Retrieves the current status of file access rebuild processing for the authenticated user's company.
Response:
{
"id": "abc123-def456-ghi789",
"companyId": "company-123",
"status": "COMPLETED",
"phase": "EXECUTE",
"totalContent": 1500,
"processedContent": 1450,
"failedContent": 50,
"metadata": {
"batchSize": 100,
"waitTimeMs": 250
},
"createdAt": "2024-01-15T10:30:00Z",
"updatedAt": "2024-01-15T10:45:00Z"
}
Status Values:
PENDING: Migration is in progress
COMPLETED: Migration completed successfully
FAILED: Migration failed
Phase Values:
MARK: Mark phase
EXECUTE: Execute phase
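A monitoring script could condense the status response into a few numbers, as in the sketch below. The RebuildStatus type only lists the fields used here, the helper name is hypothetical, and the sketch assumes that processedContent does not include failed items (which matches the sample response, where 1450 + 50 = 1500).

```typescript
// Subset of the status response used by this sketch.
interface RebuildStatus {
  status: "PENDING" | "COMPLETED" | "FAILED";
  phase: "MARK" | "EXECUTE";
  totalContent: number;
  processedContent: number;
  failedContent: number;
}

interface RebuildSummary {
  done: boolean;
  progressPercent: number;
  failureRatePercent: number;
}

// Round to one decimal place.
function roundTo1(value: number): number {
  return Math.round(value * 10) / 10;
}

function summarizeStatus(status: RebuildStatus): RebuildSummary {
  const handled = status.processedContent + status.failedContent;
  const total = status.totalContent;
  return {
    done: status.status !== "PENDING",
    progressPercent: total === 0 ? 100 : roundTo1((handled / total) * 100),
    failureRatePercent: total === 0 ? 0 : roundTo1((status.failedContent / total) * 100),
  };
}

console.log(
  summarizeStatus({
    status: "COMPLETED",
    phase: "EXECUTE",
    totalContent: 1500,
    processedContent: 1450,
    failedContent: 50,
  }),
);
// { done: true, progressPercent: 100, failureRatePercent: 3.3 }
```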
Example:
curl -X GET https://your-domain.com/v1/maintenance/file-access/status \
-H "Authorization: Bearer YOUR_TOKEN"
HTTP Workflow Example
Option A: Two-Step Process (More Control)
# Step 1: Mark content for rebuild
MARK_RESPONSE=$(curl -X POST https://your-domain.com/v1/maintenance/file-access/mark \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"batchSize": 100,
"waitTimeMs": 250,
"rebuildAll": false
}')
echo "Marked content: $MARK_RESPONSE"
# Step 2: Rebuild marked content
REBUILD_RESPONSE=$(curl -X POST https://your-domain.com/v1/maintenance/file-access/rebuild-marked-content \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"batchSize": 100,
"waitTimeMs": 250,
"triggerReindexing": true
}')
echo "Rebuild completed: $REBUILD_RESPONSE"
Option B: One-Step Process (Simpler)
# Complete rebuild in one request
curl -X POST https://your-domain.com/v1/maintenance/file-access/rebuild \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"batchSize": 100,
"waitTimeMs": 250,
"rebuildAll": false,
"triggerReindexing": true
}'
Method 2: Cron Job (Bootstrap Function)
Overview
The bootstrap function runs migrations for all companies automatically. It is designed to be executed manually from the Kubernetes (k8s) cluster or via the ArgoCD UI.
Configuration
The bootstrap function is configured in src/main.ts and can be triggered by setting the RUNNING_MODE environment variable.
Running the Bootstrap Function
Using Environment Variable
Set the RUNNING_MODE environment variable to file-access-rebuild:
export RUNNING_MODE=file-access-rebuild
npm start
# or
node dist/main.js
Default Configuration
The bootstrap function uses the following default configuration:
{
batchSize: 100,
waitTimeMs: 250,
rebuildAll: false
}
Note: The bootstrap function does not support triggerReindexing parameter and will not trigger reindexing automatically.
Cron Job Setup
Kubernetes CronJob Example
apiVersion: batch/v1
kind: CronJob
metadata:
  name: node-ingestion-file-access-rebuild
spec:
  schedule: "0 2 * * *" # Run daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: node-ingestion
              image: your-registry/node-ingestion:latest
              env:
                - name: RUNNING_MODE
                  value: "file-access-rebuild"
              resources:
                requests:
                  cpu: 200m
                  memory: 2Gi
                limits:
                  cpu: 200m
                  memory: 2Gi
          restartPolicy: OnFailure
Docker Compose Example
services:
  file-access-rebuild:
    image: your-registry/node-ingestion:latest
    environment:
      - RUNNING_MODE=file-access-rebuild
    command: npm start
How It Works
The bootstrap function creates an application context for FileAccessMaintenanceModule
It calls rebuildAllCompaniesFileAccess() which:
Processes all companies with configurable concurrency
For each company, runs the complete rebuild process (mark + execute)
Performs cleanup of any remaining PENDING content before shutdown
The function automatically handles:
Company-level concurrency (controlled by FILE_BASED_ACCESS_REBUILD_COMPANY_CONCURRENCY_LIMIT)
Error handling and logging
Cleanup of leftover PENDING states
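Company-level concurrency with per-company error handling can be sketched as a small worker pool. The implementation of rebuildAllCompaniesFileAccess() is not shown in this document, so runWithConcurrency and its types are hypothetical; the limit argument stands in for the value of FILE_BASED_ACCESS_REBUILD_COMPANY_CONCURRENCY_LIMIT.

```typescript
interface CompanyOutcome {
  companyId: string;
  ok: boolean;
  error?: unknown;
}

// Run the worker for every company with at most `limit` rebuilds in flight.
// A failing company is recorded and does not stop the remaining ones.
async function runWithConcurrency(
  companyIds: string[],
  limit: number,
  worker: (companyId: string) => Promise<void>,
): Promise<CompanyOutcome[]> {
  const outcomes: CompanyOutcome[] = [];
  let nextIndex = 0;

  // Each lane pulls the next unprocessed company until none are left.
  async function lane(): Promise<void> {
    while (nextIndex < companyIds.length) {
      const index = nextIndex++;
      const companyId = companyIds[index] as string;
      try {
        await worker(companyId);
        outcomes[index] = { companyId, ok: true };
      } catch (error) {
        outcomes[index] = { companyId, ok: false, error };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, companyIds.length));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return outcomes;
}
```

A caller would pass a worker that runs the mark and execute phases for one company, then inspect the returned outcomes to log which companies failed.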
Conclusion
The file access control feature represents a significant improvement in search scalability and performance. By moving access control information from folders to individual files, the system can handle organizations with thousands of folders without hitting Elasticsearch query limits.
The feature is designed to be:
Seamless: No changes to user workflows
Backward Compatible: Can switch between systems via feature flag
Scalable: Handles large folder structures efficiently
Reliable: Maintains all existing permission semantics
For questions or issues, contact your technical support team or refer to the monitoring dashboards for system health information.