Data redaction

Overview

The Data Redaction feature automatically removes sensitive information from the node-chat application at regular intervals, in line with data retention policies and privacy requirements. It runs as a cron job named 'data-redaction-task' every day at midnight and soft-deletes or redacts the specified data once it is older than a configurable retention period. Because this is a soft delete, files and other data attached to a chat are not deleted; their removal is handled by a separate, dedicated feature.

Activation

To activate the Data Redaction feature, you need to define an environment variable in node-chat:

  • Environment Variable: DATA_RETENTION_IN_DAYS

  • Description: The number of days data is retained before it is redacted. Data older than this period is redacted on the next run.
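As a rough illustration of how such a setting might be read, the sketch below parses DATA_RETENTION_IN_DAYS from an environment-like object. The helper name parseRetentionDays is hypothetical and not taken from the node-chat source. Treating an empty value as "inactive" and rejecting non-positive or fractional values are assumptions, since this page does not say how node-chat validates the variable.

```typescript
// Hypothetical helper, not part of node-chat: reads the retention period
// from an environment-like object (for example, process.env in Node.js).
// Returns null when the variable is not defined, meaning the feature is inactive.
function parseRetentionDays(
  env: Record<string, string | undefined>
): number | null {
  const raw = env["DATA_RETENTION_IN_DAYS"];
  // Assumption: an empty or whitespace-only value counts as "not defined".
  if (raw === undefined || raw.trim() === "") {
    return null;
  }
  const days = Number(raw);
  // Assumption: only positive whole numbers are valid retention periods.
  if (!Number.isInteger(days) || days <= 0) {
    throw new Error(`Invalid DATA_RETENTION_IN_DAYS: "${raw}"`);
  }
  return days;
}
```

Failing loudly on a malformed value is a design choice for this sketch: a silent fallback could leave data unredacted without anyone noticing.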

Redaction Process

The feature runs a daily cron job that performs the following actions on data older than the defined retention period:

  1. Soft Delete Chats: Mark chats as soft deleted instead of hard deleting them, which allows the data to be kept for analytics and metrics purposes.

  2. Redact Specific Fields: To comply with data privacy requirements, specific sensitive fields in chats, feedback, short-term memory, and messages are emptied or sanitized:

    • Chat:

      • Title: Set to empty.

    • Feedback:

      • Text: Set to empty.

      • Additional Info: Set to empty.

    • Short Term Memory:

      • Data: Set to an empty object.

    • Messages:

      • Text: Set to an empty string.

      • Original Text: Set to an empty string.

      • GPT Request: Set to an empty object.

      • Debug Info: Emptied, except for userAgent and chosenModule, which are retained.

  3. Delete Benchmarks: Remove benchmark data older than the retention date to ensure outdated performance metrics are not stored.
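The field rules above can be sketched as plain functions. Everything in this example is illustrative: the interfaces, the property names (originalText, gptRequest, debugInfo, deletedAt) and the function names are assumptions made for the sketch, not the actual node-chat schema. It also assumes that "set to empty" means an empty string for text fields and that soft deletion is recorded as a timestamp.

```typescript
// Hypothetical shapes for illustration only; the real node-chat models may differ.
interface Chat {
  title: string;
  deletedAt: Date | null; // assumed soft-delete marker
}

interface Message {
  text: string;
  originalText: string;
  gptRequest: Record<string, unknown>;
  debugInfo: Record<string, unknown>;
}

// The only debug fields that the rules above say are retained.
const KEPT_DEBUG_KEYS = ["userAgent", "chosenModule"] as const;

// Returns a copy of debugInfo containing only the retained keys.
function redactDebugInfo(
  debugInfo: Record<string, unknown>
): Record<string, unknown> {
  const kept: Record<string, unknown> = {};
  for (const key of KEPT_DEBUG_KEYS) {
    if (key in debugInfo) {
      kept[key] = debugInfo[key];
    }
  }
  return kept;
}

// Empties the message fields listed above and trims the debug info.
function redactMessage(message: Message): Message {
  return {
    ...message,
    text: "",
    originalText: "",
    gptRequest: {},
    debugInfo: redactDebugInfo(message.debugInfo),
  };
}

// Clears the title and marks the chat as soft deleted (row is kept).
function redactChat(chat: Chat, now: Date): Chat {
  return {
    ...chat,
    title: "",
    deletedAt: chat.deletedAt ?? now, // keep an earlier deletion time if present
  };
}
```

Feedback and short-term memory would follow the same pattern: copy the record, then overwrite the listed fields with empty values.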

Key Points

Example Usage

To activate the feature and set the retention period to 30 days, add the following environment variable:

export DATA_RETENTION_IN_DAYS=30

This will ensure that all data older than 30 days is redacted as per the rules defined above.
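To make "older than 30 days" concrete, the following sketch computes a cutoff instant from the retention period. The helper names retentionCutoff and isExpired are hypothetical. Measuring age from a creation timestamp in fixed 24-hour days is also an assumption; the actual job may use a different timestamp or calendar logic.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: the instant before which data counts as
// "older than the retention period".
function retentionCutoff(now: Date, retentionDays: number): Date {
  return new Date(now.getTime() - retentionDays * MS_PER_DAY);
}

// Assumption: a record's age is measured from its creation timestamp.
function isExpired(createdAt: Date, now: Date, retentionDays: number): boolean {
  return createdAt.getTime() < retentionCutoff(now, retentionDays).getTime();
}

// Example: a run at midnight UTC on 31 July 2024 with a 30-day retention.
const exampleRun = new Date("2024-07-31T00:00:00Z");
console.log(retentionCutoff(exampleRun, 30).toISOString()); // 2024-07-01T00:00:00.000Z
```

Under these assumptions, a record created on 30 June 2024 would be redacted by that run, while one created on 1 July 2024 at midnight or later would be kept until a later run.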

Considerations

  • The redaction process runs daily at midnight, so data is redacted within roughly a day of passing the retention period.

  • Soft deletion keeps the redacted chat records available for analytics, so metrics derived from them are unaffected. Benchmark data older than the retention date is the exception, since it is deleted outright.

  • This feature helps comply with privacy regulations by limiting the amount of sensitive information retained unnecessarily.

For any issues or to further customize the behavior of this feature, please refer to the node-chat source code or contact an administrator.

 


Author

@Sebastien Barbier

 

© 2024 Unique AG. All rights reserved.