Overview
The Clean App Data Files maintenance task is a utility designed to help users manage disk usage by automatically deleting old files from the Home folder ({app.data}) and its subdirectories. This article explains how the task works, how to configure it, and best practices to ensure smooth operation. It also covers how the task interacts with data folders in a multi-tenant setup.
What Is the Home Folder?
When you sign up for Etlworks, a dedicated storage volume is created on Etlworks servers exclusively for your account. This volume is referred to as the Home folder, and all data stored in this folder and its subdirectories can be accessed using the Server Storage connection type.
To reference the Home folder in your configurations, use the token {app.data}.
Multi-Tenant Setup
For Enterprise and On-premise plans, Etlworks supports multi-tenancy, allowing users to create tenant sub-accounts.
Each tenant has its own isolated Home folder to ensure data separation and privacy.
The main account (non-tenant account) also has its own Home folder.
Task Behavior in a Multi-Tenant Environment
The Clean App Data Files task is configured under the main (non-tenant) account.
When executed, the task traverses:
1. The Home folder of the main account.
2. The Home folders of all tenant sub-accounts.
This ensures that old files are cleaned across all accounts managed by the main account.
How This Task Works
The Clean App Data Files task deletes files that are older than a configurable number of days within the Home folder ({app.data}) and its subdirectories.
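The age check described above can be previewed with a short script. This is a minimal sketch, not the task's actual implementation: it assumes file age is measured by last modification time, and the function name find_old_files and the path /path/to/home are hypothetical placeholders. It only lists candidates and deletes nothing.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_old_files(root, older_than_days, now=None):
    """Return files under root (recursively) older than the retention period.

    Assumption: age is based on the file's last modification time.
    """
    now = time.time() if now is None else now
    cutoff = now - older_than_days * SECONDS_PER_DAY
    old_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime < cutoff:
                old_files.append(path)
    return sorted(old_files)


if __name__ == "__main__":
    # Dry run: print what a 30-day retention period would select.
    for candidate in find_old_files("/path/to/home", older_than_days=30):
        print(candidate)
```

Running a listing like this against a copy of your data is one way to estimate how many files a given retention period would affect before configuring the real task.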
Key Features
Customizable Retention Period: Users can specify the number of days to retain files. Files older than the configured number of days will be deleted.
Exclusion Settings:
• Users can manually configure specific files or folders to exclude from cleanup.
• Some folders are excluded by default, such as {app.data}/debezium_data, to prevent accidental deletion of critical data.
Support for Multi-Tenant Environments:
• The task automatically traverses all tenant-specific Home folders, ensuring cleanup across all accounts.
How to Configure the Clean App Data Files Task
Follow these steps to manually configure the task:
Step 1: Add the Task
1. Log in to the main (non-tenant) account.
2. Navigate to the Settings->Maintenance section in the Etlworks application.
3. Click Add, select Clean App Data Files from the list of task types, and enter a name for the task.
Step 2: Configure Retention Period
Specify the number of days for the retention period in Older than # days. Files older than this value will be deleted.
Ensure the retention period aligns with your data management policies.
Step 3: Optionally Configure Exclusions and Inclusions
Use Exclude paths to protect specific folders or files from cleanup, or Include paths to limit cleanup to specific folders or files.
Exclude paths: List of paths to exclude, each on a separate line. Glob patterns can be used:
• The * character matches zero or more characters of a name component without crossing directory boundaries.
• The ** characters match zero or more characters crossing directory boundaries.
• The ? character matches exactly one character of a name component.
• The [] characters are a bracket expression that matches a single character of a name component out of a set of characters. For example, [abc] matches “a”, “b”, or “c”. The hyphen - may be used to specify a range, so [a-z] specifies a range that matches from “a” to “z” (inclusive). These forms can be mixed, so [abce-g] matches “a”, “b”, “c”, “e”, “f”, or “g”. If the character after the [ is a !, it is used for negation, so [!a-c] matches any character except “a”, “b”, or “c”.
• The {} characters are a group of subpatterns, where the group matches if any subpattern in the group matches. The , character is used to separate the subpatterns. Groups cannot be nested.
Example patterns:
• **.csv - all files with .csv extension.
• *.csv - all files with .csv extension only under the Super Admin level root directory.
• **/*.csv - all files with .csv extension under tenants or any Super Admin level subdirectory.
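To experiment with these rules outside Etlworks, the sketch below translates a glob pattern into a regular expression that follows the semantics listed above. It is an illustration only, not the matcher the task uses: the function names glob_to_regex and glob_matches are hypothetical, paths are assumed to be relative and to use / as the separator, and escape sequences are not handled.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob (with *, **, ?, [] and {}) into a compiled regex."""
    out = []
    i, n = 0, len(pattern)
    in_group = False
    while i < n:
        c = pattern[i]
        if c == "*":
            if pattern[i:i + 2] == "**":
                out.append(".*")        # ** crosses directory boundaries
                i += 2
                continue
            out.append("[^/]*")         # * stays inside one name component
        elif c == "?":
            out.append("[^/]")          # exactly one character, never "/"
        elif c == "[":
            j = pattern.find("]", i + 1)
            if j == -1 or j == i + 1:
                raise ValueError("empty or unclosed bracket expression")
            body = pattern[i + 1:j]
            negate = body.startswith("!")
            if negate:
                body = body[1:]
            for ch in ("\\", "[", "^"):
                body = body.replace(ch, "\\" + ch)
            # A negated set must still not match the directory separator.
            out.append("(?!/)[^" + body + "]" if negate else "[" + body + "]")
            i = j + 1
            continue
        elif c == "{":
            if in_group:
                raise ValueError("groups cannot be nested")
            out.append("(?:")
            in_group = True
        elif c == "}" and in_group:
            out.append(")")
            in_group = False
        elif c == "," and in_group:
            out.append("|")             # separates subpatterns in a group
        else:
            out.append(re.escape(c))
        i += 1
    if in_group:
        raise ValueError("unclosed group")
    return re.compile("".join(out))


def glob_matches(pattern, path):
    """Return True if the whole path matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None


if __name__ == "__main__":
    for pattern in ("**.csv", "*.csv", "**/*.csv"):
        for path in ("a.csv", "tenant1/data/a.csv"):
            print(pattern, path, glob_matches(pattern, path))
```

Under these rules, *.csv matches a.csv but not tenant1/data/a.csv, while **/*.csv matches tenant1/data/a.csv but not a.csv, because it requires at least one directory level.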
Include paths: List of paths to include, each on a separate line. All other paths will be excluded. Glob patterns can be used; the syntax and the example patterns are the same as for Exclude paths above.
All files under the {app.data}/debezium_data subdirectory are excluded by default but can be explicitly added to the cleanup with an Include paths pattern.
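One way to reason about how the two lists and the default exclusion combine is the sketch below. The precedence is an assumption made for illustration, not documented behavior: an explicit exclude always wins, a non-empty include list restricts cleanup to matching paths, and an include match overrides the default exclusion. The names is_cleanup_candidate and DEFAULT_EXCLUDES and the pattern debezium_data/** are hypothetical, and the matcher supports only **, * and ?.

```python
import re

# Assumed pattern form for the default exclusion (relative to the Home folder).
DEFAULT_EXCLUDES = ["debezium_data/**"]

_TOKENS = {"**": ".*", "*": "[^/]*", "?": "[^/]"}


def _to_regex(pattern):
    # Minimal translation: only **, * and ? are treated as wildcards.
    parts = re.split(r"(\*\*|\*|\?)", pattern)
    return re.compile("".join(_TOKENS.get(p, re.escape(p)) for p in parts))


def _matches_any(path, patterns):
    return any(_to_regex(p).fullmatch(path) for p in patterns)


def is_cleanup_candidate(path, include=(), exclude=()):
    """Decide whether a relative path may be cleaned up (assumed precedence)."""
    if _matches_any(path, exclude):
        return False                      # explicit exclusions always win
    if include:
        return _matches_any(path, include)  # everything else is excluded
    return not _matches_any(path, DEFAULT_EXCLUDES)


if __name__ == "__main__":
    print(is_cleanup_candidate("debezium_data/offsets.dat"))
    print(is_cleanup_candidate("debezium_data/offsets.dat",
                               include=["debezium_data/**"]))
```

Because the real precedence may differ, treat this only as a mental model and confirm the actual behavior by running the task manually and reviewing the logs, as described in Step 4.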
Step 4: Test the Task
1. Before scheduling the task to run periodically, test it by running it manually.
2. Review the task logs to verify that only the intended files were deleted.
3. Confirm that tenant Home folders and the main account Home folder were traversed correctly.
Step 5: Schedule the Task
1. Once testing is complete and the task behaves as expected, schedule it to run periodically.
2. Regularly monitor task logs to ensure the cleanup process continues to operate correctly.
Best Practices
Understand Your Data: Identify which files and folders should not be deleted and configure exclusions accordingly.
Test Before Scheduling: Always test the task manually before enabling periodic execution to prevent unintended data loss.
Monitor Logs: Regularly review task logs to ensure the cleanup process is running as expected across all Home folders.
Avoid Aggressive Retention: Do not set the retention period shorter than your organization’s data retention policies allow; files older than the configured number of days are deleted.
Summary
The Clean App Data Files task is a powerful tool for managing storage by automatically deleting old files from the Home folder and all tenant-specific Home folders. While it is not configured by default, it can be customized to meet your specific needs, including setting retention periods and exclusions. Testing the task before scheduling it to run periodically is essential to ensure safe and efficient operation.