Info | ||
---|---|---|
| ||
Datameer's housekeeping is a background service that improves processing by deleting obsolete data from HDFS, removing old entries from the job history, and removing unsaved workbooks. Retention is defined by count, where data objects above a given number are marked for deletion (e.g. keep 10 data objects; when object 11 arrives, object 1 is deleted), by time (e.g. at the end of every business day), or by a combination of both. Data entities that are due for deletion are first marked for deletion. Deletion follows the site's retention policy, which the Datameer X administrator configures in the default.properties file. To support failover, a property can delay file deletion from HDFS for a period longer than the standard backup interval. Similarly, a property can delay data deletion after a Datameer X upgrade to ensure that a rollback succeeds if it is needed. |
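The count and time rules described above can be illustrated with a small sketch. This is not Datameer's implementation: the function name, its parameters, and the choice to mark an object when either rule matches are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

def select_for_deletion(created_at, keep_count=None, max_age=None, now=None):
    """Return indices of objects a retention policy would mark for deletion.

    created_at: list of creation datetimes, ordered oldest first.
    keep_count: keep only the newest N objects (None disables the count rule).
    max_age:    timedelta; older objects are marked (None disables the time rule).
    Assumption: when both rules are set, an object is marked if either matches.
    """
    now = now or datetime.now()
    marked = set()
    if keep_count is not None:
        # Everything before the newest `keep_count` objects exceeds the count.
        marked.update(range(max(0, len(created_at) - keep_count)))
    if max_age is not None:
        marked.update(i for i, ts in enumerate(created_at) if now - ts > max_age)
    return sorted(marked)

# Eleven hourly objects with a limit of ten: only the oldest one is marked.
base = datetime(2024, 1, 1)
stamps = [base + timedelta(hours=i) for i in range(11)]
print(select_for_deletion(stamps, keep_count=10, now=base + timedelta(hours=11)))
```

The example mirrors the "keep 10 data objects" case from the text: once object 11 exists, the oldest object is the only one selected.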
...
Housekeeping Does Not Delete
- data not marked for deletion (configurable for failover and rollback reasons)
- data in status 'MARKED_FOR_DELETION' that is used in a running job
- data referenced by an active workbook snapshot
- data that was copied (rather than referenced) when a sheet linked in a different workbook is marked as kept
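The exclusions listed above can be read as a single guard that must pass before anything is removed. The sketch below is one possible reading, not Datameer code: the `DataEntity` class and all of its field names are hypothetical, and the last field reflects an interpretation of the final bullet (a copy made for a kept, linked sheet is left alone).

```python
from dataclasses import dataclass

@dataclass
class DataEntity:
    # All field names are hypothetical; they mirror the four bullets above.
    marked_for_deletion: bool
    used_by_running_job: bool = False
    in_active_snapshot: bool = False
    kept_copy: bool = False  # copy created for a linked sheet marked as kept

def may_delete(entity: DataEntity) -> bool:
    """Return True only if none of the listed exclusions applies."""
    if not entity.marked_for_deletion:
        return False  # never marked, so housekeeping leaves it alone
    if entity.used_by_running_job:
        return False  # marked, but a running job still reads it
    if entity.in_active_snapshot:
        return False  # an active workbook snapshot references it
    if entity.kept_copy:
        return False  # kept copy of a linked sheet
    return True

print(may_delete(DataEntity(marked_for_deletion=True, used_by_running_job=True)))
```

Writing the checks as early returns keeps each exclusion visible on its own line, which makes it easy to compare the guard against the list.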
...
Code Block | ||
---|---|---|
| ||
################################################################################################
## Housekeeping configuration
################################################################################################
housekeeping.enabled=true
# Disable data deletion from hdfs, when doing a housekeeping run the files in database will be deleted but not in hdfs.
# Then the files will be logged into the database table 'filesystem_artifact_to_delete' and to conductor.log
housekeeping.data.deletion.disabled=false
# Define the maximum number of days job executions are saved in the job history, after a job has been completed.
housekeeping.execution.max-age=28d
# Maximum number of out-dated executions that should be deleted per housekeeping run
housekeeping.run.delete.outdated-job-executions=50
# To allow for better failover due to a crashed database, deleted data should be kept longer than the configured
# frequency of database backups.
housekeeping.keep-deleted-data=2h
# Don't delete any data on HDFS for this period of time after an upgrade. This allows for a safe rollback to
# a previous version
housekeeping.keep-deleted-data-after-upgrade=2d
# Maximum number of out-dated data objects that should be marked for deletion per housekeeping run
housekeeping.run.mark-for-deletion.outdated-data-objects=200
# Maximum number of out-dated data objects that should be deleted per housekeeping run
housekeeping.run.delete.outdated-data-objects=2550
# Maximum number out-dated data artifacts that should be deleted from HDFS per housekeeping run
housekeeping.run.delete.outdated-data-artifacts=100
# Maximum number out-dated file artifacts that should be deleted from HDFS per housekeeping run
housekeeping.run.delete.outdated-file-artifacts=10000
# Define the maximum number of days unsaved workbooks are stored in the database.
housekeeping.temporary-files.max-age=3d
# Minimum time to keep files in temporary folder after last access.
housekeeping.temporary-folder-files.max-age=30d
# Maximum number of attempts for each task per housekeeping run
housekeeping.run.delete.task-attempts-per-run=50
# Maximum number of out-dated temporary conductor files that should be deleted per housekeeping run
housekeeping.run.delete.outdated-temporary-files=50
# The time that the housekeeping service falls asleep after each cycle
housekeeping.sleep-time=1h
################################################################################################
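The comment on `housekeeping.keep-deleted-data` advises keeping deleted data longer than the database backup frequency. A minimal sketch of how an administrator might check that rule outside Datameer follows. The helper names are hypothetical, the properties are passed as a plain dictionary, and only the `h` and `d` suffixes seen in the listing are handled; whether Datameer accepts other units is not covered here.

```python
import re
from datetime import timedelta

# Only the suffixes that appear in the listing above are supported here.
_UNITS = {"h": "hours", "d": "days"}

def parse_duration(value):
    """Parse a value such as '2h' or '28d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})

def keeps_data_past_backup(properties, backup_interval):
    """True if deleted data outlives one full backup interval."""
    keep = parse_duration(properties["housekeeping.keep-deleted-data"])
    return keep > backup_interval

props = {"housekeeping.keep-deleted-data": "2h"}
print(keeps_data_past_backup(props, timedelta(hours=24)))
```

With the listed value of `2h` and a nightly backup, the check fails, which under these assumptions would suggest raising `housekeeping.keep-deleted-data` above the backup interval.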
...