
Datameer's housekeeping is a background service that improves processing by deleting obsolete data from HDFS, removing old entries from job history, and removing unsaved workbooks.

Housekeeping retention is defined by count, by time, or by a combination of both. With count-based retention, data objects above a given number are marked for deletion (e.g. when 10 data objects are kept and object 11 comes in, object 1 is deleted). With time-based retention, deletion follows a schedule (e.g. at the end of every business day). Data entities that are to be deleted are first marked for deletion. Deletion follows the site's retention policy, which the Datameer X administrator can configure in the default.properties file.

To support failover, the property housekeeping.keep-deleted-data delays file deletion from HDFS and should be set to a period longer than the standard backup interval. Similarly, the property housekeeping.keep-deleted-data-after-upgrade delays data deletion after a Datameer X upgrade, so that a rollback to the previous version can succeed if one is needed.
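
As an illustration of the two delay properties, the fragment below sketches one possible setting. It assumes the database is backed up once a day and that a one-week rollback window is wanted after an upgrade; both are assumptions made for this example, not Datameer recommendations. The property names are the ones used in default.properties, and the values follow the duration notation that file uses (e.g. 2h, 2d).

```properties
# Illustrative values only (assumption: the database is backed up every 24 hours).
# Keep deleted data for two days, i.e. longer than the assumed backup interval.
housekeeping.keep-deleted-data=2d
# Assumption: a one-week window in which a rollback to the previous version should stay possible.
housekeeping.keep-deleted-data-after-upgrade=7d
```

With settings like these, files removed by housekeeping would stay in HDFS long enough for a database restored from the most recent backup to still find them.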

...

Housekeeping Does Not Delete

  • data not marked for deletion (configurable for failover and rollback reasons)
  • data in status 'MARKED_FOR_DELETION' that is used in a running job
  • data referenced by an active workbook snapshot
  • data that is copied rather than referenced, which happens when a sheet is linked in a different workbook and is marked as kept

...

Code Block
languagebash
################################################################################################
## Housekeeping configuration
################################################################################################
housekeeping.enabled=true
# Disable data deletion from HDFS. When set to true, a housekeeping run deletes the files in the database but not in HDFS.
# The files are then logged to the database table 'filesystem_artifact_to_delete' and to conductor.log
housekeeping.data.deletion.disabled=false
# Define the maximum number of days job executions are saved in the job history after a job has been completed.
housekeeping.execution.max-age=28d
# Maximum number of out-dated executions that should be deleted per housekeeping run
housekeeping.run.delete.outdated-job-executions=50


# To allow for better failover due to a crashed database, deleted data should be kept longer than the configured
# frequency of database backups.
housekeeping.keep-deleted-data=2h
# Don't delete any data on HDFS for this period of time after an upgrade. This allows for a safe rollback to
# a previous version 
housekeeping.keep-deleted-data-after-upgrade=2d
# Maximum number of out-dated data objects that should be marked for deletion per housekeeping run
housekeeping.run.mark-for-deletion.outdated-data-objects=200
# Maximum number of out-dated data objects that should be deleted per housekeeping run
housekeeping.run.delete.outdated-data-objects=2550
# Maximum number of out-dated data artifacts that should be deleted from HDFS per housekeeping run
housekeeping.run.delete.outdated-data-artifacts=100
# Maximum number of out-dated file artifacts that should be deleted from HDFS per housekeeping run
housekeeping.run.delete.outdated-file-artifacts=10000


# Define the maximum number of days unsaved workbooks are stored in the database.
housekeeping.temporary-files.max-age=3d
# Minimum time to keep files in temporary folder after last access.
housekeeping.temporary-folder-files.max-age=30d
# Maximum number of out-dated temporary conductor files that should be deleted per housekeeping run
housekeeping.run.delete.outdated-temporary-files=50
# Maximum number of attempts for each task per housekeeping run
housekeeping.run.delete.task-attempts-per-run=50


# The time that the housekeeping service sleeps after each cycle
housekeeping.sleep-time=1h

################################################################################################

...