...

Name

Default Value

Description
fs.AbstractFileSystem.hdfs.impl

org.apache.hadoop.fs.Hdfs

Required to start a YARN application successfully, because the DAS framework strips these properties out while setting up the overlay.

io.compression.codecs.addition

datameer.dap.common.util.ZipCodec,datameer.dap.common.util.LzwCodec

Required to start a YARN application successfully, because the DAS framework strips these properties out while setting up the overlay. Lists additional comma-separated compression codecs that are added to io.compression.codecs.
das.yarn.available-node-vcores

auto

Sets the number of available vCores per node, which is used to calculate the optimal number of splits and tasks (numberOfNodes * das.yarn.available-node-vcores). Set it either to the number of free CPUs per node or to auto. In auto mode, Datameer X fetches the vCore information from every node and sets das.yarn.available-node-vcores to the average of the available vCores.

das.job.health.check

true

Turns on the DAS job configuration health check for the cluster configuration.

das.local.exec.map.count

5

DAS property that influences the number of mappers in Local Execution mode.
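The calculation described for das.yarn.available-node-vcores can be sketched as follows. This is only an illustration of the formula numberOfNodes * das.yarn.available-node-vcores and of the averaging done in auto mode. The function names and the list of per-node vCore counts are hypothetical, and how Datameer X rounds the average is not stated in the description.

```python
from statistics import mean


def resolve_node_vcores(setting, vcores_per_node):
    """Return the per-node vCore figure used for planning.

    setting: value of das.yarn.available-node-vcores ("auto" or an integer string).
    vcores_per_node: available vCores reported by each node (hypothetical input).
    """
    if setting.strip().lower() == "auto":
        # auto mode: average of the available vCores across all nodes
        return mean(vcores_per_node)
    return int(setting)


def optimal_parallelism(setting, vcores_per_node):
    """numberOfNodes * das.yarn.available-node-vcores."""
    number_of_nodes = len(vcores_per_node)
    return number_of_nodes * resolve_node_vcores(setting, vcores_per_node)


if __name__ == "__main__":
    cluster = [8, 8, 4, 12]  # hypothetical four-node cluster
    print(optimal_parallelism("auto", cluster))
    print(optimal_parallelism("6", cluster))
```

With a fixed value the result depends only on the node count; with auto it follows whatever the nodes report.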

Other Properties

Name

Default Value

Description

YARN MR2 Framework Properties

das.yarn.base-counter-wait-time

40000

YARN counters take a while to show up at the Job History Server, so Datameer waits for this base wait time plus a small factor multiplied by the number of executed tasks before it expects the counters to be available.
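The wait time described for das.yarn.base-counter-wait-time can be sketched as a simple linear formula. The per-task factor below is a hypothetical placeholder, since the description only says "a small factor", and the unit of the default value 40000 is not stated in the description.

```python
# Default of das.yarn.base-counter-wait-time (unit not stated in the description)
BASE_COUNTER_WAIT = 40000


def counter_wait(executed_tasks, base=BASE_COUNTER_WAIT, per_task_factor=10):
    """Base wait time plus a small factor times the number of executed tasks.

    per_task_factor is a hypothetical value chosen for illustration only.
    """
    return base + per_task_factor * executed_tasks


if __name__ == "__main__":
    for tasks in (0, 100, 500):
        print(tasks, counter_wait(tasks))
```

The point of the formula is that larger jobs get a proportionally longer grace period, while small jobs still wait at least the base time.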

TEZ Execution Framework Properties

das.tez.session-pool.max-cached-sessions

auto

Maximum number of idle sessions to be held in the cache. 0 disables session pooling; auto uses the number of nodes in the cluster as the maximum number of sessions.

das.tez.session-pool.max-idle-time

2m

How long a session held in the pool waits for a DAG to be submitted before it times out.

das.tez.session-pool.max-time-to-live

2h

Maximum amount of time a Tez session stays alive, measured from its start time, regardless of whether it is idle or active.
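The rules for das.tez.session-pool.max-cached-sessions (0 means no pooling, auto means one session per cluster node) can be sketched as below. The function names are hypothetical, and rejecting negative values is an assumption that the description does not cover.

```python
def max_cached_sessions(setting, cluster_node_count):
    """Resolve das.tez.session-pool.max-cached-sessions to a number.

    setting: "auto" or an integer string.
    cluster_node_count: number of nodes in the cluster (hypothetical input).
    """
    value = setting.strip().lower()
    if value == "auto":
        # auto: use the number of cluster nodes as the maximum
        return cluster_node_count
    count = int(value)
    if count < 0:
        # Assumption: negative values are treated as invalid
        raise ValueError("max-cached-sessions must be 0, a positive number, or auto")
    return count


def pooling_enabled(setting, cluster_node_count):
    """A resolved value of 0 means no session pooling."""
    return max_cached_sessions(setting, cluster_node_count) > 0


if __name__ == "__main__":
    print(max_cached_sessions("auto", 12))
    print(pooling_enabled("0", 12))
```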

tez.runtime.compress

true

Enables intermediate compression with Tez.

tez.runtime.compress.codec

org.apache.hadoop.io.compress.SnappyCodec

Codec used for intermediate compression with Tez.

framework.local.das.parquet-storage.max-parquet-block-size

67108864

Sets the maximum Parquet block size for the local execution framework. The default is 64 MB; the value can be lowered but cannot be raised above 64 MB.

tez.shuffle-vertex-manager.desired-task-input-size

52428800

Sets the input size for each TEZ sub-task.

das.cluster.plugin.resources.fixer.enabled

true

Intercepts the YARN errors YARN-3591 and YARN-6641 and rebuilds the missing plug-in class files on the fly. Setting the value to 'false' disables this behavior.

Hive Properties

dap.hive.use-datameer-file-splitter

false

Decides whether Datameer X uses Datameer's FileSplitter logic to generate RewindableFileSplits or Hive's own splitting logic.

Windows Properties

windows.das.join.disabled-strategies

MEMORY_BACKED_MAP_SIDE

By default, memory-backed joins are enabled on the Windows platform. To disable them, uncomment this property.

mapreduce.input.linerecordreader.line.maxlength

2097152

Controls the maximum line size (in characters) allowed before the line is filtered out on read. This value corresponds to a maximum of around 4 MB per line.

Error Handling Properties

workbook.error-handling.default

IGNORE

Legal values for workbook error handling are IGNORE, DROP_RECORD, ABORT_JOB. The equivalent error handling modes in the UI are Ignore, Skip and Abort.

export-job.error-handling.default

DROP_RECORD

Legal values for export job error handling are IGNORE, DROP_RECORD, ABORT_JOB. The equivalent error handling modes in the UI are Ignore error, Drop record and Abort job.

import-job.error-handling.default

DROP_RECORD

Legal values for import job error handling are DROP_RECORD, ABORT_JOB. The equivalent error handling modes in the UI are Drop record and Abort job.

data-link.error-handling.default

DROP_RECORD

Legal values for data link error handling are DROP_RECORD, ABORT_JOB. The equivalent error handling modes in the UI are Drop record and Abort job.
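The legal values listed for the four error handling properties can be checked with a small validation sketch like the one below. The property names, defaults, and legal values are taken from the descriptions above; the validation function itself is hypothetical and not part of Datameer.

```python
# Legal values per property, as listed in the descriptions above
LEGAL_MODES = {
    "workbook.error-handling.default": {"IGNORE", "DROP_RECORD", "ABORT_JOB"},
    "export-job.error-handling.default": {"IGNORE", "DROP_RECORD", "ABORT_JOB"},
    "import-job.error-handling.default": {"DROP_RECORD", "ABORT_JOB"},
    "data-link.error-handling.default": {"DROP_RECORD", "ABORT_JOB"},
}

# Documented default values
DEFAULTS = {
    "workbook.error-handling.default": "IGNORE",
    "export-job.error-handling.default": "DROP_RECORD",
    "import-job.error-handling.default": "DROP_RECORD",
    "data-link.error-handling.default": "DROP_RECORD",
}


def validate_error_handling(properties):
    """Return a list of problems found in the given property mapping.

    Properties that are not set fall back to their documented default.
    """
    problems = []
    for name, legal in LEGAL_MODES.items():
        value = properties.get(name, DEFAULTS[name])
        if value not in legal:
            problems.append(
                f"{name}={value} is not allowed; legal values: {', '.join(sorted(legal))}"
            )
    return problems


if __name__ == "__main__":
    custom = {"import-job.error-handling.default": "IGNORE"}
    for problem in validate_error_handling(custom):
        print(problem)
```

The example flags IGNORE for import jobs, because only DROP_RECORD and ABORT_JOB are listed as legal values there.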

...