...

Name | Default Value | Description
fs.AbstractFileSystem.hdfs.impl | org.apache.hadoop.fs.Hdfs | Required to start the YARN application successfully, because the DAS framework strips these properties out while setting up the overlay.
io.compression.codecs.addition | datameer.dap.common.util.ZipCodec,datameer.dap.common.util.LzwCodec | Required to start the YARN application successfully, because the DAS framework strips these properties out while setting up the overlay. Additional comma-separated compression codecs that are added to io.compression.codecs.
das.yarn.available-node-vcores | auto | Sets the number of available vCores per node. The value is used to calculate the optimal number of splits and tasks (numberOfNodes * das.yarn.available-node-vcores). Set it either to the number of free CPUs per node or to auto. In auto mode, Datameer fetches the vCore information from every node and sets das.yarn.available-node-vcores to the average of the available vCores.
das.job.health.check | true | Turns on the DAS job configuration health check for the cluster configuration.
das.local.exec.map.count | 5 | DAS property that influences the number of mappers in Local Execution mode.
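
The calculation described for das.yarn.available-node-vcores can be illustrated with a short sketch. This is not Datameer's implementation: the function names and the sample cluster are hypothetical, and whether Datameer rounds the average in auto mode is not stated here, so the sketch leaves the average as computed.

```python
from statistics import mean


def resolve_node_vcores(setting, vcores_per_node):
    """Return the per-node vCore figure for a das.yarn.available-node-vcores value.

    `setting` is either the string "auto" or a number given as a string.
    `vcores_per_node` is a hypothetical list of vCores reported by each node.
    """
    if setting == "auto":
        # auto mode: average of the available vCores across all nodes
        return mean(vcores_per_node)
    return int(setting)


def split_basis(setting, vcores_per_node):
    """numberOfNodes * das.yarn.available-node-vcores, as described in the table."""
    return len(vcores_per_node) * resolve_node_vcores(setting, vcores_per_node)


cluster = [8, 8, 16, 16]  # hypothetical: four nodes and their available vCores
print(split_basis("auto", cluster))
print(split_basis("4", cluster))
```

With an explicit value, every node is treated as having the same number of free vCores; with auto, unevenly sized nodes are averaged before the multiplication.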

List of Hive Properties 

Name | Default Value | Description
dap.hive.use-datameer-file-splitter | false | Decides whether Datameer uses its own FileSplitter logic to generate RewindableFileSplits or Hive's own splitting logic.

List of Execution Framework Properties

Name | Default Value | Description

General

das.execution-framework | Tez | Sets the execution framework, for example 'das.execution-framework=Spark' to run Spark or 'das.execution-framework=Tez' to run Tez.

Spark

Tez

das.tez.session-pool.max-cached-sessions | auto | Maximum number of idle sessions held in the cache. 0 means no session pooling; auto uses the number of nodes in the cluster as the maximum number of sessions.
das.tez.session-pool.max-idle-time | 2m | How long a pooled session waits for a DAG to be submitted before it times out.
das.tez.session-pool.max-time-to-live | 2h | Maximum amount of time a Tez session stays alive, measured from its start time, regardless of whether it is idle or active.
tez.runtime.compress | true | Enables intermediate compression with Tez.
tez.runtime.compress.codec | org.apache.hadoop.io.compress.SnappyCodec | Codec used for intermediate compression with Tez.
framework.local.das.parquet-storage.max-parquet-block-size | 67108864 | For the local execution framework, sets the maximum Parquet block size to 64 MB by default. The value can be lowered but cannot be raised beyond 64 MB.
tez.shuffle-vertex-manager.desired-task-input-size | 52428800 | Sets the desired input size for each Tez sub-task.
das.cluster.plugin.resources.fixer.enabled | true | Intercepts the YARN errors YARN-3591 and YARN-6641 and rebuilds the missing plug-in class files on the fly. Setting the value to 'false' disables this behavior.

YARN MR2

das.yarn.base-counter-wait-time | 40000 | YARN counters take a while to appear at the Job History Server, so Datameer waits for this base wait time plus a small factor multiplied by the number of executed tasks until the counters show up.
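
The wait described for das.yarn.base-counter-wait-time has the shape "base + factor * tasks". The sketch below only illustrates that shape. The per-task factor is a made-up value, because the actual factor and the time unit are not documented here.

```python
def counter_wait_time(base_wait, executed_tasks, per_task_factor):
    """Total wait for YARN counters: base wait plus a small factor per executed task.

    `per_task_factor` is hypothetical; the real factor is not documented here.
    The result is in the same (undocumented) unit as `base_wait`.
    """
    return base_wait + per_task_factor * executed_tasks


# Default base wait from the table, 100 executed tasks, assumed factor of 10.
print(counter_wait_time(40000, 100, 10))
```

The point of the formula is that larger jobs wait longer: the base value sets the floor, and each executed task adds a small increment on top of it.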


List of Windows Properties

Name | Default Value | Description
windows.das.join.disabled-strategies | MEMORY_BACKED_MAP_SIDE | By default, memory-backed joins are enabled on the Windows platform. To disable them, uncomment this property.
mapreduce.input.linerecordreader.line.maxlength | 2097152 | Controls the maximum line size (in characters) allowed before the line is filtered out on read. The default value corresponds to a maximum of about 4 MB per line.
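
The "about 4 MB" figure can be checked with simple arithmetic. The sketch assumes two bytes per character (as with a Java char). That assumption is not stated in the table, and the actual byte size of a line depends on its encoding.

```python
MAX_LINE_LENGTH = 2097152  # default from the table, in characters
BYTES_PER_CHAR = 2         # assumption: two bytes per character, as with a Java char


def max_line_bytes(max_chars, bytes_per_char=BYTES_PER_CHAR):
    """Approximate upper bound, in bytes, for a line of `max_chars` characters."""
    return max_chars * bytes_per_char


print(max_line_bytes(MAX_LINE_LENGTH) / (1024 * 1024))  # size in MiB
```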

List of Error Handling Properties

Name | Default Value | Description
workbook.error-handling.default | IGNORE | Legal values for workbook error handling are IGNORE, DROP_RECORD, ABORT_JOB. The equivalent error handling modes in the UI are Ignore, Skip and Abort.
export-job.error-handling.default | DROP_RECORD | Legal values for export job error handling are IGNORE, DROP_RECORD, ABORT_JOB. The equivalent error handling modes in the UI are Ignore error, Drop record and Abort job.
import-job.error-handling.default | DROP_RECORD | Legal values for import job error handling are DROP_RECORD, ABORT_JOB. The equivalent error handling modes in the UI are Drop record and Abort job.
data-link.error-handling.default | DROP_RECORD | Legal values for data link error handling are DROP_RECORD, ABORT_JOB. The equivalent error handling modes in the UI are Drop record and Abort job.
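
Because the legal values differ per job type (IGNORE is not listed for import jobs or data links), it can help to check a set of overrides before applying them. The following is a minimal sketch of such a check. The helper name and the dictionary layout are hypothetical and not part of Datameer.

```python
# Legal values per property, copied from the table above.
LEGAL_MODES = {
    "workbook.error-handling.default": {"IGNORE", "DROP_RECORD", "ABORT_JOB"},
    "export-job.error-handling.default": {"IGNORE", "DROP_RECORD", "ABORT_JOB"},
    "import-job.error-handling.default": {"DROP_RECORD", "ABORT_JOB"},
    "data-link.error-handling.default": {"DROP_RECORD", "ABORT_JOB"},
}


def invalid_error_handling(properties):
    """Return one message per error-handling property whose value is not legal.

    `properties` is a plain dict of property name to value; properties that
    are not set are skipped.
    """
    problems = []
    for name, legal in LEGAL_MODES.items():
        value = properties.get(name)
        if value is not None and value not in legal:
            problems.append(f"{name}={value} (legal: {', '.join(sorted(legal))})")
    return problems


overrides = {
    "workbook.error-handling.default": "IGNORE",
    "import-job.error-handling.default": "IGNORE",  # not a legal value for import jobs
}
for problem in invalid_error_handling(overrides):
    print(problem)
```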

...