Page Comparison

Table of Contents

...

Name	Default Value	Description
fs.AbstractFileSystem.hdfs.impl	org.apache.hadoop.fs.Hdfs	Required to start the YARN application successfully, because the DAS framework strips this property out while setting up the overlay.
io.compression.codecs.addition	datameer.dap.common.util.ZipCodec,datameer.dap.common.util.LzwCodec	Required to start the YARN application successfully, because the DAS framework strips this property out while setting up the overlay. Additional comma-separated compression codecs that are added to io.compression.codecs.
das.yarn.available-node-vcores	auto	Sets the number of available vCores per node, which is used to calculate the optimal number of splits and tasks (numberOfNodes * das.yarn.available-node-vcores). It can be set either to the number of free CPUs per node or to auto. In auto mode, Datameer fetches the vCore information from every node and sets das.yarn.available-node-vcores to the average of the available vCores.
das.job.health.check	true	Turns the DAS job configuration health check on for the cluster configuration.
das.local.exec.map.count	5	DAS property to influence the number of mappers in Local Execution mode.
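
As a rough illustration of the das.yarn.available-node-vcores calculation described above, the fragment below pins the value instead of relying on auto. The cluster size (10 nodes) and the 8 free vCores per node are hypothetical figures chosen only for the arithmetic. Where custom properties are entered depends on your installation.

```properties
# Hypothetical cluster: 10 nodes, 8 free vCores on each node.
# Basis for splits and tasks = numberOfNodes * das.yarn.available-node-vcores = 10 * 8 = 80
das.yarn.available-node-vcores=8

# Default from the table: auto uses the average of the vCores reported by every node.
# das.yarn.available-node-vcores=auto
```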

List of Hive Properties

Name	Default Value	Description
dap.hive.use-datameer-file-splitter	false	Decides whether Datameer uses its own FileSplitter logic to generate RewindableFileSplits or Hive's own splitting logic.

List of Execution Framework Properties

Name	Default Value	Description
General
das.execution-framework	Tez	Sets the execution framework, e.g. 'das.execution-framework=Spark' to run Spark or 'das.execution-framework=Tez' to run Tez as the execution framework.
Spark


TEZ
das.tez.session-pool.max-cached-sessions	auto	Maximum number of idle sessions to be held in the cache. 0 means no session pooling; auto uses the number of nodes in the cluster as the maximum number of sessions.
das.tez.session-pool.max-idle-time	2m	How long a session held in the pool waits for a DAG to be submitted before it times out.
das.tez.session-pool.max-time-to-live	2h	Maximum amount of time a Tez session stays alive, measured from its start time, regardless of whether it is idle or active.
tez.runtime.compress	true	Enables intermediate compression with Tez.
tez.runtime.compress.codec	org.apache.hadoop.io.compress.SnappyCodec	Codec used for intermediate compression with Tez.
framework.local.das.parquet-storage.max-parquet-block-size	67108864	For the local execution framework, sets the maximum Parquet block size to 64 MB by default. The value can be lowered but not raised above 64 MB.
tez.shuffle-vertex-manager.desired-task-input-size	52428800	Sets the input size for each Tez sub-task.
das.cluster.plugin.resources.fixer.enabled	true	Intercepts the YARN errors YARN-3591 and YARN-6641 and rebuilds the missing plug-in class files on the fly. Setting the value to 'false' disables this behavior.
YARN MR2
das.yarn.base-counter-wait-time	40000	YARN counters take a while to show up at the Job History Server, so Datameer waits for this base wait time plus a small factor multiplied by the number of executed tasks for the counters to show up.
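
A minimal sketch of how the framework and Tez session pool settings above might be combined. The property names come from the table. The non-default values (4 cached sessions, a 5m idle time) are assumptions for illustration, and 5m assumes the same number-plus-unit format as the 2m and 2h defaults.

```properties
# Run jobs on Tez (the default framework).
das.execution-framework=Tez

# Assumed value: keep at most 4 idle sessions cached.
# 0 disables session pooling; auto uses the number of cluster nodes.
das.tez.session-pool.max-cached-sessions=4
# Assumed value, written in the same style as the 2m default.
das.tez.session-pool.max-idle-time=5m
das.tez.session-pool.max-time-to-live=2h

# Defaults from the table, with the numbers spelled out:
# 52428800 = 50 * 1024 * 1024
tez.shuffle-vertex-manager.desired-task-input-size=52428800
# 67108864 = 64 * 1024 * 1024 (can be lowered, not raised)
framework.local.das.parquet-storage.max-parquet-block-size=67108864
```

Comments sit on their own lines because a trailing `#` after a value in a properties file is read as part of the value.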

List of Windows Properties

Name	Default Value	Description
windows.das.join.disabled-strategies	MEMORY_BACKED_MAP_SIDE	By default, memory-backed joins are enabled on the Windows platform. To disable them, uncomment this property.
mapreduce.input.linerecordreader.line.maxlength	2097152	Controls the maximum line size (in characters) allowed before the line is filtered out on read. The default value corresponds to a maximum of around 4 MB per line.
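
The two Windows settings above could look like this once the join property is uncommented. This is a sketch that only reuses the names and default values from the table; the file in which the properties live is not specified here and depends on your installation.

```properties
# Uncommented: the MEMORY_BACKED_MAP_SIDE join strategy is disabled on Windows.
# Leave the line commented out to keep memory-backed joins enabled.
windows.das.join.disabled-strategies=MEMORY_BACKED_MAP_SIDE

# Default line limit: 2097152 = 2 * 1024 * 1024 characters.
mapreduce.input.linerecordreader.line.maxlength=2097152
```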

List of Error Handling Properties

Name	Default Value	Description
workbook.error-handling.default	IGNORE	Legal values for workbook error handling are IGNORE, DROP_RECORD, ABORT_JOB. The equivalent error handling modes in the UI are Ignore, Skip and Abort.
export-job.error-handling.default	DROP_RECORD	Legal values for export job error handling are IGNORE, DROP_RECORD, ABORT_JOB. The equivalent error handling modes in the UI are Ignore error, Drop record and Abort job.
import-job.error-handling.default	DROP_RECORD	Legal values for import job error handling are DROP_RECORD, ABORT_JOB. The equivalent error handling modes in the UI are Drop record and Abort job.
data-link.error-handling.default	DROP_RECORD	Legal values for data link error handling are DROP_RECORD, ABORT_JOB. The equivalent error handling modes in the UI are Drop record and Abort job.
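
For illustration, the defaults above might be overridden as follows. The chosen values are assumptions about what a stricter setup could look like; only the property names and the legal values are taken from the table.

```properties
# Abort workbook jobs on errors instead of ignoring them (default: IGNORE).
workbook.error-handling.default=ABORT_JOB

# Export jobs accept IGNORE, DROP_RECORD or ABORT_JOB.
export-job.error-handling.default=DROP_RECORD

# Import jobs and data links accept only DROP_RECORD or ABORT_JOB.
import-job.error-handling.default=ABORT_JOB
data-link.error-handling.default=DROP_RECORD
```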

...
