Info

This page describes only the minimal set of steps needed for Datameer X to operate properly in your Hadoop environment. For a more complete guide to Hadoop cluster design, configuration, and tuning, see Hadoop Cluster Configuration Tips or the additional resources section below.

...

Datameer X accesses your Hadoop cluster as a particular Hadoop user. By default, this is the UNIX user who launched the Datameer X application (the name returned by the UNIX command 'whoami'). For this to work properly, create a user of the same name within Hadoop's HDFS for Datameer X to use exclusively for scheduling and configuration. This ensures that the proper permissions are set and that this user is recognized whenever the Datameer X application interacts with your Hadoop cluster.

The username used to launch the Datameer X application can be configured in <Datameer X folder>/etc/das-env.sh.
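As a rough illustration of the user matching described above, the following Python sketch looks up the launching user and builds the HDFS commands an administrator might run for that user. It is not part of Datameer X. The function names are hypothetical, and the /user/<name> home directory layout is an assumption about the cluster. The commands are only built, never executed.

```python
import getpass
from typing import List


def launching_user() -> str:
    """Return the name of the user running this process (what 'whoami' prints)."""
    return getpass.getuser()


def hdfs_home_commands(user: str, home_root: str = "/user") -> List[List[str]]:
    """Build, but do not run, HDFS commands that give `user` a home directory.

    Assumption: home directories live under /user/<name> on this cluster.
    """
    if not user or "/" in user:
        raise ValueError(f"not a usable user name: {user!r}")
    home = f"{home_root.rstrip('/')}/{user}"
    return [
        # Create the home directory (and any missing parents).
        ["hdfs", "dfs", "-mkdir", "-p", home],
        # Hand ownership of the directory to the Datameer X user.
        ["hdfs", "dfs", "-chown", user, home],
    ]
```

Calling hdfs_home_commands(launching_user()) from the account that starts Datameer X yields commands for the same name that 'whoami' reports. They would normally have to be run by an HDFS superuser.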

...

Name:
  mapred.map.tasks.speculative.execution
  mapred.reduce.tasks.speculative.execution
Value: false
Description: Datameer X currently doesn't support speculative execution. You don't need to configure these properties cluster-wide, because Datameer X disables speculative execution for every job it submits. You must, however, ensure that these properties aren't set to 'true' cluster-wide with the final parameter also set to 'true'. A property marked final can't be changed by a client on a job-by-job basis, so Datameer X could not disable it.
Location: das-job.properties

Name:
  mapred.job.reduce.total.mem.bytes
Value: 0
Description: Datameer X turns off in-memory shuffle, because in-memory shuffle could lead to an 'out of memory exception' in the reduce phase.
Location: das-job.properties

Name:
  mapred.map.output.compression.codec
  mapred.output.compression.codec
Value: usually one of:
  org.apache.hadoop.io.compress.DefaultCodec
  org.apache.hadoop.io.compress.GzipCodec
  org.apache.hadoop.io.compress.BZip2Codec
  com.hadoop.compression.lzo.LzoCodec
  com.hadoop.compression.lzo.LzopCodec
Description: Datameer X decides what is compressed and when, but not how: it uses the codec you have configured for your cluster. To change to another available codec, set these properties in Datameer X. Otherwise, Datameer X uses the default from mapred-site.xml on the cluster. If you have configured a non-standard codec such as LZO, you must also install that codec on the machine running Datameer X. See "How do I configure Datameer/Hadoop to use native compression?" in Frequently Asked Hadoop Questions.
Location: mapred-site.xml
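
The speculative execution requirement above can be checked mechanically. The following Python sketch scans the text of a Hadoop configuration file for speculative execution properties that are set to 'true' and also marked final. It is not a Datameer X tool. The function name and the example XML are invented for illustration, and only the standard <property>/<name>/<value>/<final> layout of Hadoop configuration files is assumed.

```python
import xml.etree.ElementTree as ET
from typing import List

SPECULATIVE_PROPERTIES = (
    "mapred.map.tasks.speculative.execution",
    "mapred.reduce.tasks.speculative.execution",
)


def locked_speculative_properties(site_xml: str) -> List[str]:
    """Return the speculative execution properties that are 'true' and final."""
    root = ET.fromstring(site_xml)
    locked = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip().lower()
        final = (prop.findtext("final") or "").strip().lower()
        # Only 'true' combined with final=true is a problem: a client such
        # as Datameer X cannot override a final property per job.
        if name in SPECULATIVE_PROPERTIES and value == "true" and final == "true":
            locked.append(name)
    return locked


# Hypothetical cluster-wide configuration, made up for this example.
EXAMPLE_SITE_XML = """
<configuration>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>true</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>true</value>
  </property>
</configuration>
"""

print(locked_speculative_properties(EXAMPLE_SITE_XML))
# prints ['mapred.map.tasks.speculative.execution']
```

In this example only the map-side property is reported. The reduce-side property is also 'true', but it is not final, so a job can still override it.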

Highly Recommended Hadoop Settings

...