...

Info

This page describes only the minimal set of steps needed for Datameer X to operate properly in your Hadoop environment. For a more complete guide to Hadoop cluster design, configuration, and tuning, see Hadoop Cluster Configuration Tips or the additional resources section below.

...

Datameer X accesses your Hadoop cluster as a particular Hadoop user. By default, this is the UNIX user who launched the Datameer X application (equivalent to the output of the UNIX command 'whoami'). To ensure this works properly, create a user with the same name in Hadoop's HDFS for Datameer X to use exclusively for scheduling and configuration. This ensures that the proper permissions are set and that the user is recognized whenever the Datameer X application interacts with your Hadoop cluster.

The username used to launch the Datameer X application can be configured in <Datameer X folder>/etc/das-env.sh.
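
For example, if Datameer X is launched by a UNIX user named datameer, the matching HDFS user directory could be created and owned as sketched below. The user name and paths are illustrative assumptions, not fixed requirements.

Code Block
# Create an HDFS home directory for the (assumed) service user 'datameer'
hadoop fs -mkdir /user/datameer
# Make that user the owner so Datameer X has the permissions it needs
hadoop fs -chown datameer:datameer /user/datameer
hadoop fs -chmod 750 /user/datameer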

...

Hadoop permissions can be set with the following commands. Note that permission checking can be enabled/disabled globally for HDFS. See Configuring Datameer X in a Shared Hadoop Cluster for more information.

Code Block
hadoop fs
      [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
      [-chown [-R] [OWNER][:[GROUP]] PATH...]
      [-chgrp [-R] GROUP PATH...]
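
As a concrete illustration of each command form, the following recursively hands an existing data directory over to the assumed datameer user and group; the path is a placeholder for your own data location.

Code Block
# Recursively change owner, group, and mode of an example data directory
hadoop fs -chown -R datameer:datameer /data/imports
hadoop fs -chgrp -R datameer /data/imports
hadoop fs -chmod -R 750 /data/imports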

...

For standard Hadoop compression algorithms, you can choose the algorithm Datameer X should use. However, if your Hadoop cluster uses a non-standard compression algorithm such as LZO, you need to install the corresponding libraries on the Datameer X machine. This is necessary so that Datameer X can read the files it writes to HDFS and decompress files residing on HDFS that you want to import. Libraries that use native compression require both a Java component (a JAR file) and a native code component (UNIX packages). The JAR file needs to be placed into <Datameer X folder>/etc/custom-jars. See Frequently Asked Hadoop Questions#Q. How do I configure Datameer/Hadoop to use native compression? for more details.

Note

The configuration of Hadoop compression can drastically affect Datameer X performance. See Hadoop Cluster Configuration Tips for more information.
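
As a hedged sketch only: with the commonly used hadoop-lzo distribution, the Java component is copied into custom-jars and the native packages are installed on the Datameer X machine. The jar name, installation path, and package names below are assumptions that vary by version, vendor, and operating system.

Code Block
# Copy the LZO Java component into Datameer X's custom-jars directory
# (replace /opt/datameer with your <Datameer X folder>)
cp hadoop-lzo-*.jar /opt/datameer/etc/custom-jars/
# Install the native LZO packages on the Datameer X machine (example for RPM-based systems)
yum install lzo lzo-devel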

Connecting Datameer X to Your Hadoop Cluster

By default, Datameer X isn't connected to any Hadoop cluster. It operates in local mode, with all analytics and other functions performed by a local Hadoop instance. Local mode is useful for prototyping with small data sets, but not for high-volume testing or production. To connect Datameer X to a Hadoop cluster, open the Admin > Hadoop Cluster page: click the Admin tab at the top of the page, click the Hadoop Cluster tab in the left column, then click Edit and change the Mode setting.

...

Name: mapred.map.tasks.speculative.execution, mapred.reduce.tasks.speculative.execution
Value: false
Description: Datameer X currently doesn't support speculative execution. You don't need to configure these properties cluster-wide, because Datameer X disables speculative execution for every job it submits. However, make sure these properties aren't set to 'true' cluster-wide with the final flag also set to 'true', because a final property prevents a client from changing it on a job-by-job basis (a quick check is sketched after this table).
Location: das-job.properties

Name: mapred.job.reduce.total.mem.bytes
Value: 0
Description: Datameer X turns off the in-memory shuffle, because it can lead to an 'out of memory' exception in the reduce phase.
Location: das-job.properties

Name: mapred.map.output.compression.codec, mapred.output.compression.codec
Value: usually one of: org.apache.hadoop.io.compress.DefaultCodec, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec, com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec
Description: Datameer X cares about the what and the when of compression, but not about the how: it uses whichever codec is configured for your cluster. If you want to switch to another available codec, set these properties in Datameer X; otherwise, Datameer X uses the default from mapred-site.xml on the cluster. If you have configured a non-standard codec such as LZO, you must also install that codec on the machine running Datameer X. See Frequently Asked Hadoop Questions#Q. How do I configure Datameer/Hadoop to use native compression?.
Location: mapred-site.xml
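
To verify that the speculative-execution properties aren't locked down cluster-wide, one way is to look for a final flag next to them in the cluster's mapred-site.xml. The configuration paths below are assumptions (/etc/hadoop/conf for the cluster configuration, /opt/datameer/conf for das-job.properties) and may differ in your environment.

Code Block
# Show the speculative-execution entries and check whether they carry <final>true</final>
grep -A 3 'speculative.execution' /etc/hadoop/conf/mapred-site.xml
# Optionally confirm the job-level overrides listed above in das-job.properties
grep -E 'speculative|reduce.total.mem' /opt/datameer/conf/das-job.properties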

Highly Recommended Hadoop Settings

...