Set Up Datameer on YARN

Set Up Datameer on YARN

Setting Up Datameer with YARN

Datameer can connect to YARN cluster, depending on the Hadoop distribution used.

An admin has complete configuration in the Hadoop Cluster configuration page. The Hadoop Cluster configuration page requests for different configurations for YARN cluster when compared with a classic MapReduce cluster.

Following are the new configurations required to be setup by a Datameer admin in Hadoop Cluster configuration page (the user interface suggests the default values to be used):

Configuration

Type

Configuration

Type

Resource Tracker Address

host:port

Yarn Resource Manager Address

host:port

Resource Manager Webapp Address

host:port

Yarn Resource Manager Scheduler Address

host:port

Yarn MR Job History Address

host:port

Yarn Application Classpath

string: class path

While working with secure Kerberos mode the Hadoop Cluster configuration page asks for the YARN Principal as well.

Debugging options

Some options for better debugging:

By default the container logs are deleted once the container is released, so you never see the actual reason for the failing container. The log message provided in the node managers log is not sufficient.

For better and faster debugging set the below configuration in yarn-site.xml (on all nodes):

<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>-1</value>
</property>

values here can be any positive integer, -1 implies never delete

Restart node manager to reflect the changes. After this change you should see container logs in the yarn log folder.

Occasionally you can even look at launch_container.sh script to check the runtime configurations of the application container.

Common YARN Configurations

#Configs set by in one of your SX Lab Environments das.execution-framework.small-job.max-records=1000000 das.execution-framework.small-job.max-uncompressed=1000000000 tez.am.container.reuse.enabled=true tez.am.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_COMMON_HOME/lib/native/ tez.task.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_COMMON_HOME/lib/native/ #das.tez.reduce-tasks-per-node=6 #tez.shuffle-vertex-manager.desired-task-input-size=50485760 das.tez.session-pool.max-cached-sessions=3 das.tez.session-pool.max-idle-time=600s tez.am.session.min.held-containers=1 tez.am.container.idle.release-timeout-min.millis=60000 tez.am.container.idle.release-timeout-max.millis=60000 das.execution-framework=Smart das.map-tasks-per-node=16 ### We've seen this in virtual environments mapreduce.map.memory.mb=1536 mapreduce.map.java.opts=-Xmx3226m mapreduce.reduce.memory.mb=2240 mapreduce.reduce.java.opts=-Xmx4704m yarn.app.mapreduce.am.command-opts=-Xmx2016m yarn.scheduler.minimum-allocation-vcores=1 yarn.scheduler.maximum-allocation-vcores=8 yarn.nodemanager.resource.memory-mb=13440 yarn.scheduler.minimum-allocation-mb=1024 yarn.app.mapreduce.am.resource.cpu-vcores=1 mapreduce.reduce.cpu.vcores=1 yarn.nodemanager.resource.cpu-vcores=8 yarn.app.mapreduce.am.resource.mb=2240 yarn.scheduler.maximum-allocation-mb=13440 mapreduce.map.cpu.vcores=1 mapreduce.task.io.sort.mb=896   #Smaller jobs got better performance with customer FLS Connect with this setting: tez.shuffle-vertex-manager.desired-task-input-size=52428800

Configuring Datameer When Upgrading from MR1 to MR2

Complete the following steps to configure Datameer when upgrading from MR1 to MR2:

  1. Stop Datameer.

  2. Take a dap database backup.

  3. Execute the following query on the dap database: 

    UPDATE property SET value = "LOCAL" WHERE name = "hadoop.mode";
  4. Start Datameer.

  5. Login and configure the MapR 4 MR2 settings on the Admin > Hadoop Cluster tab.

Configuring YARN with High Availability

To configure YARN with High Availability, see Configuring Datameer.