Tez Tuning and Troubleshooting

Tez Tuning

Keep the following tips in mind when tuning Tez for your environment:

  • Use das.job.map-task.memory, das.job.reduce-task.memory, and das.job.application-manager.memory to control container sizes, because Datameer X ignores the Tez container sizing parameters.

    • The container sizes need to be in multiples of the YARN yarn.scheduler.minimum-allocation-mb property, since YARN only allocates containers in multiples of that property.
      • Example: If the yarn.scheduler.minimum-allocation-mb is set to 4GB and you set das.job.map-task.memory to 6GB, YARN allocates an 8GB container even though you only asked for 6GB.

  • Datameer X doesn't translate MapReduce configuration parameters over to Tez equivalents, so Tez must be configured separately using the values from MapReduce as a starting point.
  • Extensive tuning is not required. Most frequently, you need to tune only container sizing and shuffle sizing.
  • Overriding the launch.cluster-default.cmd-opts parameter is unnecessary for tuning memory sizing. For container JVM memory sizing, use the das.job.container.memory-heap-fraction parameter instead; its default is 0.8 (see the brief example after this list).
  • In Tez 0.7 the pipeline sorter (shuffle) is the default, and it is unnecessary to set the shuffle threads. Some bugs in the 0.7 shuffle and sorter implementation might require setting tez.runtime.sorter.class=LEGACY. Datameer X versions 5.11 and later use Tez 0.7.1, which stabilized the pipeline sort implementation, so this workaround is no longer required and can be removed.
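
For instance, with a 4096 MB container and the default heap fraction of 0.8, roughly 3277 MB goes to the JVM heap and the remaining ~819 MB stays available for off-heap use:

das.job.map-task.memory=4096
# JVM heap ≈ 0.8 * 4096 ≈ 3277 MB; the rest of the container stays off-heap
das.job.container.memory-heap-fraction=0.8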

Tez tuning example

The following properties are set:

das.yarn.available-node-vcores=auto
das.job.map-task.memory=4096
das.job.reduce-task.memory=4096
das.job.application-manager.memory=4096
tez.runtime.io.sort.mb=512
tez.runtime.io.sort.factor=100

Reasons for each property setting:

das.yarn.available-node-vcores=auto
Reason: Defines the number of vcores available per node; used to determine the number of map and reduce tasks. When set to "auto", the value is read from yarn-site.xml, which does not respect queue configurations. If Datameer X is running on a multi-tenant cluster, set this to the number of vcores allocated to the Datameer X execution queue.

das.job.map-task.memory=4096
Reason: The map task memory. Calculate it as available node memory in MB / available node vcores, rounded down to a full GB (e.g., 5120, 6144, 7168, 8192). The recommended minimum is 4096.

das.job.reduce-task.memory=4096
Reason: The reduce task memory. Set it to available node memory divided by available node vcores.

das.job.application-manager.memory=4096
Reason: The application manager memory. Set it to available node memory divided by available node vcores.

tez.runtime.io.sort.mb=512
Reason: The Tez runtime sort buffer. Set it to 1/8 to 1/4 of the das.job.map-task.memory value.

tez.runtime.io.sort.factor=100
Reason: The Tez runtime sort factor. A value of 100 is appropriate in most cases.
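
As an illustration of the sizing formula above, assume a hypothetical worker node with 32768 MB of memory available to YARN and 8 vcores:

# Hypothetical node: 32768 MB available to YARN, 8 vcores
# 32768 MB / 8 vcores = 4096 MB per container (already a full GB, so no rounding is needed)
das.job.map-task.memory=4096
das.job.reduce-task.memory=4096
das.job.application-manager.memory=4096
# Roughly 1/8 of the 4096 MB map task memory
tez.runtime.io.sort.mb=512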


Troubleshooting

Use the following tips to troubleshoot Tez:

  • Fetch YARN application logs, as most actions occur on the cluster with Tez. These logs include: 
    • File/source splitting and artifact merging in the AM
    • Data fetching and computation in the tasks
  • Enable automatic log fetching (even if the tasks don’t fail) using the following properties:
    • das.debug.tasks.logs.collect.force=true
    • das.debug.tasks.logs.collect.max=30 (30 is the default, which can be increased)
  • Use Hadoop custom properties to set security and environment overrides, including the following:
    • Encrypted shuffle
    • LD_LIBRARY_PATH
  • Check the shuffle configuration to solve out-of-memory errors:
    • Shuffle doesn’t always speed things up
    • Tune across multiple jobs instead of just one giant sort job (single group by or join)
  • Enable SSL for the shuffle process using the following properties (a combined example follows this list):
    • tez.runtime.shuffle.ssl.enable=true
    • tez.runtime.shuffle.keep-alive.enabled=true
  • Refer to the Knowledge Base for detailed troubleshooting steps on shuffle failures.
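
Taken together, the log collection and encrypted shuffle settings above can be entered as Custom Properties, for example:

# Collect task logs even when tasks succeed
das.debug.tasks.logs.collect.force=true
# 30 is the default; increase it if more task logs are needed
das.debug.tasks.logs.collect.max=30
# Enable SSL for the shuffle process
tez.runtime.shuffle.ssl.enable=true
tez.runtime.shuffle.keep-alive.enabled=true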

Containers running beyond physical memory limits

Problem

With the introduction of Parquet as the file storage format for Datameer X, some jobs may fail with the following error:

WARN [<timestamp>] [ConcurrentJobExecutor-0] (DefaultMrJobClient.java:185) - attempt_<id>: Container [pid=<pid>,containerID=container_<id>] is running beyond physical memory limits. Current usage: <value> GB of <value> GB physical memory used. 

Configuring the typical memory parameters and increasing their values doesn't alleviate the problem:

das.job.map-task.memory=4096
das.job.reduce-task.memory=4096
das.job.application-manager.memory=4096

Additionally, increasing the off-heap memory space by lowering the heap fraction parameter doesn't alleviate the issue until it is set to 0.5, which may cause other memory issues.

das.job.container.memory-heap-fraction=0.5

Cause

On very busy clusters, the YARN resource manager is stricter about memory limits. This strict behavior, combined with the high rate at which the Parquet implementation and Tez's shuffle allocate direct memory buffers, leads to containers being killed before JVM garbage collection can occur.

Solution

To mitigate this issue, the fraction of memory dedicated to off-heap operations should be slightly increased and the amount of Direct Memory allocated within the JVM should be bound to 25% of the total container memory.

For example, if working with a 4GB container, the following properties should be used:

das.job.map-task.memory=4096 
das.job.reduce-task.memory=4096
das.job.application-manager.memory=4096
das.job.container.memory-heap-fraction=0.7
tez.task.launch.cmd-opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:MaxDirectMemorySize=1024m

The heap fraction is a proportion of the overall container memory. Setting it to 0.7 allocates 30% of the 4 GB container to off-heap operations. With direct memory capped at 25% of the container while 30% is reserved for off-heap use, it shouldn't be possible to overrun the off-heap memory space.
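
A rough breakdown of those numbers for a 4096 MB container:

# Total container memory:             4096 MB
# JVM heap (fraction 0.7):            0.7 * 4096 ≈ 2867 MB
# Off-heap space (remainder):         4096 - 2867 ≈ 1229 MB
# Direct memory cap (25% of total):   1024 MB via -XX:MaxDirectMemorySize=1024m
# 1024 MB < 1229 MB, so direct buffers stay within the off-heap space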

On HDP clusters where the tez.task.launch.cmd-opts value is already set by the tez-site.xml in the Hadoop configuration directory, make sure to reuse the existing tez.task.launch.cmd-opts value and append -XX:MaxDirectMemorySize=1024m to the rest of the command line flags. The example above shows the default value that ships with Tez; any other changes made on the cluster should be reflected in the value configured for Datameer X.
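
For illustration, suppose (hypothetically) that the cluster's tez-site.xml sets tez.task.launch.cmd-opts to -XX:+PrintGCDetails -verbose:gc -XX:+UseG1GC. The custom property configured for Datameer X would then keep those flags and append the direct memory cap:

# Hypothetical existing cluster flags, with -XX:MaxDirectMemorySize appended
tez.task.launch.cmd-opts=-XX:+PrintGCDetails -verbose:gc -XX:+UseG1GC -XX:MaxDirectMemorySize=1024m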

Compression

You can add the following values to Custom Properties to enable compression codecs.

Property: tez.runtime.compress
Description: Specifies whether intermediate data should be compressed or not.
Example value: true

Property: tez.runtime.compress.codec
Description: The codec used for compressing intermediate data. Only applicable if tez.runtime.compress is enabled.
Example value: org.apache.hadoop.io.compress.SnappyCodec
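
For example, to compress intermediate data with Snappy, add the following Custom Properties:

tez.runtime.compress=true
tez.runtime.compress.codec=org.apache.hadoop.io.compress.SnappyCodec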