Tez Tuning and Troubleshooting
Tez Tuning
Keep the following tips in mind when tuning Tez for your environment:
- Use das.job.map-task.memory, das.job.reduce-task.memory, and das.job.application-manager.memory to control container sizes, as the Tez container sizing parameters are ignored by Datameer.
- The container sizes need to be multiples of the YARN yarn.scheduler.minimum-allocation-mb property, since YARN only allocates containers in multiples of that value. Example: if yarn.scheduler.minimum-allocation-mb is set to 4GB and you set das.job.map-task.memory to 6GB, YARN allocates an 8GB container even though you only asked for 6GB.
- Datameer X doesn't translate MapReduce configuration parameters over to Tez equivalents, so Tez must be configured separately using the values from MapReduce as a starting point.
- Extensive tuning is not required. Most frequently, you need to tune only container sizing and shuffle sizing.
- Overriding the launch.cluster-default.cmd-opts parameter is unnecessary for tuning memory sizing. For container JVM memory sizing, use the das.job.container.memory-heap-fraction parameter instead. The default for that parameter is 0.8 (see the sizing sketch after this list).
- In Tez 0.7, the pipeline sorter (shuffle) is the default, and it is unnecessary to set the shuffle threads. There were some bugs in 0.7 with the shuffle and sorter implementations that might require setting tez.runtime.sorter.class=LEGACY. Datameer X versions 5.11 and later use Tez 0.7.1, which stabilized the pipeline sorter implementation, so this workaround is no longer required and can be removed.
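As a quick sizing sketch (using the 4096 MB container size from the example below and the default heap fraction of 0.8; the arithmetic is approximate, not exact JVM sizing):
4096 MB container x 0.8 heap fraction = ~3276 MB JVM heap
4096 MB - 3276 MB = ~820 MB remaining in the container for off-heap use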
Tez tuning example
The following properties are set:
das.yarn.available-node-vcores=auto
das.job.map-task.memory=4096
das.job.reduce-task.memory=4096
das.job.application-manager.memory=4096
tez.runtime.io.sort.mb=512
tez.runtime.io.sort.factor=100
Property | Reason |
---|---|
das.yarn.available-node-vcores=auto | Defines the number of vcores available per node, which is used to determine the number of map and reduce tasks. When set to "auto", the value is read from yarn-site.xml; note that this does not respect queue configurations. If Datameer X is running on a multi-tenant cluster, set this equal to the number of vcores allocated to the Datameer X execution queue. |
das.job.map-task.memory=4096 | The map task memory in MB. Calculate it as available node memory in MB / available node vcores, rounded down to a full GB (e.g., 5120, 6144, 7168, 8192). The recommended minimum is 4096. |
das.job.reduce-task.memory=4096 | The reduce task memory in MB. Set it to available node memory in MB / available node vcores. |
das.job.application-manager.memory=4096 | The application manager memory in MB. Set it to available node memory in MB / available node vcores. |
tez.runtime.io.sort.mb=512 | Set to 1/8 to 1/4 of the das.job.map-task.memory value. |
tez.runtime.io.sort.factor=100 | Set to 100 in most cases. |
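As a worked example (the node size here is a hypothetical assumption, not a value from your cluster): a worker node with 32768 MB available to YARN and 8 vcores yields the values used above:
32768 MB / 8 vcores = 4096 MB -> das.job.map-task.memory, das.job.reduce-task.memory, and das.job.application-manager.memory are each set to 4096
4096 MB / 8 = 512 MB -> tez.runtime.io.sort.mb=512 (1/8 of the map task memory; anything up to 1/4, i.e. 1024, also fits the guideline)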
Troubleshooting
Use the following tips to troubleshoot Tez:
- Fetch YARN application logs, as most actions occur on the cluster with Tez. These cluster-side actions include:
- File/source splitting and artifact merging in the AM
- Data fetching and computation in the tasks
- Enable automatic log fetching (even if the tasks don’t fail) using the following properties:
das.debug.tasks.logs.collect.force=true
das.debug.tasks.logs.collect.max=30
(30 is the default, which can be increased)
- Use Hadoop custom properties to set security and environment overrides, including the following:
- Encrypted shuffle
- LD_LIBRARY_PATH
- Check shuffle configuration to solve out of memory errors:
- Shuffle doesn’t always speed things up
- Tune across multiple jobs instead of just one giant sort job (single group by or join)
- Enable SSL for the shuffle process using the following properties (a combined custom-properties sketch follows this list):
tez.runtime.shuffle.ssl.enable=true
tez.runtime.shuffle.keep-alive.enabled=true
- Refer to the Knowledge Base for detailed troubleshooting steps on shuffle failures.
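For reference, a combined custom-properties sketch that enables forced log collection and SSL for the shuffle, using only the properties and values already listed above:
das.debug.tasks.logs.collect.force=true
das.debug.tasks.logs.collect.max=30
tez.runtime.shuffle.ssl.enable=true
tez.runtime.shuffle.keep-alive.enabled=true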
Containers running beyond physical memory limits
Problem
With the introduction of Parquet as file storage for Datameer X, some jobs may fail with the following error:
WARN [<timestamp>] [ConcurrentJobExecutor-0] (DefaultMrJobClient.java:185) - attempt_<id>: Container [pid=<pid>,containerID=container_<id>] is running beyond physical memory limits. Current usage: <value> GB of <value> GB physical memory used.
Configuring the typical memory parameters and increasing their values doesn't alleviate the problem:
das.job.map-task.memory=4096
das.job.reduce-task.memory=4096
das.job.application-manager.memory=4096
Additionally, increasing the off-heap memory space by using the heap fraction parameter doesn't alleviate the issue until set to 0.5, which may cause other memory issues.
das.job.container.memory-heap-fraction=0.5
Cause
On very busy clusters, the YARN resource manager is more strict about memory limits. This strict behavior combined with the high rate of direct memory buffers being allocated by the implementation of Parquet and Tez's shuffle leads to containers being killed before JVM Garbage Collection can occur.
Solution
To mitigate this issue, slightly increase the fraction of memory dedicated to off-heap operations and cap the amount of direct memory allocated within the JVM at 25% of the total container memory.
For example, if working with a 4GB container, the following properties should be used:
das.job.map-task.memory=4096
das.job.reduce-task.memory=4096
das.job.application-manager.memory=4096
das.job.container.memory-heap-fraction=0.7
tez.task.launch.cmd-opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:MaxDirectMemorySize=1024m
Heap Fraction is a percentage of the overall container memory. By setting a Heap Fraction of 0.7, you're allocating 30% of the 4GB to off-heap operations. By setting a Direct Memory size of 25% while allocating 30% to off-heap, it shouldn't be possible to overrun the off-heap memory space.
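As a rough sanity check of those numbers for the 4096 MB container above:
4096 MB x 0.7 heap fraction = ~2867 MB JVM heap
4096 MB - 2867 MB = ~1229 MB left in the container for off-heap use
1024 MB direct memory cap (-XX:MaxDirectMemorySize=1024m, 25% of 4096 MB) stays below that remainder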
On HDP clusters where the tez.task.launch.cmd-opts value is already set by the tez-site.xml in the Hadoop configuration directory, you must take the existing tez.task.launch.cmd-opts value and add the -XX:MaxDirectMemorySize=1024m setting to the rest of the command line flags. The example above is the default value that ships with Tez, and any other changes made on the cluster should be reflected in the value configured for Datameer.
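For illustration, the merged value would look like the following; the placeholder stands for whatever flags your tez-site.xml actually sets and is not a literal value:
tez.task.launch.cmd-opts=<existing flags from tez-site.xml> -XX:MaxDirectMemorySize=1024m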
Compression
You can add the following values to Custom Properties to enable compression codecs.
Property | Description | Example value |
---|---|---|
tez.runtime.compress | Specifies whether intermediate data should be compressed or not. | TRUE |
tez.runtime.compress.codec | Used for compressing intermediate data. Only applicable if tez.runtime.compress is enabled. | org.apache.hadoop.io.compress.SnappyCodec |
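For example, the following custom-properties sketch enables Snappy compression for intermediate data, using the codec class listed above (any compression codec available on the cluster can be substituted):
tez.runtime.compress=TRUE
tez.runtime.compress.codec=org.apache.hadoop.io.compress.SnappyCodec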