Tez Activation and Setup
INFO
Tez is the default execution framework for customers without a Smart Execution™ license.
Usage Requirements
INFO
You need a Hadoop distribution based on Apache Hadoop 2.2 or higher with YARN, e.g., HDP 2.1.x or higher, or CDH 5.x or higher.
Activation
- Install a license that supports Smart Execution™ by selecting the Admin tab in Datameer X and adding a license that supports the feature. If you are interested in obtaining a Smart Execution™ license, contact your sales representative.
- Configure Smart Execution™ specific properties under the Admin tab > Hadoop Cluster > Custom Properties:
# Used as a foundation for calculating the optimal number of splits based on the number of physical nodes.
# (split count without looking at the data sizes would be something like numberOfNodes * das.yarn.available-node-vcores * das.splitting.map-wave-count.)
# And, used to calculate the initial reducer task parallelism based on the number of physical nodes.
# `auto` uses the YARN apis to determine the information based on the configured YARN queue if possible.
das.yarn.available-node-vcores=auto
# The memory allocated for the Tez application master.
das.job.application-manager.memory=2048
# The memory allocated for each Tez task.
das.job.map-task.memory=2048
das.job.reduce-task.memory=2048
# Maximum amount of data that can be included in one RPC call.
ipc.maximum.data.length=134217728
# Improve sort memory allocation (should be ~1/8th of available heap)
tez.runtime.io.sort.mb=256
tez.runtime.io.sort.factor=100
# Reduce Shuffle overhead
tez.runtime.shuffle.keep-alive.enabled=true
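The rules of thumb in the property comments above can be worked through with a short calculation. The following is a minimal sketch, not Datameer's implementation: the function names, the example cluster (10 nodes, 8 vcores each), and the wave count of 3 are assumptions chosen for illustration, and treating the 2048 MB task memory as the available heap is a simplification, since the JVM heap is usually smaller than the container memory.

```python
def baseline_split_count(number_of_nodes: int, vcores_per_node: int, map_wave_count: int) -> int:
    """Split count before data sizes are considered:
    numberOfNodes * das.yarn.available-node-vcores * das.splitting.map-wave-count."""
    return number_of_nodes * vcores_per_node * map_wave_count


def suggested_sort_mb(available_heap_mb: int) -> int:
    """Roughly one eighth of the available heap, per the tez.runtime.io.sort.mb comment."""
    return available_heap_mb // 8


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes, 8 vcores per node, 3 map waves.
    print(baseline_split_count(10, 8, 3))  # 240
    # Assuming the full 2048 MB task memory is available as heap.
    print(suggested_sort_mb(2048))         # 256
    # ipc.maximum.data.length=134217728 is 128 MiB expressed in bytes.
    print(128 * 1024 * 1024)               # 134217728
```

With these assumed inputs, the one-eighth rule reproduces the 256 MB value used for tez.runtime.io.sort.mb in the sample properties.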
- Set up LD_LIBRARY_PATH, the path which links native libraries and codecs.
To do so, configure Smart Execution™ specific properties under the Admin tab > Hadoop Cluster > Custom Properties:
tez.am.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<absolute_path_to_native_lib>
tez.task.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<absolute_path_to_native_lib>
This setting is required for a few Hadoop distributions, such as HDP-2.1.2. Without it, jobs fail because they are unable to link to the native libraries.
<absolute_path_to_native_lib> should point to the folder where the native libraries and codecs are located on the cluster nodes. You can run the following command on one of the cluster nodes to get the folder name:
find / -name libhadoop.so
If you don't have a clear HADOOP_COMMON_HOME variable set, use these properties to manually define your local path:
tez.am.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/hadoop/native/lib:/usr/hadoop/native/lib/Linux-amd64-64:/usr/hadoop/lib:/usr/hadoop/lib/lib
tez.task.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/hadoop/native/lib:/usr/hadoop/native/lib/Linux-amd64-64:/usr/hadoop/lib:/usr/hadoop/lib/lib
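The lookup and the resulting property lines can also be produced with a small script. This is a minimal sketch under stated assumptions: the helper names `find_native_lib_dirs` and `launch_env_properties` are hypothetical, and the `/usr` starting directory is only an example; adjust it to wherever your distribution installs Hadoop.

```python
import os
from typing import List


def find_native_lib_dirs(root: str, lib_name: str = "libhadoop.so") -> List[str]:
    """Return every directory under root that contains lib_name,
    similar in spirit to `find / -name libhadoop.so`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if lib_name in filenames:
            matches.append(os.path.abspath(dirpath))
    return sorted(matches)


def launch_env_properties(native_lib_dir: str) -> List[str]:
    """Format the two Tez launch-environment properties for one native library folder."""
    value = f"LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{native_lib_dir}"
    return [f"tez.am.launch.env={value}", f"tez.task.launch.env={value}"]


if __name__ == "__main__":
    # "/usr" is an assumed starting point; searching "/" also works but is slower.
    for directory in find_native_lib_dirs("/usr"):
        print("\n".join(launch_env_properties(directory)))
```

If the search returns more than one folder, pick the one that matches the Hadoop installation the cluster actually runs before pasting the lines into Custom Properties.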
- Run a cluster health job to check that Datameer X is configured correctly.
The cluster health job can be found under the Admin tab > Cluster Health > Hadoop Cluster Diagnostics > Run Tests.
- Wait for the cluster health job to finish and review the tabulated diagnostic report.
Setup on AWS
If you use Amazon Web Services (AWS), perform the following additional steps to activate Smart Execution™.
- Open the custom properties under the Admin tab > Hadoop Cluster > Custom Properties.
Add the following property, which specifies an open port range that must match your inbound rules on the AWS EC2 management console.
tez.am.client.am.port-range=50000-55000
- Open firewall ports on the AWS EC2 management console.
Open the ports that match the range above so that the Hadoop cluster can communicate and expose its web interfaces to Datameer. Add the inbound rules to the default security group on the AWS EC2 management console. Limit the rules to this range to minimize the attack surface while not blocking essential communication channels.
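Before saving the inbound rule, it can help to confirm that the rule and the configured range line up. The sketch below is illustrative only: `parse_port_range` and `rule_covers_range` are hypothetical helpers, not part of Datameer or any AWS tooling, and the rule bounds are values you would read off the security group yourself.

```python
def parse_port_range(value: str) -> range:
    """Parse a 'low-high' string, such as the tez.am.client.am.port-range value,
    into an inclusive range of ports."""
    low_text, high_text = value.strip().split("-", 1)
    low, high = int(low_text), int(high_text)
    if not (0 < low <= high <= 65535):
        raise ValueError(f"invalid port range: {value!r}")
    return range(low, high + 1)


def rule_covers_range(rule_from: int, rule_to: int, ports: range) -> bool:
    """True if an inbound rule spanning rule_from..rule_to covers every port in ports."""
    return rule_from <= ports[0] and ports[-1] <= rule_to


if __name__ == "__main__":
    ports = parse_port_range("50000-55000")
    print(len(ports))                              # 5001
    print(rule_covers_range(50000, 55000, ports))  # True
    print(rule_covers_range(50000, 54000, ports))  # False
```

A rule that stops short of the upper bound, as in the last call, would leave part of the configured range blocked.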
Deactivation
Tez is the default execution framework when Smart Execution™ is not purchased or is disabled.
Smart Execution™ can be disabled either globally for all jobs or for specific jobs using the following property:
das.execution-framework=Tez
Global deactivation for all jobs
Under the Admin tab > Hadoop Cluster > Custom Properties, add the above property.
Single job deactivation
Import/Export job
Create/edit an Import/Export Job > Schedule > Advanced > Custom Properties, then add the above property.
Workbook
Save/Edit a workbook > Workbook Settings > Advanced > Custom Properties, then add the above property.