Configuring Datameer

INFO

Datameer X can connect to Hadoop clusters. The cluster can also be configured to run locally or to be secured with Kerberos. This page describes the cluster configuration.

Configuring a Cluster 

INFO

First decide which Datameer X mode you are using and which settings that mode needs, such as the file system or the root directory within HDFS.

Viewing the Current Configuration

To access the cluster configuration, click "Admin" and select "Cluster Configuration". The configuration page opens and displays the current cluster settings.


Editing the Configuration

To edit the current cluster configuration:

  1. Click "Edit". The configuration page opens in edit mode.
     
  2. Change the needed information and confirm with "Save". The configuration is finished.
     

Configuring Local Execution Mode

INFO

This Datameer X mode is not available with Enterprise.

To edit 'Local Execution' settings:

  1. Click "Edit". The configuration page opens.
     
  2. Select "Local Execution" as the cluster mode from the drop-down. 
     
  3. Enter the needed default properties in the "Default Hadoop Properties" text box. 
    INFO: A property consists of a property name and a value.
    INFO: To delete a property, delete its name and value pair.
     
  4. Enter the needed specific properties in the "Hadoop Distribution Specific Properties" text box. 
     
  5. Enter the needed custom properties in the "Custom Properties" text box. 
     
  6. Confirm with "Save". The configuration is finished.
     

Configuring the Hadoop Cluster 

To edit 'Hadoop Cluster' settings:

  1. Click "Edit". The configuration page opens.
     
  2. Select "Hadoop Cluster" as the cluster mode from the drop-down. 
     
  3. Enter the needed default properties in the "Default Hadoop Properties" text box. 
    INFO: A property consists of a property name and a value.
    INFO: To delete a property, delete its name and value pair.
     
  4. Enter the needed specific properties in the "Hadoop Distribution Specific Properties" text box. 
     
  5. Enter the needed custom properties in the "Custom Properties" text box. 
     
  6. Confirm with "Save". The configuration is finished.

  1. Click the Admin tab.
  2. Click the Hadoop Cluster tab at the left side. The current settings are shown.
  3. Click Edit to make changes.
  4. Select Hadoop Cluster for the mode.
  5. Specify the name node and add a private folder path or use impersonation if applicable.
    Whitespace is not supported in file or folder paths. Do not set up Datameer X storage directories (storage root path, temp paths, execution framework specific staging directories, etc.) with whitespace in the path.

    Impersonation notes:
    - There is a one-to-one mapping between the Datameer X user and the OS user.
    - The OS user who launches the Datameer X process must be a sudoer.
    - The temp folders on the local file system of the Datameer X installation and in the Hadoop cluster must be readable and writable by Datameer X:

      • <Datameer_Installation_Folder>/tmp (Local FileSystem)
      • <Datameer_Private_Folder>/temp (Hadoop Cluster and MapR)

    Learn about /wiki/spaces/DASSB70/pages/33036121028 with Datameer.

  6. Specify YARN settings.
  7. Use the properties text boxes to add Hadoop and custom properties. 
    Enter a name and value to add a property, or delete a name and value pair to delete the property.

    Within these edit fields, Datameer X interprets the backslash (\) as an escape character rather than as plain text. To produce a literal backslash, type two backslashes:

    example.property=example text, a backslash \\ and further text

    The second backslash is needed because these edit fields are effectively a Java properties file.


  8. Set the logging options. Select the severity of messages to be logged. The logging customization field lets you record exactly what is needed.
  9. Click Save when you are finished making changes.
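
Two of the rules in the steps above (no whitespace in storage paths, doubled backslashes in property values) can be checked before values are pasted into the form. The following is a minimal shell sketch, not a Datameer X tool; the function names `has_whitespace` and `escape_backslashes` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Datameer X.

# Succeeds (exit status 0) if the given path contains whitespace
# such as a space or a tab.
has_whitespace() {
    printf '%s\n' "$1" | grep -q '[[:space:]]'
}

# Prints the given property value with every backslash doubled, so that
# it can be pasted into one of the properties text boxes.
escape_backslashes() {
    printf '%s\n' "$1" | sed 's/\\/\\\\/g'
}
```

For example, `escape_backslashes 'example text, a backslash \ and further text'` prints the value with `\\` in place of `\`, which matches the property example shown in step 7.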

Kerberos Secured Hadoop

 Kerberos authentication is available with Datameer's Advanced Governance through a plug-in. If you used Kerberos prior to 5.11, make sure to install this plug-in when upgrading.

Prerequisites:

  • A 1:1 mapping of the Datameer X service account between hosts (Datameer X and all HDFS nodes)
  • The Datameer X service account added to the HDFS supergroup
  • The correct krb5.conf, so that Datameer X can communicate with the key distribution center (KDC). By default, Datameer X assumes that the file is at /etc/krb5.conf on the Datameer X application server. If the file is in another location, specify the path.

    etc/das-env.sh
    export JAVA_OPTIONS="$JAVA_OPTIONS -Djava.security.krb5.conf=/home/datameer/krb5.conf"
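
The default-location rule above can be expressed as a small shell check. This is a sketch under assumptions: the helper name `krb5_java_option` is hypothetical, and only the `-Djava.security.krb5.conf` system property and the `/etc/krb5.conf` default are taken from the text above.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of Datameer X.
# Prints the JVM option needed for a krb5.conf at the given path,
# or prints nothing when the file is at the default location.
krb5_java_option() {
    if [ "$1" != "/etc/krb5.conf" ]; then
        printf '%s\n' "-Djava.security.krb5.conf=$1"
    fi
}

# Possible use inside etc/das-env.sh (the path is the example from above):
#   export JAVA_OPTIONS="$JAVA_OPTIONS $(krb5_java_option /home/datameer/krb5.conf)"
```

With the default path the function prints nothing, so JAVA_OPTIONS stays unchanged.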

To edit settings for a Kerberos secured Hadoop cluster:

  1. Click the Admin tab.
  2. Click the Hadoop Cluster tab at the left side. The current settings are shown.
  3. Click Edit to make changes and choose Kerberos Secured Hadoop in the Mode list if needed. Click the link in the dialog box to learn more.
  4. Specify the URI for the name node, the private folder Datameer X should use, whether impersonation should be enabled, and whether to enable HDFS transparent encryption.
  5. Enter the necessary Kerberos information: the Kerberos principal for the Datameer X user, the path to the keytab containing Kerberos secrets, the Kerberos principal for YARN, the Kerberos principal for HDFS, and the Kerberos principal for MapReduce.
  6. Use the Custom Properties text box to add custom properties. Enter a name and value to add a property, or delete a name and value pair to delete that property.
  7. Set the logging options. Select the severity of messages to be logged. You can also write custom log settings to record exactly what is needed.
  8. Click Save when you are finished making changes.
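
Before entering the keytab path from step 5, it can help to confirm that the file is readable by the user who starts Datameer X. The sketch below is an assumption about a useful pre-flight check, not a documented Datameer X procedure; `keytab_readable` is a hypothetical name and `/path/to/datameer.keytab` is a placeholder.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; not part of Datameer X.
# Succeeds only if the keytab exists as a regular file and is readable
# by the user running this check.
keytab_readable() {
    [ -f "$1" ] && [ -r "$1" ]
}

# Optional follow-up with the MIT Kerberos client tools, if installed
# (placeholder path and principal):
#   klist -kt /path/to/datameer.keytab               # list principals in the keytab
#   kinit -kt /path/to/datameer.keytab <principal>   # obtain a ticket with it
```

Run the check as the same OS user that launches Datameer X, since file permissions are evaluated for the calling user.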

Autoconfigure Grid Mode

This feature is not supported with Cloudera Manager Safety Valve.

In Autoconfigure Grid Mode, Datameer X evaluates your cluster and automatically configures the optimal Hadoop cluster settings. You connect Datameer X to your Hadoop cluster by installing a properly configured Hadoop client package on the Datameer X server (manually or with an installation manager such as Cloudera Manager or Ambari) and then providing Datameer X with the path to that client.

The Autoconfigure Grid Mode reads all the cluster's property files and evaluates the following properties:

spark.master
das.yarn.available-node-memory
das.yarn.available-node-vcores
spark.yarn.am.cores
spark.yarn.am.memory
spark.yarn.am.memoryOverhead
spark.driver.cores
spark.driver.memory
spark.yarn.driver.memoryOverhead
spark.executor.cores
spark.yarn.executor.memoryOverhead
das.spark.context.max-executors
das.spark.context.auto-scale-enabled
spark.executor.memory
spark.submit.deployMode

To connect Datameer X to the cluster in Autoconfigure Grid Mode:

  1. Go to Admin tab > Hadoop Cluster.
  2. Select Autoconfigure Cluster in the Mode field.
  3. In the Hadoop Configuration Directory field, enter the directory where the configuration files for Hadoop are located.
  4. In the Datameer X Private Folder field, enter the path to the folder in Hadoop's file system where Datameer X stores its private information.
     
  5. Enter the number of concurrent jobs.
  6. Select whether to use secure impersonation. 
  7. Edit the Hadoop or custom properties as necessary.
    If the yarn.timeline-service.enabled property is true in the Hadoop configuration files, set yarn.timeline-service.enabled=false as a custom property. (This change is not needed as of Datameer X v6.1.)
     
  8. Click Save.
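
Steps 3 and 7 above can be prepared with a quick look at the Hadoop configuration directory. The following shell sketch is illustrative only: the function names are hypothetical, the list of expected files is an assumption about a typical Hadoop client installation, and the XML check assumes that `<value>` follows `<name>` inside the same `<property>` element.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Datameer X.

# Prints the name of each commonly expected Hadoop client config file that
# is missing from the given directory (prints nothing if all are present).
# The file list is an assumption; adjust it for your distribution.
missing_hadoop_confs() {
    for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
        [ -f "$1/$f" ] || printf '%s\n' "$f"
    done
}

# Succeeds if the given yarn-site.xml sets yarn.timeline-service.enabled
# to true. Relies on simple line matching, not on a real XML parser.
timeline_service_enabled() {
    awk '
        /<name>yarn\.timeline-service\.enabled<\/name>/ { found = 1 }
        found && /<value>/ {
            if ($0 ~ /<value>[ \t]*true[ \t]*<\/value>/) ok = 1
            exit
        }
        END { exit !ok }
    ' "$1"
}
```

If `timeline_service_enabled` succeeds on a Datameer X version before 6.1, add `yarn.timeline-service.enabled=false` as a custom property as described in step 7.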

To update your Datameer X Autoconfigure Grid Mode cluster configuration, click Edit on the Hadoop Cluster page and then click Save. Once saving completes, the updated configuration is applied. Updated settings are not shown in conductor.log, but they are displayed under the Hadoop Properties label when you click Edit again for Autoconfigure Grid Mode.

Autoconfigure Grid Mode should not be used when servers in your Hadoop cluster require specific settings for specialized tasks.

MapR

To edit settings for clusters using MapR:

  1. Click the Admin tab at the top of the page.
  2. Click the Hadoop Cluster tab at the left side. The current settings are shown.
  3. Click Edit to make changes and choose MapR in the mode list.
  4. Enter the cluster name, the Datameer X private folder, and the maximum number of concurrent jobs. Check the boxes if using /wiki/spaces/DASSB70/pages/33036121047 for Datameer X to submit jobs and access HDFS on behalf of the Datameer X user.
     
    There is a one-to-one mapping between the Datameer X user and the OS user.
    The OS user who launches the Datameer X process must be a sudoer.
    The temp folders on the local file system of the Datameer X installation and in the Hadoop cluster must be readable and writable by Datameer X:

      • <Datameer_Installation_Folder>/tmp (Local FileSystem)
      • <Datameer_Private_Folder>/temp (Hadoop Cluster and MapR)

    Connecting to a secure MapR cluster

    1) Obtain the MapR ticket for the user who runs the Datameer X application. Run the following command in a shell:

    maprlogin password -user <user_who_starts_datameer>

    2) Install Datameer X and open <Datameer_Home>/etc/das-env.sh and add the following system property to the Java arguments: