Hadoop provides scalable data storage using the /wiki/spaces/DASSB70/pages/33036123891 and fast parallel data processing on a fault-tolerant cluster of computers. Learn more about Hadoop.

See /wiki/spaces/DASSB70/pages/33036120935 to learn more about Hadoop and how to use it with Datameer.

Table of Contents

Configuring Hadoop Cluster

To configure the Hadoop cluster settings in Datameer, you need to know which type of mode you are using and the appropriate settings for that mode such as file system or root directory within HDFS. If you don't have this information readily available, you might need to contact someone within your own organization who can assist you.

Datameer can be configured to run in local mode, against a Hadoop cluster, or against a Kerberos-secured cluster. These modes are described in the sections that follow.

General configuration

  1. Click the Admin tab.
  2. Click the 'Cluster Configuration' tab at the left side. The current settings are shown.
  3. Click Edit to make changes.
  4. Click Save when you are finished making changes.

Hadoop cluster settings

...

Specify the name node and add a private folder path or use impersonation if applicable.
Whitespace isn't supported in file or folder paths. Avoid setting up Datameer storage directories (storage root path, temp paths, execution framework specific staging directories, etc.) with whitespace in the path.

Note

Impersonation notes:
- There is a one-to-one mapping between the Datameer user and the OS user.
- The OS user who launches the Datameer process must be a sudoer.
- The temp folders for Datameer, both on the local file system of the Datameer installation and in the Hadoop cluster, must have read/write access.

    • <Datameer_Installation_Folder>/tmp (Local FileSystem)
    • <Datameer_Private_Folder>/temp (Hadoop Cluster and MapR)

Learn about /wiki/spaces/DASSB70/pages/33036121028 with Datameer.
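
The whitespace restriction on storage paths noted above can be checked before a path is entered in the configuration. The following is a minimal sketch, not part of Datameer; the class name, method name, and example paths are hypothetical.

```java
public class PathWhitespaceCheck {

    // Returns true if the path contains any whitespace character
    // (space, tab, line break, and so on).
    public static boolean containsWhitespace(String path) {
        return path.chars().anyMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        // Hypothetical example paths, used for illustration only.
        String[] candidates = {
            "/user/datameer/private",
            "/user/data meer/private"
        };
        for (String path : candidates) {
            String verdict = containsWhitespace(path) ? "contains whitespace" : "ok";
            System.out.println(path + " -> " + verdict);
        }
    }
}
```

A check like this only flags the problem; the path itself still has to be changed to one without whitespace.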


...

Use the properties text boxes to add Hadoop and custom properties. 
Enter a name and value to add a property, or delete a name and value pair to delete the property.

Note

Within these edit fields, backslash (\) characters are interpreted by Datameer as an escape character rather than a plain text character. In order to produce the actual backslash character, you have to type two backslashes:

Code Block
example.property=example text, a backslash \\ and further text

The second backslash is needed as you are effectively editing a Java properties file in these edit fields.
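
The escaping rule can be reproduced outside Datameer with the standard `java.util.Properties` parser. This is a sketch that assumes the edit fields follow the usual Java properties syntax, as the note above states; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class BackslashEscapeDemo {

    // Parses text written in Java properties syntax and returns the value for the key.
    public static String readProperty(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Java source doubles every backslash, so "\\\\" below stands for the
        // two characters \\ that a user would type into the edit field.
        String doubled = "example.property=example text, a backslash \\\\ and further text";
        System.out.println(readProperty(doubled, "example.property"));

        // A single typed backslash in front of an ordinary character is dropped
        // by the parser, so no backslash appears in the resulting value.
        String single = "example.property=example text, a backslash \\ and further text";
        System.out.println(readProperty(single, "example.property"));
    }
}
```

The first call yields a value with one literal backslash; the second yields a value with none, which is why the doubled form is required.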

...

Local execution settings

Note

Not available with Enterprise.

To edit Local Execution settings:

...

Info

Datameer can connect to Kubernetes clusters as well as Hadoop clusters. You can also configure Datameer to run locally or against a Kerberos-secured cluster. This page describes the cluster configuration.

Table of Contents

Configuring a Cluster 

Info

First determine which Datameer mode you are using and the appropriate settings for that mode, such as the file system or the root directory within HDFS.

Viewing the Current Configuration

To access the cluster configuration, click "Admin" and select "Cluster Configuration". The configuration page opens and displays the current cluster settings.


Editing the Configuration

To edit the current cluster configuration:

  1. Click "Edit". The configuration page opens in edit mode.
  2. Change the needed information and confirm with "Save". The configuration is finished.

Configuring Local Execution Mode

Info

This Datameer mode is not available with Enterprise.

To edit 'Local Execution' settings:

  1. Click "Edit". The configuration page opens.
  2. Select "Local Execution" as the cluster mode from the drop-down.
  3. Enter the needed default properties in the "Default Hadoop Properties" text box.
    INFO: A property consists of a property name and a value.
    INFO: Delete a name and value pair to delete a property.
  4. Enter the needed specific properties in the "Hadoop Distribution Specific Properties" text box.
  5. Enter the needed custom properties in the "Custom Properties" text box.
  6. Confirm with "Save". The configuration is finished.

Configuring the Hadoop Cluster 

To edit 'Hadoop Cluster' settings:

  1. Click "Edit". The configuration page opens.
  2. Select "Hadoop Cluster" as the cluster mode from the drop-down.
  3. Enter the needed default properties in the "Default Hadoop Properties" text box.
    INFO: A property consists of a property name and a value.
    INFO: Delete a name and value pair to delete a property.
  4. Enter the needed specific properties in the "Hadoop Distribution Specific Properties" text box.
  5. Enter the needed custom properties in the "Custom Properties" text box.
  6. Confirm with "Save". The configuration is finished.
  1. Click the Admin tab.
  2. Click the Hadoop Cluster tab at the left side. The current settings are shown.
  3. Click Edit to make changes.
  4. Select Hadoop Cluster for the mode.
  5. Specify the name node and add a private folder path or use impersonation if applicable.
    Whitespace isn't supported in file or folder paths. Avoid setting up Datameer storage directories (storage root path, temp paths, execution framework specific staging directories, etc.) with whitespace in the path.

    Note

    Impersonation notes:
    - There is a one-to-one mapping between the Datameer user and the OS user.
    - The OS user who launches the Datameer process must be a sudoer.
    - The temp folders for Datameer, both on the local file system of the Datameer installation and in the Hadoop cluster, must have read/write access.

      • <Datameer_Installation_Folder>/tmp (Local FileSystem)
      • <Datameer_Private_Folder>/temp (Hadoop Cluster and MapR)

    Learn about /wiki/spaces/DASSB70/pages/33036121028 with Datameer.


  6. Specify YARN settings.
  7. Use the properties text boxes to add Hadoop and custom properties. 
    Enter a name and value to add a property, or delete a name and value pair to delete the property.

    Note

    Within these edit fields, backslash (\) characters are interpreted by Datameer as an escape character rather than a plain text character. In order to produce the actual backslash character, you have to type two backslashes:

    Code Block
    example.property=example text, a backslash \\ and further text

    The second backslash is needed as you are effectively editing a Java properties file in these edit fields.



  8. Set the logging options. Select the severity of messages to be logged. The logging customization field lets you record exactly what is needed.
  9. Click Save when you are finished making changes.

...

  1. Click the Admin tab at the top of the page.
  2. Click the Hadoop Cluster tab at the left side. The current settings are shown.
  3. Click Edit to make changes and choose MapR in the mode list.
  4. Add the cluster name, the Datameer private folder, and the Max Concurrent jobs value. Check the boxes if you are using /wiki/spaces/DASSB70/pages/33036121047 for Datameer to submit jobs and access HDFS on behalf of the Datameer user.

    There is a one-to-one mapping between the Datameer user and the OS user.
    The OS user who launches the Datameer process must be a sudoer.
    The temp folders for Datameer, both on the local file system of the Datameer installation and in the Hadoop cluster, must have read/write access.

      • <Datameer_Installation_Folder>/tmp (Local FileSystem)
      • <Datameer_Private_Folder>/temp (Hadoop Cluster and MapR)
    Note

    Connecting to a secure MapR cluster

    1) Obtain the MapR ticket for the user who is running the Datameer application. Execute the following command on the shell:

    Code Block
    maprlogin password -user <user_who_starts_datameer>

    2) Install Datameer, open <Datameer_Home>/etc/das-env.sh, and add the following system property to the Java arguments:

    Code Block
    -Dmapr.secure.mode=true

    3) Start and configure Datameer using MapR Grid Mode.

    The option to connect using Secure Impersonation is now available.

    4) (Optional) If saving the configuration fails with an error such as:

    Code Block
    Caused by: java.io.IOException: Can't get Master Kerberos principal for use as renewer

    Add the following custom Hadoop property on the Hadoop Admin page:

    Code Block
    yarn.resourcemanager.principal=<value>

    The value for this property can be found in the yarn-site.xml file in your Hadoop cluster configuration.

    The steps to achieve impersonation are the same as for a secured Kerberos cluster.

  5. If required, enter properties. Enter a name and value to add a property, or delete a name and value pair to delete that property.
  6. Logging options. Select the severity of messages to be logged. It is also possible to write custom log settings to record exactly what is needed.
  7. Click Save when you are finished making changes.
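
The secure MapR steps above point to yarn-site.xml as the source of the `yarn.resourcemanager.principal` value. The sketch below shows one way to read that value with the JDK's XML parser, assuming the standard Hadoop configuration layout of `<property>` elements with `<name>` and `<value>` children. The class name, the inline sample XML, and the principal `rm/_HOST@EXAMPLE.COM` are hypothetical placeholders, not values from a real cluster.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class YarnPrincipalLookup {

    // Returns the value of the named property from Hadoop-style configuration XML,
    // or null if the property is not present.
    public static String findProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                String propertyName =
                        property.getElementsByTagName("name").item(0).getTextContent().trim();
                if (propertyName.equals(name)) {
                    return property.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the contents of yarn-site.xml.
        String sampleXml =
                "<configuration>"
              + "<property>"
              + "<name>yarn.resourcemanager.principal</name>"
              + "<value>rm/_HOST@EXAMPLE.COM</value>"
              + "</property>"
              + "</configuration>";
        String principal = findProperty(sampleXml, "yarn.resourcemanager.principal");
        // The printed line has the form expected in the custom properties field.
        System.out.println("yarn.resourcemanager.principal=" + principal);
    }
}
```

In practice the XML would be read from the cluster's own yarn-site.xml rather than from an inline string.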

 Configuring High Availability

...

In conf/default.properties you can change the value designating the time zone:

system.property.das.default-timezone=default

If the time zone is changed on the machine where Datameer is running, Datameer must be restarted to show the new default time zone configuration.

Examples

Time zone | Description
default | Local server time
PST | Pacific Standard Time
PST8PDT | This time zone changes to daylight saving time (DST) in the spring. The GMT offset is UTC/GMT -7 hours (PDT) during this time. In the fall it changes back to standard time, the GMT offset is then UTC/GMT -8 hours (PST).
CST | Central Standard Time
America/Los_Angeles | Time zone for Los Angeles (USA), this time zone changes to daylight saving time (DST) in the spring. The GMT offset is UTC/GMT -7 hours during this time. In the fall it changes back to standard time, the GMT offset is then UTC/GMT -8 hours.
EST5EDT | This time zone changes to daylight saving time (DST) in the spring. The GMT offset is UTC/GMT -4 hours (EDT) during this time. In the fall it changes back to standard time, the GMT offset is then UTC/GMT -5 hours (EST).
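
The offsets described for the DST-aware IDs can be checked with the JDK's `java.time` API. This is a sketch that assumes Datameer resolves these IDs the same way the JDK time zone database does, which this page does not confirm; the class name, method name, and sample dates are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

public class TimeZoneOffsetDemo {

    // Returns the UTC offset (for example "-08:00") that applies in the given
    // zone at the given local date and time.
    public static String offsetAt(String zoneId, LocalDateTime localTime) {
        return localTime.atZone(ZoneId.of(zoneId)).getOffset().toString();
    }

    public static void main(String[] args) {
        // One sample date in winter (standard time) and one in summer (DST).
        LocalDateTime winter = LocalDateTime.of(2021, 1, 15, 12, 0);
        LocalDateTime summer = LocalDateTime.of(2021, 7, 15, 12, 0);

        String[] zones = {"America/Los_Angeles", "PST8PDT", "EST5EDT"};
        for (String zone : zones) {
            System.out.println(zone
                    + " winter=" + offsetAt(zone, winter)
                    + " summer=" + offsetAt(zone, summer));
        }
    }
}
```

The three-letter IDs PST and CST are left out because `ZoneId.of` does not accept them without an alias map.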