Hadoop provides scalable data storage using the /wiki/spaces/DASSB70/pages/33036123891 and fast parallel data processing on a fault-tolerant cluster of computers. Learn more about Hadoop.
See /wiki/spaces/DASSB70/pages/33036120935 to learn more about Hadoop and how to use it with Datameer.
Configuring Hadoop Cluster
To configure the Hadoop cluster settings in Datameer, you need to know which type of mode you are using and the appropriate settings for that mode such as file system or root directory within HDFS. If you don't have this information readily available, you might need to contact someone within your own organization who can assist you.
The cluster can be configured to run in local mode, in Hadoop cluster mode, or as a Kerberos-secured cluster. These modes are described in the sections that follow.
General configuration
- Click the Admin tab.
- Click the 'Cluster Configuration' tab at the left side. The current settings are shown.
- Click Edit to make changes.
- Click Save when you are finished making changes.
Hadoop cluster settings
...
Specify the name node and add a private folder path or use impersonation if applicable.
Whitespace isn't supported in file or folder paths. Avoid setting up Datameer storage directories (storage root path, temp paths, execution framework specific staging directories, etc.) with whitespace in the path.
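The whitespace restriction above can be checked before paths are entered in the configuration. The following is a minimal sketch, not part of Datameer: the class name, the setting labels, and the example paths are all hypothetical and only illustrate the check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PathWhitespaceCheck {
    // Returns true if the path contains any whitespace character (space, tab, newline, ...).
    static boolean hasWhitespace(String path) {
        return path.chars().anyMatch(Character::isWhitespace);
    }

    // Collects the labels of all settings whose path contains whitespace.
    static List<String> offendingSettings(Map<String, String> paths) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, String> entry : paths.entrySet()) {
            if (hasWhitespace(entry.getValue())) {
                bad.add(entry.getKey());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Hypothetical labels and paths, for illustration only.
        Map<String, String> paths = new LinkedHashMap<>();
        paths.put("storage root", "/user/datameer");
        paths.put("temp", "/user/datameer/temp files");
        System.out.println(offendingSettings(paths)); // prints [temp]
    }
}
```

Any setting reported by such a check would need a path without whitespace before it is saved.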
Note |
---|
Impersonation notes:
Learn about /wiki/spaces/DASSB70/pages/33036121028 with Datameer. |
...
Use the properties text boxes to add Hadoop and custom properties.
Enter a name and value to add a property, or delete a name and value pair to delete the property.
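Note |
---|
Within these edit fields, backslash (\) characters are interpreted by Datameer as an escape character rather than a plain text character. To produce an actual backslash character, type two backslashes. The second backslash is needed because you are effectively editing a Java properties file in these edit fields. |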
...
Local execution settings
Note |
---|
Not available with Enterprise. |
To edit Local Execution settings:
...
Info |
---|
Datameer can connect to Kubernetes clusters as well as to Hadoop clusters. You can also configure the cluster to run locally or to be Kerberos secured. This page contains all information about the cluster configuration. |
Configuring a Cluster
Info |
---|
First determine which Datameer mode you are using and the appropriate settings for that mode, such as the file system or the root directory within HDFS. |
Viewing the Current Configuration
To access the cluster configuration, click "Admin" and select "Cluster Configuration". The configuration page opens. The current cluster settings are displayed.
Editing the Configuration
To edit the current cluster configuration:
- Click "Edit". The configuration page opens in edit mode.
- Change the needed information and confirm with "Save". The configuration is finished.
Configuring Local Execution Mode
Info |
---|
This Datameer mode is not available with Enterprise. |
To edit 'Local Execution' settings:
- Click on "Edit". The configuration page opens.
- Select "Local Execution" as the cluster mode from the drop-down.
- Enter the needed default properties in the "Default Hadoop Properties" text box.
INFO: A property consists of a property name and a value.
INFO: Delete a name and value pair to delete a property.
- Enter the needed specific properties in the "Hadoop Distribution Specific Properties" text box.
- Enter the needed custom properties in the "Custom Properties" text box.
- Confirm with "Save". The configuration is finished.
Configuring the Hadoop Cluster
To edit 'Hadoop Cluster' settings:
- Click on "Edit". The configuration page opens.
- Select "Hadoop Cluster" as the cluster mode from the drop-down.
- Enter the needed default properties in the "Default Hadoop Properties" text box.
INFO: A property consists of a property name and a value.
INFO: Delete a name and value pair to delete a property.
- Enter the needed specific properties in the "Hadoop Distribution Specific Properties" text box.
- Enter the needed custom properties in the "Custom Properties" text box.
- Confirm with "Save". The configuration is finished.
- Click the Admin tab.
- Click the Hadoop Cluster tab at the left side. The current settings are shown.
- Click Edit to make changes.
- Select Hadoop Cluster for the mode.
- Specify the name node and add a private folder path, or use impersonation if applicable.
  Whitespace isn't supported in file or folder paths. Avoid setting up Datameer storage directories (storage root path, temp paths, execution framework specific staging directories, etc.) with whitespace in the path.
  Note: Impersonation notes:
  - There is a one-to-one mapping between the Datameer user and the OS user.
  - The OS user who launches the Datameer process must be a sudoer.
  - The temp folders on the local file system of the Datameer installation and in the Hadoop cluster (for Datameer) must have read/write access:
    - <Datameer_Installation_Folder>/tmp (Local FileSystem)
    - <Datameer_Private_Folder>/temp (Hadoop Cluster and MapR)
  Learn about /wiki/spaces/DASSB70/pages/33036121028 with Datameer.
- Specify YARN settings.
- Use the properties text boxes to add Hadoop and custom properties.
  Enter a name and value to add a property, or delete a name and value pair to delete the property.
  Note: Within these edit fields, backslash (\) characters are interpreted by Datameer as an escape character rather than a plain text character. To produce an actual backslash character, type two backslashes:
      example.property=example text, a backslash \\ and further text
  The second backslash is needed because you are effectively editing a Java properties file in these edit fields.
- Logging options. Select the severity of messages to be logged. The logging customization field allows you to record exactly what is needed.
- Click Save when you are finished making changes.
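The backslash note in the steps above can be illustrated with the standard `java.util.Properties` parser. This is a sketch under the assumption that the edit fields follow ordinary Java properties escaping, as the note suggests; the class and method names are made up for the example, and Datameer's own parsing code is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesBackslashDemo {
    // Parses a single "name=value" line with java.util.Properties and returns the value.
    static String parse(String line, String name) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(name);
    }

    public static void main(String[] args) {
        // In Java source a literal backslash is written as "\\",
        // so "\\\\" below stands for the two backslashes typed into the field.
        String doubled = "example.property=example text, a backslash \\\\ and further text";
        String single = "example.property=example text, a backslash \\ and further text";

        // Two typed backslashes yield one backslash in the parsed value.
        System.out.println(parse(doubled, "example.property"));
        // A single typed backslash is consumed as an escape and disappears from the value.
        System.out.println(parse(single, "example.property"));
    }
}
```

The second call shows why a single backslash silently goes missing: the parser treats it as the start of an escape sequence and drops it.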
...
- Click the Admin tab at the top of the page.
- Click the Hadoop Cluster tab at the left side. The current settings are shown.
- Click Edit to make changes and choose MapR in the mode list.
Add the cluster name and the Datameer private folder, set Max Concurrent jobs, and check the boxes if you are using /wiki/spaces/DASSB70/pages/33036121047 so that Datameer submits jobs and accesses HDFS on behalf of the Datameer user.
  - There is a one-to-one mapping between the Datameer user and the OS user.
  - The OS user who launches the Datameer process must be a sudoer.
  - The temp folders on the local file system of the Datameer installation and in the Hadoop cluster (for Datameer) must have read/write access:
    - <Datameer_Installation_Folder>/tmp (Local FileSystem)
    - <Datameer_Private_Folder>/temp (Hadoop Cluster and MapR)
Note: Connecting to a secure MapR cluster
1) Obtain the MapR ticket for the user who runs the Datameer application. Execute the following command in the shell:
    maprlogin password -user <user_who_starts_datameer>
2) Install Datameer, open <Datameer_Home>/etc/das-env.sh, and add the following system property to the Java arguments:
    -Dmapr.secure.mode=true
3) Start Datameer and configure it using MapR Grid Mode. The option to connect using Secure Impersonation is now available.
4) (Optional) If saving the configuration fails with the following error:
    Caused by: java.io.IOException: Can't get Master Kerberos principal for use as renewer
   add the following custom Hadoop property on the Hadoop Admin page:
    yarn.resourcemanager.principal=<value>
   The value for this property can be found in the yarn-site.xml file in your Hadoop cluster configuration.
The steps to achieve impersonation are the same as for a secured Kerberos cluster.
- If required, enter properties. Enter a name and value to add a property, or delete a name and value pair to delete that property.
- Logging options. Select the severity of messages to be logged. It is also possible to write custom log settings to record exactly what is needed.
- Click Save when you are finished making changes.
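Step 4 of the note above says that the value of `yarn.resourcemanager.principal` can be found in yarn-site.xml. The sketch below shows one way to look it up, assuming the usual Hadoop configuration layout of `<property>` elements with `<name>` and `<value>` children. The class name and the principal in the sample XML are placeholders, not values from a real cluster.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class YarnSiteLookup {
    // Returns the <value> of the <property> with the given <name>, or null if it is absent.
    static String findProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String propName = prop.getElementsByTagName("name").item(0).getTextContent().trim();
                if (propName.equals(name)) {
                    return prop.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder content; the principal below is made up for illustration.
        String xml = "<configuration><property>"
                + "<name>yarn.resourcemanager.principal</name>"
                + "<value>rm/_HOST@EXAMPLE.COM</value>"
                + "</property></configuration>";
        String value = findProperty(xml, "yarn.resourcemanager.principal");
        // The printed line has the name=value form used for custom Hadoop properties.
        System.out.println("yarn.resourcemanager.principal=" + value);
    }
}
```

In practice the XML string would be read from the cluster's actual yarn-site.xml file rather than built inline.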
...
In conf/default.properties you can change the value designating the time zone:
    system.property.das.default-timezone=default
If the time zone is changed on the machine where Datameer is running, Datameer must be restarted to show the new default time zone configuration.
Examples
Time zone | Description |
---|---|
default | Local server time |
PST | Pacific Standard Time |
PST8PDT | Switches to daylight saving time (DST) in the spring, when the offset is UTC/GMT -7 hours (PDT). In the fall it switches back to standard time, when the offset is UTC/GMT -8 hours (PST). |
CST | Central Standard Time |
America/Los_Angeles | Time zone for Los Angeles (USA). Switches to daylight saving time (DST) in the spring, when the offset is UTC/GMT -7 hours. In the fall it switches back to standard time, when the offset is UTC/GMT -8 hours. |
EST5EDT | Switches to daylight saving time (DST) in the spring, when the offset is UTC/GMT -4 hours (EDT). In the fall it switches back to standard time, when the offset is UTC/GMT -5 hours (EST). |
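The offsets described in the table can be checked with the standard `java.time` API. This is only a sketch: it assumes the time zone IDs behave as they do in the JDK's time zone database, which may differ from how Datameer resolves the configured value. The IDs `default`, `PST`, and `CST` are left out because `ZoneId.of` does not accept them directly. The class name and the sample dates are chosen for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

class TimeZoneOffsets {
    // Returns the UTC offset that the given zone ID uses at the given instant.
    static ZoneOffset offsetAt(String zoneId, Instant when) {
        return ZoneId.of(zoneId).getRules().getOffset(when);
    }

    public static void main(String[] args) {
        // One instant in standard time and one in daylight saving time.
        Instant winter = Instant.parse("2024-01-15T12:00:00Z");
        Instant summer = Instant.parse("2024-07-15T12:00:00Z");
        for (String id : new String[] {"PST8PDT", "America/Los_Angeles", "EST5EDT"}) {
            System.out.println(id
                    + " winter=" + offsetAt(id, winter)
                    + " summer=" + offsetAt(id, summer));
        }
    }
}
```

For the two Pacific IDs the winter offset is -08:00 and the summer offset is -07:00; for EST5EDT they are -05:00 and -04:00, matching the table.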