...
...
...
...
Info |
---|
This page describes only the minimal set of steps needed for Datameer X to operate properly in your Hadoop environment. For a more complete guide to Hadoop cluster design, configuration, and tuning, see Hadoop Cluster Configuration Tips or the additional resources section below. |
...
Datameer X accesses your Hadoop cluster as a particular Hadoop user. By default, this is the UNIX user who launched the Datameer X application (the name returned by the UNIX command 'whoami'). To ensure this works properly, create a user of the same name within Hadoop's HDFS for Datameer X to use exclusively for scheduling and configuration. This ensures that the proper permissions are set and that this user is recognized whenever the Datameer X application interacts with your Hadoop cluster.
The username used to launch the Datameer X application can be configured in <Datameer X folder>/etc/das-env.sh.
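As a rough illustration of the user setup described above, the following dry-run sketch prints the HDFS commands you might run for a given user name. It assumes that HDFS home directories live under /user/<name>, which is a common convention but is not stated on this page; the function name print_hdfs_user_setup is hypothetical. Adjust the path to match your cluster before running the printed commands as an HDFS superuser.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Assumption: HDFS home directories live under /user/<name>.
print_hdfs_user_setup() {
  user="$1"
  # Create the user's directory in HDFS.
  echo "hadoop fs -mkdir /user/${user}"
  # Make the user the owner of that directory.
  echo "hadoop fs -chown ${user} /user/${user}"
}
```

To print the commands for the UNIX user that launches Datameer X, you could call print_hdfs_user_setup "$(whoami)" while logged in as that user.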
...
Hadoop permissions can be set with the following commands. Note that permission checking can be enabled/disabled globally for HDFS. See Configuring Datameer X in a Shared Hadoop Cluster for more information.
Code Block |
---|
hadoop fs [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...] [-chown [-R] [OWNER][:[GROUP]] PATH...] [-chgrp [-R] GROUP PATH...] |
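The syntax above can be made concrete with a small sketch. The function name set_hdfs_access, the HADOOP_CMD variable, and the example values (path, owner, group, and mode) are all hypothetical and only serve as illustration. By default the sketch performs a dry run that prints each command; set HADOOP_CMD=hadoop to execute them.

```shell
#!/bin/sh
# HADOOP_CMD defaults to a dry run that only prints each command.
HADOOP_CMD="${HADOOP_CMD:-echo hadoop}"

# Usage: set_hdfs_access PATH OWNER GROUP OCTALMODE
set_hdfs_access() {
  path="$1"; owner="$2"; group="$3"; mode="$4"
  # Change the owner, the group, and the mode recursively (-R).
  $HADOOP_CMD fs -chown -R "$owner" "$path"
  $HADOOP_CMD fs -chgrp -R "$group" "$path"
  $HADOOP_CMD fs -chmod -R "$mode" "$path"
}
```

For example, set_hdfs_access /user/datameer datameer hadoop 750 prints three commands that give the datameer user full access, the hadoop group read access, and other users no access. Choose values that match your own security policy.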
...
For standard Hadoop compression algorithms, you can choose the algorithm Datameer X should use. However, if your Hadoop cluster uses a non-standard compression algorithm such as LZO, you need to install the corresponding libraries on the Datameer X machine. This is necessary so that Datameer X can read the files it writes to HDFS and decompress files residing on HDFS that you wish to import. Libraries that use native compression require both a Java component (JAR) and a native code component (UNIX packages). The Java component is a JAR file that needs to be placed into <Datameer X folder>/etc/custom-jars. See Frequently Asked Hadoop Questions#Q. How do I configure Datameer/Hadoop to use native compression? for more details.
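Placing the Java component could look like the following sketch. The function name install_codec_jar is hypothetical, and both the JAR path and the Datameer X folder are arguments you supply yourself. The sketch covers only the JAR; the native UNIX packages still need to be installed separately.

```shell
#!/bin/sh
# Copies a codec JAR into the custom-jars folder of a Datameer X installation.
# $1: path to the codec JAR
# $2: the Datameer X folder
install_codec_jar() {
  jar="$1"
  datameer_home="$2"
  target="${datameer_home}/etc/custom-jars"
  # Fail early if either the JAR or the target folder is missing.
  [ -f "$jar" ] || { echo "JAR not found: $jar" >&2; return 1; }
  [ -d "$target" ] || { echo "Folder not found: $target" >&2; return 1; }
  cp "$jar" "$target/"
}
```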
Note |
---|
The configuration of Hadoop compression can drastically affect Datameer X performance. See Hadoop Cluster Configuration Tips for more information. |
Connecting Datameer X to Your Hadoop Cluster
By default, Datameer X isn't connected to any Hadoop cluster and operates in local mode, with all analytics and other functions performed by a local instance of Hadoop. Local mode is useful for prototyping with small data sets, but not for high-volume testing or production. To connect Datameer X to a Hadoop cluster, change the settings on the Admin > Hadoop Cluster page: click the Admin tab at the top of the page, then click the Hadoop Cluster tab in the left column. Click Edit and change the Mode setting.
...
Name | Value | Description | Location |
---|---|---|---|
mapred.map.tasks.speculative.execution | false | Datameer X currently doesn't support speculative execution. However, you don't need to configure these properties cluster-wide, because Datameer X disables speculative execution for every job it submits. You must ensure that these properties aren't set to 'true' cluster-wide with the final parameter also set to 'true', because that prevents a client from changing the property on a job-by-job basis. | das-job.properties |
mapred.job.reduce.total.mem.bytes | 0 | Datameer X turns off in-memory shuffle, because it could lead to an 'out of memory exception' in the reduce phase. | das-job.properties |
mapred.map.output.compression.codec | usually one of: | Datameer X cares about the what and the when of compression but not about the how. It uses the codec you have configured for your cluster. If you would like to change the codec to another available one, then set these properties in Datameer X. Otherwise, Datameer X uses the default in | mapred-site.xml |
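To confirm that a properties file carries the values listed in the table, a small check such as the following may help. It is only a sketch: the function name has_job_property is hypothetical, and it assumes the file uses plain name=value lines, as Java properties files usually do. Pass the path to your own das-job.properties, since its location isn't given in the table.

```shell
#!/bin/sh
# Succeeds if FILE contains the line NAME=VALUE, ignoring whitespace.
# Assumption: the file uses plain Java-properties "name=value" lines.
# Usage: has_job_property FILE NAME VALUE
has_job_property() {
  file="$1"; name="$2"; value="$3"
  # Strip all whitespace, then look for an exact whole-line match.
  sed 's/[[:space:]]//g' "$file" | grep -Fxq "${name}=${value}"
}
```

For example, has_job_property das-job.properties mapred.map.tasks.speculative.execution false returns success only when that exact setting is present in the file.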
Highly Recommended Hadoop Settings
...