...

  • The DFS block size defaults to 64 MB in Datameer X.
  • Open the Datameer X application.
  • Click the Administration tab at the top of the screen.
  • Select Hadoop Cluster from the navigation bar on the left, then select Hadoop Cluster in the mode settings.
  • In the custom properties box, enter the new block size:
    • dfs.block.size=[size]

The block size must be an integer number of bytes. A string value is not accepted.

Example (134217728 bytes = 128 MB):


dfs.block.size=134217728
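
The byte value in the example above can be derived rather than typed by hand. The following is a minimal sketch, assuming 1 MB = 1024 * 1024 bytes; the helper name block_size_property is hypothetical and not part of Datameer X or Hadoop.

```python
def block_size_property(megabytes):
    """Return a dfs.block.size custom property line for a size given in MB.

    Hypothetical helper for illustration only.
    """
    # Reject anything that is not a positive whole number (bool is excluded
    # explicitly because it is a subclass of int in Python).
    if isinstance(megabytes, bool) or not isinstance(megabytes, int) or megabytes <= 0:
        raise ValueError("block size must be a positive whole number of megabytes")
    size_in_bytes = megabytes * 1024 * 1024  # 1 MB = 1024 * 1024 bytes
    return f"dfs.block.size={size_in_bytes}"


print(block_size_property(128))  # dfs.block.size=134217728
print(block_size_property(64))   # dfs.block.size=67108864
```

The printed line can be pasted into the custom properties box as is.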

Memory and Task Calculation

...

Name

Value

Reasoning

mapred.job.tracker

<masterhost>:9001

  • Nodes need to know the address of the JobTracker

mapred.tasktracker.map.tasks.maximum

<number of map tasks each task-tracker runs concurrently>

  • Optimal use of the power of your machines
  • Maximum slots should be slightly more than the number of cores; map tasks take the majority of the free slots (6 cores > 8 slots > 6 map tasks max)
  • Sensitive to the memory available on the machine: each slot needs <mapred.child.java.opts> of memory

mapred.tasktracker.reduce.tasks.maximum

<number of reduce tasks each task-tracker runs concurrently>

  • Optimal use of the power of your machines
  • Maximum slots should be slightly more than the number of cores; reduce tasks take the minority of the free slots (6 cores > 8 slots > 2 reduce tasks max)
  • Sensitive to the memory available on the machine: each slot needs <mapred.child.java.opts> of memory

mapred.child.java.opts
(mapred.map.child.java.opts and mapred.reduce.child.java.opts for version >=0.21)

-Xmx500m

Datameer X needs a minimum of 500 MB per task JVM with its default configuration. This setting interacts with <mapred.tasktracker.map.tasks.maximum> and <mapred.tasktracker.reduce.tasks.maximum>.

mapred.local.dir

<comma-separated list of all folders where the big temporary data should go.>

  • Should be distributed over all data partitions with dfs.data.dir

mapred.system.dir

<path in HDFS where small control files are stored>

  • The default location is inside /tmp, which is dangerous.
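
The interaction between the slot maximums and mapred.child.java.opts described above comes down to simple arithmetic. The sketch below assumes every slot on a node is busy at once and counts only the configured heap (-Xmx); a real JVM uses somewhat more than its heap. The function names and the available_mb parameter are hypothetical.

```python
def required_task_memory_mb(map_slots, reduce_slots, heap_mb=500):
    """Heap memory needed on one node when every task slot is in use.

    heap_mb mirrors the -Xmx value in mapred.child.java.opts (500 MB minimum).
    """
    return (map_slots + reduce_slots) * heap_mb


def slots_fit(map_slots, reduce_slots, available_mb, heap_mb=500):
    """Check whether the configured slots fit into the memory left for tasks.

    available_mb is an assumed figure: the memory on the node that is not
    needed by the operating system or the Hadoop daemons.
    """
    return required_task_memory_mb(map_slots, reduce_slots, heap_mb) <= available_mb


# Example from the table: 6 cores, 8 slots, split into 6 map and 2 reduce slots.
print(required_task_memory_mb(map_slots=6, reduce_slots=2))  # 4000
print(slots_fit(6, 2, available_mb=8192))                    # True
print(slots_fit(6, 2, available_mb=3000))                    # False
```

If the check fails, either the slot maximums or the per-task heap has to come down.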

...

Name

Recommended Value

Description

Location

mapred.map.tasks.speculative.execution
mapred.reduce.tasks.speculative.execution

false

Warning: Datameer X currently doesn't support speculative execution. However, you don't need to configure these properties cluster-wide, because Datameer X disables speculative execution for every job it submits. You only need to make sure that these properties aren't set to true cluster-wide with the final parameter also set to true, which would prevent a client from changing the property on a per-job basis.

hdfs-site.xml

mapred.map.output.compression.codec
mapred.output.compression.codec

usually one of:
org.apache.hadoop.io.compress.DefaultCodec
org.apache.hadoop.io.compress.GzipCodec
org.apache.hadoop.io.compress.BZip2Codec
com.hadoop.compression.lzo.LzoCodec
com.hadoop.compression.lzo.LzopCodec

Datameer X decides what is compressed and when, but not how, that is, which codec is used. It uses the codec you configured for the cluster. If you've configured a non-standard codec such as LZO, you have to make sure that Datameer X has access to the codec as well. See "How do I configure Datameer/Hadoop to use native compression?" in Frequently Asked Hadoop Questions.

mapred-site.xml
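
The speculative execution condition described above (set to true cluster-wide and also marked final) can be checked against a cluster configuration file. The following is a rough sketch that assumes the standard Hadoop site-file layout of <property> elements with <name>, <value>, and an optional <final>. The function name and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

SPECULATIVE_PROPERTIES = {
    "mapred.map.tasks.speculative.execution",
    "mapred.reduce.tasks.speculative.execution",
}


def locked_speculative_properties(site_xml):
    """Return speculative-execution properties that are true AND final.

    These are the settings a client can no longer override per job.
    """
    root = ET.fromstring(site_xml)
    locked = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip().lower()
        final = (prop.findtext("final") or "").strip().lower()
        if name in SPECULATIVE_PROPERTIES and value == "true" and final == "true":
            locked.append(name)
    return locked


# Illustrative sample: only the map property is both true and final.
SAMPLE = """
<configuration>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>true</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>true</value>
  </property>
</configuration>
"""

print(locked_speculative_properties(SAMPLE))
# ['mapred.map.tasks.speculative.execution']
```

An empty result means a client can still change both properties per job.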

...