Datameer, Hadoop, and VMware

Running Datameer X or Hadoop in a VMware virtual machine (VM) is slower than running them on native hardware. The size of the performance penalty depends mainly on the number of cores and the amount of RAM allocated to the VM, and on the performance of the file system layer between the VM and the hardware.

What Can Be Installed in a VM?

The following combinations of Datameer X and Hadoop are supported:

  • Both Datameer X and Hadoop can be installed in the same VM
  • Datameer X installed in a VM can be configured against an external Hadoop or EMR grid (Administration - Hadoop Cluster)
  • Datameer X can be configured to use Hadoop installed in a VM for testing purposes

Installing Hadoop in a VM

Use the following configuration tips when setting up a VM to run Hadoop:

  • At least 2 cores should be assigned to the VMware instance.
  • At least 1.5 GB RAM should be assigned to the VMware instance.
  • Estimate the amount of data to process and store when determining the size of the virtual hard drive (> 10 GB recommended for a small demo setup).
  • Installing on top of an actual Linux distribution is recommended.
  • Hadoop version 0.20.2 or newer is recommended.
  • Be aware that Hadoop reserves space for the operating system through the dfs.datanode.du.reserved property (see conf/hdfs-site.xml) and can't store blocks once the free space falls below this limit.
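
The sizing tips above can be sketched as a small pre-flight check. This is a minimal illustration, not part of Datameer X or Hadoop: the names VmSettings, check_vm, and space_for_blocks are hypothetical, the thresholds are simply the minimums listed above, and the block-space calculation is a simplified model of dfs.datanode.du.reserved (free space minus the reserved amount, never below zero).

```python
from dataclasses import dataclass
from typing import List

GB = 1000 ** 3  # bytes per gigabyte (assumption: decimal GB)

# Minimums taken from the tips above.
MIN_CORES = 2
MIN_RAM_GB = 1.5
MIN_DISK_GB = 10  # the tips recommend more than 10 GB for a small demo


@dataclass
class VmSettings:
    """Hypothetical container for the values assigned to the VM."""
    cores: int
    ram_gb: float
    disk_gb: float


def check_vm(vm: VmSettings) -> List[str]:
    """Return warnings; an empty list means all minimums are met."""
    warnings = []
    if vm.cores < MIN_CORES:
        warnings.append(f"cores: {vm.cores} is below {MIN_CORES}")
    if vm.ram_gb < MIN_RAM_GB:
        warnings.append(f"RAM: {vm.ram_gb} GB is below {MIN_RAM_GB} GB")
    if vm.disk_gb <= MIN_DISK_GB:
        warnings.append(f"disk: {vm.disk_gb} GB is not above {MIN_DISK_GB} GB")
    return warnings


def space_for_blocks(free_bytes: int, reserved_bytes: int) -> int:
    """Simplified model: bytes left for HDFS blocks after the reservation.

    reserved_bytes stands for the value of dfs.datanode.du.reserved.
    Returns 0 when free space has fallen below the reserved amount.
    """
    return max(0, free_bytes - reserved_bytes)


if __name__ == "__main__":
    vm = VmSettings(cores=2, ram_gb=2.0, disk_gb=20)
    for line in check_vm(vm) or ["VM settings meet the listed minimums"]:
        print(line)
    print(space_for_blocks(free_bytes=5 * GB, reserved_bytes=1 * GB), "bytes for blocks")
```

If space_for_blocks returns 0, either enlarge the virtual hard drive or lower the reserved value in conf/hdfs-site.xml.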

Install Hadoop in a VM

  • Install Linux in the VMware instance.
  • See the Hadoop Quick Start to prepare the operating system and install Hadoop (standalone or pseudo-distributed mode).

Here is general information about using Hadoop in a VM:

Tips and configuration settings

  • Allocating hard drive space on demand results in a smaller image size when you start using the system.
  • Allocating the full hard drive space for VMware improves performance but requires that you define the final size from the beginning.
  • The space for (application) data is limited by the VM image settings (which you store via shared VMware folders into the host filesystem space).

Installing Datameer X in a Virtual Machine

Use the standard installation process to install Datameer X in a VM. For detailed instructions, see the Setup Guide.

To get started:

  • Set up a supported Linux distribution on the VM (see Supported Operating Systems).
  • Install the required software for the Datameer X application server (see the software section under System Requirements).
  • Keep in mind the limitations defined by the virtual machine settings.