Installation Guide

This installation guide explains how to set up Datameer for enterprise and production environments. If you are upgrading from a previous version, see the Upgrade Instructions.

Following this step-by-step guide also prepares you for later unattended installation, integration into Ansible, Chef, Puppet, or SaltStack, and keeping a log of changes. To achieve this, configuration and property changes in files are made using sed.

Prerequisites

Complete the following prerequisites before installing Datameer:

  • Install the Hadoop client

    • Datameer application server, as well as all data nodes, are configured properly with host names, DNS, date and time, NTP, and other details

      • Datameer application server, as well as all data nodes, have Java 1.8 (Oracle recommended)

        • Check the installation with the commands java -version and echo $JAVA_HOME

      • Datameer application server already has the Oracle Java Cryptography Extension (JCE) installed. See Java SE Security for more information.

      • Commands such as hadoop, yarn, and mysql can be executed

  • Install MySQL client

    • For Datameer's application database, the MySQL server must be prepared with the necessary access rights

  • Grant administrative rights or root access

  • Ensure Internet access to download packages and plug-ins, or have the necessary ZIP files downloaded and available

  • If using Kerberos, configure a Kerberos-secured cluster for secure impersonation
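The command-related prerequisites can be checked in one pass on each machine. This is a minimal sketch that assumes a Bash shell. The function names require_commands and show_java are hypothetical helpers, not part of Datameer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every command from the argument list that is
# not found on PATH. Returns 1 if any command is missing, 0 otherwise.
require_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: print the first line of "java -version" (which writes
# to stderr, hence 2>&1) and the current JAVA_HOME value.
show_java() {
  java -version 2>&1 | head -n 1
  echo "JAVA_HOME=${JAVA_HOME:-<not set>}"
}

# Usage, with the commands named in the prerequisites:
#   require_commands java hadoop yarn mysql && show_java
```

A Java 1.8 installation reports a version string that starts with "1.8".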

Create the Datameer User

Administrative rights are required to create the Datameer user on the machine where Datameer is being installed. This can be accomplished under the root account. Make sure the user ID is above 500 and that the account has enough resources and file descriptors available. 

  • Create the user account under which the Datameer service will later be started and run:

    Create user

    /usr/sbin/groupadd --system datameer
    /usr/sbin/useradd --system --create-home --gid datameer datameer

    These commands also create the directory /home/datameer.

  • Check the maximum number of open files (global or per-user limits, or both) and set it to 64K if it is not set already. This configuration must be done on all nodes in the cluster and might require a reboot.
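The per-user open-files limit can be compared against the 64K (65536) target with a small helper. This is a minimal Bash sketch. nofile_ok is a hypothetical function name. The drop-in file name in the comments is also an assumption: pam_limits reads any *.conf file in /etc/security/limits.d.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the open-files limit is at least 65536 or
# "unlimited". With no argument it checks the current shell's soft limit.
nofile_ok() {
  local current="${1:-$(ulimit -n)}"
  [ "$current" = "unlimited" ] || [ "$current" -ge 65536 ]
}

# Per-user limits (assumed drop-in file /etc/security/limits.d/datameer.conf):
#   datameer  soft  nofile  65536
#   datameer  hard  nofile  65536
#
# The global kernel-wide limit can be read with: cat /proc/sys/fs/file-max
#
# Usage, as the datameer user in a new login session:
#   nofile_ok || echo "open-files limit $(ulimit -n) is below 65536"
```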

Create Directories for Application, Cache, Logs, and Temporary Files 

For performance reasons, and for better control over where space on the file systems and disks is used, create separate directories for application, cache, logs, and temporary files according to the Linux Filesystem Hierarchy Standard (FHS). Creating the directories and changing their permissions requires administrative rights, so complete this task under the root user account.

  • Create the directories for application, cache, logs, and temporary files:

    Create directories

    mkdir -p /opt/datameer
    chown -R datameer:datameer /opt/datameer
    mkdir -p /var/cache/datameer
    chown -R datameer:datameer /var/cache/datameer
    mkdir -p /var/log/datameer
    chown -R datameer:datameer /var/log/datameer
    mkdir -p /tmp/datameer
    chown -R datameer:datameer /tmp/datameer
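Ownership of the new directories can be verified before switching users. This is a minimal Bash sketch that assumes GNU stat (the -c %U format), as found on typical Linux distributions. check_owner is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that every given directory exists and is owned
# by the expected user. Prints one line per problem; returns 1 on any problem.
check_owner() {
  local owner="$1" dir rc=0
  shift
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "not a directory: $dir" >&2
      rc=1
    elif [ "$(stat -c %U "$dir")" != "$owner" ]; then
      echo "wrong owner: $dir ($(stat -c %U "$dir"))" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage, with the directories created above:
#   check_owner datameer /opt/datameer /var/cache/datameer /var/log/datameer /tmp/datameer
```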

Switch the User and Change the Working Directory 

This is the last task that requires administrative rights.

  • Switch to the new Datameer user and change to the working directory where Datameer is being installed:

    Switch user and directory

    su - datameer
    cd /opt/datameer

    Proceed from within the Datameer installation directory and under the user account datameer only. 
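Because every later step assumes the datameer user and the installation directory, a script can stop early when either is wrong. This is a minimal Bash sketch. require_context is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 1 unless the current user is $1 and the current
# directory (symlinks resolved) is $2.
require_context() {
  local user="$1" dir="$2"
  if [ "$(id -un)" != "$user" ]; then
    echo "run as $user, not $(id -un)" >&2
    return 1
  fi
  # A nonexistent target directory yields an empty string and fails the check.
  if [ "$(pwd -P)" != "$(cd "$dir" 2>/dev/null && pwd -P)" ]; then
    echo "change to $dir first" >&2
    return 1
  fi
}

# Usage, before downloading Datameer:
#   require_context datameer /opt/datameer || exit 1
```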

Download and Unzip Datameer

Download the appropriate Datameer package for your Hadoop cluster distribution. If you already have a Datameer installation, you can also start from here.

  • Download and unzip. 

    Download and unzip Datameer

    curl -s -k -o Datameer-<package>.zip "https://download.datameer.com.s3.amazonaws.com/releases/Datameer-<version>/<dist>/Datameer-<package>.zip?<AWSproperties>" ; unzip Datameer*

If you are an authorized Enterprise customer, you can get the download link for the latest publicly available package from https://my.datameer.com/workspace/downloads, or request one through your Customer Success Manager (CSM).
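With curl -s, a failed request (for example, an expired link) can leave an error page in the output file instead of the package. ZIP archives start with the two bytes "PK", so checking them catches this case before unzip runs. Adding curl's -f (--fail) option, which makes curl exit with an error on HTTP error responses, is another safeguard. This is a minimal Bash sketch. looks_like_zip is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file is non-empty and starts with
# the ZIP signature "PK".
looks_like_zip() {
  [ -s "$1" ] && [ "$(head -c 2 "$1")" = "PK" ]
}

# Usage, with the package placeholder from the download command:
#   looks_like_zip Datameer-<package>.zip || { echo "download failed" >&2; exit 1; }
```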

Best Practice: Create Symlinks and Change the Working and Log Directories

To be prepared for future upgrades, create symlinks to the current (or latest) package and to the log directory.

  • Create symlink and change the working directory:

    Create symlink and change working directory

    ln -s Datameer-<package> current
    cd current
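The benefit of the current symlink shows at upgrade time: pointing it at a newer package directory takes a single command. This is a minimal sketch of the link switch only, not the documented upgrade procedure (see the Upgrade Instructions). switch_current and the package directory names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: repoint ./current at the given package directory.
# -f replaces the existing link. -n treats the existing "current" link as a
# plain file, so ln does not create the new link inside the old package directory.
switch_current() {
  ln -sfn "$1" current
}

# Usage, from /opt/datameer after unzipping a newer package:
#   switch_current Datameer-<new-package>
#   readlink current
```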

By default, all Datameer logs are in the installation subdirectory logs/. There is no single property that sets the log location; instead, there are several, depending on the type of log. The main configuration file in which you can change the location of most log files is conf/log4j-production.properties. To keep the change fast and simple, keep the logs in a central location according to the Linux Filesystem Hierarchy Standard (FHS).

  • Move the log directory:

    Create symlink to log directory

    mv logs/.donotdelete /var/log/datameer
    rm -rf logs
    ln -s /var/log/datameer logs
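After the move, logs/ should be a symlink that resolves to the central log directory. This is a minimal Bash sketch that assumes GNU readlink (the -f option). logs_linked_to is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if $1 is a symlink whose fully resolved path
# equals the fully resolved path of $2.
logs_linked_to() {
  local link="$1" target="$2"
  [ -L "$link" ] && [ "$(readlink -f "$link")" = "$(readlink -f "$target")" ]
}

# Usage, from /opt/datameer/current (ls should also list the moved .donotdelete):
#   logs_linked_to logs /var/log/datameer && ls -la logs/
```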

Download and Install the MySQL Database JDBC Connector 

By default, the Datameer application runs with an HSQL file database that is created on the local filesystem under das-data/database/hsql-db. If you are setting up Datameer for production use, Datameer strongly recommends using MySQL instead of the HSQL file database. 
As of Datameer 7.4: MariaDB is supported as an alternative to MySQL.

  • Download the official MySQL JDBC driver ZIP file, extract the driver from the archive file, and copy it into the correct destination:

    Download and install JDBC

    # Look up the latest JDBC driver version (first matching file name on the page)
    JDBCDRV="$(curl -s -k 'https://dev.mysql.com/downloads/connector/j/' | grep -o 'mysql-connector-java-[0-9.]*\.zip' | head -n 1)"
    # Download the latest JDBC driver version (-L follows redirects, -O keeps the remote file name)
    curl -s -L -k -O "https://dev.mysql.com/get/Downloads/Connector-J/${JDBCDRV}"
    # Unzip the driver package
    unzip mysql-connector* -d etc/custom-jars
    # Move only the necessary JAR file
    mv etc/custom-jars/mysql-connector*/*bin.jar etc/custom-jars
    # Clean up the extracted package directory (trailing slash matches directories only)
    rm -rf etc/custom-jars/mysql-connector-java-*/

  • Double-check that etc/custom-jars contains the latest mysql-connector-java-<version>-bin.jar:

    Check installation

    echo "$JDBCDRV"
    ls -l etc/custom-jars
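The driver file-name lookup can be tested offline by feeding it sample HTML. The sketch wraps a pattern that accepts only digits and dots as the version, so it cannot run across the rest of an HTML line the way a greedy '.*zip' can. extract_connector_zip is a hypothetical function name. The layout of the MySQL download page and the connector file names may change (newer Connector/J releases may be named differently), so the pattern might need adjusting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HTML on stdin and print the first
# "mysql-connector-java-<digits and dots>.zip" name found (empty if none).
extract_connector_zip() {
  grep -o 'mysql-connector-java-[0-9.]*\.zip' | head -n 1
}

# Usage:
#   JDBCDRV="$(curl -s 'https://dev.mysql.com/downloads/connector/j/' | extract_connector_zip)"
```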

Configure Datameer for MySQL Database

The Datameer service depends on the MySQL database, which is used for writing to workbooks, permission changes, job execution, scheduling, and more. For Datameer to function properly, the database response time should be between ten and twenty milliseconds. To run the application in MySQL mode, implement the following changes. As of Datameer 7.4: MariaDB is supported as an alternative to MySQL.
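To get a rough idea of whether the database meets the ten-to-twenty-millisecond guideline, a client round trip such as mysqladmin ping can be timed. This is a minimal Bash sketch that assumes GNU date (the %N format). elapsed_ms is a hypothetical function name. The figure includes starting the client process, so it overstates the database's own response time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with its output discarded, print its
# wall-clock duration in milliseconds, and return the command's exit status.
elapsed_ms() {
  local start end rc
  start="$(date +%s%N)"
  "$@" >/dev/null 2>&1
  rc=$?
  end="$(date +%s%N)"
  echo $(( (end - start) / 1000000 ))
  return "$rc"
}

# Usage (repeat a few times and look at the typical value):
#   for i in 1 2 3 4 5; do elapsed_ms mysqladmin ping; done
```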

  • Check database connection:

    Connection check

    mysqladmin version
    mysqladmin ping
    mysqladmin status
    echo q | telnet -e q `hostname` 3306
    nc -z -w1 `hostname` 3306

    You can follow up later with the "Check if the Datameer Application Database is Running and Accessible" article.

  • Initialize application database:

    Initialize database

    mysql -uroot -p < bin/mysql-init.sql
    mysql -uroot -p dap < bin/create-tables.sql
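Both SQL scripts are referenced relative to the installation directory, and the table creation only makes sense after the initialization script succeeds. This minimal Bash sketch encodes both conditions. init_db is a hypothetical function name. mysql still prompts for the root password once per script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: refuse to start unless both SQL scripts are readable
# from the current directory, then run create-tables.sql against the dap
# database only if mysql-init.sql succeeded.
init_db() {
  local f
  for f in bin/mysql-init.sql bin/create-tables.sql; do
    if [ ! -r "$f" ]; then
      echo "missing $f (run from the installation directory)" >&2
      return 1
    fi
  done
  mysql -uroot -p < bin/mysql-init.sql &&
    mysql -uroot -p dap < bin/create-tables.sql
}

# Usage, from /opt/datameer/current:
#   init_db
```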