Before installing Datameer X, complete the following prerequisites:
Check the Java installation with java -version and echo $JAVA_HOME.
Make sure that hadoop, yarn, and mysql can be executed.
root access
Administrative rights are required to create the Datameer X user on the machine where Datameer X is being installed. This can be accomplished under the root account. Make sure the user ID is above 500 and that the account has enough resources and file descriptors available.
Create the user account under which the Datameer X service will be started and running later:
/usr/sbin/groupadd --system datameer
/usr/sbin/useradd --system --create-home --gid datameer datameer
These commands also create the directory /home/datameer.
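Because useradd --system takes the user ID from the system range configured on the host, it can be worth confirming the "above 500" requirement after the account exists. The following is a minimal sketch; the function names uid_above_500 and check_account_uid are hypothetical helpers, not part of Datameer X.

```shell
# Return 0 if the numeric user ID is above 500, as required in the text.
uid_above_500() {
  [ "$1" -gt 500 ]
}

# Look up an account by name and print a short verdict.
# Returns 2 if the account does not exist, 1 if the UID is too low.
check_account_uid() {
  account="$1"
  uid="$(id -u "$account" 2>/dev/null)" || { echo "no such user: $account"; return 2; }
  if uid_above_500 "$uid"; then
    echo "$account: UID $uid is above 500"
  else
    echo "$account: UID $uid is NOT above 500"
    return 1
  fi
}

# Usage (run after the account was created):
#   check_account_uid datameer
```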
Check the maximum number of open files at the global level, in the per-user limits, or both, and set it to 64K (65536) if it isn't set already. This configuration needs to be done on all nodes within the cluster and might require a reboot.
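A quick way to see the current values is sketched below. It assumes a Linux host where the global limit is exposed under /proc/sys/fs/file-max; the helper name nofile_ok is hypothetical. How the limit is raised permanently depends on the distribution and is not shown here.

```shell
# Required minimum from the text: 64K open files.
REQUIRED_NOFILE=65536

# Return 0 if the given limit satisfies the requirement, 1 otherwise.
nofile_ok() {
  limit="$1"
  [ "$limit" = "unlimited" ] && return 0
  [ "$limit" -ge "$REQUIRED_NOFILE" ]
}

# Report the per-user soft limit of the current shell.
current="$(ulimit -n)"
if nofile_ok "$current"; then
  echo "per-user limit OK: $current"
else
  echo "per-user limit too low: $current (need $REQUIRED_NOFILE)"
fi

# Report the global limit if the kernel exposes it (Linux).
if [ -r /proc/sys/fs/file-max ]; then
  echo "global limit: $(cat /proc/sys/fs/file-max)"
fi
```

Run the check as the datameer user on every node, since per-user limits can differ between accounts.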
For performance reasons and to have better control over where space on the file systems and disks is used, create separate directories for application, cache, logs, and temporary files. Do this according to the Linux Filesystem Hierarchy Standard (FHS). To create the directories and change the permissions, you need administrative rights. Complete this task under the user account root.
Create the directories for application, cache, logs, and temporary files:
mkdir -p /opt/datameer
chown -R datameer:datameer /opt/datameer
mkdir -p /var/cache/datameer
chown -R datameer:datameer /var/cache/datameer
mkdir -p /var/log/datameer
chown -R datameer:datameer /var/log/datameer
mkdir -p /tmp/datameer
chown -R datameer:datameer /tmp/datameer
This should be the last task for which administrative rights are necessary.
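Before giving up administrative rights, it may help to confirm that all four directories exist and belong to the new account. The sketch below compares numeric owner IDs parsed from ls -ldn; owner_uid and dirs_owned_by are hypothetical helper names.

```shell
# Print the numeric owner UID of a path (third column of `ls -ldn`).
owner_uid() {
  ls -ldn "$1" | awk '{print $3}'
}

# Return 0 if every listed directory exists and is owned by the given UID.
dirs_owned_by() {
  uid="$1"; shift
  for dir in "$@"; do
    [ -d "$dir" ] || { echo "missing: $dir"; return 1; }
    [ "$(owner_uid "$dir")" = "$uid" ] || { echo "wrong owner: $dir"; return 1; }
  done
  return 0
}

# Usage:
#   dirs_owned_by "$(id -u datameer)" /opt/datameer /var/cache/datameer \
#       /var/log/datameer /tmp/datameer && echo "all directories OK"
```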
Switch to the new Datameer X user and change to the working directory where Datameer X is being installed:
su - datameer
cd /opt/datameer
Proceed from within the Datameer X installation directory and under the user account datameer only.
Download and unzip the appropriate Datameer X package for your Hadoop cluster distribution:
INFO: If you already have a Datameer X installation, you can also start from here.
curl -s -k -o Datameer-<package>.zip "https://download.datameer.com.s3.amazonaws.com/releases/Datameer-<version>/<dist>/Datameer-<package>.zip?<AWSproperties>"
unzip Datameer*
To be prepared for future upgrades, create symlinks to the current (or latest) package as well as to the log directory.
Create the symlink and change the working directory:
ln -s Datameer-<package> current
cd current
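The benefit of the symlink shows up at the next upgrade: scripts and aliases keep pointing at current, and only the link target changes. The sketch below illustrates just that link mechanics, run from /opt/datameer; switch_current is a hypothetical helper, and the actual upgrade procedure is not covered here.

```shell
# Point the "current" symlink in the working directory at a package directory.
# -s: symbolic link, -f: replace an existing link,
# -n: treat an existing "current" link as a file instead of following it.
switch_current() {
  target="$1"
  [ -d "$target" ] || { echo "not a directory: $target" >&2; return 1; }
  ln -sfn "$target" current
}

# Usage, after unzipping a newer package next to the old one:
#   switch_current Datameer-<new-package>
#   readlink current
```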
By default, the Datameer X application runs with an HSQL file database that is created on the local filesystem under 'das-data/database/hsql-db'. If you are setting up Datameer X for production use, Datameer strongly recommends MySQL instead of the HSQL file database. MariaDB is also supported as an alternative metastore database engine to MySQL.
To define which database to use, make an entry in the 'live.properties' file under 'conf/live.properties':
#Define which database to use: hsql-memory, hsql-file, mysql, mariadb
system.property.db.mode=mysql
or
#Define which database to use: hsql-memory, hsql-file, mysql, mariadb
system.property.db.mode=mariadb
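If the entry should be scripted rather than edited by hand, one possible approach is a small helper that replaces an existing key or appends it when it is missing. This is only a sketch: set_property is a hypothetical function, it matches uncommented key=value lines only, and it assumes simple values without backslashes.

```shell
# Set key=value in a properties file: replace the line if the key exists,
# otherwise append it. Commented-out entries are left untouched.
set_property() {
  file="$1"; key="$2"; value="$3"
  tmp="$(mktemp)" || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }  # replace existing entry
    { print }                                                  # keep all other lines
    END { if (!found) print k "=" v }                          # append if key was absent
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage:
#   cp conf/live.properties conf/live.properties.original
#   set_property conf/live.properties system.property.db.mode mysql
```

Writing the result back with cat keeps the original file's ownership and permissions.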
Download the official MySQL JDBC driver ZIP file, extract the driver from the archive file, and copy it into the correct destination:
# Look up the latest JDBC driver version
JDBCDRV="$(curl -s -k 'https://dev.mysql.com/downloads/connector/j/' | grep -o -m 1 'mysql-connector-java.*zip')"
# Download the latest JDBC driver version
curl -s -L0 -k -O "https://dev.mysql.com/get/Downloads/Connector-J/${JDBCDRV}"
# Unzip the driver package
unzip mysql-connector* -d etc/custom-jars
# Move only the necessary JAR file
mv etc/custom-jars/mysql-connector*/*bin.jar etc/custom-jars
# Clean up
rm -rf etc/custom-jars/mysql-connector-java-?.?.??
Double-check that etc/custom-jars contains the latest mysql-connector-java-<version>-bin.jar:
echo $JDBCDRV
ls -l etc/custom-jars
The Datameer X service depends on the MySQL database. The MySQL database is used for writing to workbooks, permission changes, job execution, scheduling, and more. For Datameer X to function properly, the database response time should be between ten and twenty milliseconds. To run the application in MySQL mode, the following changes need to be implemented.
Check the database connection:
mysqladmin version
mysqladmin ping
mysqladmin status
echo q | telnet -e q `hostname` 3306
nc -z -w1 `hostname` 3306
INFO: You can follow up later using the article Check if the Datameer X Application Database is Running and Accessible.
Create a new database via the "./bin/mysql-init.sql" script:
For MySQL 5.x and lower:
mysql -u <user> -p<password> -h <host/ip> -P <port> < bin/mysql-init.sql
For MySQL 8.x and higher:
mysql -u <user> -p<password> -h <host/ip> -P <port> < bin/mysql8x-init.sql
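When the installation is scripted, the choice between the two files can be derived from the server version string. The sketch below is an assumption-laden helper (init_script_for is hypothetical) that applies the rule stated above to MySQL version numbers only; it does not account for MariaDB's separate version numbering.

```shell
# Print the init script matching a MySQL server version string such as "8.0.36".
init_script_for() {
  major="${1%%.*}"   # keep everything before the first dot
  case "$major" in
    ''|*[!0-9]*) echo "unrecognized version: $1" >&2; return 1 ;;
  esac
  if [ "$major" -ge 8 ]; then
    echo "bin/mysql8x-init.sql"
  else
    echo "bin/mysql-init.sql"
  fi
}

# Usage (placeholders as in the commands above):
#   VERSION="$(mysql -u <user> -p -h <host/ip> -P <port> -Bse 'SELECT VERSION()')"
#   mysql -u <user> -p -h <host/ip> -P <port> < "$(init_script_for "$VERSION")"
```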
Set the deploy mode to "live" in the "./etc/das-env.sh" file:
# Change this to DAS_DEPLOY_MODE=live when you want to run in live mode against a mysql db
export DAS_DEPLOY_MODE=live
Set the database name in the "./conf/default.properties" file:
# Set the name of the MySql database DATAMEER uses.
system.property.db.name=dap
Execute the command to initialize the database:
./bin/database.sh init
The database configuration is now completed.
If you don't have a license, email the application's product ID to license@datameer.com and request the key. The product ID is displayed on the 'Welcome' page. See 'License Information' for information on how to update the license and for details about volume-based licensing.
If you have already received a Datameer X license:
Start the Datameer X service.
Working within the current installation directory, use the following commands:
# Start the Datameer X service
./bin/conductor.sh start
# Check the process ID (PID)
ps -ef | grep -i "java.*jetty.*datameer" | grep -v grep | tr -s " " | cut -d " " -f2
# Monitor the process booting and the log files
cat logs/jvm-stdout.log; sleep 3; tail -F logs/`date +"%Y_%m_%d"`.stderrout.log logs/conductor.log
Stop the Datameer X service.
Working within the current installation directory, use the following commands:
# Stop the Datameer X service
./bin/conductor.sh stop
# Monitor the process shutting down
cat logs/jvm-stdout.log; sleep 3; tail -F logs/`date +"%Y_%m_%d"`.stderrout.log logs/conductor.log
Restart the Datameer X service.
Working within the current installation directory, use the following commands:
# Restart the Datameer X service
./bin/conductor.sh restart
# Monitor the process booting and the log files
cat logs/jvm-stdout.log; sleep 3; tail -F logs/`date +"%Y_%m_%d"`.stderrout.log logs/conductor.log
Gracefully shut down the Datameer X service.
Check if the Datameer X service is running and accessible.
Working within the current installation directory, use the following commands:
./bin/conductor.sh check
ps -ef | grep -i "java.*datameer" | grep -v grep
lsof -i tcp@`hostname`:8080
lsof -i tcp@`hostname`:8443
lsof -i tcp | grep 'datameer'
echo -e "GET /login \n\n" | openssl s_client -connect `hostname`:8443 -quiet | grep -i -m 1 'datameer'
Monitor if the service is running and accessible later:
curl -k "https://`hostname`:8443/watchdog"
lsof -i -p "$(ps -ef | grep -i "java.*jetty.*datameer" | grep -v grep | tr -s " " | cut -d " " -f2)"
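For unattended checks, for example right after a start or restart, the probe can be wrapped in a simple retry loop. The sketch below is generic; wait_until is a hypothetical helper, and the usage line assumes that the watchdog URL answers with a successful HTTP status once the service is up, which should be verified on your installation.

```shell
# Retry a command once per second until it succeeds or the attempts run out.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  attempts="$1"; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    [ "$i" -lt "$attempts" ] && sleep 1
  done
  return 1
}

# Usage (curl -f turns HTTP error statuses into a non-zero exit code):
#   wait_until 60 curl -k -s -f -o /dev/null "https://$(hostname):8443/watchdog" \
#       && echo "service is reachable" || echo "service did not come up in time"
```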
You can also monitor the Datameer X core directory size in HDFS.
Before configuring Datameer X for a Kerberos-secured cluster, test Kerberos authentication and job execution on the CLI.
Send a test job to the cluster:
hadoop jar /<distribution-specific-path>/hadoop-mapreduce-examples-*.jar pi -Dmapreduce.job.queuename=root.default 3 10
To configure Datameer X for a Kerberos-secured cluster, follow the Secure Mode Configuration instructions.
You must have a properly configured connection to a Kerberos-secured cluster to use the tool to secure the Hadoop Distributed Filesystem (HDFS).
For the initial setup of secure impersonation, execute the following commands:
# Check current available access rights
hadoop fs -ls /user/datameer
# Configure Datameer X Core Directories (aka Private Folder)
./bin/secure_hdfs_tool.sh -u -g <dasuser>
hadoop fs -chown -R datameer:<dasuser> /user/datameer/.staging
hadoop fs -chmod -R 770 /user/datameer/.staging
# Check if changes are made correctly
hadoop fs -ls /user/datameer
Start the Datameer X service to do final testing.
Working within the current installation directory, use the following commands:
# Start the Datameer X service
./bin/conductor.sh start
# Check the process ID (PID)
ps -ef | grep -i "java.*jetty.*datameer" | grep -v grep | tr -s " " | cut -d " " -f2
# Monitor the process booting and the log files
cat logs/jvm-stdout.log; sleep 3; tail -F logs/`date +"%Y_%m_%d"`.stderrout.log logs/conductor.log
The Datameer X service depends on the MySQL database, which is used for writing to workbooks, permission changes, job execution, scheduling, and more. It is highly recommended to back up the application database frequently, for example with an hourly crontab entry:
0 * * * * mysqldump -u'dap' -p'dap' dap | gzip > /home/datameer/<company>_<system>_<datameer-version>_`date +\%Y\%m\%d_\%H\%M`.sql.gz
Don't leave the backup unattended for a long time. Monitor the size of the directory /home/datameer.
# Check from time to time how long the database dump takes and whether it fits into the time slot
time mysqldump -u'dap' -p'dap' dap | gzip > /home/datameer/<company>_<system>_<datameer-version>_`date +\%Y\%m\%d_\%H\%M`.sql.gz
# Verify from time to time that the files are OK
gzip -d /home/datameer/<company>_<system>_<datameer-version>_<date>_<time>.sql.gz
head /home/datameer/<company>_<system>_<datameer-version>_<date>_<time>.sql
Validate the content. Don't leave backup files on the application server. Move backup files from /home/datameer to a safe and secure remote location.
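The verification can also be done without unpacking the archive on disk, which keeps the compressed file in place for the later transfer. The following sketch uses gzip -t for the integrity test and gzip -dc to stream the first lines; verify_backup and peek_backup are hypothetical helper names.

```shell
# Return 0 if the file is a non-empty, intact gzip archive.
verify_backup() {
  backup="$1"
  [ -s "$backup" ] || { echo "missing or empty: $backup" >&2; return 1; }
  gzip -t "$backup" 2>/dev/null || { echo "corrupt archive: $backup" >&2; return 1; }
}

# Print the first lines of the dump (default: 10) without unpacking it on disk.
peek_backup() {
  gzip -dc "$1" | head -n "${2:-10}"
}

# Usage:
#   for f in /home/datameer/*.sql.gz; do verify_backup "$f" && echo "OK: $f"; done
#   du -sh /home/datameer
```

An intact archive only shows that compression finished; whether the dump itself is complete still needs a look at its content.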
Use a path that doesn't depend on a Datameer X installation directory. Because the das-data folder is stored inside of your installation directory by default, you need to make a backup of your stored data every time you create a new distribution or upgrade.
Log in and change the default admin password following the instructions on managing user accounts.
If you are setting up Datameer X for production use, it is most likely in a Kerberos-secured environment. To use Kerberos, an additional plug-in is necessary. This Datameer X plug-in is part of the Advanced Governance module.
Download the Kerberos plug-in and install it.
# Look up the corresponding Kerberos plug-in version first
curl -s -k -o plugin-kerberos-<version>.zip "https://download.datameer.com.s3.amazonaws.com/releases/Datameer-<version>/plug-ins_Advanced_Governance/plugin-kerberos-<version>.zip?<AWSproperties>"
mv plugin-kerberos* etc/custom-plugins
If you are an authorized Enterprise customer, you can request the download link from your Customer Success Manager (CSM).
Work within the current directory. To address enterprise requirements, some changes need to be implemented. To avoid any mismatch in the configuration files or incompatibility between versions, don't copy over configuration files from other versions. Instead, apply the changes each time to the originally delivered files.
# Create a backup of the original configuration file
cp conf/default.properties conf/default.properties.original
# Move the cache for workbook-previews and dfs
sed -i "s/\(localfs.cache-root=\).*\$/\1\/var\/cache\/datameer/" conf/default.properties
# Move the temp folder for local-execution
sed -i "s/\(localfs.temporary-files=\).*\$/\1\/tmp\/datameer/" conf/default.properties
# Provide REST API access by setting failed.login.attempts.max=0
sed -i "s/\(failed.login.attempts.max=\).*\$/\10/" conf/default.properties
# Switch off tutorial bar
sed -i "s/\(system.property.integratedTutorial.enabled=\).*\$/\1false/" conf/default.properties
# Depending on your infrastructure and data, set the time zone to UTC
sed -i "s/\(system.property.das.default-timezone=\).*\$/\1UTC/" conf/default.properties
# Do not expose stack traces to end users
sed -i "s/\(verbose.error.reporting=\).*\$/\1false/" conf/default.properties
# Create a log of changes made (run last so that it captures every change above)
diff -e conf/default.properties.original conf/default.properties > changes.default.properties
# Create a backup of the original configuration file
cp conf/live.properties conf/live.properties.original
# Name the address and port used to connect to Datameer X UI
# The value given will be used in email notification only
EXT_DM_URL="<hostname>.<domain>.<tld>"
sed -i "s/\(system.property.server.address=\).*\$/\1https:\/\/${EXT_DM_URL}:8443/" conf/live.properties
# Comment out temporary file directory since changes were implemented in default
sed -i '/localfs.temporary-files=/ s/^#*/# /' conf/live.properties
# Create a log of changes made
diff -e conf/live.properties.original conf/live.properties > changes.live.properties
# Create a backup of the original configuration file
cp conf/skin-default.properties conf/skin-default.properties.original
# Make UI access faster
sed -i '/#.*menu.show-welcome.visibility=/s/^#//' conf/skin-default.properties
sed -i '/#.*dialog.welcome.visibility=/s/^#//' conf/skin-default.properties
sed -i '/#.*page.home.visibility=/s/^#//' conf/skin-default.properties
# Provide REST API access by setting force.license-agreement=false
sed -i "s/\(force.license-agreement=\).*\$/\1false/" conf/skin-default.properties
# Create a log of changes made
diff -e conf/skin-default.properties.original conf/skin-default.properties > changes.skin-default.properties
more changes.*
cp changes.* /home/datameer
history > /home/datameer/install_command.log
cp ~/.bash_history /home/datameer
Validate the changes made. Move files from /home/datameer to a safe and secure remote location.
Before the next steps, consider using reverse proxies or a load balancer to offload the SSL traffic or to use wildcard certificates. In that case, you only need to configure rewrite handling.
Enable TLS for use with Datameer X in production environments. Because Datameer X is packaged with Jetty 9, you only need to enable the required modules.
Enable TLS:
# Check default configuration
java -jar start.jar --list-config | grep -i 'etc/jetty*'
# Add SSL and HTTPS to the startup modules
java -jar start.jar --add-to-start=ssl,https
# Check final configuration
java -jar start.jar --list-config
Configure TLS for Embedded Jetty (for more security)
To change the HTTPS port follow the instructions under Configure TLS
All port changes should be made in the |
Use your own custom certificate
You can proceed further with Enabling SSL for MySQL service as well.
Set up shell aliases for the most common commands to make work easier, faster, and less error-prone.
Working within the current Datameer X installation directory, add the following aliases:
# Edit your profile file
nano ~/.bash_profile
# Add aliases
alias dmpid='ps -ef | grep -i "java.*jetty.*datameer" | grep -v grep | tr -s " " | cut -d " " -f2'
alias dmver='ps -ef | grep -i "java.*datameer" | grep -v grep'
alias dmstart='./bin/conductor.sh start'
alias dmstop='./bin/conductor.sh stop'
alias dmcheck='./bin/conductor.sh check'
# Single quotes defer the PID lookup until dmkill is actually run
alias dmkill='kill $(dmpid)'
alias dmpath='readlink `pwd`; pwd'
alias dmdap='mysql -udap -pdap dap -Bse'
alias dmsqlping='for ((i=1; i<=5; i++)); do time -p dmdap "START TRANSACTION; INSERT INTO test_entity2 (version) VALUES ('1'); UPDATE test_entity2 SET version = 2 WHERE version = 1; DELETE FROM test_entity2 WHERE version = 2; ROLLBACK;"; sleep 1; done'
alias jettyconf='java -jar start.jar --list-config'
alias classpath='yarn classpath | tr ":" ","'
alias dminit="kinit datameer@<DOMAIN>.<TLD> -k -t /home/datameer/datameer.keytab"
alias dmfs="hadoop fs -du -h /user/datameer"
# Most important alias to set, since this will cover all three phases of Datameer's boot process
alias dmlog='cat logs/jvm-stdout.log; sleep 3; tail -F logs/`date +"%Y_%m_%d"`.stderrout.log logs/conductor.log'
# Load your profile file
source ~/.bash_profile
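Aliases are a convenience for interactive sessions; bash does not expand them in non-interactive scripts by default. If the same helpers are needed in cron jobs or monitoring scripts, shell functions are one alternative. The sketch below shows a function counterpart of the dmpid alias; datameer_pids is a hypothetical name, and the filter reads ps -ef output from standard input so that it can be tried on saved output as well.

```shell
# Filter `ps -ef` output (read from stdin) down to the PIDs of Datameer's Jetty JVM.
# Uses the same pattern as the dmpid alias; awk prints the second column (the PID).
datameer_pids() {
  grep -i "java.*jetty.*datameer" | grep -v grep | awk '{print $2}'
}

# Function counterpart of the dmpid alias, usable inside scripts.
dmpid() {
  ps -ef | datameer_pids
}

# Usage inside a script:
#   pid="$(dmpid)"
#   [ -n "$pid" ] && echo "Datameer X is running with PID $pid"
```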
Usage: conductor.sh <command> <option>
Commands:
start - Starts the conductor
stop - Stops the conductor
restart - Restarts the conductor
check - Checks if the conductor is already running
Options:
--injectExamples - Injects example import jobs and workbooks on start-up. (This option only works the first time when starting Datameer)
--resetPassword - Resets the admin password to default value.
--jobschedulerPaused - Previously scheduled jobs are paused until re-enabling the job scheduler.
--jmx - Starts JMX management extension for managing and monitoring DAS.
--profile - Runs conductor with attached profiling agent.
--profile-sampling - Runs conductor with cpu profiling (sampling) activated.
--profile-tracing - Runs conductor with cpu profiling (tracing) activated.
--profile-memory - Runs conductor with cpu and memory profiling (sampling) activated.
--help - Opens the help dialog.
Examples: