Set up Datameer on Google Dataproc
INFO
Datameer X can use Google's Dataproc as its execution engine if it is deployed in Google Cloud Platform against a Google Cloud Platform 1.4 cluster.
INFO
Find Google's documentation here.
Configure Datameer X to Use the Google Dataproc Cluster
INFO
We recommend installing Datameer X on a separate machine. This machine can be on-premises or in the Google cloud. For best performance, use a machine in the Google cloud within the same network as the cluster.
Datameer X and the Google Dataproc cluster must be in the same VPC and subnet.
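A quick sanity check for the subnet requirement can be sketched in Python. This is a minimal sketch, not a Datameer or Google tool: the function name `in_same_subnet`, the IP addresses, and the CIDR range are all hypothetical placeholders. It only confirms that both internal addresses fall inside one subnet range. It cannot prove that both machines are in the same VPC, because separate VPCs may use overlapping ranges, so confirm the VPC itself in the Google Cloud console.

```python
import ipaddress


def in_same_subnet(ip_a: str, ip_b: str, subnet_cidr: str) -> bool:
    """Return True if both addresses fall inside the given subnet range."""
    network = ipaddress.ip_network(subnet_cidr)
    return (
        ipaddress.ip_address(ip_a) in network
        and ipaddress.ip_address(ip_b) in network
    )


# Hypothetical values for illustration only; replace with your own.
DATAMEER_IP = "10.128.0.5"          # internal IP of the Datameer X machine
CLUSTER_MASTER_IP = "10.128.0.12"   # internal IP of the Dataproc master node
SUBNET_CIDR = "10.128.0.0/20"       # primary range of the shared subnet

print(in_same_subnet(DATAMEER_IP, CLUSTER_MASTER_IP, SUBNET_CIDR))
```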
Requirements for the configuration of the Google Dataproc cluster are:
To configure Datameer X:
- Open the "Admin" tab and select "Hadoop Cluster".
INFO: You have to be logged in as an administrator.
 - Select "Edit" at the bottom. The settings page opens.
 - Select "Google Cloud DataProc" from the 'Mode' drop-down.
INFO: For Kerberos security select "Google Cloud Dataproc (secured)".
 - Enter "gs://<bucket-name>" in the field 'Name Node'.
 - Enter the path for the Datameer X 'Private Folder'.
INFO: If you enter a path, Datameer X stores the private information in Hadoop's file system.
INFO: If you leave the field empty, Datameer X stores the information in the user's home folder.
 - If needed, mark the checkbox to enable impersonation.
 - If needed, mark the checkbox to enable HDFS transparent encryption.
INFO: If a Google Cloud Storage bucket is used as the private folder, we do not recommend marking the checkbox.
INFO: If transparent encryption is required, the Google Cloud Storage bucket must be set up with 'encryption managed by Google'.
 - Mark the checkbox 'Use Google IAM Role' if Datameer X is deployed on an IAM-enabled Google Cloud instance.
INFO: When the checkbox is not marked, a Google Service Account JSON key is used instead. Choose the required file by clicking on "Choose File".
 - Enter "<clustername>-m:8032" in the field 'YARN Resource Manager Address'.
 - Enter "http://<clustername>-m:8088" in the field 'YARN Resource Manager Webapp Address'.
 - Enter "<clustername>-m:8030" in the field 'YARN Resource Manager Scheduler Address'.
 - If needed, change the YARN application classpath to match your yarn-site.xml, or run 'yarn classpath' on the command line to look it up.
 - Set the maximum number of jobs that may run simultaneously.
 - View the prefilled properties in section 'Hadoop Properties'.
INFO: If you want to connect to a secured cluster (Kerberos), you need to uncomment the last line with 'hadoop.rpc.protection=privacy' in the field 'Hadoop Specific Properties'.
 - If needed, change the entries in the section 'Logging Settings'.
 - Confirm the settings with "Save". The settings are saved. The 'File Browser' opens. Configuring Datameer X to use the Google Dataproc cluster is finished.
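The address fields in the steps above all follow from two values: the bucket name and the cluster name, with the master node addressed as "<clustername>-m". The following Python sketch derives the field values so they can be copied into the settings page. It is an illustration only: the helper `dataproc_cluster_settings` and the example names "analytics" and "datameer-data" are hypothetical, and the script does not talk to Datameer X or Google Cloud.

```python
def dataproc_cluster_settings(cluster_name: str, bucket_name: str) -> dict:
    """Build the values for the cluster address fields described above.

    Hypothetical helper for illustration; it only formats strings.
    """
    if not cluster_name or not bucket_name:
        raise ValueError("cluster_name and bucket_name must not be empty")

    # The Dataproc master node is addressed as "<clustername>-m".
    master = f"{cluster_name}-m"
    return {
        "Name Node": f"gs://{bucket_name}",
        "YARN Resource Manager Address": f"{master}:8032",
        "YARN Resource Manager Webapp Address": f"http://{master}:8088",
        "YARN Resource Manager Scheduler Address": f"{master}:8030",
    }


# Example names are assumptions; replace them with your own.
settings = dataproc_cluster_settings("analytics", "datameer-data")
for field, value in settings.items():
    print(f"{field}: {value}")
```

Only the webapp address carries the "http://" scheme. The Resource Manager and scheduler addresses are plain host:port pairs.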