INFO: Datameer X can use Google's Dataproc as its execution engine when it is deployed in Google Cloud Platform and runs against a Google Cloud Platform 1.4 cluster.

INFO: Find Google's documentation here.

Configure Datameer X to Use the Google Dataproc Cluster

INFO: We recommend installing Datameer X on a separate machine. This machine can be on-premises or in Google Cloud. For better performance, use a machine in Google Cloud in the same network as the Dataproc cluster.

Datameer X and the Google Dataproc cluster must be in the same VPC and subnet.

Requirements for the configuration of the Google Dataproc cluster are:

To configure Datameer X:

  1. Open the "Admin" tab and select "Hadoop Cluster".
    INFO: You have to be logged in as an administrator.
  2. Select "Edit" at the bottom. The settings page opens.
  3. Select "Google Cloud DataProc" from the 'Mode' drop-down list.
    INFO: For a Kerberos-secured cluster, select "Google Cloud Dataproc (secured)".
  4. Enter "gs://<bucket-name>" in the 'Name Node' field.
  5. Enter the path for the Datameer X 'Private Folder'.
    INFO: If you enter a path, Datameer X stores its private information at that location in the Hadoop file system.
    INFO: If you leave the field empty, Datameer X stores the information in the user's home folder.
  6. If needed, mark the checkbox to enable impersonation.
  7. If needed, mark the checkbox to enable HDFS transparent encryption.
    INFO: If a Google Cloud Storage bucket is used as the private folder, we do not recommend marking this checkbox.
    INFO: If transparent encryption is required, the Google Cloud Storage bucket must be set up with 'encryption managed by Google'.
  8. Mark the checkbox 'Use Google IAM Role' if Datameer X is deployed on an IAM-enabled Google Cloud instance.
    INFO: If the checkbox is not marked, Datameer X uses a Google service account JSON key instead. Click "Choose File" to select the key file.
  9. Enter "<clustername>-m:8032" in the 'YARN Resource Manager Address' field.
  10. Enter "http://<clustername>-m:8088" in the 'YARN Resource Manager Webapp Address' field.
  11. Enter "<clustername>-m:8030" in the 'YARN Resource Manager Scheduler Address' field.
  12. If needed, change the YARN application classpath to match your yarn-site.xml, or run 'yarn classpath' on the command line to get the current value.
  13. Set the maximum number of jobs that may run simultaneously.
  14. Review the prefilled properties in the 'Hadoop Properties' section.
    INFO: To connect to a Kerberos-secured cluster, uncomment the last line, 'hadoop.rpc.protection=privacy', in the 'Hadoop Specific Properties' field.
  15. If needed, change the entries in the section 'Logging Settings'.
  16. Click "Save" to confirm the settings. The settings are saved and the 'File Browser' opens. Datameer X is now configured to use the Google Dataproc cluster.
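
The values entered in steps 4 and 9 to 11 follow a fixed pattern built from the bucket name and the cluster name. The following Python sketch derives them, assuming a standard Dataproc cluster whose master node is named '<clustername>-m' and the default YARN ports listed in those steps. The function dataproc_field_values is hypothetical (not part of Datameer X), and its bucket-name check is only a simplified version of the Cloud Storage naming rules.

```python
import re

# Simplified Cloud Storage bucket-name check: 3-63 characters of lowercase
# letters, digits, '.', '_' or '-', starting and ending with a letter or
# digit. Longer dotted names and other special rules are not covered.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def dataproc_field_values(cluster_name: str, bucket_name: str) -> dict:
    """Return the field values for steps 4 and 9 to 11 (hypothetical helper).

    Assumes the Dataproc master host is '<clustername>-m' and that YARN
    uses its default ports 8032, 8088 and 8030.
    """
    if not BUCKET_RE.fullmatch(bucket_name):
        raise ValueError(f"not a valid bucket name: {bucket_name!r}")
    master = f"{cluster_name}-m"
    return {
        "Name Node": f"gs://{bucket_name}",
        "YARN Resource Manager Address": f"{master}:8032",
        "YARN Resource Manager Webapp Address": f"http://{master}:8088",
        "YARN Resource Manager Scheduler Address": f"{master}:8030",
    }


if __name__ == "__main__":
    # Example names only; replace them with your own cluster and bucket.
    for field, value in dataproc_field_values("my-cluster", "my-bucket").items():
        print(f"{field}: {value}")
```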
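
When 'Use Google IAM Role' is not marked in step 8, Datameer X needs a Google service account JSON key. Before uploading a key with "Choose File", you could sanity-check the file with a sketch like the following. It only looks for standard fields of a Google service account key file (type, project_id, private_key, client_email); the function check_service_account_key is hypothetical and does not verify the key against Google Cloud.

```python
import json

# Standard fields found in a Google service account JSON key file.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_key(text: str) -> str:
    """Return the service account e-mail if the text looks like a key file.

    Hypothetical pre-upload check. Raises ValueError otherwise
    (json.JSONDecodeError is a subclass of ValueError).
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("key file must contain a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"unexpected key type: {data['type']!r}")
    return data["client_email"]
```

For example, pass it the contents of the downloaded key file, read with open(path, encoding="utf-8").read().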
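
For step 12, the classpath in yarn-site.xml is presumably the yarn.application.classpath property. The sketch below reads it with Python's standard xml.etree.ElementTree module, assuming the usual Hadoop <configuration>/<property>/<name>/<value> layout; the function name yarn_application_classpath is hypothetical. Note that this property separates entries with commas, while the output of 'yarn classpath' uses colons.

```python
import xml.etree.ElementTree as ET


def yarn_application_classpath(yarn_site_xml: str) -> list[str]:
    """Return the entries of yarn.application.classpath, or [] if it is unset.

    Assumes the standard Hadoop configuration layout of <property>
    elements with <name> and <value> children.
    """
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "yarn.application.classpath":
            value = prop.findtext("value", "")
            # Entries are comma-separated and often spread over several lines.
            return [entry.strip() for entry in value.split(",") if entry.strip()]
    return []
```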
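
Step 14 asks you to uncomment the 'hadoop.rpc.protection=privacy' line when connecting to a Kerberos-secured cluster. The sketch below shows that change on the text of the 'Hadoop Specific Properties' field. It assumes one key=value pair per line and '#' as the comment marker; both are assumptions about the field's format, and enable_rpc_privacy is a hypothetical helper, not a Datameer X function.

```python
def enable_rpc_privacy(properties: str) -> str:
    """Uncomment a commented-out hadoop.rpc.protection=privacy line.

    Assumes one key=value pair per line and '#' as the comment marker.
    All other lines are returned unchanged.
    """
    lines = properties.split("\n")
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith("#"):
            continue
        candidate = stripped.lstrip("#").strip()
        # Ignore spaces (for example around '=') when comparing.
        if candidate.replace(" ", "") == "hadoop.rpc.protection=privacy":
            lines[i] = candidate
    return "\n".join(lines)


if __name__ == "__main__":
    field = "dfs.replication=3\n#hadoop.rpc.protection=privacy"
    print(enable_rpc_privacy(field))
```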