Configuring HDFS as a Connection

To import from HDFS, you must first create a connection.

  1. Click the + (plus) button and select Connection, or right-click in the browser and select Create new > Connection.
  2. Open the drop-down list and choose HDFS as the connection type.
  3. Click Next.
  4. Enter the host address and port number of the HDFS server.
  5. Add the root path prefix of the directory from which to import data.
  6. Add any custom properties needed when importing from HDFS.
  7. Add a description and click Save.
  8. Give your HDFS connection a name and click Save.
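
As a rough illustration of how the host, port, and root path prefix entered above relate to an HDFS location, the sketch below combines them into an hdfs:// URI. The helper name build_hdfs_uri, the host namenode.example.com, and port 8020 are assumptions made for the example; this page does not describe how the connector assembles the address internally.

```python
import posixpath


def build_hdfs_uri(host, port=None, root_prefix="/"):
    """Combine connection fields into an hdfs:// URI (hypothetical helper)."""
    if not host:
        raise ValueError("host is required")
    # Make the prefix absolute and drop trailing or duplicate slashes.
    prefix = posixpath.normpath("/" + root_prefix.strip("/"))
    # Omit the port when it is left empty.
    authority = f"{host}:{port}" if port else host
    return f"hdfs://{authority}{prefix}"


# Assumed example values, not taken from this page.
print(build_hdfs_uri("namenode.example.com", 8020, "/data/imports/"))
```

Leaving the port out yields a URI with only the host part, for example hdfs://namenode.example.com/.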

HDFS High Availability

HDFS provides high-availability (HA) capabilities that allow the main metadata server (the NameNode) to be failed over manually to a backup if it fails.

To use HDFS HA (High Availability) with the connector, follow the setup instructions above with these changes:

  1. Enter the logical NameNode name (nnmain in the example below) in the host field, and leave the port field empty.
  2. Add the following custom properties:
dfs.nameservices=nnmain
dfs.ha.namenodes.nnmain=nn0,nn1
dfs.namenode.rpc-address.nnmain.nn0=ip-<host-name>:<port>
dfs.namenode.rpc-address.nnmain.nn1=ip-<host-name>:<port>
dfs.client.failover.proxy.provider.nnmain=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
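
The HA properties above have to agree with one another: every NameNode ID listed under dfs.ha.namenodes.<nameservice> needs its own rpc-address entry, and the nameservice needs a failover proxy provider. The following is a minimal sketch of a consistency check you could run on the key=value lines before pasting them into the custom properties field. The function names and the hosts namenode0.example.com and namenode1.example.com with port 8020 are placeholders assumed for illustration; the check is not part of the connector.

```python
def parse_properties(text):
    """Parse key=value lines into a dict, skipping blank and comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        props[key.strip()] = value.strip()
    return props


def split_list(value):
    """Split a comma-separated value into non-empty, trimmed items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_ha_properties(props):
    """Return a list of problems found in HDFS HA client properties."""
    problems = []
    services = split_list(props.get("dfs.nameservices", ""))
    if not services:
        problems.append("dfs.nameservices is missing or empty")
    for ns in services:
        nn_ids = split_list(props.get(f"dfs.ha.namenodes.{ns}", ""))
        if len(nn_ids) < 2:
            problems.append(f"dfs.ha.namenodes.{ns} should list at least two NameNodes")
        for nn in nn_ids:
            key = f"dfs.namenode.rpc-address.{ns}.{nn}"
            # Each address is expected in host:port form.
            if ":" not in props.get(key, ""):
                problems.append(f"{key} is missing or has no port")
        if not props.get(f"dfs.client.failover.proxy.provider.{ns}"):
            problems.append(f"dfs.client.failover.proxy.provider.{ns} is missing")
    return problems


# Placeholder hosts and port, assumed for this example only.
EXAMPLE = """\
dfs.nameservices=nnmain
dfs.ha.namenodes.nnmain=nn0,nn1
dfs.namenode.rpc-address.nnmain.nn0=namenode0.example.com:8020
dfs.namenode.rpc-address.nnmain.nn1=namenode1.example.com:8020
dfs.client.failover.proxy.provider.nnmain=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
"""

for problem in check_ha_properties(parse_properties(EXAMPLE)):
    print(problem)
```

An empty result means the entries are mutually consistent. It does not confirm that the hosts are reachable or that the cluster is actually configured for HA.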

Importing Data with the HDFS Connector

After configuring an HDFS connection, you can use the wizard to set up one or more import jobs that access the HDFS connection.

  1. Click the + (plus) button and select Import Job, or right-click in the browser and select Create new > Import job.
  2. Click Select Connection and choose the name of your HDFS connection (in this example, HDFS_Connector).
  3. Enter the path to a file or folder on HDFS.
  4. A preview of the imported data is displayed. Review the schema.
  5. Review the schedule, data retention, and advanced properties for the job.
  6. Add a description, click Save, and name the file.