Configuring HDFS as a Connection
To import from HDFS, you must first create a connection.
- Click the + (plus) button and select Connection or right-click in the browser and select Create new > Connection.
- Click the drop-down list and choose HDFS as the connection type.
- Click Next.
- Enter the host address for the HDFS server and the port number.
- Add the root path prefix of the directory from which to import data.
- Add any custom properties needed for importing from HDFS.
- Add a description and click Save.
- Give your HDFS connection a name and click Save.
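The host, port, and root path prefix entered above correspond to the parts of a standard `hdfs://host:port/path` URI. The Java sketch below assembles such a URI to show how the values relate. It assumes the connection combines the fields this way, which this page does not state. The host `namenode.example.com`, port `8020`, prefix `/data/incoming`, and the class and method names are hypothetical examples.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative sketch only: ASSUMES the connection fields map onto an hdfs:// URI.
class HdfsConnectionUri {

    // host: NameNode host, or a logical nameservice name in an HA setup.
    // port: RPC port, or null/blank when the port field is left empty.
    // rootPrefix: root path prefix of the directory to import from.
    static URI build(String host, String port, String rootPrefix) {
        // -1 tells java.net.URI to omit the port entirely.
        int portNumber = (port == null || port.isBlank()) ? -1 : Integer.parseInt(port.trim());
        String path = (rootPrefix == null || rootPrefix.isBlank()) ? "/" : rootPrefix.trim();
        if (!path.startsWith("/")) {
            path = "/" + path; // an absolute path is required when a scheme is present
        }
        try {
            // URI(scheme, userInfo, host, port, path, query, fragment)
            return new URI("hdfs", null, host, portNumber, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid connection fields", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("namenode.example.com", "8020", "/data/incoming"));
        System.out.println(build("nnmain", "", "/data/incoming"));
    }
}
```

When the port field is left empty, as in the high-availability setup, the sketch omits the port and the URI names only the logical host, for example `hdfs://nnmain/data/incoming`.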
HDFS High Availability
HDFS supports high availability (HA), which allows the main metadata server (the NameNode) to be failed over manually to a backup NameNode if it fails.
To enable the HDFS HA feature on the connector, follow the connection setup instructions above with these changes:
- Enter a logical NameNode name (the nameservice ID, such as nnmain in the properties below) as the host name and leave the port field empty.
- Add the custom properties:
```
dfs.nameservices=nnmain
dfs.ha.namenodes.nnmain=nn0,nn1
dfs.namenode.rpc-address.nnmain.nn0=ip-<host-name>:<port>
dfs.namenode.rpc-address.nnmain.nn1=ip-<host-name>:<port>
dfs.client.failover.proxy.provider.nnmain=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
```
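The HA properties reference each other by name. The nameservice listed in `dfs.nameservices` prefixes the `dfs.ha.namenodes` key. Each NameNode ID listed there needs its own `dfs.namenode.rpc-address.<nameservice>.<id>` entry, and the nameservice needs a `dfs.client.failover.proxy.provider` entry. The Java sketch below checks those links and returns the RPC addresses in the order the IDs are listed. It is a hypothetical pre-flight check, not part of Datameer or Hadoop. The host names `nn0.example.com` and `nn1.example.com` and port `8020` are placeholder values standing in for `ip-<host-name>:<port>`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical consistency check for the HA custom properties (not a Datameer/Hadoop API).
class HaPropertiesCheck {

    // Parses key=value lines into java.util.Properties.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected when reading from a String
        }
        return props;
    }

    // Returns the RPC addresses for a nameservice in the order its NameNode IDs are listed.
    // Throws IllegalArgumentException if any key the HA setup depends on is missing.
    static List<String> rpcAddresses(Properties props, String nameservice) {
        boolean declared = false;
        for (String ns : props.getProperty("dfs.nameservices", "").split(",")) {
            if (ns.trim().equals(nameservice)) {
                declared = true;
            }
        }
        if (!declared) {
            throw new IllegalArgumentException("dfs.nameservices does not list " + nameservice);
        }

        String idsKey = "dfs.ha.namenodes." + nameservice;
        String ids = props.getProperty(idsKey);
        if (ids == null || ids.isBlank()) {
            throw new IllegalArgumentException("missing " + idsKey);
        }

        List<String> addresses = new ArrayList<>();
        for (String id : ids.split(",")) {
            String key = "dfs.namenode.rpc-address." + nameservice + "." + id.trim();
            String address = props.getProperty(key);
            if (address == null || address.isBlank()) {
                throw new IllegalArgumentException("missing " + key);
            }
            addresses.add(address.trim());
        }

        String providerKey = "dfs.client.failover.proxy.provider." + nameservice;
        if (props.getProperty(providerKey) == null) {
            throw new IllegalArgumentException("missing " + providerKey);
        }
        return addresses;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "dfs.nameservices=nnmain",
                "dfs.ha.namenodes.nnmain=nn0,nn1",
                "dfs.namenode.rpc-address.nnmain.nn0=nn0.example.com:8020",
                "dfs.namenode.rpc-address.nnmain.nn1=nn1.example.com:8020",
                "dfs.client.failover.proxy.provider.nnmain=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        System.out.println(rpcAddresses(parse(text), "nnmain"));
    }
}
```

Running `main` prints `[nn0.example.com:8020, nn1.example.com:8020]`. Removing any one of the RPC address or provider lines makes the check fail with the name of the missing key.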
Importing Data with the HDFS Connector
After configuring an HDFS connection, you can use the wizard to set up one or more import jobs that access it.
- Click the + (plus) button and select Import Job or right-click in the browser and select Create new > Import job.
- Click Select Connection and choose the name of your HDFS connection (in this example, HDFS_Connector).
- Enter the path to a file or folder on HDFS.
Note: As of Datameer 7.4, the Remote Data Browser gives you a visual interface for selecting the file on HDFS that you want to import.
Use the filter box at the top of the Remote Data Browser to find folders and files within the current directory.
- A preview of the imported data is displayed. Review the schema.
- Review the schedule, data retention, and advanced properties for the job.
- Add a description, click Save, and name the file.