Secure File Transfer Protocol (SFTP) is a network protocol that provides file access, file transfer, and file management over any reliable data stream.
Custom Property Recommendation
Ingesting many files over an SFTP connection opens and closes an SSH connection to the server for each file. Depending on the SSH server settings, these connection attempts might be refused when many tasks run in parallel. To avoid this, we recommend setting the following custom property:
'das.cluster.hadoop.fs.cache.scheme.whitelist=scp,sftp'
This enables the Hadoop FileServer cache on the cluster for the SCP and SFTP schemes, so SSH connections are reused instead of being opened and closed for every file.
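Before running a large ingest, it can help to confirm that the property is actually present in the custom properties text. The following is a minimal sketch, assuming the custom properties are available as plain `key=value` lines; the helper names `parse_properties`, `cached_schemes`, and `ssh_cache_enabled` are hypothetical and not part of Datameer.

```python
# Hypothetical helper: checks a block of key=value custom properties for the
# FileServer cache whitelist recommended above. Not a Datameer API.

WHITELIST_KEY = "das.cluster.hadoop.fs.cache.scheme.whitelist"


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def cached_schemes(text):
    """Return the set of schemes listed in the whitelist property (may be empty)."""
    value = parse_properties(text).get(WHITELIST_KEY, "")
    return {s.strip().lower() for s in value.split(",") if s.strip()}


def ssh_cache_enabled(text):
    """True only if both scp and sftp appear in the whitelist."""
    return {"scp", "sftp"} <= cached_schemes(text)


if __name__ == "__main__":
    sample = "das.cluster.hadoop.fs.cache.scheme.whitelist=scp,sftp"
    print(ssh_cache_enabled(sample))
```

The check is deliberately strict: a whitelist that lists only one of the two schemes is reported as not enabled.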
Prerequisites
Set up the Authentication Mechanism
Set up the required authentication mode for SSH/SFTP within the 'ssh_config' and 'sshd_config' in your environment.
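To see which authentication modes a server currently allows, the server-side configuration can be inspected. The sketch below assumes the server runs OpenSSH and that its configuration file is `sshd_config`; it reads the global `PasswordAuthentication` and `PubkeyAuthentication` directives from the file's text. The function name `auth_settings` is hypothetical, and the parser is simplified: it ignores `Include` files and stops at the first `Match` block.

```python
# Simplified reader for two OpenSSH sshd_config directives.
# Assumption: the server is OpenSSH; Include files and Match blocks are not evaluated.

WANTED = {"passwordauthentication", "pubkeyauthentication"}


def auth_settings(config_text):
    """Return the first global value found for each wanted directive.

    Keywords are matched case-insensitively, and the first occurrence
    takes effect, so later duplicates are ignored.
    Directives that are not set explicitly are absent from the result.
    """
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional blocks start here; only global settings are read
        if keyword in WANTED and keyword not in found and len(parts) == 2:
            found[keyword] = parts[1].strip().lower()
    return found


if __name__ == "__main__":
    sample = "PasswordAuthentication no\nPubkeyAuthentication yes\n"
    print(auth_settings(sample))
```

If password authentication is disabled on the server, the connection needs an SSH key rather than a password.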
...
- Click the + (plus) button and select Connection or right-click in the browser and select Create new > Connection.
- From the drop-down list, select SFTP as the connection type.
- Enter the SFTP host name(s), the port number, authentication credentials, SSH key (if needed), and the root path prefix.
- Select whether the connection is to be used for import, export, or both.
- If required, add a description and click Save.
...
- Click the + (plus) button and select Import Job or right-click in the browser and select Create new > Import job.
- Click Select Connection and choose the name of your SFTP connection (here, SFTP_Connector).
- Choose what type of file to import and then click Next.
- Enter the path to the file or folder you want to access and set the delimiter.
Note As of Datameer X 7.4
Datameer X has added the Remote Data Browser feature. This file browser gives you a visual interface to select the file to import from your SFTP connection.
The filter box at the top of the Remote Data Browser can be used to find folders/files within the current directory.
- A preview of the imported data is displayed. Review the schema.
- Review the schedule, data retention, and advanced properties for the job.
- Add a description, click Save, and name the file.
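When the preview shows every record in a single column, the delimiter set in the import job is usually the cause. As a rough local check before configuring the job, a sample of the file can be parsed with a candidate delimiter. This is a minimal sketch using Python's standard `csv` module; the function name `preview_rows` and the sample data are hypothetical, and it does not reproduce how Datameer itself parses files.

```python
import csv
import io


def preview_rows(text, delimiter=",", limit=5):
    """Parse delimited text and return at most `limit` rows as lists of fields."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        rows.append(row)
        if len(rows) >= limit:
            break
    return rows


if __name__ == "__main__":
    # Hypothetical sample: a semicolon-delimited file with a header line.
    sample = "id;name\n1;Ada\n2;Linus\n"
    for candidate in (",", ";"):
        rows = preview_rows(sample, delimiter=candidate)
        print(repr(candidate), "columns in first row:", len(rows[0]))
```

A candidate delimiter that yields one column per row is most likely wrong for that file.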