SFTP

Secure File Transfer Protocol (SFTP) is a network protocol that provides file access, file transfer, and file management over any reliable data stream.

Custom Property Recommendation

TIP

Ingesting many files via an SFTP connection opens and closes an SSH connection to the server for each file. Depending on the SSH server settings, these connection attempts might be refused when many tasks run in parallel. To avoid this, we recommend setting the following custom property:

'das.cluster.hadoop.fs.cache.scheme.whitelist=scp,sftp'

This allows using the Hadoop FileServer cache on the cluster. With the FileServer cache enabled for the SCP and SFTP schemes, SSH connections are reused instead of being opened and closed for each file.
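The effect of caching connections per scheme can be illustrated with a small Python sketch. This is not Datameer or Hadoop code: FakeSshConnection, ConnectionCache, and ingest are hypothetical names, the host sftp.example.com is a placeholder, and the cache key (scheme, host, port) is an assumption made for the illustration. It only shows why a cache for whitelisted schemes reduces the number of connections opened.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class FakeSshConnection:
    """Stand-in for a real SSH session; it only counts transfers."""
    host: str
    port: int
    transfers: int = 0

    def fetch(self, path: str) -> str:
        self.transfers += 1
        return f"contents of {path}"


class ConnectionCache:
    """Hypothetical cache that reuses connections for whitelisted schemes."""

    def __init__(self, cached_schemes: Iterable[str]):
        self.cached_schemes = set(cached_schemes)
        self.opened = 0  # total number of connections opened
        self._cache: Dict[Tuple[str, str, int], FakeSshConnection] = {}

    def get(self, scheme: str, host: str, port: int) -> FakeSshConnection:
        key = (scheme, host, port)
        if scheme in self.cached_schemes and key in self._cache:
            return self._cache[key]  # reuse the existing connection
        conn = FakeSshConnection(host, port)
        self.opened += 1
        if scheme in self.cached_schemes:
            self._cache[key] = conn
        return conn


def ingest(cache: ConnectionCache, scheme: str, host: str, paths) -> int:
    """Fetch every path and return how many connections were opened."""
    before = cache.opened
    for path in paths:
        cache.get(scheme, host, 22).fetch(path)
    return cache.opened - before


files = [f"/data/part-{i}.csv" for i in range(5)]
without_cache = ingest(ConnectionCache([]), "sftp", "sftp.example.com", files)
with_cache = ingest(ConnectionCache(["scp", "sftp"]), "sftp", "sftp.example.com", files)
print(without_cache, with_cache)  # prints: 5 1
```

With an empty whitelist every file opens its own connection; with scp and sftp whitelisted, the five files share one.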

Configuring SFTP as a Connection

In order to import and export with SFTP, you must first create a connection.

  1. Click the + (plus) button and select Connection, or right-click in the browser and select Create new > Connection.
  2. From the drop-down list, select SFTP as the connection type.
  3. Enter the SFTP host name(s), the port number, authentication credentials, SSH key (if needed), and the root path prefix.
    Select whether the connection is to be used for import, export, or both.
  4. If required, add a description and click Save.
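Before entering these values in the dialog, it can help to sanity-check them. The following Python sketch is only an illustration: SftpSettings, its field names, and the checks it performs are hypothetical and are not part of Datameer; the host, user, and key path are placeholders. It collects the fields from step 3, reports obvious omissions, and builds the sftp:// URL the settings describe.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SftpSettings:
    """Hypothetical container for the values entered in step 3."""
    host: str
    port: int = 22
    username: str = ""
    password: Optional[str] = None
    ssh_key_path: Optional[str] = None
    root_path_prefix: str = "/"

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means nothing obvious is wrong."""
        issues = []
        if not self.host.strip():
            issues.append("host name is empty")
        if not 1 <= self.port <= 65535:
            issues.append("port must be between 1 and 65535")
        if not self.username:
            issues.append("user name is empty")
        if not (self.password or self.ssh_key_path):
            issues.append("provide a password or an SSH key")
        return issues

    def url(self) -> str:
        """Build an sftp:// URL with a normalized, absolute root path prefix."""
        prefix = "/" + self.root_path_prefix.strip("/")
        return f"sftp://{self.username}@{self.host}:{self.port}{prefix}"


good = SftpSettings("sftp.example.com", 22, "etl_user",
                    ssh_key_path="~/.ssh/id_rsa", root_path_prefix="data/in")
bad = SftpSettings("", 70000, "")

print(good.problems())  # prints: []
print(good.url())       # prints: sftp://etl_user@sftp.example.com:22/data/in
print(bad.problems())
```

The second object lists four issues (host, port, user name, and missing credentials), which mirrors the fields the connection dialog asks for.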

Importing Data with an SFTP Connector

After configuring an SFTP connection you can set up an import job to access that connection.

  1. Click the + (plus) button and select Import Job, or right-click in the browser and select Create new > Import job.
  2. Click Select Connection and choose the name of your SFTP connection (in this example, SFTP_Connector).
    Choose what type of file to import and then click Next.
  3. Enter the path to the file or folder you want to access and set the delimiter.

    As of Datameer X 7.4

    Datameer X has added the Remote Data Browser feature. This file browser gives you a visual interface for selecting the file to import from your SFTP connection.

    The filter box at the top of the Remote Data Browser can be used to find folders/files within the current directory.

  4. A preview of the imported data is displayed. Review the schema.
  5. Review the schedule, data retention, and advanced properties for the job.
  6. Add a description, click Save, and name the file.
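The delimiter set in step 3 determines what the preview and schema in step 4 look like. The Python sketch below shows the idea on a small in-memory sample. It is an assumption-laden illustration rather than Datameer's preview logic: the functions infer_type and preview, the sample data, and the three type names are all hypothetical.

```python
import csv
import io


def infer_type(values):
    """Return 'integer', 'float', or 'string' for a column of text values."""
    for name, cast in (("integer", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "string"


def preview(text, delimiter=",", rows=5):
    """Parse the header and the first rows, then guess a type per column."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    sample = [row for _, row in zip(range(rows), reader)]
    columns = list(zip(*sample))
    schema = {name: infer_type(col) for name, col in zip(header, columns)}
    return schema, sample


sample_text = "id;name;amount\n1;alice;10.5\n2;bob;7\n"

schema, rows = preview(sample_text, delimiter=";")
print(schema)  # prints: {'id': 'integer', 'name': 'string', 'amount': 'float'}

wrong_schema, _ = preview(sample_text, delimiter=",")
print(wrong_schema)  # prints: {'id;name;amount': 'string'}
```

With the wrong delimiter the whole line collapses into a single string column, which is the kind of problem the preview in step 4 makes visible before the job is saved.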