Info

SSH File Transfer Protocol (SFTP), also known as Secure File Transfer Protocol, is a network protocol that provides file access, file transfer, and file management over any reliable data stream.


Custom Property Recommendation

Tip

Ingesting many files over an SFTP connection opens and closes an SSH connection to the server for each file. Depending on the SSH server settings, these connection attempts might be refused when many tasks run in parallel. To avoid this, we recommend setting the following custom property:

'das.cluster.hadoop.fs.cache.scheme.whitelist=scp,sftp'

This enables the Hadoop FileServer cache on the cluster for the SCP and SFTP schemes, so SSH connections are reused instead of being opened and closed for every file.
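The effect of the cache can be pictured with a small simulation. This is a conceptual sketch only, not Datameer or Hadoop code: `FakeSshConnection`, `ConnectionPool`, `count_connections`, and the host `sftp.example.com` are hypothetical names used to show how caching by scheme reduces the number of connections opened.

```python
class FakeSshConnection:
    """Stand-in for an SSH connection; only counts how many were opened."""
    opened = 0

    def __init__(self, scheme, host):
        FakeSshConnection.opened += 1
        self.scheme = scheme
        self.host = host


class ConnectionPool:
    """Hypothetical pool that reuses connections for whitelisted schemes."""

    def __init__(self, cached_schemes):
        self.cached_schemes = set(cached_schemes)
        self._cache = {}

    def get(self, scheme, host):
        # Schemes outside the whitelist get a fresh connection every time.
        if scheme not in self.cached_schemes:
            return FakeSshConnection(scheme, host)
        key = (scheme, host)
        if key not in self._cache:
            self._cache[key] = FakeSshConnection(scheme, host)
        return self._cache[key]


def count_connections(cached_schemes, files):
    """Return how many connections are opened while reading the given files."""
    FakeSshConnection.opened = 0
    pool = ConnectionPool(cached_schemes)
    for _ in files:
        pool.get("sftp", "sftp.example.com")
    return FakeSshConnection.opened


files = ["a.csv", "b.csv", "c.csv"]
without_cache = count_connections([], files)
# The whitelist value has the same comma-separated form as the property above.
with_cache = count_connections("scp,sftp".split(","), files)
print(without_cache, with_cache)
```

Without the whitelist, the simulation opens one connection per file; with `scp,sftp` whitelisted, the three files share one connection.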

Prerequisites

Set up the Authentication Mechanism 

Info

/wiki/spaces/DASSB110/pages/20221232642 Set up the required authentication mode for SSH/SFTP in the 'ssh_config' and 'sshd_config' files in your environment.
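As a rough aid for checking the server side, the sketch below reads the text of an OpenSSH server configuration and reports whether password and public key authentication are switched on. It assumes the server runs OpenSSH and uses the standard `PasswordAuthentication` and `PubkeyAuthentication` keywords; the function name `enabled_auth_modes` is hypothetical, and the parsing is simplified (it ignores `Match` blocks and `Include` directives).

```python
# OpenSSH treats both keywords as "yes" when they are not set explicitly.
DEFAULTS = {"passwordauthentication": "yes", "pubkeyauthentication": "yes"}


def enabled_auth_modes(config_text):
    """Return {keyword: bool} for the two authentication keywords above."""
    seen = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword = parts[0].lower()  # keywords are case-insensitive
        value = parts[1].strip().lower()
        if keyword == "match":
            break  # conditional blocks are out of scope for this sketch
        # For each keyword, the first value found is the one that applies.
        seen.setdefault(keyword, value)
    return {key: seen.get(key, default) == "yes" for key, default in DEFAULTS.items()}


sample = """
# Example server settings (illustrative only)
PasswordAuthentication no
PubkeyAuthentication yes
"""
modes = enabled_auth_modes(sample)
print(modes)
```

Run against the sample text, the result marks password authentication as disabled and public key authentication as enabled, which would call for an SSH key in the connection settings.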

...

  1. Click the + (plus) button and select Connection or right-click in the browser and select Create new > Connection.
  2. From the drop-down list, select SFTP as the connection type.
  3. Enter the SFTP host name(s), the port number, authentication credentials, SSH key (if needed), and the root path prefix.
    Select whether the connection is to be used for import, export, or both.
  4. If required, add a description and click Save.
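Before filling in the form, it can help to collect and sanity-check the values the steps above ask for. The following is a minimal sketch under stated assumptions: `SftpConnectionSettings` and its field names are hypothetical and are not part of any Datameer API, and the host `sftp.example.com` is a placeholder. The only external fact used is that SSH, and therefore SFTP, listens on port 22 by default.

```python
from dataclasses import dataclass

VALID_USAGE = {"import", "export", "both"}


@dataclass
class SftpConnectionSettings:
    """Hypothetical checklist of the values entered in the connection form."""
    host: str
    port: int = 22            # default SSH port
    username: str = ""
    root_path_prefix: str = "/"
    usage: str = "both"       # import, export, or both

    def problems(self):
        """Return a list of human-readable issues; empty if none are found."""
        issues = []
        if not self.host.strip():
            issues.append("host name is empty")
        if not 1 <= self.port <= 65535:
            issues.append("port must be between 1 and 65535")
        if self.usage not in VALID_USAGE:
            issues.append("usage must be import, export, or both")
        return issues


good = SftpConnectionSettings(host="sftp.example.com", username="etl", usage="import")
bad = SftpConnectionSettings(host=" ", port=70000, usage="sync")
print(good.problems())
print(bad.problems())
```

The first settings object produces no issues; the second reports an empty host, an out-of-range port, and an unknown usage value.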

...

  1. Click the + (plus) button and select Import Job or right-click in the browser and select Create new > Import job.
  2. Click Select Connection and choose the name of your SFTP connection (in this example, SFTP_Connector).
    Choose what type of file to import and then click Next.
  3. Enter the path to the file or folder you want to access and set the delimiter.

    Note

    As of Datameer X 7.4

    Datameer X has added the Remote Data Browser feature. This file browser gives you a visual interface for selecting the file to import from your SFTP connection.

    The filter box at the top of the Remote Data Browser can be used to find folders/files within the current directory.

  4. A preview of the imported data is displayed. Review the schema.
  5. Review the schedule, data retention, and advanced properties for the job.
  6. Add a description, click Save, and name the file.
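To see why the delimiter chosen in step 3 matters for the preview and schema in step 4, the sketch below parses a small delimited sample and guesses a type for each column. It is an illustration only and does not reflect how Datameer X detects schemas: the functions `preview`, `guess_type`, and `guess_schema`, the type labels, and the sample data are all hypothetical, and the code assumes a header row and rows of equal length.

```python
import csv
import io


def preview(text, delimiter=",", max_rows=5):
    """Return (header, rows) for the first max_rows data rows of the text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for _, row in zip(range(max_rows + 1), reader)]
    if not rows:
        return [], []
    return rows[0], rows[1:]


def guess_type(values):
    """Guess a column type from sample values: integer, float, or string."""
    if not values:
        return "string"
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue  # try the next, more permissive type
    return "string"


def guess_schema(header, rows):
    """Map each column name to a guessed type based on the preview rows."""
    return {
        name: guess_type([row[i] for row in rows])
        for i, name in enumerate(header)
    }


sample = "id;name;price\n1;apple;0.5\n2;pear;1.25\n"
header, rows = preview(sample, delimiter=";")
schema = guess_schema(header, rows)
print(header)
print(schema)
```

With the correct delimiter (`;`), the sample splits into three columns. Parsing the same text with the default comma would leave each line as a single string column, which is the kind of mismatch the preview step is meant to catch.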