S3


Prerequisites

INFO

Minimum access level required for importing data from an S3 bucket to Datameer.

The account or IAM role used in the S3 connection must be allowed to perform the following actions:

  • GetObject

  • GetObjectAttributes

  • ListBucket (should apply to the entire bucket, but can be restricted to one or more specific directories via the Condition key: "Condition": { "StringLike": { "s3:prefix": "<path>/*" } })

For an edge-case suggestion, see the community article "Bucket policy for S3 Import in multi-tenant Datameer instance".
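As an illustration of the permissions listed above, the following Python sketch assembles a minimal IAM policy document. The function name build_import_policy, the bucket name, and the prefix are hypothetical placeholders, and the split into two statements is one possible layout rather than a policy prescribed by Datameer.

```python
import json


def build_import_policy(bucket, prefix=None):
    """Build a minimal read-only policy document for one bucket (illustrative only)."""
    # ListBucket is a bucket-level action, so its resource is the bucket ARN.
    list_statement = {
        "Sid": "ListBucket",
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        "Resource": f"arn:aws:s3:::{bucket}",
    }
    if prefix:
        # Optional restriction of listing to one directory via the s3:prefix key.
        clean = prefix.strip("/")
        list_statement["Condition"] = {"StringLike": {"s3:prefix": f"{clean}/*"}}

    # GetObject and GetObjectAttributes are object-level actions.
    # Assumption: all objects in the bucket are readable; narrow this if needed.
    read_statement = {
        "Sid": "ReadObjects",
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:GetObjectAttributes"],
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }
    return {"Version": "2012-10-17", "Statement": [list_statement, read_statement]}


# Hypothetical bucket and directory names.
print(json.dumps(build_import_policy("example-bucket", "tenant-a/imports"), indent=2))
```

Calling build_import_policy without a prefix omits the Condition block, which corresponds to applying ListBucket to the entire bucket.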

Usable Custom Properties

Disabling the IAM Role Authentication Option

INFO

Disabling the IAM role mode allows stricter access control to S3 buckets by requiring the IAM with assumed role mode to be used instead. In the IAM with assumed role mode, data access happens under an assumed AWS IAM role explicitly defined in the S3 connection.

To disable the IAM role option, enter the following custom property on the Admin page within the 'Custom Properties' settings:

das.s3.iam.without.assumed.role.enabled=true

Configuring S3 as a Connection

To import from or export to S3, you must first create a connection.

  1. Click the + (plus) button and select Connection or right-click in the browser and select Create new > Connection.
  2. From the drop-down list, select S3 as the connection type. Click Next.
  3. Add your S3 bucket name and select the authentication mode:
    'Access Key and Secret Code pair', 'IAM role', or 'IAM assumed role'. Datameer X supports import from and export to encrypted S3 buckets.
    INFO: Selecting 'IAM role' uses the IAM role associated with the EC2 instance or the EMR cluster. Selecting 'IAM assumed role' uses the IAM role associated with the EC2 instance or the EMR cluster to assume a specific role.

    Add the root path prefix, if necessary.

    Indicate if the connection should be used for import, export, or both.

    For encryption support, select between Amazon Web Services, AES256, and KMS.

  4. Click Next.
  5. If required, add a description and click Save.

Importing Data with an S3 Connector

This connector cannot import from S3 buckets unless it has permission to read object metadata via the getObjectMetadata() method.
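To check the metadata requirement before setting up an import job, a quick probe outside Datameer can help. In the AWS SDK for Java, getObjectMetadata() issues a HEAD request for the object; the sketch below uses the equivalent head_object call of a boto3 S3 client. The helper name can_read_metadata is hypothetical, and the client is passed in as a parameter on the assumption that it behaves like a boto3 S3 client.

```python
def can_read_metadata(s3_client, bucket, key):
    """Return (ok, error_code) for a metadata (HEAD) request on one object.

    Assumption: `s3_client` behaves like a boto3 S3 client.
    """
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
    except Exception as exc:
        # botocore's ClientError carries the service reply in `.response`.
        # Anything without it is not an S3 service error, so re-raise it.
        response = getattr(exc, "response", None)
        if not isinstance(response, dict):
            raise
        return False, response.get("Error", {}).get("Code")
    return True, None
```

With boto3 installed and credentials configured, the helper could be called as can_read_metadata(boto3.client("s3"), "example-bucket", "path/to/file.csv"), where the bucket and key are placeholders. A result of (False, "403") would suggest the role lacks permission to read the object's metadata.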

After configuring a connection to S3, you can set up an import job to access the connection.

  1. Click the + (plus) button and select Import Job or right-click in the browser and select Create new > Import job.
  2. Click Select Connection and choose the name of your S3 connection (in this example, S3_Connection), then click Next.
  3. Enter the file or folder path and the delimiter.
    Click Next.

    As of Datameer X 7.4

    Datameer X has added the Remote Data Browser feature. This file browser gives you a visual interface for selecting the file to import from your S3 bucket.

    The filter box at the top of the Remote Data Browser can be used to find folders/files within the current directory.

  4. A preview of the imported data is displayed. Review the schema.
  5. Review the schedule, data retention, and advanced properties for the job.
  6. Add a description, click Save, and name the file.