Importing from a Cloud Storage

INFO

This page describes how to import data from a cloud storage into Datameer X.

Supported Cloud Storages

Datameer X can import data from the following cloud storages:

  • Azure Data Lake Storage Gen 2
  • Google Cloud Storage

Importing from a Cloud Storage

Azure Data Lake Gen 2

To import data from Azure Data Lake Storage Gen 2:

  1. Click the "+" button and choose "Import Job" or right-click in the file browser and select "Create New" → "Import Job". The 'New Import Job' tab appears in the menu bar.
     
  2. Click "Select Connection". The dialog 'Select Connection' opens.
     
  3. Click the connection for Azure Data Lake Storage Gen 2 and confirm with "Select". The connection is displayed.
     
  4. Select the required file type from the drop-down "File Type" and confirm with "Next".
     
  5. Enter the file or folder name exactly as it appears in your storage.
      
  6. Define the delimiter character, the schema, and the column names.
    INFO: The default delimiter is ','.
     
  7. Select the schema of the imported data.
  8. If the first row does not contain the column names, clear the check-box.
    INFO: The check-box is selected by default, meaning the first row contains the column names.
     
  9. If you want to filter by date and time, select the filter method from the drop-down.

    INFO: For the filter mode 'Fixed dates', select the start date and end date from the calendar.

    INFO: For the filter mode 'Dynamic dates', enter a 'das' expression as the start and end expression, e.g. 'TODAY()-4d'.
  10. If needed, exclude data by file modification date by entering the number of days.
  11. If needed, modify the advanced settings, e.g. the character encoding, and confirm with "Next". The tab 'Data Fields' opens.
     
  12. Confirm with "Next". The tab 'Define Fields' opens.
     
  13. Select all required columns.
     
  14. If needed, enter placeholder values and confirm with "Apply".
  15. Decide how to handle invalid data. 
     
  16. Decide whether you want to partition the data and confirm with "Next". The tab 'Schedule' opens.
    INFO: If you have selected 'Partition Data', enter a date expression and select the date format from the drop-down.
     
  17. Decide whether the import should be triggered manually or on a schedule.

  18. Select the option for data retention.

  19. If needed, enter the number of sample records and the maximum number of errors to log, and confirm with "Next". The next tab opens.
    INFO: Higher values give more precise preview results but can significantly decrease performance.
     
  20. If needed, enter a description for the import job.
  21. Clear the check-box if the import should not start immediately after saving.
    INFO: The check-box is selected by default, so the import starts right after the import job is saved.

  22. If needed, enter the email addresses for notifications and confirm with "Next". The 'Save Import Job' dialog opens.

  23. Select the path to import the data to, enter a name, and confirm with "Save". The import job for Azure Data Lake Storage Gen 2 is now set up.
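Steps 6 to 8 decide how a delimited file is split into fields and whether the first row supplies the column names. The Python sketch below illustrates only that parsing behavior with the standard `csv` module; it is not Datameer X code. The helper name `parse_delimited` and the placeholder column names `column_1`, `column_2`, ... used when the header check-box is cleared are hypothetical assumptions.

```python
import csv
import io


def parse_delimited(text, delimiter=",", first_row_has_names=True):
    """Hypothetical helper: split delimited text into column names and records."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if first_row_has_names:
        # Check-box selected (default): the first row holds the column names.
        names, data = rows[0], rows[1:]
    else:
        # Check-box cleared: every row is data; generated names are an assumption.
        names = [f"column_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return names, [dict(zip(names, row)) for row in data]


names, records = parse_delimited("id,city\n1,Berlin\n2,Paris\n")
print(names)    # ['id', 'city']
print(records)  # [{'id': '1', 'city': 'Berlin'}, {'id': '2', 'city': 'Paris'}]
```

With a different delimiter and no header row, e.g. `parse_delimited("1;Berlin\n", delimiter=";", first_row_has_names=False)`, every row is treated as data.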
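For the 'Dynamic dates' filter mode in step 9, an expression such as 'TODAY()-4d' is relative to the current day, so the filter window moves forward each time the job runs. The Python sketch below shows that idea under stated assumptions: 'TODAY()-4d' is read as "today minus 4 days", and the window is treated as inclusive on both ends. The helpers `dynamic_window` and `in_window` are hypothetical and do not reproduce how Datameer X evaluates its expressions.

```python
from datetime import date, timedelta


def dynamic_window(start_offset_days, end_offset_days=0, today=None):
    """Hypothetical helper: (start, end) for e.g. TODAY()-4d ... TODAY()."""
    if today is None:
        today = date.today()
    start = today - timedelta(days=start_offset_days)
    end = today - timedelta(days=end_offset_days)
    return start, end


def in_window(day, window):
    """Inclusive on both ends (an assumption, not confirmed by the source)."""
    start, end = window
    return start <= day <= end


window = dynamic_window(4, today=date(2024, 3, 10))
print(window)                               # (datetime.date(2024, 3, 6), datetime.date(2024, 3, 10))
print(in_window(date(2024, 3, 5), window))  # False
print(in_window(date(2024, 3, 6), window))  # True
```

Passing `today` explicitly keeps the example reproducible; by default the window is recomputed from the current date.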
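Partitioning in step 16 groups imported records by a date value, with the selected format controlling how coarse each partition is (for example, one partition per month). The Python sketch below illustrates that grouping conceptually; the helper `partition_by_date`, the field name `created`, and the use of `strftime` format strings are assumptions for illustration, not Datameer X's partition expression syntax.

```python
from collections import defaultdict
from datetime import date


def partition_by_date(records, date_field, fmt="%Y-%m-%d"):
    """Hypothetical helper: group records by their date formatted with fmt."""
    partitions = defaultdict(list)
    for record in records:
        key = record[date_field].strftime(fmt)
        partitions[key].append(record)
    return dict(partitions)


rows = [
    {"id": 1, "created": date(2024, 1, 15)},
    {"id": 2, "created": date(2024, 1, 20)},
    {"id": 3, "created": date(2024, 2, 1)},
]
monthly = partition_by_date(rows, "created", "%Y-%m")
print(sorted(monthly))  # ['2024-01', '2024-02']
```

A finer format such as `"%Y-%m-%d"` produces one partition per day, which keeps each partition small but increases the number of partitions.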