Exporting Data

Exporting Data

You can create export jobs to export data from Datameer to other connections such as a database, remote file server, or export results to a third-party BI (business intelligence) software package. Exported data is the result from executing analyses in a workbook on the original data set. You can initiate a one-time manual export, or you can configure the job to run each time the workbook is updated or at a specific time interval.

It is also possible to download data from Datameer without using an export job.

Introduction

Before you can export data, you must create a workbook. Once the workbook has executed, you can export data from one of the saved sheets in that workbook.

Exporting data is only available in Datameer's Enterprise product.

File Formats Available

Select to export a worksheet to one of the following file formats:

  • Apache AVRO

  • CSV

  • Parquet

  • Tableau (TDSX)

  • Tableau (TDE) 

Exporting Data

Only the data from a saved workbook sheet is available for export.

To export data:

  1. Select Export Data from the + (plus) drop down menu located in the top left corner of the screen or right-click in the navigation bar are the left side of the screen and select Create New > Export Data.

  2. Select a workbook by clicking Select then click Next.

  3. Select which sheet to export, then click Next. At the bottom of the page you see a preview of the a sheet's data.

  4. Select which connection the data should be exported to, either an existing connection or create a new one. Select to export the data as a CSV file, an AVRO file, a Parquet file, or Tableau (TDSX) file. Then click Next

  5. Enter the information on the Data Details tab which varies depending on the type of file you are creating. Click Next.

    1. Under the File settings, enter the file name and file path. Variable placeholders can be used instead of static values. Exports with the same file name will be overwritten.


      Under Advanced Settings, enter the maximum file size and how consecutive files should be numbered. All export jobs add a task number to the end of the filenames (before the extension) generated by the export job.

    2. There are additional settings if the file is being exported as a CSV. Select whether column headers should be contained in the first row, and how delimiter, quote, and escape characters should be defined.
      If you are exporting a float value to a .CSV, an additional trailing zero (.0) is added following the original value. Leading zeros (0.) are removed if the integer part of the float value is larger than zero.

      If you export to a database such as MySQL, you must select a table and select  either Use Existing Table to overwrite the database table or Create new if it doesn't exist to add the table. If you use an existing table with a large amount of previous data, the overall export might be slow as it first needs to delete the old data. However, if you create a new table, delete statements aren't used and performance might be improved. There is a hard limit of 1,000 entries.

      Exporting with SFTP

      An additional requirement for exporting with SFTP is that the user creating the export job must have permissions to create a folder (mkdir) on the SFTP server. This permission is required even when the folder on the SFTP server already exists.

      Exporting to a database

      When exporting to a database, the database connector is associated with a user name. This database user must have write and/or create permissions on the database itself in order to complete the export job.

      When exporting data to DB2 or to an Oracle connection, there are limitations on the number of characters contained in string data fields.

      • for DB2, the number of characters is limited to 2000.

      • for Oracle, the number of characters is limited to 4000.

      Attempting to export data fields containing a larger number of characters results in dropped records. 

      Record replacement policy

      If the Use existing option is selected Datameer doesn't create a primary key column with name "ID". (i.e., The workbook must be aware of the ID column. All columns from the destination table (can't be null) must be mapped to a column of the workbook.)

      Append Mode

      • The records append to the existing table.

      Overwrite Mode

      • All records from the existing table are deleted.

      • New records are written to the destination table.

      If the Create new if not exist option is selected Datameer creates an additional column with name "ID" (primary key). When the database supports auto increment, the database computes the value of this column.

      Example: Netezza doesn't support auto increment. In this case Datameer uses a sequence database object to generate the ID. Specific for Netezza: the name of the sequence used is <tablename>_seq.

      Append Mode

      • The table is created if it doesn't already exists, otherwise the existing table is re-used without changing any schema.

      • The records are appended to the existing table.

      Overwrite Mode

      • Any previous tmp table is first dropped.

      • A tmp table is created.

      • The records are written to the tmp table.

      • The entire destination table is dropped if one exists.

      • The tmp table is renamed to the destination table.

      • The tmp table is dropped.

      Advanced settings

      • Setting the Maximum number of concurrent data base connections gives control of database parallelism. The export job can't launch more than the set number of parallel tasks. Each task opens one database connection.

      • Setting the Maximum records per transaction exports approximately this number of records within a single database transaction.

      • Setting the Rows per batch establishs a limit for the number of rows inserted for each transaction within a task. Tuning this value can influence speed and avoid potential timeouts. Very large batches can overwhelm some databases and limiting the number of rows returned at a time can reduce strain.

  6. Select which fields to include. Clear any fields that should be excluded. Select whether empty fields can be accepted. 


    By default, the column mapping of the export job is based on the columns of the current worksheet being exported. When editing and/or overwriting an export job, it is possible for the column mapping to be based on the the previous export job mapping instead of the current worksheet column mapping. This can be configured in the default.properties file.

    default.properties

    # Enables the workbook columns to synchronize with export wizard mapping. # If enabled, the export wizard displays the workbook columns on the mapping step. # Otherwise, the saved mapping from the database is used on the export wizard's mapping step. export.file.inherit-workbook-column-names.enabled=true export.db.inherit-workbook-column-names.enabled=true export.other.inherit-workbook-column-names.enabled=true
  7. Select how to handle invalid records, and click Next.

  8. Select the schedule for exporting the data. Select Manually for a non-recurrent export or you can select either After the workbook is calculated or On a Schedule. If you select On a Schedule, indicate the time settings or use a custom cron pattern. Under Advanced Settings, you can enter custom properties as key/value pairs. Then click Next.

  9. Enter a description if desired, click Save As. Give the file a name, then click Save. You can also enter an email address to receive any error messages.