Open Data Format

The 'External Systems Plug-in' synchronizes workbooks with Hive. Data is not duplicated from Datameer X to Hive. Instead, Hive creates external tables that point back to the workbooks, which are managed and synchronized by Datameer. This reduces cluster resource usage to what the workbook and import jobs themselves require.

...

Support for Hortonworks HDP 3.1.0

Datameer X now supports Hortonworks HDP 3.1.0 for Datameer X versions 7.4 and higher.

Improved UI for Admin Variables

Datameer X improved the UI for the Admin Variables.

Configuring Network Proxy for S3 Native

S3 proxy settings now apply to S3 native connections.

...

Import Storage Format

Datameer X now uses Parquet as its default storage format, so all imported data is converted to the Parquet file format.

...

Improved Workbook Saving

When you create a workbook from anywhere in Datameer X, it is saved in your home folder with a default name.

You can rename and move an existing workbook within your file browser. The content remains the same; only the name and the saving location change.

You can duplicate a workbook within your file browser. The content of the original workbook remains the same, and you can edit and save the content of the duplicate independently.

Closing New Workbook with Content

When you close a workbook that has content but has not yet been renamed or moved, the 'Rename and Move' dialog opens so that you can save the workbook. If you do not rename it, the workbook keeps the default name 'Workbook'.

The 'Rename and Move' dialog also appears when you close such a workbook while in the 'File Browser' or in another workbook.

Closing New Workbook without Content

When you close an empty workbook that is still saved under the default name and location, the workbook is deleted automatically.

The empty workbook is also deleted automatically when you close it while in the 'File Browser' or in another workbook.
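The closing rules above can be summarized as a small decision function. This is only an illustrative sketch, not Datameer X code: the `Workbook` class, its fields, and the action names are hypothetical, and the "keep" branch for workbooks that were already renamed or moved is an assumption, since the text does not describe that case.

```python
from dataclasses import dataclass


@dataclass
class Workbook:
    # Hypothetical model of the two properties the rules depend on.
    has_content: bool
    renamed_or_moved: bool  # True once the user has renamed or moved the workbook


def action_on_close(wb: Workbook) -> str:
    """Return the action taken when a workbook is closed."""
    if wb.renamed_or_moved:
        # Assumption: a workbook with a user-chosen name/location is simply kept.
        return "keep"
    if wb.has_content:
        # New workbook with content: prompt the user to save it properly.
        return "show_rename_and_move_dialog"
    # Empty workbook still under the default name and location.
    return "delete"
```

For example, `action_on_close(Workbook(has_content=False, renamed_or_moved=False))` returns `"delete"`.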

...

Quick Column Sorting

You can now sort the columns of a worksheet directly in the worksheet. Sorting is reflected in the "Sort Dialog". You can also edit and remove the sorting criteria.

Workbook Production Mode

The Workbook Production Mode accelerates performance by excluding sample generation and column statistics production to reduce resource usage. Production mode is intended for data pipelines that have been tested and refined to the point that a workbook can be scheduled and run without further modification.

...

Support for Export to Tableau Hyper Format

Datameer X now supports exports to Tableau version 10.5 using the Tableau Hyper format.

...

Google Cloud Connectivity

Google Cloud Dataproc as Execution Engine

All Datameer X jobs (e.g. workbooks, imports, exports, system jobs) can now be executed within Google Cloud Dataproc v1.2. All job results are stored in HDFS. With this feature you can run Datameer X on your existing Google Cloud offering, which avoids maintaining a second storage account and transferring large datasets from Google into other cloud solutions (e.g. Microsoft Azure).

Google Cloud Storage Import & Export

Datameer X now supports import and export from Google Cloud Storage.

For this purpose, the plugin "plugin-gcs" contains the Google Cloud Storage connector for Datameer X. With the import connector you can ingest data from Google Cloud Storage for joining, aggregation, or data movement. With the export connector you can write data from Datameer X to Google Cloud Storage and continue working on the prepared, cleaned, and aggregated data with third-party systems.

Google Cloud Storage Private Folder

You can now build and set up Datameer X with Google Cloud Storage as a Private Folder. Datameer's job results can be stored in a Google Cloud Storage Private Folder.

Performance Improvement

Join Operation Computes Parallel Inputs

A join of two data links now computes its inputs in parallel, which prevents the creation of too many DAGs on the cluster. The map-side join caches the smaller of the two inputs in memory.
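To illustrate the idea of a map-side join that caches the smaller input, here is a minimal single-process sketch in Python. It is not Datameer X's implementation: the function name `map_side_join`, the row representation as dictionaries, and the inner-join semantics are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Tuple

Row = Dict[str, Any]


def map_side_join(left: List[Row], right: List[Row], key: str) -> Iterator[Tuple[Row, Row]]:
    """Inner-join two inputs on `key`, caching the smaller input in memory.

    Always yields (left_row, right_row) pairs, whichever side was cached.
    """
    # Pick the smaller input as the in-memory "build" side.
    left_is_build = len(left) <= len(right)
    build, probe = (left, right) if left_is_build else (right, left)

    # Index the cached side by join key.
    cache = defaultdict(list)
    for row in build:
        cache[row[key]].append(row)

    # Stream the larger side and look up matches in the cache.
    for row in probe:
        for match in cache.get(row[key], []):
            yield (match, row) if left_is_build else (row, match)
```

Because only the smaller input is held in memory, the larger input can be streamed row by row without being sorted or shuffled first, which is the usual motivation for this join strategy.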

...