10.0 New and Noteworthy

Setup and Administration Updates

Open Data Format

The 'External Systems Plug-in' synchronizes workbooks with Hive. Data is not duplicated from Datameer X to Hive. Instead, Hive creates external tables that point back to the workbooks, which are managed and synchronized by Datameer. This reduces cluster resource usage to just what the workbook and import jobs need to run.

Support Engineer Plugin - System & Usage Report

The 'System & Usage Report' now contains the customer's metadata, which is needed in support cases and provides a higher-level overview, so all technical environment information can be reported. Only the administrator can view the system report and download the report files individually.

Support for Hortonworks HDP 3.1.0

Datameer X now supports Hortonworks HDP 3.1.0 for Datameer X versions 7.4 and higher.

Improved UI for Admin Variables

Datameer X improved the UI for the Admin Variables.

Configuring Network Proxy for S3 Native

S3 proxy settings now apply to S3 native connections.

Import Functionality Updates

Import Storage Format

Datameer X now uses Parquet as its default storage format, so all imported data is converted into the Parquet file format.

Working with Workbooks Updates

Improved Workbook Saving

When you create a workbook from anywhere in Datameer X, it is saved in your home folder with a default name.

You can rename and move an existing workbook within your file browser. The content remains the same; only the name and storage location change.

You can duplicate a workbook within your file browser. The content of the original workbook remains the same while you can save and edit the content of the duplicate.

Closing New Workbook with Content

Closing an open workbook with content that has never been renamed or moved opens the 'Rename and Move' dialog so that you can save the workbook. If you do not rename it, the workbook keeps the default name 'Workbook'.

The 'Rename and Move' dialog also appears when you close such a workbook by switching to the 'File Browser' or to another workbook.

Closing New Workbook without Content

When you close an empty workbook that is saved under the default name and location, the workbook is deleted automatically.

The same applies when you close such an empty workbook by switching to the 'File Browser' or to another workbook: the empty workbook is deleted automatically.
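The closing rules above can be summarized as a small decision function. This is only an illustrative sketch: `CloseAction` and `action_on_close` are hypothetical names, not part of Datameer X, and the behavior for a workbook that has already been renamed or moved is an assumption, since the notes above do not describe that case.

```python
from enum import Enum


class CloseAction(Enum):
    """Hypothetical labels for what happens when a workbook is closed."""
    SHOW_RENAME_AND_MOVE = "show 'Rename and Move' dialog"
    DELETE_WORKBOOK = "delete workbook"
    CLOSE_ONLY = "close without further action"


def action_on_close(has_content: bool, renamed_or_moved: bool) -> CloseAction:
    """Return the action taken when a workbook is closed."""
    if renamed_or_moved:
        # Assumption: a workbook that already has a user-chosen name and
        # location is simply closed. The release notes do not cover this case.
        return CloseAction.CLOSE_ONLY
    if has_content:
        # Content under the default name and location: ask where to save it.
        return CloseAction.SHOW_RENAME_AND_MOVE
    # Empty workbook under the default name and location: remove it.
    return CloseAction.DELETE_WORKBOOK
```

For example, `action_on_close(has_content=False, renamed_or_moved=False)` yields `CloseAction.DELETE_WORKBOOK`, matching the rule for empty workbooks.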

Workbook Inspector

The workbook information is now in the Sheet Inspector, providing the import job path as well as the reference workbook.

You can now navigate to an external workbook or external data source from the current workbook that contains the external sheet. Clicking on the external sheet tab opens the external workbook with the correct sheet preselected, or the external data source. Furthermore, clicking on the external data source link opens the external workbook with the correct sheet preselected.

When you click an external sheet or import job within the sheet dependencies graph, the target workbook does not open if there's only full data access. Otherwise, the workbook opens in another tab and the target sheet is the one shown in that tab. If the reference is an import job, the import job is opened.

Quick Column Sorting

You can now sort the columns of a worksheet directly in the worksheet. Sorting is reflected in the "Sort Dialog". You can also edit and remove the sorting criteria.

Workbook Production Mode

The Workbook Production Mode accelerates performance by excluding sample generation and column statistics production to reduce resource usage. Production mode is intended for data pipelines that have been tested and refined to the point that a workbook can be scheduled and run without further modification.

Improved Search Function

You can search for paths or folders in the file browser:

  • The search is case-insensitive, and a leading "/" is ignored.
  • "*" and "?" still work as wildcards.
  • All other search options work as before.
  • More complex queries with "AND" work, e.g. "apache* AND path:admin AND owner:system"
  • To search for a path, put "path:" in front of the search term, e.g. "path:/Admin/Workbooks"
  • A path search is simply a multiword search, like "word1/word2/word3".
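To make the search rules above concrete, here is a minimal Python sketch of how such a query could be evaluated. It is not Datameer's implementation: `Item`, `matches`, and the helper functions are hypothetical. It also assumes that a plain term is matched against the item name, and that every word of a path query must match some path segment.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Item:
    """A hypothetical file browser entry (not a Datameer X type)."""
    name: str
    path: str
    owner: str


def _words(value: str) -> list[str]:
    # Lower-case for case-insensitive matching. Splitting on "/" and
    # dropping empty parts makes a leading "/" irrelevant.
    return [w for w in value.lower().split("/") if w]


def _term_matches(term: str, item: Item) -> bool:
    field, sep, value = term.partition(":")
    if sep and field.lower() == "path":
        segments = _words(item.path)
        # Assumption: each word of the path query must match some segment.
        return all(any(fnmatchcase(s, w) for s in segments) for w in _words(value))
    if sep and field.lower() == "owner":
        return fnmatchcase(item.owner.lower(), value.lower())
    # Plain term: match the item name; "*" and "?" act as wildcards.
    return fnmatchcase(item.name.lower(), term.lower())


def matches(query: str, item: Item) -> bool:
    """Return True if the item satisfies every AND-combined term of the query."""
    terms = [t.strip() for t in query.split(" AND ") if t.strip()]
    return all(_term_matches(t, item) for t in terms)
```

With `item = Item("Apache_Logs", "/Admin/Workbooks/Apache_Logs", "system")`, both `matches("apache* AND path:admin AND owner:system", item)` and `matches("path:/Admin/Workbooks", item)` evaluate to `True` under these assumptions.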

Enabled Formula List in Formula Sheet

If the current sheet is a formula sheet, its formulas are shown in the sheet inspector as a list of column name and formula pairs.

Export Functionality Updates

Support for Export to Tableau Hyper Format

Datameer X now supports exports to Tableau version 10.5 using the Tableau Hyper format.


Google Cloud Connectivity

Google Cloud DataProc as Execution Engine

All Datameer X jobs (e.g. workbooks, import, export, system jobs) can now be executed within Google Cloud DataProc v1.2. All job results are stored in HDFS. With this feature you can run Datameer X on your existing Google Cloud offering. This avoids maintaining a second storage account and transferring large datasets from Google into other cloud solutions (e.g. Microsoft Azure).

Google Cloud Storage Import & Export

Datameer X now supports import and export from Google Cloud Storage.

For this purpose, the plugin "plugin-gcs" provides the connector to Datameer. With the Google Cloud Storage import connector you can ingest data from Google Cloud Storage and then join, aggregate, or move it. With the Google Cloud Storage export connector you can expose data from Datameer X in Google Cloud Storage and continue working on the prepared, cleaned, and aggregated data with other third-party systems.

Google Cloud Storage Private Folder

You can now build and set up Datameer X with Google Cloud Storage as a Private Folder. Datameer's job results can be stored in a Google Cloud Storage Private Folder.

Performance Improvement

Join Operation Computes Parallel Inputs

A join of two data links now computes its inputs in parallel to prevent the creation of too many DAGs on the cluster, and the map-side join picks the smaller of the two inputs to cache in memory.
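The idea of caching the smaller input can be illustrated with a minimal in-memory hash join in Python. This is a sketch of the general technique, not Datameer's execution engine: `map_side_join` is a hypothetical name, rows are assumed to be plain dictionaries, and the parallel computation of the inputs is not modeled.

```python
from collections import defaultdict
from typing import Iterator

Row = dict


def map_side_join(left: list[Row], right: list[Row], key: str) -> Iterator[tuple[Row, Row]]:
    """Inner join that caches the smaller input and streams the larger one."""
    left_is_small = len(left) <= len(right)
    small, large = (left, right) if left_is_small else (right, left)

    # Build an in-memory hash table over the smaller input only.
    cache = defaultdict(list)
    for row in small:
        cache[row[key]].append(row)

    # Stream the larger input and probe the cache.
    # Results are always emitted in (left, right) order.
    for row in large:
        for match in cache.get(row[key], []):
            yield (match, row) if left_is_small else (row, match)
```

Because only the smaller side is held in memory, the memory footprint is bounded by the smaller input regardless of which argument it was passed as.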