Datameer 7.5 includes several updates to UI performance, the Import Job Wizard, the Hive Data Link Wizard, and the behavior of partitioned and non-partitioned Hive tables. Performance improvements include reduced execution time and a new Production Mode. See HiveServer2, Exporting to HiveServer1 and HiveServer2, and Hive for related documentation.

As of Datameer 7.5, the Datameer Classic data field type mapping is removed and only Hive-specific mapping is performed: Datameer BigDecimal maps to Hive Decimal, and Datameer Date maps to Hive TimeStamp.
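
The two mappings named above can be written down as a small lookup table. This is only an illustrative sketch: the names `HIVE_TYPE_FOR_DATAMEER_TYPE` and `hive_type_for` are hypothetical and are not part of Datameer, and only the two type pairs stated in this section are listed.

```python
# Hypothetical lookup table. Only the two pairs named in the text are included;
# other Datameer types are intentionally left out rather than guessed.
HIVE_TYPE_FOR_DATAMEER_TYPE = {
    "BigDecimal": "Decimal",
    "Date": "TimeStamp",
}


def hive_type_for(datameer_type: str) -> str:
    """Return the Hive type for a Datameer type listed above, else raise KeyError."""
    try:
        return HIVE_TYPE_FOR_DATAMEER_TYPE[datameer_type]
    except KeyError:
        raise KeyError(
            f"no mapping listed here for Datameer type {datameer_type!r}"
        ) from None
```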

Note

Datameer support for Hive Metastore (HiveServer1) is being deprecated in favor of HiveServer2, effective in a future Datameer minor release.

It is strongly recommended that you make the appropriate configuration changes as soon as possible.

Exporting into non-partitioned Hive tables

...

Datameer can compute the partition location while importing data from a partitioned Hive table without needing to fetch it from HiveServer2. This improves performance if the default partition location pattern is used. The setting can be made globally on the plugin configuration page, and the default value can be overridden in the Import Job or Data Link wizard.
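
To illustrate what computing a partition location means, here is a minimal sketch that assumes Hive's conventional default layout, in which each partition is stored under the table location in nested `key=value` directories. The function name and the example path are hypothetical, this is not Datameer's implementation, and the sketch does not apply the escaping Hive uses for special characters in partition values.

```python
def default_partition_location(table_location: str, partition: dict) -> str:
    """Build '<table location>/<key1>=<value1>/<key2>=<value2>/...'.

    `partition` maps partition column names to values, in the order the
    partition columns are declared on the table (dicts keep insertion order).
    """
    base = table_location.rstrip("/")
    parts = [f"{key}={value}" for key, value in partition.items()]
    return "/".join([base, *parts])


# Hypothetical table location and partition values, for illustration only.
location = default_partition_location(
    "hdfs://namenode/warehouse/sales",
    {"year": "2019", "month": "06"},
)
print(location)  # hdfs://namenode/warehouse/sales/year=2019/month=06
```

Because the path is derived from the table location and the partition values alone, no per-partition lookup is needed. A table whose partitions were placed in custom locations would not match this pattern, which is why the setting can be changed.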


Merging fixes and improvements from version 7.5 down to 7.4

...

You can now access the menu command "New Data Link" from the browser connection context menu.

Save & Migrate Added for Data Link and Import Job

...

Datameer supports exports to Tableau 10.5 or higher using the Tableau Hyper format.

Workbook Functionality Updates and Extensions

Pivot Table for Exploration and Preparation

...

As of release 7.5, Datameer supports case-sensitive column names. For example, Foo, foo, FoO, and fOO are all legal, distinct column names within a workbook or sheet.

Data Science Encoding for Workbook Columns

Datameer 7.5 includes the option to perform ordinal, one-hot, or binned encoding on column data. Once applied, encoding can be updated as needed until the desired results are achieved.

Workbook Production Mode

Workbook production mode accelerates performance by skipping sample generation and column statistics, which reduces resource usage. Production mode is intended for data pipelines that have been tested and refined to the point that a workbook can be scheduled and run without further modification.

...

Datameer 7.5 includes the option to split workbook string and list type column content by a selected delimiter.
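
As a rough illustration of splitting string column content by a delimiter, the sketch below uses plain Python. The function name `split_column` is hypothetical, and the shape of the result (one list per row, with empty cells passed through) is an assumption rather than a description of how Datameer lays out the split values.

```python
def split_column(values, delimiter):
    """Split each string cell on `delimiter`; leave empty (None) cells untouched."""
    return [
        value.split(delimiter) if value is not None else None
        for value in values
    ]


# Hypothetical column content, for illustration only.
rows = split_column(["a,b,c", "d", None], ",")
```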

Column Encoding

Column encoding provides the option to perform ordinal, one-hot, or binned encoding on column data, which maps categorical or continuous values to numeric values. Once applied, encoding can be updated as needed until the desired results are achieved. Using Datameer to perform encoding provides a consistent view of prepared data, which is especially helpful for teams working together on model building and testing activities.
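
For readers unfamiliar with the three encodings, the following sketch shows what each one produces for a small column. It is written in plain Python under stated assumptions: the function names are hypothetical, categories are numbered in order of first appearance, and bins are defined by ascending boundaries chosen by the caller. How Datameer orders categories or chooses bins is not specified here.

```python
from bisect import bisect_right


def ordinal_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    codes = {}
    return [codes.setdefault(value, len(codes)) for value in values]


def one_hot_encode(values):
    """Return (categories, rows); each row holds a single 1 at its category's index."""
    categories = list(dict.fromkeys(values))  # distinct values, first-seen order
    index = {category: i for i, category in enumerate(categories)}
    rows = [
        [1 if index[value] == i else 0 for i in range(len(categories))]
        for value in values
    ]
    return categories, rows


def bin_encode(values, edges):
    """Map each number to the index of its bin.

    `edges` are ascending boundaries: bin 0 is below edges[0], bin 1 covers
    edges[0] up to (not including) edges[1], and so on.
    """
    return [bisect_right(edges, value) for value in values]


colors = ["red", "green", "red", "blue"]
print(ordinal_encode(colors))            # [0, 1, 0, 2]
print(one_hot_encode(colors)[1][0])      # [1, 0, 0]
print(bin_encode([5, 10, 15, 25], [10, 20]))  # [0, 1, 1, 2]
```

Ordinal and one-hot encoding apply to categorical values, while binned encoding groups continuous values into ranges.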

Workbook Data Access Permissions

You can use the File Browser to set permissions for import jobs, export jobs, data links, connections, and workbooks. You can also use related custom properties to determine whether reading workbook data requires permissions on the upstream data. Documentation is on the Behavior Change page under "Setting Access Permission Evaluation for Workbooks."

Datameer Configuration

Datameer Global and Workbook Variables

...

Azure Data Lake Storage (ADLS) Gen1 as Private Data Store

Datameer can now use Azure Data Lake Storage (ADLS) Gen1 to store results.

Support for Amazon Corretto Open JDK

...

A new property that delays data deletion after a Datameer upgrade can be used to ensure that a rollback succeeds if one is needed. See How to Roll Back to a Previous Database Version and Adjusting for Rollback.

Custom Property "ErrorHandling Mode"

...

The new property das.job.hadoop-progress.timeout sets a duration, in seconds, after which a Datameer job terminates with a "Timed out" status if there is no progress on the job.
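
The rule this property expresses can be sketched as a simple check on the time since the last observed progress. This is an illustration only, not Datameer's implementation: the function name, the "Running" status, and the choice to time out strictly after the limit are all assumptions.

```python
def job_status(last_progress_at: float, now: float, timeout_seconds: float) -> str:
    """Return "Timed out" when no progress was seen for longer than the timeout.

    Both timestamps are in seconds. "Running" is a placeholder status used
    here only to have something to return when the limit is not exceeded.
    """
    if now - last_progress_at > timeout_seconds:
        return "Timed out"
    return "Running"
```

For example, with a hypothetical timeout of 600 seconds, a job whose last progress was 601 seconds ago would be reported as "Timed out", while one that progressed 30 seconds ago would not.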

Changes to the Enabling SSL for MySQL Service Page

The file /<Datameer installation dir>/webapps/conductor/WEB-INF/classes/META-INF/persistence.xml has moved and is now part of <Datameer installation dir>/webapps/conductor/WEB-INF/lib/dap-common-<version>.jar. The connection.url must be edited accordingly.

Additions to Supported Hadoop Distributions

Support for Hadoop CDH 6.1/6.2

Datameer 7.5 supports CDH 6.1 and 6.2 with configuration of certain custom properties. Older Hadoop distributions might no longer be supported as of Datameer v7.0; see Supported Hadoop Distributions.


Support for AWS EMR 5.24

See Support for Hive import/export on AWS EMR 5.24 above.

...