New and Retired Features and Distributions
New Features
Data Warehouse Connections (Beta)
You can now ingest data directly into Spectrum via a Data Warehouse Connector. With a Data Warehouse Connection, you no longer need to create a separate Connection and execute a separate Import Job.
Why
Reduce the number of Datameer entities that need to be created and managed to access Tables and Views from Enterprise Data Warehouses
Simplify UX around finding Data Warehouse Objects to process within Workbooks
Reduce EDW resource usage by allowing filters to be pushed down to the warehouse on read
JDBC
Supports Import-Only (No Export Sheet at this time)
Drivers for Redshift, Postgres, and MySQL at this time
Easily extendable to other Databases with JDBC drivers
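Filter push-down (mentioned above) means the filter is compiled into the SQL sent to the warehouse, so only matching rows ever leave the EDW. The sketch below is illustrative, not Datameer's implementation; the function name, the `(column, op, value)` filter shape, and the `?` placeholder convention (as in a JDBC PreparedStatement) are assumptions:

```python
# Illustrative sketch: compile simple filters into the query sent to the
# warehouse, instead of fetching the whole table and filtering downstream.

def build_pushdown_query(table, columns, filters):
    """Build a SELECT that pushes simple comparison filters down to the
    warehouse. `filters` is a list of (column, op, value) tuples."""
    allowed_ops = {"=", "<", ">", "<=", ">=", "<>"}
    select = ", ".join(columns) if columns else "*"
    sql = f"SELECT {select} FROM {table}"
    clauses, params = [], []
    for col, op, value in filters:
        if op not in allowed_ops:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{col} {op} ?")  # '?' placeholder, JDBC-style
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

sql, params = build_pushdown_query(
    "orders", ["order_id", "total"],
    [("status", "=", "shipped"), ("total", ">", 100)],
)
print(sql)     # SELECT order_id, total FROM orders WHERE status = ? AND total > ?
print(params)  # ['shipped', 100]
```

Because the `WHERE` clause travels with the query, the warehouse does the filtering with its own indexes and statistics, which is what reduces EDW resource usage on read.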
Snowflake
Supports both Import and Export
Follows Snowflake best practices for loading data into Snowflake
Easily control access to and resources of Snowflake Objects and Warehouses
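For context on those best practices: Snowflake recommends staging files and bulk-loading them with COPY INTO rather than issuing row-by-row INSERTs. A minimal sketch of the statement pair is below; the file path, table name, and CSV options are illustrative, and this is not Datameer's actual loader:

```python
# Illustrative sketch of Snowflake's staged bulk-load pattern:
# 1) PUT uploads a local file to the table's internal stage (@%table)
# 2) COPY INTO bulk-loads the staged file(s) into the table

def snowflake_bulk_load_sql(local_file, table):
    """Return the (PUT, COPY INTO) statement pair for a staged bulk load."""
    put = f"PUT file://{local_file} @%{table} AUTO_COMPRESS=TRUE"
    copy = f"COPY INTO {table} FROM @%{table} FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    return put, copy

put_sql, copy_sql = snowflake_bulk_load_sql("/tmp/export.csv", "sales")
print(put_sql)
print(copy_sql)
```

Staged loads let Snowflake parallelize ingestion across the warehouse, and sizing that warehouse independently is how access and resources stay controllable.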
Pluggable
The Data Warehouse Connector is a new extension point that allows for new Data Warehouses to be supported via Datameer plugins
Easily upgrade existing or install new Data Warehouse Connectors by installing an updated plugin without having to upgrade the Datameer deployment
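The extension-point pattern described above can be illustrated with a small sketch: the host defines an abstract connector interface, plugins register implementations, and installing an updated plugin simply re-registers under the same name. Every name here is invented for illustration; Datameer's actual plugin API is not shown:

```python
# Hypothetical sketch of a connector extension point with a plugin registry.
from abc import ABC, abstractmethod


class DataWarehouseConnector(ABC):
    name: str

    @abstractmethod
    def read(self, query: str) -> list:
        """Run a query against the warehouse and return rows."""


CONNECTORS = {}


def register(connector):
    # Installing a newer plugin re-registers under the same name,
    # upgrading the connector without touching the core deployment.
    CONNECTORS[connector.name] = connector


class DummyConnector(DataWarehouseConnector):
    name = "dummy"

    def read(self, query: str) -> list:
        return [("ok", query)]


register(DummyConnector())
rows = CONNECTORS["dummy"].read("SELECT 1")
print(rows)  # [('ok', 'SELECT 1')]
```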
Workbook Incremental Processing
Reduce resource consumption on the compute cluster when Workbooks only need to process the “latest” data from appending Imports or other incremental Workbooks
Supports advanced expressions for selecting the “latest” data based on your use-case requirements
Selecting Partitions is not yet supported
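Incremental processing of “latest” data is commonly implemented with a watermark: only rows newer than the last processed value are handed to the job, and the watermark advances afterwards. The sketch below assumes a simple timestamp column; the names are illustrative, not Datameer's internals:

```python
# Illustrative watermark sketch: process only rows newer than the last run.

def incremental_rows(rows, watermark, key="ingested_at"):
    """Return rows newer than `watermark`, plus the advanced watermark."""
    fresh = [r for r in rows if r[key] > watermark]
    new_watermark = max((r[key] for r in fresh), default=watermark)
    return fresh, new_watermark

rows = [
    {"id": 1, "ingested_at": 100},
    {"id": 2, "ingested_at": 200},
    {"id": 3, "ingested_at": 300},
]
fresh, wm = incremental_rows(rows, watermark=150)
print([r["id"] for r in fresh], wm)  # [2, 3] 300
```

An “advanced expression”, in this framing, would replace the simple `r[key] > watermark` comparison with whatever predicate your use case needs to define “latest”.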
BigQuery ODF Support
When running on GCS, Workbook Sheets can be synced to BigQuery so that the data is available as soon as the Workbook finishes on the cluster. Previously, one would need to run both a Workbook and an Export Job and wait for both to finish before the data was available in BigQuery.
Separate Storage and Compute Cluster Mode
The initial implementation targets use cases that do not require a large compute cluster: Datameer’s Local Mode execution can now be used in production, storing results in cloud storage systems like S3
Supports scaling up instead of scaling out to simplify cloud deployments
Supports both single node and dual node (fail-over) cloud deployments
Compute and Storage are now pluggable to support other configurations if your use cases require it
Export Sheets (Beta)
Part of the Data Warehouse Connector extension point
Export to external systems from within a Workbook instead of using a separate Export Job
Allows for single Workbook (single job) ELT pipelines
Retired Features
Smart Analytics has been removed
Existing Smart Analytics Sheets have been migrated to empty sheets
There will be no replacement
Infographics has been removed
There will be no replacement
The ‘useraction.log’ file is no longer written; similar data is now available via the ‘event_logging.log’ file
The ‘plugin-event-logging’ plugin controls the style of logging in the ‘event_logging.log’ file:
BASIC → a more traditional, log-like style
VERBOSE → a JSONL file, with each event being a line of JSON
Documentation: Logging Events for Artifacts
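The VERBOSE (JSONL) style is convenient for programmatic consumption, since each line parses independently. A minimal reader sketch follows; the event fields shown are illustrative, not the actual Datameer schema:

```python
# Sketch of consuming a JSONL-style event log: one JSON object per line.
import io
import json

def read_events(stream):
    """Yield one dict per non-empty JSONL line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

sample = io.StringIO(
    '{"event": "workbook_saved", "user": "alice"}\n'
    '{"event": "import_started", "user": "bob"}\n'
)
events = list(read_events(sample))
print([e["event"] for e in events])  # ['workbook_saved', 'import_started']
```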
HiveServer1 connectivity was removed
HiveServer2 connectivity should be used instead
Specifically, our use of the Hive MetaStore Thrift API was removed
Plugins Removed:
Cobol
HBase
HTML Input Type
Mail
MBox
Text Mining Functions
Wikipedia
Distribution Changes
HDP
All HDP 2.x support was removed; HDP 3.1.x remains supported for now.
CDH
All CDH 5.x and 6.x support was removed except for CDH 6.3.4.
MapR
All MapR support has been removed.