New and Retired Features and Distributions
New Features
Data Warehouse Connections (Beta)
You can now ingest data directly into Spectrum via a Data Warehouse Connector. With a Data Warehouse Connection, you no longer need to create a separate Connection and execute a separate Import Job.
Why
Reduce the number of Datameer entities that need to be created and managed to access Tables and Views from Enterprise Data Warehouses
Simplify UX around finding Data Warehouse Objects to process within Workbooks
Reduce EDW resource usage by allowing filters to be pushed down to the warehouse on read
JDBC
Supports Import-Only (No Export Sheet at this time)
Drivers for Redshift, Postgres, and MySQL at this time
Easily extendable to other Databases with JDBC drivers
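Filter push-down (mentioned above) means the filter is compiled into the SQL sent to the warehouse, so only matching rows ever leave the EDW. The sketch below is illustrative, not Datameer's implementation; the function name, the `(column, op, value)` filter shape, and the `?` placeholder convention (as in a JDBC PreparedStatement) are assumptions:

```python
# Illustrative sketch: compile simple filters into the query sent to the
# warehouse, instead of fetching the whole table and filtering downstream.

def build_pushdown_query(table, columns, filters):
    """Build a SELECT that pushes simple comparison filters down to the
    warehouse. `filters` is a list of (column, op, value) tuples."""
    allowed_ops = {"=", "<", ">", "<=", ">=", "<>"}
    select = ", ".join(columns) if columns else "*"
    sql = f"SELECT {select} FROM {table}"
    clauses, params = [], []
    for col, op, value in filters:
        if op not in allowed_ops:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{col} {op} ?")  # '?' placeholder, JDBC-style
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

sql, params = build_pushdown_query(
    "orders", ["order_id", "total"],
    [("status", "=", "shipped"), ("total", ">", 100)],
)
print(sql)     # SELECT order_id, total FROM orders WHERE status = ? AND total > ?
print(params)  # ['shipped', 100]
```

Because the `WHERE` clause travels with the query, the warehouse does the filtering with its own indexes and statistics, which is what reduces EDW resource usage on read.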
Snowflake
Supports both Import and Export
Follows Snowflake best practices for loading data into Snowflake
Easily control access to and resources of Snowflake Objects and Warehouses
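For context on those best practices: Snowflake recommends staging files and bulk-loading them with COPY INTO rather than issuing row-by-row INSERTs. A minimal sketch of the statement pair is below; the file path, table name, and CSV options are illustrative, and this is not Datameer's actual loader:

```python
# Illustrative sketch of Snowflake's staged bulk-load pattern:
# 1) PUT uploads a local file to the table's internal stage (@%table)
# 2) COPY INTO bulk-loads the staged file(s) into the table

def snowflake_bulk_load_sql(local_file, table):
    """Return the (PUT, COPY INTO) statement pair for a staged bulk load."""
    put = f"PUT file://{local_file} @%{table} AUTO_COMPRESS=TRUE"
    copy = f"COPY INTO {table} FROM @%{table} FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    return put, copy

put_sql, copy_sql = snowflake_bulk_load_sql("/tmp/export.csv", "sales")
print(put_sql)
print(copy_sql)
```

Staged loads let Snowflake parallelize ingestion across the warehouse, and sizing that warehouse independently is how access and resources stay controllable.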
Pluggable
The Data Warehouse Connector is a new extension point that allows for new Data Warehouses to be supported via Datameer plugins
Easily upgrade existing or install new Data Warehouse Connectors by installing an updated plugin without having to upgrade the Datameer deployment
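The extension-point pattern described above can be illustrated with a small sketch: the host defines an abstract connector interface, plugins register implementations, and installing an updated plugin simply re-registers under the same name. Every name here is invented for illustration; Datameer's actual plugin API is not shown:

```python
# Hypothetical sketch of a connector extension point with a plugin registry.
from abc import ABC, abstractmethod


class DataWarehouseConnector(ABC):
    name: str

    @abstractmethod
    def read(self, query: str) -> list:
        """Run a query against the warehouse and return rows."""


CONNECTORS = {}


def register(connector):
    # Installing a newer plugin re-registers under the same name,
    # upgrading the connector without touching the core deployment.
    CONNECTORS[connector.name] = connector


class DummyConnector(DataWarehouseConnector):
    name = "dummy"

    def read(self, query: str) -> list:
        return [("ok", query)]


register(DummyConnector())
rows = CONNECTORS["dummy"].read("SELECT 1")
print(rows)  # [('ok', 'SELECT 1')]
```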
Workbook Incremental Processing
Reduce resource consumption on the compute cluster when Workbooks only need to process the “latest” data from appending Imports or other incremental Workbooks
Supports advanced expressions for selecting the “latest” data based on your use-case requirements
Selecting Partitions is not yet supported
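Incremental processing of “latest” data is commonly implemented with a watermark: only rows newer than the last processed value are handed to the job, and the watermark advances afterwards. The sketch below assumes a simple timestamp column; the names are illustrative, not Datameer's internals:

```python
# Illustrative watermark sketch: process only rows newer than the last run.

def incremental_rows(rows, watermark, key="ingested_at"):
    """Return rows newer than `watermark`, plus the advanced watermark."""
    fresh = [r for r in rows if r[key] > watermark]
    new_watermark = max((r[key] for r in fresh), default=watermark)
    return fresh, new_watermark

rows = [
    {"id": 1, "ingested_at": 100},
    {"id": 2, "ingested_at": 200},
    {"id": 3, "ingested_at": 300},
]
fresh, wm = incremental_rows(rows, watermark=150)
print([r["id"] for r in fresh], wm)  # [2, 3] 300
```

An “advanced expression”, in this framing, would replace the simple `r[key] > watermark` comparison with whatever predicate your use case needs to define “latest”.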
BigQuery ODF Support
When running on GCS, Workbook Sheets can be synced to BigQuery so that the data is available as soon as the Workbook finishes on the cluster. Previously, one would need to run both a Workbook and an Export Job and wait for both to finish before the data was available in BigQuery.
Separate Storage and Compute Cluster Mode
The initial implementation targets use cases that do not require a large compute cluster: Datameer’s Local Mode execution can now be used in production, storing results in cloud storage systems like S3
Supports scaling up instead of scaling out to simplify cloud deployments
Supports both single node and dual node (fail-over) cloud deployments
Compute and Storage are now pluggable to support other configurations if your use cases require it
Export Sheets (Beta)
Part of the Data Warehouse Connector extension point
Export to external systems from within a Workbook instead of using a separate Export Job
Allows for single Workbook (single job) ELT pipelines
Retired Features
Smart Analytics has been removed
Existing Smart Analytics Sheets have been migrated to empty sheets
There will be no replacement
Infographics has been removed
There will be no replacement
The ‘useraction.log’ file is no longer written; similar data is now available via the ‘event_logging.log’ file
The ‘plugin-event-logging’ plugin controls the style of logging in the ‘event_logging.log’ file:
BASIC → a more traditional, log-like style
VERBOSE → a JSONL file, with each event being a line of JSON
Documentation: Logging Events for Artifacts
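The VERBOSE (JSONL) style is convenient for programmatic consumption, since each line parses independently. A minimal reader sketch follows; the event fields shown are illustrative, not the actual Datameer schema:

```python
# Sketch of consuming a JSONL-style event log: one JSON object per line.
import io
import json

def read_events(stream):
    """Yield one dict per non-empty JSONL line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

sample = io.StringIO(
    '{"event": "workbook_saved", "user": "alice"}\n'
    '{"event": "import_started", "user": "bob"}\n'
)
events = list(read_events(sample))
print([e["event"] for e in events])  # ['workbook_saved', 'import_started']
```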
HiveServer1 connectivity was removed
HiveServer2 connectivity should be used instead
Specifically, our use of the Hive MetaStore Thrift API was removed
Plugins Removed:
Cobol
HBase
HTML Input Type
Mail
MBox
Text Mining Functions
Wikipedia
Distribution Changes
HDP
All HDP 2.x support was removed; HDP 3.1.x remains supported for now.
CDH
All CDH 5.x and 6.x support was removed except for CDH 6.3.4.
MapR
All MapR support has been removed.