General Product Description

INFO

This page provides the general product description for Datameer X.

Datameer Application at a Glance

Datameer is an agile data fabric for building end-to-end, self-service, code-free data pipelines. Between ingesting data and landing it at the destination, Datameer offers advanced data curation and exploration capabilities in a spreadsheet-style interface. Data pipelines can also be scheduled and executed at scale.

Enterprise security and governance integrations, strong data management and obfuscation capabilities, and automation/orchestration APIs meet IT needs and allow large deployments in a controlled environment that complies with regulatory requirements. Datameer can run on-premises as well as cloud-native on Google Cloud Platform, Amazon Web Services (AWS) and Microsoft Azure. Wherever it is deployed, it ships with deep integrations into the platform. Hybrid deployments bridge both worlds securely.

Security

Datameer integrates with the existing enterprise IT infrastructure. Datameer can be connected to existing shared user repositories such as LDAP, Active Directory or Okta. SAML/SSO-based authentication is supported as well, and the SAML/SSO integration can be customized with Datameer’s Authentication SDK API. If Datameer is configured against a Google DataProc cluster, GCP’s IAM and KMS can be leveraged.

Because Datameer can be deployed on-premises, it also integrates with Kerberos, Sentry and Ranger. Datameer can obfuscate and encrypt data during ingestion. The Function SDK API can be used if specific encryption or obfuscation methods are required.

Datameer supports encrypting data both in transit and at rest.

Monitoring & Administration

System administrators can monitor Datameer using Nagios or JMX. Datameer’s EventBus can be leveraged to send events to third-party systems. Audit logs contain information about user behavior.

Datameer can also be configured to send notifications for specific events. For system administrators or users who want to be notified about specific events, Datameer provides an EventBus SDK API.

Processing & Storage

Datameer can execute import jobs, data links, workbooks and export jobs on various distributed computation systems. Jobs can run on Google DataProc, Amazon EMR or Azure HDI as well as on the distributions of all major Apache Hadoop vendors. The data produced by Datameer can be stored in various distributed file systems such as Google Cloud Storage, Amazon S3 and Azure Data Lake Storage, as well as HDFS or a compatible DFS.

Web/Mobile Usage

Datameer is a web application that can be used with recent versions of Google Chrome (version 46+), Mozilla Firefox (version 42+), Microsoft Edge (version 12+), Apple Safari (version 9+) and Microsoft Internet Explorer (version 11+).

Analytics

Datameer provides a spreadsheet-like interface that lets you perform a wide range of analytics within your data pipeline.

To name a few examples of data transformations, it is possible to:

  • sort or filter data
  • aggregate and expand data
  • join, union or pivot data
  • de-duplicate data
  • apply data science operations like one-hot encoding or binned encoding
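
As a rough illustration of the last two operations in the list above, the following Python sketch shows what one-hot encoding and binned encoding do to a column of values. It is a generic, standard-library example and not Datameer's implementation; the function names `one_hot_encode` and `binned_encode` and the `edges` parameter are made up for this sketch.

```python
from bisect import bisect_right


def one_hot_encode(values):
    """Map each value to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one position is set per value
        encoded.append(row)
    return categories, encoded


def binned_encode(values, edges):
    """Replace each number with the index of the bin it falls into.

    `edges` are ascending bin boundaries (an assumption of this sketch);
    a value equal to an edge goes into the bin to the right of that edge.
    """
    return [bisect_right(edges, value) for value in values]


if __name__ == "__main__":
    categories, rows = one_hot_encode(["red", "green", "red", "blue"])
    print(categories)  # ['blue', 'green', 'red']
    print(rows)        # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
    print(binned_encode([3, 18, 42, 65], edges=[18, 40, 65]))  # [0, 1, 2, 3]
```

One-hot encoding turns a categorical column into one 0/1 column per category, while binned encoding reduces a numeric column to a small number of ordered buckets.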

Datameer provides more than 70 workbook functions in addition to a powerful Functions SDK API. The application also provides a point-and-click interface to parse your JSON data. Datameer provides information about your data profile and metrics such as the number of unique records and the minimum, mean and maximum values (where the data type supports these operations).
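
To make the profile metrics mentioned above concrete, here is a minimal Python sketch that computes them for a single column. It is only an illustration of the idea, not how Datameer computes its profiles; the function name `profile_column` and the rule of skipping `None` entries are assumptions of this sketch.

```python
from statistics import mean


def profile_column(values):
    """Return simple profile metrics for one column, ignoring None entries."""
    present = [v for v in values if v is not None]
    profile = {
        "records": len(values),
        "unique": len(set(present)),
    }
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    # Minimum, mean and maximum are reported only when every present value
    # is numeric, i.e. when the data type supports these operations.
    if present and len(numeric) == len(present):
        profile.update(min=min(numeric), mean=mean(numeric), max=max(numeric))
    return profile


if __name__ == "__main__":
    print(profile_column([4, 8, 8, None, 20]))
    print(profile_column(["a", "b", "a"]))
```

For the numeric column the result contains the record count, the unique count, and the minimum, mean and maximum; for the string column only the two counts are reported.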

Datameer also provides four algorithms well known to data scientists: Clustering, Decision Tree, Column Dependencies and Recommendations.

Data Sources & Sinks

Datameer can ingest data from and export data to a wide range of third-party systems, and provides read/write connectors for:

  • cloud native data warehouses, e.g. BigQuery, Snowflake, Redshift, ...
  • cloud data lakes, e.g. Google Cloud Storage, Amazon S3, Azure Data Lake Storage, ...
  • on-premises systems, e.g. SFTP, Hive or HBase, ...
  • JDBC-based systems, e.g. Exasol, Netezza or Teradata, ...
  • files, e.g. CSV, JSON, Parquet, ORC, Avro or COBOL copybooks, ...
  • web services, e.g. Salesforce, Marketo, ...

If a data source or data sink you need is not covered, Datameer provides a powerful pluggable Connector SDK API.

Data Integration & Management

Once the data pipeline design is complete, Datameer lets you schedule single artifacts or entire data pipelines in either a time-based or a data-event-driven fashion.

Datameer provides the retention policy modes “Append”, “Replace” and “Sliding Time Window”. Each artifact in Datameer has a JSON representation and can be versioned. Datameer’s Git repository integration helps you manage your different artifact versions.
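
The three retention policy modes can be pictured with a short Python sketch. The semantics shown here are an assumption made for illustration (append keeps everything, replace keeps only the latest run, sliding time window keeps batches newer than a cutoff) and may differ in detail from Datameer's behavior. The function `apply_retention`, the mode strings and the `(timestamp, rows)` batch layout are hypothetical.

```python
from datetime import datetime, timedelta


def apply_retention(existing, incoming, mode, window=None, now=None):
    """Return the batches kept after a run under the given retention mode.

    Each batch is a (timestamp, rows) tuple. All names and semantics here
    are assumptions of this sketch, not Datameer's API.
    """
    if mode == "append":
        # Keep all earlier batches and add the new ones.
        return list(existing) + list(incoming)
    if mode == "replace":
        # Discard earlier batches; keep only the new ones.
        return list(incoming)
    if mode == "sliding_time_window":
        if window is None:
            raise ValueError("sliding_time_window requires a window")
        cutoff = (now or datetime.now()) - window
        # Keep only batches whose timestamp lies inside the window.
        return [b for b in list(existing) + list(incoming) if b[0] >= cutoff]
    raise ValueError(f"unknown retention mode: {mode}")


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    old = [(t0, ["a"])]
    new = [(t0 + timedelta(days=3), ["b"])]
    print(apply_retention(old, new, "append"))
    print(apply_retention(old, new, "replace"))
    print(apply_retention(old, new, "sliding_time_window",
                          window=timedelta(days=2),
                          now=t0 + timedelta(days=3)))
```

In the example, the two-day window drops the batch from day 0 and keeps the batch from day 3, so the sliding window result matches the replace result for this particular input.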

A Datameer workbook can be configured to run in production mode, which computes only the data transformations that are actually required. This optimizes the workbook by reducing the compute and storage resources it consumes.

Datameer’s Open Data Format allows end users to expose a workbook’s data in Hive without copying the data.

Datameer also comes with strong metadata management features such as tags and search, full lineage analysis, tracking of different metrics of a Datameer job, and descriptions at the artifact, sheet and column levels. When you change a Datameer artifact, Datameer informs you about the potential impact on dependent artifacts such as downstream workbooks.

REST Interface

You can automate tasks using Datameer’s REST API. You can start, stop or monitor jobs, and you can also use the REST API to create or update your artifacts or data pipelines.
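
The following Python sketch shows one way such automation could be scripted with the standard library. It is an assumption-laden illustration: the endpoint paths, the use of HTTP basic authentication, the host, the credentials and the helper names `build_request` and `send` are placeholders and not confirmed by this page, so check them against the REST API reference for your Datameer version before use.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint paths, used only for illustration.
START_JOB_PATH = "/rest/job-execution?configuration={config_id}"
JOB_STATUS_PATH = "/rest/job-execution/job-status/{execution_id}"


def build_request(base_url, path, user, password, method="GET", payload=None):
    """Build (but do not send) an HTTP request with basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body, if any."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body else None


if __name__ == "__main__":
    # Placeholder host, credentials and job configuration ID.
    start = build_request(
        "http://localhost:8080",
        START_JOB_PATH.format(config_id=42),
        "admin",
        "secret",
        method="POST",
    )
    print(start.get_method(), start.full_url)
    # Calling send(start) would perform the request against a running server.
```

Separating request construction from sending keeps the script easy to inspect and test without a running server.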