
Welcome to Datameer X, your unified platform for the entire big data lifecycle.

The Datameer X approach is focused on ease of use for business users without imposing an additional load on the IT organization.

  • A familiar interactive spreadsheet interface allows you to easily navigate and analyze large data volumes
  • Over 175 built-in functions offer business users powerful but easy-to-use data analytics for exploring and discovering complex relationships
  • Drag-and-drop reporting and charting make it easy to create and customize personalized data visualizations (infographics)
  • A plug-in API supports building custom functions

...

Use Datameer X to analyze customer relationship management (CRM) content, web logs, customer data, sales data, social media content, and even data from Excel files. You can store that data on your own servers or use a cloud service such as Amazon Web Services.

Datameer X provides a familiar, interactive spreadsheet-based interface that is easy to use yet powerful, so you don’t need to turn to developers for analytics. The spreadsheet is specifically designed for visualizing big data and includes more than 200 built-in functions for exploring and discovering complex relationships. In addition, because Datameer X is extensible, you can use functions from third-party tools or write your own.

...

Your existing data feeds into Datameer X, where Hadoop manages and distributes both the data and the computational load over multiple networked computers.
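As a conceptual illustration only (not Datameer's or Hadoop's actual code), the following Python sketch simulates how a MapReduce-style job spreads an aggregation across machines: each data partition is mapped independently, the intermediate key/value pairs are grouped by key (the "shuffle"), and a reduce step combines each group. The partitions, the `region`/`amount` columns, and the function names are hypothetical, and a thread pool stands in for the cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_partition(rows):
    """Map step: emit (key, value) pairs for one partition of the data."""
    return [(row["region"], row["amount"]) for row in rows]

def shuffle(mapped_partitions):
    """Shuffle step: group all intermediate values by key."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_groups(groups):
    """Reduce step: combine the values for each key (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}

def run_job(partitions):
    # In a real cluster each partition lives on a different node;
    # here a thread pool stands in for the worker machines.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_partition, partitions))
    return reduce_groups(shuffle(mapped))

partitions = [
    [{"region": "EU", "amount": 10}, {"region": "US", "amount": 5}],
    [{"region": "EU", "amount": 7}],
]
print(run_job(partitions))  # {'EU': 17, 'US': 5}
```

Because the map step touches only its own partition, adding machines lets more partitions be processed at the same time; only the shuffle moves data between nodes.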

The Datameer X tools allow you to easily extract, transform, and load (ETL) data from multiple sources, including your current transactional database systems, regardless of format. You can then analyze relationships in the data using the interactive spreadsheet interface and visualize the results of that analysis using the built-in infographic widgets.

Datameer X is specifically designed to solve the challenges of accessing, analyzing, and using massive amounts of data, leveraging Apache Hadoop open source technology. Datameer X enables enterprises to gain insights cost-effectively from all available data sources, regardless of size.

A massively parallel processing architecture enables fast execution of complex analytics. Hadoop scales to 4000 servers and petabytes of data, and the application processes are fully parallelized inside Hadoop clusters. This dynamic workload optimization uses hardware more efficiently.

Datameer X includes built-in fault resilience for high application availability, and elastic expansion to increase storage capacity dynamically without system downtime. Advanced data compression increases performance and decreases storage requirements.

Connect to Various Types of Data

Each type of data is set up as a connection so that Datameer X can use it. For example, you can have sales data from an Oracle or MySQL server, other content from a CSV file exported from Excel, Twitter feeds about your company and products, and customer call logging data from yet another source. You can easily pull all that information into Datameer X.

Process and Analyze Data

You create a workbook in Datameer X that connects to one or more of these data sources, which you can then use for analysis. For example, you could base your analysis on sales data from your corporate database, Twitter feeds, and customer call logging data, all from different sources. See Configuring a Connection to learn how to connect to one or more data sources.

Export Data

The data saved with a workbook is available for use when creating an export job. You choose which data gets saved on the Save Workbook settings page. See Exporting Data to learn more.

Optimize Performance

By saving only the information you need, you can conserve disk storage space and reduce the time needed to run your jobs. In the Save Workbook settings, you can discard the intermediate steps. See Working With Workbooks to learn more.

System administrators can also improve performance by tuning the configuration of the Hadoop cluster. See the Hadoop and Datameer X page.

Data Storage

Datameer X preserves workbook state information, including column view information such as width and position, in the File Browser's localStorage. At no point does Datameer X store actual data in the browser cache.

...


This page provides the general product description for Datameer X.


Datameer Application at a Glance

Datameer is an agile data fabric for building end-to-end, self-service, code-free data pipelines. Between ingesting data and landing it at the destination, Datameer offers advanced data curation and exploration capabilities in a spreadsheet-style interface. Furthermore, data pipelines can be scheduled and executed at scale.

Enterprise security and governance integrations, strong data management and obfuscation capabilities, and automation/orchestration APIs meet IT needs and allow large deployments in a controlled environment that complies with regulatory requirements. Datameer can run on-premises as well as cloud-native on Google Cloud Platform, Amazon Web Services (AWS), and Microsoft Azure. Wherever it is deployed, it ships with deep integrations into the platform. Hybrid deployments can securely bridge both worlds.

Security

Datameer integrates with existing enterprise IT infrastructure. You can connect Datameer to existing shared user repositories such as LDAP, Active Directory, or Okta. SAML/SSO-based authentication is supported as well and can be customized with Datameer’s Authentication SDK API. If Datameer is configured against a Google Dataproc cluster, GCP’s IAM and KMS can be used.

Because Datameer can be deployed on-premises, it also integrates with Kerberos, Apache Sentry, and Apache Ranger. Datameer can obfuscate and encrypt data during ingestion. The Function SDK API can be used if specific encryption or obfuscation methods are required.
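To illustrate what obfuscation during ingestion can mean in practice, here is a minimal Python sketch of one common technique, keyed pseudonymization with HMAC-SHA256: each sensitive value is replaced by a stable, non-reversible token. This is an assumption-based example, not Datameer's built-in obfuscation or the Function SDK API; the `SECRET_KEY` constant and the column names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key management service.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic, non-reversible hex token for a sensitive value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def obfuscate_rows(rows, sensitive_columns):
    """Pseudonymize the given columns in each row and leave the others intact."""
    return [
        {col: pseudonymize(val) if col in sensitive_columns else val
         for col, val in row.items()}
        for row in rows
    ]

rows = [{"email": "ann@example.com", "amount": 42}]
print(obfuscate_rows(rows, {"email"}))
```

Because the same input and key always yield the same token, joins and distinct counts on the obfuscated column still work, while the original values cannot be read back without the key.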

Datameer supports encrypting data both in transit and at rest.

Monitoring & Administration

System administrators can monitor Datameer with Nagios or JMX. Datameer’s EventBus can be used to send events to third-party systems. Audit logs contain information about user behavior.

Datameer can also be configured to send notifications for specific events. For system administrators or users who want to be notified about specific events, Datameer provides an EventBus SDK API.

Processing & Storage

Datameer can execute import jobs, data links, workbooks, and export jobs on various distributed computing systems. Jobs can run on Google Dataproc, Amazon EMR, or Azure HDInsight, as well as on all major Apache Hadoop distributions. The data produced by Datameer can be stored in various distributed file systems such as Google Cloud Storage, Amazon S3, and Azure Data Lake Storage, as well as HDFS or a compatible distributed file system.

Web/Mobile Usage

Datameer is a web application that can be used with recent versions of Google Chrome (version 46+), Mozilla Firefox (version 42+), Microsoft Edge (version 12+), Apple Safari (version 9+), and Microsoft Internet Explorer (version 11+).

Analytics

Datameer provides a spreadsheet-like interface that lets you perform many types of analytics on the data in your pipeline.

To name a few examples of data transformations, you can:

  • sort or filter data
  • aggregate and expand data
  • join, union or pivot data
  • de-duplicate data
  • apply data science operations such as one-hot encoding or binned encoding
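To make the two data science operations in the list concrete, here is a minimal Python sketch of one-hot encoding and binned encoding as they are generally defined. It illustrates the concepts only and does not reproduce Datameer's workbook functions; the function names and the bin edges are hypothetical.

```python
from bisect import bisect_right

def one_hot(values):
    """One-hot encode a categorical column: one 0/1 indicator column per category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def binned(values, edges):
    """Binned encoding: map each numeric value to the index of its bin.

    `edges` are sorted, inclusive lower bounds of bins 1..n;
    values below edges[0] fall into bin 0.
    """
    return [bisect_right(edges, v) for v in values]

cats, rows = one_hot(["red", "blue", "red"])
print(cats, rows)                                    # ['blue', 'red'] [[0, 1], [1, 0], [0, 1]]
print(binned([3, 15, 27, 40], edges=[10, 20, 30]))  # [0, 1, 2, 3]
```

One-hot encoding turns a text category into numeric columns that algorithms can consume, while binning reduces a continuous value to a small number of ranges.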

Datameer provides more than 70 workbook functions as well as a powerful Functions SDK API. The application also provides a point-and-click interface for parsing JSON data. Datameer reports data profile metrics such as the number of unique records and the minimum, mean, and maximum values (where the data type supports these operations), ...
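As a rough sketch of what such column profiling computes, the following Python function counts values and distinct values and adds minimum, mean, and maximum only when every present value is numeric. It is a hypothetical illustration, not Datameer's profiling engine, and it ignores missing values (`None`) for the distinct count and the statistics.

```python
from statistics import mean

def profile_column(values):
    """Compute simple profile metrics for one column, ignoring missing values."""
    present = [v for v in values if v is not None]
    profile = {"count": len(values), "unique": len(set(present))}
    # Min/mean/max only make sense for types that support ordering and arithmetic.
    if present and all(isinstance(v, (int, float)) for v in present):
        profile.update(min=min(present), mean=mean(present), max=max(present))
    return profile

print(profile_column([4, 8, None, 8, 10]))
# {'count': 5, 'unique': 3, 'min': 4, 'mean': 7.5, 'max': 10}
print(profile_column(["a", "b", "a"]))
# {'count': 3, 'unique': 2}
```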

Datameer also provides four algorithms well known to data scientists: Clustering, Decision Tree, Column Dependencies, and Recommendations.

Data Sources & Sinks

Datameer can ingest data from and export data to a wide range of third-party systems, and provides read/write connectors for:

  • cloud native data warehouses, e.g. BigQuery, Snowflake, Redshift, ...
  • cloud data lakes, e.g. Google Cloud Storage, Amazon S3, Azure Data Lake Storage, ...
  • on-premises systems, e.g. SFTP, Hive or HBase, ...
  • JDBC-based systems, e.g. Exasol, Netezza or Teradata, ...
  • files, e.g. CSV, JSON, Parquet, ORC, Avro or COBOL copybooks, ...
  • web services, e.g. Salesforce, Marketo, ...

If a data source or data sink you need is not covered, Datameer provides a pluggable Connector SDK API.

Data Integration & Management

Once the data pipeline design is complete, Datameer lets you schedule single artifacts or entire data pipelines on a time-based or data-event-driven basis.

Datameer provides the retention policy modes “Append”, “Replace”, and “Sliding Time Window”. Each artifact in Datameer has a JSON representation and can be versioned. Datameer’s Git repository integration helps you manage the different versions of your artifacts.
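The three retention modes can be pictured with a small Python sketch. This is a conceptual model under stated assumptions, not Datameer's implementation: records are hypothetical dicts with a `ts` timestamp, and the mode strings `"append"`, `"replace"`, and `"sliding"` are made-up stand-ins for “Append”, “Replace”, and “Sliding Time Window”.

```python
from datetime import datetime, timedelta

def apply_retention(existing, new_batch, mode, window=None, now=None):
    """Combine stored records with a new batch according to a retention mode.

    Each record is a dict with a 'ts' datetime. Modes (conceptual):
      - "append":  keep everything and add the new batch
      - "replace": discard stored records and keep only the new batch
      - "sliding": append, then drop records older than `window`
    """
    if mode == "append":
        return existing + new_batch
    if mode == "replace":
        return list(new_batch)
    if mode == "sliding":
        if window is None:
            raise ValueError("sliding mode requires a window")
        cutoff = (now or datetime.now()) - window
        return [r for r in existing + new_batch if r["ts"] >= cutoff]
    raise ValueError(f"unknown retention mode: {mode}")

now = datetime(2024, 1, 10)
old = [{"ts": datetime(2024, 1, 1), "v": 1}]
new = [{"ts": datetime(2024, 1, 9), "v": 2}]
print(len(apply_retention(old, new, "append")))   # 2
print(len(apply_retention(old, new, "replace")))  # 1
print([r["v"] for r in apply_retention(old, new, "sliding", timedelta(days=7), now)])  # [2]
```

A sliding window keeps storage bounded for continuously arriving data, whereas “Append” grows without limit and “Replace” keeps only the latest run.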

A Datameer workbook can be configured to run in production mode, which computes only the data transformations that are actually required. This optimizes the workbook by reducing the compute and storage resources it uses.
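One way to picture "compute only what is required" is a reachability search over the workbook's sheet dependencies: starting from the sheets you keep, follow each sheet back to the sheets it reads from, and skip everything else. The Python sketch below assumes a hypothetical workbook model (a dict of sheet names to their input sheets); it illustrates the idea and is not how Datameer implements production mode.

```python
def required_sheets(dependencies, kept):
    """Return every sheet that must be computed to produce the kept sheets.

    `dependencies` maps each sheet to the sheets it reads from.
    """
    required, stack = set(), list(kept)
    while stack:
        sheet = stack.pop()
        if sheet not in required:
            required.add(sheet)
            stack.extend(dependencies.get(sheet, []))
    return required

# Hypothetical workbook: "DebugSample" is not needed for the kept sheet.
workbook = {
    "Sales": [],
    "Cleaned": ["Sales"],
    "ByRegion": ["Cleaned"],
    "DebugSample": ["Sales"],
}
print(sorted(required_sheets(workbook, ["ByRegion"])))  # ['ByRegion', 'Cleaned', 'Sales']
```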

Datameer’s Open Data Format allows end users to expose a workbook’s data to Hive without copying it.

Datameer also comes with strong metadata management features such as tags and search, full lineage analysis, tracking of various Datameer job metrics, and descriptions at the artifact, sheet, and column levels. When changes are made to Datameer artifacts, Datameer informs you about the potential impact on dependent artifacts such as downstream workbooks.
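Impact analysis of this kind is typically a breadth-first traversal of the lineage graph in the downstream direction. The following Python sketch assumes a hypothetical lineage map from each artifact to the artifacts that read from it; the artifact names are invented and the code is not Datameer's lineage service.

```python
from collections import deque

def downstream_impact(lineage, changed):
    """Return all artifacts that directly or transitively depend on `changed`.

    `lineage` maps each artifact to the artifacts that read from it.
    """
    impacted, queue = set(), deque([changed])
    while queue:
        for dependent in lineage.get(queue.popleft(), []):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted

# Hypothetical pipeline: an import job feeds a workbook, which feeds
# another workbook and an export job.
lineage = {
    "crm_import": ["customers_wb"],
    "customers_wb": ["churn_wb", "crm_export"],
    "churn_wb": ["churn_export"],
}
print(sorted(downstream_impact(lineage, "crm_import")))
# ['churn_export', 'churn_wb', 'crm_export', 'customers_wb']
```

The `impacted` set doubles as a visited set, so the traversal also terminates on lineage graphs that contain cycles.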

REST Interface

You can automate tasks by using Datameer’s REST API, for example, to start, stop, or monitor jobs. You can also use the REST API to create or update your artifacts or data pipelines.
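A REST call from a script might look like the following Python sketch, which uses only the standard library to build authenticated requests. The server URL, the credentials, and both endpoint paths are assumptions for illustration; check the REST API reference for your Datameer version before relying on them. The requests are built but not sent, so the sketch runs without a server.

```python
import base64
import json
import urllib.request

# Hypothetical values: replace with your server URL, credentials, and the
# endpoint paths documented for your Datameer version.
BASE_URL = "https://datameer.example.com:8443"
USER, PASSWORD = "admin", "secret"

def build_request(method, path, payload=None):
    """Build an HTTP request with Basic authentication and an optional JSON body."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    data = json.dumps(payload).encode() if payload is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("Authorization", f"Basic {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request

def start_job(config_id):
    # Assumed endpoint path for triggering a job run.
    return build_request("POST", f"/rest/job-execution?configuration={config_id}")

def job_status(config_id):
    # Assumed endpoint path for polling a job's status.
    return build_request("GET", f"/rest/job-configuration/job-status/{config_id}")

req = start_job(42)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req) returns the HTTP response.
```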