Supported External Data Types and Sources

INFO

Datameer X supports the following structured, semi-structured, and unstructured data types and sources/connectors for importing and exporting.

Supported External Connectors

Supported External Connector

Description

Available for Import

Available for Export

Notes

Connectors for Web Services

Amazon EC2

  • a secure and resizable compute capacity to support virtually any workload

 

 

 

Atlassian Jira

  • an issue and project tracking software from Atlassian; data is imported from Jira via JQL (Jira Query Language)

 

 

 

GitHub

  • a hosting platform for Git repositories, which provide distributed version management of files

 

 

 

Google AdSense

  • an online service that displays advertising on websites outside of in-house offerings

 

 

 

Google Analytics

  • a tracking tool used for traffic analysis of websites

 

 

 

Google Fusion Tables

  • a web service provided by Google for data management

  • Fusion Tables can be used to collect, visualize and share data tables

 

 

 

Google Plus

  • Google's former social network

 

 

 

Google Spreadsheet

  • Google's spreadsheet program that is part of the free, web-based Google Docs Editors Suite

 

 

 

Marketo REST - Lead List

  • a marketing automation software for account-based marketing

 

 

 

Marketo SOAP - Lead Activity

  • creation, retrieval and removal of entities and data stored within Marketo

 

 

 

New York Times API

  • search New York Times articles from 1981 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia, and other article metadata

 

 

 

Salesforce

  • an enterprise cloud computing firm that specializes in social and mobile cloud technologies, including sales and CRM applications helping companies connect with customers, partners, and employees

 

 

 

Zendesk

  • a cloud-based customer support platform 

 

 

 

Connectors for Files

Apache Knox WebHDFS

  • a REST API and application gateway for the Hadoop ecosystem

 

Amazon S3

  • Amazon's Cloud online storage to store, organize and save data

 

Azure Blob Storage

  • a Microsoft storage service for large unstructured binary and text data. Available for Datameer X users on HDP 2.0+ and CDH 4+. Contact Datameer X services for the connector plug-in

  • (For HDP 2.0, 2.2, and CDH 4+ users) Contact Datameer X services for info

Custom Protocol

  • can be assigned the same name as a pre-defined protocol, in order to extend the number of IP addresses or ports associated with the original protocol

 

Datameer X Server Filesystem

  • the local Datameer X filesystem

 

FTP (File Transfer Protocol)

  • a standard network protocol used to transfer files from one host to another over a TCP-based network, such as the internet

 

HDFS (Hadoop Distributed File System)

  • a distributed file system used by Hadoop applications that creates multiple replicas of data blocks and distributes them on nodes throughout a cluster to allow extremely rapid computations

 

OpenStack Swift

  • offers cloud storage software so that you can store and retrieve lots of data with a simple API

  • is built for scale and optimized for durability, availability, and concurrency across the entire data set

  • is ideal for storing unstructured data that can grow without bound

 

SFTP (SSH File Transfer Protocol)

  • transfers files and encrypts both commands and data, preventing passwords and sensitive information from being transmitted openly over the network

 

SSH (Secure Shell)

  • a set of Unix utilities, including SCP and SFTP, that use public-key cryptography and encryption to securely transfer files between Unix file systems

INFO: Datameer X supports Bitvise SSH Server/Client for the Windows platform. When creating the connection, specify root paths in a form such as: /c:/mydata/folder1
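As a rough illustration of the root path format in the note above, the following Python sketch converts a Windows-style path into that form. The function name `to_ssh_root_path` is hypothetical, and the mapping (lowercased drive letter, forward slashes, leading slash) is an assumption inferred only from the single example `/c:/mydata/folder1`; it is not taken from a Datameer X or SSH server API.

```python
from pathlib import PureWindowsPath


def to_ssh_root_path(windows_path: str) -> str:
    """Convert e.g. 'C:\\mydata\\folder1' to '/c:/mydata/folder1'.

    Hypothetical helper: the mapping is inferred from the documented
    example only, so verify the result against your SSH server.
    """
    path = PureWindowsPath(windows_path)
    # Accept only plain drive letters such as 'C:' (no UNC shares).
    if len(path.drive) != 2:
        raise ValueError(f"expected a path with a drive letter: {windows_path!r}")
    drive = path.drive.lower()   # 'C:' -> 'c:'
    folders = path.parts[1:]     # skip the anchor, keep the folder names
    return "/" + "/".join([drive, *folders])


if __name__ == "__main__":
    print(to_ssh_root_path(r"C:\mydata\folder1"))
```

The sketch rejects paths without a drive letter rather than guessing, since the note does not say how such paths should be written.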

 

 

MapR FS

  • a clustered file system that supports both very large-scale and high-performance uses

 

Connectors for Cloud Storage

Amazon Redshift - Fast Load

  • a fast export method that loads your data into S3 and then copies it to your Redshift database

 

Azure Data Lake Storage Gen 2

  • a set of capabilities dedicated to big data analytics, built on Azure Blob storage

  • Data Lake Storage Gen2 is the result of converging the capabilities of the two existing storage services, Azure Blob storage and Azure Data Lake Storage Gen1

 

Google Cloud Storage

  • a RESTful file hosting web service for storing and retrieving data on Google Cloud Platform infrastructure

  • the service combines the performance and scalability of Google's cloud with advanced security and sharing features

 

S3 Native

  • Amazon's Cloud online storage to store, organize and save data

 

Snowflake

  • a comprehensive data platform provided as Software-as-a-Service (SaaS)

  • enables data storage and analytic solutions

 

Connectors for Databases

INFO

Relational databases include Oracle, DB2, and MySQL.

Amazon Athena

  • a query service for running SQL queries against data stored in Amazon S3

 

Amazon Redshift

  • a quick, scalable data warehouse as a service from the cloud

  • The native Amazon Redshift JDBC 4.1 driver or a PostgreSQL JDBC driver can be used

Azure Cosmos DB

  • a fully managed NoSQL database service

 

Azure Databricks

  • an Apache Spark-based analytics service with an interactive workspace

 

Azure Synapse

  • an analytics service that lets you query data flexibly, using either serverless on-demand resources or provisioned resources at scale

 

DB2

  • IBM's relational database management system

 

Greenplum

  • an open-source massively parallel processing (MPP) database

 

HSQL_file

  • a lightweight, 100% Java SQL Database Engine

 

MSSQL

  • Microsoft SQL Server, a relational database based on Structured Query Language (SQL)

 

MySQL

  • a relational database based on structured query language

 

Netezza

  • a column-oriented database management system

 

Oracle

  • a relational database management system designed for grid computing; includes CLOB support for importing data

 

PostgreSQL

  • an object-relational database management system (ORDBMS)

 

Sybase IQ

  • a column-based, relational database software

 

Teradata Aster

  • a relational database based on structured query language

  • Teradata database needs to be configured to support the appropriate character set

Vertica 5.1+

  • a grid-based and column-oriented analytic database software

 

Other Connectors

Datameer Spotlight

  • gives organizations fast access and deep visibility into all of their enterprise data assets - whether in the cloud or on-premises - via a single unified self-service platform 

  • with Datameer Spotlight business teams can discover, access, collaborate and analyze more data for faster, more trusted cloud analytics while eliminating complex data movement and maintaining strong governance

 

Google BigQuery

  • Google's fully managed data warehouse for petabyte-scale analytics

 

Hive (JDBC)

  • an open source data warehouse system for querying and analysing large data sets stored in Hadoop

 

Hive Server2 (JDBC)

  • a service that enables clients to execute queries against Hive

  • it supports multi-client concurrency and authentication

  • provides support for open API clients like JDBC and ODBC

 

Knox Hive Server2 JDBC

  • the secured access path, through the Apache Knox gateway, to a running Hive Server2 JDBC instance

 

Power BI

  • a business analytics service provided by Microsoft

  • provides interactive visualizations with self-service business intelligence capabilities

 

Tableau Server

  • a visual analytics platform that hosts Tableau workbooks, data sources, and more

  • minimum CentOS 7 as operating system

  • requirements for the Hadoop cluster's operating system libraries:

    • GNU C Library (libc6) version >= 2.15

    • GNU Standard C++ Library v3 (libstdc++6) version >= 6.1.0

Datameer X can split large files across multiple mappers, enabling parallel data ingestion. Two requirements must be met for this to be possible.

  1. The file protocol must support splitting. Currently, splitting is supported for all of the protocols above.

  2. The compression type must support splitting. Currently, LZO and Gzip are splittable; zip and Bz2 are not.

See Importing Data for more information.
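The two requirements above can be sketched as a simple check. This is only an illustration in Python, not Datameer X code: the function `can_split` and the set `SPLITTABLE_COMPRESSION` are hypothetical names, and treating uncompressed files as splittable is an assumption that the text does not state explicitly.

```python
from typing import Optional

# Compression types the text lists as splittable (LZO and Gzip).
# zip and Bz2 are listed as not supported, so they are absent here.
SPLITTABLE_COMPRESSION = {"lzo", "gzip"}


def can_split(protocol_supports_splitting: bool, compression: Optional[str]) -> bool:
    """Return True only if both requirements for splitting are met."""
    # Requirement 1: the file protocol must support splitting.
    if not protocol_supports_splitting:
        return False
    # Assumption: a file without compression has no second restriction.
    if compression is None:
        return True
    # Requirement 2: the compression type must be splittable.
    return compression.lower() in SPLITTABLE_COMPRESSION


if __name__ == "__main__":
    for codec in ("lzo", "gzip", "zip", "bz2", None):
        print(codec, can_split(True, codec))
```

Both conditions are combined with a logical AND, so a splittable compression type on a protocol that cannot be split still results in a single mapper reading the file.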

Supported External Data Types

INFO

You can import or upload individual sheets from a spreadsheet by first converting the file to a .CSV file type.

File Type

Description

Available for Import
