Supported External Connector | Description | Available for Import | Available for Export | Notes |
---|
Connectors for Web Services |
Amazon EC2 | - a secure and resizable compute capacity to support virtually any workload
|
|
|
|
Atlassian Jira | - issue and project tracking software from Atlassian; data is imported from Jira via JQL (Jira Query Language)
|
|
|
|
GitHub | - a hosting service for Git repositories, used for distributed version management of files
|
|
|
|
Google AdSense | - an online service that displays advertising on websites outside of Google's own offerings
|
|
|
|
Google Analytics | - a tracking tool used for traffic analysis of websites
|
|
|
|
Google Fusion Tables | - a web service provided by Google for data management
- Fusion Tables can be used to collect, visualize and share data tables
|
|
|
|
Google Plus | - Google's former social network
|
|
|
|
Google Spreadsheet | - Google's spreadsheet program that is part of the free, web-based Google Docs Editors Suite
| |
|
|
Marketo REST - Lead List | - a marketing automation software for account-based marketing
| |
|
|
Marketo Soap - Lead Activity | - creation, retrieval and removal of entities and data stored within Marketo
|
|
|
|
New York Times API | - search New York Times articles from 1981 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia, and other article metadata
|
|
|
|
Salesforce | - an enterprise cloud computing firm that specializes in social and mobile cloud technologies, including sales and CRM applications helping companies connect with customers, partners, and employees
|
|
|
|
Twitter REST API | - perform searches on Twitter
|
|
|
|
Zendesk | - a cloud-based customer support platform
|
|
|
|
Connectors for Files |
Apache Knox WebHDFS | - is the REST API and application gateway for the Hadoop ecosystem
| | |
|
Amazon S3 | - Amazon's Cloud online storage to store, organize and save data
| | |
|
Azure Blob Storage | - a Microsoft storage service for large unstructured binary and text data. Available for Datameer X HDP 2.0+ and CDH 4+ users. Please contact our services department for the connector plug-in
| | | - (For HDP 2.0, 2.2 and CDH 4+ users) Contact Datameer X services for info
|
Custom Protocol | - can be assigned the same name as a pre-defined protocol, in order to extend the number of IP addresses or ports associated with the original protocol
| | |
|
Datameer X Server Filesystem | - the local Datameer X filesystem
| | |
|
FTP (File Transfer Protocol) | - a standard network protocol used to transfer files from one host to another over a TCP-based network, such as the internet
| | |
|
HDFS (Hadoop Distributed File System) | - a distributed file system used by Hadoop applications that creates multiple replicas of data blocks and distributes them on nodes throughout a cluster to allow extremely rapid computations
| | |
|
OpenStack Swift | - offers cloud storage software so that you can store and retrieve lots of data with a simple API
- is built for scale and optimized for durability, availability, and concurrency across the entire data set
- is ideal for storing unstructured data that can grow without bound
| | |
|
SFTP (SSH File Transfer Protocol) | - transfers files and encrypts both commands and data, preventing passwords and sensitive information from being transmitted openly over the network
| | |
|
SSH (Secure Shell) | - a set of Unix utilities, including SCP and SFTP, that uses public-key cryptography and encryption to let you securely transfer files between Unix file systems
INFO: Datameer X supports Bitvise SSH Server/Client for the Windows platform. The root path specified when creating the connection should look something like: /c:/mydata/folder1
| | |
|
MapR FS | - a clustered file system that supports both very large-scale and high-performance uses
| | |
|
Connectors for Cloud Storages |
Amazon Redshift - Fast Load | - a fast export method that loads your data into S3 and then copies it to your Redshift database
| | |
|
Azure Data Lake Storage Gen 2 | - a set of capabilities dedicated to big data analytics, built on Azure Blob storage
- Data Lake Storage Gen2 is the result of converging the capabilities of the two existing storage services, Azure Blob storage and Azure Data Lake Storage Gen1
| | |
|
Google Cloud Storage | - is a REST file hosting web service for storing and retrieving data on Google Cloud Platform infrastructure
- the service combines the performance and scalability of Google's cloud with advanced security and sharing features
| | |
|
S3 Native | - Amazon's Cloud online storage to store, organize and save data
| | |
|
Snowflake | - a comprehensive data platform provided as Software-as-a-Service (SaaS)
- enables data storage and analytic solutions
| | |
|
Connectors for Databases |
INFO: Relational databases include Oracle, DB2, and MySQL.
|
Amazon Athena | - a query service for running SQL queries against data stored in Amazon S3
| | |
|
Amazon Redshift | - a quick, scalable data warehouse as a service from the cloud
| | | - Either the native Amazon Redshift JDBC 4.1 driver or a PostgreSQL JDBC driver can be used
|
Azure Cosmos DB | - a fully managed NoSQL database service
| | |
|
Azure Databricks | - an Apache Spark based analytics service with an interactive workspace
| | |
|
Azure Synapse | - a limitless analytics service that lets you query data on your own terms, using either on-demand serverless resources or provisioned resources at scale
| | |
|
DB2 | - IBM's relational database management system
| | |
|
Greenplum | - an open-source massively parallel processing (MPP) database
| | |
|
HSQL_file | - a lightweight, 100% Java SQL Database Engine
| | |
|
MSSQL | - Microsoft SQL Server, a relational database management system based on Structured Query Language (SQL)
| | |
|
MySQL | - a relational database based on structured query language
| | |
|
Netezza | - a column-oriented database management system
| | |
|
Oracle | - a relational database management system designed for grid computing; includes CLOB support for importing data
| | |
|
PostgreSQL | - an object-relational database management system (ORDBMS)
| | |
|
Sybase IQ | - column-based relational database software
| | |
|
Teradata Aster | - a relational database based on structured query language
| | | - Teradata database needs to be configured to support the appropriate character set
|
Vertica 5.1+ | - a grid-based and column-oriented analytic database software
| | |
|
Other Connectors |
Datameer Spotlight | - gives organizations fast access and deep visibility into all of their enterprise data assets - whether in the cloud or on-premises - via a single unified self-service platform
- with Datameer Spotlight business teams can discover, access, collaborate and analyze more data for faster, more trusted cloud analytics while eliminating complex data movement and maintaining strong governance
| | |
|
Google BigQuery | - is Google's fully managed data warehouse for petabyte analytics
| | |
|
HBase | - is an open-source, non-relational, distributed database
- is written in Java and runs on top of HDFS
| | | - To satisfy the classloader requirements, hbase-protocol.jar must be included in Hadoop's classpath and the root Datameer X classpath (/etc/custom-jars) for versions 0.96.1 to 0.98.0
- Learn more on the Apache HBase Reference.
|
Hive Metastore | - a service that stores metadata related to Apache Hive and other services, in a backend RDBMS, such as MySQL or PostgreSQL
| | |
|
Hive (JDBC) | - an open source data warehouse system for querying and analysing large data sets stored in Hadoop
| | |
|
Hive Server2 (JDBC) | - a service that enables clients to execute queries against Hive
- it supports multi-client concurrency and authentication
- provides support for open API clients like JDBC and ODBC
| | |
|
IMAP & POP3 (Internet Message Access Protocol & Post Office Protocol 3) | - IMAP is the internet standard protocol used by email clients to retrieve email messages from a mail server over a TCP/IP connection
- POP3 is a client/server protocol in which email is received and held by the mail server
| | |
|
Knox Hive Server2 JDBC | - the security instance when you have a Hive Server2 JDBC instance running
| | |
|
Power BI | - a business analytics service provided by Microsoft
- provides interactive visualizations with self-service business intelligence capabilities
| | |
|
Tableau Server | - a visual analytics platform that hosts all Tableau workbooks, data sources, and more
| | | - CentOS 7 is the minimum supported operating system
- requirements for the Hadoop cluster's operating system libraries:
- GNU C Library (libc6) version >= 2.15
- GNU Standard C++ Library v3 (libstdc++6) version >= 6.1.0
|