Datameer supports the following types of structured, semi-structured, and unstructured data. See Supported Data Sources for additional details.

Databases

Relational databases include Oracle, DB2, and MySQL.

HSQL (file) – a lightweight, 100% Java SQL database engine. You need to provide the name of the database you want to use, the username, and the password.

Please look here for more information about importing data from a database.

Before data can be imported from a database, an administrator needs to Install Database Drivers.

Files

You can import or upload individual sheets from a spreadsheet by first converting the file to a .CSV file type.

File System Connectors

Datameer supports Bitvise SSH Server/Client for the Windows platform. When creating the connection, specify the root path in a form such as: /c:/mydata/folder1
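
As a rough illustration of the root path form shown above, the following Python sketch converts an ordinary Windows path into that shape. The helper name to_ssh_root_path is hypothetical and not part of Datameer, and lowercasing the drive letter simply mirrors the example path; whether the server requires it is an assumption.

```python
from pathlib import PureWindowsPath


def to_ssh_root_path(windows_path: str) -> str:
    """Convert a path like C:\\mydata\\folder1 to the form /c:/mydata/folder1."""
    path = PureWindowsPath(windows_path)
    # Only plain drive-letter paths (e.g. "C:") are handled in this sketch.
    if len(path.drive) != 2 or not path.drive.endswith(":"):
        raise ValueError(f"expected a path with a drive letter: {windows_path!r}")
    # path.parts[0] is the anchor (e.g. 'C:\\'); the rest are folder names.
    folders = path.parts[1:]
    # Lowercase drive letter is an assumption taken from the example path.
    return "/" + "/".join([path.drive.lower(), *folders])


print(to_ssh_root_path(r"C:\mydata\folder1"))  # /c:/mydata/folder1
```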

Datameer can split large files across multiple mappers, enabling parallel data ingestion. Two requirements must be met for this to be possible:

  1. Splitting of the file protocol must be supported. Currently, splitting is supported for all of the above protocols.
  2. Splitting of the compression type must be supported. Currently, LZO and Gzip are splittable; zip and Bz2 aren't supported.
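
The two requirements can be read as a simple check, sketched below in Python. This is only an illustration of the rule as stated here, not Datameer's implementation: the function name can_split and the protocol names in the usage example are hypothetical, and treating an uncompressed file as satisfying the second requirement is an assumption.

```python
# Compression types this page lists as splittable.
SPLITTABLE_COMPRESSION = {"lzo", "gzip"}


def can_split(protocol, compression, splittable_protocols):
    """Return True only if both splitting requirements are met."""
    # Requirement 1: the file protocol must support splitting.
    if protocol.lower() not in splittable_protocols:
        return False
    # Requirement 2: the compression type must support splitting.
    if compression is None:
        # Assumption: an uncompressed file passes this requirement.
        return True
    # Anything not listed as splittable (zip, bz2, unknown types) is rejected.
    return compression.lower() in SPLITTABLE_COMPRESSION


# Hypothetical protocol names; substitute the connectors you actually use.
protocols = {"sftp", "hdfs"}
print(can_split("sftp", "gzip", protocols))  # True
print(can_split("sftp", "bz2", protocols))   # False
```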

See Importing Data for more information.

Others

Hive – a data warehouse infrastructure built on Hadoop that provides data summarization and ad hoc querying. You need to provide the connection type for the connection where Hive stores its data. This is usually an HDFS or S3 connection. In addition, you need to provide the warehouse location and the metastore URI in a format such as thrift://host:10000. Learn more about Hive.
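
To sanity-check a metastore URI before entering it, a small Python sketch like the following can split it into host and port. The function name parse_metastore_uri is hypothetical, and requiring an explicit port is an assumption based on the example format above.

```python
from urllib.parse import urlsplit


def parse_metastore_uri(uri):
    """Split a URI like thrift://host:10000 into (host, port)."""
    parts = urlsplit(uri)
    if parts.scheme != "thrift":
        raise ValueError(f"expected a thrift:// URI, got: {uri!r}")
    # Accessing parts.port raises ValueError itself if the port is not numeric.
    if not parts.hostname or parts.port is None:
        raise ValueError(f"expected thrift://host:port, got: {uri!r}")
    return parts.hostname, parts.port


print(parse_metastore_uri("thrift://host:10000"))  # ('host', 10000)
```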

HiveServer2 – a server interface that enables remote clients to execute queries against Hive and retrieve the results. The current implementation, based on Thrift RPC, is an improved version of HiveServer and supports multi-client concurrency and authentication. It is designed to provide better support for open API clients like JDBC and ODBC.