Page Comparison

Export data to a HiveServer1 or HiveServer2 database.

Warning
HiveServer1 is being deprecated and will be removed as a connector after Datameer v7.5.

Table of Contents

Configuring Hive as a Connection

A HiveServer1 or HiveServer2 connection with Datameer

Table of Contents

Requirements

A HiveServer2 connection with Datameer X must be created before exporting data (see Hive or HiveServer2 connection documentation).As of Datameer 7.4.x, .

Info

title	INFO

Learn about setting up a connection between Datameer X and HiveServer2, depending on your HiveServer here.

Info

title	INFO

The export behavior and validation for partitioned and unpartitioned tables

...

are the same. Incorrect mapping will result in a failed export, and invalid records will be dropped from the export results. However, for unpartitioned tables with mode RECREATE, the original table definition is used to recreate the table and only columns with name and type are replaced by export sheet definition.

Note that it is possible to disable value validation, should you choose, by setting the custom Hadoop property

No Format
'hive.server2.export.record.validation=false'.

Limitations

Info

title	INFO

HiveServer1 is being deprecated and will be removed as a connector after Datameer X version 11.

Configuring Export Jobs with a

...

HiveServer2 Connection

To export a worksheet to through a Hive HiveServer2 connector:

Select Click the + (plus) at the top left of the File Browser and select Export Job"+" button and choose "Export Job" or right-click on the workbook to export from in the browser and select "Create New" → "Export Job".
Image Removed Image Removed The 'New Export Job' tab appears in the menu bar.
Image Added or Image Added
Select the workbook to export from in the browser by clicking on "Select".
Image Removed Note
title As of Datameer 7.5
Datameer 7.5 has implemented automatic sorting of data via partitioned Hive columns before an export job runs.
For Hive exports in Datameer prior to version 7.5, we recommend that workbook be sorted via partitioned columns for the best performance. The dialog 'Select Workbook' opens.
Image Added
Select the workbook and confirm with "Select". The name of the workbook appears in the workbook field.
Image Added
Click "Next".
Image Added
Select the worksheet to export from out of the drop-down menu.
Image Removed The 'name of the sheet is displayed in the 'Select Sheet' field. A preview is displayed.
INFO: Only kept sheets are available for export. If the sheet you want to export is missing, modify the workbook settings to keep the sheet.
Image Added
Click "Next".
Image Added
Select the Hive or HiveServer2 connector by clicking on "Select Connection".
Image Removed Anchordata_detailsdata_detailsSelect a database on the Hive server from the drop down menu.
If a database filter has been added to the Hive connection settings, that filter is respected when displaying databases in the drop down list.
Image Removed
To export to an existing table on the Hive server, select the radio button Use existing.
Image Removed
To The dialog 'Select Connection' opens.
INFO: If there is already a valid HiveServer2 connection, you will find it unter 'connection'.
INFO: If there is no valid connection to HiveServer2 yet, you can create it here by clicking on "New Connection".
Image Added
Select the required HiveServer2 connection and confirm with "Select". The selected connection is displayed in the 'Connection' field.
Image Added
Click "Next". The 'Data Details' section opens.
Image Added
Select the database in which to export.
INFO: To export to a new table on a the database, select the radio button "Create new if not exist and exist" and enter a new table name .
Image Removed
Select the expected under "Name" and select the output format from the drop-down "Table Output Format".
INFO: To export to an existing table on the Hive server, select "Use existing". Selecting the file format will be ignored for already existing tables.
Image Added
Select a record replacement policy. Learn more about database record replacement.

INFO: 'Drop and create new table' - The existing table is dropped from the database. A new table is created with the schema defined from the worksheet being exported. If the initial table was an external one, Datameer X will drop it and create a new external table. This replacement policy is only available for unpartitioned tables and is written as a TEXTFILE a TEXTFILE (plain text).
INFO: 'Overwrite records' - All records from the existing table are deleted and the new records are written to the table. No table properties are modified and the data is written into the original file type. (e.g. , 'ORC')
INFO: 'Append records' - The records append to the existing table.
Image Removed Anchorclassic_mapclassic_mapExporting: Datameer Versions Through 7.4.x ("Classic") Anchorpartitioned_mappingpartitioned_mappingExporting into partitioned tables
Supports exporting to an existing table on the Hive server.
If exporting to a Hive table in a Sentry secured HDFS, the table being exported to requires read, write, and execute permissions.

Datameer exports support the Hive data types listed below in the table.

Datameer type

Hive type

void

boolean

tinyint

smallint

int

bigint

float

double

decimal

string

varchar

char

timestamp

date

binary

array< data_type >

STRING

BOOLEAN

FLOAT

INTEGER

DATE

BIG_DECIMAL

BIG_INTEGER

LIST

The export is written into the existing Hive output format. (e.g., ORC, Parquet, …)

Tip

Users have run into Null Pointer Exceptions when exporting into a parquet table. The root cause is a bug in the Hive implementation that uses a default codec for compression. You can solve this issue by configuring the codec: (e.g., org.apache.hadoop.io.compress.DefaultCodec)

Panel
`mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec`

Limitations on exporting to existing partitioned tables:

Unable to write a list inside another list.
Unable to write the Hive data types "void" and "binary".
If a record being written to the Hive server doesn't fit correctly into the Hive data type, it is dropped. The response to invalid data can be configured on the Mapping tab under "How to handle invalid data?" (see Step 6).
This feature has been tested on a Sentry secured environment with Cloudera but has not been tested in a Ranger secured environment with Hortonworks.

Exporting into unpartitioned tables

Anchor

unpartitioned_mapping

For unpartitioned tables with mode RECREATE, the original table definition is used to recreate the table - only columns with name and type are replaced by the export sheet definition.
Supports both exporting to a new table or into an existing table on the Hive server.
If exporting to a Hive table in a Sentry secured HDFS, the table being exported to requires read, write, and execute permissions.

Datameer exports support a limited number of Hive data types (see the tables below).

The list of supported types is related to the data field mapping mode that can be configured in the Hive plug-in.

Datameer classic

Datameer type

In Hive (Datameer classic)

boolean

bigint

double

string

array< data_type >

STRING

array<string>

BOOLEAN

array<boolean>

FLOAT

array<double>

INTEGER

array<bigint>

DATE

YYYY-mm-dd hh:MM:ss

array<string>

BIG_DECIMAL

array<string>

BIG_INTEGER

array<string>

LIST

Hive specific

Datameer type

In Hive (Hive specific)

boolean

bigint

double

decimal

string

date

timestamp

array< data_type >

STRING

array<string>

BOOLEAN

array<boolean>

FLOAT

array<double>

INTEGER

array<bigint>

DATE

array<date> or array<timestamp>

BIG_DECIMAL

array<decimal>

BIG_INTEGER

array<string>

LIST

The export is written in the output format TEXTFILE (plain text).

Anchor

preview

Preview how the data is being mapped to the Hive table and select how to handle invalid data.

Image Removed

Requirements for exporting:

A minimum of one column must be included.
The column names must be unique.

Requirements for exporting into an existing partitioned table:

Column names in Datameer must match the column names in the Hive table (case sensitivity applies).
Partitioned columns must match to the existing partition of the Hive table.
The Datameer column data types must match the Hive column data types.
The number of included columns from the Datameer worksheet in the export must match the number of columns in the Hive table.

Requirements for exporting into an existing unpartitioned table:

Column names in Datameer must match the column names in the Hive table. (Case insensitive)
The column order in the Datameer export must match the column order in the Hive table.

The Datameer column data types must match the Hive column data types.
The number of included columns from the Datameer worksheet in the export must match the number of columns in the Hive table.

Select when to run the export job. This can be done manually by a user, run each time the workbook in Datameer is calculated, or at specifically scheduled times.
Custom properties can be added in the advanced section as well as logging specifications.
Image Removed
Enter a description and click Save.
Image Removed

...

Implicit datatype mapping

...

Image Added
Confirm with "Next". The section 'Mapping' opens.
Image Added
Check the preview and decide which columns will be included and whether invalid data is being accepted.
INFO: All checkboxes are activated per default.
Image Added
Decide how to handle invalid data and confirm with "Next". The section 'Schedule' opens.
Image Added
Select the schedule method for when to run the export job.
Image Added
Confirm with "Next".
Image Added
If needed, add an export job description and mark the check box if you want to have the export job processed right after saving.
Image Added
If needed, add your email address in section 'Notification' and confirm with "Save". The dialog 'Save Export Job' opens.
INFO: Datameer X can only send email notifications if an email server has been configured.
Image Added
Select the place to save the export job, name it and confirm with "Save". Exporting data is finished.
Image Added

Implicit Datatype Mapping

Info

title	INFO

The mapping table below describes how Datameer X creates table schema when a new one must be created

...

and how a Datameer X value is validated and transformed into a Hive value type. See Hive documentation Allowed Implicit Conversions for further details.

For exports into an existing table, type mapping validation is performed to make sure records match the expected table column structure.

Datameer X Value Type	Hive equivalent	boolean	bigint	double	string	timestamp
STRING	string				(♦)
BOOLEAN	boolean	(♦)
FLOAT	double			(♦)
INTEGER	bigint		(♦)
DATE	timestamp				(1)	(♦)
BIG_DECIMAL

-

Decimal								(♦)(2)
BIG_INTEGER	-								(3)	(♦)
List<type>	array<type>	(4)	(4)	(4)	(4)	(4)	(4)	(4)	(4)	(4)	(4)	(4)	(4)

(♦) The default transformation for a newly created table - describes which Hive type Datameer X uses when creating a new table.

(1) Transformation of Date to String by given date pattern, otherwise the default Hive date pattern (yyyy-MM-dd HH:mm:ss) will be used to format.

(2) Datameer X BigDecimal has an unlimited precision/scale; this conversion is limited to Hive's decimal with maximum precision of 38 and scale of 38.

...

(4) Conversion of Datameer X List's to Hive array uses the same logic to convert any underlying Datameer value type.

Newly Created Tables

There is a behavior change with Datameer 7.5 when a new table is created in the export wizard:

...

Datameer 7.4.x & earlier

...

Datameer 7.5+ Hive implicit

...

If you need to use the 7.4 Classic mode, you can set the BigDecimal or Date column to String in the workbook, which will be exported as String when using the Hive implicit (7.5+) modeX value type.

Versions Compared

Old Version 1

New Version Current

Key

Configuring Hive as a Connection

Requirements

Limitations

Configuring Export Jobs with a

HiveServer2 Connection

Exporting into unpartitioned tables

Implicit datatype mapping

Implicit Datatype Mapping

Newly Created Tables