Exporting to HiveServer2
Requirements
A HiveServer2 connection with Datameer X must be created before exporting data.
INFO
Learn about setting up a connection between Datameer X and HiveServer2, depending on your HiveServer here.
INFO
The export behavior and validation for partitioned and unpartitioned tables are the same. Incorrect mapping will result in a failed export, and invalid records will be dropped from the export results. However, for unpartitioned tables with mode RECREATE, the original table definition is used to recreate the table and only columns with name and type are replaced by export sheet definition.
Note that it is possible to disable value validation, should you choose, by setting the custom Hadoop property
'hive.server2.export.record.validation=false'.
Limitations
INFO
HiveServer1 is being deprecated and will be removed as a connector after Datameer X version 11.
Configuring Export Jobs with a HiveServer2 Connection
To export a worksheet through a HiveServer2 connector:
- Click the "+" button and choose "Export Job" or right-click in the browser and select "Create New" → "Export Job". The 'New Export Job' tab appears in the menu bar.
or - Select the workbook to export from in the browser by clicking on "Select". The dialog 'Select Workbook' opens.
- Select the workbook and confirm with "Select". The name of the workbook appears in the workbook field.
- Click "Next".
- Select the worksheet to export from out of the drop-down menu. The 'name of the sheet is displayed in the 'Select Sheet' field. A preview is displayed.
INFO: Only kept sheets are available for export. If the sheet you want to export is missing, modify the workbook settings to keep the sheet. - Click "Next".
- Select the HiveServer2 connector by clicking on "Select Connection". The dialog 'Select Connection' opens.
INFO: If there is already a valid HiveServer2 connection, you will find it unter 'connection'.
INFO: If there is no valid connection to HiveServer2 yet, you can create it here by clicking on "New Connection".
- Select the required HiveServer2 connection and confirm with "Select". The selected connection is displayed in the 'Connection' field.
- Click "Next". The 'Data Details' section opens.
- Select the database in which to export.
INFO: To export to a new table on the database, select "Create new if not exist" and enter a new table name under "Name" and select the output format from the drop-down "Table Output Format".
INFO: To export to an existing table on the Hive server, select "Use existing". Selecting the file format will be ignored for existing tables. - Select a record replacement policy.
INFO: 'Drop and create new table' - The existing table is dropped from the database. A new table is created with the schema defined from the worksheet being exported. If the initial table was an external one, Datameer X will drop it and create a new external table. This replacement policy is only available for unpartitioned tables and is written as a TEXTFILE (plain text).
INFO: 'Overwrite records' - All records from the existing table are deleted and the new records are written to the table. No table properties are modified and the data is written into the original file type. (e.g. 'ORC')
INFO: 'Append records' - The records append to the existing table. Confirm with "Next". The section 'Mapping' opens.
- Check the preview and decide which columns will be included and whether invalid data is being accepted.
INFO: All checkboxes are activated per default. - Decide how to handle invalid data and confirm with "Next". The section 'Schedule' opens.
- Select the schedule method for when to run the export job.
- Confirm with "Next".
- If needed, add an export job description and mark the check box if you want to have the export job processed right after saving.
- If needed, add your email address in section 'Notification' and confirm with "Save". The dialog 'Save Export Job' opens.
INFO: Datameer X can only send email notifications if an email server has been configured. - Select the place to save the export job, name it and confirm with "Save". Exporting data is finished.
Implicit Datatype Mapping
INFO
The mapping table below describes how Datameer X creates table schema when a new one must be created and how a Datameer X value is validated and transformed into a Hive value type. See Hive documentation Allowed Implicit Conversions for further details.
For exports into an existing table, type mapping validation is performed to make sure records match the expected table column structure.
Datameer X Value Type | Hive equivalent | boolean | tinyint | smallint | int | bigint | float | double | decimal | string | varchar | timestamp | date |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
STRING | string | (♦) | |||||||||||
BOOLEAN | boolean | (♦) | |||||||||||
FLOAT | double | (♦) | |||||||||||
INTEGER | bigint | (♦) | |||||||||||
DATE | timestamp | (1) | (♦) | ||||||||||
BIG_DECIMAL | Decimal | (♦)(2) | |||||||||||
BIG_INTEGER | - | (3) | (♦) | ||||||||||
List<type> | array<type> | (4) | (4) | (4) | (4) | (4) | (4) | (4) | (4) | (4) | (4) | (4) | (4) |
(♦) The default transformation for a newly created table - describes which Hive type Datameer X uses when creating a new table.
(1) Transformation of Date to String by given date pattern, otherwise the default Hive date pattern (yyyy-MM-dd HH:mm:ss) will be used to format.
(2) Datameer X BigDecimal has an unlimited precision/scale; this conversion is limited to Hive's decimal with maximum precision of 38 and scale of 38.
(3) Datameers BigInteger has an unlimited precision, this conversion is limited to hive's decimal with max precision 38 and scale 0
(4) Conversion of Datameer X List's to Hive array uses the same logic to convert any underlying Datameer X value type.