...
- Click the + (plus) button and select Import Job, or right-click in the browser and select Create new > Import job.
- Click Select Connection, select your database connection, and click Select. After you have selected the database connection, click Next.
- Select the Table, View, or Enter SQL.
Note: The drop-down box for tables and views has a hard limit of 1,000 entries.
The schema selection option is available to filter table names for the following database types:
- MSSQL
- Oracle
- PostgreSQL
- PostgreSQL82/Greenplum
- Netezza
Info: For Azure Cosmos DB imports, disable the 'Character Encoding' option.
View a sample of the data set to confirm this is the data source you want to use. Mark the checkboxes to select which fields to import into Datameer. You can also specify the format for date fields; click the question mark help link to see a complete list of supported formats. You can specify the data type using the list box as shown.
To enable parallel loading of the table, Datameer uses the chosen column to segment rows into unique subsets. Good choices for the split column include primary keys, auto-increment columns, or uniquely indexed columns. The column type should be numeric or a date.
If no split column is defined, the import uses a single SELECT statement, even if the limit of mappers is configured to a higher number. You can see the difference in behavior in the job log.
Single split:

```
INFO ... (JdbcSplitter.java:70) - number of desired splits: 4
INFO ... (JdbcConnector.java:150) - connected to '<connection_string>' with schema set to 'null'
WARN ... (DataDrivenSplitStrategy.java:138) - creating single split because splitColumn is set to '$NO_APPROPRIATE_ORDER_COLUMN$'
INFO ... (JdbcSplitter.java:104) - 1 JdbcSplits:
INFO ... (JdbcSplitter.java:106) - SELECT {"id", "bytes_processed", ...} FROM ...
```
After you define an appropriate split column for the import job, it is processed in parallel.
Multiple splits:

```
INFO ... (JdbcSplitter.java:70) - number of desired splits: 4
INFO ... (JdbcConnector.java:150) - connected to '<connection_string>' with schema set to 'null'
INFO ... (JdbcConnector.java:356) - SELECT (SELECT MIN("id") FROM "dap_file"."id") AS MIN_VALUE, (SELECT MAX("id") FROM "dap_file"."id") AS MAX_VALUE FROM DUAL
INFO ... (JdbcSplitter.java:104) - 4 JdbcSplits:
INFO ... (JdbcSplitter.java:106) - SELECT {"id", "bytes_processed", ...} FROM "dap_file" WHERE {...} ...
INFO ... (JdbcSplitter.java:106) - SELECT {"id", "bytes_processed", ...} FROM "dap_file" WHERE {...} ...
INFO ... (JdbcSplitter.java:106) - SELECT {"id", "bytes_processed", ...} FROM "dap_file" WHERE {...} ...
INFO ... (JdbcSplitter.java:106) - SELECT {"id", "bytes_processed", ...} FROM "dap_file" WHERE {...} ...
```
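The split behavior visible in these logs can be sketched as follows. This is an illustrative Python sketch, not Datameer's implementation; the table and column names (`dap_file`, `id`) are taken from the log excerpts above, and the helper function itself is hypothetical.

```python
# Sketch of data-driven splitting: given the MIN/MAX of a numeric split
# column, divide the value range into N contiguous subranges and emit one
# bounded SELECT per subrange so mappers can load them in parallel.

def build_split_queries(table, split_col, min_val, max_val, num_splits):
    """Return one WHERE-bounded SELECT per split (hypothetical helper)."""
    if min_val is None or max_val is None:
        # No usable split column -> fall back to a single full-table select.
        return [f'SELECT * FROM "{table}"']
    step = (max_val - min_val + 1) / num_splits
    queries = []
    for i in range(num_splits):
        lo = min_val + round(i * step)
        hi = min_val + round((i + 1) * step) - 1
        if i == num_splits - 1:
            hi = max_val  # last split absorbs any rounding remainder
        queries.append(
            f'SELECT * FROM "{table}" '
            f'WHERE "{split_col}" >= {lo} AND "{split_col}" <= {hi}'
        )
    return queries

# Example: ids 1..1000 distributed across 4 mappers.
for q in build_split_queries("dap_file", "id", 1, 1000, 4):
    print(q)
```

This mirrors the MIN/MAX boundary query in the log: the range boundaries come from the database, and each resulting SELECT becomes one independent split.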
- Define the Data Retention Policy, specify whether to replace or append data, and click Next.
With a database import job, if you select to append (with or without a time window) under the Data Retention Policy heading, you also have the option to enable incremental mode.
Incremental mode imports only rows whose split-column values are greater than the maximum value from the previous import run. Add a description and, if desired, check the box to start the import immediately. Click Save when finished.
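Incremental mode can be sketched as a filter on the split column. This is an assumed behavior sketch, not Datameer code; `last_max` stands for the maximum split-column value remembered from the previous run, and the function name is hypothetical.

```python
# Sketch of incremental-mode filtering: only rows whose split-column value
# exceeds the maximum seen in the previous import run are fetched.

def incremental_query(table, split_col, last_max):
    """Build the next run's SELECT; last_max is None on the first run."""
    base = f'SELECT * FROM "{table}"'
    if last_max is None:
        return base  # first run: import everything
    return f'{base} WHERE "{split_col}" > {last_max}'

print(incremental_query("dap_file", "id", 1000))
# SELECT * FROM "dap_file" WHERE "id" > 1000
```

After each run, the new maximum split-column value would be recorded so the next run only appends rows added since.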
You can also specify notification emails to be sent when errors occur and when a job has run successfully.
Give the new import job a name and then click Save.
- The import job data is now accessible from the browser.
...