Importing Data
When you import data, you import it into a connection, which is a collection of data from various types of files and databases (see Configuring a Connection). You can then create a workbook that uses data from multiple sources (for example, a user list from a database and an Apache error log file). See Types of Data Supported for information about the types of databases and files that you can import into Datameer.
Once the connection is set up, you create import jobs to import the data you want to use. You can also edit, rename, copy, run, or delete an existing import job, and view its full data, details, and information.
See Define File Path Range to learn how to use a date range to limit which files get imported.
Obfuscation is available with Datameer's Advanced Governance module.
- 1 Creating an Import Job
- 1.1 Type Conversions
- 1.2 Raw Records
- 2 How to View and Edit the Job Settings
- 3 Viewing Dropped Records
- 4 Scheduling Job Runtime
- 5 Edit an Import Job
- 6 Create a Copy of an Import Job
- 7 Run an Import Job
- 8 Delete an Import Job
- 9 Linking Data to a New Workbook
- 10 Viewing Import Job Upload Size and Monthly Upload Sizes
- 11 Identify Workbooks Affected by an Import Job Schema Change
Creating an Import Job
To create an import job:
Click the + (plus) button and select Import Job or right-click in the browser and select Create new > Import job.
Click Select Connections, choose the connection and click Select, then select the file type and click Next. Click New Connection to add a new connection if needed.
Specify the file and folder location and click Next. You can use wildcard characters.
The fields on the Data Details page depend on the type of the file, but several fields are common to all types. In the Encryption section, enter all columns to be obfuscated, with a space between the names. Note that when a column is obfuscated, that data is never pulled into Datameer.
See the following sections for additional details about importing each of the file types:
Apache log: specify the file or folder and the log format. See the samples provided in the dialog box for details.
CSV/TSV files:
Specify the delimiter, such as "\t" for tab, "," for comma, or ";" for semicolon.
Specify if column headers should be made from the first non-ignored row.
Specify the escape character in Advanced Settings. The character that follows an escape character is shown as-is instead of being processed.
Specify the quote character in Advanced Settings. If Enable strict quoting is selected, characters outside the quotes are ignored.
Fixed width: specify the file or folder, whether any of the first lines in the data should be skipped, and whether column headers should be made from the first non-ignored row.
JSON: specify the file or folder and other parameters about what to parse within the JSON structure.
Mbox: specify the file or folder. This is a format used for collections of electronic mail messages.
Parquet: columnar storage format available in Hadoop.
Regex Parsable Text files: specify the file or folder, a Regex pattern for processing the data (see note below), whether any of the first lines in the data should be skipped, and whether column headers should be made from the first non-ignored row.
Twitter data: specify the file or folder.
XML data: specify the file or folder, the root element, container element, and XPath expressions for the fields you would
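The CSV/TSV settings above (delimiter, quote character, escape character, and headers taken from the first non-ignored row) can be tried out on a small sample before configuring the import job. The following is a minimal sketch that uses Python's standard csv module, not Datameer's parser. The function name parse_delimited, its parameters, and the sample data are hypothetical, and strict quoting is not modeled.

```python
import csv
import io


def parse_delimited(text, delimiter=",", quotechar='"', escapechar=None,
                    skip_lines=0, header_from_first_row=True):
    """Parse delimited text into a list of dicts (illustrative only)."""
    lines = io.StringIO(text)
    # Ignore the requested number of leading lines.
    for _ in range(skip_lines):
        next(lines, None)
    reader = csv.reader(
        lines,
        delimiter=delimiter,
        quotechar=quotechar,
        escapechar=escapechar,
        # Without an escape character, fall back to doubled quotes ("").
        doublequote=escapechar is None,
    )
    rows = list(reader)
    if header_from_first_row and rows:
        # Column headers come from the first non-ignored row.
        header, rows = rows[0], rows[1:]
    else:
        width = len(rows[0]) if rows else 0
        header = [f"col{i + 1}" for i in range(width)]
    return [dict(zip(header, row)) for row in rows]


# Hypothetical sample: one line to ignore, semicolon delimiter,
# a quoted field containing the delimiter, and backslash-escaped quotes.
raw = (
    "exported by some tool\n"
    "id;name;comment\n"
    '1;"Doe; Jane";plain\n'
    '2;Roe;"say \\"hi\\""\n'
)

records = parse_delimited(raw, delimiter=";", escapechar="\\", skip_lines=1)
for record in records:
    print(record)
```

Quoting keeps the semicolon inside "Doe; Jane" from being treated as a delimiter, and the escape character keeps the inner quotes in the last field from ending the quoted value.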
View a sample of the data set to confirm this is the data source you want to use. Use the checkboxes to select which fields to import into Datameer. The accept empty checkbox lets you specify whether NULL and empty values are used or dropped upon import. Verify or select a data type for each column from the drop-down menu. You can specify the format for date type fields. Click the question mark icon in the data format field to see a complete list of supported date pattern formats.
The Raw Records section shows how your data is viewed by Datameer before the import.
The Empty value placeholders section lets you assign specific values as NULL. Values added here aren't imported into Datameer.
The How to handle invalid data? section lets you decide how to proceed if part of a record doesn't fit with the defined schema during import.
Selecting the option to drop the record removes the entire record from the import job. The option to abort the job stops the import job when an invalid record is detected.
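The interaction between empty value placeholders and the two invalid-data options described above (drop the record, or abort the job) can be sketched as follows. This is an illustration of the described behavior under stated assumptions, not Datameer's implementation: the names import_records and ImportAborted, the schema representation, and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple


class ImportAborted(Exception):
    """Raised when the 'abort' option meets an invalid record."""


def import_records(
    rows: Iterable[Dict[str, str]],
    schema: Dict[str, Callable[[str], object]],
    empty_placeholders: Optional[Set[str]] = None,
    on_invalid: str = "drop",
) -> Tuple[List[dict], List[dict]]:
    """Convert rows to the schema; return (imported, dropped)."""
    placeholders = empty_placeholders or set()
    imported, dropped = [], []
    for row in rows:
        try:
            record = {}
            for column, convert in schema.items():
                value = row.get(column)
                if value is None or value in placeholders:
                    # Placeholder values are treated as NULL, not imported.
                    record[column] = None
                else:
                    record[column] = convert(value)
        except (ValueError, TypeError) as exc:
            if on_invalid == "abort":
                # Stop the whole job at the first invalid record.
                raise ImportAborted(f"invalid record: {row!r}") from exc
            # Otherwise remove the entire record from the import.
            dropped.append(row)
            continue
        imported.append(record)
    return imported, dropped


# Hypothetical sample data and schema.
sample_rows = [
    {"id": "1", "score": "3.5"},
    {"id": "2", "score": "N/A"},   # placeholder, becomes NULL
    {"id": "x", "score": "1.0"},   # "x" does not fit the integer column
]
sample_schema = {"id": int, "score": float}

imported, dropped = import_records(
    sample_rows, sample_schema, empty_placeholders={"N/A"}
)
print(imported)
print(dropped)
```

With on_invalid="abort", the same input raises ImportAborted at the third row instead of returning a partial result.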
You can partition your data using date parameters. When this data is loaded into a workbook, you can choose to run your calculations on all or on just a part of your data. Also, if you decide to export data, you can choose to export all or just a part of your data. Learn more about time-based partitions.
Click Next.
Define the schedule details.
In the Loading section, select Manually to update the data only when you rerun the import job yourself, or On a schedule to run the import job at a specified time.
In the Data Retention Policy section, choose whether new data replaces the existing data or is appended to it when the import job is updated. You can also choose Append with sliding time window to define a time range after which the data expires and how many results to keep.
Add a description, name the file, click the checkbox to start the import immediately if desired, and click Save. You can also specify notification emails to be sent for any error messages and when a job has completed successfully. Use a comma to separate multiple email addresses. The maximum character count in these fields is 255.
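To make the difference between the three data retention choices concrete, the sketch below models each import run as a (timestamp, rows) pair and applies a policy after every run. It is only an approximation of the behavior described above. The function apply_retention, the policy names, and the max_age and max_results parameters are assumptions made for illustration, not Datameer settings.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Run = Tuple[datetime, List[str]]  # (time of the import run, imported rows)


def apply_retention(
    existing: List[Run],
    new_run: Run,
    policy: str,
    now: datetime,
    max_age: Optional[timedelta] = None,
    max_results: Optional[int] = None,
) -> List[Run]:
    """Return the runs kept after adding new_run under the given policy."""
    if policy == "replace":
        # New data replaces everything imported before.
        return [new_run]
    runs = existing + [new_run]
    if policy == "append":
        # New data is added and nothing expires.
        return runs
    if policy == "sliding_window":
        if max_age is not None:
            # Results older than the window expire.
            runs = [run for run in runs if now - run[0] <= max_age]
        if max_results is not None:
            # Keep only the most recent results.
            runs = runs[max(0, len(runs) - max_results):]
        return runs
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical history: runs on January 1, 5, and 9, then a new run on the 10th.
now = datetime(2024, 1, 10)
history = [(datetime(2024, 1, day), [f"row-{day}"]) for day in (1, 5, 9)]
latest = (now, ["row-10"])

replaced = apply_retention(history, latest, "replace", now)
appended = apply_retention(history, latest, "append", now)
windowed = apply_retention(
    history, latest, "sliding_window", now,
    max_age=timedelta(days=6), max_results=2,
)
print(len(replaced), len(appended), len(windowed))
```

In this example the January 1 run falls outside the six-day window, and the limit of two results then leaves only the January 9 and January 10 runs.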
Type Conversions
Integer columns can be imported as dates by interpreting the integer value as a UNIX (epoch) timestamp.
Date columns can be converted to integers; the converted values are shown as epoch timestamps.
Strings can be converted to Boolean, where "false", "no", "f", "n" and "0" are converted to false and "true", "yes", "t", "y" and "1" are converted to true.
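The three conversions above can be sketched in a few lines of Python. The string-to-Boolean mapping is taken directly from the list above. Everything else is an assumption made for illustration: the function names are hypothetical, timestamps are treated as seconds in UTC (the product may use a different unit or time zone), and matching is case-insensitive.

```python
from datetime import datetime, timezone

TRUE_STRINGS = {"true", "yes", "t", "y", "1"}
FALSE_STRINGS = {"false", "no", "f", "n", "0"}


def integer_to_date(value: int) -> datetime:
    """Interpret an integer as an epoch timestamp (assumed: seconds, UTC)."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


def date_to_integer(value: datetime) -> int:
    """Convert a timezone-aware date to an epoch timestamp in seconds."""
    return int(value.timestamp())


def string_to_boolean(value: str) -> bool:
    """Map the documented strings to True or False (assumed: case-insensitive)."""
    normalized = value.strip().lower()
    if normalized in TRUE_STRINGS:
        return True
    if normalized in FALSE_STRINGS:
        return False
    raise ValueError(f"cannot convert {value!r} to Boolean")


epoch_start = integer_to_date(0)
print(epoch_start.isoformat())
print(date_to_integer(epoch_start))
print(string_to_boolean("yes"), string_to_boolean("0"))
```

A string outside both lists raises ValueError in this sketch; how such a value is treated during a real import depends on the invalid-data setting chosen for the job.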
Raw Records
Click Raw Records to view an expanded sample of the raw data not in tabular format. Click Raw Records again to hide the raw data.
How to View and Edit the Job Settings
Some of these settings can also be accessed through the Save Workbook settings. The Save Workbook settings let you specify when jobs are run, how errors are handled, who gets notified, what data is saved with the workbook, and how much historical data (if any) is saved. See Configuring Workbook Settings to learn details about each of the settings.
To view and edit the job settings through the Import Data view:
Click the File Browser tab.
Click on Import Jobs in the navigation window on the left side of the screen.
Highlight the name of the data source you want to view.
Click Configure. (As of v6.3, click Open.)
Click Next to view each type of job setting. You can also make changes.
The Schedule screen has settings that can also be set through Save Workbook settings.
Specify whether to replace or append data and whether to append using a sliding time window. You can then specify when the data should expire and how many results you should keep.
Select which groups have view, edit, and run access permissions and specify what access permissions all users have.
Click Save to save your changes. You can also click Rename to rename the import.