When importing data into Datameer X, it is important to know which type of data you are importing or need to import. When you use your data in workbooks, certain Datameer X functions only work with certain types of data. Similarly, certain functions return only a specific type of data. There are also data requirements when you use infographic widgets to visualize your data.
Data Field Types
...
Big integers are like integers, but they are not limited to 64 bits. They are represented using arbitrary-precision arithmetic and represent only whole numbers. Datameer X treats big integers differently than Hive does: because Datameer X allows a larger range of values, big integers are written as strings into a Hive table when you export. By default, the precision for big integers is set to 32 on import. If needed, you can change this in the default properties by changing the value of das.big-decimal.precision=.
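As a rough illustration, the sketch below uses Python, whose built-in int is itself arbitrary-precision, to show a whole number that exceeds the signed 64-bit range of an INTEGER and how it might look when serialized as a string for export. The helper names are hypothetical, and this is not Datameer X's implementation.

```python
# Hypothetical illustration: why some whole numbers need a big integer type.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_in_int64(n: int) -> bool:
    """Return True if n is representable as a signed 64-bit integer."""
    return INT64_MIN <= n <= INT64_MAX


def to_export_string(n: int) -> str:
    """Serialize a whole number as a decimal string (assumed export shape)."""
    return str(n)


value = 2**64 + 1  # larger than any signed 64-bit value
print(fits_in_int64(value))     # False
print(to_export_string(value))  # 18446744073709551617
```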
...
In Datameer X, data of the DATE primitive data type is always represented in a Gregorian, month-day-year (MDY) format (e.g., "Sep 16, 2010 02:56:39 PM"). During ingest, Datameer X detects whether your data should be parsed into the DATE data type. You can also do this after ingest, because other data types can be converted to the DATE primitive data type using workbook functions.
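To make the month-day-year layout concrete, the sketch below parses and formats the example value with Python's datetime module. The format string is inferred from the example above and uses Python's strptime/strftime codes, not Datameer X's own date-pattern syntax; it also assumes the default C (English) locale for month names and AM/PM.

```python
from datetime import datetime

# Python format codes matching "Sep 16, 2010 02:56:39 PM" (C/English locale assumed).
MDY_FORMAT = "%b %d, %Y %I:%M:%S %p"


def parse_mdy(text: str) -> datetime:
    """Parse a month-day-year string like the example above."""
    return datetime.strptime(text, MDY_FORMAT)


def format_mdy(value: datetime) -> str:
    """Render a datetime in the same month-day-year layout."""
    return value.strftime(MDY_FORMAT)


# Convert a value held in another representation (ISO 8601) into the MDY layout.
iso_value = datetime.fromisoformat("2010-09-16T14:56:39")
print(format_mdy(iso_value))  # Sep 16, 2010 02:56:39 PM
```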
In Datameer X, information other than numbers or dates is represented as a string. This includes text, unparsed date patterns, URLs, JSON arrays, and so on.
...
In Datameer X, integers, big integers, floats, and big decimals are considered numbers.
...
Some visualizations and functions are able to use data represented by any data field type. These can be either a number, a string, a date, or a Boolean.
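The two groupings described above (numeric types, and types accepted by "any type" functions and visualizations) can be summarized as sets. This is a hypothetical Python sketch for reasoning about which field types qualify; the names are illustrative and are not part of any Datameer X API.

```python
# Hypothetical summary of the field-type groupings described above.
NUMBER_TYPES = frozenset({"INTEGER", "BIG_INTEGER", "FLOAT", "BIG_DECIMAL"})
ANY_TYPES = NUMBER_TYPES | frozenset({"STRING", "DATE", "BOOLEAN"})


def is_number(field_type: str) -> bool:
    """True if the field type counts as a number."""
    return field_type in NUMBER_TYPES


def accepted_as_any(field_type: str) -> bool:
    """True if the field type is a number, a string, a date, or a Boolean."""
    return field_type in ANY_TYPES


print(is_number("BIG_DECIMAL"))  # True
print(is_number("STRING"))       # False
```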
Data Mapping in Avro
Import mapping
...
Datameer X field type | Parquet field type |
---|---|
INTEGER | INT64 |
DATE | BINARY |
BIG_INTEGER | BINARY |
FLOAT | DOUBLE |
BIG_DECIMAL | BINARY |
STRING | BINARY UTF8 |
BOOLEAN | BOOLEAN |
LIST (integer) | INT64 |
LIST (float) | DOUBLE |
LIST (string) | BINARY |
LIST (Boolean) | BOOLEAN |
...
Parquet files that use the INT96 format are interpreted as timestamps. Datameer X accepts these columns but cuts off the nanoseconds. If the workbook has Ignore Errors enabled, any resulting error messages are stored in a separate column and the value in the column with the error is NULL. The table below provides additional Parquet storage mapping details.
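Parquet's legacy INT96 timestamp is commonly laid out as 12 little-endian bytes: 8 bytes of nanoseconds since midnight followed by a 4-byte Julian day number (the convention used by Impala and Hive). The Python sketch below decodes such a value and drops the sub-millisecond digits. The millisecond target precision and the function name are assumptions for illustration; the text above only states that the nanoseconds are cut off.

```python
import struct

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number used for 1970-01-01
NANOS_PER_DAY = 86_400 * 1_000_000_000
NANOS_PER_MILLI = 1_000_000


def int96_to_epoch_millis(raw: bytes) -> int:
    """Decode a 12-byte INT96 timestamp to epoch milliseconds (hypothetical helper)."""
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes")
    # Bytes 0-7: nanoseconds since midnight; bytes 8-11: Julian day (both little-endian).
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    epoch_nanos = (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day
    # Floor division discards the sub-millisecond part
    # (for pre-1970 values it rounds toward negative infinity).
    return epoch_nanos // NANOS_PER_MILLI


raw = struct.pack("<qi", 1_234_567_891, 2_440_588)  # 1970-01-01 00:00:01.234567891
print(int96_to_epoch_millis(raw))  # 1234
```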
...
Datameer X Field Type | Hive Field Type |
---|---|
STRING | STRING |
FLOAT | DOUBLE |
INTEGER | BIGINT |
DATE | BIGINT |
BIG_INTEGER | DECIMAL(38,0) |
BIG_DECIMAL | BINARY only (Datameer X stores a separate scale and precision with each BIG_DECIMAL record and uses a custom Parquet byte encoding for such values. This encoding is not compatible with Hive, which uses a fixed per-column DECIMAL scale and precision.) |
LIST[A] | ARRAY[A] (recursive types are supported) |
BOOLEAN | BOOLEAN |
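To illustrate the BIG_DECIMAL incompatibility noted above, the sketch below uses Python's standard decimal module, where each value carries its own scale (exponent), much like a per-record scale. Fitting such values into a Hive-style DECIMAL(p,s) column means rounding every value to one fixed scale and rejecting values with too many integer digits. The helper name and the half-up rounding rule are assumptions; this does not reproduce Datameer X's byte encoding.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext
from typing import Optional


def to_fixed_decimal(value: Decimal, precision: int, scale: int) -> Optional[Decimal]:
    """Coerce value to a fixed DECIMAL(precision, scale); None if it does not fit."""
    # A DECIMAL(p, s) column holds values with |x| < 10**(p - s).
    limit = Decimal(1).scaleb(precision - scale)
    if value.copy_abs() >= limit:  # copy_abs() is exact (no context rounding)
        return None
    with localcontext() as ctx:
        # Room for p digits plus one possible carry from rounding.
        ctx.prec = precision + 1
        fixed = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    # Rounding can push a value over the limit (e.g. 99.995 -> 100.00 for p=4, s=2).
    return fixed if fixed.copy_abs() < limit else None


# Each Decimal keeps its own scale, similar to a per-record scale.
values = [Decimal("3.14159"), Decimal("12345.6")]
print([-v.as_tuple().exponent for v in values])     # [5, 1]
print([to_fixed_decimal(v, 5, 2) for v in values])  # [Decimal('3.14'), None]
```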
...