When importing data into Datameer X, it is important to know which type of data you are importing. Within workbooks, certain Datameer X functions work only with certain data types. Likewise, certain functions return only a specific data type.

Data Field Types

| Field type | Description | Internal representation |
| --- | --- | --- |
| INTEGER | 64-bit integer value | Java Long |
| BIG_INTEGER | Unlimited integer value | Java BigInteger |
| FLOAT | 64-bit float value | Java Double |
| BIG_DECIMAL | High-precision float value | Java BigDecimal |
| DATE | Date object | Java Date |
| STRING | String object | Java String |
| BOOLEAN | Boolean object | Java Boolean |
| LIST | A collection of multiple values of one data type | |
| NUMBER | Float, big decimal, integer, or big integer | |
| ANY | Float, big decimal, integer, big integer, date, string, list, or Boolean | |

Integer

In mathematics, integers (aka whole numbers) are made up of the natural numbers including zero (0, 1, 2, 3, ...) along with the negatives of the natural numbers (-1, -2, -3, ...). In computer programming, an integer type must also define a minimum and a maximum value. Datameer X uses a 64-bit integer, which can represent whole numbers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
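The bounds above follow directly from the width of a signed 64-bit integer. The following is a minimal sketch of that arithmetic; the helper name `fits_in_integer` is hypothetical and is not part of Datameer X.

```python
# Bounds of a signed 64-bit integer (the range of a Java long).
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def fits_in_integer(value: int) -> bool:
    """Hypothetical helper: True if value fits in a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


print(f"{INT64_MIN:,}")                 # -9,223,372,036,854,775,808
print(f"{INT64_MAX:,}")                 # 9,223,372,036,854,775,807
print(fits_in_integer(INT64_MAX + 1))   # False
```

A value that fails this check would need the big integer type instead.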

Big integer

Big integers are like integers, but they are not limited to 64 bits. They are represented using arbitrary-precision arithmetic and represent only whole numbers. Datameer X treats big integers differently than Hive does because Datameer X allows a larger range of values, so they are written as strings into a Hive table on export. By default, the precision for big integers is set to 32 upon import. This can be updated if needed in the default properties by changing the value of das.big-decimal.precision.

Float

In mathematics, there are real numbers that represent fractions (1/2, 12/68) or numbers with decimal places (12.75, -18.35). Datameer X uses double-precision floating-point representation (aka float) to manipulate and represent real numbers. The range of magnitudes that can be represented this way is approximately 2^-1022 through (1 + (1 - 2^-52)) × 2^1023. During import/upload, Datameer X automatically recognizes a number with either a single period (.) or a single comma (,) as the decimal separator and defines this data as a float data type. After ingestion, Datameer X stores float and big decimal values using a period (.) character. Auto schema detection for the float data type works with CSV, JSON, XML, and key/value files.
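The range quoted above can be checked against the double-precision limits that Python exposes, and the separator rule can be illustrated with a small parser. This is only a sketch: `parse_float` is a hypothetical helper, and how Datameer X actually detects floats during import is not shown here.

```python
import sys

# Limits of a double-precision float, written as in the text above.
smallest_normal = 2.0 ** -1022
largest = (1 + (1 - 2.0 ** -52)) * 2.0 ** 1023

print(smallest_normal == sys.float_info.min)  # True
print(largest == sys.float_info.max)          # True


def parse_float(text: str) -> float:
    """Hypothetical helper: accept one '.' or one ',' as the decimal separator."""
    if text.count(".") + text.count(",") > 1:
        raise ValueError(f"ambiguous decimal separator: {text!r}")
    # Normalize to a period, matching how the value is stored after ingestion.
    return float(text.replace(",", "."))


print(parse_float("12,75"))   # 12.75
print(parse_float("-18.35"))  # -18.35
```

Rejecting input with more than one separator is a design choice of this sketch; a value such as "1,234.5" is ambiguous without knowing the source locale.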

Big decimal

Big decimals are similar to float values. The main advantage of this data field type is that values are exact up to the configured precision, whereas float values can be inaccurate in certain cases. If a number has more digits than the configured precision allows, the number is rounded. The precision can be configured in conf/default.properties:

# Maximum precision used for BIG_DECIMAL types. Precision is equal to the maximum number of digits a BigDecimal 
# can have.
system.property.das.big-decimal.precision=32

32 digits is the default precision used by Datameer X for big decimal values upon import.
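The difference between float and big decimal behavior can be illustrated with Python's `decimal` module, which, like Java's BigDecimal, works with a configurable number of significant digits. This is a sketch under assumptions: the rounding mode (`ROUND_HALF_UP`) and the helper name `to_big_decimal` are chosen for illustration and are not taken from Datameer X.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

PRECISION = 32  # mirrors the default das.big-decimal.precision=32
# Assumption: the rounding mode is illustrative, not Datameer X's documented mode.
ctx = Context(prec=PRECISION, rounding=ROUND_HALF_UP)


def to_big_decimal(text: str) -> Decimal:
    """Hypothetical helper: parse text and round it to PRECISION digits."""
    return ctx.create_decimal(text)


# Binary floats cannot represent 0.1 or 0.2 exactly.
print(0.1 + 0.2 == 0.3)                                           # False
print(ctx.add(Decimal("0.1"), Decimal("0.2")) == Decimal("0.3"))  # True

# A value with 41 significant digits is rounded to 32 digits.
rounded = to_big_decimal("1." + "2" * 40)
print(len(rounded.as_tuple().digits))                             # 32
```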

Date

In Datameer X, data of the DATE primitive data type is always represented in a Gregorian, month-day-year (MDY) format (e.g., "Sep 16, 2010 02:56:39 PM"). Datameer X detects during ingest whether your data should be parsed into the DATE data type. Other data types can also be converted to the DATE primitive data type after ingest using workbook functions.
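To make the display format concrete, the example value above can be parsed and re-rendered with a matching pattern. The pattern string below is an assumption inferred from that single example and relies on English month and AM/PM names (Python's default C locale); it is not a documented Datameer X pattern.

```python
from datetime import datetime

# Assumed pattern for values such as "Sep 16, 2010 02:56:39 PM".
DISPLAY_FORMAT = "%b %d, %Y %I:%M:%S %p"

parsed = datetime.strptime("Sep 16, 2010 02:56:39 PM", DISPLAY_FORMAT)
print(parsed.isoformat())               # 2010-09-16T14:56:39
print(parsed.strftime(DISPLAY_FORMAT))  # Sep 16, 2010 02:56:39 PM
```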

String

In Datameer X, information other than numbers or dates is represented as a string. This includes text, unparsed date patterns, URLs, JSON arrays, etc.

Boolean

Boolean data in computing has two values, either true or false. It is used in many logical expressions and is derived from Boolean algebra created by George Boole in the 19th century.

List

In Datameer X, multiple values can be combined into a list. A list is a series of values of a single data type, indexed starting from zero (0).

Number

In Datameer X, integers, big integers, floats, and big decimals are all considered numbers.

Any

Some visualizations and functions are able to use data represented by any data field type. These can be a number, a string, a date, or a Boolean.

Data Mapping in Avro

Import mapping

When importing Avro data into Datameer X, data types are mapped as follows:

| Avro Schema Type | Datameer X Value Type |
| --- | --- |
| Null | STRING |
| Boolean | BOOLEAN |
| Int | INTEGER |
| Long | INTEGER |
| Float | FLOAT |
| Double | FLOAT |
| Bytes | STRING |
| Bytes with logical type Decimal | BIG_DECIMAL |
| String | STRING |
| Records | STRING |
| Enums | STRING |
| Arrays | STRING |
| Maps | STRING |
| Unions | STRING |
| Fixed | STRING |
| Fixed with logical type Decimal | BIG_DECIMAL |
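The import mapping above can be expressed as a simple lookup in which the Decimal logical type takes priority over the underlying Bytes or Fixed type. The following sketch only restates the table; the function and dictionary names are hypothetical, and the lowercase keys follow Avro's schema type names.

```python
from typing import Optional

# Lookup table restating the Avro import mapping (keys are Avro schema type names).
AVRO_TO_DATAMEER = {
    "null": "STRING", "boolean": "BOOLEAN", "int": "INTEGER", "long": "INTEGER",
    "float": "FLOAT", "double": "FLOAT", "bytes": "STRING", "string": "STRING",
    "record": "STRING", "enum": "STRING", "array": "STRING", "map": "STRING",
    "union": "STRING", "fixed": "STRING",
}

# Avro types that can carry the Decimal logical type.
DECIMAL_CARRIERS = {"bytes", "fixed"}


def datameer_value_type(avro_type: str, logical_type: Optional[str] = None) -> str:
    """Hypothetical helper: return the Datameer X value type for an Avro type."""
    if logical_type == "decimal" and avro_type in DECIMAL_CARRIERS:
        return "BIG_DECIMAL"
    return AVRO_TO_DATAMEER[avro_type]


print(datameer_value_type("long"))              # INTEGER
print(datameer_value_type("bytes"))             # STRING
print(datameer_value_type("fixed", "decimal"))  # BIG_DECIMAL
```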

Export mapping

When exporting data to Avro, data types are mapped as follows:

INFO: If a column is marked as accept empty, a union schema type is created, e.g., union(null, string) for a nullable string type.

| Datameer X Value Type | Avro Schema Type |
| --- | --- |
| STRING | String |
| BOOLEAN | Boolean |
| INTEGER | Long |
| FLOAT | Double |
| DATE without pattern | Long |
| DATE with pattern | String |
| BIG_INTEGER | String |
| BIG_DECIMAL | String |
| LIST<listtype> | Arrays<converted list type> |
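In Avro's JSON schema notation, a union is written as an array of types, so a nullable string becomes `["null", "string"]`. The sketch below builds one field definition from the export mapping above; the function `avro_field` and its parameters are hypothetical and only illustrate the shape of the resulting schema, not the exporter itself.

```python
import json

# Scalar part of the export mapping, using Avro's schema type names.
DATAMEER_TO_AVRO = {
    "STRING": "string", "BOOLEAN": "boolean", "INTEGER": "long",
    "FLOAT": "double", "BIG_INTEGER": "string", "BIG_DECIMAL": "string",
}


def avro_field(name, datameer_type, accept_empty=False,
               date_pattern=None, list_type=None):
    """Hypothetical helper: build an Avro field definition for one column."""
    if datameer_type == "DATE":
        # DATE with a pattern is exported as a string, otherwise as a long.
        avro_type = "string" if date_pattern else "long"
    elif datameer_type == "LIST":
        avro_type = {"type": "array", "items": DATAMEER_TO_AVRO[list_type]}
    else:
        avro_type = DATAMEER_TO_AVRO[datameer_type]
    if accept_empty:
        # Columns that accept empty values become a union with null.
        avro_type = ["null", avro_type]
    return {"name": name, "type": avro_type}


print(json.dumps(avro_field("city", "STRING", accept_empty=True)))
# {"name": "city", "type": ["null", "string"]}
```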

Data Mapping in Parquet

Export data mapping

When exporting data to Parquet, data types are mapped as follows:

| Datameer X field type | Parquet field type |
| --- | --- |
| INTEGER | INT64 |
| DATE | BINARY |
| BIG_INTEGER | BINARY |
| FLOAT | DOUBLE |
| BIG_DECIMAL | BINARY |
| STRING | BINARY UTF8 |
| BOOLEAN | BOOLEAN |
| LIST (integer) | INT64 |
| LIST (float) | DOUBLE |
| LIST (string) | BINARY |
| LIST (Boolean) | BOOLEAN |

Import data mapping

When importing data from Parquet, data types are mapped as follows:

| Parquet Field Type | Datameer X Field Type |
| --- | --- |
| BOOLEAN | BOOLEAN |
| INT32 | INTEGER |
| INT32 DECIMAL | BIG_DECIMAL |
| INT64 | INTEGER |
| INT64 DECIMAL | BIG_DECIMAL |
| INT96 | DATE |
| FLOAT | FLOAT |
| DOUBLE | FLOAT |
| BINARY | STRING |
| BINARY DECIMAL | BIG_DECIMAL |
| FIXED_LEN_BYTE_ARRAY | STRING |
| FIXED_LEN_BYTE_ARRAY DECIMAL | BIG_DECIMAL |


Parquet columns that use the INT96 format are interpreted as time stamps. Datameer X accepts these columns but cuts off the nanoseconds. If the workbook has Ignore Errors enabled, any resulting error messages are stored in a separate column and the value in the column with the error is NULL. The table below provides additional Parquet storage mapping details.

| Datameer X Type | Parquet Type | Description |
| --- | --- | --- |
| DATE | TIMESTAMP_MILLIS | Stored as INT64 |
| INTEGER | INT_64 | |
| FLOAT | DOUBLE | |
| STRING | UTF8 | BINARY format with UTF-8 encoding |
| BIG_DECIMAL | DECIMAL | BINARY format |
| BIG_INTEGER | DECIMAL | BINARY format with precision of 1 and scale of 0 |
| LIST | Repeated elements of group of the list element type. | Optional group of a repeating group of optional element types. Nested lists are supported. |
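The INT96 handling described above can be sketched as follows. An INT96 timestamp is conventionally stored as 12 bytes: 8 little-endian bytes holding the nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number. The sketch assumes that cutting off the nanoseconds means truncating to whole milliseconds (consistent with the TIMESTAMP_MILLIS row above) and that values are in UTC; both are assumptions, and `int96_to_datetime` is a hypothetical name.

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_EPOCH = 2440588  # Julian day number of 1970-01-01
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def int96_to_datetime(raw: bytes) -> datetime:
    """Hypothetical helper: decode a 12-byte INT96 value to millisecond precision."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    millis_of_day = nanos_of_day // 1_000_000  # sub-millisecond part is dropped
    days = julian_day - JULIAN_DAY_OF_EPOCH
    return EPOCH + timedelta(days=days, milliseconds=millis_of_day)


# Build a sample value for 2010-09-16 14:56:39.123456789 UTC and decode it.
days_since_epoch = (datetime(2010, 9, 16, tzinfo=timezone.utc) - EPOCH).days
nanos = ((14 * 60 + 56) * 60 + 39) * 1_000_000_000 + 123_456_789
raw = struct.pack("<qi", nanos, JULIAN_DAY_OF_EPOCH + days_since_epoch)
print(int96_to_datetime(raw).isoformat())  # 2010-09-16T14:56:39.123000+00:00
```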

Data Mapping in External Systems

Export Data Mapping 

When exporting data to external systems, data types are mapped as follows:

| Datameer X Field Type | Hive Field Type |
| --- | --- |
| STRING | STRING |
| FLOAT | DOUBLE |
| INTEGER | BIGINT |
| DATE | BIGINT |
| BIG_INTEGER | DECIMAL (38,0) |
| BIG_DECIMAL | BINARY only (Datameer uses per-record scale and precision for BIG_DECIMAL records and has a custom Parquet byte encoding for storing such values. This encoding is not compatible with Hive and its fixed per-column DECIMAL scale and precision.) |
| LIST[A] | ARRAY[A] (recursive types are supported) |
| BOOLEAN | BOOLEAN |
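Because list types map recursively, a nested list resolves to a nested Hive array. The sketch below restates the mapping above as a recursive lookup and emits Hive's `ARRAY<...>` DDL syntax; the `LIST[A]` string notation is borrowed from the table, and the function name `hive_type` is hypothetical.

```python
# Scalar part of the Hive export mapping.
SCALAR_TO_HIVE = {
    "STRING": "STRING", "FLOAT": "DOUBLE", "INTEGER": "BIGINT",
    "DATE": "BIGINT", "BIG_INTEGER": "DECIMAL(38,0)",
    "BIG_DECIMAL": "BINARY", "BOOLEAN": "BOOLEAN",
}


def hive_type(datameer_type: str) -> str:
    """Hypothetical helper: map a Datameer X field type to a Hive column type."""
    if datameer_type.startswith("LIST[") and datameer_type.endswith("]"):
        element = datameer_type[len("LIST["):-1]
        # Recurse so that nested lists become nested arrays.
        return f"ARRAY<{hive_type(element)}>"
    return SCALAR_TO_HIVE[datameer_type]


print(hive_type("INTEGER"))              # BIGINT
print(hive_type("LIST[FLOAT]"))          # ARRAY<DOUBLE>
print(hive_type("LIST[LIST[INTEGER]]"))  # ARRAY<ARRAY<BIGINT>>
```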