In Datameer, you can create a workbook to gain new insights from your data. Inside the workbook, you can add data sources, change column and sheet names, collapse columns, apply formulas, view workbook details, and more.
...
If the worksheet's calculations are based on the full data, the notification is green. It tells you how many sample rows are being displayed in the worksheet out of how many total records from the data source. If all the records of the source are being used, the status bar displays the total record count.
Opening Existing Workbooks
To open an existing workbook:
...
If desired, you can use the Split tab in the Inspector to enter new split criteria. Click Update Sheet to apply changes.
...
Encoding Columns
INFO: Column encoding performs ordinal, one-hot, or binned encoding on column data, which assigns a unique numeric value to each categorical or continuous value. Once applied, column encoding can be updated as needed until the desired results are achieved. Columns with a high cardinality are not suited for ordinal or binned encoding.
TIP: Column encoding provides a consistent view of prepared data, which is especially helpful for teams working together on model building and testing activities.
...
Ordinal Encoding
For ordinal encoding:
- Right-click on a column header and select "Encoding", use the menu option Edit > Encode Column, or click on the "Encode Column" icon in the toolbar. The 'Encode Column' dialog is displayed on the right, with the input column selected automatically.
- If needed, change the column by entering the required column name in 'Column'.
- Select the encoding type "Ordinal Encoding" from the drop-down. Further selection options adapt to the selected type.
INFO: For dates and integers, "Binned Encoding" is also available, where values are grouped and all elements in a group are encoded the same way.
- Decide how to deal with unknown values by clicking the required option.
INFO: 'Drop Value' ignores values beyond the 100 most frequent. 'Include as last column' adds a new element to the list that encodes together all values beyond the 100 most frequent. 'Default Value' shows the values that cannot be encoded.
- View the top 32 values (by count).
- If needed, add a new value in the blank field, change the order of the top values, or delete single values.
- Confirm with "Encode". The encoding result is displayed in a new encoding sheet within the workbook. Ordinal encoding is finished.
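To make the unknown-value options more concrete, the following is a minimal Python sketch of ordinal encoding. It is not Datameer's implementation: it assumes codes are assigned by descending frequency, and the function name `ordinal_encode` and its parameters `max_known` and `unknown` are hypothetical, chosen only to mirror the 'Drop Value' and 'Include as last column' choices.

```python
from collections import Counter

def ordinal_encode(values, max_known=100, unknown="drop"):
    """Map each value to an integer code, most frequent value first.

    unknown="drop": values outside the top `max_known` are encoded as None.
    unknown="last": those values share one extra code after the known ones.
    (Both behaviours are assumptions made for illustration.)
    """
    # Keep only the `max_known` most frequent distinct values.
    ranked = [v for v, _ in Counter(values).most_common(max_known)]
    codes = {v: i for i, v in enumerate(ranked)}
    fallback = len(ranked) if unknown == "last" else None
    return [codes.get(v, fallback) for v in values], codes

colors = ["red", "blue", "red", "green", "red", "blue"]
# Limit to 2 known values so that "green" counts as unknown.
encoded_last, mapping = ordinal_encode(colors, max_known=2, unknown="last")
encoded_drop, _ = ordinal_encode(colors, max_known=2, unknown="drop")
# encoded_last == [0, 1, 0, 2, 0, 1]
# encoded_drop == [0, 1, 0, None, 0, 1]
```

In this sketch, reordering the top values in the dialog would correspond to reordering `ranked` before the codes are assigned.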
One-hot Encoding
For one-hot encoding:
- Right-click on a column header and select "Encoding" or click on the "Encode Column" icon from the toolbar. The 'Encode Column' dialog is displayed on the right.
- If needed, change the column by entering the required column name in 'Column'.
- Select the encoding type "1-Hot Encoding" from the drop-down. Further selection options adapt to the selected type.
- Decide how to deal with unknown values by clicking the required option.
INFO: 'Drop Value' ignores values beyond the 100 most frequent. 'Include as last column' adds a new element to the list that encodes together all values beyond the 100 most frequent.
- Select the output format from the drop-down 'Output'.
INFO: 'As List' keeps all binary pairs together in a single column. 'As Columns' places each binary pair in its own column.
- View the top 32 values (by count).
- If needed, add a new value in the blank field, change the order of the top values, or delete single values.
- Confirm with "Encode". The encoding result is displayed in a new encoding sheet within the workbook. One-hot encoding is finished.
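The difference between the two output formats can be illustrated with a short Python sketch. This is an assumption-based illustration rather than Datameer's own logic: `one_hot_encode`, `as_columns`, the `max_known` limit, and the `<other>` label for grouped unknown values are all hypothetical names.

```python
from collections import Counter

def one_hot_encode(values, max_known=100, include_unknown=False):
    """Return (names, rows); each row is a list of 0/1 flags, one per name."""
    names = [v for v, _ in Counter(values).most_common(max_known)]
    index = {v: i for i, v in enumerate(names)}
    if include_unknown:
        # One shared slot for every value outside the known set.
        names = names + ["<other>"]
    rows = []
    for v in values:
        row = [0] * len(names)
        pos = index.get(v, len(names) - 1 if include_unknown else None)
        if pos is not None:  # dropped values keep an all-zero row
            row[pos] = 1
        rows.append(row)
    return names, rows

def as_columns(names, rows):
    """Pivot the per-row lists into one flag column per category."""
    return {name: list(col) for name, col in zip(names, zip(*rows))}

sizes = ["S", "M", "S", "L"]
names, as_list = one_hot_encode(sizes, max_known=2, include_unknown=True)
columns = as_columns(names, as_list)
# as_list -> [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
# columns -> {"S": [1, 0, 1, 0], "M": [0, 1, 0, 0], "<other>": [0, 0, 0, 1]}
```

Here `as_list` plays the role of a single list-valued column, while `columns` holds one flag column per value.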
Binned Encoding
For binned encoding:
- Right-click on a column header and select "Encoding" or click on the "Encode Column" icon from the toolbar. The 'Encode Column' dialog is displayed on the right.
- If needed, change the column by entering the required column name in 'Column'.
- Select the encoding type "Binned Encoding" from the drop-down. Further selection options adapt to the selected type.
- Select the output format from the drop-down 'Output'.
INFO: 'As List' keeps all binary pairs together in a single column. 'As Columns' places each binary pair in its own column. 'Ordinal' encodes the bins as ordinal numbers in a single column.
- View the default value distributions.
- For binned encoding, Bin Dividers determine the number of bins, and the highest value for each bin divider determines the value distribution in each percentile. You can use Add new Divider to add a new bin. You can change the highest value in a given bin by selecting a different value. For dates, set the Date Bins by option to day, hour, month, quarter or year.
For example, years between 1999 and 2006 are grouped and encoded together, 2007 to 2011 in the next bin, 2012 to 2018 in the next, and 2019 to 2021 in the last.
- Click Encode. The new encoded columns are added in an EncodingSheet.
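The year grouping in this example can be sketched in Python as a lookup against the highest value of each bin. This is an illustration under assumptions, not Datameer's code: `UPPER_BOUNDS`, `bin_index`, and `bin_flags` are hypothetical names, and the treatment of years after 2021 is simply whatever this sketch produces.

```python
from bisect import bisect_left

# Highest year in each bin, taken from the example:
# 1999-2006, 2007-2011, 2012-2018, 2019-2021.
UPPER_BOUNDS = [2006, 2011, 2018, 2021]

def bin_index(year, bounds=UPPER_BOUNDS):
    """Return the 0-based bin whose highest value is the first one >= year."""
    return bisect_left(bounds, year)

def bin_flags(year, bounds=UPPER_BOUNDS):
    """One binary flag per bin, similar in spirit to the 'As Columns' output.

    Years above the last bound match no bin and yield all zeros here.
    """
    idx = bin_index(year, bounds)
    return [1 if i == idx else 0 for i in range(len(bounds))]

years = [1999, 2006, 2007, 2015, 2021]
ordinal = [bin_index(y) for y in years]   # [0, 0, 1, 2, 3]
flags = bin_flags(2015)                   # [0, 0, 1, 0]
```

Changing the highest value of a bin in the dialog corresponds to editing one entry of `UPPER_BOUNDS`; adding a divider corresponds to inserting a new entry.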
...
INFO: The graph changes according to the number of dividers.
- Enter the required bin dividers to change the percentile size of the divider.
INFO: Three dividers are set by default. For example, 'Divider 1' contains 25% of the selected column's values.
INFO: To delete a divider, click on "x" next to the required divider.
- If needed, click on "Add new Divider" to add a new divider.
INFO: Clicking the button adds an additional bucket and recalculates the percentile size and the corresponding absolute values.
- Confirm with "Encode". The encoding result is displayed in a new encoding sheet within the workbook. Binned encoding is finished.
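As a rough illustration of how percentile-based dividers relate to absolute values, the Python sketch below computes divider values from a column. It assumes that dividers sit at evenly spaced percentiles (three dividers at 25%, 50%, and 75%), which matches the default described above but may not match how Datameer recalculates them in every case; `divider_values` and `bucket_of` are hypothetical names.

```python
from bisect import bisect_left
from statistics import quantiles

def divider_values(column, dividers=3):
    """Absolute values at evenly spaced percentiles of the column."""
    # n = dividers + 1 buckets gives exactly `dividers` cut points.
    return quantiles(column, n=dividers + 1, method="inclusive")

def bucket_of(value, cuts):
    """0-based bucket: the first divider that is >= value decides the bucket."""
    return bisect_left(cuts, value)

column = list(range(1, 101))                    # sample data: 1..100
default_cuts = divider_values(column)           # three dividers by default
more_cuts = divider_values(column, dividers=4)  # one divider added

first_bucket = bucket_of(10, default_cuts)      # lowest bucket
last_bucket = bucket_of(100, default_cuts)      # highest bucket
```

Adding a divider in this sketch increases the number of buckets by one and shifts every cut point, which is the kind of recalculation the steps above describe.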
Formatting columns
...
You can create formulas on worksheets that are editable. When you click the data area of a column that has a formula associated with it, the formula displays above the workbook in the Fx bar. You can also create a new sheet and create formulas that reference fields from other sheets. Click the Fx button on the formula line to display the formula builder. (As of Datameer 7.2, the formula builder is located in the worksheet inspector.)
To create a formula using the Formula Builder:
...
Production mode can be applied to a workbook to accelerate performance by skipping the calculation of column metrics, Flip Sheet displays, and the generation of data samples.
To launch production mode for a workbook:
...