In Datameer, you can create a workbook to get to new insights with your data. Inside the workbook, you can add additional data sources, change the column and sheet names, collapse columns, apply formulas, view workbook details, and more.

...

If the worksheet's calculations are based on the full data, the notification is green. It tells you how many sample rows are being displayed in the worksheet out of how many total records from the data source. If all the records of the source are being used, the status bar displays the total record count.


Opening Existing Workbooks

To open an existing workbook:

...

If desired, you can use the Split tab in the Inspector to enter new split criteria. Click Update Sheet to apply changes.

...

Encoding Columns

Info

Column encoding performs ordinal, one-hot, or binned encoding on column data, which assigns a unique numeric value to each categorical or continuous value. Once applied, column encoding can be updated as needed until the desired results are achieved.

Columns with a high cardinality are not suited for ordinal or binned encoding.


Tip

Column encoding provides a consistent view of prepared data, which is especially helpful for teams working together on model building and testing activities.

...

Ordinal Encoding

For ordinal encoding:

  1. Right-click on a column header and select "Encoding" or click on the "Encode Column" icon from the toolbar. The 'Encode Column' dialog is displayed on the right.
  2. If needed, change the column by entering the required column name in 'Column'.
  3. Select the encoding type "Ordinal Encoding" from the drop-down. The remaining options adapt to the selected encoding type.
  4. Decide how to deal with unknown values by clicking the required option.
    INFO: 'Drop Value' ignores values beyond the 100 most frequent. 'Include as last column' adds a new element to the list that encodes together all values beyond the 100 most frequent. 'Default Value' shows values that cannot be encoded.
  5. View the top 32 values (by count).
  6. If needed, add a new value in the blank field, change the order of the top values, or delete single values.
  7. Confirm with "Encode". The encoding result is displayed in a new encoding sheet within the workbook. Ordinal encoding is finished.
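
To illustrate the idea behind ordinal encoding outside of the Datameer UI, the following Python sketch assigns each value a number based on its frequency rank and handles values beyond the most frequent ones in two ways. It is not Datameer's implementation; the function name `ordinal_encode` and the parameters `max_known` and `unknown` are hypothetical and only mirror the 'Drop Value' and 'Include as last column' options described above.

```python
from collections import Counter


def ordinal_encode(values, max_known=100, unknown="drop"):
    """Encode values as integers by frequency rank (illustrative sketch only).

    unknown="drop": values outside the `max_known` most frequent become None.
    unknown="last": those values all share one extra code after the known ones.
    """
    counts = Counter(values)
    # Most frequent first; ties are broken by the value's string form
    # so that the result is deterministic.
    ranked = sorted(counts, key=lambda v: (-counts[v], str(v)))
    known = ranked[:max_known]
    codes = {value: code for code, value in enumerate(known)}

    if unknown == "drop":
        return [codes.get(v) for v in values]
    if unknown == "last":
        other_code = len(known)
        return [codes.get(v, other_code) for v in values]
    raise ValueError("unknown must be 'drop' or 'last'")


colors = ["red", "blue", "red", "green", "red", "blue"]
dropped = ordinal_encode(colors, max_known=2, unknown="drop")
grouped = ordinal_encode(colors, max_known=2, unknown="last")
```

With `max_known=2`, "green" is the only value outside the known set, so it is either left unencoded or given the shared extra code.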

One-hot Encoding

For one-hot encoding:

  1. Right-click on a column header and select "Encoding" or click on the "Encode Column" icon from the toolbar. The 'Encode Column' dialog is displayed on the right.
  2. If needed, change the column by entering the required column name in 'Column'.
  3. Select the encoding type "1-Hot Encoding" from the drop-down. The remaining options adapt to the selected encoding type.
  4. Decide how to deal with unknown values by clicking the required option.
    INFO: 'Drop Value' ignores values beyond the 100 most frequent. 'Include as last column' adds a new element to the list that encodes together all values beyond the 100 most frequent.
  5. Select the output format from the 'Output' drop-down.
    INFO: 'As List' keeps all binary pairs together in a single column. 'As Columns' creates binary pairs each in their own column.
  6. View the top 32 values (by count).
  7. If needed, add a new value in the blank field, change the order of the top values, or delete single values.
  8. Confirm with "Encode". The encoding result is displayed in a new encoding sheet within the workbook. One-hot encoding is finished.
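
The difference between the 'As List' and 'As Columns' output formats can be pictured with a small Python sketch. This is only an illustration of the general one-hot technique, not Datameer's implementation; the helper names `one_hot_encode` and `as_columns` are hypothetical, and leaving unknown values as all zeros is an assumption made for this example.

```python
def one_hot_encode(values, categories):
    """Return one 0/1 list per input value (list-style output).

    Each list has a 1 at the position of the value's category.
    Values not in `categories` stay all-zero in this sketch.
    """
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        if value in index:
            row[index[value]] = 1
        rows.append(row)
    return rows


def as_columns(rows, categories):
    """Split list-style rows into one named 0/1 column per category."""
    return {c: [row[i] for row in rows] for i, c in enumerate(categories)}


categories = ["a", "b"]
rows = one_hot_encode(["a", "b", "a", "c"], categories)
columns = as_columns(rows, categories)
```

Here `rows` keeps each encoded value together in a single list per record, while `columns` holds the same information as one separate column per category.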

Binned Encoding

For binned encoding:

  1. Right-click on a column header and select "Encoding" or click on the "Encode Column" icon from the toolbar. The 'Encode Column' dialog is displayed on the right.
  2. If needed, change the column by entering the required column name in 'Column'.
  3. Select the encoding type "Binned Encoding" from the drop-down. The remaining options adapt to the selected encoding type.
  4. Select the output format from the 'Output' drop-down.
    INFO: 'As List' keeps all binary pairs together in a single column. 'As Columns' creates binary pairs each in their own column. 'Ordinal' encodes the bins as ordinal numbers in a single column.
  5. View the default value distributions.
    INFO: The graph changes according to the number of dividers.
  6. Enter the required bin dividers to change the percentile size of each divider.
    INFO: Bin dividers determine the number of bins, and the highest value for each bin divider determines the value distribution in each percentile. Three dividers are set by default, e.g. 'Divider 1' contains 25% of the selected column values.
    INFO: For dates, set the 'Date Bins by' option to day, hour, month, quarter, or year. For example, years between 1999 and 2006 can be grouped and encoded together, 2007 to 2011 in the next bin, 2012 to 2018 in the next, and 2019 to 2021 in the last.
    INFO: To delete a divider, click on "x" next to the required divider.
  7. If needed, click on "Add new Divider" to add a new divider.
    INFO: Clicking the button adds an additional bucket and recalculates the percentile size and the corresponding absolute values.
  8. Confirm with "Encode". The encoding result is displayed in a new encoding sheet within the workbook. Binned encoding is finished.
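
As a rough illustration of how bin dividers map values to bins, the Python sketch below derives dividers from percentiles and then looks up the bin for a value. It is a generic sketch, not Datameer's algorithm: the names `percentile_dividers` and `bin_index` are hypothetical, and the nearest-rank percentile method is an assumption chosen for simplicity.

```python
from bisect import bisect_left


def percentile_dividers(values, percentiles=(25, 50, 75)):
    """Pick the highest value of each bin using nearest-rank percentiles."""
    ordered = sorted(values)
    n = len(ordered)
    dividers = []
    for p in percentiles:
        # ceil(p * n / 100) using integer arithmetic, with a minimum rank of 1
        rank = max(1, -(-p * n // 100))
        dividers.append(ordered[rank - 1])
    return dividers


def bin_index(value, dividers):
    """Return the index of the first bin whose highest value is >= value.

    Values above the last divider fall into the final bin.
    """
    return bisect_left(dividers, value)


# Three default-style dividers at the 25th, 50th, and 75th percentiles.
default_dividers = percentile_dividers(range(1, 9))

# Manually chosen year dividers, matching the grouping of years
# 1999-2006, 2007-2011, 2012-2018, and 2019-2021.
year_dividers = [2006, 2011, 2018]
year_bins = [bin_index(y, year_dividers) for y in (1999, 2006, 2007, 2018, 2021)]
```

The bin index returned here corresponds to an ordinal-style output, where each bin is represented by a single number.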

Formatting columns

...

You can create formulas on worksheets that are editable. When you click the data area of a column that has a formula associated with it, the formula displays above the workbook in the Fx bar. You can also create a new sheet and create formulas that reference fields from other sheets. Click the Fx button on the formula line to display the formula builder. (As of Datameer 7.2, the formula builder is located in the worksheet inspector.)

To create a formula using the Formula Builder:

...

Workbook Production Mode

Production mode can be applied to a workbook to accelerate performance by skipping the calculation of metrics used for column metrics, Flip Sheet displays, and the generation of data samples.

To launch production mode for a workbook:

...