Working with Workbooks
In Datameer, you can create a workbook to get to new insights with your data. Inside the workbook, you can add additional data sources, change the column and sheet names, collapse columns, apply formulas, view workbook details, and more.
You can also copy workbooks, import worksheets from other workbooks, duplicate workbooks and worksheets, and exchange datasources.
- 1 Creating a Workbook
- 2 Adding Additional Data Sources
- 3 Exchanging Data Sources
- 4 Working with Sheets and Columns
- 4.1 Viewing sheets within a workbook
- 4.2 Distinguishing different sheet types
- 4.3 Renaming sheets
- 4.4 Deleting sheets in a workbook
- 4.5 Duplicating sheets in a workbook
- 4.6 Reordering sheets in a workbook
- 4.7 Reordering columns in a workbook
- 4.8 Displaying sheet dependency (graph overlay)
- 4.9 Importing sheets
- 4.10 Column names
- 4.11 Resizing columns
- 4.12 Hiding and expanding columns
- 4.13 Inserting columns
- 4.14 Removing columns
- 4.15 Cell context menu options
- 4.16 Splitting columns
- 5 Encoding Columns
- 6 Deduplication
- 7 Revert Actions
- 8 Using Formulas
- 9 Using Variables Within Workbooks
- 10 Workbook Production Mode
- 11 Completing Workbook Level Tasks
- 12 Workbook Sharing Permissions and Security Settings
Creating a Workbook
To create a workbook:
Choose Workbooks from the + (plus) button or right-click in the navigation bar on the left side of the screen and select Create New > Workbook.
From the File menu, select Add Data.
Navigate to the folder where the data sources are located and select the name of the data source you want to use.
Click Add Data.
When using data from a sheet in another workbook or an import job, the data from that source sheet is copied to the new sheet if the new sheet is marked as kept. In previous versions, the source sheet was referenced instead, which led to workbook data not being deleted by housekeeping.
The workbook is populated with a sample of data from that data source. To use data in Datameer, that data must be part of a data source. If you have a source of data you want to use for analysis, such as an Excel spreadsheet, you need to create a data source using that data, or have a system administrator create one for you.
Sample data and full data
Data displayed in Datameer workbooks is a sampling of the full data from the source. When creating or editing an import job, file upload, or data link, you set the number of sample records to populate a workbook from the specific data source being used.
A data source added to a workbook displays the number of sample records configured for that source. When creating calculations in the workbook, those calculations are being applied only to the sample data until the workbook has run. After running a workbook, the calculations made in the workbook are applied to the full data and can be viewed or downloaded by right-clicking on the workbook and selecting Browse All Data.
The next time the workbook is opened after running, while the calculations have been applied to the full data, you still view the sample size configured from the data source.
Status notifications have been added to the bottom of workbook worksheets to show if the worksheet is displaying information based on sample or full data and if the worksheet is filtered.
Sample data messages
If the worksheet's calculations are based only on sample data, the notification is orange. It tells you how many sample rows are being displayed in the worksheet and that you can run the workbook to display the sheet calculations based on the full data from the data source.
If the worksheet is displaying no information, this could be that none of the original sample data fits the criteria of the calculations. Running the workbook takes into account all current worksheet calculations and might bring in new sample data to populate the worksheet if data from the data source meet the calculation criteria.
Full data messages
If the worksheet's calculations are based on the full data, the notification is green. It tells you how many sample rows are being displayed in the worksheet out of how many total records from the data source. If all the records of the source are being used, the status bar displays the total record count.
Opening Existing Workbooks
To open an existing workbook:
On the File Browser tab, find the workbook you want to open.
Highlight the workbook by clicking on it.
Double-click to open or right-click on the highlighted workbook and select Open.
Opening a workbook using a URL
A workbook can be opened by a user with permissions to view the workbook by entering a URL into the browser.
There are two valid URLs that can open a workbook, using the ID or the path.
Workbook ID: http(s)://<server>:<port>/workbook/<wbkID>
Example: https://localhost:8080/workbook/25
Workbook path: http(s)://<server>:<port>/workbook?path=<path/to/workbook.wbk>
Example: https://localhost:8080/workbook?path=Users/Matthew/Seasonal_Earnings
Open a workbook to a specific worksheet: http(s)://<server>:<port>/workbook?path=<path/to/workbook.wbk>&sheet=<sheetName>
Example: https://localhost:8080/workbook?path=/Users/Matthew /Seasonal_Earnings.wbk&sheet=products
Opening a prior version of a workbook
Datameer gives you the option of saving previous versions of a workbook when it is run more than once. Learn more about optimizing your workbooks and data retention.
To open a prior version of a workbook:
On the File Browser tab, find the workbook you want to open.
Highlight the workbook by clicking on it.
Right-click on the highlighted workbook and select Show Details.
Find the previously run workbook iteration in the Current Data section. Click the Show Data icon.
From the Full Results page, you can open the workbook or download the data.
For convenience, other previous ran workbook iterations have a link.
Adding Additional Data Sources
To add additional data sources to an existing workbook:
From the File menu, choose Add Data or click the Add Data icon on the toolbar.
Navigate to the folder where the data sources are located and select the name of the data source you want to use.
Click Add Data.
Repeat the previous steps for each data source you want to use.
For each data source you select, a tab is added to the workbook, containing a sample of data from that data source. You can add multiple data sources at the same time by holding the SHIFT button.
Exchanging Data Sources
You can exchange data sources to run a workbook on a different set of similar data, which is faster than recreating the workbook.
For example, you can run a workbook on data for each month by exchanging the data source to point to the data for the next month. Each datasource must have the same structure. Further analysis applies only to the new data after the exchange. This option is only available when the current data sheet uses a data source imported by adding data. The option isn't available for sheets created using filters, joins, sorts, external data sheets, or formulas. You must also have write permissions for this workbook to use this option.
To exchange data sources in an existing workbook:
From the File menu, choose Exchange Data or click the Exchange Data icon on the toolbar.
Navigate to the folder where the datasources are located and select the name of the data source you want to use.
Click Exchange Data.
Errors might occur when exchanging data if column names have changed. On a joined sheet, original references to columns remain even if that column name doesn't exist in the new exchanged data. If this happens, replace the joined sheet reference with the updated column name from the exchanged data.
Working with Sheets and Columns
You can set up sheet names, resize columns, collapse and show columns, rename worksheets, import worksheets from other workbooks, and more.
Viewing sheets within a workbook
There are multiple ways to navigate between sheets within a workbook:
Click the sheet tab you want to view a the bottom of the page.
Select the Sheets menu and click the sheet you want to view.
Click on the Sheet Dependencies button in the toolbar and switch between sheets by clicking on sheets that are dependent with each other.
Distinguishing different sheet types
There are eight different worksheet types in Datameer. Only the formula (Fx) worksheet type analyzes data using the formula bar or with the function wizard.
Renaming sheets
To change the name of a sheet:
Right-click the sheet name and click Rename.
Enter the new name and press Enter.
To undo the change, right-click the name again and click Undo.
Deleting sheets in a workbook
To delete a sheet:
Click the tab in the workbook that you want to delete.
Right-click the tab and choose Delete.
Click Delete.
Duplicating sheets in a workbook
You can duplicate workbook sheets. New sheets are named with the next available number appended to the sheet name.
To duplicate sheets in a workbook:
Click the tab in the workbook.
From the Workbook menu Sheets, select Duplicate Sheet or right-click the sheet tab and select Duplicate.
In the Duplicate Worksheet, select the columns you want to copy to the new sheet and click Create Sheet Copy.
If a sheet has actions applied to it, you have the option to copy both the logic and the data, or you can select the Copy data onlycheckbox to copy just the data.
As of Datameer v6.3
In addition to the raw data of a sheet with formulas, joins, sorts, or clustering, you can also copy the underlying logic and actions of a sheet into a new sheet.
Reordering sheets in a workbook
To reorder sheets within a workbook:
Right-click the sheet tab.
Select Move.
Click the space between the current sheet tabs where you want the selected sheet to be placed.
Reordering columns in a workbook
To reorder columns in a sheet:
Left-click and drag the column name to the desired position.
Displaying sheet dependency (graph overlay)
The graph overlay feature provides a graphic display of the currently selected sheet and any ingoing and outgoing sheets.
View the sheet dependency overlay by clicking the Sheet Dependencies button in the toolbar. The currently selected sheet will be highlighted in the middle of the graph.
To navigate between dependent sheets, click the sheet name located on the graphic overlay.
To close the sheet dependencies overlay, click the toolbar button or click X on the current sheet.
Importing sheets
Importing sheets allows you to chain workbooks together.
To import sheets from one workbook into another workbook:
Open the workbook you want to import the worksheet into.
From the Workbook menu, select Add Data, or click the Add Data icon on the toolbar.
First, select the workbook you want to use and click Add Data.
Then select the sheets you want to import and click Add Data.
The new worksheet tabs are added to the right of the existing worksheets.
Column names
When a user creates new columns using the Formula Builder, Datameer creates default column names based on the column name in the formula, and the formula type.
Column names in workbook sheets that are editable can be renamed. The status line just above the list of tabs indicates whether a particular sheet is editable.
Column names can only contain capital or lower-case characters, numbers, and underscores. A column name cannot begin with a number. Note that column names are case sensitive! For example, column names foo, FOO, FoO and fOO will be unique columns within the same worksheet.
As of Datameer 7.5
All column names, in all workbook sheet types, are case sensitive. This is a behavior change with version 7.5.
As of Datameer 7.2
Column names have a 255 character limit.
To edit a column name:
Right-click on the column name and click Rename.
Enter the new name and press Enter.
To undo the change, right-click the name again and click Undo.
Resizing columns
To resize columns:
Place the mouse between two column headings so the icon changes to a double-ended arrow.
Drag the column marker to the desired width.
Click to lock the width.
You can also double-click a column name to fully expand the column.
Hiding and expanding columns
To hide (collapse) a column, right-click the column name and select Hide Column.
Double-click the hidden column to bring it back into view.
A toggle columns toolbar button displays all columns of the current worksheet. You can use this menu to show or hide columns as a batch process.
Inserting columns
To insert a new column:
Right-click a column name.
Select Insert Column. A new column is added to the left of the chosen column.
Removing columns
To remove a column:
Right-click a column name.
Select Remove Column.
Cell context menu options
To access the cell context menu, right-click on a cell within a workbook.
The cell context menu includes three options:
Copy - Copy the cell value to the clipboard.
Filter by - A shortcut to open the filter feature and auto completes the fields using the equals expression to filter for matches the cell's value.
Exclude - A shortcut to open the filter feature and auto completes the fields using the does not equal expression to filter for matches that don't match the cell's value.
Splitting columns
Datameer 7.5 includes the option to split workbook string and list type column content by a selected delimiter. Several optional additional controls are available. Resulting column names are the original column name with a sequential number appended. For list columns, commas are the only delimiter and there will be one column per comma up to the column limit.
The split column operation can be updated as many times as you like until you have desired split configuration.
To split a column:
Right-click on the column header of the column you want to split to open the context menu and select Split Column, or use the icon in the toolbar.
In the Split to Columns dialog, the Input column will be selected automatically.
For string columns, you must select the Type of column contents, either Delimited String or JSON Array String.
For string columns, enter a Delimiter character.
The following optional controls are available:
(Strings only) Use Skip Leading to enter the number of characters that will be ignored before the delimiter used to split the column.
(Strings only) Use Skip Trailing to enter the number of characters that will be ignored after the specified number of columns are split.
Use Skip first to ignore the first x number of elements, where elements are the characters between delimiters.
Use Split into to specify a fixed number of columns into which data will be split.
(JSON array only) When Catch additional data in overflow column is checked, if the number of original elements exceeds the number of specified output columns, the remaining elements will be kept and grouped into one column, in brackets.
Use Drop additional elements to ignore all elements past the specified output columns.
(Strings only) Use Trim Whitespace to delete any leading or trailing spaces and line breaks.
(Strings only) Use Define Empty Value to specify what is substituted in if no existing data is split into the last columns (i.e. when the amount of data is smaller than the column limit.)
5. Optionally, enter a Column prefix that be prepended to the new column names.
6. Once you have entered your desired parameters, click Create Sheet.
7. A new SplitString or SplitList sheet opens containing the split columns.
If desired, you can use the Split tab in the Inspector to enter new split criteria. Click Update Sheet to apply changes.
Encoding Columns
INFO
Column encoding performs ordinal, one-hot or binned encoding on column data which assigns a unique numeric value to each categorical or continuous value. Once applied, column encoding can be updated as needed until the desired results are achieved.
Columns with a high cardinality are not suited for ordinal/ binned encoding.
Column encoding provides a consistent view of prepared data which is especially helpful for teams working together on model building and testing activities.
Ordinal Encoding
For ordinal encoding:
Right-click on a column header and select "Encoding" or click on the "Encode Column" icon from the toolbar. The 'Encode Column' dialog is displayed on the right.
orIf needed, change the column by entering the required column name in 'Column'.
Select the encoding type "Ordinal Encoding" from the drop-down. Further selection options adapt to the needs.
Decide how to deal with unknown values by clicking the required statement.
INFO: 'Drop Value' ignores values beyond the first 100 most frequent. 'Default Value' shows values, which can not be encoded.
View the top 32 values (by count).
If needed, add a new value in the blank field, change the order of the top values or delete single values.
Confirm with "Encode". The encoding result is displayed in a new encoding sheet within the workbook. Ordinal Encoding is finished.
One-hot Encoding
For one-hot encoding:
Right-click on a column header and select "Encoding" or click on the "Encode Column" icon from the toolbar. The 'Encode Column' dialog is displayed on the right.
orIf needed, change the column by entering the required column name in 'Column'.
Select the encoding type "1-Hot Encoding" from the drop-down. Further selection options adapt to the needs.
Decide how to deal with unknown values by clicking the required statement.
INFO: 'Drop Value' ignores values beyond the first 100 most frequent. 'Include at last column' adds a new element to the list that encodes together all values beyond the 100 most frequent.Select the output format from the dropdown 'Output'.
INFO: 'As List' keeps all binary pairs together in a single column. 'As Column' creates binary pairs each in their own column.View the top 32 values (by count).
If needed, add a new value in the blank field, change the order of the top values or delete single values.
Confirm with "Encode". The encoding result is displayed in a new encoding sheet within the workbook. One-hot encoding is finished.
Binned Encoding
For binned encoding:
Right-click on a column header and select "Encoding" or click on the "Encode Column" icon from the toolbar. The 'Encode Column' dialog is displayed on the right.
orIf needed, change the column by entering the required column name in 'Column'.
Select the encoding type "Binned Encoding" from the drop-down. Further selection options adapt to the needs.
Select the output format from the dropdown 'Output'.
INFO: 'As List' keeps all binary pairs together in a single column. 'Ordinal' encodes as ordinal numbers in a single column.
View the default value distributions.
INFO: The graph changes according to the amount of dividers.
Enter the required bin dividers to change the percentile size of the divider.
INFO: There are 3 dividers set as default, e.g. 'Divider 1' contains the 25% of the selected column values.
INFO: To delete a divider, click on "x" next to the required divider.
If needed, click on "Add new Divider" to add a new divider.
INFO: Clinking the button adds an additional bucket, recalculates the percentile size and the corresponding absolute values.
Confirm with "Encode". The encoding result is displayed in a new encoding sheet within the workbook. Binned encoding is finished.
Formatting columns
To format the contents of a column:
Right-click a column name.
Select Format Cells.
General
Shading rows in a workbook
The worksheet highlights rows by each group series when a GROUPBY function is used.
Copy cells
Double-click a cell to select it, and then right-click it and select Copy Cell Content to copy its contents to the browser's clipboard. This functionality doesn't work with the Safari browser.
Workbook inspector
The inspector gives you important information about the data of each column in your workbook. Use this feature for quick insights into the data that each of your columns contain.
Information displayed in the column inspector:
- The name of the column being inspected.
- The number and percentage of valid and empty rows.
- A vertical bar chart displays all the different values and their frequency.
- Mousing over a bar in the chart displays the value(s) and count.
- If there are over 32 unique number type values, the bars group the values in bins.
- If there are over 32 unique non-number type values, the vertical bar chart is not displayed.
- The middle row of statistics displays the column's count of unique, minimum, mean, and maximum value.
- Mousing over a value displays the full value.
- Number and date field types are display all values. Other field types display only the unique count.
- The bottom horizontal bar chart displays the top unique values for the column and their respective counts.
- Mousing over a bar in the chart displays the value and count.
- Click the See more link below the bar chart to view additional column values in descending order.
As of Datameer 7.2
There are two tabs at the top of the workbook inspector. The inspector is separated at the worksheet level and column level.
The worksheet level tab has a description box where text can be entered about each worksheet in the workbook.
The column level tab has a description box where text can be entered about each column of a worksheet. Below the description box, the column information is as described above.
To open or close the workbook column inspector:
Click View in the menu bar and select/unselect Inspector.