Info

This chapter describes how to optimize a workbook.

Avoiding Joins on Dates

Tip

Avoid joining data on columns of type 'date'.

Avoid Cartesian joins.
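
To illustrate why Cartesian joins are costly, the following Python sketch compares the output size of a Cartesian join with that of a keyed join. It is an illustration only: the sample data and the helper names `cartesian_join` and `keyed_join` are hypothetical and are not part of Datameer X.

```python
from itertools import product


def cartesian_join(left, right):
    """Pair every left row with every right row (no join condition)."""
    return [(lrow, rrow) for lrow, rrow in product(left, right)]


def keyed_join(left, right, key):
    """Inner join: pair only rows whose `key` values match."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [(lrow, rrow) for lrow in left for rrow in index.get(lrow[key], [])]


# Hypothetical sample data: 1,000 orders and 1,000 customers.
orders = [{"customer_id": i, "amount": i * 10} for i in range(1000)]
customers = [{"customer_id": i, "name": f"c{i}"} for i in range(1000)]

cartesian_rows = len(cartesian_join(orders, customers))
keyed_rows = len(keyed_join(orders, customers, "customer_id"))
print(cartesian_rows, keyed_rows)
```

The Cartesian result grows with the product of both input sizes, while the keyed join here produces one row per order.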

Avoiding a Large Number of Columns per Sheet

Tip

We recommend keeping no more columns in a sheet than are actually needed.

Avoiding Unexpectedly Large Parquet Files

Info

Kept sheets in a Datameer X workbook that contain large strings (e.g. 200 MB of JSON) result in large Parquet files.

...

  • restructure the workbook chain into a multi-layer data transformation in which only the required data is defined as a kept/result sheet
  • increase the JVM heap space for the Datameer X job
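
As a rough illustration of the first option, the Python sketch below measures how many bytes of string data each column carries in a sample of rows and drops the heavy columns before the data would be kept. The sample rows, the 100,000-byte threshold, and the helper names `string_bytes_per_column` and `drop_columns` are assumptions made for this example; they do not correspond to any Datameer X function.

```python
def string_bytes_per_column(rows):
    """Sum the UTF-8 size of all string values, per column name."""
    totals = {}
    for row in rows:
        for name, value in row.items():
            if isinstance(value, str):
                totals[name] = totals.get(name, 0) + len(value.encode("utf-8"))
    return totals


def drop_columns(rows, names):
    """Return copies of the rows without the given columns."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]


# Hypothetical sample: a small status column and a large JSON-like payload.
rows = [
    {"id": i, "status": "ok", "payload": "{" + "x" * 10_000 + "}"}
    for i in range(100)
]

sizes = string_bytes_per_column(rows)
# Assumed threshold for "heavy" columns in this sample: 100,000 bytes.
heavy = {name for name, size in sizes.items() if size > 100_000}
slim_rows = drop_columns(rows, heavy)
print(sizes, heavy)
```

Only the slimmed-down rows would then be defined as the kept/result data, while the heavy column stays in an intermediate layer.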

Breaking Up the Workflow When Having a High Number of Sheets and Joins

For better execution performance, break up your workflow into multiple Workbooks when:

  • having more than 20 sheets per Workbook
  • having two joins
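
The two thresholds above can be expressed as a simple check, sketched here in Python. The function name `should_split` is hypothetical, and reading "two joins" as "two or more joins" is an assumption of this example.

```python
MAX_SHEETS = 20  # guideline: more than 20 sheets per Workbook
JOIN_LIMIT = 2   # guideline: two joins (read here as "two or more", an assumption)


def should_split(sheet_count, join_count):
    """Return True when a Workbook meets either guideline threshold."""
    return sheet_count > MAX_SHEETS or join_count >= JOIN_LIMIT


print(should_split(21, 0), should_split(20, 1), should_split(5, 2))
```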

Configuration of Data Retention Policy 

Tip

We recommend configuring the 'Data Retention Policy' section when you configure a data import. It controls how the Workbook data is stored and minimizes the data footprint.

Deletion of Non-kept Sheets

Tip

Keeping only the sheets that are used by a downstream Workbook or an export job reduces a Workbook's processing time and storage cost. Save the Workbook results only for the sheets that your downstream process needs.

To avoid long processing times, an indicator shows which sheets are consumed downstream and therefore need to be kept, while non-kept sheets can be deleted:

  • find the 'Consumer' indicator on the Workbook Setting page and on the Workbook Details page
  • the 'Consumer' indicator helps admins and users identify which portions of old work no longer need to be kept
  • delete non-kept sheets that are no longer used

Filtering Data

Tip

We recommend filtering your data as early as possible in your data editing process.
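
The effect of early filtering can be sketched in Python: filtering before a join produces the same result as filtering afterwards, but the join processes far fewer rows. The sample data and the helper `join_on` are hypothetical and only stand in for sheet operations.

```python
def join_on(left, right, key):
    """Inner join of two lists of dicts on a shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


# Hypothetical sample data: 10,000 events, of which 1,000 are from "DE".
events = [
    {"user_id": i % 100, "country": "DE" if i % 10 == 0 else "US", "value": i}
    for i in range(10_000)
]
users = [{"user_id": i, "segment": "a" if i % 2 == 0 else "b"} for i in range(100)]

# Late filter: join all 10,000 events, then discard most of the result.
joined_all = join_on(events, users, "user_id")
late = [row for row in joined_all if row["country"] == "DE"]

# Early filter: discard rows first, then join only what remains.
filtered = [row for row in events if row["country"] == "DE"]
early = join_on(filtered, users, "user_id")

print(len(joined_all), len(filtered), late == early)
```

Both variants return the same rows; the early filter simply hands a tenth of the data to the join in this sample.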

Performance of SQL Sheets

Tip

For performance reasons, place each subquery in its own SQL sheet when using SQL sheets.
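
The restructuring can be sketched as follows. The example uses SQLite through Python's standard `sqlite3` module purely as a stand-in: the table `sales`, the view `region_totals`, and the data are hypothetical, and the SQL dialect and sheet referencing in Datameer X may differ. The view plays the role of a separate SQL sheet; the sketch shows the structural split only and does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100), ("north", 300), ("south", 50), ("south", 70)],
)

# Nested form: the aggregation is buried inside the outer query.
nested = conn.execute(
    """
    SELECT region, total
    FROM (SELECT region, SUM(amount) AS total FROM sales GROUP BY region)
    WHERE total > 150
    ORDER BY region
    """
).fetchall()

# Split form: the former subquery becomes its own named step
# (a view here, standing in for a separate SQL sheet).
conn.execute(
    "CREATE VIEW region_totals AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)
split = conn.execute(
    "SELECT region, total FROM region_totals WHERE total > 150 ORDER BY region"
).fetchall()

print(nested, split)
```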

Usage of Descriptions

Tip

Fill in the descriptions for your data transformations. These annotations help you follow your transformation process.

Usage of Partitions

Tip

When importing data of type 'date', always use data partitions.
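
The general idea behind date partitioning can be sketched in Python: records are grouped by day, so a query for a date range only touches the matching groups. This is a conceptual illustration, not Datameer X's implementation; the helpers `partition_by_day` and `read_range` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_by_day(records):
    """Group records into one partition per calendar day."""
    parts = defaultdict(list)
    for rec in records:
        parts[rec["day"]].append(rec)
    return parts


def read_range(partitions, start, end):
    """Read only the partitions whose day lies in [start, end].

    Returns the matching records and the number of partitions read.
    """
    selected = [day for day in sorted(partitions) if start <= day <= end]
    rows = [rec for day in selected for rec in partitions[day]]
    return rows, len(selected)


# Hypothetical sample: 3,000 records spread evenly over 30 days.
records = [{"day": date(2024, 1, 1 + i % 30), "value": i} for i in range(3000)]
partitions = partition_by_day(records)

rows, scanned = read_range(partitions, date(2024, 1, 10), date(2024, 1, 12))
print(len(partitions), scanned, len(rows))
```

In this sample a three-day query reads 3 of 30 partitions instead of scanning all 3,000 records.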