This chapter describes how to optimize a Workbook.
Avoid joins when working with 'date' data, and do not perform Cartesian joins.
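To illustrate why Cartesian joins are costly, the following minimal sketch compares the row count of a join without a key against a join on a key. It is plain Python, not Datameer X functionality; the sample data and the helper names `cartesian_join` and `keyed_join` are hypothetical and exist only for this illustration.

```python
from itertools import product

# Hypothetical sample data; not Datameer X objects.
orders = [{"order_id": i, "customer_id": i % 3} for i in range(6)]
customers = [{"customer_id": c, "name": f"customer-{c}"} for c in range(3)]


def cartesian_join(left, right):
    """Pair every left row with every right row (no join key)."""
    return [(l, r) for l, r in product(left, right)]


def keyed_join(left, right, key):
    """Pair each left row only with right rows sharing the same key value."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [(l, r) for l in left for r in index.get(l[key], [])]


cartesian_rows = cartesian_join(orders, customers)
keyed_rows = keyed_join(orders, customers, "customer_id")

# The Cartesian result grows with len(left) * len(right).
print(len(cartesian_rows), len(keyed_rows))
```

With 6 orders and 3 customers, the Cartesian join produces 18 rows while the keyed join produces 6. The gap widens multiplicatively as the inputs grow.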
We recommend keeping only the columns that a sheet actually needs.
Kept sheets in a Datameer X Workbook that contain large strings (e.g. 200 MB of JSON) result in large Parquet files.
To avoid large Parquet files, the following options apply:
To improve execution performance, break up your workflow into multiple Workbooks when:
We recommend configuring the 'Data Retention Policy' section when you set up a data import. This section controls how the Workbook data is stored and minimizes the data footprint.
Not keeping Workbook sheets that are not used by a downstream Workbook or an export job reduces a Workbook's processing time and storage cost. Save the Workbook results only for the sheets that your downstream process needs.
To help you avoid long processing times, an indicator shows which sheets are kept. Keep only the sheets that need to be kept; non-kept sheets can be deleted:
We recommend filtering your data as early as possible in your data editing process.
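The effect of filtering early can be sketched in plain Python. This is an illustration under stated assumptions, not Datameer X code: `enrich` is a hypothetical stand-in for any costly downstream step (such as a join), and the sample data is invented.

```python
# Hypothetical sample data; not Datameer X objects.
events = [
    {"user_id": i % 4, "status": "error" if i % 5 == 0 else "ok"}
    for i in range(20)
]
users = {u: f"user-{u}" for u in range(4)}


def enrich(rows, lookup):
    """Stand-in for a costly downstream step, e.g. a join against users."""
    return [{**row, "user_name": lookup[row["user_id"]]} for row in rows]


def is_error(row):
    return row["status"] == "error"


# Filter late: every row passes through the costly step first.
late = [row for row in enrich(events, users) if is_error(row)]
rows_enriched_late = len(events)

# Filter early: only the rows of interest reach the costly step.
errors_only = [row for row in events if is_error(row)]
early = enrich(errors_only, users)
rows_enriched_early = len(errors_only)

print(rows_enriched_late, rows_enriched_early)
```

Both orderings return the same result, but the early filter sends 4 rows through the costly step instead of 20.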
For performance reasons, place subqueries in their own SQL sheet when using SQL sheets.
Fill in the descriptions for your data transformations. These annotations help you follow your transformation process.
When importing data of type 'date', always use data partitions.
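The idea behind partitioning by date can be sketched conceptually as follows. This is plain Python and does not show how Datameer X implements partitions; the field name `event_date` and the helpers `partition_by_day` and `read_day` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records; not Datameer X objects.
records = [
    {"event_date": date(2024, 1, 1), "value": 10},
    {"event_date": date(2024, 1, 1), "value": 20},
    {"event_date": date(2024, 1, 2), "value": 30},
    {"event_date": date(2024, 1, 3), "value": 40},
]


def partition_by_day(rows, date_field):
    """Group rows into one bucket per calendar day (ISO date string key)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[date_field].isoformat()].append(row)
    return dict(partitions)


def read_day(partitions, day):
    """Read a single day's partition without scanning the other days."""
    return partitions.get(day.isoformat(), [])


partitions = partition_by_day(records, "event_date")
jan_first = read_day(partitions, date(2024, 1, 1))

print(sorted(partitions), len(jan_first))
```

A request for a single day touches only that day's bucket, which is the property that makes date partitions useful for time-bounded processing.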