When importing data using an import job or a file upload, you can partition your data using date parameters. When this data is loaded into a workbook, you can choose to run your calculations on all of your data or on only part of it. Likewise, when you export data, you can choose to export all of your data or only part of it.

Microsoft Internet Explorer 7 and 8 don't support Scalable Vector Graphics (SVG). If you use one of these versions, the simple partition filter isn't available when using partitions in a workbook, when exporting a workbook, or when exporting a partition; only the time window and the advanced partition filter are available. Refer to Supported Browsers for alternate web browsers.

...

To use partitions, you must first configure your data within an import job, data link, or file upload. When configuring your job, take into account that the record sample size is per partition. Multiple partitions in combination with large sample sizes may lead to performance issues.
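Because the sample size applies to each partition, the total number of sampled records grows with the number of partitions. The following Python sketch illustrates that arithmetic. The function names and the sample size of 5,000 are hypothetical, and the count assumes that every time window in the range actually contains data; it is not taken from Datameer X itself.

```python
from datetime import date


def count_partitions(start: date, end: date, resolution: str) -> int:
    """Count the time windows touched by the inclusive range start..end.

    Assumption: every window in the range contains data, so each window
    becomes one partition.
    """
    if resolution == "YEAR":
        return end.year - start.year + 1
    if resolution == "MONTH":
        return (end.year - start.year) * 12 + (end.month - start.month) + 1
    if resolution == "DAY":
        return (end - start).days + 1
    if resolution == "HOUR":
        return ((end - start).days + 1) * 24
    raise ValueError(f"unknown resolution: {resolution}")


def total_sample_records(partitions: int, sample_size_per_partition: int) -> int:
    """The sample size is per partition, so the totals multiply."""
    return partitions * sample_size_per_partition


# One year of data with a hypothetical sample size of 5,000 records.
start, end = date(2015, 1, 1), date(2015, 12, 31)
for resolution in ("YEAR", "MONTH", "DAY"):
    n = count_partitions(start, end, resolution)
    print(resolution, n, total_sample_records(n, 5000))
```

With these assumptions, daily partitions over one year hold 365 times as many sampled records as a single yearly partition.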

...

If your input job is partitioned and you have changed the partition or schema, clicking the Save Copy As button on the last step of the Import Job wizard does not re-import the data. Instead, it triggers a migration job.

Simple partition filter

In Column Name choose a column that contains a date. In Date Format choose the time scale to use, so you can concentrate on only periods of years, months, days or hours.

Advanced partition filter

...

  1. Create a new data link or choose to edit a current data link.
  2. Go to the Data Details section.
  3. Enter the path for the files or folders and include the %pattern% to specify where the files are located. 

    The %pattern% specifies a folder structure and defines which files from the included folders should be included in the data link partition. This feature cannot be used on direct file names.

    For example, instead of using /data/archive/newsvine/Newsvine_Users_20140104.txt, you can use newsvine/%pattern% when the path prefix is set to /data/archive.

    More examples:

    (tick) /Users/MattSmith/Desktop/Geo_coords/%pattern%/geodata.csv

    (error) /Users/MattSmith/Desktop/Geo_coords/%pattern%.csv

    You can also include a / (slash) character to be able to distinguish different folder names for the different components of the date partition. 
    Additionally, a fixed character set can be included inside ' (single quotes).

    Examples:

    /user/database/date=20150101/data.txt

    becomes File or Folder value: user/database/date=%pattern%/*

    Partition field: yyyyMMdd

    For a particular folder structure:

    /user/database/db_date_yr=2015/db_date_mo=01/data.txt

    becomes File or Folder value: user/database/db_date_yr=%pattern%/*

    Partition field: yyyy/'db_date_mo='MM

    Or File or Folder value: user/database/%pattern%/*

    Partition field: 'db_date_yr='yyyy/'db_date_mo='MM

    To partition all files in a single folder where the file names contain the metadata:

    /user/database/data-2015-01-01-version1.txt

    use File or Folder value: /user/database/data-%pattern%-version*.txt
    Partition field: yyyy'-'MM'-'dd

  4. Scroll down to time based partitions and select the ON setting.
  5. In the Partition Pattern box, enter a date format expression like 'yyyy/MM/dd/HH/mm/SS', which replaces the %pattern% placeholder in the file path. In the above example, you would use 'Newsvine_Users_'yyyyMMdd'.txt'.

    Note

    Keep the following points in mind when determining granularity:

      • The selected granularity of a partition can affect the performance of downstream workbooks.
      • If the defined partitions contain few records, job performance might be slow in downstream workbooks.
      • Using a granularity of minutes or seconds can cause the simple partition filter to be unreadable. Datameer X recommends only using these partition granularities with the time window and advanced partition filter.
      • Using a granularity of minutes or seconds is not available for import jobs.
  6. Click Next when you have finished and save the file.
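The way a partition pattern fills the %pattern% placeholder in the examples above can be sketched in Python. This is a minimal illustration, not Datameer X's implementation: the function names are hypothetical, only the date letters that appear on this page (yyyy, MM, dd, HH, mm) are handled, and text inside single quotes is copied literally as described in step 3.

```python
from datetime import datetime

# Assumed mapping from the date letters used on this page to strftime codes.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M"}


def render_partition_pattern(pattern: str, when: datetime) -> str:
    """Expand a pattern such as 'db_date_yr='yyyy/'db_date_mo='MM."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "'":
            # Text inside single quotes is a fixed character set.
            end = pattern.find("'", i + 1)
            if end == -1:
                raise ValueError("unterminated quote in pattern")
            out.append(pattern[i + 1:end])
            i = end + 1
            continue
        for token, code in _TOKENS.items():
            if pattern.startswith(token, i):
                out.append(when.strftime(code))
                i += len(token)
                break
        else:
            # Separators such as "/" pass through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)


def expand_path(path_template: str, pattern: str, when: datetime) -> str:
    """Substitute the rendered pattern for the %pattern% placeholder."""
    return path_template.replace("%pattern%", render_partition_pattern(pattern, when))


jan_2015 = datetime(2015, 1, 1)
print(expand_path("user/database/date=%pattern%/*", "yyyyMMdd", jan_2015))
# user/database/date=20150101/*
print(expand_path("user/database/%pattern%/*",
                  "'db_date_yr='yyyy/'db_date_mo='MM", jan_2015))
# user/database/db_date_yr=2015/db_date_mo=01/*
```

Both outputs match the folder layouts shown in the step 3 examples, which is a quick way to check a pattern before saving the data link.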

...

After partitioning your data during an import or file upload, open this data in a workbook. (See Working with Workbooks for more details.)

Static and dynamic partitions

...

When a partition is created in Datameer X for an import job or data link, there are four different partition resolutions that can be configured: YEAR, MONTH, DAY, or HOUR. The $partition variable is a <date type> object in Datameer X. This variable is set to the start of the partition window. Depending on the resolution of the partitions, the $partition variable takes on different date field type values.
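To illustrate what "the start of the partition window" means for each resolution, the following Python sketch truncates a timestamp to the start of its year, month, day, or hour. The function name partition_start is hypothetical, and the sketch only models the rule described above; it does not show how Datameer X computes $partition internally.

```python
from datetime import datetime


def partition_start(ts: datetime, resolution: str) -> datetime:
    """Return the start of the partition window that contains ts."""
    if resolution == "YEAR":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if resolution == "MONTH":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if resolution == "DAY":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if resolution == "HOUR":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown resolution: {resolution}")


record_time = datetime(2015, 3, 17, 14, 25, 9)
for resolution in ("YEAR", "MONTH", "DAY", "HOUR"):
    print(resolution, partition_start(record_time, resolution))
# YEAR 2015-01-01 00:00:00
# MONTH 2015-03-01 00:00:00
# DAY 2015-03-17 00:00:00
# HOUR 2015-03-17 14:00:00
```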

...

It is also possible to export only the desired partition or partitions during export.

Simple partition filter

During export you can choose to export only certain partitions.

...

After a partition has been configured, it may be necessary to change the partition's resolution. You can change the resolution for import jobs and file uploads. Changing the resolution requires a few simple steps.

...

If you are repartitioning data from a migration job, it can affect resources on your cluster and use extra disk space. The repartitioning leaves the existing data alone until the migration job is finished and housekeeping executes. In the period after the migration has run and before housekeeping has run, the data takes up at least twice its current size. Datameer's partitioning logic always performs dynamic-based partitioning and needs to read the records to write the partition files. Datameer X also regenerates the samples so that there is a separate sample per partition, which requires Datameer X to read through the data. If you have a yearly migration job, you can make it run faster by partitioning by smaller amounts of time, such as a month or a day, so more tasks can be run in parallel.

Note
As of Datameer X 7.4
If your input job is partitioned and you have changed the partition or schema, clicking the Save Copy As button on the last step of the Import Job or file upload wizard does not re-import the data. Instead, it triggers a migration job.