Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Syntax

GROUPCOUNTDISTINCT(<any>)

Description

Counts the distinct values of a group.

...

Warning

Known issue:

When working with the GROUPCOUNTDISTINCT function on a dataset with a very high number of records in a group, out of memory exceptions might occur.

Cause:

In order to determine the true distinct value, GROUPCOUNTDISTINCT shifts all mapped information to disk on one reducer. This can cause disk space and performance/memory issues on large datasets. This is a known limitation that Datameer X is working to resolve.

Solution:

In order to work around this issue, an intermediary sheet along with a combination of GROUPBY and GROUPCOUNT functions is suggested as described below.

Original:

Sheet 1: user_id, product 

Sheet 2: GROUPBY(Sheet1.product), GROUPCOUNTDISTINCT(Sheet2.user_id)

Workaround:

Sheet 1: user_id, produt 

Sheet 2: groupby(Sheet1.product), groupby(Sheet1.user_id) 

Sheet 3: group by(Sheet2.product), groupcount

This work around gives the same desired output of the GROUPCOUNTDISTINCT function.

...