Syntax
GROUPCOUNTDISTINCT(<any>)
Description
Counts the distinct values of a group.
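As an illustration only, the semantics of GROUPCOUNTDISTINCT can be modeled in Python: for each group key, count how many different values occur in that group. The function name `group_count_distinct` and the list of `(group_key, value)` pairs are hypothetical. This is a sketch of the behavior, not how Datameer X implements the function.

```python
from collections import defaultdict

def group_count_distinct(rows):
    """Return {group_key: number of distinct values} for (group_key, value) pairs.

    Hypothetical model of GROUPCOUNTDISTINCT semantics, not Datameer X code.
    """
    distinct = defaultdict(set)  # group key -> set of values seen in that group
    for key, value in rows:
        distinct[key].add(value)
    return {key: len(values) for key, values in distinct.items()}

rows = [
    ("book", "u1"),
    ("book", "u2"),
    ("book", "u1"),  # duplicate value in the "book" group is counted once
    ("pen", "u3"),
]
print(group_count_distinct(rows))  # {'book': 2, 'pen': 1}
```

In this model, the per-group set has to hold every distinct value of a group at once, which hints at why very large groups are costly.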
...
Warning
Known issue: When the GROUPCOUNTDISTINCT function is used on a dataset with a very high number of records in a group, out-of-memory exceptions might occur.

Cause: To determine the true distinct count, GROUPCOUNTDISTINCT sends all mapped data to a single reducer, which writes it to disk. On large datasets, this can cause disk space, memory, and performance issues. This is a known limitation that Datameer X is working to resolve.

Solution: To work around this issue, use an intermediary sheet with a combination of the GROUPBY and GROUPCOUNT functions, as shown below.

Original:
Sheet 1: user_id, product
Sheet 2: GROUPBY(Sheet1.product), GROUPCOUNTDISTINCT(Sheet1.user_id)

Workaround:
Sheet 1: user_id, product
Sheet 2: GROUPBY(Sheet1.product), GROUPBY(Sheet1.user_id)
Sheet 3: GROUPBY(Sheet2.product), GROUPCOUNT()

Sheet 2 contains one row for each distinct combination of product and user_id, so counting the rows for each product in Sheet 3 gives the number of distinct users per product. This workaround produces the same output as the GROUPCOUNTDISTINCT function.
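The workaround can be sketched in Python to show why it yields the same counts: the first grouping collapses duplicate (product, user_id) pairs, so counting the remaining rows per product gives the number of distinct users. The function name `two_stage_distinct_count` and the sample rows are hypothetical; this models the sheet logic and is not Datameer X code.

```python
from collections import Counter

def two_stage_distinct_count(sheet1):
    """Count distinct user_id values per product using two grouping passes.

    sheet1 is a list of (user_id, product) rows. Hypothetical model of the
    Sheet 2 / Sheet 3 workaround, not Datameer X code.
    """
    # Sheet 2: GROUPBY(product), GROUPBY(user_id) keeps one row per distinct pair.
    sheet2 = {(product, user_id) for user_id, product in sheet1}
    # Sheet 3: GROUPBY(product), GROUPCOUNT() counts the pair rows per product.
    sheet3 = Counter(product for product, _ in sheet2)
    return dict(sheet3)

sheet1 = [
    ("u1", "book"),
    ("u2", "book"),
    ("u1", "book"),  # repeated purchase collapses into one Sheet 2 row
    ("u3", "pen"),
]
print(sorted(two_stage_distinct_count(sheet1).items()))  # [('book', 2), ('pen', 1)]
```

In this model, neither step keeps a per-product set of users: deduplication happens in the first grouping, on the combined (product, user_id) key, and the second step is a plain row count.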
...