GROUPCOUNTDISTINCT

GROUPCOUNTDISTINCT

Syntax

GROUPCOUNTDISTINCT(<any>)

Description

Counts the distinct values of a group.

This is an aggregate function

Example

Given the following data:

Customer_

Name

Item

Customer_

Name

Item

2535

Jon

Zamioculcas

2888

Mike

Spider Lily

2535

Jon

Datura

2535

Jon

Dahlia

5788

Jon

Snapdragon

2888

Mike

Marguerite

2535

Jon

Windflower

2888

Mike

Elephant's Ear

2535

Jon

Chinese Evergreen

5788

Jon

Begonia

5788

Jon

Starfish Plant

2535

Jon

Hare's Foot Fern

2535

Jon

Venus Flytrap

3545

Monique

Red Flame Ivy

First create a group using GROUPBY(#RawData!Name)

Name

Name

Jon

Mike

Monique


Then use the GROUPCOUNTDISTINCT(#RawData!Customer_), and the results are a count of each time an individual value appears in that column in relation to the GROUPBY() column.

Name

Customer_COUNTDISTINCT

Name

Customer_COUNTDISTINCT

Jon

2

Mike

1

Monique

1

Known issue:

When working with the GROUPCOUNTDISTINCT function on a dataset with a very high number of records in a group, out of memory exceptions might occur.

Cause:

In order to determine the true distinct value, GROUPCOUNTDISTINCT shifts all mapped information to disk on one reducer. This can cause disk space and performance/memory issues on large datasets. This is a known limitation that Datameer is working to resolve.

Solution:

In order to work around this issue, an intermediary sheet along with a combination of GROUPBY and GROUPCOUNT functions is suggested as described below.

Original:

Sheet 1: user_id, product 

Sheet 2: GROUPBY(Sheet1.product), GROUPCOUNTDISTINCT(Sheet2.user_id)

Workaround:

Sheet 1: user_id, produt 

Sheet 2: groupby(Sheet1.product), groupby(Sheet1.user_id) 

Sheet 3: group by(Sheet2.product), groupcount

This work around gives the same desired output of the GROUPCOUNTDISTINCT function.