Data 360: Incorrect count and sum due to Key collision

Due to the key conflicts described above, Individual data model object (DMO) has a lot of data with the same Primary Key (PK) which leads to incorrect profiles being included in a segment when that segment joins together profiles with engagement data.

Assume a customer has these different streams and data lake objects (DLOs) of individual data.

As a result of the above DLOs, the resulting Individual DMO is created. We preserve all of the data with their associated data sources (data lineages). This is a great data lake foundation.

The right data lake foundation preserves all the data and data lineages without any assumption about whether they are in reality unique or represent the same person. However, while all the information is there to uniquely identify each row in this view, Salesforce Data 360 doesn’t currently use the information to uniquely interpret each row of data.

Take a look at this email engagement DMO example:

Because the relationship between the Individual DMO and the Email Engaegment DMO is a 1:N one via the Individual ID, when you join them, the following happens. Let’s just take a look at Stan (Individual 2) as an example:

A simple join thinks that:

The first row of Individual 2 has one click
The second row of Individual 2 has one click
The third row of Individual 2 has one click

This is where the problem happens. While there’s only 1 click, Salesforce Data 360 now thinks there are 3 clicks, all clicked by Individual ID 2. As one Individual ID 2 is indistinguishable to our system from another Individual ID 2, the result is that our system thinks there are 3 clicks for Individual ID 2. Everywhere our system is querying the above data, this problem can show up, including in Segmentation, Calculated Insights, and Query APIs.

This problem also manifests in Unified Individual. Let’s say the customer configures a rule that says if Individual ID (via what’s being mapped to our “Party Identification” table) matches, then the individual rows of data are same person, i.e. the “Unified Individual”. By configuring this rule, the customer tells us that the 3 Johns and 3 Stans are in reality the same person. They also configure the Reconciliation rules for each of the attributes - First Name, Gender and Point Balance - in case of conflicts (e.g. Stan vs Stanislav). We create a whole bunch of unified tables (for Individual, for Contact Point Email, Contact Point Phone etc.) and the bridge tables between the unified data and the un-unified data.

Just for individual, we create this:

The above 2 formulates what business users call an Identity Graph. Just looking at Stan, in a simplified view:

So suppose there’s a segment, segmenting on Unified Individual, with this criteria:

Count of email open or click > 2

Using a simple join of the data, the segment query thinks Stan has 3 clicks and qualifies him into the segment when in reality he only had 1 click, so he shouldn’t have qualified to be in the segment.

If the query has understood that the Individual ID 2 on the email engagement data refers to one specific individual only, it would know that it only has 1 click and doesn’t qualify for the segment.

Solutions Approaches

Solution approach focuses on removing the duplication in keys to drive correct results, depending upon the duplication in underlying keys different approaches can be adopted.

Approach #1: Create Unique Join keys: This approach works if there is no key duplication within a single data source [Example: When an individual is mapped to subscribers in the marketing cloud, there is no duplication in subscriber keys]. The key uniqueness while doing the join is achieved by using customer source record id and data source key in the join condition

Approach #2: Deduplicating source record ID’s with calculated insights. This approach is less desired then Approach 1 given this involves creating a calculated insight for deduplicating the keys, however can be useful if source record id duplication exists in a single source.

These approaches are described below

Prerequisite:

DBeaver or any other SQL tool connected to Salesforce Data 360 Instance

How to identify key conflicts or duplicate keys in an Individual Identity Link object ?

select SourceRecordId__c, count(*) as duplicate__c

FROM IndividualIdentityLink__dlm

group by

SourceRecordId__c

Having COUNT(*)>1

Let's look at one of the duplicate records

SELECT ssot__DataSourceObjectId__c, UnifiedRecordId__c,

SourceRecordId__c

FROM IndividualIdentityLink__dlm

where SourceRecordId__c ='102120181'

As you can see SourceRecordID__c has duplicate values, which is expected however, if you now join on SourceRecordId__c, Example [Engagement DMO →(JOIN)-->Identity Link→(JOIN) -->UnifiedIndividual] the resulting aggregates (Sum and COUNT) will produce incorrect values

Approach 1: Use Data Source Object ID as a join key

Scenario: IndividualID’s are coming from multiple sources and IndividualID is not duplicated within a single source:

In order to mitigate these issues, we suggest that customers use key qualifier to avoid the cross join issue resulting from duplicate join keys. The best key qualifier to use is ssotDataSourceObjectIdc.

Example: Here is the example query with aggregates with Calculated Insights and /or query & Tableau

Select

count(ssot__SalesOrder__dlm.ssot__Id__c) as order_count__c,

UnifiedIndividual__dlm.ssot__Id__c as customer__c

from

ssot__SalesOrder__dlm

join

ssot__Individual__dlm

ssot__SalesOrder__dlm.ssot__SoldToCustomerId__c = ssot__Individual__dlm.ssot__Id__c

join

IndividualIdentityLink__dlm

ssot__Individual__dlm.ssot__Id__c = IndividualIdentityLink__dlm.SourceRecordId__c

and ssot__Individual__dlm.ssot__DataSourceObjectId__c = IndividualIdentityLink__dlm.ssot__DataSourceObjectId__c

JOIN

UnifiedIndividual__dlm

UnifiedIndividual__dlm.ssot__Id__c = IndividualIdentityLink__dlm.UnifiedRecordId__c

group by

UnifiedIndividual__dlm.ssot__Id__c

Example: In Insights builder : Select Data Source Object ID as additional Join key

Approach 2: Create a CI deduplicating the identities

select

COUNT(ssot__Individual__dlm.ssot__Id__c) as subscriber_count__c,

ssot__Individual__dlm.ssot__Id__c as sourceID__c,

UnifiedIndividual__dlm.ssot__Id__c as Unified__c

from

ssot__Individual__dlm

join

IndividualIdentityLink__dlm

ssot__Individual__dlm.ssot__Id__c = IndividualIdentityLink__dlm.SourceRecordId__c

JOIN

UnifiedIndividual__dlm

IndividualIdentityLink__dlm.UnifiedRecordId__c = UnifiedIndividual__dlm.ssot__Id__c

Group by

ssot__Individual__dlm.ssot__Id__c,

UnifiedIndividual__dlm.ssot__Id__c,

IndividualIdentityLink__dlm.SourceRecordId__c

use this CI to join with the engagement data to count correct number of records using Metrics on Metrics feature

SELECT

COUNT(ssot__SalesOrder__dlm.ssot__OrderNumber__c) as sales_order_count__c,

CI_Individual_UnifiedIndividual__cio.Unified__c as unified__c

FROM

ssot__SalesOrder__dlm

JOIN

CI_Individual_UnifiedIndividual__cio

ssot__SalesOrder__dlm.ssot__SoldToCustomerId__c = CI_Individual_UnifiedIndividual__cio.sourceID__c

Group BY

unified__c

With this approach there are couple of things to consider

CI takes about 4 hours to compute so Identities could be stale by 4 hours
As Calculated Insights builder does not support metrics on metrics, this workaround is only available in CI and query service

Segmentation

In Segmentation, For new segments created, there shouldn't be any issue. To fix the existing segments:

If customers have this issue, they should Edit Rules on the segment with problem, and click Save button. This will ensure the data is in sync again.

Data 360: Incorrect count and sum due to Key collision

Due to the key conflicts described above, Individual data model object (DMO) has a lot of data with the same Primary Key (PK) which leads to incorrect profiles being included in a segment when that segment joins together profiles with engagement data.

Approach 1: Use Data Source Object ID as a join key

Scenario: IndividualID’s are coming from multiple sources and IndividualID is not duplicated within a single source:

In order to mitigate these issues, we suggest that customers use key qualifier to avoid the cross join issue resulting from duplicate join keys. The best key qualifier to use is ssotDataSourceObjectIdc.

Example: Here is the example query with aggregates with Calculated Insights and /or query & Tableau

Select

Approach 2: Create a CI deduplicating the identities

General Information

Required Cookies

Functional Cookies

Advertising Cookies

General Information

Required Cookies

Functional Cookies

Advertising Cookies

Cookie List

Data 360: Incorrect count and sum due to Key collision

Due to the key conflicts described above, Individual data model object (DMO) has a lot of data with the same Primary Key (PK) which leads to incorrect profiles being included in a segment when that segment joins together profiles with engagement data.

Approach 1: Use Data Source Object ID as a join key

Scenario: IndividualID’s are coming from multiple sources and IndividualID is not duplicated within a single source:

In order to mitigate these issues, we suggest that customers use key qualifier to avoid the cross join issue resulting from duplicate join keys. The best key qualifier to use is ssot__DataSourceObjectId__c.

Example: Here is the example query with aggregates with Calculated Insights and /or query & Tableau

Select

Approach 2: Create a CI deduplicating the identities

In order to mitigate these issues, we suggest that customers use key qualifier to avoid the cross join issue resulting from duplicate join keys. The best key qualifier to use is ssotDataSourceObjectIdc.