When the same profile is ingested from multiple sources in Data 360 (formerly Data Cloud), it causes duplicate records in the individual identity table. This Knowledge Article is outlined to explain the behavior and provide some solution approaches
Assume a customer has these different streams and data lake objects (DLOs) of individual data.
As a result of the above DLOs, the resulting Individual DMO is created. We preserve all of the data with their associated data sources (data lineages). This is a great data lake foundation.
The right data lake foundation preserves all the data and data lineages without any assumption about whether they are in reality unique or represent the same person. However, while all the information is there to uniquely identify each row in this view, Salesforce Data 360 doesn’t currently use the information to uniquely interpret each row of data.
Take a look at this email engagement DMO example:
Because the relationship between the Individual DMO and the Email Engaegment DMO is a 1:N one via the Individual ID, when you join them, the following happens. Let’s just take a look at Stan (Individual 2) as an example:
A simple join thinks that:
This is where the problem happens. While there’s only 1 click, Salesforce Data 360 now thinks there are 3 clicks, all clicked by Individual ID 2. As one Individual ID 2 is indistinguishable to our system from another Individual ID 2, the result is that our system thinks there are 3 clicks for Individual ID 2. Everywhere our system is querying the above data, this problem can show up, including in Segmentation, Calculated Insights, and Query APIs.
This problem also manifests in Unified Individual. Let’s say the customer configures a rule that says if Individual ID (via what’s being mapped to our “Party Identification” table) matches, then the individual rows of data are same person, i.e. the “Unified Individual”. By configuring this rule, the customer tells us that the 3 Johns and 3 Stans are in reality the same person. They also configure the Reconciliation rules for each of the attributes - First Name, Gender and Point Balance - in case of conflicts (e.g. Stan vs Stanislav). We create a whole bunch of unified tables (for Individual, for Contact Point Email, Contact Point Phone etc.) and the bridge tables between the unified data and the un-unified data.
Just for individual, we create this:
The above 2 formulates what business users call an Identity Graph. Just looking at Stan, in a simplified view:
So suppose there’s a segment, segmenting on Unified Individual, with this criteria:
Using a simple join of the data, the segment query thinks Stan has 3 clicks and qualifies him into the segment when in reality he only had 1 click, so he shouldn’t have qualified to be in the segment.
If the query has understood that the Individual ID 2 on the email engagement data refers to one specific individual only, it would know that it only has 1 click and doesn’t qualify for the segment.
Solutions Approaches
Solution approach focuses on removing the duplication in keys to drive correct results, depending upon the duplication in underlying keys different approaches can be adopted.
Approach #1: Create Unique Join keys: This approach works if there is no key duplication within a single data source [Example: When an individual is mapped to subscribers in the marketing cloud, there is no duplication in subscriber keys]. The key uniqueness while doing the join is achieved by using customer source record id and data source key in the join condition
Approach #2: Deduplicating source record ID’s with calculated insights. This approach is less desired then Approach 1 given this involves creating a calculated insight for deduplicating the keys, however can be useful if source record id duplication exists in a single source.
These approaches are described below
Prerequisite:
DBeaver or any other SQL tool connected to Salesforce Data 360 Instance
How to identify key conflicts or duplicate keys in an Individual Identity Link object ?
select SourceRecordId__c, count(*) as duplicate__c
FROM IndividualIdentityLink__dlm
group by
SourceRecordId__c
Having COUNT(*)>1
Let's look at one of the duplicate records
SELECT ssot__DataSourceObjectId__c, UnifiedRecordId__c,
SourceRecordId__c
FROM IndividualIdentityLink__dlm
where SourceRecordId__c ='102120181'
As you can see SourceRecordID__c has duplicate values, which is expected however, if you now join on SourceRecordId__c, Example [Engagement DMO →(JOIN)-->Identity Link→(JOIN) -->UnifiedIndividual] the resulting aggregates (Sum and COUNT) will produce incorrect values
count(ssot__SalesOrder__dlm.ssot__Id__c) as order_count__c,
UnifiedIndividual__dlm.ssot__Id__c as customer__c
from
ssot__SalesOrder__dlm
join
ssot__Individual__dlm
ON
ssot__SalesOrder__dlm.ssot__SoldToCustomerId__c = ssot__Individual__dlm.ssot__Id__c
join
IndividualIdentityLink__dlm
ON
ssot__Individual__dlm.ssot__Id__c = IndividualIdentityLink__dlm.SourceRecordId__c
and ssot__Individual__dlm.ssot__DataSourceObjectId__c = IndividualIdentityLink__dlm.ssot__DataSourceObjectId__c
JOIN
UnifiedIndividual__dlm
on
UnifiedIndividual__dlm.ssot__Id__c = IndividualIdentityLink__dlm.UnifiedRecordId__c
group by
UnifiedIndividual__dlm.ssot__Id__c
Example: In Insights builder : Select Data Source Object ID as additional Join key
select
COUNT(ssot__Individual__dlm.ssot__Id__c) as subscriber_count__c,
ssot__Individual__dlm.ssot__Id__c as sourceID__c,
UnifiedIndividual__dlm.ssot__Id__c as Unified__c
from
ssot__Individual__dlm
join
IndividualIdentityLink__dlm
on
ssot__Individual__dlm.ssot__Id__c = IndividualIdentityLink__dlm.SourceRecordId__c
JOIN
UnifiedIndividual__dlm
ON
IndividualIdentityLink__dlm.UnifiedRecordId__c = UnifiedIndividual__dlm.ssot__Id__c
Group by
ssot__Individual__dlm.ssot__Id__c,
UnifiedIndividual__dlm.ssot__Id__c,
IndividualIdentityLink__dlm.SourceRecordId__c
use this CI to join with the engagement data to count correct number of records using Metrics on Metrics feature
SELECT
COUNT(ssot__SalesOrder__dlm.ssot__OrderNumber__c) as sales_order_count__c,
CI_Individual_UnifiedIndividual__cio.Unified__c as unified__c
FROM
ssot__SalesOrder__dlm
JOIN
CI_Individual_UnifiedIndividual__cio
ON
ssot__SalesOrder__dlm.ssot__SoldToCustomerId__c = CI_Individual_UnifiedIndividual__cio.sourceID__c
Group BY
unified__c
With this approach there are couple of things to consider
Segmentation
In Segmentation, For new segments created, there shouldn't be any issue. To fix the existing segments:
If customers have this issue, they should Edit Rules on the segment with problem, and click Save button. This will ensure the data is in sync again.
000395103

We use three kinds of cookies on our websites: required, functional, and advertising. You can choose whether functional and advertising cookies apply. Click on the different cookie categories to find out more about each category and to change the default settings.
Privacy Statement
Required cookies are necessary for basic website functionality. Some examples include: session cookies needed to transmit the website, authentication cookies, and security cookies.
Functional cookies enhance functions, performance, and services on the website. Some examples include: cookies used to analyze site traffic, cookies used for market research, and cookies used to display advertising that is not directed to a particular individual.
Advertising cookies track activity across websites in order to understand a viewer’s interests, and direct them specific marketing. Some examples include: cookies used for remarketing, or interest-based advertising.