Data Leakage Alert

Data leakage occurs when the model's training data contains the information you’re trying to predict. Leakage results in models that score optimistically high in training but perform less accurately on live data. It can mislead the model into learning patterns that are unavailable in practice, thereby compromising predictions.

Actions to Consider

Investigate the variable for data leakage. If found, exclude the variable from the model.

Detection Methodology

Model Builder displays an alert when it detects an explanatory variable whose values always correspond to the same outcome.
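The check described above can be approximated outside Model Builder. The following is a minimal sketch, not Model Builder's actual implementation, which this article doesn't specify. It assumes the data is a list of dictionaries and flags any variable where every value maps to exactly one outcome. The function name `find_suspect_variables`, the `min_repeats` parameter, and the sample rows are hypothetical and used only for illustration.

```python
from collections import defaultdict


def find_suspect_variables(rows, target, min_repeats=2):
    """Return variables whose every value maps to exactly one outcome.

    rows: list of dicts sharing the same keys; target: outcome column name.
    A variable is flagged only if at least one of its values appears in
    `min_repeats` or more rows, so unique identifiers aren't flagged trivially.
    """
    suspects = []
    variables = [name for name in rows[0] if name != target]
    for name in variables:
        outcomes_by_value = defaultdict(set)
        counts = defaultdict(int)
        for row in rows:
            outcomes_by_value[row[name]].add(row[target])
            counts[row[name]] += 1
        # Deterministic: no value of this variable is seen with two outcomes.
        deterministic = all(len(seen) == 1 for seen in outcomes_by_value.values())
        repeated = any(count >= min_repeats for count in counts.values())
        if deterministic and repeated:
            suspects.append(name)
    return suspects


# Illustrative rows only, not real lending data.
loans = [
    {"credit_score": 700, "loan_status": "closed", "defaulted": False},
    {"credit_score": 700, "loan_status": "charged off", "defaulted": True},
    {"credit_score": 640, "loan_status": "closed", "defaulted": False},
    {"credit_score": 640, "loan_status": "charged off", "defaulted": True},
]

print(find_suspect_variables(loans, "defaulted"))  # ['loan_status']
```

A flagged variable isn't proof of leakage. It's a prompt to check whether the value would be known at prediction time.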

Example

A lender wants to predict whether a customer will default on a loan. To achieve this, the lender builds a binary classification model with these input variables:

age
income
credit score
loan amount
loan status
payment history
loan final payment date

After training, Model Builder displays an alert because it detected data leakage. The values of the “loan status”, “payment history”, and “loan final payment date” variables are determined only after the loan is approved and at least partially repaid. Including them leaks future information that wouldn’t be available when the model makes predictions. To resolve the issue, consider these actions.

Exclude the “loan status”, “payment history”, and “loan final payment date” input variables from the training set.
Use only variables known at the time of loan approval, such as “age”, “income”, “credit score”, and “loan amount”.
Retrain the model with the updated dataset.
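If the training data is prepared outside Model Builder, the exclusion step can be sketched as follows. This is an assumption-laden illustration, not a Model Builder feature. The helper name `drop_leaky_variables`, the `defaulted` outcome column, and the sample values are hypothetical.

```python
# Variables from the example whose values are known only after loan approval.
LEAKY_VARIABLES = {"loan status", "payment history", "loan final payment date"}


def drop_leaky_variables(rows, leaky=LEAKY_VARIABLES):
    """Return copies of the rows without the leaky input variables."""
    return [
        {name: value for name, value in row.items() if name not in leaky}
        for row in rows
    ]


# Illustrative row only, not real lending data.
training_rows = [
    {
        "age": 41,
        "income": 58000,
        "credit score": 690,
        "loan amount": 12000,
        "loan status": "charged off",
        "payment history": "3 missed",
        "loan final payment date": "2023-04-01",
        "defaulted": True,
    },
]

cleaned_rows = drop_leaky_variables(training_rows)
print(sorted(cleaned_rows[0]))
# ['age', 'credit score', 'defaulted', 'income', 'loan amount']
```

The cleaned rows keep only the variables available at loan approval plus the outcome, and would then serve as the updated dataset for retraining.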
