Outliers Alert
Outliers are data points that differ significantly from the majority of values in a dataset. The alert indicates that at least one outlier was detected, meaning the data contains unusually small or unusually large values. Outliers can skew model performance, reduce accuracy, and lead to incorrect forecasts.
Actions to Consider
There are two types of outliers:
- Incorrect values caused by data entry errors, processing errors, or other issues.
- Correct values that reflect an extraordinary, non-recurring, or infrequent event.
Investigate to determine which outliers to keep and which ones to exclude from your model. Exclude incorrect values, or correct them, before model training.
Detection Methodology
For a variable, Model Builder:
- calculates the global mean and global standard deviation of its values.
- designates as an outlier any value that lies more than five standard deviations above or below the global mean.
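The rule above can be sketched in a few lines of Python. This is a minimal illustration, not Model Builder's actual implementation: the function name `find_outliers`, the sample data, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import fmean, pstdev

def find_outliers(values, threshold=5.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (an assumption)
    if std == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mean) > threshold * std]

# Hypothetical daily unit sales: 60 ordinary records plus one extreme record.
daily_units = [10 + (i * 7) % 91 for i in range(60)] + [1500]
outliers = find_outliers(daily_units)
print(outliers)
```

Because the extreme value also inflates the mean and the standard deviation it is compared against, a rule of this kind only triggers when the variable has enough values. With the population standard deviation, at least 27 values are needed before any single value can exceed five standard deviations.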
Example
A retailer wants to predict unit sales for different products. To achieve this, the retailer builds a regression model with these input variables:
- product ID
- unit price
- day of the week
- store location
- stock availability
- competitor unit price
- promo event
After model training, an alert displays because a high outlier score is detected for a product. Most products sell between 10 and 100 units per day, but some records show sales of over 1000 units per day. To resolve the issue, here are some actions to consider:
- Remove data points that may have resulted from errors.
- Include the outlier score during training to help the model learn the patterns that indicate regular sales amounts. Then, use the data to identify transactions that deviate significantly from the norm.
- Retrain the model with an updated dataset.
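As a rough sketch of the first and last actions, the following Python separates flagged records from the rest so that they can be reviewed before retraining. The field names `product_id` and `units_sold`, the helper `split_by_outlier`, and the sample records are hypothetical, and the five-standard-deviation rule is reimplemented here only for illustration.

```python
from statistics import fmean, pstdev

def split_by_outlier(records, field, threshold=5.0):
    """Split records into (kept, flagged) using a mean +/- threshold * std rule."""
    values = [r[field] for r in records]
    mean, std = fmean(values), pstdev(values)
    kept, flagged = [], []
    for r in records:
        is_outlier = std > 0 and abs(r[field] - mean) > threshold * std
        (flagged if is_outlier else kept).append(r)
    return kept, flagged

# Hypothetical sales records: 60 ordinary rows plus one suspected entry error.
records = [{"product_id": f"P{i:03d}", "units_sold": 10 + (i * 7) % 91} for i in range(60)]
records.append({"product_id": "P999", "units_sold": 1500})

kept, flagged = split_by_outlier(records, "units_sold")
print(len(kept), [r["product_id"] for r in flagged])
```

Flagged records are returned rather than deleted so that each one can be checked: a confirmed error is dropped or corrected, while a genuine but rare event can be added back to the dataset used for retraining.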

