High Frequency Values Alert
A high frequency values alert indicates that most values in a variable fall into the same category. Because the variable varies so little, it provides little predictive value, and its contribution to the model is limited.
Actions to Consider
Update the dataset so that the variable has a more even distribution of values. Alternatively, consider removing the variable from the model.
Detection Methodology
Model Builder displays an alert for a variable when a single value in the dataset occurs at least 90% of the time (relative frequency of 0.9 or higher).
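The rule above can be sketched in a few lines of Python. This is only an illustration of the 90% relative-frequency check, not Model Builder's actual implementation. The function name `high_frequency_value` and the constant `THRESHOLD` are hypothetical, and the sketch assumes that every entry in the list, including any missing-value marker, counts as a value.

```python
from collections import Counter

# Threshold taken from the rule above: a relative frequency of 0.9 or higher.
THRESHOLD = 0.9


def high_frequency_value(values, threshold=THRESHOLD):
    """Return (value, relative_frequency) for the most common value if it
    meets the threshold; otherwise return None.

    Hypothetical helper for illustration only.
    """
    if not values:
        return None
    # most_common(1) returns a one-item list holding the (value, count) pair
    # with the highest count.
    value, count = Counter(values).most_common(1)[0]
    frequency = count / len(values)
    return (value, frequency) if frequency >= threshold else None
```

For example, a list in which one value accounts for exactly 9 of 10 entries meets the threshold, while a list in which the most common value accounts for 8 of 10 entries does not.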
Example
An e-commerce retailer wants to predict its customers' propensity to buy a brand across different regions. To achieve this, the retailer builds a regression model with these input variables:
- product ID
- customer ID
- sales channel
- frequency of purchase
- region
After model training, an alert displays because the data indicates that 95% of customers who are likely to purchase the brand are in the USA, and only 5% are in Canada. The model assumes that the USA is the best region for the brand's customers. However, because the "region" variable has low variability, it isn't meaningful for predictions. To resolve the issue, consider these actions:
- Keep one value and remove the less dominant one. For example, use customer data for the USA only and exclude Canada. Alternatively, remove the "region" variable from the dataset.
- Retrain the model with an updated dataset that has a better representation of data values for other regions in addition to the USA.
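As a rough illustration of the example, the following Python sketch builds a made-up dataset with the same 95% and 5% split, computes the relative frequency of each region, and applies the two dataset-side actions from the first bullet. The rows, the `customer_id` and `region` column names, and the helper functions are all hypothetical; a real dataset would be prepared with whatever tooling feeds the model.

```python
from collections import Counter

# Made-up data mirroring the example: 95 customers in the USA, 5 in Canada.
rows = [
    {"customer_id": i, "region": "USA" if i < 95 else "Canada"}
    for i in range(100)
]


def relative_frequencies(rows, column):
    """Map each distinct value in a column to its share of all rows."""
    counts = Counter(row[column] for row in rows)
    total = len(rows)
    return {value: count / total for value, count in counts.items()}


def drop_column(rows, column):
    """Return copies of the rows without the given column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


freqs = relative_frequencies(rows, "region")

# Action 1a: keep only the dominant value (USA customers).
usa_only = [row for row in rows if row["region"] == "USA"]

# Action 1b: keep every customer but remove the "region" variable.
without_region = drop_column(rows, "region")
```

Here `freqs` shows the USA at 0.95, above the 0.9 alert threshold. The third action, retraining with data that better represents other regions, depends on collecting new data and isn't something a snippet like this can show.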

