About Salesforce Data 360
          Multicollinearity Alert

          Multicollinearity indicates that two or more variables in a dataset are highly correlated (for example, “city” and “postal code”). Because these variables carry overlapping information, they can have a duplicate impact on the outcome. High collinearity can therefore lead to overfitting: the model performs well on training data but can perform poorly on data that it hasn't seen yet.

          Actions to Consider

          To improve results, keep only one of the correlated variables. Choose the most descriptive field (for example, “city”) so that insights are easier to interpret.

          Detection Methodology

          Model Builder displays a data alert when Cramér’s V, the statistic that it uses to test for multicollinearity, is 0.5 or higher for two variables.
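          The detection rule can be illustrated with a short Python sketch. For two categorical variables, Cramér’s V = sqrt(χ² / (n · (min(r, c) - 1))), where χ² is Pearson’s chi-squared statistic for their r × c contingency table and n is the number of rows. The result ranges from 0 (no association) to 1 (perfect association). This is a minimal sketch under stated assumptions, not Model Builder’s implementation. The helper names `cramers_v` and `multicollinearity_alerts` are hypothetical, the sketch computes the uncorrected statistic, and it assumes categorical inputs. This article doesn’t say whether Model Builder applies a bias correction or how it handles numeric fields.

```python
from collections import Counter
from math import sqrt

ALERT_THRESHOLD = 0.5  # threshold from the Detection Methodology section


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    if not xs or len(xs) != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    x_counts, y_counts = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))  # observed counts of the contingency table
    # Pearson's chi-squared statistic, summed over every cell of the table.
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association; avoid dividing by zero
    return sqrt(chi2 / (n * k))


def multicollinearity_alerts(columns, threshold=ALERT_THRESHOLD):
    """Hypothetical helper: return (name_a, name_b, V) for every pair of columns
    whose Cramér's V meets or exceeds the threshold."""
    names = list(columns)
    alerts = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            v = cramers_v(columns[a], columns[b])
            if v >= threshold:
                alerts.append((a, b, round(v, 3)))
    return alerts


# Made-up sample: each postal code belongs to exactly one city.
data = {
    "city": ["Austin"] * 4 + ["Denver"] * 4,
    "postal_code": ["78701", "78701", "78702", "78702",
                    "80202", "80202", "80203", "80203"],
    "renovated": ["yes", "no"] * 4,
}

print(multicollinearity_alerts(data))
# [('city', 'postal_code', 1.0)]
```

          Only the city and postal code pair reaches the 0.5 threshold here. Renovation status is spread evenly across both cities and all postal codes, so its V is 0.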

          Example

          A real estate agency wants to predict house prices for new listings. To do this, the agency builds a regression model with these input variables:

          • house size (sq. ft.)
          • number of bedrooms
          • number of bathrooms
          • age of house
          • neighborhood median income
          • renovation status

          After model training, an alert displays because the “house size,” “number of bedrooms,” and “number of bathrooms” variables are highly correlated. As a result, the model can’t separate the individual effect of each variable, which makes its coefficients unreliable. To resolve the issue, consider these actions.

          • Exclude one of the correlated variables to improve generalization to data that the model wasn’t trained on. For example, include “house size” and exclude “number of bedrooms” from the dataset.
          • Use dimensionality reduction or variable selection techniques, such as principal component analysis (PCA) or lasso regression, to reduce redundancy.
          • Retrain the model with an updated dataset.
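          The first action, which keeps the most descriptive variable from a correlated group and excludes the rest, can be sketched as a greedy pass over a preference-ordered list. This is an illustrative sketch, not a Model Builder feature. `select_variables` is a hypothetical helper, the preference order is an assumption you supply (most descriptive first), and the flagged pairs are assumed to come from an alert like the one in this example.

```python
def select_variables(preference, flagged_pairs):
    """Hypothetical helper: keep each variable, in preference order, unless it was
    flagged as highly correlated with a variable that is already kept."""
    flagged = {frozenset(pair) for pair in flagged_pairs}  # order-insensitive pairs
    kept = []
    for name in preference:
        if all(frozenset((name, other)) not in flagged for other in kept):
            kept.append(name)
    return kept


# Input variables from the example, most descriptive first (assumed ordering).
variables = [
    "house size",
    "number of bedrooms",
    "number of bathrooms",
    "age of house",
    "neighborhood median income",
    "renovation status",
]
# Pairs assumed to be flagged by the alert in this example.
flagged = [
    ("house size", "number of bedrooms"),
    ("house size", "number of bathrooms"),
    ("number of bedrooms", "number of bathrooms"),
]

print(select_variables(variables, flagged))
# ['house size', 'age of house', 'neighborhood median income', 'renovation status']
```

          Because the pass is greedy, the result depends on the preference order. Listing “number of bedrooms” first would keep it and exclude the other two correlated variables. You would then retrain the model on the reduced variable set.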