          Get Started with Data Prep

          Excellent data preparation is fundamental to the success of solution implementations powered by Einstein Discovery.

          Note
          Einstein Discovery stories are now models. We wish we could snap our fingers to update the name everywhere, but you can expect to see the previous name in a few places until we replace it.

          What is Data Prep?

          Data prep is the process of preparing your data so that it’s optimized for Einstein Discovery to:

          • analyze and produce useful insights into your data
          • train models that derive useful predictions and improvements for making business decisions and improving outcomes

          Data preparation involves aggregating and optimizing data associated with the outcome variable you're investigating, along with potential explanatory variables that can influence the outcome. Data preparation is also a process of iterative refinement. As you dig deeper into your data, new clues emerge. Discoveries can cause you to reassess previous assumptions and adjust your data prep implementation accordingly.

          Investing in Data Prep for Your Solution

          Data scientists typically invest significant time and effort to plan and prepare their data. They know how much the quality of the output depends on the design and quality of the input.

          You might not match the wizard-level knowledge, skills, and training that data scientists bring to the data prep process, but you can still succeed if you:

          • have domain knowledge of the data associated with the business outcome you’re trying to optimize
          • use CRM Analytics’s extensive data integration capabilities to aggregate, clean, and populate the dataset with data that’s optimized for analysis
          • apply common data prep techniques to produce high-quality data that is accurate, complete, representative of your business operations, and relevant to your solution

          Even if you aren’t a data scientist, you can improve your results by applying basic principles to help you implement your solution.

          Use CRM Analytics's Data Integration Capabilities

          Einstein Discovery relies on data stored in CRM Analytics datasets. CRM Analytics provides a variety of powerful tools you can use to prepare (extract, load, and transform) your data. That way, you can populate a dataset with information that is optimized for Einstein Discovery to consume. To learn more, see Integrate and Prepare Data for Analysis, including Get Started with Data Integration.

          Einstein Discovery Helps with Data Prep

          Einstein Discovery provides the following capabilities to help you improve your data for analysis:

          • Quality alerts notify you when Einstein Discovery detects a possible problem in your data. You can remedy data issues in two main ways:
            • fixing the issue in your CRM Analytics dataset by using data prep tools to automate the fix
            • correcting the issue in your model by using model settings
            If you fix the issue in the dataset, then the automated corrections are reapplied whenever you refresh the data. If you fix the issue in the model, then corrections are applied during model creation or at prediction time. Model fixes do not affect data in the dataset. To learn more, see Handle Quality Alerts.
          • Feature selection involves deciding which explanatory variables to include in your model. Ideally, you want your model to have an optimized set of explanatory variables that best explain variations in the outcome variable. Using automated feature selection and correlations found in your data, Einstein Discovery can suggest which variables to include in—or omit from—your model. That way, you can focus your data prep efforts on the variables you’re analyzing.
          • Filters let you selectively exclude from a model any observations that don’t meet your criteria. For example, in model settings, you can specify number and date ranges, and you can omit categories from analysis.
          • Transformations allow you to fix data issues in your model. For example, fuzzy matching allows you to improve category groupings by fixing spelling variations in categorical values.
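
          To make the fuzzy matching idea concrete, here is a minimal Python sketch that folds spelling variations into a known set of category names. It is an illustration only, not how Einstein Discovery implements its transformation: the function name normalize_categories, the 0.8 similarity cutoff, and the sample values are all assumptions, and the matching uses Python's standard difflib module.

```python
from difflib import get_close_matches


def normalize_categories(values, canonical, cutoff=0.8):
    """Map each raw value to its closest canonical category.

    Hypothetical helper for illustration. Values with no canonical
    category at or above the similarity cutoff are left unchanged.
    """
    # Compare case-insensitively, but return the canonical spelling.
    lookup = {name.lower(): name for name in canonical}
    cleaned = []
    for value in values:
        key = value.strip().lower()
        match = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        cleaned.append(lookup[match[0]] if match else value)
    return cleaned


# Made-up sample data: three spellings of one state, two of another.
raw = ["California", "Californa", "califronia ", "Texas", "Texs", "Ohio"]
cleaned = normalize_categories(raw, canonical=["California", "Texas"])
print(cleaned)
```

          In this sketch, a value such as "Ohio" that resembles no canonical category passes through untouched, so unrelated categories are not merged by accident. A lower cutoff merges more aggressively and raises the risk of grouping distinct values.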

          Data Preparation and Iterative Improvement

          Data prep is not a one-and-done process. In fact, the first set of data you produce is likely to expose a variety of issues you need to address: missing values, incorrect values, outliers (both legitimate and erroneous), redundant information you need to weed out, and other challenges. Fortunately, CRM Analytics and Einstein Discovery give you extensive capabilities to sift through your data, identify and fix problems, and produce optimized data, all without needing to write a line of code. And you can automate the cleansing process with recipes, dataflows, job schedulers, and transformations so you can apply the same fixes to fresh data.
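
          CRM Analytics surfaces these issues for you without code. Purely to show what the issue types look like in practice, the following Python sketch counts missing values, flags outliers with the common 1.5 × IQR rule, and finds duplicate rows. The function names, the IQR threshold, and the sample records are assumptions made for illustration, and nothing here reflects how the product performs its checks.

```python
import statistics


def profile_column(values):
    """Count missing entries and flag IQR-based outliers in a numeric column."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    # Quartiles of the non-missing values (default "exclusive" method).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "outliers": outliers}


def find_duplicates(rows):
    """Return rows that exactly repeat an earlier row."""
    seen, duplicates = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


# Made-up sample data for illustration.
amounts = [120, 135, None, 128, 131, 5000, 126, None, 133]
rows = [
    {"account": "Acme", "amount": 120},
    {"account": "Globex", "amount": 135},
    {"account": "Acme", "amount": 120},
]

profile = profile_column(amounts)
duplicates = find_duplicates(rows)
print(profile)
print(duplicates)
```

          A flagged outlier is only a candidate for review. Whether a value such as 5000 is a data entry error or a genuine large deal is a judgment that depends on your domain knowledge.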

          Data prep can continue after your solution has been deployed. As new insights are revealed, it is common to experiment by adding or changing aspects of the input data. You can schedule analysis so that new data is added to your model incrementally. It is also common to periodically update your model variables and fields with new information or better-focused business questions.
