Agentforce and Einstein Generative AI
          About Knowledge/RAG Quality Data and Metrics

          Learn about the metrics used to score retrievals. This feature collects data at run time and calculates quality scores for RAG-powered knowledge retrievals.

          Required Editions

          Available in: Lightning Experience
          Available in: Enterprise, Performance, Unlimited, and Developer Editions. Required add-on licenses vary by agent type.
          Note: RAG quality metrics are supported for individual and ensemble retrievers.

          RAG Quality Metrics

          Use quality metrics to track run-time performance, view trends, and identify areas to improve in RAG-powered solutions. RAG quality metrics help you identify problem patterns, conduct root cause analysis, and fine-tune your RAG configuration.

          Metric Name: Description

          Context Precision: Measures the relevance of the retrieved context, calculated based on both the question and context. What proportion of the right knowledge was found?

          Faithfulness: Measures the factual consistency of the generated answer against the given context. How well did the agent use the retrieved knowledge?

          Answer Relevance: Measures how pertinent the generated answer is to the given prompt. How completely and how well is the question answered?
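
To view trends across many retrievals, one option is to average each metric over a batch and see which one is lowest. The following is a minimal sketch, not a Salesforce API: the `RagScores` record, its field names, and the assumption that scores are floats between 0.0 and 1.0 are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RagScores:
    """Hypothetical record of the three scores for one retrieval.

    Assumes each score is a float in the range 0.0-1.0.
    """
    context_precision: float
    faithfulness: float
    answer_relevance: float


def average_scores(records: list[RagScores]) -> dict[str, float]:
    """Return the mean of each metric across a batch of retrievals."""
    if not records:
        raise ValueError("need at least one record to average")
    return {
        "context_precision": mean(r.context_precision for r in records),
        "faithfulness": mean(r.faithfulness for r in records),
        "answer_relevance": mean(r.answer_relevance for r in records),
    }


def weakest_metric(averages: dict[str, float]) -> str:
    """Return the name of the metric with the lowest average score."""
    return min(averages, key=lambda name: averages[name])
```

For example, passing a week of exported scores to `average_scores` and then calling `weakest_metric` on the result suggests which metric to investigate first.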

          Common Patterns in RAG Quality Metrics

          Each pattern below lists what it indicates and what to investigate.

          Pattern: High Faithfulness, Low Context Relevance

          Indication: The answer is grounded in the retrieved context, but that context isn’t relevant to the query. As a result, the answer relevance is also likely low.

          This symptom likely indicates a problem in the retrieval.

          Investigate:
          • Does the content actually exist in the knowledge store?
          • Is the number of returned results sufficiently high?
          • Are the correct result fields selected?
          • Is the search string well formed?
          • For non-English content, is the multilingual embedding model selected?

          Pattern: Low Faithfulness, High Context Relevance

          Indication: The answer isn’t grounded in the context, even though that context is relevant to the query. Answer relevance is also likely low.

          This symptom likely indicates a problem in the LLM generation. It’s possibly due to a shortcoming in prompt engineering. Example: The prompt fails to give the LLM sufficiently strong instructions to follow the provided context.

          Investigate:
          • Is the prompt template well written?
          • Is the LLM sufficiently capable to perform the required reasoning task? If not, select a different LLM or LLM version.

          Pattern: High Faithfulness and High Context Relevance, Low Answer Relevance

          Indication: The answer is grounded in the context. That context is relevant to the query. However, the answer relevance is still low.

          This symptom likely indicates that insufficient context was retrieved to fully answer the query. The problem is likely in the retrieval, particularly in the recall of the retrieval.

          Investigate:
          • Does the content actually exist in the knowledge store?
          • Is the number of returned results sufficiently high?
          • Are the correct result fields selected?
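
The three patterns above can be read as a simple decision rule. The following is a minimal sketch of that rule, assuming scores are floats between 0.0 and 1.0 and that 0.5 separates "low" from "high". The `diagnose` function, its return labels, and the threshold are hypothetical and are not part of any Salesforce API; choose a cutoff that fits your own data.

```python
def diagnose(
    context_relevance: float,
    faithfulness: float,
    answer_relevance: float,
    threshold: float = 0.5,  # assumed cutoff between "low" and "high"
) -> str:
    """Map a set of scores to the likely problem area from the patterns above."""
    high_context = context_relevance >= threshold
    high_faithfulness = faithfulness >= threshold
    high_answer = answer_relevance >= threshold

    # Grounded answer, irrelevant context: likely a retrieval problem.
    if high_faithfulness and not high_context:
        return "retrieval"

    # Relevant context, ungrounded answer: likely an LLM generation problem.
    if high_context and not high_faithfulness:
        return "generation"

    # Grounded and relevant, but the answer still falls short:
    # likely insufficient context was retrieved (retrieval recall).
    if high_context and high_faithfulness and not high_answer:
        return "retrieval_recall"

    # Any other combination isn't covered by the documented patterns.
    return "no_documented_pattern"
```

For example, `diagnose(0.2, 0.9, 0.3)` falls under the first pattern and points to retrieval, so the checklist for that pattern is the place to start.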

          Billing Considerations

          Collecting and scoring RAG quality metrics increases your org’s credit consumption rate, including LLM calls and Data 360 features. To learn more, see:
