About Knowledge/RAG Quality Data and Metrics
Learn about the metrics used to score retrievals. This feature collects data at run time and calculates quality scores for RAG-powered knowledge retrievals.
Required Editions
| Available in: Lightning Experience |
| Available in: Enterprise, Performance, Unlimited, and Developer Editions. Required add-on licenses vary by agent type. |
RAG Quality Metrics
Use quality metrics to track run-time performance, view trends, and identify areas to improve in RAG-powered solutions. RAG quality metrics help you identify problem patterns, conduct root cause analysis, and fine-tune your RAG configuration.
| Metric Name | Description |
|---|---|
| Context Precision | Measures the relevance of the retrieved context, calculated based on both the question and context. What proportion of the right knowledge was found? |
| Faithfulness | Measures the factual consistency of the generated answer against the given context. How well did the agent use the retrieved knowledge? |
| Answer Relevance | Measures how pertinent the generated answer is to the given prompt. How completely and how well is the question answered? |
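
To show how the three metrics in the table above could be tracked over time, here is a minimal sketch that averages scores across a batch of retrievals and reports the weakest metric. It is an illustration only: the `RetrievalScore` type, its field names, the helper functions, and the 0 to 1 score scale are all assumptions, not part of the product's API or data model.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RetrievalScore:
    """One scored retrieval. Field names and the 0-1 scale are assumptions."""
    context_precision: float
    faithfulness: float
    answer_relevance: float


def average_scores(records):
    """Return the mean of each metric across a batch of scored retrievals."""
    if not records:
        raise ValueError("records must not be empty")
    return {
        "context_precision": mean(r.context_precision for r in records),
        "faithfulness": mean(r.faithfulness for r in records),
        "answer_relevance": mean(r.answer_relevance for r in records),
    }


def weakest_metric(averages):
    """Return the name of the metric with the lowest average score."""
    return min(averages, key=averages.get)


# Hypothetical sample data, for illustration only.
sample = [
    RetrievalScore(context_precision=0.5, faithfulness=1.0, answer_relevance=0.75),
    RetrievalScore(context_precision=0.25, faithfulness=1.0, answer_relevance=0.25),
]
averages = average_scores(sample)
lowest = weakest_metric(averages)
```

In this made-up batch, `lowest` is the metric with the smallest average, which gives a starting point for root cause analysis.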
Common Patterns in RAG Quality Metrics
| Pattern | Indication | Investigate |
|---|---|---|
| High Faithfulness, Low Context Relevance | The answer is grounded in the retrieved context, but that context isn’t relevant to the query. As a result, the answer relevance is also likely low. This symptom likely indicates a problem in the retrieval. | |
| Low Faithfulness, High Context Relevance | The answer isn’t grounded in the context, even though that context is relevant to the query. Answer relevance is also likely low. This symptom likely indicates a problem in the LLM generation, possibly due to a shortcoming in prompt engineering. Example: The prompt fails to give the LLM sufficiently strong instructions to follow the provided context. | |
| High Faithfulness and High Context Relevance, Low Answer Relevance | The answer is grounded in the context, and that context is relevant to the query. However, the answer relevance is still low. This symptom likely indicates that insufficient context was retrieved to fully answer the query. The problem is likely in the retrieval, particularly in the recall of the retrieval. | |
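
The patterns in the table above can be expressed as a small decision function. The sketch below is illustrative only: the `diagnose` function, the 0 to 1 score scale, and the 0.5 cutoff between "low" and "high" are assumptions chosen for the example, not values defined by the product.

```python
def diagnose(faithfulness, context_relevance, answer_relevance, threshold=0.5):
    """Map a score triple to the likely problem area from the patterns table.

    Scores are assumed to lie between 0 and 1. The threshold that separates
    "low" from "high" is a hypothetical value; tune it for your own data.
    """
    high_faith = faithfulness >= threshold
    high_context = context_relevance >= threshold
    high_answer = answer_relevance >= threshold

    if high_faith and not high_context:
        # Grounded answer, irrelevant context.
        return "retrieval"
    if not high_faith and high_context:
        # Relevant context, ungrounded answer.
        return "generation"
    if high_faith and high_context and not high_answer:
        # Grounded and relevant, but too little context was retrieved.
        return "retrieval recall"
    return "no matching pattern"


# Hypothetical score triples, for illustration only.
first = diagnose(0.9, 0.2, 0.3)
second = diagnose(0.2, 0.9, 0.3)
third = diagnose(0.9, 0.9, 0.2)
fourth = diagnose(0.9, 0.9, 0.9)
```

A triple that matches none of the three patterns, such as all-high scores, falls through to `"no matching pattern"` rather than being forced into a category.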
Billing Considerations
Collecting and scoring RAG quality metrics increases your org’s credit consumption, because it uses LLM calls and Data 360 features. To learn more, see:
- Agentforce: Flex Credits Billable Usage Types
- Data 360: Billing Considerations for Audit and Feedback

