Choose Evaluations
Evaluations measure your AI agent’s performance across key areas. Default evaluations include response accuracy, subagent assertion, and action assertion. To better match your testing goals, you can add quality-focused evaluations like completeness, coherence, conciseness, latency, and instruction adherence. Selecting the right mix of evaluations gives you the insights you need to strategically refine and improve your agent.
Required Editions
This article applies to: Legacy Agentforce Testing Center in Setup

This article doesn’t apply to: New Testing Center in Agentforce Studio (Beta)

Available in: Lightning Experience

Available in: Enterprise, Performance, Unlimited, and Developer Editions. Required add-on licenses vary by agent type.

| User Permissions Needed | |
|---|---|
| To create tests in the Testing Center: | Manage AI Agents AND the required permissions for your agent type |
- Select the evaluations you want the test to run. Default evaluations are always tested, but you can select Response Quality Metrics to focus your tests on specific areas. Response quality evaluations measure the Agentforce Testing Center's three key quality criteria: accuracy (correct information), relevance (addressing intent and context), and voice and tone (assessing style and brand alignment).

| Evaluation | Definition |
|---|---|
| Default Evaluations | |
| Response Evaluation | We send the test utterance to the agent and record the response. An LLM judge determines the expected response, compares it to the agent's response, and assigns a score. |
| Subagent Assertion | AI-generated test utterances are designed to trigger your agent to select the assigned subagents. The subagent assertion evaluation determines whether your agent selected the correct subagents based on the utterance. |
| Action Assertion | Similar to the subagent assertion, AI-generated test utterances are designed to trigger your agent to select specific actions. The action assertion checks that the agent selects the right actions and includes all of the expected actions based on the utterance. |
| Response Quality Evaluations | |
| Completeness | Checks whether the agent sufficiently covered the desired content, including all important pieces of information that are expected from the utterance. |
| Coherence | Assesses whether the response was transformed into grammatically correct, conversational language. For example, it flags a response in which information taken from Salesforce objects is delivered as raw data, such as JSON structures. |
| Conciseness | Evaluates whether the response is short but accurate. |
| Latency | Test execution time, measured in milliseconds. |
| Instruction Adherence | Checks how well the agent interprets and fully follows the subagent instructions, addressing key points and providing required information. Learn more about Instruction Adherence in Help. |

- Check over your test selections and click Save & Run. Generating tests consumes credits, so make sure you're satisfied with your configuration before running.
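To make the action assertion concrete, here is a minimal Python sketch of the "includes all of the expected actions" condition. It only illustrates the idea and isn't how the Testing Center implements the evaluation. The function names and action names are hypothetical, and the sketch doesn't decide how extra, unexpected actions are treated.

```python
def missing_expected_actions(expected, selected):
    """Return the expected actions that the agent did not select, in order."""
    selected_set = set(selected)
    return [action for action in expected if action not in selected_set]


def action_assertion_passes(expected, selected):
    """Pass only when every expected action appears among the selected actions."""
    return not missing_expected_actions(expected, selected)


# Hypothetical action names, for illustration only.
expected = ["IdentifyRecordByName", "GetRecordDetails"]
selected = ["IdentifyRecordByName"]

print(action_assertion_passes(expected, selected))
print(missing_expected_actions(expected, selected))
```

In this example the check fails because one expected action is missing from the agent's selection.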
When Agentforce finishes generating the tests, they appear in the Testing Center. When the status is Ready to Run, click Run Tests to run the tests directly, or download them as a CSV file to make edits. After editing, upload the CSV file back into the Testing Center to run tests on your updated cases.
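If you prefer to edit the downloaded CSV with a script instead of a spreadsheet, a sketch like the following could help. It assumes nothing about the real file layout: the column names in the sample (`utterance`, `expected_actions`) are hypothetical, so check the header row of the file you actually download before adapting it.

```python
import csv
import io


def rewrite_column(csv_text, column, transform):
    """Apply transform to one column of every row and return the new CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column not found: {column}")
    out = io.StringIO()
    # Keep the original header order so the file structure is unchanged.
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = transform(row[column])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical export; real column names come from the file you download.
sample = "utterance,expected_actions\nwhat is the status of acme ,GetRecordDetails\n"
edited = rewrite_column(sample, "utterance", lambda s: s.strip().capitalize())
print(edited)
```

The sketch works on text in memory. To use it with a file, read the downloaded CSV into a string, pass it through `rewrite_column`, and write the result to a new file for upload.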
Generative AI can generate test cases for Account, Lead, Opportunity, and Contact objects. With specific instructions in the description, it can also create test cases for custom objects and the Answer Questions with Knowledge action. For example, for custom objects, use a description similar to this: “Generate test cases for [custom object name]. Sample utterances are: [sample utterance related to custom object], [sample utterance related to custom object].” To have AI create test cases for the Answer Questions with Knowledge action, use a description similar to this: “Generate test cases for the Answer Questions with Knowledge action. Use the following knowledge articles for generating tests: [title of an article], [title of an article].”
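If you generate tests for several custom objects or article sets, you might assemble these descriptions programmatically. The following Python sketch fills in the two description templates quoted above. The function names, the object name `Warranty__c`, the sample utterances, and the article titles are hypothetical placeholders.

```python
def custom_object_description(object_name, sample_utterances):
    """Build a test-generation description for a custom object."""
    if not sample_utterances:
        raise ValueError("provide at least one sample utterance")
    samples = ", ".join(sample_utterances)
    return f"Generate test cases for {object_name}. Sample utterances are: {samples}."


def knowledge_description(article_titles):
    """Build a description for the Answer Questions with Knowledge action."""
    if not article_titles:
        raise ValueError("provide at least one article title")
    titles = ", ".join(article_titles)
    return (
        "Generate test cases for the Answer Questions with Knowledge action. "
        f"Use the following knowledge articles for generating tests: {titles}."
    )


# Hypothetical values, for illustration only.
object_text = custom_object_description(
    "Warranty__c",
    ["Show open warranties for Acme", "List warranties that expire this month"],
)
knowledge_text = knowledge_description(["Return Policy", "Shipping Times"])

print(object_text)
print(knowledge_text)
```

Paste the resulting text into the description field when you generate test cases.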



