Scorers and Custom Scorers (Beta)
Scorers (Beta) are evaluation components in Agentforce Studio that analyze agent sessions and produce scores, dimensions, and measures for Agentforce Optimization and Analytics. Pair Salesforce standard scores with custom evaluations for your KPIs, create custom scorers using Next Gen Testing, apply them to sessions, and use the outputs in Optimization and Analytics to prioritize agent improvements.
Required Editions
Available in: Enterprise, Performance, and Unlimited Editions with an Einstein for Sales, Einstein for Platform, Einstein for Service, Einstein 1 Service, or Einstein GPT Service add-on. To purchase add-ons, contact your Salesforce account executive.
Scorers are a beta service that is subject to the Beta Services Terms at Agreements - Salesforce.com or a written Unified Pilot Agreement if executed by Customer, and applicable terms in the Product Terms Directory. Use of this beta service is at the Customer's sole discretion.
Where Scorers Appear in Optimization and Analytics
- Next Gen Testing — Create custom scorers as part of a new or an existing test suite, refine and publish it.
- Scorer Hub (Agentforce Studio) — Central place to view, activate, and manage scorers.
- Sessions and Intents table — Sessions show scorer results; filter and analyze by custom scorer dimensions.
- Session Page — Sessions show associated scorer labels and scores.
- Analytics dashboards — Custom scorer measures appear as metrics for reporting.
Considerations and Limitations
- Available in Lightning Experience in Enterprise, Performance, and Unlimited editions with Salesforce Foundations or Agentforce 1 Edition where Agent Optimization and Analytics are enabled.
- Custom Scorers are in Beta and require no additional license beyond Agentforce.
- During Beta, only session-level scorers are supported.
- Custom scorers are not supported for Agentforce Employee Agent (AEA) agent types.
- SDR agent type appears in the Scorer Hub but not in the Optimization UI agent dropdown.
- Standard scorers (Abandonment and Deflection, or A&D, and Quality Score) are created when the Session Tracing Data Model (STDM) is provisioned. If they are missing, check for a provisioning issue.
- You must clone a standard scorer before editing it; direct edit is not supported during Beta.
- Expression-based scorers require a Boolean output type.
- Numeric scorers on a 0–1 scale are not recommended because LLMs perform poorly on very narrow numeric ranges.
- Testing Center sessions are not scored by custom scorers. Testing Center sessions don't receive an end timestamp, so the data action that triggers custom scorer evaluation is never fired.
Overview
Scorers evaluate agent sessions and generate scores, dimensions (Text values), and measures (Numeric values). They're part of Agentforce Optimization and Observability. They run automatically against a configurable percentage of sessions and surface results in Optimization and Analytics, so admins and developers can improve agent performance continuously.
There are two kinds of scorers:
- Standard scorers are provided by Salesforce out of the box.
- Custom scorers are defined by your organization to evaluate business-specific criteria.
Standard Scorers
Standard scorers are pre-built evaluations that are created automatically with STDM provisioning. They run on agent sessions without additional customer configuration.
Current Standard Scorers
- Abandonment Score — Indicates whether the customer ended the session prematurely before the agent resolved their issue.
- Deflection Score — Indicates whether the agent successfully deflected an interaction from requiring human escalation.
- Quality Score — Measures overall session quality based on predefined Salesforce criteria.
Standard scorers use a reserved location in the user interface and are separate from the custom scorer list.
You can't edit a standard scorer directly. Clone it to create a new custom version with modified logic.
Custom Scorers
Custom scorers let you define evaluation logic and apply it to sessions. They use an LLM-as-a-judge approach (prompt-based evaluation) or expression-based logic to label, score, or classify sessions against your criteria.
You create custom scorers in Next Gen Testing (NGT) in Agentforce Studio. See Set Up and Run Tests in Agentforce Studio (Beta). For more information, see Next Gen Testing on Slack.
Custom scorers run against sessions and surface custom scores and dimensions in Observability.
How Custom Scorers Work
A custom scorer is backed by a prompt template that runs automatically as a batch job when the session ends. After a scorer runs on a session, the session is associated with that scorer, and labels and scores appear in Optimization and Analytics.
- A scorer is a prompt template that runs on a configurable percentage of sessions (sample rate).
- Scores are applied at session end, not during the live conversation. Sessions must receive an end timestamp for the scoring data action to fire.
- Results appear in Observability as custom measures.
- You can also trigger scorers manually by using the executeAgentforceScoresJob invocable action.
- Scorers don't affect the agent's latency or performance during live conversations because scoring runs after the session ends.
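The eligibility logic in the list above can be sketched in a few lines of Python. This is an illustration only, not the platform's implementation: the `Session` class, the `should_score` and `is_sampled` helpers, and the hash-based sampling are all assumptions introduced here. The only behavior taken from the text is that a session needs an end timestamp and that scorers run on a configurable percentage of sessions.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Session:
    # Hypothetical shape; real session records carry many more fields.
    session_id: str
    end_timestamp: Optional[datetime]


def is_sampled(session_id: str, sampling_rate: float) -> bool:
    """Deterministically place a session inside or outside the sample.

    sampling_rate is a percentage from 0 to 100. Hashing the ID is an
    assumption made for this sketch; it keeps the decision stable
    across reruns.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < sampling_rate * 100


def should_score(session: Session, sampling_rate: float) -> bool:
    # Scoring only fires after the session has an end timestamp.
    if session.end_timestamp is None:
        return False
    return is_sampled(session.session_id, sampling_rate)
```

With this sketch, a session that never receives an end timestamp is never scored, whatever the sample rate.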
Scorer Configuration
Key Fields
| Field | Description |
|---|---|
| scorerApiName | Unique API name for the scorer. |
| status | Lifecycle status of the scorer (for example, Draft or Available). |
| isDraft | Boolean. Indicates whether this scorer definition is a draft. |
| versionNumber | Integer version of the scorer definition. |
| engine | Evaluation engine: LLM-based or expression-based. |
| inputScope | Granularity: session-level (default for Beta and general availability) or interaction-level (planned). |
| dataType | Output type for the scorer (for example, Text or Numeric). |
| scorerValues / valuesSpecification | Provide exactly one. Defines output labels or the numeric range for the scorer. |
| promptTemplateRef | Reference to the prompt template for LLM-based evaluation. The template must be active. |
| agentApiName | Optional. Associates the scorer with a specific agent. Only one agent API name is allowed. |
| samplingRate | Percentage of sessions the scorer runs against. |
Validation Rules
- scorerApiName must not already exist.
- promptTemplateRef must exist and be active.
- For Text data type: scorer values must exist; exactly one value must have isFallback: true; exactly one must have isSystemFallback: true (only when isFallback is also true).
- For Numeric data type: no fallback values; values must parse as valid doubles. If you use valuesSpecification: step must be greater than 0; min must be less than max; threshold is optional but must be between min and max.
- Maximum number of scorer values: 101.
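A local pre-check for some of these rules might look like the following Python sketch. It is not the platform's validator. The function names and the dictionary shape (including the `label` key in the usage example) are hypothetical, and two readings are assumed: the system fallback must be the same value that is flagged as the fallback, and `threshold` may equal `min` or `max`.

```python
from typing import Any, Dict, List

MAX_SCORER_VALUES = 101  # limit stated in the validation rules


def validate_text_values(values: List[Dict[str, Any]]) -> List[str]:
    """Return a list of rule violations for a Text scorer's values."""
    errors = []
    if not values:
        errors.append("Text scorers need at least one scorer value.")
    if len(values) > MAX_SCORER_VALUES:
        errors.append(f"At most {MAX_SCORER_VALUES} scorer values are allowed.")
    fallbacks = [v for v in values if v.get("isFallback")]
    system_fallbacks = [v for v in values if v.get("isSystemFallback")]
    if len(fallbacks) != 1:
        errors.append("Exactly one value must have isFallback: true.")
    if len(system_fallbacks) != 1:
        errors.append("Exactly one value must have isSystemFallback: true.")
    # Assumed reading: a system fallback must also be flagged as fallback.
    if any(not v.get("isFallback") for v in system_fallbacks):
        errors.append("isSystemFallback requires isFallback: true.")
    return errors


def validate_values_specification(spec: Dict[str, float]) -> List[str]:
    """Return a list of rule violations for a numeric valuesSpecification."""
    errors = []
    lo, hi, step = spec["min"], spec["max"], spec["step"]
    threshold = spec.get("threshold")  # optional
    if step <= 0:
        errors.append("step must be greater than 0.")
    if lo >= hi:
        errors.append("min must be less than max.")
    # Assumed reading: "between" includes both endpoints.
    if threshold is not None and not (lo <= threshold <= hi):
        errors.append("threshold must be between min and max.")
    return errors
```

For example, `validate_values_specification({"min": 0, "max": 5, "step": 1, "threshold": 3})` returns an empty list, while a specification with `min` greater than `max` returns one message per violated rule.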
Output Types
- Text (labels) — The scorer assigns one of a set of predefined string labels to each session (for example, Resolved, Abandoned, or Escalated). Use for categorical classification.
- Numeric — The scorer returns a number in a defined range (for example, 0–5). Suited to quality ratings and continuous scores. LLMs typically perform better on discrete scales (such as 0–5) than on narrow 0–1 ranges.
- Boolean (stretch or future) — Returns true or false. Supported for expression-based scorers.
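If a downstream report needs a value between 0 and 1, one option is to keep the scorer on a discrete scale such as 0–5 and rescale the result afterwards. The helper below is a hypothetical post-processing step in Python, not part of the scorer definition, and shows only the arithmetic.

```python
def rescale_to_unit(score: float, scale_min: float, scale_max: float) -> float:
    """Map a score from [scale_min, scale_max] onto the 0-1 range."""
    if scale_min >= scale_max:
        raise ValueError("scale_min must be less than scale_max")
    if not scale_min <= score <= scale_max:
        raise ValueError("score is outside the declared range")
    return (score - scale_min) / (scale_max - scale_min)
```

A score of 4 on a 0–5 scale rescales to 0.8, so the judge still chooses among six discrete values while the report shows a fraction.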
Run Scorers in Production
Activate from Scorer Tab
After you create and test a custom scorer, activate it from the Scorer Hub in Agentforce Optimization. Activation lets the scorer run on live sessions at the configured sample rate.
| Task | Required Permission Sets |
|---|---|
| Access and view Agentforce Optimization | Assign the Access Agentforce Optimization and Data Cloud User permission sets |
| Use Scorer Tab (Beta) | Assign the Agentforce Scorer Beta permission set (no additional license beyond Agentforce is required) |
| Create a scorer | Assign the Next Gen Testing (NGT) access permission set |
| Activate a scorer | Assign the AgentforceScorerActivation permission set (included in the admin profile by default) |
Run from Salesforce Flow
We provide a new invocable action, TriggerAgentBulkScoring, to execute custom scorers asynchronously on historical sessions. This action accepts multiple sessions and scorers and is accessible via:
- Salesforce Flow
- REST API
- Apex
Action Name: TriggerAgentBulkScoring
| Input Parameter | Type | Description |
|---|---|---|
| InputIds | List<String> | List of up to 500 unique session IDs. |
| InputScope | Enum | Scope of the scoring: Session, Moment, or Interaction. |
| ScoresApiNames | List<String> | Up to 10 developer names for the scorers to be executed. |
Important Notes
- All session IDs and scorerApiNames must belong to the same agent.
- Ensure the scorer version is Available before execution.
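The input limits above (up to 500 unique session IDs and up to 10 scorers per call) suggest batching when you have more sessions than one call accepts. The Python sketch below shows one way to prepare such batches on the client side. The function name is hypothetical, the camelCase keys follow the example payload in the REST API section, and the sketch doesn't check that sessions and scorers belong to the same agent.

```python
from typing import Any, Dict, List, Sequence

MAX_SESSION_IDS = 500   # per-call limit on session IDs
MAX_SCORERS = 10        # per-call limit on scorer developer names
VALID_SCOPES = {"Session", "Moment", "Interaction"}


def build_bulk_scoring_inputs(
    session_ids: Sequence[str],
    scorer_api_names: Sequence[str],
    input_scope: str = "Session",
) -> List[Dict[str, Any]]:
    """Split session IDs into input objects that respect the stated limits."""
    if input_scope not in VALID_SCOPES:
        raise ValueError(f"Unknown input scope: {input_scope}")
    if not 1 <= len(scorer_api_names) <= MAX_SCORERS:
        raise ValueError(f"Provide between 1 and {MAX_SCORERS} scorers.")
    # Remove duplicate IDs while keeping the original order.
    unique_ids = list(dict.fromkeys(session_ids))
    return [
        {
            "inputIds": unique_ids[start:start + MAX_SESSION_IDS],
            "inputScope": input_scope,
            "scorerApiNames": list(scorer_api_names),
        }
        for start in range(0, len(unique_ids), MAX_SESSION_IDS)
    ]
```

Passing 1,200 unique session IDs yields three input objects holding 500, 500, and 200 IDs. Whether several input objects can be sent in a single request isn't stated here, so sending one per request is the conservative choice.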
REST API
Send a POST request to the standard action REST endpoint.
/services/data/v66.0/actions/standard/triggerAgentBulkScoring
Example payload:
{
  "inputs": [
    {
      "inputIds": ["019c9442-b760-7d62-837c-8ab80ecc0fc2"],
      "inputScope": "Session",
      "scorerApiNames": ["language_classifier"]
    }
  ]
}
Use the executeAgentforceScoresJob invocable action in Flow. The action accepts a list of session IDs and scorers and submits a job to the evaluation pipeline. Common uses include:
- Running scorers on historical sessions
- Batch analysis and custom reporting
- Integration with Agentforce Grid
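For the REST endpoint and payload shown earlier in this section, a request could be assembled as in the Python sketch below, which uses only the standard library. The function name, the instance URL, and the token in the usage note are placeholders, and the sketch builds the request without sending it. Bearer-token authorization is the usual scheme for Salesforce REST calls; how you obtain the token is outside this sketch.

```python
import json
import urllib.request
from typing import Sequence

# Endpoint path as given in the REST API section.
ACTION_PATH = "/services/data/v66.0/actions/standard/triggerAgentBulkScoring"


def build_scoring_request(
    instance_url: str,
    access_token: str,
    session_ids: Sequence[str],
    scorer_api_names: Sequence[str],
) -> urllib.request.Request:
    """Build (but don't send) the POST request for bulk scoring."""
    body = {
        "inputs": [
            {
                "inputIds": list(session_ids),
                "inputScope": "Session",
                "scorerApiNames": list(scorer_api_names),
            }
        ]
    }
    return urllib.request.Request(
        url=instance_url.rstrip("/") + ACTION_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` with a real instance URL and a valid access token. The response format isn't described here, so inspect the response body before relying on any particular field.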
Metadata API
Create and manage custom scorers with the Salesforce Metadata API so you can deploy scorers in pro-code and DevOps workflows, including CI/CD pipelines.
Use Cases
| Use Case | User Goal | How Scorers Help |
|---|---|---|
| Monitor agent quality | Understand how well the agent resolves customer issues | Standard scorers (Quality Score, A&D) run automatically and surface results in Analytics without configuration |
| Evaluate business-specific outcomes | Measure custom KPIs (for example, topic classification or compliance) | Custom scorers apply your LLM prompts to sessions and expose results as custom dimensions |
| Iterate and improve agents | Find patterns in weak sessions and refine instructions | Observability shows scorer results per session so you can drill into flagged interactions |
| Test scorers before production | Confirm a new scorer behaves as expected before activation | Testing Center in Agentforce Studio supports creating, running, and refining scorers against test cases |
| Run scorers at scale | Evaluate large batches of historical sessions | executeAgentforceScoresJob supports batch processing and custom reporting |
| Manage scorers as code | Deploy scorers through CI/CD | Metadata API support enables pro-code creation and management |
| Customize evaluation criteria | Adjust or extend out-of-the-box scoring | Clone a standard scorer to create a custom version with modified prompt logic |

