You are here:
Safety and Security - Prompt Injection Detection Control
Detects and mitigates prompt injection attacks where users attempt to overwrite system instructions to force the AI into unintended or malicious behaviors.
Control Name
Einstein Trust Layer - Prompt Injection Detection
Control Overview
Detects and mitigates prompt injection attacks where users attempt to overwrite system instructions to force the AI into unintended or malicious behaviors.
Description
Monitors the prompt journey to identify adversarial patterns, such as "ignore previous instructions" or "system override" commands, blocking or flagging the request before it reaches the LLM.
Recommended Configuration
Enable "Prompt Injection Detection" in the Einstein Trust Layer settings. Make sure that the Einstein Audit Trail is enabled to log events for injection attempts.
Security Impact
Prevents the AI from being reprogrammed by a user to leak internal data, generate prohibited content, or bypass the ethical boundaries established in the Prompt Template.
Business Impact
Maintains the integrity of AI-driven business processes and prevents operational disruptions caused by users manipulating AI logic for personal or malicious gain.
Security Risk If Not Configured
The LLM may follow malicious user instructions over the System Prompt, leading to unauthorized data disclosure, social engineering, or the execution of unauthorized workflows.
Threat Scenarios
Prompt Injection: A user tricks the AI into performing unwanted actions, leading to data leakage.
Estimated CVSS Score Range
Critical (9.0–10.0).
Risk Impact Considerations
Extreme risk for agentic workflows where the AI can execute actions (for example, updating records or sending emails) based on user prompts.
Higher Risk When
Using external/third-party LLMs without the Salesforce Trust Layer intermediary, or when Prompt Templates are poorly constructed with least context.
Low Risk When
Prompt injection detection is active, the org uses Salesforce-hosted models (which have built-in defense), and Least Privilege is applied to the AI's data access.
Business and Integration Considerations
Strict detection may occasionally flag complex, legitimate user prompts as adversarial. Admins should monitor the Einstein Trust Layer audit feedback data to fine-tune sensitivity.
Security Health Review Guidance
Security Health Review scans the Einstein Trust Layer Setup to confirm that prompt injection detection is enabled.
Who Is Impacted
Security teams, AI architects, developers, and end-users interacting with Prompt Builder or Agentforce agents.

