Prompt Injection Detection (Beta)

Prompt Injection, sometimes referred to as “jailbreak,” is a large language model (LLM) vulnerability. Prompt injections are attempts to make the LLM do something that it isn’t designed to do. Hackers can create prompts that attempt to override the system policies or manipulate the LLM into doing something unintended.

Required Editions

Available in: Enterprise, Performance, and Unlimited Editions with an Einstein for Sales, Einstein for Platform, Einstein for Service, Einstein 1 Service, or Einstein GPT Service add-on. To purchase add-ons, contact your Salesforce account executive.

Prompt injections can be introduced directly in the prompt text or indirectly through data sources included in the prompt.

Salesforce Prompt Injection Detection models recognize these types of prompt injection attacks:

Type	Description
Pretending (role-play)	Prompts that instruct the AI to adopt a different system persona with malicious intent, and use deceptive or misleading language to manipulate it in social engineering attacks
Privilege escalation or attempts to change system rules	Prompts that have harmful commands to get around or change system rules and bypass safety training for language models, including attacks that break security restrictions such as Do Anything Now (DAN) jailbreak attacks
Prompt leakage intent	Prompts designed to gather sensitive information from the language model, such as the system policies and knowledge documents, to gain unauthorized information
Encoding attacks	Using obfuscated or hidden messages in prompts to make a language model produce malicious, unaligned, or toxic content
Privacy attacks	Prompts that try to get personal or confidential information to gain unauthorized access to data or misuse the information
Malicious code generation	Prompts that attempt to produce harmful computer code, such as malware, viruses, or tools or software designed to commit fraud and other malicious intent.

If prompt injection is detected, the content is rated and the score is logged in the audit trail and stored in Data 360. You can view prompt injection scores in Data 360 DMOs.

Important Although our prompt injection detection models have shown to be effective during internal testing, it's important to note that no model can guarantee 100% accuracy. With trust as our priority, we're dedicated to the ongoing evaluation and refinement of our models.

Prompt Injection Detection (Beta)

Required Editions

See Also

General Information

Required Cookies

Functional Cookies

Advertising Cookies

General Information

Required Cookies

Functional Cookies

Advertising Cookies

Cookie List

Product Area

Feature Impact

Edition

Experience

Prompt Injection Detection (Beta)

Required Editions

See Also