Prompt Injection, sometimes referred to as “jailbreak,” is a large language model (LLM) vulnerability. Prompt injections are attempts to make the LLM do something that it isn’t designed to do. Hackers can create prompts that attempt to override the system policies or manipulate the LLM into doing something unintended.
Required Editions
Available in: Enterprise, Performance, and Unlimited
Editions with an Einstein for Sales, Einstein for Platform, Einstein for Service,
Einstein 1 Service, or Einstein GPT Service add-on. To purchase add-ons, contact
your Salesforce account executive.
Prompt injections can be introduced directly in the prompt text or indirectly through data sources included in the prompt.
Salesforce Prompt Injection Detection models recognize these types of prompt injection attacks:
Type
Description
Pretending (role-play)
Prompts that instruct the AI to adopt a different system persona with
malicious intent, and use deceptive or misleading language to manipulate
it in social engineering attacks
Privilege escalation or attempts to change system rules
Prompts that have harmful commands to get around or change system
rules and bypass safety training for language models, including attacks
that break security restrictions such as Do Anything Now (DAN) jailbreak
attacks
Prompt leakage intent
Prompts designed to gather sensitive information from the language
model, such as the system policies and knowledge documents, to gain
unauthorized information
Encoding attacks
Using obfuscated or hidden messages in prompts to make a language
model produce malicious, unaligned, or toxic content
Privacy attacks
Prompts that try to get personal or confidential information to gain
unauthorized access to data or misuse the information
Malicious code generation
Prompts that attempt to produce harmful computer code, such as
malware, viruses, or tools or software designed to commit fraud and
other malicious intent.
If prompt injection is detected, the content is rated and the score is logged in the audit
trail and stored in Data 360. You can view prompt injection scores in Data 360 DMOs.
Important Although our prompt injection detection models have shown to be effective during internal testing, it's important to note that no model can guarantee 100% accuracy. With trust as our priority, we're dedicated to the ongoing evaluation and refinement of our models.
We use three kinds of cookies on our websites: required, functional, and advertising. You can choose whether functional and advertising cookies apply. Click on the different cookie categories to find out more about each category and to change the default settings.
Privacy Statement
Required Cookies
Always Active
Required cookies are necessary for basic website functionality. Some examples include: session cookies needed to transmit the website, authentication cookies, and security cookies.
Functional Cookies
Functional cookies enhance functions, performance, and services on the website. Some examples include: cookies used to analyze site traffic, cookies used for market research, and cookies used to display advertising that is not directed to a particular individual.
Advertising Cookies
Advertising cookies track activity across websites in order to understand a viewer’s interests, and direct them specific marketing. Some examples include: cookies used for remarketing, or interest-based advertising.