Loading
Salesforce now sends email only from verified domains. Read More
Agentforce and Einstein Generative AI
Table of Contents
Select Filters

          No results
          No results
          Here are some search tips

          Check the spelling of your keywords.
          Use more general search terms.
          Select fewer filters to broaden your search.

          Search all of Salesforce Help
          Prompt Injection Detection (Beta)

          Prompt Injection Detection (Beta)

          Prompt Injection, sometimes referred to as “jailbreak,” is a large language model (LLM) vulnerability. Prompt injections are attempts to make the LLM do something that it isn’t designed to do. Hackers can create prompts that attempt to override the system policies or manipulate the LLM into doing something unintended.

          Required Editions

          Available in: Enterprise, Performance, and Unlimited Editions with an Einstein for Sales, Einstein for Platform, Einstein for Service, Einstein 1 Service, or Einstein GPT Service add-on. To purchase add-ons, contact your Salesforce account executive.

          Prompt injections can be introduced directly in the prompt text or indirectly through data sources included in the prompt.

          Salesforce Prompt Injection Detection models recognize these types of prompt injection attacks:

          Type Description
          Pretending (role-play) Prompts that instruct the AI to adopt a different system persona with malicious intent, and use deceptive or misleading language to manipulate it in social engineering attacks
          Privilege escalation or attempts to change system rules Prompts that have harmful commands to get around or change system rules and bypass safety training for language models, including attacks that break security restrictions such as Do Anything Now (DAN) jailbreak attacks
          Prompt leakage intent Prompts designed to gather sensitive information from the language model, such as the system policies and knowledge documents, to gain unauthorized information
          Encoding attacks Using obfuscated or hidden messages in prompts to make a language model produce malicious, unaligned, or toxic content
          Privacy attacks Prompts that try to get personal or confidential information to gain unauthorized access to data or misuse the information
          Malicious code generation Prompts that attempt to produce harmful computer code, such as malware, viruses, or tools or software designed to commit fraud and other malicious intent.

          If prompt injection is detected, the content is rated and the score is logged in the audit trail and stored in Data 360. You can view prompt injection scores in Data 360 DMOs.

          Important
          Important Although our prompt injection detection models have shown to be effective during internal testing, it's important to note that no model can guarantee 100% accuracy. With trust as our priority, we're dedicated to the ongoing evaluation and refinement of our models.
           
          Loading
          Salesforce Help | Article