Loading
Agentforce and Einstein Generative AI
Table of Contents
Select Filters

          No results
          No results
          Here are some search tips

          Check the spelling of your keywords.
          Use more general search terms.
          Select fewer filters to broaden your search.

          Search all of Salesforce Help
          Content Safety and Security

          Content Safety and Security

          Einstein Trust Layer includes a set of policies to help detect potentially harmful content and malicious attacks that attempt to compromise the safety and security of AI applications.

          Required Editions

          Available in: Enterprise, Performance, and Unlimited Editions with an Einstein for Sales, Einstein for Platform, Einstein for Service, Einstein 1 Service, or Einstein GPT Service add-on. To purchase add-ons, contact your Salesforce account executive.

          Harmful Content

          Harmful content refers to information that can have detrimental effects on individuals or communities. In the context of generative AI models, harmful content refers to data in prompts or responses that negatively impact mental health, behavior, or well-being. Harmful content can include toxic material such as hate and violence or content that has unfair or discriminatory patterns.

          Large Language Models (LLMs) can inadvertently generate harmful content due to several reasons:

          • Prompt influence: The language in the prompt directly influences the model’s output. For example, if the prompt contains offensive or harmful phrases, the model can incorporate similar language in its response.
          • Training data: LLMs learn from vast amounts of data, which can include biased or toxic content. If the training data contains harmful language, the model can inadvertently reproduce it.
          • Contextual patterns: LLMs generate responses based on statistical patterns in the data. If harmful language appears frequently in similar contexts, the model can replicate those patterns.
          • Fine-tuning and transfer learning: Fine-tuning LLMs on specific tasks can introduce biases. Transfer learning from unrelated domains can also affect content generation.

          Einstein Trust Layer uses machine learning models to identify harmful content in generative AI applications and features.

          Prompt Injections

          Prompt injections are attempts to make the LLM do something that it isn’t designed to do. Hackers can create prompts that attempt to override the system policies or manipulate the LLM into doing something unintended.

          Salesforce provides prompt defense mechanisms to help mitigate risks posed by malicious attacks.

          Prompt injection detection together with the system policies provide an in-depth approach to prompt defense. Prompt injection defense is consistently applied to all user prompts, bolstering security in Agentforce and embedded AI applications.

          • We have built-in system policies to help limit hallucinations and decrease the likelihood of unintended or harmful outputs by the LLM.
          • Prompt injection detection is used to help detect malicious attacks that attempt to manipulate the LLMs into doing something it wasn’t designed to do.
           
          Loading
          Salesforce Help | Article