Agentforce and Einstein Generative AI
          Large Language Model Limits

          Understand limits for supported large language models (LLMs) from multiple providers for embedded features, such as Prompt Builder. Limits for each model include requests per minute and token limits.

          Rate Limits

          There’s a default rate limit of 1,000 LLM generation requests per minute (RPM) per org for many models in production orgs. However, depending on a model’s usage and the available capacity from model providers, the RPM limit can vary by model. For the RPM limit per org for individual models, see the Salesforce-Managed Model Limits section.

          For sandboxes, the default rate limit is 500 generation requests per minute per org.
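A caller that wants to stay under these defaults can throttle itself before sending requests. The following is a minimal sketch of a client-side sliding-window limiter. The class name `RequestsPerMinuteLimiter` and its methods are hypothetical helpers, not part of any Salesforce API, and the 60-second sliding window is an assumption, since the text does not say how the service measures a minute.

```python
import time
from collections import deque

# Defaults quoted in the text; the limit for an individual model can differ.
PRODUCTION_DEFAULT_RPM = 1000
SANDBOX_DEFAULT_RPM = 500


class RequestsPerMinuteLimiter:
    """Client-side sliding-window limiter (hypothetical helper, not a Salesforce API)."""

    def __init__(self, rpm_limit, window_seconds=60.0, clock=time.monotonic):
        self.rpm_limit = rpm_limit
        self.window_seconds = window_seconds  # assumed window length
        self.clock = clock                    # injectable so the limiter can be tested
        self._sent = deque()                  # timestamps of requests still inside the window

    def try_acquire(self):
        """Return True and record the request if it fits in the current window."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.rpm_limit:
            self._sent.append(now)
            return True
        return False
```

For example, `RequestsPerMinuteLimiter(SANDBOX_DEFAULT_RPM)` would allow up to 500 calls to `try_acquire()` to succeed within any 60-second span and reject further calls until older requests age out.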

          Salesforce-Managed Model Limits

          This table lists the limits for Salesforce-managed models that are available for embedded features, such as Prompt Builder.

| Model Provider | Model Family | Maximum Requests Per Minute (RPM) | Maximum Tokens Per Minute (TPM)¹ | Maximum Input Tokens | Maximum Output Tokens |
| --- | --- | --- | --- | --- | --- |
| Bedrock (Amazon) | Nova Lite | 500 | 3 million | 300,000 | 5,000 |
| Bedrock (Amazon) | Nova Pro | 500 | 3 million | 300,000 | 5,000 |
| Bedrock (Anthropic) | Claude Haiku 4.5 | 250 | 3 million | 200,000 | 8,192 |
| Bedrock (Anthropic) | Claude Opus 4.5 | 300 | 1 million | 200,000 | 8,192 |
| Bedrock (Anthropic) | Claude Opus 4.6 (Beta) | 300 | 1 million | 1,000,000 | 128,000 |
| Bedrock (Anthropic) | Claude Opus 4.7 (Beta) | 300 | 1 million | 1,000,000 | 128,000 |
| Bedrock (Anthropic) | Claude Sonnet 4 | 100 | 3 million | 200,000 | 8,192 |
| Bedrock (Anthropic) | Claude Sonnet 4.5 | 500 | 3 million | 200,000 | 8,192 |
| Bedrock (Anthropic) | Claude Sonnet 4.6 | 500 | 3 million | 200,000 | 8,192 |
| Bedrock (NVIDIA) | Nemotron 3 Nano 30B (Beta) | 1,000 | 5 million | 256,000 | 8,192 |
| OpenAI and Azure OpenAI | GPT-4o (GPT 4 Omni) | 1,000 | 5 million | 128,000 | 16,384 |
| OpenAI | GPT-4o Mini | 1,000 | 5 million | 128,000 | 16,384 |
| OpenAI and Azure OpenAI | GPT-4o-mini (GPT 4 Omni Mini) | 1,000 | 5 million | 128,000 | 16,384 |
| OpenAI and Azure OpenAI | GPT-4.1 | 500 | 2 million | 128,000 | 32,768 |
| OpenAI and Azure OpenAI | GPT-4.1 Mini | 500 | 2 million | 128,000 | 32,768 |
| OpenAI and Azure OpenAI | GPT-5 | 500 | 2 million | 272,000 | 128,000 |
| OpenAI and Azure OpenAI | GPT-5 Mini | 500 | 2 million | 272,000 | 128,000 |
| OpenAI and Azure OpenAI | GPT 5.1 | 300 | 2 million | 272,000 | 128,000 |
| OpenAI and Azure OpenAI | GPT 5.2 | 300 | 2 million | 272,000 | 128,000 |
| OpenAI and Azure OpenAI | GPT 5.4 | 250 | 2 million | 1,050,000 | 128,000 |
| OpenAI and Azure OpenAI | GPT 5.4 Mini (Beta) | 250 | 2 million | 400,000 | 128,000 |
| OpenAI and Azure OpenAI | GPT 5.5 (Beta) | 250 | 2 million | 1,050,000 | 128,000 |
| OpenAI and Azure OpenAI | O3 | 500 | 2 million | 200,000 | 100,000 |
| OpenAI and Azure OpenAI | O4 Mini | 500 | 2 million | 200,000 | 100,000 |
| Vertex AI (Google) | Gemini 2.5 Flash | 250 | 2 million | 1,048,576 | 65,536 |
| Vertex AI (Google) | Gemini 2.5 Flash Lite | 250 | 2 million | 1,048,576 | 65,536 |
| Vertex AI (Google) | Gemini 2.5 Pro | 250 | 1 million | 1,048,576 | 65,536 |
| Vertex AI (Google) | Gemini 3 Flash (Beta) | 100 | 2 million | 1,048,576 | 65,536 |
| Vertex AI (Google) | Gemini 3 Pro (Beta) | 50 | 1 million | 1,048,576 | 65,536 |
| Vertex AI (Google) | Gemini 3.1 Flash Lite (Beta) | 100 | 2 million | 1,048,576 | 65,536 |
| Vertex AI (Google) | Gemini 3.1 Pro (Beta) | 50 | 1 million | 1,048,576 | 65,536 |

          ¹ Maximum Tokens Per Minute (TPM) is measured against the sum of input and output tokens.
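Because TPM counts input and output tokens together, large prompts can exhaust the token budget well before the RPM limit is reached. The sketch below estimates the sustainable request rate for a uniform workload. The function name `effective_requests_per_minute` is hypothetical, and the calculation assumes every request uses the same number of tokens, which real traffic rarely does.

```python
def effective_requests_per_minute(rpm_limit, tpm_limit, input_tokens, output_tokens):
    """Estimate how many identical requests fit in one minute under both limits.

    Hypothetical helper for illustration: assumes every request consumes
    exactly input_tokens + output_tokens against the TPM budget.
    """
    tokens_per_request = input_tokens + output_tokens
    if tokens_per_request <= 0:
        raise ValueError("a request must use at least one token")
    # Whichever limit is hit first determines the sustainable rate.
    return min(rpm_limit, tpm_limit // tokens_per_request)


# Claude Opus 4.5 row from the table: 300 RPM, 1 million TPM.
# Requests of 10,000 input + 2,000 output tokens are bounded by TPM, not RPM.
opus_rate = effective_requests_per_minute(300, 1_000_000, 10_000, 2_000)
```

Here each request costs 12,000 tokens, so the 1 million TPM budget supports 83 requests per minute, well under the model's 300 RPM limit.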

          Data Masking Token Limit

          When data masking is turned on in the Einstein Trust Layer, all models are currently limited to a context size of 65,536 tokens. To turn off data masking and use the full context window, see Set up Einstein Trust Layer.
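As a rough illustration of how this cap interacts with the per-model limits, the sketch below computes the usable input size for a model. The function name `effective_input_limit` is hypothetical. The text does not say whether the 65,536-token context size also counts output tokens, so this sketch assumes it acts only as an upper bound on input.

```python
DATA_MASKING_CONTEXT_TOKENS = 65_536  # context size quoted in the text


def effective_input_limit(model_max_input_tokens, data_masking_enabled):
    """Return the usable input size in tokens (hypothetical helper).

    Assumption: with data masking on, the 65,536-token context size
    is treated as an upper bound on input tokens.
    """
    if data_masking_enabled:
        return min(model_max_input_tokens, DATA_MASKING_CONTEXT_TOKENS)
    return model_max_input_tokens
```

Under this assumption, a model whose table entry lists 200,000 maximum input tokens would accept at most 65,536 input tokens while data masking is on, and the full 200,000 once it is turned off.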

           