Large Language Model Limits
Understand limits for supported large language models (LLMs) from multiple providers for embedded features, such as Prompt Builder. Limits for each model include requests per minute and token limits.
Rate Limits
In production orgs, many models have a default rate limit of 1,000 LLM generation requests per minute (RPM) per org. The RPM limit can vary by model, however, depending on the model’s usage and the capacity available from model providers. For the per-org RPM limit of each model, see the Salesforce-Managed Model Limits section.
For sandboxes, the default rate limit is 500 generation requests per minute per org.
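A client that sends generation requests in bulk may want to throttle itself so it stays under the org-wide RPM limit. The following is a minimal sketch, assuming a sliding 60-second window. The `RpmThrottle` class and its caller-supplied `now` timestamp are hypothetical names for illustration, and the way the platform itself counts requests is not described here.

```python
from collections import deque

# Default limits quoted in this section (requests per minute, per org).
PRODUCTION_DEFAULT_RPM = 1000
SANDBOX_DEFAULT_RPM = 500


class RpmThrottle:
    """Client-side sliding-window counter (hypothetical helper).

    Allows at most `limit` requests in any `window_seconds` span.
    """

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self.sent = deque()  # timestamps of accepted requests, oldest first

    def try_acquire(self, now):
        """Return True and record the request if it fits in the window."""
        # Discard timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False
```

In practice, `now` would come from a monotonic clock such as `time.monotonic()`, and a caller that gets `False` would wait and retry rather than send the request.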
Salesforce-Managed Model Limits
This table lists the limits for Salesforce-managed models that are available for embedded features, such as Prompt Builder.
| Model Provider | Model Family | Maximum Requests Per Minute (RPM) | Maximum Tokens Per Minute (TPM)¹ | Maximum Input Tokens | Maximum Output Tokens |
|---|---|---|---|---|---|
| Bedrock (Amazon) | Nova Lite | 500 | 3 million | 300,000 | 5,000 |
| Bedrock (Amazon) | Nova Pro | 500 | 3 million | 300,000 | 5,000 |
| Bedrock (Anthropic) | Claude Haiku 4.5 | 250 | 3 million | 200,000 | 8,192 |
| Bedrock (Anthropic) | Claude Opus 4.5 | 300 | 1 million | 200,000 | 8,192 |
| Bedrock (Anthropic) | Claude Opus 4.6 (Beta) | 300 | 1 million | 1,000,000 | 128,000 |
| Bedrock (Anthropic) | Claude Opus 4.7 (Beta) | 300 | 1 million | 1,000,000 | 128,000 |
| Bedrock (Anthropic) | Claude Sonnet 4 | 100 | 3 million | 200,000 | 8,192 |
| Bedrock (Anthropic) | Claude Sonnet 4.5 | 500 | 3 million | 200,000 | 8,192 |
| Bedrock (Anthropic) | Claude Sonnet 4.6 | 500 | 3 million | 200,000 | 8,192 |
| Bedrock (NVIDIA) | Nemotron 3 Nano 30B (Beta) | 1,000 | 5 million | 256,000 | 8,192 |
| OpenAI and Azure OpenAI | GPT-4o (GPT 4 Omni) | 1,000 | 5 million | 128,000 | 16,384 |
| OpenAI | GPT-4o Mini | 1,000 | 5 million | 128,000 | 16,384 |
| OpenAI and Azure OpenAI | GPT-4o-mini (GPT 4 Omni Mini) | 1,000 | 5 million | 128,000 | 16,384 |
| OpenAI and Azure OpenAI | GPT-4.1 | 500 | 2 million | 128,000 | 32,768 |
| OpenAI and Azure OpenAI | GPT-4.1 Mini | 500 | 2 million | 128,000 | 32,768 |
| OpenAI and Azure OpenAI | GPT-5 | 500 | 2 million | 272,000 | 128,000 |
| OpenAI and Azure OpenAI | GPT-5 Mini | 500 | 2 million | 272,000 | 128,000 |
| OpenAI and Azure OpenAI | GPT 5.1 | 300 | 2 million | 272,000 | 128,000 |
| OpenAI and Azure OpenAI | GPT 5.2 | 300 | 2 million | 272,000 | 128,000 |
| OpenAI and Azure OpenAI | GPT 5.4 | 250 | 2 million | 1,050,000 | 128,000 |
| OpenAI and Azure OpenAI | GPT 5.4 Mini (Beta) | 250 | 2 million | 400,000 | 128,000 |
| OpenAI and Azure OpenAI | GPT 5.5 (Beta) | 250 | 2 million | 1,050,000 | 128,000 |
| OpenAI and Azure OpenAI | O3 | 500 | 2 million | 200,000 | 100,000 |
| OpenAI and Azure OpenAI | O4 Mini | 500 | 2 million | 200,000 | 100,000 |
| Vertex AI (Google) | Gemini 2.5 Flash | 250 | 2 million | 1,048,576 | 65,536 |
| Vertex AI (Google) | Gemini 2.5 Flash Lite | 250 | 2 million | 1,048,576 | 65,536 |
| Vertex AI (Google) | Gemini 2.5 Pro | 250 | 1 million | 1,048,576 | 65,536 |
| Vertex AI (Google) | Gemini 3 Flash (Beta) | 100 | 2 million | 1,048,576 | 65,536 |
| Vertex AI (Google) | Gemini 3 Pro (Beta) | 50 | 1 million | 1,048,576 | 65,536 |
| Vertex AI (Google) | Gemini 3.1 Flash Lite (Beta) | 100 | 2 million | 1,048,576 | 65,536 |
| Vertex AI (Google) | Gemini 3.1 Pro (Beta) | 50 | 1 million | 1,048,576 | 65,536 |
¹ Maximum Tokens Per Minute (TPM) is measured against the sum of input and output tokens.
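Because TPM counts input and output tokens together, large requests can hit the token limit before they hit the RPM limit. The sketch below works through that arithmetic for one row of the table. The `ModelLimits` type and both function names are hypothetical, and the per-request token counts in the usage note are illustrative assumptions, not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    rpm: int                # Maximum Requests Per Minute
    tpm: int                # Maximum Tokens Per Minute (input + output)
    max_input_tokens: int
    max_output_tokens: int


# Values copied from the Claude Sonnet 4 row of the table.
CLAUDE_SONNET_4 = ModelLimits(
    rpm=100, tpm=3_000_000, max_input_tokens=200_000, max_output_tokens=8_192
)


def fits_per_request_limits(limits, input_tokens, output_tokens):
    """Check one request against the per-request input and output caps."""
    return (input_tokens <= limits.max_input_tokens
            and output_tokens <= limits.max_output_tokens)


def sustainable_requests_per_minute(limits, input_tokens, output_tokens):
    """Requests per minute allowed by whichever of RPM or TPM binds first."""
    tokens_per_request = input_tokens + output_tokens
    if tokens_per_request <= 0:
        raise ValueError("a request must use at least one token")
    return min(limits.rpm, limits.tpm // tokens_per_request)
```

For example, requests that each use 50,000 input tokens and 4,000 output tokens consume 54,000 tokens apiece, so the 3 million TPM limit allows 55 of them per minute, which is lower than the 100 RPM limit for that model.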
Data Masking Token Limit
When data masking is turned on in the Einstein Trust Layer, all models are currently limited to a context size of 65,536 tokens. To turn off data masking and use the full context window, see Set up Einstein Trust Layer.
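The effect of the masking cap on a given model can be sketched as a simple minimum. This is an illustrative helper only: `effective_input_limit` is a hypothetical name, and it assumes the 65,536-token cap can be treated as an upper bound on the prompt. Whether output tokens also count toward the cap is not stated here, so a cautious caller may want to subtract the expected output size as well.

```python
# Context cap that applies while data masking is on (from this section).
MASKING_CONTEXT_CAP = 65_536


def effective_input_limit(model_max_input_tokens, masking_enabled):
    """Largest prompt size to plan for, given the model and masking setting.

    Assumption: the masking cap bounds the prompt size; this sketch does not
    reserve any of the cap for output tokens.
    """
    if masking_enabled:
        return min(model_max_input_tokens, MASKING_CONTEXT_CAP)
    return model_max_input_tokens
```

With masking on, a model listed at 128,000 maximum input tokens would be planned at 65,536 tokens. With masking off, the table value of 128,000 applies.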

