Large Language Model Limits

Review the limits for supported large language models (LLMs) from multiple providers when they're used by embedded features, such as Prompt Builder. Each model has a requests-per-minute limit and token limits.

Rate Limits

In production orgs, the default rate limit for many models is 1,000 LLM generation requests per minute (RPM) per org. The RPM limit can vary by model, depending on the model’s usage and the capacity available from its provider. For the per-org RPM limit of each model, see the Salesforce-Managed Model Limits section.

For sandboxes, the default rate limit is 500 generation requests per minute per org.
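
If you call generation features from your own code, a client-side guard can help you stay under the org-level RPM limit. The following Python sketch is an illustration only: the SlidingWindowLimiter class and its try_acquire method are hypothetical names, not part of any Salesforce API, and the way the platform itself measures the one-minute window is not described here, so the sliding window is an assumption.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side guard: at most max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = deque()  # timestamps of requests still inside the window

    def try_acquire(self):
        """Record the request and return True if it fits in the window, else return False."""
        now = self.clock()
        # Discard timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False


# Default limits quoted in this section; individual models can differ.
PRODUCTION_DEFAULT_RPM = 1000
SANDBOX_DEFAULT_RPM = 500

sandbox_limiter = SlidingWindowLimiter(SANDBOX_DEFAULT_RPM)
```

Call try_acquire() before each generation request and delay or queue the request when it returns False. The clock parameter exists only so that the limiter can be tested without waiting a real minute.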

Salesforce-Managed Model Limits

This table lists the limits for Salesforce-managed models that are available for embedded features, such as Prompt Builder.

Model Provider	Model Family	Maximum Requests Per Minute (RPM)	Maximum Tokens Per Minute (TPM)¹	Maximum Input Tokens	Maximum Output Tokens
Bedrock (Amazon)	Nova Lite	500	3 million	300,000	5,000
Bedrock (Amazon)	Nova Pro	500	3 million	300,000	5,000
Bedrock (Anthropic)	Claude Haiku 4.5	250	3 million	200,000	8,192
Bedrock (Anthropic)	Claude Opus 4.5	300	1 million	200,000	8,192
Bedrock (Anthropic)	Claude Opus 4.6 (Beta)	300	1 million	1,000,000	128,000
Bedrock (Anthropic)	Claude Opus 4.7 (Beta)	300	1 million	1,000,000	128,000
Bedrock (Anthropic)	Claude Sonnet 4	100	3 million	200,000	8,192
Bedrock (Anthropic)	Claude Sonnet 4.5	500	3 million	200,000	8,192
Bedrock (Anthropic)	Claude Sonnet 4.6	500	3 million	200,000	8,192
Bedrock (NVIDIA)	Nemotron 3 Nano 30B (Beta)	1,000	5 million	256,000	8,192
OpenAI and Azure OpenAI	GPT-4o (GPT 4 Omni)	1,000	5 million	128,000	16,384
OpenAI	GPT-4o Mini	1,000	5 million	128,000	16,384
OpenAI and Azure OpenAI	GPT-4o-mini (GPT 4 Omni Mini)	1,000	5 million	128,000	16,384
OpenAI and Azure OpenAI	GPT-4.1	500	2 million	128,000	32,768
OpenAI and Azure OpenAI	GPT-4.1 Mini	500	2 million	128,000	32,768
OpenAI and Azure OpenAI	GPT-5	500	2 million	272,000	128,000
OpenAI and Azure OpenAI	GPT-5 Mini	500	2 million	272,000	128,000
OpenAI and Azure OpenAI	GPT 5.1	300	2 million	272,000	128,000
OpenAI and Azure OpenAI	GPT 5.2	300	2 million	272,000	128,000
OpenAI and Azure OpenAI	GPT 5.4	250	2 million	1,050,000	128,000
OpenAI and Azure OpenAI	GPT 5.4 Mini (Beta)	250	2 million	400,000	128,000
OpenAI and Azure OpenAI	GPT 5.5 (Beta)	250	2 million	1,050,000	128,000
OpenAI and Azure OpenAI	O3	500	2 million	200,000	100,000
OpenAI and Azure OpenAI	O4 Mini	500	2 million	200,000	100,000
Vertex AI (Google)	Gemini 2.5 Flash	250	2 million	1,048,576	65,536
Vertex AI (Google)	Gemini 2.5 Flash Lite	250	2 million	1,048,576	65,536
Vertex AI (Google)	Gemini 2.5 Pro	250	1 million	1,048,576	65,536
Vertex AI (Google)	Gemini 3 Flash (Beta)	100	2 million	1,048,576	65,536
Vertex AI (Google)	Gemini 3 Pro (Beta)	50	1 million	1,048,576	65,536
Vertex AI (Google)	Gemini 3.1 Flash Lite (Beta)	100	2 million	1,048,576	65,536
Vertex AI (Google)	Gemini 3.1 Pro (Beta)	50	1 million	1,048,576	65,536

¹Maximum Tokens Per Minute (TPM) is measured against the sum of input and output tokens.
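
Because TPM counts input and output tokens together, large requests can exhaust the token budget well before the RPM limit is reached. This Python sketch works through that arithmetic with values from the table. The function name max_requests_per_minute is hypothetical, and the calculation assumes every request in the minute uses the same number of tokens.

```python
def max_requests_per_minute(rpm_limit, tpm_limit, input_tokens, output_tokens):
    """Estimate how many equally sized requests fit in one minute.

    TPM is measured against input plus output tokens, so the tighter of
    the RPM limit and the TPM-derived limit applies.
    """
    tokens_per_request = input_tokens + output_tokens
    if tokens_per_request <= 0:
        return rpm_limit
    return min(rpm_limit, tpm_limit // tokens_per_request)


# Claude Opus 4.5 row: 300 RPM, 1 million TPM, 200,000 input, 8,192 output.
# Requests that use the full input and output allowance are bound by TPM.
print(max_requests_per_minute(300, 1_000_000, 200_000, 8_192))  # 4

# Smaller requests (2,000 input + 500 output tokens) are bound by RPM instead.
print(max_requests_per_minute(300, 1_000_000, 2_000, 500))  # 300
```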

Data Masking Token Limit

When data masking is turned on in the Einstein Trust Layer, all models are currently limited to a context size of 65,536 tokens. To turn off data masking and use the full context window, see Set up Einstein Trust Layer.
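
To estimate how much context is available for a given model, take the smaller of the model’s own limit and the data masking limit. This Python sketch is a rough illustration: the function name effective_context_limit is hypothetical, and it assumes the model’s Maximum Input Tokens value from the table stands in for its full context window.

```python
DATA_MASKING_CONTEXT_TOKENS = 65_536  # limit that applies while data masking is on


def effective_context_limit(model_max_input_tokens, data_masking_enabled):
    """Return the context size to plan for, in tokens.

    Assumption: the model's Maximum Input Tokens value approximates its
    full context window when data masking is off.
    """
    if data_masking_enabled:
        return min(model_max_input_tokens, DATA_MASKING_CONTEXT_TOKENS)
    return model_max_input_tokens


# Gemini 2.5 Pro row: 1,048,576 maximum input tokens.
print(effective_context_limit(1_048_576, data_masking_enabled=True))   # 65536
print(effective_context_limit(1_048_576, data_masking_enabled=False))  # 1048576
```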
