Considerations and Limitations for Speech to Text in Agentforce

Before you use the Speech to Text action, review these considerations and limitations.

View supported editions.
This article applies to:	Agentforce Employee Agent, Agentforce Builder, Flow, REST API, Apex
This article doesn’t apply to:	Legacy Chat, Standard Messaging Channels

General Considerations

The Speech to Text agent action isn’t supported in the Government Cloud.
The Speech to Text action converts supported audio files into text and returns the transcript as output.
You can add the Speech to Text action to any agent or subagent.
This action is exposed through the invocable actions API, so you can call it from agent actions, REST, Flow, Apex, and other automation scenarios.
Audio in unsupported or mixed languages might not be fully recognized.
This action is subject to invocation limits based on available memory in your organization. Requests that exceed the limit can fail with a LIMIT_EXCEEDED error.
Processing time varies with audio length, file size, audio quality, and content complexity.
In Flow, handle errors by using fault paths.
Files larger than 5 MB aren't supported.
Translation isn't supported.
Real-time microphone capture isn't supported.
Speaker identification, timestamps, and confidence scores aren't available.
When you add the Speech to Text action to a subagent, make sure that the subagent instructions prompt for either a valid audio file ID or a file name. If the user provides a file name, the subagent must include the Query Records action to resolve the file name to an audio file ID before it invokes the Speech to Text action.
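Because the action is exposed through the invocable actions API, an external client can invoke it over REST by sending a POST request with an `inputs` list to the standard actions resource. The following Python sketch only builds that request; it doesn't send it or obtain an access token. The action name `speechToText`, the input parameter name `audioFileId`, and the API version `v62.0` are hypothetical placeholders, so confirm the actual values in your org before you use anything like this.

```python
import json
import urllib.request

# Hypothetical placeholders: confirm the real action name, input parameter
# name, and API version in your org before using this sketch.
API_VERSION = "v62.0"
ACTION_NAME = "speechToText"   # hypothetical action API name
INPUT_PARAM = "audioFileId"    # hypothetical input parameter name


def build_speech_to_text_request(instance_url: str, access_token: str,
                                 content_document_id: str) -> urllib.request.Request:
    """Build (but don't send) a POST request that invokes the action once."""
    url = (f"{instance_url.rstrip('/')}/services/data/{API_VERSION}"
           f"/actions/standard/{ACTION_NAME}")
    # Invocable actions take a list of input maps under the "inputs" key.
    body = {"inputs": [{INPUT_PARAM: content_document_id}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen()`; non-2xx responses raise `urllib.error.HTTPError`, and the response body is where you'd look for the error codes listed later in this article.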

Item	Supported Value
File Formats	MP3, WAV, FLAC, OGG, OGA, AMR, MPEG, MPGA
Maximum File Size	5 MB
Input Type	Audio file ID (ContentDocument ID)
Output	Transcribed text
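Because files over 5 MB and unsupported formats are rejected, a client can check both before it calls the action and avoid a round trip that ends in INVALID_INPUT_FORMAT or LIMIT_EXCEEDED. This Python sketch checks only the file extension and the byte count; it doesn't inspect the audio content. It also assumes that 5 MB means 5 * 1024 * 1024 bytes, which the table doesn't specify.

```python
import os

# Formats from the supported-values table, matched by file extension only.
SUPPORTED_EXTENSIONS = {"mp3", "wav", "flac", "ogg", "oga", "amr", "mpeg", "mpga"}
# Assumption: "5 MB" is treated as 5 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024


def check_audio_file(file_name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

For example, `check_audio_file("call.m4a", 1000)` reports the unsupported format, and `check_audio_file("call.MP3", 1000)` passes because the extension comparison is case-insensitive.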

The Speech to Text action detects the language of the audio automatically. You can't configure language detection.
Audio in mixed or unsupported languages might still be transcribed, but accuracy can vary.
Transcription quality varies depending on the detected language.
This action transcribes audio to text only and doesn't translate it, so the transcript is in the language spoken in the audio. If you plan to use the transcript in Agentforce conversations, actions, or other downstream features, verify that its language is supported. See Generative AI Supported Languages.

Digital Wallet Card	Usage Type	Description	Notes
Flex Credits	Speech to Text	Speech to Text converts audio input into text using Automatic Speech Recognition (ASR) models. Flex credit usage is metered based on the duration of audio processed, measured in seconds.	Refer to Flex Credits Billable Usage Types
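Because Flex credit usage is metered by the duration of processed audio in seconds, it can help to know how long a file is before you submit it. This Python sketch uses the standard library `wave` module, so it only handles uncompressed WAV files; other supported formats need a different tool. It estimates seconds of audio only and doesn't compute credits, because the per-second rate and any rounding rules aren't stated here.

```python
import wave
from typing import BinaryIO, Union


def wav_duration_seconds(source: Union[str, BinaryIO]) -> float:
    """Return the playback length of a WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav_file:
        # Duration = number of audio frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()
```

For example, a mono WAV file sampled at 8,000 Hz that contains 16,000 frames is 2.0 seconds long.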

If the Speech to Text action can't process a request, it returns an error code and a message.

Error Code	Description	Recommended Action
REQUIRED_FIELD_MISSING	A required parameter is missing.	Specify a value for the required parameter and try again.
INVALID_INPUT	The request contains invalid input.	Verify that the input values, including the audio file ID and format, are valid and try again.
LIMIT_EXCEEDED	The audio file exceeds the 5 MB size limit, or the request limit was exceeded.	Use a smaller audio file, or retry the request later.
RECORD_NOT_FOUND	No audio file was found for the provided ContentDocument ID.	Verify that the ContentDocument ID is correct and accessible.
INSUFFICIENT_ACCESS_OR_READONLY	Access to the feature or audio file isn’t available.	Contact your Salesforce admin to get the required access.
UNKNOWN_EXCEPTION	The audio file couldn’t be transcribed or an unexpected error occurred.	Retry the action. If the issue persists, contact your Salesforce admin.
INVALID_INPUT_FORMAT	The audio file format isn’t supported.	Use a supported audio format such as MP3, WAV, FLAC, OGG, OGA, AMR, MPEG, or MPGA.

Use these errors to configure fault handling in Flow or error handling in Apex.
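If you call the action from an external client instead of Flow or Apex, you can apply the same recommended actions in code. This Python sketch retries only the errors whose recommended action is to retry (LIMIT_EXCEEDED and UNKNOWN_EXCEPTION), with exponential backoff, and re-raises everything else. The `SpeechToTextError` class and the `invoke` callable are hypothetical: how you extract the error code from the REST response depends on your client code.

```python
import time
from typing import Callable

# Error codes from the table whose recommended action is to retry.
# LIMIT_EXCEEDED also covers oversized files, which retries won't fix,
# so check the file size before calling.
RETRYABLE_CODES = {"LIMIT_EXCEEDED", "UNKNOWN_EXCEPTION"}


class SpeechToTextError(Exception):
    """Hypothetical exception that carries the error code returned by the action."""

    def __init__(self, code: str, message: str = ""):
        super().__init__(f"{code}: {message}")
        self.code = code


def transcribe_with_retry(invoke: Callable[[], str], max_attempts: int = 3,
                          base_delay: float = 1.0,
                          sleep: Callable[[float], None] = time.sleep) -> str:
    """Call invoke(), retrying retryable errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke()
        except SpeechToTextError as err:
            # Non-retryable codes need a corrected request or access; give up.
            if err.code not in RETRYABLE_CODES or attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Injecting `sleep` keeps the backoff testable without real delays; in production, leave the default `time.sleep`.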
