Considerations for Speech-to-Text Integration
Review these important considerations for setting up speech-to-text. These include changing session limits and silence detection, designing the user interface and theme, stopping sessions automatically, handling errors, and setting up offline language packs.
Required Editions
Available in: Lightning Experience
Available in: Enterprise and Unlimited Editions where Consumer Goods Cloud is enabled
Speech-to-Text Session Limits and Customization
You can configure the speech duration and silence detection settings to fit your needs.
- Session Limits: You can start only one transcription session each time you open the Speech-to-Text screen.
- Speech Duration: Recording will stop automatically if the total speech length reaches the set limit. You can choose any duration between 1 and 180 seconds. By default, the limit is set to 180 seconds (3 minutes). See maxDurationSec.
- Silence Detection: Recording will automatically stop if there’s no sound for a set amount of time. You can choose a threshold from 1 to 30 seconds. By default, recording stops after 15 seconds of silence. See silenceTimeoutSec.
- UI Text: You can customize the title and the information text.
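The ranges and defaults above can be checked before a configuration is applied. The following is a minimal sketch, not the product's API: the class SpeechToTextSettings and the helper check_range are hypothetical, and only the field names maxDurationSec and silenceTimeoutSec, their ranges, and their defaults come from this page.

```python
from dataclasses import dataclass

# Allowed ranges, in seconds, as stated in the list above.
MAX_DURATION_RANGE = (1, 180)
SILENCE_TIMEOUT_RANGE = (1, 30)


def check_range(name, value, bounds):
    """Raise ValueError if value falls outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high} seconds, got {value}")


@dataclass(frozen=True)
class SpeechToTextSettings:
    """Hypothetical settings container used only for illustration."""

    maxDurationSec: int = 180      # default total speech limit (3 minutes)
    silenceTimeoutSec: int = 15    # default silence threshold

    def __post_init__(self):
        # Validate both values as soon as the object is created.
        check_range("maxDurationSec", self.maxDurationSec, MAX_DURATION_RANGE)
        check_range("silenceTimeoutSec", self.silenceTimeoutSec, SILENCE_TIMEOUT_RANGE)
```

Creating SpeechToTextSettings() with no arguments yields the documented defaults, while a value such as maxDurationSec=181 raises a ValueError before it can reach the device.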
User Interface and Theming
When updating your branding, keep these design constraints in mind.
You can customize the App Bar that appears when Speech-to-Text opens. The following elements can’t be themed and keep their default appearance:
- Info text
- Send, Start Listening, and Stop Listening buttons
- Input field
- Alert modals
The input text field doesn’t show a scrollbar if the transcribed text goes beyond the visible area. This is expected behavior.
Operational Behavior
The transcription session ends automatically for users in these situations.
| Trigger Type | Scenario |
|---|---|
| Manual stop | The user taps Stop Listening. |
| Total timeout | The total speech duration reaches the configured limit. By default, the limit is 180 seconds. |
| Silence threshold | The user is silent for longer than the configured threshold. By default, the threshold is 15 seconds. |
| Focus change | The user taps the text field to manually edit the text. |
| App in background | The app moves to the background when the user goes to the home screen or switches to another application. |
| Cancellation | The user stops the transcription by tapping Back and confirming the choice. |
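The triggers in the table can be modeled as one decision function. This is a rough sketch under stated assumptions, not the Voice SDK's implementation: the StopReason enum, the event names, and the stop_reason function are hypothetical, the order in which triggers are checked is an assumption, and the defaults of 180 and 15 seconds are the ones given in the session limits section.

```python
from enum import Enum
from typing import Optional


class StopReason(Enum):
    MANUAL_STOP = "Manual stop"
    TOTAL_TIMEOUT = "Total timeout"
    SILENCE_THRESHOLD = "Silence threshold"
    FOCUS_CHANGE = "Focus change"
    APP_IN_BACKGROUND = "App in background"
    CANCELLATION = "Cancellation"


# Hypothetical event names mapped to the user-driven triggers in the table.
USER_EVENTS = {
    "tap_stop_listening": StopReason.MANUAL_STOP,
    "tap_text_field": StopReason.FOCUS_CHANGE,
    "app_backgrounded": StopReason.APP_IN_BACKGROUND,
    "back_confirmed": StopReason.CANCELLATION,
}


def stop_reason(event: Optional[str], speech_sec: float, silent_sec: float,
                max_duration_sec: int = 180,
                silence_timeout_sec: int = 15) -> Optional[StopReason]:
    """Return why the session should end, or None if it should continue."""
    # Assumption: an explicit user action takes priority over the timers.
    if event in USER_EVENTS:
        return USER_EVENTS[event]
    if speech_sec >= max_duration_sec:
        return StopReason.TOTAL_TIMEOUT
    # Assumption: the session ends once the threshold is reached.
    if silent_sec >= silence_timeout_sec:
        return StopReason.SILENCE_THRESHOLD
    return None
```

With the defaults, stop_reason(None, 20, 15) reports the silence threshold, and stop_reason(None, 10, 2) returns None because no trigger applies.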
System and Platform Specifics (Android)
These technical requirements and constraints are for Android.
- Offline Priority: By default, the Voice SDK uses offline transcription even if an internet connection is available.
- Device Requirements: The device must have built-in speech recognition. Compatibility with older Android versions may vary, as testing on older physical devices is limited.
- Deep Links: Deep links are disabled when the Speech-to-Text screen is open.
Error Handling and Language Support
If transcription doesn’t work, an error appears and users can switch to manual text entry. This can happen when:
- The device doesn’t support speech recognition.
- The device doesn’t support offline speech recognition.
- The selected language isn’t supported or isn’t available for offline recognition.
- The required language model isn’t downloaded on the device.
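The fallback to manual text entry can be pictured as a list of preflight checks. This is an illustrative sketch only: DeviceState, transcription_blockers, and input_mode are hypothetical names, and the way a real device reports these capabilities is not described on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeviceState:
    """Hypothetical snapshot of the conditions listed above."""

    supports_speech_recognition: bool
    supports_offline_recognition: bool
    language_available_offline: bool
    language_model_downloaded: bool


def transcription_blockers(state: DeviceState) -> List[str]:
    """Return a message for every condition that prevents transcription."""
    checks = [
        (state.supports_speech_recognition,
         "Device doesn't support speech recognition."),
        (state.supports_offline_recognition,
         "Device doesn't support offline speech recognition."),
        (state.language_available_offline,
         "Selected language isn't supported or isn't available offline."),
        (state.language_model_downloaded,
         "Required language model isn't downloaded."),
    ]
    return [message for ok, message in checks if not ok]


def input_mode(state: DeviceState) -> str:
    """Fall back to manual text entry when any blocker is present."""
    return "manual" if transcription_blockers(state) else "speech"
```

A device that passes every check stays in speech mode. A single failed check, such as a missing language model, is enough to switch the user to manual entry.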
Language Pack Setup
Sales reps can enable offline recognition by downloading language packs to their Android mobile device and setting them up. For detailed instructions, check the documentation for your specific mobile device.

