
Surrogate Pair Characters Are Treated as Two Characters

Publish Date: Sep 10, 2025
Description

Issue

Although surrogate pair characters (e.g., "𠮟", "𠀋") display as 1 character, formula string functions (LEFT, LEN, MID, RIGHT, etc.) treat them as 2 characters internally.

  • LEN("𠮟") returns 2, not 1.

  • LEFT("𠮟", 1) returns only the high surrogate, which displays as "?" because half of a surrogate pair cannot be interpreted as a character on its own.
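The same counting behavior can be reproduced outside Salesforce. The following is a minimal Java sketch, not formula syntax: Java strings are also sequences of UTF-16 code units, so they serve as an analog. The class name and the len and left helpers are hypothetical stand-ins for LEN and LEFT, and the code assumes "𠮟" is code point U+20B9F.

```java
// Illustrative Java analog of the formula behavior; not Salesforce formula syntax.
final class SurrogateIssueDemo {
    // "𠮟" (assumed to be U+20B9F), written as its two UTF-16 code units.
    static final String KANJI = "\uD842\uDF9F";

    // Hypothetical analog of LEN(text): counts UTF-16 code units.
    static int len(String text) {
        return text.length();
    }

    // Hypothetical analog of LEFT(text, n): keeps the first n code units.
    static String left(String text, int n) {
        int end = Math.min(Math.max(n, 0), text.length());
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        // One displayed character, but two code units.
        System.out.println(len(KANJI)); // prints 2

        // Taking one code unit leaves a lone high surrogate, not a full character.
        String half = left(KANJI, 1);
        System.out.println(Character.isHighSurrogate(half.charAt(0))); // prints true

        // Taking two code units returns the whole character.
        System.out.println(left(KANJI, 2).equals(KANJI)); // prints true
    }
}
```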

Cause

A surrogate pair character is encoded as two UTF-16 values: a "high surrogate" followed by a "low surrogate." The Salesforce formula engine counts each of these values as a separate character, so one displayed character has a length of 2.
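To make the two values concrete, the sketch below splits a code point into its high and low surrogates using the standard UTF-16 arithmetic. It is an illustration in Java, not a description of how the formula engine is implemented. The class and method names are hypothetical, and U+20B9F and U+2000B are assumed to be the code points of the two example characters.

```java
// Illustrative sketch of how UTF-16 splits one code point into two surrogates.
final class SurrogatePairParts {
    // Returns {high surrogate, low surrogate} for a code point above U+FFFF.
    static char[] split(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a surrogate pair character");
        }
        int offset = codePoint - 0x10000;               // remaining 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // upper 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // lower 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] parts = split(0x20B9F); // assumed code point of "𠮟"
        System.out.printf("%04X %04X%n", (int) parts[0], (int) parts[1]); // prints D842 DF9F
    }
}
```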

Scenario: 

For example, a Salesforce admin builds a formula to count the characters in a text field. When the field contains extended Japanese kanji characters, LEN() returns a higher count than the number of characters displayed.

Resolution

To manipulate surrogate pair characters correctly in formulas, count each one as two characters when you specify lengths and positions.

For example, to retrieve the surrogate pair character "𠮟", use the following syntax: LEFT("𠮟", 2)
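When a value mixes ordinary and surrogate pair characters, the length to pass to a function such as LEFT depends on how many surrogate pairs fall within the range. The following Java sketch shows one way to work that number out ahead of time. It is a hypothetical helper for illustration, not a Salesforce feature, and it counts Unicode code points, each of which is assumed to correspond to one displayed character.

```java
// Illustrative helper: how many UTF-16 units cover the first N code points.
final class FormulaLengthHelper {
    // Returns the length to request so that the first `visibleChars`
    // code points of `text` are taken whole, never split mid-pair.
    static int codeUnitsFor(String text, int visibleChars) {
        int available = text.codePointCount(0, text.length());
        int wanted = Math.min(Math.max(visibleChars, 0), available);
        // Index just past `wanted` code points, counted in code units.
        return text.offsetByCodePoints(0, wanted);
    }

    public static void main(String[] args) {
        String kanji = "\uD842\uDF9F";       // "𠮟", assumed U+20B9F
        String mixed = "A" + kanji + "B";

        System.out.println(codeUnitsFor(kanji, 1)); // prints 2
        System.out.println(codeUnitsFor(mixed, 2)); // prints 3
    }
}
```

For the single character "𠮟", the helper returns 2, which matches the LEFT("𠮟", 2) example above.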

Additional Resources

This behavior is also observed in the String class. Please refer to the following article for more details.

See also:

Knowledge Article Number

005166849

 