
How do CJK delimiters work in SOSL?

Knowledge Article Number 000198636
Description In Chinese, Japanese, and Korean (CJK), words are delimited by pairs of CJK-type characters.
Please provide examples that show how these delimiters work.
Resolution CJK characters are detected using the Unicode character map. When a run of more than 3 consecutive CJK characters is detected, it is tokenized into overlapping character pairs. For instance, if the data is "一二三四五" (no spaces between characters), it is tokenized into [一二][二三][三四][四五]. These pairs are known as bigrams, and the same tokenization is applied both when indexing and when querying.
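The bigram tokenization described above can be illustrated with a small Python sketch. This is not Salesforce's implementation. The names `CJK_RUN`, `bigrams`, `tokenize_cjk`, and `min_run` are hypothetical, the Unicode ranges are only a rough approximation of "CJK", and keeping shorter runs whole is an assumption, since the article does not say how they are handled.

```python
import re

# Assumption: approximate "CJK" with a few Unicode blocks
# (Hiragana, Katakana, CJK Unified Ideographs, Hangul Syllables).
# The real detection may cover more of the Unicode character map.
CJK_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]+")


def bigrams(run):
    """Return the overlapping character pairs of one run of characters."""
    return [run[i:i + 2] for i in range(len(run) - 1)]


def tokenize_cjk(text, min_run=4):
    """Split each CJK run of at least min_run characters into bigrams.

    min_run=4 is one reading of "more than 3 characters". Shorter runs
    are returned whole here, which is an assumption of this sketch.
    """
    tokens = []
    for match in CJK_RUN.finditer(text):
        run = match.group()
        tokens.extend(bigrams(run) if len(run) >= min_run else [run])
    return tokens


print(tokenize_cjk("一二三四五"))  # ['一二', '二三', '三四', '四五']
```

Because the same tokenization is applied to both the indexed data and the search term, a search for "二三四" would plausibly be compared against the index as the pairs [二三][三四], both of which occur in the example data.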



