
Why are search results different between Production and Sandbox for Japanese characters?

Knowledge Article Number 000233421
Description Issue: 
Customers may find that global search returns different results for Japanese characters on NA, EU, and CS instances than on AP instances when the user language is set to Japanese.

Resolution Cause: 
AP1, AP2, AP3, and AP4 use a new Japanese tokenizer, which is applied only when the user language is set to Japanese.

For example, suppose there is one record with the field value "横河電機株式会社", the user language is Japanese, and the user searches with the query string "横河". The field value is tokenized as [横河] [電機] [株式] [会社] regardless of whether the old or the new Japanese tokenizer is used. If the org is not on an AP instance, the old Japanese tokenizer is used, the query string is tokenized as [横河], and the record is returned by global search. If the org is on an AP instance, the new Japanese tokenizer tokenizes the query string as [横] [河], neither token matches an indexed token, and the record is not returned by global search.

If the user language is English and the user searches for 横河, the query is tokenized as both Chinese ([横河]) and Japanese ([横] [河]). The query 横河 therefore becomes [横河] [横] [河], and it matches the record.
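
The three cases above can be modeled with a small sketch. This is an illustration only, not how the search engine is implemented: the token lists are copied from the example, the scenario names are hypothetical, and the matching rule (a record matches when every token of at least one tokenization of the query is present in the index) is an assumption chosen to be consistent with the outcomes described.

```python
# Illustrative model only. Token lists come from the example in this article;
# the matching rule is an assumption, not the actual search implementation.

# Tokens indexed for the field value "横河電機株式会社"
# (identical for the old and the new Japanese tokenizer).
INDEXED_TOKENS = {"横河", "電機", "株式", "会社"}

# Hypothetical scenario names mapped to the tokenizations of the query "横河".
# Each scenario holds one or more alternative tokenizations of the query.
QUERY_TOKENIZATIONS = {
    "japanese_non_ap": [["横河"]],        # old Japanese tokenizer
    "japanese_ap": [["横", "河"]],        # new Japanese tokenizer
    "english": [["横河"], ["横", "河"]],  # Chinese and Japanese tokenization
}


def record_matches(scenario: str) -> bool:
    """Assumed rule: the record matches if, for at least one tokenization
    of the query, every query token is an indexed token."""
    return any(
        all(token in INDEXED_TOKENS for token in tokens)
        for tokens in QUERY_TOKENIZATIONS[scenario]
    )


for scenario in QUERY_TOKENIZATIONS:
    print(scenario, record_matches(scenario))
```

Under these assumptions, only the scenario with the Japanese user language on an AP instance fails to match, because neither [横] nor [河] is an indexed token.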

Please note that this scenario (a query string that does not match a field value even though the field value contains it) is common for many words because of morphological tokenization.

Additional reference:
How Does Search Handle Terms in Chinese, Japanese, Korean, and Thai?
