In search systems that use hybrid scoring, results can receive high scores even when they do not appear highly relevant to the input query. This behavior is expected: scores are normalized regardless of how many results are returned, so hybrid scores tend to be relatively high.
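The effect of normalization can be illustrated with a minimal sketch. The article does not say which normalization method or weighting the product uses, so the min-max rescaling, the `alpha` weight, and the function names below are assumptions chosen only for illustration.

```python
# Illustrative sketch only: min-max normalization and the alpha-weighted sum
# are assumptions, not the documented scoring method of any product.

def min_max_normalize(scores):
    """Rescale scores to [0, 1] within a single result set."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal (including a single result): map each to 1.0.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(keyword_scores, vector_scores, alpha=0.5):
    """Combine normalized keyword and vector scores with a hypothetical weight."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    return [alpha * k + (1 - alpha) * v for k, v in zip(kw, vec)]


# A query with a strong match and a query with only weak matches.
strong = hybrid_scores([12.4, 6.1, 2.0], [0.91, 0.84, 0.78])
weak = hybrid_scores([0.9, 0.4, 0.1], [0.74, 0.72, 0.70])

# The top result of each query gets the same hybrid score after
# normalization, although the raw scores differ widely.
print(strong[0], weak[0])
```

Under these assumptions, the best result in any result set is rescaled to the top of the range, which is why a high hybrid score says little about absolute relevance.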
For vector scores, embedding models such as e5-large-v2 are used. These models are trained through contrastive learning on large, multi-domain datasets. As a result, even moderately related inputs can achieve high cosine similarity scores (typically in the range of 0.7 to 0.9). The model is intentionally designed to be more forgiving of partially related texts than older models.
It is important to keep in mind:
The absolute hybrid score should not be used to determine a document's relevance in isolation.
Hybrid and vector scores measure how relevant a result is relative to the other results for the same query, not the absolute relevance of the document itself.
Use these scores to compare and rank results, not as standalone indicators of relevance.
If vector scores seem unexpectedly high, consider the nature of the embedding model and the multi-domain training that could contribute to high similarity.
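One way to apply the points above is to look at how far each result falls behind the top result instead of checking its score against a fixed cutoff. The sketch below assumes results arrive as `(doc_id, score)` pairs. The function name `rank_with_gaps` and the sample data are hypothetical.

```python
# Illustrative sketch only: the result format and names are hypothetical.

def rank_with_gaps(results):
    """Sort (doc_id, score) pairs and report each score's gap to the top result."""
    if not results:
        return []
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    top_score = ranked[0][1]
    return [(doc_id, score, round(top_score - score, 4)) for doc_id, score in ranked]


results = [("doc-a", 0.82), ("doc-b", 0.97), ("doc-c", 0.80)]
for doc_id, score, gap in rank_with_gaps(results):
    # Every score here looks "high" in isolation; the gap shows the ordering.
    print(doc_id, score, gap)
```

In this sample every score is at least 0.80, so a fixed threshold would accept all three documents, while the gaps show that one result is clearly ahead of the other two.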
