Data 360: Understanding High Hybrid and Vector Scores in Search Results

Publish Date: Oct 29, 2025

Description

In search systems that use hybrid scoring, results can receive high scores even when they do not appear highly relevant to the input query. This is expected behavior: scores are normalized regardless of how many results are returned, so hybrid scores come out relatively high.
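To make the effect of normalization concrete, here is a minimal sketch. It assumes min-max normalization within each result set and an equal-weight blend of keyword and vector scores; the article does not state which normalization or weighting Data 360 actually applies, so `min_max_normalize`, `hybrid_scores`, and `alpha` are hypothetical names used only for illustration.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1] within one result set (assumed scheme)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: treat every result as the top result.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(keyword_scores, vector_scores, alpha=0.5):
    """Blend normalized keyword and vector scores (alpha is an assumed weight)."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    return [alpha * k + (1 - alpha) * v for k, v in zip(kw, vec)]


# A query with strong raw matches and a query with weak raw matches.
strong_match = hybrid_scores([12.0, 8.0, 3.0], [0.91, 0.85, 0.78])
weak_match = hybrid_scores([1.2, 0.8, 0.3], [0.74, 0.72, 0.70])

# Both result sets end up with the same top hybrid score after normalization.
print(strong_match[0], weak_match[0])
```

Under these assumptions, the best result of a weak result set scores as high as the best result of a strong one, which is why the absolute value says little about relevance.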

Vector scores come from embedding models such as e5-large-v2. These embeddings are trained with contrastive learning on large, multi-domain datasets, so even moderately related inputs can produce high cosine similarity scores (typically between 0.7 and 0.9). The model is intentionally designed to be more forgiving of partially related texts than older models.
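The following sketch shows how cosine similarity is computed and how scores can sit in a narrow, high band while still ordering results correctly. The three-dimensional vectors are hand-picked toy values, not real e5-large-v2 output, and the document labels are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (illustrative only, not produced by an embedding model).
query = [1.0, 0.2, 0.1]
docs = {
    "closely_related": [1.0, 0.25, 0.1],
    "partially_related": [1.0, 0.6, 0.4],
    "loosely_related": [1.0, 0.9, 0.8],
}

similarities = {name: cosine_similarity(query, vec) for name, vec in docs.items()}
ranked = sorted(similarities, key=similarities.get, reverse=True)

for name in ranked:
    print(f"{name}: {similarities[name]:.3f}")
```

Every toy document scores above 0.7 here, yet the ordering still separates the closest match from the loosest one.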

It is important to keep in mind:

- The absolute hybrid score should not be used to determine a document's relevance in isolation.
- Hybrid and vector scores are intended to measure how relevant a result is relative to other results, not the absolute relevance of the document itself.

Resolution

Use hybrid and vector scores to compare results with one another, not as standalone indicators of relevance.
If vector scores seem unexpectedly high, consider the nature of the embedding model: its multi-domain training can contribute to high similarity scores.
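One way to apply this guidance is to rank results and look at each result's gap to the top score, rather than filtering on a fixed cutoff. The sketch below is only an illustration: the `(doc_id, score)` tuple shape, the function name `rank_with_gaps`, and the 0.7 cutoff are assumptions, not part of any Data 360 API.

```python
def rank_with_gaps(results):
    """Sort (doc_id, score) pairs by score and report each gap to the top score."""
    if not results:
        return []
    ordered = sorted(results, key=lambda r: r[1], reverse=True)
    top = ordered[0][1]
    return [(doc_id, score, round(top - score, 4)) for doc_id, score in ordered]


# Hypothetical result set with scores in the typical high band.
results = [("doc_b", 0.86), ("doc_c", 0.73), ("doc_a", 0.88)]

# A fixed cutoff keeps every result, so it does not discriminate.
passing_fixed_cutoff = [doc_id for doc_id, score in results if score >= 0.7]

# Relative comparison shows doc_c trailing the top result by a wide margin.
ranked = rank_with_gaps(results)
for doc_id, score, gap in ranked:
    print(doc_id, score, gap)
```

In this example the fixed cutoff accepts all three documents, while the gaps show that two results are close to each other and the third is clearly weaker.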

Knowledge Article Number

004693442


