
Data 360: Understanding High Hybrid and Vector Scores in Search Results

Publish Date: Oct 29, 2025
Description

In search systems that use hybrid scoring, results can receive high scores even when they do not appear highly relevant to the input query. This behavior is expected: normalization is applied to the scores regardless of the number of results returned, which yields relatively high hybrid scores.
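The effect of normalization can be illustrated with a minimal sketch. The article does not state which normalization Data 360 applies, so min-max scaling within the result set is an assumption chosen for illustration, and the function name and raw scores below are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1] relative to the min and max of this result set.

    Assumption: min-max scaling is used only for illustration; the actual
    normalization applied by the search system is not documented here.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: treat every result as equally ranked.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical raw scores: one query with strong matches, one with weak matches.
strong_raw = [42.0, 30.0, 12.0]
weak_raw = [1.5, 1.2, 0.3]

strong_norm = min_max_normalize(strong_raw)
weak_norm = min_max_normalize(weak_raw)

# In both cases the best result in the set is scaled to the top of the range,
# even though the weak query's raw scores are far lower.
print(strong_norm[0], weak_norm[0])
```

Under this assumption, the top result of the weak query ends up with the same normalized score as the top result of the strong query, which is why a high normalized score alone says little about absolute relevance.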

Vector scores are computed from embeddings produced by a model such as e5-large-v2. Such models are trained through contrastive learning on large, multi-domain datasets. As a result, even moderately related inputs can achieve high cosine similarity scores (typically in the range of 0.7 to 0.9). The model is intentionally designed to be more forgiving of partially related texts than older models.
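For reference, cosine similarity between two embedding vectors can be computed as in the sketch below. The vectors are hand-picked four-dimensional toy values, not actual e5-large-v2 output, and the function name is hypothetical. The example only shows how vectors that share a dominant component score high even when they differ elsewhere.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
query = [0.9, 0.8, 0.1, 0.2]
partly_related = [0.8, 0.5, 0.7, 0.1]

similarity = cosine_similarity(query, partly_related)
print(round(similarity, 2))
```

With these toy values the similarity lands inside the 0.7 to 0.9 band mentioned above, although the two vectors clearly differ in their third component.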

It is important to keep in mind:

    • The absolute hybrid score should not be used to determine a document's relevance in isolation.

    • Hybrid and vector scores are intended to measure how relevant a result is relative to other results, not the absolute relevance of the document itself.

Resolution
  • Use hybrid and vector scores for comparison between results, not as standalone indicators of relevance.

  • If vector scores seem unexpectedly high, keep in mind that the embedding model's contrastive, multi-domain training tends to produce high similarity even for partially related texts.
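One way to apply this guidance is to order results by score and look at the gaps between them, rather than comparing each score against a fixed cutoff. The sketch below assumes results are available as (document ID, score) pairs; the function name, document IDs, and scores are hypothetical.

```python
def rank_with_gaps(results):
    """Sort (doc_id, score) pairs by descending score.

    Each returned tuple is (doc_id, score, gap), where gap is the difference
    between the top score in this result set and the result's own score.
    """
    ranked = sorted(results, key=lambda item: item[1], reverse=True)
    top_score = ranked[0][1] if ranked else 0.0
    return [(doc_id, score, top_score - score) for doc_id, score in ranked]


# Hypothetical scores returned for a single query.
results = [("doc-b", 0.84), ("doc-a", 0.91), ("doc-c", 0.83)]
ranked = rank_with_gaps(results)

for doc_id, score, gap in ranked:
    print(f"{doc_id}: score={score:.2f}, gap to top={gap:.2f}")
```

All three scores here would look high in isolation. The ordering and the gaps show that doc-a stands out from the other two, which are nearly tied.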

Knowledge Article Number

004693442

 