
          Optimizing Search Indexes: Field Selection and Chunking

          When you create a search index in the advanced builder, you can optimize your search index to deliver more accurate results by paying attention to the field selection and chunking strategies you use.

          Indexing Text Fields

          If you want to add text fields when creating a search index, select text fields with longer, free-text content. You can even index multiple text fields from a DMO. For example, if you select the Description, Summary, Content, and Resolution fields from a DMO, Data 360 stores all corresponding vectors in the same search index.

          You can separate vectors on the basis of the DataSource__c field in the Index DMO. The DataSource__c field contains the original field name. Because this field is in the Index DMO, you can use it in a retriever’s prefilter. For example, you can configure a retriever to evaluate queries on semantic similarity to a specific field only, such as Description and not Resolution.
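
The effect of such a prefilter can be sketched in a few lines of plain Python. This is not the Data 360 retriever API: the in-memory rows, the `RecordId` key, the two-dimensional vectors, and the `search` helper are all hypothetical, and only the `DataSource__c` field name comes from the text above.

```python
import math

# Hypothetical in-memory stand-in for rows of an Index DMO. Only the
# DataSource__c field name comes from the article; the rest is illustrative.
INDEX_ROWS = [
    {"RecordId": "KA-1", "DataSource__c": "Description", "vector": [0.9, 0.1]},
    {"RecordId": "KA-1", "DataSource__c": "Resolution", "vector": [0.8, 0.2]},
    {"RecordId": "KA-2", "DataSource__c": "Description", "vector": [0.2, 0.9]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, rows, source_field=None, top_k=2):
    """Rank rows by similarity, optionally prefiltering on DataSource__c."""
    if source_field is not None:
        rows = [r for r in rows if r["DataSource__c"] == source_field]
    ranked = sorted(rows, key=lambda r: cosine(query_vector, r["vector"]), reverse=True)
    return ranked[:top_k]


# With the prefilter, only Description vectors compete for the top slots.
filtered = search([1.0, 0.0], INDEX_ROWS, source_field="Description")
# Without it, both top slots go to chunks of the same record (KA-1).
unfiltered = search([1.0, 0.0], INDEX_ROWS)
```

In this toy data, the unfiltered search returns two chunks from the same record, while the filtered search returns one chunk from each record.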

          Avoid selecting too many similar fields or redundant fields (for example, Summary, Title, and Description). Doing so can lead to decreased recall if your retriever doesn’t have prefilters on DataSource__c. Because these fields all likely contain the same, or very similar information, at least three chunks (one chunk for each field) from the same document can appear highly ranked in the query results. These bring the same information to the LLM, and if you configure the retriever to retrieve, for example, nine results, only three documents will be represented in the results. This reduces variation in your search results and can lead to documents being missed.
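
The arithmetic behind this can be illustrated with a small sketch. The rankings below are made up for illustration and assume the worst case, in which the near-duplicate fields of each document score almost identically and therefore sit next to each other in the results.

```python
def distinct_documents(ranked_chunks, top_k):
    """Count how many different documents appear in the top_k chunks."""
    return len({chunk["doc"] for chunk in ranked_chunks[:top_k]})


FIELDS = ["Summary", "Title", "Description"]

# Hypothetical ranking with three redundant fields indexed per document:
# each document contributes three adjacent, near-identical chunks.
redundant_ranking = [
    {"doc": f"doc-{d}", "field": f} for d in range(1, 10) for f in FIELDS
]

# Hypothetical ranking with only the longest field indexed per document.
single_field_ranking = [
    {"doc": f"doc-{d}", "field": "Description"} for d in range(1, 10)
]

redundant_docs = distinct_documents(redundant_ranking, top_k=9)
single_field_docs = distinct_documents(single_field_ranking, top_k=9)
```

With nine results requested, the redundant index surfaces three documents, while the single-field index surfaces nine.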

          When two or more fields represent the same content in different forms, we recommend that you select the field with the longest text, such as Description. Consider prepending a shorter, more condensed field, such as Title, to that field.

          Tip: Don’t select categorical columns as index fields. Categorical data consists of single-word or two-word descriptors that map to a picklist in Salesforce. To create useful results, semantic search requires a longer textual scope and more context.

          Using Prepend Fields

          One way to optimize your chunking strategy is to use prepend fields to add context to chunks and make them easier to identify. For example, suppose you have a chunk that contains a sequence of troubleshooting steps. By prepending that chunk with the Title field that contains the text “How to Fix Device X When It Shows Behaviour Y,” you make it easier to identify that content as relevant to a user’s question. Prepending fields like Title or Product Name adds those values to a chunk, which makes them visible in prompt augmentation and in the Data 360 Query Editor.
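
Conceptually, prepending amounts to placing field values in front of the chunk text before it is embedded. The sketch below is only an illustration: the `prepend_fields` helper, the `Name: value` header format, and the sample troubleshooting steps are assumptions, and the actual format that Data 360 produces may differ.

```python
def prepend_fields(chunk_text, record, fields):
    """Return the chunk text with the named field values placed in front of it."""
    # Skip fields that are missing or empty on the record.
    header = [f"{name}: {record[name]}" for name in fields if record.get(name)]
    return "\n".join(header + [chunk_text])


# Hypothetical source record and chunk.
record = {
    "Title": "How to Fix Device X When It Shows Behaviour Y",
    "Product Name": "Device X",
}
chunk = "1. Power off the device. 2. Hold the reset button for ten seconds."

enriched = prepend_fields(chunk, record, ["Title", "Product Name"])
print(enriched)
```

The enriched chunk now names the device and the symptom, which the bare list of steps doesn't.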

          Adjusting Chunk Size

          Another way to optimize chunking is to tune the chunk size.

          When you create a default search index, Data 360 uses semantic-based passage extraction markers to chunk your content into pieces that are as small as possible. Data 360 then merges the chunks back together until it reaches the chunk size you specify, or the default maximum chunk size (512 tokens).
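
The merge step can be pictured as a greedy pass over the small passages. The following is a minimal sketch under stated assumptions, not the Data 360 implementation: tokens are approximated by whitespace-separated words, the `merge_passages` helper is hypothetical, and a single passage longer than the limit is left as it is.

```python
def count_tokens(text):
    """Rough token count: whitespace-separated words (an assumption)."""
    return len(text.split())


def merge_passages(passages, max_tokens=512):
    """Greedily combine consecutive passages without exceeding max_tokens."""
    chunks, current, current_tokens = [], [], 0
    for passage in passages:
        n = count_tokens(passage)
        # Close the current chunk if adding this passage would exceed the limit.
        if current and current_tokens + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(passage)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# A tiny limit makes the behavior easy to see.
passages = ["one two three", "four five", "six seven eight nine", "ten"]
chunks = merge_passages(passages, max_tokens=5)
```

With a limit of five tokens, the four passages merge into two chunks of five words each.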

          Expect this to require some experimentation, because the optimal chunk size and strategy vary per RAG or agent implementation.

          For more information, refer to How the Max Token Setting Affects Chunking.

          Optimizing Chunks for Retrieval

          When planning the size of your chunks for retrieval, consider the information density and organizational structure of the content you’re chunking. Remember that one chunk results in one vector, and all the content in the chunk is represented in that single vector. Consider how many words are needed to adequately understand the meaning of a chunk. Will 400 to 500 words work? Or can fewer words sufficiently capture a self-contained piece of information (possibly enhanced with field prepending or chunk enrichment)? Those are the kinds of questions that should come up in your planning.

          Optimizing Chunks for Prompt Augmentation

          You should also consider chunking from a prompt augmentation perspective. How many chunks does your LLM need to generate a sufficiently usable response? Is a small, individual factoid useful enough, or does your LLM require more context?

          For UDMO-based search indexes, augmentation of content typically relies on chunk size, so chunks need to be larger to include extra context.

          For DMO-based indexes, you have more options because you can use additional fields for augmentation. It’s even possible to augment the prompt using the original document (for example, a knowledge article) instead of a single chunk. This increases the size of the resolved prompt, so consider the context window of your LLM in relation to the selected number of results. Keep in mind that larger prompts also increase the cost of response generation: if you increase the prompt and response size, you consume more Einstein Requests.
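
A rough budget check can help when weighing chunk-based against document-based augmentation. Every number in this sketch is hypothetical (template size, tokens per result, response size, and the 8,192-token context window), and the `fits_context` helper is illustrative only. Substitute the values for your own LLM and content.

```python
def fits_context(template_tokens, result_count, tokens_per_result,
                 response_tokens, context_window):
    """Estimate total token use and whether it fits the context window."""
    total = template_tokens + result_count * tokens_per_result + response_tokens
    return total, total <= context_window


# Nine results, augmenting with chunks of up to 512 tokens each.
chunk_total, chunk_ok = fits_context(
    template_tokens=300, result_count=9, tokens_per_result=512,
    response_tokens=1000, context_window=8192,
)

# Nine results, augmenting with whole articles of about 3,000 tokens each.
doc_total, doc_ok = fits_context(
    template_tokens=300, result_count=9, tokens_per_result=3000,
    response_tokens=1000, context_window=8192,
)
```

Under these assumed numbers, the chunk-based prompt fits the window (5,908 tokens), while the document-based prompt does not (28,300 tokens) unless you retrieve fewer results or use a model with a larger context window.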

           