Create a Hybrid Search Index with Advanced Setup

Configure a hybrid search index for a data model object (DMO) or an unstructured data model object (UDMO) to provide relevant information to generative AI applications. Data 360 breaks up the referenced data into semantically related chunks and generates searchable vectors, and then generates a vector index and a keyword index.

Required Editions

Available in: All Editions supported by Data 360. See Data 360 edition availability.

User Permissions Needed
To create a search index configuration:	Permission set: Data Cloud Architect

Before creating a search index, review the search index considerations in the Search Reference. Familiarize yourself with the concepts of chunking unstructured data, vectorization, and the chunk and index data model objects.

Enable Agentforce for Generative AI Capabilities (Optional)

Some search index capabilities, including image processing, enriched index, and LLM-based content processing and parsing, require generative AI features available through Agentforce. To use those capabilities, contact your admin to enable Agentforce in your org. To use enriched indexing, enable Amazon Bedrock Models (Amazon and Anthropic) in Einstein Setup.

Select a Search Type and Source Object

To get started, specify the type of search you want to perform and select a source object.

From App Launcher, select Data Cloud.
Click Search Index > New.
Click Advanced Setup > Next.
On the Select Source DMO page, select Hybrid Search.
Select a data space and a source object.

Note: A DMO that has mappings from one or more external DLOs can be used as the data source only if caching is enabled for all of those external DLOs. For more information, see Caching in Data Federation.
Click Next.

Set Up Parsing and Visual Data Preprocessing

Next, extract and prepare text, metadata, tables, and images from unstructured documents for chunking.

Note: Parsing and preprocessing options are available only for PDF, HTML, DOCX, and PPTX files from UDMOs. They aren't available for data model objects (DMOs), including DMOs that have a file attachment UDMO.

Important: You can't select both the LLM-based Parser and LLM-based Visual Data Preprocessing for a single search index. See LLM-based parsing and preprocessing for more information about these options.

On the Parsing page, choose one of these options.

Default Parser	Extract text with built-in settings.
LLM-based Parser	Extract all text, images, and other visual elements using an LLM.
Docling Parser	Extract text from images, tables, and document layouts using Docling Parser.

If you choose Default Parser, click Next.
If you choose to parse your documents using an LLM, complete these steps.
1. From the dropdown list, select the appropriate model.
2. Use the default prompt as is, or edit it for your specific use case.
3. Click Next.
If you choose the Docling parser, click Next.
By default, additional preprocessing is turned off. To parse images, enable image processing when you set up chunking strategies.

If you chose the default parser, on the Preprocessing page, choose one of these options.

No Preprocessing	Use parsed text and visual data as is.
LLM-based Visual Data Preprocessing	Capture context from visual data using an LLM. (Available only if you selected the default parser in the previous step.)

If you choose No Preprocessing, click Next.
If you choose to preprocess your documents using an LLM, complete these steps.
1. From the dropdown list, select the appropriate model.
2. Use the default prompt as is, or edit it for your specific use case.
3. Click Next.
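
The option rules on the Parsing and Preprocessing pages can be sketched as a small validation function. This is an illustrative Python sketch only, not a Data 360 API: the option labels (`default`, `llm`, `docling`, `none`, `llm_visual`) and the function name `check_parsing_choice` are hypothetical stand-ins for the choices described above.

```python
from typing import Optional

# Hypothetical labels for the options on the Parsing and Preprocessing pages.
PARSERS = {"default", "llm", "docling"}
PREPROCESSING = {"none", "llm_visual"}


def check_parsing_choice(parser: str, preprocessing: str = "none") -> Optional[str]:
    """Return a message if the combination isn't allowed, otherwise None."""
    if parser not in PARSERS:
        return f"unknown parser: {parser}"
    if preprocessing not in PREPROCESSING:
        return f"unknown preprocessing option: {preprocessing}"
    # LLM-based Visual Data Preprocessing is offered only with the Default Parser,
    # which also rules out combining it with the LLM-based Parser.
    if preprocessing == "llm_visual" and parser != "default":
        return "LLM-based Visual Data Preprocessing requires the Default Parser"
    return None


if __name__ == "__main__":
    for combo in [("default", "llm_visual"), ("llm", "llm_visual"), ("docling", "none")]:
        print(combo, "->", check_parsing_choice(*combo) or "allowed")
```

Running a check like this before you start the wizard can help you plan which parser to pick when you also need context captured from visual data.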

Set Up Chunking and Vectorization

Next, set up chunking for your data and select an embedding model to create searchable vectors for each chunk.

If your search index is on a DMO, on the Chunking page, select fields to chunk and the chunking strategy.
1. To add or remove fields, click Manage Fields and save your work.
  Search indexes don't support date and datetime fields for chunking. Only numerical and text fields are available.
2. (Optional) Update the chunking strategy and modify the settings.
3. If your search index is on a DMO of a Salesforce object with file attachments that you want to include, click Include Attachments and select the ContentDocumentVersion UDMO.
4. Add a file extension or set a chunking strategy based on the file type. To add a file extension for a new file type, click Add File Extension, and then optionally update the chunking strategy and modify the settings for an existing file type.
5. (Optional) To add additional fields to a chunk, select the Prepend fields to each chunk toggle and select a field from the list. You can add a maximum of two fields.
6. To enrich generated chunks with additional metadata, turn on the Enrich Content Chunks toggle.
7. Click Next.
If your search index is on a UDMO, add a file extension or set a chunking strategy based on the file type.
1. To add a file extension for a new file type, click Add File Extension. If you have images to process, add the JPEG, JPG, and PNG file extensions.
2. (Optional) Update the chunking strategy and modify the settings for an existing file type.
  For image files with extensions such as JPEG, JPG, or PNG, image processing is enabled by default. To include image processing for PDF files, turn on image processing.
  If you selected the Docling parser, you can optionally turn on LLM-based image processing to parse images using an LLM. For more information on supported file formats, see Docling Parser.
  If you selected LLM-based parsing or LLM-based visual data preprocessing in previous steps, image processing doesn't apply even if you turn it on.
3. (Optional) To add additional fields to a chunk, select the Prepend fields to each chunk toggle and select a field from the list. You can add a maximum of two fields.
4. To enrich generated chunks with additional metadata, turn on the Enrich Content Chunks toggle.
5. To use LLM-based image processing with the Docling parser, turn on image processing.
If you have audio or video files to transcribe, click Configure, and in the transcription window, complete these steps.
1. Select the default transcription model from the dropdown.
2. To add a timestamp to each speaker entry, select Include timestamp.
Click Next.
On the Vectorization page, from the dropdown list, select an embedding model. If your search index includes images to process, select the appropriate embedding model (beta).

The model determines how Data 360 measures your unstructured data objects for semantic relevance when building search results.
Click Next.
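
To picture what the Prepend fields to each chunk option does, here is a minimal Python sketch. It assumes a simple fixed-size word window as the chunking strategy, which is a stand-in chosen for illustration and not necessarily how Data 360 chunks data. The record layout, the `chunk_record` function, and the `max_words` parameter are hypothetical. Only the two-field limit comes from the steps above.

```python
from typing import Dict, List, Sequence

MAX_PREPEND_FIELDS = 2  # limit stated in the chunking steps


def chunk_record(
    record: Dict[str, str],
    text_field: str,
    prepend_fields: Sequence[str] = (),
    max_words: int = 50,
) -> List[str]:
    """Split one text field into word windows and prepend field values to each chunk."""
    if len(prepend_fields) > MAX_PREPEND_FIELDS:
        raise ValueError("at most two fields can be prepended to each chunk")
    if max_words < 1:
        raise ValueError("max_words must be at least 1")

    # Build the prefix once; every chunk carries the same field context.
    prefix = " | ".join(f"{name}: {record[name]}" for name in prepend_fields)

    words = record[text_field].split()
    chunks = []
    for start in range(0, len(words), max_words):
        body = " ".join(words[start:start + max_words])
        chunks.append(f"{prefix}\n{body}" if prefix else body)
    return chunks


if __name__ == "__main__":
    case = {"Subject": "Refund", "Status": "Open", "Body": "one two three four five"}
    for chunk in chunk_record(case, "Body", ["Subject"], max_words=2):
        print(repr(chunk))
```

Because each chunk is embedded on its own, prepending a field such as a subject line gives every chunk some of the record's context, including chunks taken from the middle of a long text field.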

Configure Fields for Filtering and Ranking Factors

To enhance the precision and relevance of search results, specify fields that users can use to filter their searches.

On the Fields for Filtering page, optionally add fields from the source object or related objects to provide more ways to filter a search. You can select up to ten filter fields.

Data 360 picks up the pre-filter fields when you generate vector embeddings. If you change the values in those pre-filter fields, Data 360 picks up the new values when you generate or refresh the vector embeddings. The relationship between the source object and the related object must have a cardinality of 1:1 or N:1.

For example, if the indexed unstructured data includes case conversation transcripts, filtering on the Status field in the Case object can improve the relevance of semantic search results when searching the conversation transcripts.
Click Next.
(Optional) To better rank the hybrid search results, click Add Ranking Factor.
For example, you can select RECENCY as the ranking factor and choose the LastModifiedDate field in the DMO to rank the search results. For the object contributing to the ranking factor, you can select the object that the search index is built on or a related object. If you select a related object, select the field for the relationship.
You can add up to two ranking factors: recency of records and popularity of records.
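
The limits in this section can be summarized as a checklist in code. The following Python sketch is illustrative only. The classes `FilterField` and `RankingFactor`, the function `validate_search_fields`, and the `POPULARITY` label are hypothetical names, not part of a Data 360 API. The limits themselves (ten filter fields, 1:1 or N:1 cardinality, two ranking factors) come from the steps above.

```python
from dataclasses import dataclass
from typing import List

MAX_FILTER_FIELDS = 10
MAX_RANKING_FACTORS = 2
ALLOWED_CARDINALITIES = {"1:1", "N:1"}
# "RECENCY" appears in the example above; "POPULARITY" is an assumed label.
RANKING_FACTOR_TYPES = {"RECENCY", "POPULARITY"}


@dataclass
class FilterField:
    object_name: str
    field_name: str
    cardinality: str = "1:1"  # source object to the object that owns the field


@dataclass
class RankingFactor:
    factor_type: str
    object_name: str
    field_name: str


def validate_search_fields(
    filters: List[FilterField], ranking: List[RankingFactor]
) -> List[str]:
    """Return a list of problems; an empty list means the plan fits the stated limits."""
    errors: List[str] = []
    if len(filters) > MAX_FILTER_FIELDS:
        errors.append(f"{len(filters)} filter fields selected (maximum {MAX_FILTER_FIELDS})")
    for f in filters:
        if f.cardinality not in ALLOWED_CARDINALITIES:
            errors.append(
                f"{f.object_name}.{f.field_name}: cardinality {f.cardinality} is not 1:1 or N:1"
            )
    if len(ranking) > MAX_RANKING_FACTORS:
        errors.append(f"{len(ranking)} ranking factors selected (maximum {MAX_RANKING_FACTORS})")
    for r in ranking:
        if r.factor_type not in RANKING_FACTOR_TYPES:
            errors.append(f"unknown ranking factor type: {r.factor_type}")
    return errors


if __name__ == "__main__":
    plan_filters = [FilterField("Case", "Status", "N:1")]
    plan_ranking = [RankingFactor("RECENCY", "Case", "LastModifiedDate")]
    print(validate_search_fields(plan_filters, plan_ranking) or "plan fits the limits")
```

The example plan mirrors the case transcript scenario: many transcripts relate to one case, so filtering on the Status field of the Case object uses an N:1 relationship.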

Review and Build the Index

Finally, confirm your settings and save the configuration to build the index.

Review the configuration and the target data model objects, and click Save.

When you build the search index, Data 360 creates a hybrid search index for the data. Data 360 also creates a default retriever that you can manage in AI Models. Retrievers are used in prompt templates.
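
To picture how results from a keyword index and a vector index can be combined into one list, the following Python sketch uses reciprocal rank fusion, a common general-purpose technique. This is an assumption for illustration: this page doesn't state how Data 360 merges keyword and vector results. The chunk IDs, the function name, and the constant `k=60` are hypothetical.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of chunk IDs into one list.

    Each list contributes 1 / (k + rank) for every ID it contains, so chunks
    that rank well in both the keyword and the vector results rise to the top.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by chunk ID for a stable order.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


if __name__ == "__main__":
    keyword_hits = ["c1", "c2", "c3"]  # hypothetical keyword index results
    vector_hits = ["c2", "c3", "c4"]   # hypothetical vector index results
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

In this example, chunks c2 and c3 appear in both result lists, so they rank ahead of c1 and c4, which each appear in only one.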
