Enriched Indexing of Content Chunks

When you create a vector or hybrid search index from unstructured data in Data 360, you can enrich the generated content chunks by extracting metadata from the content. Extracted metadata includes keywords, entities, topic overviews, questions answered by the content, and content summaries.

Chunk enrichment improves retrieval accuracy, especially where you can’t use prepend fields (for example, in a UDMO path) and in question-and-answer agent actions. Enriched content chunks provide an alternative to intensive content curation because the LLM-generated metadata helps retrievers identify the right chunks.

To use enriched content chunks, first create a search index and enable enriched content chunks. In AI Models, create a new custom retriever. When prompted for the search index, select a search index that was created using the enrich chunks option.

To get the most value from an enriched index, access it through a retriever. To use the metadata chunks, a retriever applies extra logic, such as reranking and intent alignment. Calling the index directly through search APIs bypasses the retriever and results in retrieval behavior closer to standard vector or hybrid search. Although introducing a retriever sometimes adds latency, it enables higher retrieval accuracy by improving how queries map to relevant, answer-ready content.
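To make the difference between a direct index call and a retriever concrete, here is a minimal sketch of one kind of extra logic a retriever could apply: collapsing hits on original, metadata, and question chunks into a single ranked result per source chunk. This is an illustration only, not the actual Data 360 retriever. The `Hit` class, its field names, the `"ORIGINAL"` label, and the `merge_hits` function are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    """One search hit. Every field name here is hypothetical."""
    parent_id: str   # ID shared by all chunks derived from one source chunk
    chunk_type: str  # "ORIGINAL", "METADATA", or "QUESTION" (labels assumed)
    score: float     # similarity score; higher means more relevant


def merge_hits(
    hits: list[Hit],
    originals: dict[str, str],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Keep the best score per source chunk and return original text in rank order.

    `originals` maps each parent_id to the original chunk text, so a match on a
    metadata or question chunk still surfaces the answer-ready source text.
    """
    best: dict[str, float] = {}
    for hit in hits:
        previous = best.get(hit.parent_id, float("-inf"))
        best[hit.parent_id] = max(previous, hit.score)
    # Sort source chunks by their best score, highest first.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(originals[parent_id], score) for parent_id, score in ranked[:top_k]]
```

In this sketch, a query that matches a generated question strongly but the original text only weakly still ranks the source chunk by the stronger score. A direct search call would return the raw hits without that grouping step.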

Note

Use of this feature in Data Cloud consumes Flex Credits. For details, see Billing Considerations for Enriched Index, or contact your account executive.

Contents of Enriched Chunks

When you enable enriched content on a vector or hybrid search index, Data 360 generates three chunks for each chunk of source content: one that contains the original chunk text, one that contains metadata text, and one that contains questions that the chunk can answer. Retrievers use these chunks in RAG and AI agent workflows.

Note

Generated metadata doesn’t include personal identifying information (PII).

Chunk Type	Metadata Type	Description
METADATA	Keywords	Extracts keywords that uniquely identify the content.
	Entities	Identifies and lists verbatim any of the following that are mentioned: locations, organizations, dates, times, currencies, amounts, percentages, products, and platforms.
	Topics	Extracts main topics discussed in the chunk. Includes important entities and keywords in the topic names.
	Title	Generates a concise and informative title summarizing the content.
	Summary	Provides a summary of the content, highlighting key topics and entities. Where possible, the summary is between 100 and 250 words.
QUESTION	Questions	If applicable, compiles a comprehensive set of detailed questions that the content in the text chunk can answer.
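As a rough mental model of the table above, the three chunks generated for one piece of content can be sketched as a small data structure. This is a hypothetical shape, not the stored Data 360 schema: the class names, field names, the `"ORIGINAL"` label, and the labeled text layout in `as_text` are all assumptions. Only the METADATA and QUESTION chunk types and the metadata type names come from the table.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChunkMetadata:
    """Metadata types listed for the METADATA chunk; field names are hypothetical."""
    keywords: list[str] = field(default_factory=list)
    entities: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)
    title: str = ""
    summary: str = ""

    def as_text(self) -> str:
        """Flatten the fields into one labeled text block (an assumed layout)."""
        return "\n".join([
            f"Title: {self.title}",
            f"Keywords: {', '.join(self.keywords)}",
            f"Entities: {', '.join(self.entities)}",
            f"Topics: {', '.join(self.topics)}",
            f"Summary: {self.summary}",
        ])


@dataclass
class EnrichedChunk:
    """The original text plus the generated metadata and questions."""
    original_text: str
    metadata: ChunkMetadata
    questions: list[str] = field(default_factory=list)

    def to_index_entries(self) -> list[tuple[str, str]]:
        """Return (chunk_type, text) pairs, one per generated chunk."""
        entries = [
            ("ORIGINAL", self.original_text),  # label assumed; not named in the table
            ("METADATA", self.metadata.as_text()),
        ]
        # Questions are generated only "if applicable", so skip an empty set.
        if self.questions:
            entries.append(("QUESTION", "\n".join(self.questions)))
        return entries
```

Calling `to_index_entries()` on a chunk with questions yields three entries. A chunk with no applicable questions yields two in this sketch.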

Data Processing for Enriched Indexing

Enriched indexes are supported in all regions where Data 360 is available. Data for enriched indexes is processed through Amazon Bedrock managed services in these regions.

us-west-2: US West (Oregon)
us-east-1: US East (Virginia)
eu-central-1: Europe (Frankfurt)
eu-central-2: Europe (Zurich)
eu-west-3: Europe (Paris)
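If you need to check a region code against this list, for example in a data residency review script, the list can be captured as a simple lookup. This is a convenience sketch: the constant and function names are hypothetical, and the region codes and labels are copied from the list above.

```python
# Processing regions for enriched indexes, copied from the list above.
ENRICHED_INDEX_REGIONS = {
    "us-west-2": "US West (Oregon)",
    "us-east-1": "US East (Virginia)",
    "eu-central-1": "Europe (Frankfurt)",
    "eu-central-2": "Europe (Zurich)",
    "eu-west-3": "Europe (Paris)",
}


def is_processing_region(region_code: str) -> bool:
    """Return True if the code is one of the listed processing regions."""
    return region_code.strip().lower() in ENRICHED_INDEX_REGIONS


def european_regions() -> list[str]:
    """Return the listed region codes that start with the 'eu-' prefix, sorted."""
    return sorted(code for code in ENRICHED_INDEX_REGIONS if code.startswith("eu-"))
```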

Enable Amazon Bedrock Models (Amazon and Anthropic) in Einstein Setup to use enriched indexing. For more information, see Amazon Bedrock documentation.

Billing Considerations for Enriched Index

Enriched indexing affects credit consumption for orgs that operate Data 360 under a Data Cloud license. With enriched search indexes, you process unstructured data and make calls to an LLM to generate enriched chunks.
