LLM-based Parsing and Preprocessing

Use LLM-based parsing and preprocessing to extract and prepare text, metadata, tables, and images from unstructured documents for chunking.

LLM-based parsing converts text and metadata from unstructured document formats into structured or semi-structured representations. For example, parsing identifies document structures (headings, paragraphs, or tables), removes irrelevant content (headers, footers, or HTML tags), and prepares the text for preprocessing and chunking.

LLM-based visual data preprocessing prepares multimodal elements, such as images and tables, for chunking. For example, preprocessing extracts context from images and extracts data from tables while maintaining the context and relationships within the table data.

You can't select both the LLM-based parsing and the LLM-based visual data preprocessing options for the same search index. Use LLM-based parsing for documents that contain rich visual content, such as images, charts, and tables, throughout the document. In such cases, processing the entire document holistically provides better context and understanding.

Alternatively, use LLM-based visual data preprocessing when you want to specifically process structured and visual elements like tables, charts, and images, rather than the full content of a document.
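The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not a product API: the `LLMOption` enum, `validate_options`, and `suggest_option` are hypothetical names introduced for this example.

```python
from enum import Enum
from typing import Optional, Set


class LLMOption(Enum):
    # Hypothetical labels for the two options described in this article.
    PARSING = "llm_based_parsing"
    VISUAL_PREPROCESSING = "llm_based_visual_data_preprocessing"


def validate_options(selected: Set[LLMOption]) -> None:
    """Raise if both options are selected for one search index."""
    if LLMOption.PARSING in selected and LLMOption.VISUAL_PREPROCESSING in selected:
        raise ValueError(
            "Select either LLM-based parsing or LLM-based visual data "
            "preprocessing for a search index, not both."
        )


def suggest_option(
    rich_visuals_throughout: bool, targeted_elements_only: bool
) -> Optional[LLMOption]:
    """Suggest an option based on the guidance in this article."""
    if targeted_elements_only:
        # Only tables, charts, and images need processing, not the full content.
        return LLMOption.VISUAL_PREPROCESSING
    if rich_visuals_throughout:
        # Visual content appears throughout, so process the whole document.
        return LLMOption.PARSING
    return None  # Neither case applies; the article gives no recommendation.
```

Returning `None` when neither condition holds is a design choice for this sketch; the article itself only covers the two cases shown.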

Note: Parsing and preprocessing options are available only for PDF, HTML, DOCX, and PPTX files from UDMOs. They aren't available for data model objects (DMOs), including DMOs that have a file attachment UDMO.
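As a rough illustration of the eligibility rule in this note, the following Python sketch checks a file before you plan for these options. The function name, the `"UDMO"` and `"DMO"` string labels, and the mapping of file types to extensions are assumptions made for this example, not part of any product API.

```python
from pathlib import Path

# Assumed extension mapping for the file types named in the note.
SUPPORTED_EXTENSIONS = {".pdf", ".html", ".docx", ".pptx"}


def is_eligible_for_llm_options(filename: str, source_object_type: str) -> bool:
    """Return True only for supported file types that come from a UDMO.

    source_object_type is a hypothetical label: "UDMO" for files from a UDMO,
    or "DMO" for a data model object, even one with a file attachment UDMO.
    """
    if source_object_type != "UDMO":
        return False
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_eligible_for_llm_options("report.PDF", "UDMO")` passes the check, while the same file referenced through a DMO does not.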
