LLM-based Parsing and Preprocessing
Use LLM-based parsing and preprocessing to extract and prepare text, metadata, tables, and images from unstructured documents for chunking.
LLM-based parsing converts text and metadata from unstructured document formats into structured or semi-structured representations. For example, parsing identifies document structures (headings, paragraphs, or tables), removes irrelevant content (headers, footers, or HTML tags), and prepares the text for further preprocessing and chunking.
LLM-based visual data preprocessing prepares multimodal elements, such as images and tables, for chunking. For example, preprocessing extracts context from images, and extracts data from tables while preserving the context and relationships within the table data.
You cannot select both the LLM-based parsing and the LLM-based visual data preprocessing options for the same search index. Use LLM-based parsing for documents that contain rich visual content, such as images, charts, and tables, throughout. In such cases, processing the entire document holistically provides better context and understanding.
Alternatively, use LLM-based visual data preprocessing when you want to process only the structured and visual elements, such as tables, charts, and images, rather than the full content of a document.
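The selection rule above can be sketched as a small settings check. This is a minimal illustration only: the `IndexPreprocessingSettings` class, its field names, and the `choose_settings` helper are hypothetical and do not correspond to a real product API. The sketch encodes two points from this section: the two options are mutually exclusive for one search index, and the choice depends on whether visual content runs throughout the document or only specific elements need processing.

```python
from dataclasses import dataclass


# Hypothetical index settings; these field names are illustrative, not a real product API.
@dataclass
class IndexPreprocessingSettings:
    llm_parsing: bool = False
    llm_visual_preprocessing: bool = False

    def validate(self) -> None:
        # The two LLM-based options are mutually exclusive for a single search index.
        if self.llm_parsing and self.llm_visual_preprocessing:
            raise ValueError(
                "Select either LLM-based parsing or LLM-based visual data "
                "preprocessing for a search index, not both."
            )


def choose_settings(visual_content_throughout: bool,
                    target_visual_elements_only: bool) -> IndexPreprocessingSettings:
    """Pick at most one option, following this section's guidance (hypothetical helper)."""
    if visual_content_throughout:
        # Rich visual content across the whole document: process it holistically.
        settings = IndexPreprocessingSettings(llm_parsing=True)
    elif target_visual_elements_only:
        # Only tables, charts, and images need dedicated processing.
        settings = IndexPreprocessingSettings(llm_visual_preprocessing=True)
    else:
        # Neither LLM-based option is selected.
        settings = IndexPreprocessingSettings()
    settings.validate()
    return settings


if __name__ == "__main__":
    print(choose_settings(visual_content_throughout=True, target_visual_elements_only=False))
    print(choose_settings(visual_content_throughout=False, target_visual_elements_only=True))
```

Raising an error in `validate()` rather than silently preferring one option mirrors the constraint that the index accepts only one of the two LLM-based modes.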

