Parsing and Preprocessing Content
Prepare unstructured data for chunking by using Data 360 content parsing and preprocessing methods. Parsing and preprocessing identify document structures, such as headings, paragraphs, or tables, remove irrelevant content, and prepare the text for chunking and indexing.
- LLM-based Parsing and Preprocessing
Use LLM-based parsing and preprocessing to extract and prepare text, metadata, tables, and images from unstructured documents for chunking.
- Docling Parser
The Docling parser is a third-party intelligent document parsing framework that brings together multiple open-source models for layout understanding and table extraction. Optionally, you can combine the Docling parser with image processing that uses large language models (LLMs), so that responses can draw on information available in images.
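
To make the general idea of parsing and preprocessing concrete, here is a minimal Python sketch of the steps described at the top of this page: identify headings, paragraphs, and tables, drop irrelevant lines, and return clean blocks that are ready for chunking. It is not the Data 360 implementation and does not use the Docling parser or an LLM. The `Block` class, the `parse_blocks` function, the `BOILERPLATE` patterns, and the Markdown-like input conventions (`#` for headings, `|` for table rows) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    kind: str  # "heading", "paragraph", or "table"
    text: str


# Hypothetical patterns for lines treated as irrelevant in this sketch.
BOILERPLATE = re.compile(
    r"^(page \d+( of \d+)?|you are here:.*|copyright .*)$", re.IGNORECASE
)


def parse_blocks(raw: str) -> list[Block]:
    """Split raw text into typed blocks and drop boilerplate lines."""
    blocks: list[Block] = []
    # Blank lines separate candidate blocks.
    for chunk in re.split(r"\n\s*\n", raw.strip()):
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        lines = [ln for ln in lines if not BOILERPLATE.match(ln)]
        if not lines:
            continue  # the whole block was empty or boilerplate
        if len(lines) == 1 and lines[0].startswith("#"):
            # A single line starting with "#" is treated as a heading.
            blocks.append(Block("heading", lines[0].lstrip("#").strip()))
        elif all(ln.startswith("|") and ln.endswith("|") for ln in lines):
            # Keep table rows on separate lines so the structure survives.
            blocks.append(Block("table", "\n".join(lines)))
        else:
            # Join wrapped lines into one paragraph.
            blocks.append(Block("paragraph", " ".join(lines)))
    return blocks
```

Keeping the block type alongside the text lets a later chunking step treat each structure differently, for example by keeping a table whole or by starting a new chunk at each heading.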

