Create a Document Schema Configuration Using Auto-Extraction
Configure a document schema to represent data extracted from your source files into a Data Lake Object (DLO). Use the auto-extraction option in the document schema builder to have an LLM define the structure of the document schema. The builder auto-populates all fields, tables, and columns.
Alternatively, you can create Document AI configurations by using the Data Cloud Connect API. See the Data Cloud Connect API reference for details.
Before you begin, ensure that you’ve followed the unstructured data workflow and
have created an unstructured data model object (UDMO) to reference your source
files.
Before getting started, turn on Einstein to use generative AI features in Document AI. After you turn on Einstein, allow a few minutes for Einstein and Data Cloud to sync.
1. As a system admin, from Setup, in the Quick Find box, enter Einstein Setup, and then select Einstein Setup.
   Note: If you can’t find Einstein Setup, ensure that your org meets the prerequisites for any generative AI features you plan to use. For more support, contact your Salesforce Account Executive (AE).
2. Enable Turn on Einstein.
Create an Output DLO
Configure an output DLO to represent the schema definition of the data extracted from your
source file.
1. From the App Launcher, select Data Cloud.
2. Click Process Content | Document AI | New.
3. Click From a Source Object.
4. Select a UDMO that contains documents for processing, and click Next.
5. Use the toggles to select which file types are available for your document schema configuration, and click Next.
6. Select which LLM you want to use, and click Next.
7. On the Configure a Data Lake Object page, click Add.
8. Click From Scratch, and in the New Data Lake Object dialog, supply values for the DLO Name and API Name, and then click Next.
Auto-extract the Document Schema
Use the document schema builder to auto-extract the document schema, including fields, tables,
and columns.
1. In the document schema builder, click Using auto-extraction, and then click Next.
   Document AI auto-extracts all data from the source and auto-populates a document schema.
2. Click Test and verify the results of the extraction.
3. Click Save.
You can view the document schema configuration on the details page.
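To make the terms fields, tables, and columns concrete, it can help to picture a document schema as a small nested structure. The Python sketch below is illustrative only. The class names, the invoice example, and the missing_fields helper are hypothetical. They aren't part of Document AI or the Data Cloud Connect API, and they don't reflect the format that the builder actually stores.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """One column of a table in the schema (hypothetical model)."""
    name: str
    data_type: str = "Text"


@dataclass
class Table:
    """A repeating group of rows, such as invoice line items."""
    name: str
    columns: list[Column] = field(default_factory=list)


@dataclass
class DocumentSchema:
    """Top-level fields plus any tables found in the document."""
    name: str
    fields: list[str] = field(default_factory=list)
    tables: list[Table] = field(default_factory=list)


def missing_fields(schema: DocumentSchema, extracted: dict) -> list[str]:
    """Return the schema fields that are absent from an extraction result."""
    return [name for name in schema.fields if name not in extracted]


# Hypothetical schema, similar in shape to what auto-extraction
# might propose for an invoice.
invoice_schema = DocumentSchema(
    name="Invoice",
    fields=["invoice_number", "invoice_date", "total_amount"],
    tables=[
        Table(
            name="line_items",
            columns=[
                Column("description"),
                Column("quantity", "Number"),
                Column("unit_price", "Number"),
            ],
        )
    ],
)

# Hypothetical test result in which one field was not extracted.
sample_result = {"invoice_number": "INV-001", "total_amount": "120.00"}

print(missing_fields(invoice_schema, sample_result))  # ['invoice_date']
```

A check like this mirrors what you do by eye when you click Test: compare the extracted values against the schema and look for fields or columns that came back empty.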