Extracting Data from Digital and Scanned Documents

Extract structured data from invoices, purchase orders, forms, and other unstructured or semi-structured documents using AI-powered analysis in Flow. Automate data entry and processing workflows by converting document content into structured information that can be used in your business processes. Use human review workflows to validate extracted data and ensure accuracy for critical business decisions.

Required Editions

Available in: Lightning Experience

View supported editions.

This feature requires the MuleSoft for Flow: IDP add-on. Professional Edition requires the API access add-on. To purchase, contact your Salesforce account executive.

Document processing features require Einstein generative AI turned on in Setup, and Data 360 provisioned and enabled for your org.

MuleSoft for Flow: IDP features used with Agentforce require the Foundations or Agentforce 1 edition. To purchase these editions, contact your Salesforce account executive.

How Flow Extracts Data from Documents

The document processing pipeline consists of these steps:

Configuration: Create document processing configurations in the Automation App to define the extraction rules and output structure for each type of document you want to process. These configurations specify which fields to extract, their data types, and optional instructions to help Einstein understand the document structure.
Processing: Use the Extract Data from Document action in your flows to submit documents for analysis. The action applies the specified document processing configuration and returns a dynamic Apex class containing all the extracted data. You can use that output in subsequent flow elements or send it for human review if you have a review framework in place.
Review: Implement review workflows using screen flows to validate extracted data and handle cases where the confidence score is low or manual verification is required. Review helps maintain data quality and provides human oversight for critical business processes.

Document Processing Configurations

A Document Processing Configuration defines the specific fields, tables, and columns to extract from documents. You can also provide human-language instructions to help Einstein better understand the document structure.

For example, you can define fields such as Invoice Number and include instructions such as "Extract the invoice number from the field next to the vendor name in the document."
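
To make the shape of a configuration more concrete, here is a minimal Python sketch that models one as plain data. The class names (FieldSpec, TableSpec, DocumentProcessingConfig), the attribute names, the data type labels, and the Line Items table are hypothetical illustrations, not the schema Salesforce uses. Actual configurations are created in the Automation App.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """One field to extract (hypothetical model, not a Salesforce schema)."""
    name: str
    data_type: str          # assumed labels such as "text", "number", "date"
    instructions: str = ""  # optional human-language hint for the model


@dataclass
class TableSpec:
    """A table to extract, described by its columns (hypothetical)."""
    name: str
    columns: list[FieldSpec] = field(default_factory=list)


@dataclass
class DocumentProcessingConfig:
    """Fields and tables to extract for one document type (hypothetical)."""
    document_type: str
    fields: list[FieldSpec] = field(default_factory=list)
    tables: list[TableSpec] = field(default_factory=list)

    def field_names(self) -> list[str]:
        """Return the names of the top-level fields in definition order."""
        return [f.name for f in self.fields]


invoice_config = DocumentProcessingConfig(
    document_type="Invoice",
    fields=[
        FieldSpec(
            name="Invoice Number",
            data_type="text",
            instructions=(
                "Extract the invoice number from the field next to "
                "the vendor name in the document."
            ),
        ),
        FieldSpec(name="Due Date", data_type="date"),
    ],
    tables=[
        TableSpec(
            name="Line Items",  # illustrative table, not from the article
            columns=[
                FieldSpec(name="Description", data_type="text"),
                FieldSpec(name="Amount", data_type="number"),
            ],
        )
    ],
)
```

Keeping the instruction text next to the field it describes mirrors how the configuration pairs each field with an optional hint.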

Confidence Scores and Data Quality

Einstein assigns a confidence score to each extracted field to indicate how certain it is that the value is accurate. Higher scores indicate more reliable extraction; lower scores suggest that the field needs manual review.

Configure review workflows in screen flows to route documents for human review when confidence scores fall below your specified thresholds.
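
The threshold-based routing described above can be sketched in a few lines of Python. This is an illustration of the decision logic only, not how Flow implements it. It assumes confidence scores are floats between 0 and 1, which the article does not state, and the function names, field names, and threshold values are hypothetical.

```python
def fields_needing_review(extracted, thresholds, default_threshold=0.8):
    """Return names of fields whose confidence is below their threshold.

    extracted: dict mapping field name -> (value, confidence), where
        confidence is assumed to be a float between 0 and 1.
    thresholds: dict mapping field name -> minimum acceptable confidence.
        Fields without an entry use default_threshold.
    """
    flagged = []
    for name, (_value, confidence) in extracted.items():
        if confidence < thresholds.get(name, default_threshold):
            flagged.append(name)
    return flagged


def route(extracted, thresholds, default_threshold=0.8):
    """Return a (decision, flagged_fields) pair for one document."""
    flagged = fields_needing_review(extracted, thresholds, default_threshold)
    decision = "human_review" if flagged else "auto_process"
    return decision, flagged


# Illustrative extraction result; values and scores are made up.
sample = {
    "Invoice Number": ("INV-1001", 0.97),
    "Amount": ("1250.00", 0.62),
    "Due Date": ("2025-01-31", 0.91),
}

# Require higher confidence for the amount than for other fields.
decision, flagged = route(sample, thresholds={"Amount": 0.9})
print(decision, flagged)  # human_review ['Amount']
```

Per-field thresholds let you demand more certainty for high-impact values, such as amounts, than for descriptive fields. In a flow, the equivalent logic would typically live in a decision element.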

Review Frameworks and Human-in-the-Loop

Review workflows let you validate extracted data before your business processes use it. Use them when:

The confidence score for extracted data is low
Business rules require manual approval
You want to validate data accuracy before making decisions
Compliance requirements mandate human oversight

The review framework consists of an orchestration that manages the overall process and a screen flow that provides the interface for human review.

Human Review Workflow Based on Flow Type

The structure of your review workflow varies depending on the type of flow you use:

Same flow path: The Extract Data from Document action and the Review Extracted Data screen are on the same flow path (for example, in a screen flow or autolaunched flow). Content Document ID and Document Processing Configuration ID auto-populate, and variable passing is minimal. This simpler pattern is used in the tasks and in the example that uses a single flow with file upload. See Review Contract Data and Update Records Example.
Different paths (record-triggered and approval): When the flow is record-triggered (for example, when a file is attached to a record), you cannot add a screen element in the same flow. You use an approval action that calls a flow orchestration, and the orchestration calls the screen flow. Extraction runs in the parent flow; review runs in the screen flow. Because they are on different paths, you must create variables at each stage to pass data (content document ID, extraction output, reviewed data) between the parent flow, orchestration, and screen flow. The user sees the review screen in the Approvals experience. See Process and Review Attached Documents Example.
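
To illustrate why the different-paths pattern needs explicit variables at each stage, the following Python sketch models the handoff as a value passed between three functions. It is a conceptual analogy only: the function names, the ReviewHandoff class, and the sample values are hypothetical, and real flows pass this data through flow variables rather than function calls.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ReviewHandoff:
    """Data carried across stages (names are hypothetical)."""
    content_document_id: str
    extraction_output: dict
    reviewed_data: Optional[dict] = None


def parent_flow(content_document_id: str) -> ReviewHandoff:
    # Stand-in for the Extract Data from Document action; values are fake.
    extraction_output = {"Invoice Number": "INV-1001", "Amount": "1250.00"}
    return ReviewHandoff(content_document_id, extraction_output)


def screen_flow(handoff: ReviewHandoff, corrections: dict) -> ReviewHandoff:
    # Stand-in for a reviewer editing values on the review screen.
    reviewed = {**handoff.extraction_output, **corrections}
    return replace(handoff, reviewed_data=reviewed)


def orchestration(handoff: ReviewHandoff, corrections: dict) -> ReviewHandoff:
    # The orchestration forwards its inputs to the screen flow and
    # returns the reviewed result to the caller.
    return screen_flow(handoff, corrections)


handoff = parent_flow("EXAMPLE-DOC-ID")
result = orchestration(handoff, corrections={"Amount": "1520.00"})
print(result.reviewed_data)
# {'Invoice Number': 'INV-1001', 'Amount': '1520.00'}
```

Each stage receives everything it needs as an input and returns everything the next stage needs as an output, which is the role the variables play when extraction and review run in separate flows.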

Supported Document Types

This capability can process various types of business documents, including:

Invoices and receipts
Purchase orders and contracts
Forms and applications
Reports and statements
Scanned documents and images

The AI models can extract text fields (names, addresses, descriptions), numeric values (amounts, quantities, percentages), dates and timestamps, and structured data (tables, forms) from digital documents, scanned images, and handwritten text.

Examples

Here are specific examples of how organizations use document processing:

Invoice Processing: Extract invoice number, vendor name, amount, and due date from PDF invoices to automatically create or update invoice records in Salesforce.
Purchase Order Management: Process purchase orders to automatically update inventory systems and trigger reorder workflows when stock levels fall below thresholds.
Application Routing: Extract form data from loan applications or job applications to automatically route them to the appropriate teams based on content analysis.
Contract Compliance: Validate contract terms and automatically trigger approval workflows when specific clauses are detected.

Define What Data to Extract from Your Documents
Document Processing Configurations define the structure and rules for extracting data from your documents. By creating these configurations, you define what information to look for and how to organize the extracted data. This step is essential before you can process documents in your flows or set up review workflows.
Set Up Document Extraction and Routing to Human Review
Add the Extract Data from Document action to your flow, configure the document and configuration inputs, store the output, and add a decision that routes to human review when your conditions are met.
Build the Review Interface for Extracted Document Data
Create a screen flow with the Review Extracted Data component so reviewers can validate and modify extracted data before the flow continues.
Route Reviewed Data to Records and Business Processes
Call the screen flow from the orchestration, branch on review outcomes, and add actions to update records, send notifications, or trigger other processes with the reviewed data.
Example Document Processing Workflows
Walk through end-to-end examples that combine document extraction with human review, from a single flow path to record-triggered flows with approval and orchestration.
Document Processing Limits and Quotas
Limits for document processing in flows align with Data 360 where applicable. Prompt limits and other document processing-specific quotas are documented in the MuleSoft document processing documentation.
