When grounding an Agentforce Service Agent in website content, there are multiple architectural options to consider. This article summarises the available options and outlines key considerations.
Before you start: Regardless of approach, website content hygiene is a prerequisite. Outdated, duplicate, or contradictory content on the source site will directly degrade agent accuracy.
Option 1 - Data 360 Web Content Connector - Sitemap
How it works: Uses an XML sitemap as the crawl manifest. This is a predictable and efficient way to index exactly the URLs you specify.
When to use: Website has (or can build and maintain) a valid XML sitemap.
Setup: Web Content (Sitemap) Connector
Key considerations:
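To make the "sitemap as crawl manifest" idea concrete, here is a minimal Python sketch that lists the URLs a sitemap exposes. It is illustrative only and does not describe how the Web Content (Sitemap) Connector is implemented; the sample sitemap, the example.com URLs, and the `list_sitemap_urls` helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; a real one would be fetched from the website.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/help/returns</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/help/shipping</loc>
  </url>
</urlset>
"""


def list_sitemap_urls(xml_text):
    """Return (url, lastmod) pairs in document order; lastmod may be None."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url_el.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:
            entries.append((loc, lastmod))
    return entries


for url, lastmod in list_sitemap_urls(SAMPLE_SITEMAP):
    print(url, lastmod)
```

A check along these lines can help confirm that the sitemap lists exactly the pages you want indexed before the connector is configured.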
Option 2 - Data 360 Web Content Connector - Crawler
How it works: Starts from a base URL and automatically follows the links on each page, indexing the content it reaches. No sitemap is required.
When to use: Site is mostly static HTML and is structured with clear internal linking.
Setup: Web Content (Crawler) Connector
Key considerations:
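The link-following behaviour described for the crawler approach can be sketched as a breadth-first traversal. This is a conceptual illustration, not the Web Content (Crawler) Connector's actual logic; the `crawl` function, the `fetch` callback, the `max_pages` limit, and the in-memory `FAKE_SITE` are all hypothetical and exist only so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags present in the served HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(base_url, fetch, max_pages=50):
    """Breadth-first crawl that stays on the base URL's host."""
    base_host = urlparse(base_url).netloc
    seen = {base_url}
    queue = deque([base_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == base_host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical site held in memory instead of being fetched over HTTP.
FAKE_SITE = {
    "https://www.example.com/": '<a href="/help">Help</a> <a href="https://other.example.org/">Partner</a>',
    "https://www.example.com/help": '<a href="/help/returns#policy">Returns</a> <a href="/">Home</a>',
    "https://www.example.com/help/returns": "<p>Return policy text</p>",
}

print(crawl("https://www.example.com/", FAKE_SITE.get))
```

Because this sketch only sees links present in the HTML it is given, a page that is not linked from anywhere in that HTML is never reached, which is one reason clear internal linking matters for this approach.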
Option 3 - Web Search (Data Library or Search the Web Action)
How it works: Configured via Agentforce Data Library in Setup. Connects to third-party search providers (BrightData, OpenAI, You.com) to query live or recently indexed web content. Use domain filtering to restrict results to the domains you want.
When to use: Website content changes frequently and crawling is not viable; content is already well-indexed publicly.
Setup: Use Web Search; Search the Web Standard Action
Key considerations:
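Domain filtering is configured in the product rather than written in code, but the underlying idea can be illustrated with a short sketch. The `filter_results` and `is_allowed` helpers, the result dictionaries, and the domains below are hypothetical and do not reflect the format any search provider actually returns.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def filter_results(results, allowed_domains):
    """Keep only the search results whose URL falls under an allowed domain."""
    allowed = [d.lower() for d in allowed_domains]
    return [r for r in results if is_allowed(r["url"], allowed)]


# Hypothetical search results for illustration.
results = [
    {"title": "Returns", "url": "https://www.example.com/help/returns"},
    {"title": "Forum thread", "url": "https://forum.unrelated.org/t/123"},
    {"title": "Lookalike", "url": "https://notexample.com/help"},
    {"title": "Docs", "url": "https://docs.example.com/setup"},
]

for r in filter_results(results, ["example.com"]):
    print(r["url"])
```

Matching on the full host name or a dot-prefixed suffix, rather than on a plain substring, keeps lookalike domains such as notexample.com out of the results.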
Option 4 - CMS API → Middleware → Data 360 Ingestion API
How it works: Middleware (e.g., MuleSoft) pulls content from the website's Content Management System (CMS) API, transforms it, and ingests it into Data 360 via the Ingestion API. The agent is then grounded via a Data 360 retriever.
When to use: Maximum accuracy and freshness required; CMS API is accessible; development resources are available.
Setup: Data 360 Ingestion API; Steps to Connect Data 360 with Mulesoft
Key considerations:
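The transformation step in the middleware approach might look roughly like the following sketch, which flattens a CMS item into a plain-text record. Every field name here (`body_html`, `updated_at`, `content_hash`, and so on) is hypothetical: real CMS payloads differ, and the output fields would need to match whatever schema is configured for the Ingestion API source. The call to the Ingestion API itself is deliberately left out.

```python
import hashlib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Strip tags and collapse whitespace. Inline tags can leave stray spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join(" ".join(parser.parts).split())


def to_ingestion_record(cms_item):
    """Map a hypothetical CMS item to a flat record ready for ingestion."""
    body = html_to_text(cms_item["body_html"])
    return {
        "id": cms_item["id"],
        "title": cms_item["title"].strip(),
        "url": cms_item["url"],
        "body": body,
        "last_modified": cms_item["updated_at"],
        # A hash of the body lets the middleware skip unchanged content.
        "content_hash": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


# Hypothetical CMS item for illustration.
cms_item = {
    "id": "article-42",
    "title": " Returns policy ",
    "url": "https://www.example.com/help/returns",
    "body_html": "<h1>Returns</h1>\n<p>Items can be returned within 30 days.</p>",
    "updated_at": "2025-01-15T09:30:00Z",
}

print(to_ingestion_record(cms_item)["body"])
```

Storing a content hash alongside each record is one possible way to avoid re-ingesting pages that have not changed between runs.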
The table below compares the four options. When selecting an approach, also consider retrieval latency, maintenance effort, credit consumption, support for content rendered by client-side JavaScript, and the limitations documented for each connector.
| Option | Setup Effort | JavaScript Support | Content Freshness | Accuracy Control |
| --- | --- | --- | --- | --- |
| 1 - Data 360 Web Content Connector - Sitemap | Medium | No | On-crawl | High |
| 2 - Data 360 Web Content Connector - Crawler | Low | No | On-crawl | Medium |
| 3 - Web Search (Data Library or Search the Web Action) | Low | Yes | Cached or near-live (varies by provider) | Medium |
| 4 - CMS API → Middleware → Data 360 Ingestion API | High | Yes | Configurable | High |
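
As a rough aid for working through the comparison, the rows above can be encoded as data and filtered by requirement. The `shortlist` helper and its parameters are hypothetical; the values are copied from the table, and the other factors mentioned (latency, credits, connector limitations) are not modelled.

```python
# Values copied from the comparison table; shortened option labels.
OPTIONS = [
    {"option": "1 - Sitemap connector", "setup_effort": "Medium",
     "javascript": False, "accuracy_control": "High"},
    {"option": "2 - Crawler connector", "setup_effort": "Low",
     "javascript": False, "accuracy_control": "Medium"},
    {"option": "3 - Web Search", "setup_effort": "Low",
     "javascript": True, "accuracy_control": "Medium"},
    {"option": "4 - CMS API via middleware", "setup_effort": "High",
     "javascript": True, "accuracy_control": "High"},
]

EFFORT_RANK = {"Low": 0, "Medium": 1, "High": 2}


def shortlist(needs_javascript=False, max_setup_effort="High"):
    """Return the option labels that satisfy the stated requirements."""
    limit = EFFORT_RANK[max_setup_effort]
    return [
        o["option"]
        for o in OPTIONS
        if (o["javascript"] or not needs_javascript)
        and EFFORT_RANK[o["setup_effort"]] <= limit
    ]


print(shortlist(needs_javascript=True))
print(shortlist(needs_javascript=True, max_setup_effort="Low"))
```

A shortlist like this is only a starting point; the final choice still depends on the considerations listed for each option.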
