
Grounding Agentforce Agents in Website Content: Options & Key Considerations

Publish Date: May 14, 2026
Description

When grounding an Agentforce Service Agent in website content, there are multiple architectural options to consider. This article summarises the available options and outlines key considerations.

Before you start:
Regardless of approach, website content hygiene is a prerequisite. Outdated, duplicate, or contradictory content on the source site will directly degrade agent accuracy.

Resolution

Option 1: Data 360 Web Content Connector - Sitemap

How it works: Uses an XML sitemap as the crawl manifest, which makes it a predictable and efficient way to index exactly the URLs you specify.
When to use: Website has (or can build and maintain) a valid XML sitemap.
Setup: Web Content (Sitemap) Connector
Key considerations:

  • No JavaScript rendering. The Web Content (Sitemap) connector doesn't support content that is dynamically rendered via JavaScript at runtime, so any content rendered this way isn't ingested. Only content present in the page source, without JavaScript execution, is indexed.
  • Maintenance burden. Someone must keep the sitemap current. For sites managed by a third party, this adds a coordination dependency.
  • More limitations. See Web Content (Sitemap) Connector Limitations.
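
Because the sitemap acts as the crawl manifest, it can help to check it before pointing the connector at it. The following is a minimal sketch, not part of the connector: it assumes a standard sitemaps.org `urlset` document, and the function names and `example.com` URLs are hypothetical. It lists the URLs the sitemap declares and flags duplicates.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(sitemap_xml: str) -> list[dict]:
    """Return one dict per <url> entry with its <loc> and optional <lastmod>."""
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:  # skip entries without a usable location
            entries.append({"loc": loc, "lastmod": lastmod})
    return entries


def duplicate_locs(entries: list[dict]) -> list[str]:
    """Return URLs that appear more than once in the sitemap."""
    counts = Counter(entry["loc"] for entry in entries)
    return sorted(loc for loc, n in counts.items() if n > 1)


# Hypothetical sample sitemap used only for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/help/returns</loc><lastmod>2026-05-01</lastmod></url>
  <url><loc>https://www.example.com/help/shipping</loc></url>
  <url><loc>https://www.example.com/help/returns</loc></url>
</urlset>"""

entries = list_sitemap_urls(SAMPLE)
print(f"{len(entries)} entries, duplicates: {duplicate_locs(entries)}")
```

A check like this could run whenever the sitemap is regenerated, which helps when a third party maintains the site.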

Option 2: Data 360 Web Content Connector - Crawler

How it works: Starts from a base URL and automatically follows the links on each page to index content. No sitemap required.
When to use: Site is mostly static HTML and is structured with clear internal linking.
Setup: Web Content (Crawler) Connector
Key considerations:

  • Maximum crawl depth is 8 levels. Deeper content structures require a sitemap or alternative approach.
  • Missed pages. Pages not reachable via the link graph (e.g., dynamically generated links) will not be crawled.
  • More limitations. See Web Content (Crawler) Connector Limitations.
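
To estimate which pages the depth limit and link graph would leave out, you can compute each page's link distance from the base URL. This is a minimal sketch under stated assumptions: the link graph is supplied as a hypothetical dictionary (for example, exported from a site audit tool), the base URL is counted as depth 0 (the article does not say how the connector counts levels), and the function names are illustrative only.

```python
from collections import deque

MAX_DEPTH = 8  # crawl depth limit stated in this article


def crawl_depths(links: dict[str, list[str]], base_url: str) -> dict[str, int]:
    """Breadth-first search: shortest number of link hops from base_url to each page."""
    depths = {base_url: 0}  # assumption: the base URL itself is depth 0
    queue = deque([base_url])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def at_risk_pages(links, base_url, max_depth=MAX_DEPTH):
    """Return (pages deeper than max_depth, pages with no link path from base_url)."""
    depths = crawl_depths(links, base_url)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    too_deep = {page for page, depth in depths.items() if depth > max_depth}
    unreachable = all_pages - set(depths)
    return too_deep, unreachable


# Hypothetical link graph: a chain of pages /p0 -> /p1 -> ... -> /p10 plus an orphan page.
graph = {f"/p{i}": [f"/p{i + 1}"] for i in range(10)}
graph["/orphan"] = []

too_deep, unreachable = at_risk_pages(graph, "/p0")
print("Too deep:", sorted(too_deep))
print("Unreachable:", sorted(unreachable))
```

Pages reported in either set are candidates for a sitemap-based approach or for additional internal links.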

Option 3: Web Search (Data Library or Search the Web Action)

How it works: Configured via Agentforce Data Library in Setup. Connects to third-party search providers (BrightData, OpenAI, You.com) to query live or recently indexed web content. Use domain filtering to restrict results to the domains you want.
When to use: Website content changes frequently and crawling is not viable; content is already well-indexed publicly.
Setup: Use Web Search; Search the Web Standard Action
Key considerations:

  • Cached and outdated content. Providers return search-engine-indexed content, which may lag behind the live site. Dead links and stale pages are a real risk. Provider freshness varies.
  • Content hygiene dependency. Contradictory or duplicated content on the website will surface in results. The fix is at the source, not in the agent.
  • Variable performance. Results depend on the search provider, so test multiple providers to see which one produces the best results. Retrieval latency may also be higher than when website content is pre-ingested via the Data 360 Web Content connectors.
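
The idea behind domain filtering can be illustrated with a short sketch. This is not how the Data Library implements it, and the function name and `example.com` domains are hypothetical. It shows one reasonable rule: a result is kept only if its host is an allowed domain or a subdomain of one.

```python
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains: set[str]) -> bool:
    """True if the URL's host equals an allowed domain or is a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# Hypothetical search results used only for illustration.
allowed = {"example.com"}
results = [
    "https://help.example.com/returns",
    "https://example.com/shipping",
    "https://example.com.evil.net/returns",  # look-alike host, must be rejected
    "https://notexample.com/returns",        # different domain, must be rejected
]

kept = [url for url in results if is_allowed(url, allowed)]
print(kept)
```

Matching on the parsed host rather than on a substring of the URL avoids accepting look-alike domains.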

Option 4: CMS API → Middleware → Data 360 Ingestion API

How it works: Middleware (e.g., MuleSoft) pulls content from the website's Content Management System (CMS) API, transforms it, and ingests it into Data 360 via the Ingestion API. The agent is then grounded via a Data 360 retriever.
When to use: Maximum accuracy and freshness required; CMS API is accessible; development resources are available.
Setup: Data 360 Ingestion API; Steps to Connect Data 360 with Mulesoft
Key considerations:

  • CMS API must exist and be accessible. Not all CMS platforms include API support. Re-ingestion must be triggered on content change.
  • Highest build and maintenance cost. Requires middleware development, ingestion pipeline maintenance, and schema management.
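
Since re-ingestion must be triggered on content change, the middleware needs some way to detect which pages changed. The following is a minimal sketch of one approach, content hashing, and is not a MuleSoft or Data 360 implementation. The CMS page fields (`id`, `title`, `body`, `url`), the function names, and the record layout are all hypothetical; the actual payload must match the schema defined for your Ingestion API source.

```python
import hashlib
import json


def content_hash(page: dict) -> str:
    """Stable SHA-256 digest of the fields that matter for grounding."""
    canonical = json.dumps(
        {"title": page["title"], "body": page["body"]}, sort_keys=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def select_changed(pages: list[dict], previous_hashes: dict[str, str]):
    """Return (pages whose content is new or changed, updated hash map)."""
    changed, new_hashes = [], {}
    for page in pages:
        digest = content_hash(page)
        new_hashes[page["id"]] = digest
        if previous_hashes.get(page["id"]) != digest:
            changed.append(page)
    return changed, new_hashes


def to_records(pages: list[dict]) -> list[dict]:
    """Map CMS pages to flat records (hypothetical layout, adapt to your schema)."""
    return [
        {"id": p["id"], "title": p["title"], "body": p["body"], "url": p["url"]}
        for p in pages
    ]


# Hypothetical CMS pages: one unchanged since the last run, one edited.
pages = [
    {"id": "1", "title": "Returns", "body": "30-day returns.", "url": "/returns"},
    {"id": "2", "title": "Shipping", "body": "Ships in 2 days.", "url": "/shipping"},
]
previous = {"1": content_hash(pages[0]), "2": "stale-digest"}

changed, current = select_changed(pages, previous)
print(json.dumps(to_records(changed), indent=2))
```

Only the changed records would then be sent for ingestion, and the updated hash map stored for the next run.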

Here is an overview of the different options. When selecting an approach, consider additional factors such as retrieval latency, maintenance effort, credit consumption, support for content rendered by client-side JavaScript, and the limitations documented for each connector.

Option | Setup Effort | JavaScript Support | Content Freshness | Accuracy Control
1 - Data 360 Web Content Connector - Sitemap | Medium | No | On-crawl | High
2 - Data 360 Web Content Connector - Crawler | Low | No | On-crawl | Medium
3 - Web Search (Data Library or Search the Web Action) | Low | Yes | Cached or near-live (varies by provider) | Medium
4 - CMS API → Middleware → Data 360 Ingestion API | High | Yes | Configurable | High

 

Knowledge Article Number

005336080

 