Generate a Robots.txt File with Business Manager for B2C Commerce
To create a robots.txt for one or more sites individually, use Business Manager. This robots.txt file is served to any requesting crawlers from the application server. It's stored as a site preference and can be replicated from one instance to another.
Required Editions
Available in: B2C Commerce
In addition to traditional search engine crawlers such as Googlebot, AI-powered crawlers now request your robots.txt file. These include bots used for large-language-model training (GPTBot, ClaudeBot), AI-powered search results (OAI-SearchBot, PerplexityBot, Claude-SearchBot), and user-initiated AI queries (ChatGPT-User, Claude-User, Perplexity-User).
When you select Custom robots.txt definition, the sample text includes
User-agent entries for these crawlers. Major crawlers honor robots.txt
directives, though there can be minor differences in how specific directives are interpreted.
For the most current list of AI crawler user-agent tokens, see the documentation published by
each provider.
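To see how per-crawler records interact with the default record, you can experiment locally before you edit the file in Business Manager. The following sketch uses Python's standard urllib.robotparser module. The rules, the paths, and the version suffixes in the user-agent strings are illustrative assumptions, and Python's parser only approximates how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: block one AI training crawler entirely, and keep
# every other crawler out of the cart pages.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The version suffix after "/" is ignored and names are compared
# case-insensitively, so "gptbot/1.0" matches the GPTBot record.
print(parser.can_fetch("gptbot/1.0", "/products/shoe.html"))     # False
print(parser.can_fetch("Googlebot/2.1", "/products/shoe.html"))  # True
print(parser.can_fetch("Googlebot/2.1", "/cart/"))               # False
```

Googlebot has no record of its own in this example, so it falls through to the "*" record.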
- You can write up to 50,000 characters to this file in Business Manager.
- URIs are case-sensitive, and the "/robots.txt" string must be all lowercase.
- Blank lines aren’t permitted within a single record in the "robots.txt" file.
- There must be exactly one User-agent field per record. The robot should be liberal in interpreting this field.
- A case-insensitive substring match of the name without version information is recommended.
- If the value is "*", the record describes the default access policy for any robot that hasn’t matched any of the other records.
- Multiple such records aren't allowed in the "/robots.txt" file.
- The "Disallow" field specifies a partial URI that isn't to be visited. This can be a full path or a partial path; any URI that starts with this value won’t be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html. An empty value for Disallow indicates that all URIs can be retrieved.
- At least one Disallow field must be present in the robots.txt file.
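The prefix behavior of the Disallow field can be checked locally. This is a minimal sketch that uses Python's standard urllib.robotparser module; the helper name allowed and the crawler name ExampleBot are made up for illustration, and a real crawler's parser can differ in details.

```python
from urllib.robotparser import RobotFileParser


def allowed(disallow_value: str, path: str) -> bool:
    """Return True if `path` can be fetched under a single Disallow rule."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_value}"])
    # "ExampleBot" is a hypothetical crawler that falls under the "*" record.
    return parser.can_fetch("ExampleBot", path)


# "Disallow: /help" blocks every path that starts with /help.
print(allowed("/help", "/help.html"))         # False
print(allowed("/help", "/help/index.html"))   # False

# "Disallow: /help/" blocks only paths under the /help/ directory.
print(allowed("/help/", "/help/index.html"))  # False
print(allowed("/help/", "/help.html"))        # True

# An empty Disallow value permits everything.
print(allowed("", "/help.html"))              # True
```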
- Click App Launcher, and then select Merchant Tools | Site | SEO & Discoverability | Robots.
- Select the instance type for which to create a robots.txt file. To create a robots.txt file for a Production instance, you can define it on a Staging instance and then replicate the site preferences, where the robots.txt file definition is stored, from the Staging instance to the Production instance.
- Select one of these options:
- Use the robots.txt file from a deployed cartridge: Use this option if you want to generate your robots.txt file with Google Search Console or another third-party tool and add the file to a cartridge on your site path. There can be only one robots.txt file per site. This option is most useful if you want to use the same robots.txt file for multiple sites. It isn't recommended, because you usually want different settings for different instance types. For example, you don't want your sandbox or staging sites to be crawled, but you do want your production sites to be crawled. A single file shared by all instance types can cause issues when you replicate code to production. Typically, select this option only before a site goes live, to test the robots.txt file.
- Define an instance type-specific robots.txt (recommended): Use this option to have B2C Commerce generate a robots.txt file for you or specify a custom robots.txt file for each of your instances.
- If you selected Define an instance type-specific robots.txt, select one of these options:
- All spiders are allowed to access any static resources (recommended for Production): Use this if you want your storefront to be crawled and available to external search engines, such as Google. This generates a site-specific robots.txt file that indicates spiders can crawl the static resources for the site.
- All spiders are disallowed to access any static resources (recommended for Staging): Use this if you don't want your storefront to be crawled and available to external search engines, such as Google. This generates a site-specific robots.txt file that indicates to spiders that they shouldn't crawl the static resources for the site.
- Custom robots.txt definition: Use this option if you want to control which parts of your storefront are crawled and available to search engines and AI crawlers. When you select this option, Business Manager populates the text area with a sample robots.txt file. All entries in the sample are commented out. Uncomment and customize the entries for your site.
The following sample text was introduced in B2C Commerce 26.1 and includes user-agent entries for AI crawlers:
################################################################################
## SAMPLE robots.txt - Uncomment and customize as needed for your site
################################################################################
## To allow access to all crawlers, including AI crawlers, use a general rule
# User-agent: *
## To allow or deny access to specific crawlers, enable them individually
# User-agent: Googlebot
# User-agent: OAI-SearchBot
# User-agent: GPTBot
# User-Agent: ChatGPT-User
# User-Agent: ClaudeBot
# User-Agent: Claude-SearchBot
# User-Agent: Claude-User
# User-Agent: Perplexity-User
# User-Agent: PerplexityBot
## Example rules to allow access to content pages, but not customer
## specific or functional pages:
# Allow: /
# Disallow: */account/
# Disallow: */cart/
# Disallow: */login/
# Disallow: */wishlist/
# Disallow: */Login-Show/
# Disallow: */Registration-Shopper/
# Disallow: */Checkout/
# Disallow: */Order/
# Disallow: */Order-History/
# Disallow: */Order-Confirmation/
# Disallow: */Order-Track/
## A Crawl-delay can help throttle server impact. AI bots may not honor
## this, but this is good practice.
# Crawl-delay: 10
## To help search engines find your content, add a pointer to your sitemap.
# Sitemap: https://www.yourdomain.com/sitemap.xml
If your storefront is built on SFRA or SiteGenesis, we recommend also adding the search refinement URL parameter directives shown later in this topic to your custom file.
- Click Apply.
- Select Administration | Sites | Manage Sites | site | Cache.
- In the Static Content and Page Caches section, click Invalidate.
For information on where to upload your robots.txt file, see Upload a Robots.txt File.
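Before you paste a custom definition into Business Manager, a small local check can catch violations of the constraints listed earlier in this topic. The following is a rough sketch, not a Salesforce tool: the function name lint_robots_txt and the wording of its messages are assumptions, and it checks only the character limit, the one User-agent field per record rule, the single "*" record rule, and the presence of at least one Disallow field.

```python
MAX_CHARS = 50_000  # character limit for the file in Business Manager


def lint_robots_txt(text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if len(text) > MAX_CHARS:
        problems.append(f"{len(text)} characters exceeds the {MAX_CHARS} limit")

    wildcard_records = 0
    disallow_fields = 0
    previous_field = None
    for raw_line in text.splitlines():
        if not raw_line.strip():
            previous_field = None  # a blank line ends the current record
            continue
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if previous_field == "user-agent":
                problems.append("record has more than one User-agent field")
            if value == "*":
                wildcard_records += 1
        elif field == "disallow":
            disallow_fields += 1
        previous_field = field

    if wildcard_records > 1:
        problems.append('more than one "User-agent: *" record')
    if disallow_fields == 0:
        problems.append("no Disallow field present")
    return problems


print(lint_robots_txt("User-agent: *\nDisallow: /cart/\n"))  # []
```

Because commented-out lines are ignored, running the check on the unmodified sample text reports that no Disallow field is present.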
If your storefront is built on SFRA or SiteGenesis, we recommend adding the following
lines to your robots.txt. These directives prevent crawlers from indexing filtered or sorted
search result pages, which helps avoid duplicate content and keeps search indexes focused on
your primary product and category pages.
These parameters are specific to the SFRA and SiteGenesis URL format and don't apply
to PWA Kit storefronts, which use SCAPI query parameters such as
refine and sort.
# Search refinement URL parameters (SFRA and SiteGenesis)
Disallow: /*pmin*
Disallow: /*pmax*
Disallow: /*prefn1*
Disallow: /*prefn2*
Disallow: /*prefn3*
Disallow: /*prefn4*
Disallow: /*prefv1*
Disallow: /*prefv2*
Disallow: /*prefv3*
Disallow: /*prefv4*
Disallow: /*srule*
If your storefront is built on PWA Kit (Composable Storefront), we recommend adding the following lines instead. PWA Kit uses SCAPI query parameters (refine, sort, and offset) in its /search and /category/ routes.
# Search refinement, sort, and pagination URL parameters (PWA Kit)
Disallow: /search?*refine*
Disallow: /category/*?*refine*
Disallow: /search?*sort*
Disallow: /category/*?*sort*
Disallow: /search?*offset*
Disallow: /category/*?*offset*
Set the Googlebot crawl rate to Low through Google Search Console, because Google ignores the Crawl-delay directive in robots.txt, as outlined in https://support.google.com/webmasters/answer/48620?hl=en. Most AI crawlers also ignore the Crawl-delay directive. To manage crawl rates for AI bots, consult each provider's documentation.
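To check which storefront URLs the wildcard directives above would cover, you can approximate the matching locally. This sketch assumes the common wildcard semantics in which "*" matches any run of characters and a trailing "$" anchors the end of the URL; the function name rule_matches and the sample URLs are hypothetical, and individual crawlers can interpret patterns differently.

```python
import re


def rule_matches(pattern: str, path_and_query: str) -> bool:
    """Return True if a robots.txt path pattern matches the URL path plus query string."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and let each "*" stand for any run of characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the string, like prefix-style rules.
    return re.match(regex, path_and_query) is not None


# SFRA and SiteGenesis style refinement URL (hypothetical)
print(rule_matches("/*pmin*", "/womens/dresses?pmin=20&pmax=50"))          # True
print(rule_matches("/*pmin*", "/womens/dresses"))                          # False

# PWA Kit style refinement and sort URLs (hypothetical)
print(rule_matches("/search?*refine*", "/search?q=shirt&refine=c_color"))  # True
print(rule_matches("/search?*refine*", "/search?q=shirt"))                 # False
print(rule_matches("/category/*?*sort*", "/category/mens?sort=price"))     # True
```

Under these semantics, a pattern such as /*pmax* matches any URL that contains the text pmax, including a path like /pmax-trainer.html, so it's worth testing your own product and category URLs against each pattern.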

