
Preventing OpenAI's GPTBot from scanning Marketing Cloud Cloudpages

Publish Date: Aug 8, 2023
Description
OpenAI has introduced GPTBot, a web crawler that gathers content used to improve AI models (e.g., ChatGPT) that provide AI-generated answers to questions (or prompts).

User agent token: GPTBot

Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
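
As a minimal sketch of how the user agent token above could be recognized, the following plain JavaScript checks a User-Agent header value for the GPTBot token. The helper name isGptBot is hypothetical and not part of any Salesforce or OpenAI API, and how the header value is obtained is left as an assumption.

```javascript
// Sketch only: isGptBot is a hypothetical helper, not a Salesforce or OpenAI API.
// It reports whether a User-Agent header value contains the GPTBot token.
function isGptBot(userAgent) {
  if (typeof userAgent !== "string") {
    return false;
  }
  // Match the token case-insensitively; the version after "/" may change.
  return userAgent.toLowerCase().indexOf("gptbot") !== -1;
}

// The full user-agent string quoted in this article.
var gptBotAgent = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)";
// An ordinary browser-style string, used here only for contrast.
var browserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

console.log(isGptBot(gptBotAgent));
console.log(isGptBot(browserAgent));
```

A user-agent string can be spoofed by any client, so a check like this is best treated as a hint rather than proof of who is calling.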

For OpenAI's web browsing plugin, calls to websites will be made from the 23.98.142.176/28 IP address block.
*OpenAI may add more address blocks over time.
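
To illustrate what the /28 block covers, here is a small sketch in plain JavaScript that tests whether an IPv4 address falls inside a CIDR block. The function names ipv4ToInt and isInCidr are hypothetical helpers written for this example. A /28 prefix leaves 4 host bits, so the block spans 16 addresses, 23.98.142.176 through 23.98.142.191.

```javascript
// Sketch only: ipv4ToInt and isInCidr are hypothetical helpers for illustration.

// Convert a dotted-quad IPv4 string to a number, or return null if it is malformed.
function ipv4ToInt(ip) {
  var parts = String(ip).split(".");
  if (parts.length !== 4) {
    return null;
  }
  var value = 0;
  for (var i = 0; i < 4; i++) {
    if (!/^\d{1,3}$/.test(parts[i])) {
      return null;
    }
    var octet = parseInt(parts[i], 10);
    if (octet > 255) {
      return null;
    }
    // Plain arithmetic avoids the signed 32-bit results of bitwise operators.
    value = value * 256 + octet;
  }
  return value;
}

// Return true when ip lies inside the block written as "a.b.c.d/prefix".
function isInCidr(ip, cidr) {
  var pieces = String(cidr).split("/");
  var base = ipv4ToInt(pieces[0]);
  var prefix = parseInt(pieces[1], 10);
  var addr = ipv4ToInt(ip);
  if (base === null || addr === null || isNaN(prefix) || prefix < 0 || prefix > 32) {
    return false;
  }
  // Two addresses share a block when they agree on everything above the host bits.
  var blockSize = Math.pow(2, 32 - prefix);
  return Math.floor(addr / blockSize) === Math.floor(base / blockSize);
}

var pluginBlock = "23.98.142.176/28";
console.log(isInCidr("23.98.142.180", pluginBlock));
console.log(isInCidr("23.98.142.192", pluginBlock));
```

Because more blocks may be added, a hard-coded range like this would need to be kept in sync with whatever OpenAI currently publishes.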

At this time, Marketing Cloud has SEO settings that prevent search engines from indexing or crawling a CloudPage.

To find these SEO settings, open the CloudPage, go to Page Properties in the top right corner, click the Settings icon, click Advanced Settings, and then select SEO. There:
  1. Check "Do Not Allow search engines to view this page."
  2. Check "Do not Allow search engines to follow links on this page."

In addition, if you do not want OpenAI to use your content in any way, you can prevent GPTBot from accessing your CloudPages by using robots.txt. This is the same protocol you would use to block Googlebot, Bingbot, or other web crawlers.
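
As a rough sketch of what such a robots.txt rule looks like, the plain JavaScript below builds the file body that disallows one or more crawlers from the whole site. The function name buildRobotsTxt is a hypothetical helper. A robots.txt file is read from the /robots.txt path at the root of the host. How that file would be served for your CloudPages domain is not covered here and is treated as an assumption.

```javascript
// Sketch only: buildRobotsTxt is a hypothetical helper for illustration.
// It returns robots.txt text that disallows every path for each listed crawler token.
function buildRobotsTxt(userAgentTokens) {
  var lines = [];
  for (var i = 0; i < userAgentTokens.length; i++) {
    lines.push("User-agent: " + userAgentTokens[i]);
    lines.push("Disallow: /");
    // A blank line separates one crawler's group of rules from the next.
    lines.push("");
  }
  return lines.join("\n");
}

// For GPTBot alone this produces the two lines:
//   User-agent: GPTBot
//   Disallow: /
var robotsTxt = buildRobotsTxt(["GPTBot"]);
console.log(robotsTxt);
```

robots.txt is advisory: it only works for crawlers that choose to honor it.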

Please note that this information is an example and any actual implementation would need to be completed by the customer. Marketing Cloud Support cannot assist with implementing or troubleshooting this portion of your code.

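Response headers can be set on a page-by-page basis in CloudPages with the SSJS HTTPHeader.SetValue function. For example:
<script runat="server">
  // Load the Core library (version 1).
  Platform.Load("Core","1");
  // Allow this page to be framed only by pages from the same origin.
  HTTPHeader.SetValue("X-Frame-Options","SAMEORIGIN");
  // Tell browsers not to MIME-sniff the response content type.
  HTTPHeader.SetValue("X-Content-Type-Options", "nosniff");
  // Ask compliant crawlers not to index this page.
  HTTPHeader.SetValue("X-Robots-Tag", "noindex");
  // Restrict page resources to those loaded over HTTPS.
  HTTPHeader.SetValue("Content-Security-Policy", "default-src https:");
  // Ask browsers to use HTTPS for this host for the next 10 seconds.
  HTTPHeader.SetValue("Strict-Transport-Security", "max-age=10");
  // Custom headers can be set the same way.
  HTTPHeader.SetValue("X-Random-Option-I-Made","HelloWorld");
</script>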


Knowledge Article Number

000396055

 