robots.txt, ChatGPT, and writefreely
With a robots.txt file, you can block bots, such as search engine crawlers, from accessing parts or the entirety of your website. While you may want to let the usual search engines in, so that others can find your site, you may want to block the crawlers behind Large Language Models (LLMs) such as ChatGPT from gobbling up your content.
Here’s what you should add to your robots.txt file to stop ChatGPT, specifically:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
The robots.txt file is just a plain text file that goes into the root directory of your web server, next to where an index.html file or similar for the homepage would typically reside (if you're not familiar with the file, Wikipedia has you covered).
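By the way, any crawler you don't mention in robots.txt is allowed by default, so the two rules above are all it takes. If you'd rather spell that out, a complete robots.txt could look like this (an empty Disallow: means nothing is off limits):

# Block OpenAI's crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

# Everyone else, search engines included, may crawl everything
User-agent: *
Disallow: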
writefreely and the robots.txt
That is, unless you are running writefreely (as the site you are currently reading does). With the default setup, writefreely handles all incoming requests, including the one for the robots.txt file. And since the writefreely software doesn't know anything about robots.txt, it responds with an error.
Assuming your writefreely blog is at https://example.com, typing https://example.com/robots.txt into a browser's address bar should bring up the robots.txt file. If, instead, you see an error message:
404 page not found
… then your site needs the following addition to the nginx configuration file for your blog:
location /robots.txt {
    alias /home/www/writefreely/robots.txt;
}
You should already have a location directive for writefreely's CSS and images; put the new directive right below that section. Make sure to include the full local path (in the file system on your server) behind the alias. If you put your robots.txt next to the writefreely binary, you can copy the part of the path before "static" from the existing location directive.
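To give you an idea, the relevant part of the configuration might then look like this. The first directive is only a sketch of what a typical writefreely setup uses for its static files (your paths may well differ), so keep your own version and just add the second one:

# Existing directive for writefreely's static assets (paths may differ)
location ~ ^/(css|img|js|fonts)/ {
    root /home/www/writefreely/static;
}

# New directive: serve robots.txt from the writefreely directory
location /robots.txt {
    alias /home/www/writefreely/robots.txt;
}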
Then all you have to do is tell nginx to load the new configuration:
systemctl reload nginx
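If you want to double-check, nginx -t tests the configuration for syntax errors, and fetching the file verifies that nginx now serves it:

nginx -t
curl https://example.com/robots.txt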
If that prints your rules instead of 404 page not found, it’s goodbye to ChatGPT gobbling up your writing.