A robots.txt file is a plain text file that tells search engine crawlers which URLs on your site they can access. It is used mainly to keep crawlers away from unimportant pages so they don't overwhelm your server with requests.
Note: An incorrectly written robots.txt file can cause search engines to fail to crawl your site entirely, so make sure your casing and spelling are correct.
How to create a robots.txt file
File format and rules
- It must be a plain text file named robots.txt
- Each site should have only one robots.txt file
- The file must be located at the root of the website, e.g. https://nerdlify.com/robots.txt
- Subdomains can also have their own robots.txt file
- The file must be UTF-8 encoded
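A quick way to verify that a file is reachable at the root and decodes cleanly is to fetch it directly. Here is a minimal sketch in Python using the standard library; the nerdlify.com URL is just the example domain from above:

```python
from urllib.request import urlopen

# Fetch robots.txt from the site root and decode it as UTF-8,
# the encoding the file is required to use.
with urlopen("https://nerdlify.com/robots.txt") as response:
    print(response.read().decode("utf-8"))
```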
Adding rules to the robots.txt file
Robots.txt rules give crawlers instructions about which parts of a website they can and cannot crawl.
The file consists of one or more groups, and each group has its own rules, one directive per line. Each group starts with a User-agent line that specifies which crawler the group targets.
Examples of User-agent groups:
```
# Block only Googlebot
User-agent: Googlebot
Disallow: /

# Block Googlebot and Adsbot
User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /

# Block all crawlers except AdsBot (AdsBot crawlers must be named explicitly)
User-agent: *
Disallow: /
```
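One way to sanity-check how group matching works is Python's standard urllib.robotparser module. The sketch below feeds it the first example group and asks whether two crawlers may fetch a page; the example.com URL is a placeholder, and note that robotparser implements the generic standard, not Google-specific behavior like the AdsBot exception:

```python
from urllib.robotparser import RobotFileParser

# Rules from the "Block only Googlebot" example above.
rules = """
User-agent: Googlebot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Googlebot matches the group, so it is blocked from every path.
print(parser.can_fetch("Googlebot", "https://example.com/page"))  # False
# Other crawlers match no group and are allowed by default.
print(parser.can_fetch("Bingbot", "https://example.com/page"))    # True
```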
Robots.txt in Django
Go into your main app's templates directory and create a robots.txt file (e.g. main/templates/main/robots.txt, matching the template name used below):
```
User-agent: *
Disallow: /?p=*
Disallow: /search/?q=

Sitemap: https://example.com/sitemap.xml
```
Now we can hook the robots.txt template up to a URL via Django's TemplateView in our urls.py:
```python
from django.urls import path
from django.views.generic.base import TemplateView

urlpatterns = [
    # ... other URL patterns ...
    path(
        "robots.txt",
        TemplateView.as_view(
            template_name="main/robots.txt", content_type="text/plain"
        ),
    ),
]
```
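As a quick check that the view is wired up correctly, you can assert on the response with Django's test client. A minimal sketch, assuming the URL pattern above is included in your project:

```python
from django.test import TestCase

class RobotsTxtTests(TestCase):
    def test_robots_txt_is_served_as_plain_text(self):
        response = self.client.get("/robots.txt")
        # The TemplateView should render our template with a 200 status
        # and the text/plain content type we configured.
        self.assertEqual(response.status_code, 200)
        self.assertTrue(response["Content-Type"].startswith("text/plain"))
        self.assertContains(response, "User-agent: *")
```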
Now if you visit yourdomain.com/robots.txt, you will see the file displayed. Don't forget to replace the example Sitemap URL and Disallow rules with your own.