How to Set Up a Custom Robots.txt File for Better SEO

Creating a custom robots.txt file can offer real benefits for your website's Search Engine Optimization (SEO). The file tells search engine crawlers which parts of your site they may and may not crawl, giving you better control over how your content is discovered in search results. In this post, we will walk you through the process of setting up a custom robots.txt file.

Prerequisites:

  1. Basic knowledge of HTML/CSS/JavaScript or your Content Management System (CMS).
  2. FTP or SSH access to the server where your website is hosted.
  3. Text editor or Integrated Development Environment (IDE) of your choice.

Creating a Custom robots.txt File:

  1. Plan and research: Begin by identifying which pages on your site you want to block from search engines. Consult your team or review Google Search Console to identify duplicate, outdated, or low-quality content that can be removed or blocked from crawling.
  2. Create a new file: Use a plain text editor (like Notepad, Sublime Text, Atom, or Visual Studio Code) to create a new file named robots.txt. Save the file with a .txt extension in the root directory of your website.
  3. Define user-agent rules: Each group of rules begins with a User-agent line naming the crawler the rules apply to, such as Googlebot, Bingbot, or DuckDuckBot. To apply the same rules to every crawler, use an asterisk (*) as the user-agent (a combined example follows this list). For example:
User-agent: Googlebot
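
Putting the pieces together, a minimal robots.txt might look like the following sketch; the /admin/ path is only a placeholder for whatever you decide to block:

User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /old_page.html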

Setting Up Rules

There are several types of rules you can define for a user agent in your robots.txt file:

Allowing access to specific pages (Allow)

Use the Allow: directive to allow search engine crawlers access to specific folders, files, or URLs. For example:

User-agent: Googlebot
Allow: /folder1/
Allow: /folder2/index.html
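
In practice, Allow is most useful for making an exception to a broader Disallow rule, re-opening a single file inside an otherwise blocked folder. In this sketch the folder and file names are placeholders:

User-agent: Googlebot
Disallow: /private/
Allow: /private/press-release.html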

Disallowing access to pages (Disallow)

Use the Disallow: directive to block search engine crawlers from accessing specific folders, files, or URLs. For example:

User-agent: Googlebot
Disallow: /folder1/subfolder/
Disallow: /old_page.html
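
Be careful with the shortest possible pattern: a lone slash blocks the entire site for that crawler.

User-agent: Googlebot
Disallow: /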

Using wildcards (*) and subdirectories (/)

The asterisk (*) is a wildcard that matches any sequence of characters, so a single rule can cover a whole group of URLs, while a path consisting of just a slash (/) matches every URL on the site. In the following example, everything under /images/ and any URL inside a subdirectory is blocked, while Allow: / explicitly permits anything not matched by a more specific Disallow rule:

User-agent: Googlebot
Disallow: /images/
Disallow: /*/
Allow: /
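
Wildcards can also appear mid-pattern. For example, the following rule, shown purely as an illustration, blocks any URL that contains a query string:

User-agent: Googlebot
Disallow: /*?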

Testing Your robots.txt File

After creating your custom robots.txt file, upload it to the root directory of your website using FTP or SSH so that it is reachable at yourdomain.com/robots.txt. Then use the robots.txt testing tool in Google Search Console to confirm that crawlers can fetch the file and that your rules block and allow URLs as intended.
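
If you want to sanity-check the rules yourself, Python's built-in urllib.robotparser module can parse a live robots.txt file and report whether a given crawler may fetch a URL. Below is a minimal sketch that assumes your site is reachable at the placeholder domain www.example.com; note that the standard-library parser implements the original robots.txt specification and may not honor wildcard (*) rules.

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (placeholder domain).
parser = RobotFileParser("https://www.example.com/robots.txt")
parser.read()

# Ask whether Googlebot may crawl a few representative URLs.
for url in (
    "https://www.example.com/folder2/index.html",
    "https://www.example.com/folder1/subfolder/page.html",
):
    print(url, "->", parser.can_fetch("Googlebot", url))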

By setting up a custom robots.txt file, you can control which pages on your website are accessible to search engines, potentially improving your SEO efforts and overall online presence.

Published January 2015