Importance of robots.txt file in SEO
In this article I will explain what a robots.txt file is and how to create and optimize one for search engine optimization.
Importance of a robots.txt file for your website
A robots.txt file helps search engine robots understand which pages they should visit and which files they shouldn't.
When you have two versions of a page and want only one version to be indexed, you can use the robots.txt file to disallow robots from crawling the duplicate page, which helps you avoid being penalized for duplicate content.
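For example, a minimal sketch, assuming the duplicate is a hypothetical printer-friendly copy at /print/page.html:
User-agent: *
# hypothetical printer-friendly duplicate of an existing page
Disallow: /print/page.html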
If you don't want to expose your website's sensitive data, you can add a disallow rule for that page in the robots.txt file.
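A minimal sketch, assuming the sensitive pages live in a hypothetical /admin/ directory:
User-agent: *
# hypothetical directory containing sensitive pages
Disallow: /admin/
Keep in mind that robots.txt is itself publicly readable, so this only keeps well-behaved crawlers away; it does not hide pages from people.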
Many search engines now ignore meta tags, so a robots meta tag can simply go unnoticed; it is therefore safer to use the robots.txt file.
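For comparison, the robots meta tag mentioned above is placed in a page's HTML head; a typical example looks like this:
<meta name="robots" content="noindex, nofollow">
Unlike robots.txt, it works per page, and a crawler only sees it after fetching the page.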
You can exclude images, stylesheets and JavaScript from indexing, and save some bandwidth, by telling spiders to keep away from these items, as shown in the sketch below.
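A sketch of what that could look like, assuming the assets sit in hypothetical /images/, /css/ and /js/ directories:
User-agent: *
# hypothetical directories holding static assets
Disallow: /images/
Disallow: /css/
Disallow: /js/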
Before we create a robots.txt file, let's first understand: what is this robots.txt file?
A robots.txt file is a text file created by the webmaster to instruct search engine robots how they should crawl the website, and which pages are available for indexing and which are not. This is known as the Robots Exclusion Protocol.
How do robots visit your website?
Robots first visit your website's robots.txt file, where they get instructions about which pages they can crawl and which they shouldn't.
Before visiting your website's home page, that is, your domain name, e.g. www.yourwebsite.com, robots first visit www.yourwebsite.com/robots.txt. The file must sit at the root of your domain; robots will not look for it in a subdirectory.
What code should we write to allow or block robots from crawling my website?
Here I will explain how the code works.
"User-agent: *" means the section applies to all robots.
"Disallow: /" tells the robot that it should not visit any pages on the site.
You can include comments in the robots.txt file by putting the # sign at the beginning of a line.
To stop all robots from accessing the entire server, copy the code below and save it in a text file (for example, in Notepad) with the name robots.txt.
# all user agents are disallowed from the whole website
User-agent: *
Disallow: /
To allow all robots to visit your website completely, write this code in your text file. You can also create an empty robots.txt file to give robots complete access to crawl your website.
User-agent: *
Disallow:
If you want to exclude a single robot from your website, you can use the following code (here BadBot stands for that robot's user-agent name):
User-agent: BadBot
Disallow: /
Here is how you can exclude all robots from part of the server:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
You can also name the one robot you want to allow to access your website, while excluding all others. For example, to allow only Google's crawler (its user-agent is Googlebot):
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
Sitemap parameter
You can tell robots where to find your XML sitemap by adding a Sitemap line:
User-agent: *
Disallow:

Sitemap: http://www.yourwebsite.com/non-standard-location/sitemap.xml
Mistakes in creating a robots.txt file
Most mistakes are simple typing errors, but they can cause real blunders, so beware of them.
Logical error
User-agent: *
Disallow: /temp/

User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/
In the above example, a robot visits the file and first sees User-agent: *, which applies to all robots, with its rule disallowing /temp/. Googlebot does the same: it matches that first record, stays out of /temp/, and then crawls and indexes all the other files, because it does not read the file to the end. The Googlebot-specific record further down is never applied, so /images/ and /cgi-bin/ remain crawlable.
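Under that first-match behaviour, one fix is to put the more specific record first, so Googlebot matches its own rules before reaching the catch-all. A sketch of the corrected file:
User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/

User-agent: *
Disallow: /temp/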