Importance of robots.txt file in SEO
In this article I will explain what a robots.txt file is and how to create and optimize one for search engine optimization.
Importance of a robots.txt file for your website
A robots.txt file helps search engine robots understand which pages they should visit and which files they shouldn't.
When you have two versions of a page and want only one version to be indexed, you can use the robots.txt file to disallow robots from crawling the duplicate page, which helps you avoid being penalized for duplicate content.
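For example, a minimal sketch, assuming the duplicate is a hypothetical printer-friendly copy at /print/page.html:
User-agent: *
# hypothetical printer-friendly duplicate of an existing page
Disallow: /print/page.html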
If you don't want to expose your website's sensitive data, you can add a disallow rule for that page in the robots.txt file.
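A minimal sketch, assuming the sensitive pages live in a hypothetical /admin/ directory:
User-agent: *
# hypothetical directory containing sensitive pages
Disallow: /admin/
Keep in mind that robots.txt is itself publicly readable, so this only keeps well-behaved crawlers away; it does not hide pages from people.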
Many search engines now ignore meta tags, so a robots meta tag can simply go unnoticed; it is therefore safer to use the robots.txt file.
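For comparison, the robots meta tag mentioned above is placed in a page's HTML head; a typical example looks like this:
<meta name="robots" content="noindex, nofollow">
Unlike robots.txt, it works per page, and a crawler only sees it after fetching the page.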
You can exclude images, stylesheets and JavaScript from indexing, and save some bandwidth, by telling spiders to keep away from these items, as shown in the sketch below.
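A sketch of what that could look like, assuming the assets sit in hypothetical /images/, /css/ and /js/ directories:
User-agent: *
# hypothetical directories holding static assets
Disallow: /images/
Disallow: /css/
Disallow: /js/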
Before we create a robots.txt file, let's first understand: what is this robots.txt file?
A robots.txt file is a text file created by the webmaster to instruct search engine robots how they should crawl the website, and which pages are available for indexing and which are not. This is known as the Robots Exclusion Protocol.
How do robots visit your website?
Robots first visit your website's robots.txt file, where they get instructions about which pages they can crawl and which they shouldn't.
Before visiting your website's home page, that is, your domain name, e.g. www.yourwebsite.com, robots first visit www.yourwebsite.com/robots.txt. The file must sit at the root of your domain; robots will not look for it in a subdirectory.
What code should we write to allow or block robots from crawling my website?
Here I will explain how the code works.
"User-agent: *" means the section applies to all robots.
"Disallow: /" tells the robot that it should not visit any pages on the site.
You can include comments in the robots.txt file by putting the # sign at the beginning of a line.
To stop all robots from accessing the entire server, copy the code below and save it in a text file (for example, in Notepad) with the name robots.txt.
# all user agents are disallowed from the whole website
User-agent: *
Disallow: /
To allow all robots to visit your website completely, write this code in your text file. You can also create an empty robots.txt file to give robots complete access to crawl your website.
User-agent: *
Disallow:
If you want to exclude a single robot from your website, you can use the following code (here BadBot stands for that robot's user-agent name):
User-agent: BadBot
Disallow: /
Here is how you can exclude all robots from part of the server:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
You can also name the one robot you want to allow to access your website, while excluding all others. For example, to allow only Google's crawler (its user-agent is Googlebot):
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
Sitemap parameter
You can tell robots where to find your XML sitemap by adding a Sitemap line:
User-agent: *
Disallow:

Sitemap: http://www.yourwebsite.com/non-standard-location/sitemap.xml
Mistakes in creating a robots.txt file
Most mistakes are simple typing errors, but they can cause real blunders, so beware of them.
Logical error
User-agent: *
Disallow: /temp/

User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/
In the above example, a robot visits the file and first sees User-agent: *, which applies to all robots, with its rule disallowing /temp/. Googlebot does the same: it matches that first record, stays out of /temp/, and then crawls and indexes all the other files, because it does not read the file to the end. The Googlebot-specific record further down is never applied, so /images/ and /cgi-bin/ remain crawlable.
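Under that first-match behaviour, one fix is to put the more specific record first, so Googlebot matches its own rules before reaching the catch-all. A sketch of the corrected file:
User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/

User-agent: *
Disallow: /temp/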