The SEO Basics - Robots.txt
October 12, 2012
While the SEO industry may be largely about making sure search engines see and rank your webpages, sometimes it is important to keep search engine bots away from certain webpages. Robots.txt files are designed to address this need. To fully understand what robots.txt is, why it is important, and how to use it, read on.
What is Robots.txt?
To explain what a robots.txt file does, we should first go over the basics of how search engines work. For a search engine to rank a website in its search results, that site needs to be part of the engine's index of webpages. To get into that index, the website first needs to be found by the search engine.
For example, Dave owns a website and has just published three articles about the best beaches for surfing in Hawaii. He wants his website to appear on a Google results page when people search for "surfing in Hawaii." What Dave does not know is that his pages need to be crawled, indexed, and served by Google. Google sends out programs called bots to read websites. These bots crawl (that is, read) websites, and the pages they crawl can then be added to Google's index. When the time comes, if Dave's site has appropriate meta description tags, title tags, and quality content, Google should recognize that his content is relevant to a search for "surfing in Hawaii."
In short, the process that leads to a page being ranked in search results generally works as follows: first a website is published; then a "bot" finds the webpage and determines its content by reading, or "crawling," it; the webpage then becomes part of the search engine's index; and finally the page can be ranked in search results.
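To make these stages concrete, here is a deliberately simplified Python model of them. Everything in it is hypothetical: the URLs, the page text, and especially the ranking rule, which simply counts how often the query words appear on each page. Real search engines weigh many more signals, such as the title tags and content quality mentioned above.

```python
from collections import defaultdict

# Step 1 (publish): pages on a hypothetical site, keyed by URL.
PUBLISHED = {
    "https://example.com/best-surf-beaches": "surfing in hawaii best beaches for surfing",
    "https://example.com/north-shore": "north shore hawaii surfing guide",
    "https://example.com/banana-bread": "banana bread recipe",
}


def crawl(urls):
    """Step 2 (crawl): a bot visits each URL and reads its text."""
    return {url: PUBLISHED[url] for url in urls}


def build_index(crawled):
    """Step 3 (index): build an inverted index mapping word -> {url: count}."""
    index = defaultdict(lambda: defaultdict(int))
    for url, text in crawled.items():
        for word in text.split():
            index[word][url] += 1
    return index


def rank(index, query):
    """Step 4 (rank): score indexed pages by how often they contain the query words."""
    scores = defaultdict(int)
    for word in query.lower().split():
        # .get() avoids inserting empty entries into the index for unknown words.
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(crawl(PUBLISHED))
print(rank(index, "surfing in Hawaii"))
# prints ['https://example.com/best-surf-beaches', 'https://example.com/north-shore']
```

In this toy model, a page that is never crawled never reaches the index and therefore cannot be ranked. Real search engines are less strict, as explained below: a URL that has not been crawled can still appear in results.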
A robots.txt file is a plain-text file placed at the root of a website (for example, https://example.com/robots.txt) that contains instructions telling search engine bots not to crawl particular webpages and/or directories. You can add such an instruction for a webpage if you do not want Google or any other search engine to crawl it. Reasons for blocking crawlers include protecting confidential data or keeping a site that is still under development from being found online. If you choose to use robots.txt, be aware that although Google will not crawl the blocked page, its URL may still show up in Google results. Occasionally, Google takes listings from online directories, such as dmoz, to provide a description for the URL of a page that robots.txt blocks.
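As an illustration, the sketch below uses Python's standard-library `urllib.robotparser` module to show how a well-behaved bot might interpret a small robots.txt file. `User-agent` and `Disallow` are standard robots.txt directives, and `User-agent: *` applies the rules to every bot; the site, the paths, and the rules themselves are hypothetical. The sketch models only the crawling decision: a blocked URL can still appear in results, as noted above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of https://example.com/robots.txt:
# one rule blocks a whole directory, the other a single page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /confidential/
Disallow: /drafts/surf-report.html
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

for url in [
    "https://example.com/best-surf-beaches",
    "https://example.com/confidential/prices.html",
    "https://example.com/drafts/surf-report.html",
]:
    # can_fetch() answers: may this user agent crawl this URL?
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "->", "crawl" if allowed else "do not crawl")
```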
Why is it Important For SEO?
Robots.txt is important for SEO because it gives webmasters another way to keep parts of a website away from search engine bots. This matters because some elements of a website may contain information that the public should not be privy to, while the rest of the site is fit to appear in search results. Robots.txt rules can even target specific URLs, so a website that wants to be mostly crawlable can name the precise pages it does not want bots to crawl. Moreover, if you are in the middle of developing a website, it would not look professional to have a half-finished version appear in search results, and it could take some of the steam out of the PR you likely want to gain from the site's official launch. Keeping certain things out of web results is important for your business in general and for SEO: while it is important to be seen in search results, there are times when not being seen is equally important.
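The two situations described here, a site still under development and a mostly public site with a few private pages, call for quite different rules. The following is a minimal Python sketch using `urllib.robotparser`, with hypothetical paths. `Disallow: /` matches every path on the site, while a full path such as `/internal/pricing-draft.html` matches only that page (and, because rules match by prefix, any URL that begins with that path).

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt):
    """Parse robots.txt text (no network access) into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# While the site is under development: "Disallow: /" matches every path.
UNDER_DEVELOPMENT = load_rules("User-agent: *\nDisallow: /\n")

# After launch: everything is crawlable except one specific (hypothetical) page.
AFTER_LAUNCH = load_rules("User-agent: *\nDisallow: /internal/pricing-draft.html\n")

for url in [
    "https://example.com/",
    "https://example.com/best-surf-beaches",
    "https://example.com/internal/pricing-draft.html",
]:
    print(
        url,
        "| under development:", UNDER_DEVELOPMENT.can_fetch("Googlebot", url),
        "| after launch:", AFTER_LAUNCH.can_fetch("Googlebot", url),
    )
```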
How Can I Use Robots.txt?
Robots.txt rules can be used to keep bots from crawling either a large part of a site or a very small portion of it. It is important to know that robots.txt is not a security measure and can easily be bypassed: malicious bots may ignore robots.txt instructions and crawl websites anyway. Where such crawling puts confidential data at risk, experts such as Matt Cutts of Google recommend protecting the data by other methods, such as password protection. Moreover, even though Google may not crawl a blocked page, it may still list a link to it in search results and use the page's dmoz listing as its description.
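Because robots.txt is only a request, nothing on the server stops a bot from fetching a disallowed URL; the check happens inside the crawler, if the crawler chooses to make it. The hypothetical Python sketch below contrasts a polite bot, which consults `urllib.robotparser` before each request, with a malicious one that skips the check. No real network requests are made, and the bot names and URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all bots to stay out of /confidential/.
ROBOTS_TXT = "User-agent: *\nDisallow: /confidential/\n"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_bot(urls):
    """A well-behaved bot: skips any URL the robots.txt rules disallow."""
    return [url for url in urls if parser.can_fetch("PoliteBot", url)]


def rude_bot(urls):
    """A malicious bot: never consults robots.txt, so it requests every URL."""
    return list(urls)


URLS = [
    "https://example.com/best-surf-beaches",
    "https://example.com/confidential/customer-list.csv",
]

print("polite bot requests:", polite_bot(URLS))
print("rude bot requests:", rude_bot(URLS))
```

Password protection works differently: the server itself refuses the request unless valid credentials are supplied, so it does not depend on the bot's good behavior.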