What exactly is a bot like Googlebot?

Googlebot and other Spiders

Spider, Bot, Robot, Crawler - Googlebot, Yahoo Slurp, MSNbot

Googlebot, Yahoo Slurp, MSNbot, and similar spiders, bots, and crawlers are the programs that harvest information for search engines.

For anyone tracking statistics on their website, Googlebot, MSNbot, and Yahoo Slurp are welcome guests. These three search engine bots gather (harvest) information about your pages for their respective search engines. Seeing these spiders more often is desirable because it means that your site is being indexed more often and that new content is more likely to show up quickly in the SERPs (search engine results pages).

A spider is nothing more than a computer program that follows links on the web and gathers information as it goes. For example, Googlebot follows the HREF and SRC attributes of tags to find the pages and images associated with a given site. Because these crawlers are merely computer programs, they aren't always the smartest of creatures and may get caught in endless loops created by dynamically generated webpages.
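To make the idea concrete, here is a minimal sketch of a spider in Python. It is not how Googlebot itself is implemented; it only illustrates following HREF and SRC values while remembering which URLs have already been visited, which is one simple way to avoid going around in circles. The `fetch` callable, the `SITE` dictionary, and the example.com URLs are made up for this example, and `max_pages` is an assumed safety cap for sites that generate an endless supply of new URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the values of href and src attributes from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative URLs and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited.

    fetch is a hypothetical callable that maps a URL to its HTML text,
    or to None when there is no HTML to parse.
    """
    visited = set()
    queue = [start_url]
    order = []
    while queue and len(order) < max_pages:
        url = queue.pop(0)
        if url in visited:
            continue  # already seen: this is what breaks link loops
        visited.add(url)
        order.append(url)
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor(url)
        parser.feed(html)
        queue.extend(link for link in parser.links if link not in visited)
    return order


# A tiny in-memory "site" (made up) whose pages link to each other in a loop.
SITE = {
    "http://example.com/": '<a href="/a">A</a> <img src="/logo.png">',
    "http://example.com/a": '<a href="/">home</a> <a href="/b">B</a>',
    "http://example.com/b": '<a href="/a">back to A</a>',
}

pages = crawl("http://example.com/", SITE.get)
print(pages)
```

Even though the three pages link back to one another, each URL is visited only once. A real crawler would fetch pages over HTTP and would also check robots.txt before requesting anything.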

Robots.txt


While having Googlebot index your site more quickly is almost always a good thing, there are times when you don't want certain pages or images indexed. Most "reputable" spiders will obey the directives given in the robots.txt file. This file is a plain-text document, placed at the root of your site, that tells spiders which parts of the site they may and may not crawl. You can also explicitly instruct a robot not to follow any of the links on a page with the following meta tag: <META NAME="Googlebot" CONTENT="nofollow">.
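As an illustration of how a well-behaved spider might interpret robots.txt, the sketch below uses Python's standard `urllib.robotparser` module. The rules, the bot name "SomeOtherBot", and the example.com URLs are invented for the example. Real search engine bots use their own parsers, so treat this as an approximation of the behavior rather than a description of any particular bot.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: Googlebot is kept out of /private/ only,
# while every other bot is also kept out of /images/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /private/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url):
    """Return True if the given user agent may fetch the URL."""
    return parser.can_fetch(agent, url)


print(allowed("Googlebot", "http://example.com/private/report.html"))
print(allowed("Googlebot", "http://example.com/images/logo.png"))
print(allowed("SomeOtherBot", "http://example.com/images/logo.png"))
```

Under these rules Googlebot is refused the private page but may fetch the image, while the other bot is refused the image as well. Nothing enforces the file; a spider that chooses to ignore it can still request any URL.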

Because of how these bots work and the importance they place on text links, many people have begun placing keyword-filled text links to their own websites in their signatures on blogs and in other comment sections. To reduce the impact of these links, you can instruct spiders not to follow one specific link by placing the following attribute in its anchor tag: rel="nofollow". This reduces the number of outgoing links that spiders follow from your page and can help you maintain your PageRank.
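If your blog or comment system builds its HTML in code, the attribute can be added automatically to every link a visitor submits. The following is a minimal sketch in Python; the function name `comment_link` and the sample URL are hypothetical, and a real comment system would need more thorough sanitizing than this.

```python
from html import escape


def comment_link(url, text):
    """Build an anchor tag for a visitor-submitted link, marked rel="nofollow"."""
    # Escape both values so the visitor cannot inject extra attributes or tags.
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True), escape(text)
    )


print(comment_link("http://example.com/?a=1&b=2", "Cheap <widgets>"))
# prints: <a href="http://example.com/?a=1&amp;b=2" rel="nofollow">Cheap &lt;widgets&gt;</a>
```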

Bad SPAM bots


Now, just as in life, not all bots are good. There are "bad" bots that don't care about your robots.txt and are only out there to harvest your email address. To fight these "bad" SPAM bots, some people use JavaScript to "hide" their email addresses. However, anything that can be written to avoid a bad bot can be broken by an even worse bot. One company is fighting bots by giving them just what they want: email addresses, and lots of them. However, they are all email addresses of known spammers. I found the site to be quite clever.

Hopefully this will clear up some confusion as to what a bot, crawler, or spider is and how it goes about collecting information. If you have any questions, post them below and we will try to answer as quickly as possible. If you need help with SEO (search engine optimization), we would love to show you ways to increase how often Googlebot, Yahoo Slurp, and MSNbot index your site.