Tag Archives: Spiders

How does Google work?

It's a start. Google shares everything, but, as normal, still doesn't tell you much in absolute detail. They are making SEO accessible to the masses by explaining crawling and indexation to the layman: the basics of retrieving and recalling web pages. Shame he didn't go into personalisation and how social signals are influencing that.

Without further ado, here is the latest video presented by Matt Cutts.

How to slow down a robot

There is a flippant answer here about turning it off, removing the batteries, etc.

But what do you do if you are having some server troubles or a bot is hitting you really hard? There are two obvious options open to you.

  1. Use your verified Google/Bing webmaster tools account and press the buttons, or
  2. Use an entry in your robots.txt file

Option 1 is straightforward, so for the rest of this short post, let's focus on option 2. You can find some more details on the Bing community site. In short, all (reputable) engines have signed up to the Robots Exclusion Protocol (REP). So…

How to set the crawl delay parameter

In the robots.txt file, within the generic user agent section, add the crawl-delay directive as shown in the example below:

User-agent: *
Crawl-delay: 1

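Replace the user agent as necessary if you want the delay to apply to a specific bot, e.g. msnbot. Note that Googlebot does not honour the Crawl-delay directive, so for Google use option 1 instead.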

User-agent: msnbot
Crawl-delay: 1
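
If you want to check which delay a given bot would actually pick up from your file, here is a minimal sketch using Python's standard urllib.robotparser module. The sample robots.txt content, the delay of 5 for msnbot, and the helper name crawl_delay_for are made up for illustration; they are not from Bing's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, with illustrative delay values.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 1

User-agent: msnbot
Crawl-delay: 5
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay that applies to user_agent, or None if unset."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    for agent in ("msnbot", "googlebot"):
        print(agent, crawl_delay_for(ROBOTS_TXT, agent))
```

A bot with its own section gets that section's value; any other bot falls back to the generic "*" section. This only shows how a standards-following parser reads the file, not how any particular search engine behaves.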

What speed should I set?

Bing suggests going no slower than a crawl-delay of 10, or it could affect their ability to keep crawling your site.

Crawl-delay setting    Index refresh speed
No crawl delay set     Normal
1                      Slow
5                      Very slow
10                     Extremely slow

Good luck and happy bot management.

Interesting reading 2009-05-01

New in my reader is Thats SEO. Today it has this post about the role of your IP address in your SEO efforts. It covers the usual reasons why you need to know where your site is going to reside, e.g. "bad neighbourhood" and so on, but continues with some explanations of what this actually means. Too many posts these days, including mine, are too brief and don't lay out the context! A good read, thank you Raghaven. Oh, and if you want to check whether your IP is blocked on a number of bad site lists, check out what is my IP address.

If you are ever considering going solo, then not reading 10 lessons from a failed start up would be negligent.

On SEOMoz there is some detail, albeit in a pseudo sales pitch, with some interesting facts about what they have seen in their crawl of the web. Some highlight numbers: 2.7% of links are NoFollowed, and 73% of these were internal, so site sculpting is popular. I do it. And 16 million pages have the new canonical tag.

On Black Hat SEO, there is a link to a digest page of recent popular articles, such as "why spam works", "How to break captchas", and more. All are very simple to read, with a 'can-do' attitude.

We all like a good list. On SEO Optimise they have a non-Google focused list of resources for social media. Worth checking out.

And as mine are all broken (the work ones), it's good to look at sitemaps. SEL have published a case study. See it here.