Wednesday, July 1, 2009

Crawl stats. How fast is fast enough?

Below we've included a screenshot of our crawl stats, taken from the webmaster interface at Google. As you can see, our page load times have improved steadily over the last 90 days and now average a little under 700 milliseconds.

It's also obvious from the graphs that page load time has a direct effect on how many pages the GoogleBot crawls per day (and thus how many of our pages end up in the Google index). That leads to a question, and maybe a reader can comment on this post with the answer. Is 684 milliseconds a good page load time, or just average? What do other large sites see in their GoogleBot crawl stats? And will the relationship between GoogleBot crawl speed and page load time hold if we cut page load time by another 50% (i.e., will the GoogleBot double the number of pages it crawls per day on our site)?

OK, maybe that's three questions. But you get the idea.


Ask Bjørn Hansen said...

Be sure to also adjust the "crawl rate" setting to allow them to go nuts on your server. (I'm not sure I can give exact numbers, but our servers get crawled a number of pages comparable to your screenshot several times an hour, IIRC.)

Ask Bjørn Hansen said...

Oh - our average response time is a good bit lower, too -- so maybe that has something to do with it.

I thought you were hosted in Europe, but it looks like you use EC2, so latency to Europe isn't why ... :-) (We're in Los Angeles and Montréal).

Jack DeNeut said...

Ask - Thanks for the tips - I changed the crawl rate and we'll see what happens. I'll publish updated stats here on the blog next month.

We're going to have to change our architecture a bit to get the page load times down further. Right now, we're just using memcached, but with millions of possible URLs on Nelso, the hit rate on the cache never gets above 15%. We're considering pre-rendering all the business detail pages, and only updating the static HTML when something on a business page changes.
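The pre-rendering idea above can be sketched in a few lines: render each business detail page to a static HTML file once, and re-render only that file when the underlying record changes, so the web server serves a plain file on every request instead of hitting the application (or hoping memcached has the page). This is a minimal illustrative sketch, not Nelso's actual code; the function names and the business record fields are assumptions.

```python
import os
import tempfile

# Directory where pre-rendered pages live; a real deployment would point
# this at the web server's document root.
OUTPUT_DIR = tempfile.mkdtemp()

def render_page(business):
    # Stand-in for a real template engine rendering a business detail page.
    return (f"<html><body><h1>{business['name']}</h1>"
            f"<p>{business['phone']}</p></body></html>")

def static_path(business_id):
    # One static file per business URL.
    return os.path.join(OUTPUT_DIR, f"{business_id}.html")

def prerender(business):
    # Write the rendered HTML to disk so the web server can serve it
    # directly; the per-request cost becomes a static-file read.
    with open(static_path(business['id']), 'w') as f:
        f.write(render_page(business))

def on_business_updated(business):
    # Invalidation hook: when a business record changes, re-render only
    # the affected page instead of rebuilding the whole site.
    prerender(business)

# Usage: render once, then update the record and refresh its page.
biz = {'id': 42, 'name': 'Example Cafe', 'phone': '555-0100'}
prerender(biz)
biz['phone'] = '555-0199'
on_business_updated(biz)
with open(static_path(42)) as f:
    html = f.read()
```

With millions of URLs this trades disk space for a near-100% "hit rate", since every crawlable page exists as a file before the GoogleBot ever asks for it.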