Google Reveals It Has Two Ways To Crawl Web Pages
Most people these days understand the general idea of how search engines work. Search engines like Google send out automated bots to scan or “crawl” all the pages on a website, before using their algorithms to sort through which sites are best for specific search queries.
What few outside Google knew until recently is that the search engine has begun using two different methods to crawl websites: one that specifically seeks out new content and another that reviews content already in its search index.
Google Search Advocate John Mueller revealed this recently during one of his regular Search Central SEO office-hours chats on January 7th.
During this session, an SEO professional asked Mueller about the behavior he has observed from Googlebot crawling his website.
Specifically, the user said Googlebot crawled his site daily while the site was publishing content frequently. Since publishing has slowed, he has seen Googlebot crawl the site less often.
As it turns out, Mueller says this is quite normal and is the result of how Google approaches crawling web pages.
How Google Crawls New vs. Old Content
Mueller acknowledges that several factors can contribute to how often Google crawls different pages on a website, including what type of pages they are, how new they are, and how Google understands your site. Still, he says crawling falls roughly into two types:
“It’s not so much that we crawl a website, but we crawl individual pages of a website. And when it comes to crawling, we have two types of crawling roughly.
One is a discovery crawl where we try to discover new pages on your website. And the other is a refresh crawl where we update existing pages that we know about.”
These different types of crawling target different types of pages, so it is reasonable that they also occur more or less frequently depending on the type of content.
“So for the most part, for example, we would refresh crawl the homepage, I don’t know, once a day, or every couple of hours, or something like that.
And if we find new links on their home page then we’ll go off and crawl those with the discovery crawl as well. And because of that you will always see a mix of discover and refresh happening with regard to crawling. And you’ll see some baseline of crawling happening every day.
But if we recognize that individual pages change very rarely, then we realize we don’t have to crawl them all the time.”
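To make the idea concrete, here is a minimal sketch of how a crawler could combine the two behaviors Mueller describes. It is not Google's actual algorithm: the `Page` class, the `refresh` function, the halving and doubling rule, and the interval limits are all assumptions chosen for illustration. A refresh crawl revisits a known page, shortens its revisit interval if the content changed, lengthens it if not, and hands any previously unseen links to a discovery queue.

```python
from dataclasses import dataclass

# Hypothetical limits, chosen only for illustration.
MIN_INTERVAL_HOURS = 1.0
MAX_INTERVAL_HOURS = 24.0 * 30


@dataclass
class Page:
    url: str
    content_hash: str
    interval_hours: float = 24.0  # assumed starting refresh interval


def refresh(page, new_hash, links, known_urls, discovery_queue):
    """Simulate one refresh crawl of a page the crawler already knows."""
    if new_hash != page.content_hash:
        # Content changed: remember it and come back sooner next time.
        page.content_hash = new_hash
        page.interval_hours = max(MIN_INTERVAL_HOURS, page.interval_hours / 2)
    else:
        # Nothing changed: back off and revisit less often.
        page.interval_hours = min(MAX_INTERVAL_HOURS, page.interval_hours * 2)

    # Links not seen before are queued for a discovery crawl.
    for link in links:
        if link not in known_urls:
            known_urls.add(link)
            discovery_queue.append(link)

    return page.interval_hours


home = Page("https://example.com/", "v1")
known = {home.url}
queue = []

# The homepage changed and links to a new post.
first = refresh(home, "v2", ["https://example.com/new-post"], known, queue)
# Two later visits find nothing new.
second = refresh(home, "v2", ["https://example.com/new-post"], known, queue)
third = refresh(home, "v2", ["https://example.com/new-post"], known, queue)
```

In this toy model, a site that stops publishing sees its refresh interval grow with each unchanged visit, which mirrors the slowdown the SEO professional observed.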
The takeaway here is that Google adapts to your site according to your own publishing habits. Neither the type of crawling Google uses nor how frequently it happens is an inherently good or bad indicator of your website’s health, and your focus should be (as always) on providing the smoothest online sales experience for your customers.
Nonetheless, it is interesting to know that Google has made this adjustment to how it crawls content across the web and to speculate about how this might affect its ranking process.
To hear Mueller’s full response (including more details about why Google crawls some sites more often than others), check out the video below: