Tag Archive for: indexing

If you operate a website that frequently creates or changes pages – such as an e-retail or publishing site – you’ve probably noticed it can take Google a while to reflect your new content in search results.

This has led to widespread speculation about just how frequently Google indexes pages and why it seems like some types of websites get indexed more frequently than others.

In a recent Q&A video, Google’s John Mueller took the time to answer this directly. He explains how Google’s indexing bots prioritize specific types of pages that are more “important” and limit excessive stress on servers. But, in typical Google fashion, he isn’t giving away everything.

The question posed was:

“How often does Google re-index a website? It seems like it’s much less often than it used to be. We add or remove pages from our site, and it’s weeks before those changes are reflected in Google Search.”

Mueller starts by explaining that Google takes its time crawling the entirety of a website, noting that continuously crawling whole sites within short periods of time would put unnecessary strain on the server. Because of this, Googlebot actually has a limit on the number of pages it can crawl every day.

Instead, Googlebot focuses on pages that should be crawled more frequently, like home pages or high-level category pages. These pages will get crawled at least every few days, but it sounds like less-important pages (individual blog posts, for example) might take considerably longer to get crawled.

You can watch Mueller’s response in the video or read the full statement quoted below.

“Looking at the whole website all at once, or even within a short period of time, can cause a significant load on a website. Googlebot tries to be polite and is limited to a certain number of pages every day. This number is automatically adjusted as we better recognize the limits of a website. Looking at portions of a website means that we have to prioritize how we crawl.

So how does this work? In general, Googlebot tries to crawl important pages more frequently to make sure that the most critical pages are covered. Often this will be a website’s home page or maybe higher-level category pages. New content is often mentioned and linked from there, so it’s a great place for us to start. We’ll re-crawl these pages frequently, maybe every few days, maybe even much more frequently depending on the website.”

Google's John Mueller, courtesy of Google+

Recently I discussed a common issue where a misplaced noindex tag on a site’s front page keeps search engines from crawling or indexing it. It happens all the time, but it isn’t the only reason your site might not be crawled. The good news is that, according to a recent statement from Google’s John Mueller, problems like this do little to no long-term damage to your site or your SEO.
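If you want to rule this problem out on your own site, a quick check is to fetch the front page and look for a noindex directive in either the X-Robots-Tag response header or a robots meta tag. Below is a minimal sketch using only Python’s standard library; the URL is a placeholder, the meta-tag regex is deliberately simplified, and a real audit would also handle redirects and crawl deeper pages.

```python
import re
import urllib.request

URL = "https://example.com/"  # placeholder: swap in the page you want to check

with urllib.request.urlopen(URL) as resp:
    # The directive can arrive as a response header, e.g. "X-Robots-Tag: noindex"
    header = resp.headers.get("X-Robots-Tag", "")
    html = resp.read().decode("utf-8", errors="replace")

# ...or as a meta tag, e.g. <meta name="robots" content="noindex, nofollow">
# (simplified pattern: assumes the name attribute comes before content)
meta_values = re.findall(
    r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']*)["\']',
    html,
    re.IGNORECASE,
)

if "noindex" in header.lower() or any("noindex" in v.lower() for v in meta_values):
    print("Warning: this page carries a noindex directive.")
else:
    print("No noindex directive found on this page.")
```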

Barry Schwartz noticed that Mueller had responded to a question on the Google Webmaster Help forums from an employee of a company that had accidentally blocked Googlebot from crawling and indexing its site. In John Mueller’s words:

From our point of view, once we’re able to recrawl and reprocess your URLs, they’ll re-appear in our search results. There’s generally no long-term damage caused by an outage like this, but it might take a bit of time for things to get back to “normal” again (with the caveat that our algorithms change over time, so the current “normal” may not be the same state as it was before).

So don’t worry too much if you discover your site has been having problems with crawling or indexing. What matters is how quickly you respond and fix the problem. Once the issue is solved, things should more or less return to normal. Of course, as Mueller mentions, you might not return to exactly the same state as before, because these things are always fluctuating.

Android (image source: Google)

Smartphones have revolutionized how we browse the web, but most browsing still happens within the same web browsers we have all grown accustomed to. For the most part, we do our searches and actual browsing in Chrome, Safari, or Firefox, while we limit our apps to games, reading the news, or taking care of business. But that could all change in the near future.

Google announced late last week that they would begin allowing Android app developers to have their app content indexed. That content can then be opened directly through apps on Android devices. It is a big step toward a more seamless user experience on smartphones and tablets, compared with the disjointed experience we have now.

Googlebot has been improved so it can index the content of apps, either through a sitemap file or through Google’s Webmaster Tools, though the feature is currently in a testing phase. This means indexing is available to only a small selection of developers for now, and signed-in users won’t begin to see app content in their results for a few weeks.

The update means that searches will be able to return information from app content, which will then open directly in the intended app. For sites that offer the same content on both their website and their app, such as news sites, users will be able to pick their preferred experience, whether in the browser or in the app.

Jennifer Slegg reports that app developers can sign up to let Google know they are interested in having their apps indexed by filling out an application of interest. Before you do, though, you should know that your app must have deep linking enabled, and you will have to provide Google with information about alternate URLs, either in your sitemap or in a link element within the pages of your site (a rough sketch of what that looks like follows below).
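To make the deep-linking requirement a bit more concrete, here is a rough sketch of how a page URL maps onto the android-app:// URI that these alternate-URL annotations use. The package name and article URL are hypothetical placeholders, and the exact format to use is spelled out in Google’s app indexing documentation; this only illustrates the general shape.

```python
from urllib.parse import urlsplit

def android_app_uri(package_name: str, page_url: str) -> str:
    """Map a web page URL onto the android-app:// scheme used for deep links."""
    parts = urlsplit(page_url)
    # General shape: android-app://{package}/{scheme}/{host}{path}
    return f"android-app://{package_name}/{parts.scheme}/{parts.netloc}{parts.path}"

# Hypothetical package name and article URL, purely for illustration
uri = android_app_uri("com.example.newsreader", "https://example.com/articles/42")

# The resulting URI would then be referenced as an alternate for the page,
# either in the sitemap or in a link element on the page itself:
print(f'<link rel="alternate" href="{uri}" />')
```

The same android-app:// value can also be listed alongside the page’s entry in an XML sitemap, which is the other route the announcement mentions for sites that prefer not to touch their page markup.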

Indexing is only available for Android apps currently, and Google has yet to comment on when or if they will extend the capability to iPhone or Windows apps.

There is a misconception among a small few that Google only wants the absolute best websites and won’t index sites it thinks aren’t worth its time or space in its index. In reality, this is far from the truth.

Google is always indexing content and they index pretty much anything they can find. Supposedly, the only thing they don’t index is spam.

SEO Roundtable pointed out that Google’s John Mueller commented in a Google Webmaster Help thread recently saying “unless the content is primarily spam (eg spun / rewritten / scraped content), we’d try to at least have it indexed.”

He was responding to a question about a site not being fully indexed over a prolonged period of time, which he believes is the result of a bug, though he didn’t have any definite answers until it could be shown to the indexing team.

Before anyone gets up in arms, that statement is a little misleading where spam is concerned. Everyone knows Google still indexes its fair share of spam, and in some cases those pages even rank. Mueller’s comments instead show that Google tries to avoid adding spam to its index, but it is obvious that it doesn’t succeed in keeping all of the junk out.

Getting indexed isn’t the same as ranking, but to have any chance of being ranked you have to be indexed.