
To understand and rank websites in search results, Google constantly uses tools called crawlers to find and analyze new or recently updated web pages. What may surprise you is that the search engine actually relies on three different types of crawlers depending on the situation. In fact, some of these crawlers may ignore the robots.txt rules used to control how they interact with your site.

In the past week, those in the SEO world were surprised by the reveal that the search engine had begun using a new crawler, GoogleOther, to relieve the strain on its main crawlers. Amid this, I noticed some asking, “Google has three different crawlers? I thought it was just Googlebot,” referring to the most well-known crawler, which the search engine has used for more than a decade.

In reality, the company uses quite a few more than just one crawler, and it would take a while to go into exactly what each one does, as the full list of them compiled by Search Engine Roundtable shows.

However, Google recently updated a help document called “Verifying Googlebot and other Google crawlers” that breaks all these crawlers into three specific groups. 

The Three Types of Google Web Crawlers

Googlebot: The first type of crawler is easily the most well-known and recognized. Googlebot is the tool used to index pages for the company’s main search results, and it always observes the rules set out in robots.txt files.

Special-case Crawlers: In some cases, Google creates crawlers for very specific functions, such as AdsBot, which assesses web page quality for those running ads on the platform. Depending on their purpose, these crawlers may ignore some of the rules dictated in a robots.txt file.

User-triggered Fetchers: When a user does something that requires the search engine to verify information (when a site owner runs the Google Site Verifier, for example), Google uses special robots dedicated to these tasks. Because the process is initiated by the user, these crawlers ignore robots.txt rules entirely.
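
To make the distinction concrete, here is a hypothetical robots.txt showing how the three groups treat the same file. The paths are placeholders, not a recommendation.

```
# Hypothetical robots.txt for illustration only; the paths are placeholders.

# Googlebot honors the most specific group that matches it.
User-agent: Googlebot
Disallow: /private/

# AdsBot (a special-case crawler) ignores the generic "*" group entirely
# and only obeys rules in a group that names it explicitly, like this one.
User-agent: AdsBot-Google
Disallow: /drafts/

# Catch-all group for crawlers not matched by a more specific group above.
# User-triggered fetchers skip robots.txt altogether, so no rule in this
# file applies to them.
User-agent: *
Disallow: /tmp/
```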

Why This Matters

Understanding how Google analyzes and processes the web helps you optimize your site for the best possible performance. It is also important to identify the crawlers Google uses and filter them out of your analytics tools, or they can show up as false visits or impressions.
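
If you want to confirm that a crawler in your logs really belongs to Google before filtering it, the help document describes a reverse DNS lookup followed by a forward lookup. Below is a minimal sketch of that check in TypeScript for Node.js; the function name, calling code, and domain list reflect my reading of the doc rather than any official API.

```typescript
// Minimal sketch of the reverse-then-forward DNS check used to verify
// Google's crawlers. The 'dns' module calls are standard Node.js APIs;
// everything else is illustrative.
import { promises as dns } from "dns";

// Hostname suffixes Google's crawler IPs are documented to resolve to
// (googleusercontent.com is listed for user-triggered fetchers).
const GOOGLE_HOSTS = /\.(googlebot\.com|google\.com|googleusercontent\.com)$/;

async function isGoogleCrawler(ip: string): Promise<boolean> {
  try {
    // Step 1: reverse DNS on the visiting IP.
    const hostnames = await dns.reverse(ip);
    const host = hostnames.find((h) => GOOGLE_HOSTS.test(h));
    if (!host) return false;

    // Step 2: forward DNS on that hostname must lead back to the same IP.
    const addresses = await dns.resolve4(host);
    return addresses.includes(ip);
  } catch {
    return false; // missing PTR record or lookup failure: treat as unverified
  }
}

// Example usage: check an IP pulled from your server logs.
isGoogleCrawler("66.249.66.1").then((verified) => console.log(verified));
```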

For more, read the full help article here.

Thanks to its high level of adaptability, JavaScript (JS) has been in use in some shape or form for more than 20 years and remains one of the most popular programming languages used to build websites.

However, Google’s Martin Splitt, a webmaster trends analyst, recently suggested that webmasters begin moving away from the language when they need content to be indexed and ranked as quickly as possible.

In an SEO Mythbusting video exploring the topic of web performance and search engine optimization, Splitt and Ada Rose Cannon of Samsung found themselves talking about JavaScript.

Specifically, they discussed how using too much JS can drag down a site’s performance and potentially hold its content back in Google’s search index.

How JavaScript Holds Content Back

One of the biggest issues with overusing JS arises when sites publish content on a daily basis.

Google uses a two-pass indexing process to help verify content before it is added to the search index. In the case of a JavaScript-heavy page, Google first renders the non-JS elements like HTML and CSS. The page is then placed in a queue, and the remaining JavaScript-driven content is rendered in a second pass as processing resources become available.

This means JavaScript-heavy pages may not be completely crawled and indexed for up to a week after being published.

For time-sensitive information, this can be the difference between being on the cutting edge and getting left behind.

What You Can Do Instead

Splitt offers a few different techniques developers can use to ensure their site is being efficiently crawled and indexed as new content is published.

One way to get around the issue is dynamic rendering, which provides Google with a statically rendered version of your page, saving the search engine the time and effort of rendering and crawling the page itself.
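
As a rough illustration, a dynamic rendering setup usually sits in front of the app and branches on the user agent: known crawlers get a pre-rendered HTML snapshot, while regular visitors get the normal JavaScript bundle. The sketch below assumes an Express server and a hypothetical prerenderPage() helper, which in practice would be a headless-browser renderer or a cached snapshot.

```typescript
// A minimal dynamic-rendering sketch. The bot detection and prerenderPage()
// helper are simplified placeholders for illustration only.
import express from "express";

const app = express();
const BOT_UA = /Googlebot|AdsBot-Google|bingbot/i;

// Hypothetical helper: return pre-rendered HTML for the requested URL.
async function prerenderPage(url: string): Promise<string> {
  return `<html><body><!-- static snapshot of ${url} --></body></html>`;
}

app.get("*", async (req, res, next) => {
  if (BOT_UA.test(req.get("user-agent") ?? "")) {
    // Crawlers receive fully rendered HTML, so no second rendering pass is needed.
    res.send(await prerenderPage(req.originalUrl));
  } else {
    // Regular visitors get the normal JavaScript-driven app.
    next();
  }
});

app.use(express.static("dist"));
app.listen(3000);
```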

The best course of action, though, would be to simply rely primarily on HTML and CSS for time-sensitive content.
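
In practice, that simply means keeping the time-sensitive content in the initial HTML payload and treating JavaScript as an enhancement. A simplified example (the /api/comments/count endpoint is made up):

```html
<!-- Illustrative only: the time-sensitive content lives in plain HTML,
     so it can be indexed on the first pass; JavaScript only enhances it. -->
<article>
  <h1>Breaking: product launch announced</h1>
  <p>The key facts sit in the markup itself, not injected by a script.</p>
</article>
<script>
  // Non-critical enhancement (a live comment count) can load afterwards.
  fetch("/api/comments/count")
    .then((response) => response.json())
    .then((data) => {
      document
        .querySelector("article")
        .insertAdjacentHTML("beforeend", `<p>${data.count} comments</p>`);
    });
</script>
```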

Splitt takes time to explain that JavaScript is not inherently bad for your SEO or search rankings. Once they are indexed, JS-heavy sites “rank just fine.” The issue is ensuring content is crawled and indexed as quickly and efficiently as possible, so you can always be on the cutting edge.

The discussion gets pretty technical, but you can watch the entire conversation in the full video below: