Posts

No matter how bad of shape your website is in, Google will crawl it. Google crawls and indexes seemingly the entire internet. Though we know they may not look as deep into low-quality websites, that doesn’t mean they haven’t at least crawled and indexed the landing page. It takes something truly special to keep Google from crawling and indexing a page, but there are two common mistakes that can actually manage to keep Google away.

Technical SEO is one of the most difficult aspects of optimization to grasp, but if you are making these two simple mistakes, it can keep search engines, especially Google, from correctly indexing your websites. If your site isn’t getting correctly indexed, you have absolutely no chance of ranking well. Until you fix the problem your site is going to be severely crippled, so it is imperative you aren’t ignoring these issues.

1. The 301 Redirects on Your Website are Broken

It is a commonly accepted practice to use 301 redirects after a website redesign. As Free-SEO-News mentioned in their latest newsletter, using these redirects properly allows you to retain the ranking equity you’ve built with your website, rather than having to start again from the bottom.

The problem is when these 301 redirects aren’t implemented properly. Even worse, sometimes properly working redirects can suddenly falter, so you can’t place your faith in the redirects working correctly forever. Code changes, new plugins, or broken databases can cause your working 301’s to begin linking to non-existing pages.

Broken links are an automatic wrecking ball to all your efforts building a solid link portfolio. The best way to ensure that all your links are working is to download a website audit tool, such as SEOprofiler, which automatically checks all of your links and redirects. If your links or redirects suddenly stop working, you will be warned before you start getting punished by the search engines.

2. Rel=canonical Attributes Are Causing Problems

Just as with 301 redirects, the rel=canonical attribute serves a legitimate purpose when used correctly. The attribute can help you avoid problems with duplicate content, but those using the tag without knowing what they are doing can find themselves with some major issues.

Two of the biggest faux pas that we see regularly committed by site owners are to add a rel=canonical attribute which points to the index page to all web pages or to other pages that use the ‘noindex’ attribute. In both scenarios, Google won’t index the web pages at all.

The best advise is to simply stay away from the rel=canonical attribute unless you are absolutely sure what you’re doing. The only proper time to use the attribute is on duplicate pages, and anywhere else will result in significant problems. The problems that can come from using the attribute incorrectly are much worse than those you might see by failing to use the tag on duplicate pages.

Technical SEO can be interesting, but no one likes coming across the same problems time and time again. That’s why it’s shocking how many websites are struggling with the same issues.

Here are some of the most frequent issues that can found while doing a site audit. We also have the solutions, so you can be prepared if you come across any of these issues.

1) Uppercase vs. Lowercase URLs – This happens most often on sites that use .NET. The server is configured to respond to URLs with uppercase letters and doesn’t redirect or rewrite to lowercase versions. This issue is slowly disappearing because search engines are improving a lot at recognizing canonical versions and disregarding copies. Just because it is going away doesn’t mean this issue should be ignored. Search engines still make mistakes doing this, so don’t rely on them.

Luckily, there is a an easy fix for this issue in the form of a URL rewrite module, which solves the issue on IIS 7 servers. There is a convenient option inside the interface that allows you to enforce lowercase URLs. If you do this, a rule is added to the web config file and this problem is gone.

2) Multiple Versions of the Homepage – If you are auditing a .NET website, go check to see if www.example.com/default.aspx exists. Most likely, it does. The page is a duplicate that search engines often find via navigation or XML sitemaps. Other platforms will instead make URLs like www.example.com/index.html or www.example.com/home. Most contemporary search engines automatically fix the problem, but why not make sure there isn’t an issue to be fixed?

The best way to solve this problem begins with doing a crawl of the site and exporting it into a CSV filtered by META title column. Do a search for the homepage title and you’ll quickly spot duplicates of your homepage. An easy fix for these duplicates is to add a 301 redirect version of the page that directs to the correct version.

You can also do a crawl with a tool like Screaming Frog to find internal links that point to the duplicate pages. Then, you can edit the duplicate pages so they direct to the correct URL. Having internal links that go via 301 can cost you some link equity.

3) Query Parameters Added to the End of URLs – This issue is most common on database driven eCommerce websites because there are tons of attributes and filtering options. This means often you will find URLs like www.example.com/product-category?color=12. In that example, the product is filtered by color. Filtering like this can be good for users, but bad for searches. Unless your customers do not search for the specific product using color, the URL is probably not the best landing page to target with keywords.

Another issue that tends to show up on tons of crawls of sites is when these parameters are combined together. The worst is when the parameters can be combined in different orders but return the same content, such as:

www.example.com/product-category?color=12&size=5 

www.example.com/product-category?size=5&color=12

Because both of these have different paths but return the same content, they are seen as duplicate content. It is important to remember Google allocates crawl budget based on PageRank. Make sure your budget is being used efficiently.

To begin fixing this issue, you need to address which pages you want Google to crawl and index. Make this decision based on keyword research and cross reference all database attributes with your core target keywords. You need to figure out what attributes are keywords used to find products. In figuring this out, it is possible to find high search volume for certain keywords, for example “Nike” = “Running Shoes.” If you find this, you want a landing page for “Nike Running Shoes” to be crawlable and indexable. Make sure the database attribute has an SEO friendly URL and ensure that the URLs are part of the navigation structure of your site so that a good flow of PageRank users can find the pages easily.

The next step depends on whether you want the specific attribute indexed or not. If the URLs are not already indexed, add the URL structure to your robots.txt file and test your regex properly to make sure you don’t block anything accidentally. Also, make sure you use the Fetch as Google feature in Webmaster tools. Remember, however, if the URLs are already indexed, adding them to your robots.txt file will not remove them.

If the URLs are indexed, unfortunately you are in need of the rel=canonical tag. If you inherit one of these situations and are not able to fix the core of the issue, the rel=canonical tag covers the issue in hope that it can be solved later. You’ll want to add this tag to the URLs you do not want indexed and point to the most relevant URL you do want indexed.

4) Soft URL Errors – A soft 404 is a page that looks like a 404 but returns a HTTP status code 200. If this happens, the user sees something resembling “Sorry the page you requested cannot be found”, but the code 200 tells search engines that the page is still working. This disconnect can be the source of the issue with pages being crawled and indexed when you don’t want them to be. A soft 404 also means real broken pages can’t be found.

Thankfully, this problem has a very easy fix for any developer who can set the page to return a 404 status code instead of a 200. You can use Google Webmaster tools to find any soft 404s Google has detected. You can also perform a manual check by going to a broken URL and seeing what status code is returned.

5) 302 Redirects Instead of 301 Redirects – Because users won’t be able to tell there is even a problem, this is a pretty easy problem for developers to make. A 301 redirect is permanent. Search engines recognize this and send link equity elsewhere. A 302 redirect is temporary and search engines will expect the original page to return soon, which leaves link equity where it is.

Find 302s by using a deep crawler like Screaming Frog. It allows you to filter by 302s, which you can then check individually. You can then ask your developers to change any that should be 301s.

6) Broken or Outdated Site Maps – XML sitemaps may not be essential, but they are very useful to search engines that make sure they can find all the URLs that matter. XML sitemaps help show the search engines what is important. Letting your sitemap become outdated causes them to contain broken links and miss any new content and URLs. Keeping sitemaps updated is especially important for big sites that add new pages frequently. Bing also penalizes sites with too many issues in their sitemaps.

Audit your current sitemap for broken links. After, speak to your developers about updating your XML sitemap and make it dynamic so that it updates frequently. How frequently depends on your resources, but doing this will save you a lot of trouble later.

It is very possible you will come across other issues while doing an audit, but, hopefully, if you come across any of these, you are now prepared to fix the problem.

 

For more Technical SEO Problems, read this article by Paddy Moogan at SEOmoz.