Tag Archive for: Screaming Frog

Technical SEO can be interesting, but no one likes coming across the same problems time and time again. That’s why it’s shocking how many websites are struggling with the same issues.

Here are some of the most frequent issues that can found while doing a site audit. We also have the solutions, so you can be prepared if you come across any of these issues.

1) Uppercase vs. Lowercase URLs – This happens most often on sites that use .NET. The server is configured to respond to URLs with uppercase letters and doesn’t redirect or rewrite to lowercase versions. This issue is slowly disappearing because search engines are improving a lot at recognizing canonical versions and disregarding copies. Just because it is going away doesn’t mean this issue should be ignored. Search engines still make mistakes doing this, so don’t rely on them.

Luckily, there is a an easy fix for this issue in the form of a URL rewrite module, which solves the issue on IIS 7 servers. There is a convenient option inside the interface that allows you to enforce lowercase URLs. If you do this, a rule is added to the web config file and this problem is gone.

2) Multiple Versions of the Homepage – If you are auditing a .NET website, go check to see if www.example.com/default.aspx exists. Most likely, it does. The page is a duplicate that search engines often find via navigation or XML sitemaps. Other platforms will instead make URLs like www.example.com/index.html or www.example.com/home. Most contemporary search engines automatically fix the problem, but why not make sure there isn’t an issue to be fixed?

The best way to solve this problem begins with doing a crawl of the site and exporting it into a CSV filtered by META title column. Do a search for the homepage title and you’ll quickly spot duplicates of your homepage. An easy fix for these duplicates is to add a 301 redirect version of the page that directs to the correct version.

You can also do a crawl with a tool like Screaming Frog to find internal links that point to the duplicate pages. Then, you can edit the duplicate pages so they direct to the correct URL. Having internal links that go via 301 can cost you some link equity.

3) Query Parameters Added to the End of URLs – This issue is most common on database driven eCommerce websites because there are tons of attributes and filtering options. This means often you will find URLs like www.example.com/product-category?color=12. In that example, the product is filtered by color. Filtering like this can be good for users, but bad for searches. Unless your customers do not search for the specific product using color, the URL is probably not the best landing page to target with keywords.

Another issue that tends to show up on tons of crawls of sites is when these parameters are combined together. The worst is when the parameters can be combined in different orders but return the same content, such as:

www.example.com/product-category?color=12&size=5 

www.example.com/product-category?size=5&color=12

Because both of these have different paths but return the same content, they are seen as duplicate content. It is important to remember Google allocates crawl budget based on PageRank. Make sure your budget is being used efficiently.

To begin fixing this issue, you need to address which pages you want Google to crawl and index. Make this decision based on keyword research and cross reference all database attributes with your core target keywords. You need to figure out what attributes are keywords used to find products. In figuring this out, it is possible to find high search volume for certain keywords, for example “Nike” = “Running Shoes.” If you find this, you want a landing page for “Nike Running Shoes” to be crawlable and indexable. Make sure the database attribute has an SEO friendly URL and ensure that the URLs are part of the navigation structure of your site so that a good flow of PageRank users can find the pages easily.

The next step depends on whether you want the specific attribute indexed or not. If the URLs are not already indexed, add the URL structure to your robots.txt file and test your regex properly to make sure you don’t block anything accidentally. Also, make sure you use the Fetch as Google feature in Webmaster tools. Remember, however, if the URLs are already indexed, adding them to your robots.txt file will not remove them.

If the URLs are indexed, unfortunately you are in need of the rel=canonical tag. If you inherit one of these situations and are not able to fix the core of the issue, the rel=canonical tag covers the issue in hope that it can be solved later. You’ll want to add this tag to the URLs you do not want indexed and point to the most relevant URL you do want indexed.

4) Soft URL Errors – A soft 404 is a page that looks like a 404 but returns a HTTP status code 200. If this happens, the user sees something resembling “Sorry the page you requested cannot be found”, but the code 200 tells search engines that the page is still working. This disconnect can be the source of the issue with pages being crawled and indexed when you don’t want them to be. A soft 404 also means real broken pages can’t be found.

Thankfully, this problem has a very easy fix for any developer who can set the page to return a 404 status code instead of a 200. You can use Google Webmaster tools to find any soft 404s Google has detected. You can also perform a manual check by going to a broken URL and seeing what status code is returned.

5) 302 Redirects Instead of 301 Redirects – Because users won’t be able to tell there is even a problem, this is a pretty easy problem for developers to make. A 301 redirect is permanent. Search engines recognize this and send link equity elsewhere. A 302 redirect is temporary and search engines will expect the original page to return soon, which leaves link equity where it is.

Find 302s by using a deep crawler like Screaming Frog. It allows you to filter by 302s, which you can then check individually. You can then ask your developers to change any that should be 301s.

6) Broken or Outdated Site Maps – XML sitemaps may not be essential, but they are very useful to search engines that make sure they can find all the URLs that matter. XML sitemaps help show the search engines what is important. Letting your sitemap become outdated causes them to contain broken links and miss any new content and URLs. Keeping sitemaps updated is especially important for big sites that add new pages frequently. Bing also penalizes sites with too many issues in their sitemaps.

Audit your current sitemap for broken links. After, speak to your developers about updating your XML sitemap and make it dynamic so that it updates frequently. How frequently depends on your resources, but doing this will save you a lot of trouble later.

It is very possible you will come across other issues while doing an audit, but, hopefully, if you come across any of these, you are now prepared to fix the problem.

 

For more Technical SEO Problems, read this article by Paddy Moogan at SEOmoz.