1 Easy Way to Audit Your XML Sitemap
How many times have you looked at an XML sitemap and said, “Why are these URLs in here?” One time, I was getting frustrated during an indexability audit for a client because Google wasn’t paying attention to canonicals, for a variety of reasons. One in particular was that the XML sitemap pointed not to the canonical, or clean, version of each URL, but to the ugly URL with tracking parameters. It raised the question, “If I tell Google about a page through the XML sitemap, and then immediately command that page to give credit to another version through the canonical, what will Google believe? If the site isn’t consistent, how is Google supposed to be consistent?”
Who uses a sitemap generator and calls it good? I know I did. Who audits that sitemap to make sure the generator picked only the good URLs? Hmm… How do you audit thousands and thousands of URLs from an XML sitemap?
In addition to this problem, earlier in March, SEOmoz did a Whiteboard Friday with Duane Forrester, the new senior project manager for Bing’s Webmaster Tools. About eight minutes in, Forrester talked about how important it is to Bing to have a clean, even “hyper clean,” sitemap. An XML sitemap must pass a quality threshold for Bing to recognize, use, and trust it. This means that you don’t want 404 pages, 301 or 302 redirects, or URLs whose canonical points somewhere else.
Again, the question is: How do I easily audit the quality of my XML sitemap? Well, after finally putting some serious thought into it, here is one of the easiest ways for anyone to audit their XML sitemap:
Parse the XML sitemap
There are 100 ways to skin a cat, so if you can write a quick script to do this, by all means, go ahead. For the non-nerdy type, there is a pretty simple process using Excel to get all of the URLs from your XML file.
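For the script route, here is a minimal sketch in Python of what "a quick script" might look like. It assumes the sitemap follows the standard sitemaps.org 0.9 schema; the function name extract_urls and the sample URLs are made up for illustration and are not part of any tool mentioned in this post.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text  # skip empty <loc/> elements
    ]


# Hypothetical sample; in practice you would read your own sitemap file.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/page?utm_source=feed</loc></url>
</urlset>"""

for url in extract_urls(SAMPLE):
    print(url)
```

Saving the printed list to a text file gives you one URL per line, which is the same end result as the Excel process.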
Clean Up the Excel File
Here is the easiest way to strip everything else out of the file so you just have your URLs.
Audit the URLs
Now that you have a list of the URLs in your sitemap, it is time to audit them for 404s, 301 and 302 redirects, and any canonical issues. I recently discovered a great tool for the job, and I am so glad I did. The Screaming Frog SEO Spider has got to be one of the “must have” resources for an SEO. It is expensive, but well worth the price if you need to dig into a site (and it works on Mac and PC!). They do offer a free version that will allow you to crawl up to 500 URLs, which will work for this example. So now that we have a list, we’re going to upload it to Screaming Frog’s SEO Spider.
The Spider will crawl through your list and return loads of valuable information (too much to explain in this post). For this exercise, it will list the HTTP status code for each URL. You can easily filter by status code to see 4XX client errors, 3XX redirects, 5XX server errors, and more from your XML sitemap.
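If you would rather script this check than use a crawler, the sketch below shows the general idea in Python using only the standard library. It is not how Screaming Frog works internally; the names fetch_status, classify, and flag_problems are hypothetical, and it assumes the server answers HEAD requests sensibly (some do not, in which case a GET would be needed).

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from silently following 3XX responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # the 3XX then surfaces as an HTTPError


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the raw HTTP status for a URL without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3XX, 4XX and 5XX all arrive here


def classify(status: int) -> str:
    """Bucket a status code the way you would filter it in a report."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "client error"
    if 500 <= status < 600:
        return "server error"
    return "other"


def flag_problems(statuses: dict[str, int]) -> dict[str, str]:
    """Keep only the URLs that should not be in a clean sitemap."""
    return {
        url: classify(code)
        for url, code in statuses.items()
        if classify(code) != "ok"
    }


# Offline example with made-up results; a real run would build this
# dict with {url: fetch_status(url) for url in your_url_list}.
example = {
    "https://www.example.com/": 200,
    "https://www.example.com/old-page": 301,
    "https://www.example.com/missing": 404,
}
print(flag_problems(example))
```

Anything this flags is a candidate for removal from the sitemap, or for replacement with the final destination URL.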
The tool will also list the canonical that each URL declares. I can then go through these and correct any sitemap URL that differs from the canonical I want to report to the search engines. To make it extra fluffy, the tool can export everything back to Excel, where you can slice and dice your reports to your heart’s content.
This is one of the easiest ways I know to audit an XML sitemap. If you have any other ways to make this easier, please feel free to chime in and share.
Posted originally: 2011-04-25 11:59:05






