What are the common issues found in XML sitemaps? What impact do they have and how do you fix them?
That's what we're going to be discussing today with a lady who, when she's not working on projects or writing about SEO, spends her time growing tomatoes, peppers, and herbs. She was recently mentored by Aleyda Solis as part of the Freelance Coalition for Developing Countries Tech SEO mentorship, and she is a freelance contractor and the founder of Tech SEO Journal. A warm welcome to the In Search SEO podcast, Katherine Nwanorue.
In this episode, Katherine shares four common XML sitemap issues and how to fix them, including:
Listing ineligible URLs
Unsupported HTML format error
Not declaring a page and its alternate version correctly
Having one larger sitemap for separate sections of a website
Katherine: Hi, David. Thanks for having me.
D: Hey, Katherine. Great to have you here. You can find Katherine over at techseojournal.com. So Katherine, why are XML sitemaps so important?
K: That's a good question. For me, there are two main benefits to having an XML sitemap. First, it helps search engines find your important pages. If you're dealing with a smaller website that has 500 pages or fewer, a good internal linking structure, and pages that don't change frequently, then an XML sitemap isn't a priority. But if you're dealing with a larger website, with content that changes frequently, a poor internal linking structure, and orphaned pages, then a sitemap makes sense because it can help search engines find your important pages.
But I'd also like to mention that having an XML sitemap doesn't guarantee indexing. It’s more of a hint, a clue, for search engines. It’s like saying to Google, "Hey, I have these really cool pages that I think should be indexed. Would you mind taking a look at them?"
The second benefit of having an XML sitemap is that it helps with troubleshooting SEO issues, particularly indexing issues. But I'll come back to this one.
D: Great, now you said for larger sites. Do you have a certain number of pages in mind where if a site has over a certain number of pages then it's a good idea to use an XML sitemap?
K: Google's documentation treats a site with about 500 pages or fewer as small, so the threshold is anything above that. But in most cases, using 1,000 pages or more as the cutoff is fine.
D: It's great to have definitive numbers to focus on there. So today, you're sharing the four common issues with XML sitemaps. Starting off with number one, listing ineligible URLs.
1. Listing Ineligible URLs
K: Yes, and by ineligible URLs, I mean URLs that return a 404 error, are blocked by robots.txt, have a noindex tag, or are being redirected. This usually comes up when you've created a sitemap manually and these error pages somehow made it into the file. Another reason is that your XML sitemap is static and doesn't update automatically, so when you create new pages on your website, they don't show up in the sitemap. This is a problem because instead of crawling your valid pages, search engines waste time trying to access heavily redirected content or pages they shouldn't be visiting. Google has also said that if it fails to crawl a sitemap after several attempts, it will eventually stop trying. That defeats the purpose of having an XML sitemap in the first place.
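To make the four kinds of ineligible URLs concrete, here is a minimal Python sketch, assuming you already have the sitemap XML, the site's robots.txt text, and results from your own crawl. The `crawl_results` mapping (URL to status code plus a noindex flag) and the function names are hypothetical. The robots.txt check uses the standard library's `urllib.robotparser`, which does not reproduce every detail of Google's own robots.txt matching (for example, path wildcards).

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value listed in a <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def find_ineligible(sitemap_xml, robots_txt, crawl_results, user_agent="Googlebot"):
    """Map each ineligible sitemap URL to the reasons it should not be listed.

    crawl_results is a hypothetical dict from your own crawl:
    url -> (HTTP status code, True if the page carries a noindex directive).
    """
    robots = urllib.robotparser.RobotFileParser()
    robots.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded

    problems = {}
    for url in sitemap_urls(sitemap_xml):
        reasons = []
        if not robots.can_fetch(user_agent, url):
            reasons.append("blocked by robots.txt")
        status, noindex = crawl_results.get(url, (None, False))
        if status is None:
            reasons.append("not crawled yet")
        elif 300 <= status < 400:
            reasons.append(f"redirect ({status})")
        elif status != 200:
            reasons.append(f"error ({status})")
        if noindex:
            reasons.append("noindex")
        if reasons:
            problems[url] = reasons
    return problems
```

Every URL the function returns is a candidate for removal from the sitemap.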
D: So if you keep ineligible URLs in your XML sitemap, then eventually Google will just ignore it, or perhaps not even trust your whole website as authoritative. That takes us to number two: an XML sitemap that generates an unsupported HTML format error.
2. Unsupported HTML Format Error
K: First, I'd like to describe what an HTML sitemap is. An HTML sitemap is a page whose content is links to your pages and sections on your website. It is usually meant for human users to navigate your site, and it is usually found in the footer section of your website. And yes, in most cases, search engines can also follow these links to find your pages. But that's where the similarity with an XML sitemap ends. An HTML sitemap doesn't have a modification date, and if you have videos, you probably won't even be able to include them in an HTML sitemap.
If you really want to tell search engines about your existing and updated content, then you should stick to an XML sitemap. But if you submit an XML sitemap and get an error saying it's in HTML format, chances are you're actually submitting a file in HTML format, or your sitemap has errors that make it difficult to read.
Another common reason this comes up is caching: a plugin, the server, or a configuration setting gets in the way and serves a file in HTML format instead.
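The "HTML format" symptom can also be checked outside Search Console. Below is a minimal sketch, assuming you have already fetched the sitemap URL yourself and have its `Content-Type` header and body as strings. `html_symptoms` is a hypothetical helper that only looks for the two most obvious signs that HTML, not XML, was served: an HTML media type and an HTML document at the start of the body.

```python
def html_symptoms(content_type: str, body: str) -> list[str]:
    """Return reasons to suspect a sitemap URL served HTML instead of XML."""
    symptoms = []
    # Ignore parameters such as "; charset=UTF-8" when comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        symptoms.append(f"Content-Type is {media_type}")
    # Look only at the start of the body, where a cached HTML page would begin.
    head = body.lstrip()[:1024].lower()
    if head.startswith("<!doctype html") or "<html" in head:
        symptoms.append("body starts like an HTML document")
    return symptoms
```

An empty list does not prove the sitemap is valid; it only rules out the most common caching symptom.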
D: Great tips there as well. Are there any benefits to ever having an HTML sitemap and an XML sitemap at the same time or would an XML sitemap by itself suffice?
K: There are benefits to having both. If you can get additional value from something, why not? If your pages have a really long crawl depth, where some pages are difficult to reach, then an HTML sitemap can help users get to the pages they should be reaching and navigate your website. An XML sitemap, on the other hand, is meant for search engines. So in this case, you're considering both users and crawlers. So yes, it's definitely a good idea to have both.
D: And you obviously talked about the importance of coding an XML sitemap correctly. Is there anywhere you recommend checking to see that your XML sitemaps are coded correctly?
K: Yes. If you're worried that your XML sitemap might be an HTML file, try submitting it in Google Search Console. You will likely get a response saying that it is an HTML sitemap. When that happens, to find out what's really causing it, open the XML sitemap in your browser and inspect the page with Chrome DevTools. If a plugin or server caching is involved, you'd likely see a list of things getting in the way. Once you find the caching functionality, you can change your configuration, clear your cache, and everything should go back to normal. But if you don't see any caching getting in the way, then your sitemap likely has errors. In that case, you would need to use an XML sitemap validator to check for errors or wrongly encoded attributes in your sitemap.
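Dedicated sitemap validators check much more than this, but a rough first pass can be sketched with Python's standard `xml.etree.ElementTree`. The `sitemap_errors` helper below is hypothetical: it reports where the XML stops being well-formed, whether the root element is a `urlset` in the sitemaps.org namespace, and whether each `<url>` has an absolute `<loc>`. It does not validate against the official XSD schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_errors(xml_text: str) -> list[str]:
    """Return human-readable problems found in a <urlset> sitemap."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, column = exc.position  # where the parser gave up
        return [f"not well-formed XML at line {line}, column {column}: {exc}"]

    if root.tag != NS + "urlset":
        return [f"unexpected root element {root.tag!r}; expected {NS}urlset"]

    errors = []
    for i, url in enumerate(root.findall(NS + "url"), start=1):
        loc = url.find(NS + "loc")
        if loc is None or not (loc.text or "").strip():
            errors.append(f"<url> #{i} has no <loc>")
            continue
        parsed = urlparse(loc.text.strip())
        # Sitemap URLs must be fully qualified (scheme and host included).
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"<url> #{i} has a non-absolute <loc>: {loc.text.strip()!r}")
    return errors
```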
D: Brilliant. And if your XML sitemap is coded correctly, if it's proper XML, is there any danger of a caching plugin freezing your XML sitemap so that it doesn't update correctly? Or is that not a danger with an XML file?
K: That is a danger. That’s why I recommend making sure your plugin isn't caching the sitemap. Many plugins have a setting that lets you cache your sitemaps, but you shouldn't use it, because it can cause the issues I mentioned. It can make the XML sitemap come up as an HTML file, and it can create many other errors, so it's best not to cache your XML sitemap. Please don't. It's just better not to.
D: And number three is not declaring a page and its alternate version correctly.
3. Not Declaring a Page and Its Alternate Version Correctly
K: This applies when you're implementing hreflang tags in your XML sitemap. In this case, you need to specify the URL you want to be indexed and all of its alternate versions, including itself. For example, let's say I have a page for English speakers on my website, and I would like it to be indexed. At the same time, I have two alternate versions: a German version for speakers in Switzerland and a Chinese version for users in China. To do this, I would specify the URL for the English page and then list all three versions: the German version for Switzerland, the Chinese version for China, and the English version itself.
These tags are also reciprocal. If each referenced alternate version doesn't point back to the others, there is a problem: your hreflang tags could be interpreted incorrectly or ignored completely.
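Here is a sketch of the example, assuming Python's standard `xml.etree.ElementTree` and placeholder `example.com` URLs. It builds one `<url>` entry per language version, and each entry lists every version (itself included) as an `xhtml:link` alternate, which is the sitemap hreflang layout Google documents. The hreflang codes `en`, `de-CH`, and `zh-CN` stand in for the English, Swiss German, and Chinese (China) pages. `missing_return_links` is a hypothetical checker for the reciprocity rule.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SM)          # serialize as the default namespace
ET.register_namespace("xhtml", XHTML)  # serialize alternates as xhtml:link


def build_hreflang_sitemap(alternates: dict[str, str]) -> ET.Element:
    """Build a <urlset> for one page and its language versions.

    alternates maps an hreflang code to the URL of that version.
    """
    urlset = ET.Element(f"{{{SM}}}urlset")
    for url in alternates.values():
        entry = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(entry, f"{{{SM}}}loc").text = url
        # Each version lists the full set, itself included, so every
        # annotation is matched by a return link from the other pages.
        for hreflang, href in alternates.items():
            ET.SubElement(entry, f"{{{XHTML}}}link",
                          rel="alternate", hreflang=hreflang, href=href)
    return urlset


def missing_return_links(urlset: ET.Element) -> list[str]:
    """List hreflang annotations in a <urlset> that break reciprocity."""
    links = {}
    for entry in urlset.findall(f"{{{SM}}}url"):
        loc = entry.findtext(f"{{{SM}}}loc").strip()
        links[loc] = {link.get("href") for link in entry.findall(f"{{{XHTML}}}link")}
    problems = []
    for loc, hrefs in links.items():
        if loc not in hrefs:
            problems.append(f"{loc} does not list itself")
        for href in sorted(hrefs):
            # An alternate that is missing from this sitemap also counts as unconfirmed.
            if href != loc and loc not in links.get(href, set()):
                problems.append(f"{loc} -> {href} has no return link")
    return problems
```

`ET.tostring(build_hreflang_sitemap(...), encoding="unicode")` turns the tree into the XML text you would publish.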
D: And that brings us up to number four, which is having one larger sitemap for separate sections of a website.
4. Having One Larger Sitemap for Separate Sections of a Website
K: Right. Currently, Google supports up to 50,000 URLs in a single sitemap, or a maximum sitemap size of 50 MB uncompressed, whichever limit you hit first. That doesn't mean that if you have 50,000 URLs you should list all of them in one sitemap. That isn't good practice, because it makes issues difficult to troubleshoot. You would have no idea which sections of your website have indexing or crawling issues, or which sections search engines and crawlers aren't getting into.
Ideally, you should segment your sitemaps by section. For example, if you have an e-commerce website, you could create one sitemap for your static pages (About Us, terms and conditions, etc.) and then separate sitemaps for your category pages. This way you can easily spot issues and filter the indexing reports in Google Search Console down to the sections of your website that aren't getting crawled and indexed as they should.
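Segmenting by section can be automated. Below is a minimal sketch that assumes a hypothetical e-commerce URL scheme: category pages live under `/category/`, product pages under `/product/`, and everything else counts as a static page. It writes one sitemap per section, splits any section above 50,000 URLs into numbered parts, refuses any file over 50 MB uncompressed, and lists every file in a sitemap index. The base URL `https://example.com/sitemaps/` and the file names are placeholders.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlparse

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SM)

MAX_URLS = 50_000             # per-sitemap URL limit
MAX_BYTES = 50 * 1024 * 1024  # 50 MB uncompressed

# Hypothetical URL scheme: path prefix -> section name.
SECTION_PREFIXES = {"/category/": "categories", "/product/": "products"}


def section_of(url: str) -> str:
    path = urlparse(url).path
    for prefix, section in SECTION_PREFIXES.items():
        if path.startswith(prefix):
            return section
    return "static"  # About Us, terms and conditions, etc.


def build_section_sitemaps(urls, base="https://example.com/sitemaps/", max_urls=MAX_URLS):
    """Return ({file name: sitemap bytes}, sitemap index bytes)."""
    by_section = defaultdict(list)
    for url in urls:
        by_section[section_of(url)].append(url)

    files = {}
    for section, members in sorted(by_section.items()):
        # Split oversized sections into numbered parts of at most max_urls.
        for part, start in enumerate(range(0, len(members), max_urls), start=1):
            urlset = ET.Element(f"{{{SM}}}urlset")
            for url in members[start:start + max_urls]:
                entry = ET.SubElement(urlset, f"{{{SM}}}url")
                ET.SubElement(entry, f"{{{SM}}}loc").text = url
            data = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
            if len(data) > MAX_BYTES:
                raise ValueError(f"{section} part {part} exceeds 50 MB; lower max_urls")
            files[f"sitemap-{section}-{part}.xml"] = data

    index = ET.Element(f"{{{SM}}}sitemapindex")
    for name in files:
        entry = ET.SubElement(index, f"{{{SM}}}sitemap")
        ET.SubElement(entry, f"{{{SM}}}loc").text = base + name
    return files, ET.tostring(index, encoding="utf-8", xml_declaration=True)
```

You then submit only the index, and each section's sitemap appears separately in Search Console, which is what makes the per-section filtering possible.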
D: Is there a maximum number of sitemaps that you can have for your site?
K: I can't really put a number to it right now.
D: What about checking and diagnosing your XML sitemaps on a regular basis? Is there something that you should be checking to see if there are errors once a month?
K: I wouldn't recommend once a month. If you really want to know what's going on with your website, you should be checking Google Search Console, because that's where you can see sitemap errors. You should check it periodically: daily if you can, or weekly, depending on how many pages you publish. If you publish content every single day, then it makes sense to check it often. But if your pages rarely change, then weekly is fine. But please, you have to check it.
D: People reading, listening to, or watching this may be thinking, "I need to understand more about XML sitemaps." Are there any resources you can recommend for people to find out more about them?
K: Yes. You should check Google's documentation. They have extensive documentation on what an XML sitemap is, how to create one, best practices, and how to manage sitemaps for multilingual sites. Almost every piece of information you need is there.
Pareto Pickle - Having a Good Internal Linking Structure
D: Superb. Let's finish off with the Pareto Pickle. Pareto says that you can get 80% of your results from 20% of your efforts. What's one SEO activity that you would recommend that provides incredible results for modest levels of effort?
K: I'd say a good internal linking structure. That's because it's the primary source of URL discovery for search engines: they can follow links within your site to get to the important pages on your website. Beyond that, you can use links to pass SEO value to other pages, to indicate the relative importance of a page over others, and to show the relationships between your pages. So yes, a good internal linking structure is best.
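Because internal links drive discovery, one quick health check is click depth: how many links a crawler must follow from the homepage to reach each page. Below is a minimal breadth-first-search sketch that assumes you have exported your internal links from a site crawl as a hypothetical `{page: [linked pages]}` mapping. Any page listed in the sitemap but unreachable from the homepage is an orphan, and only the sitemap is helping search engines find it.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the homepage: page -> minimum clicks to reach it."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit in BFS is along a shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def orphan_pages(sitemap_urls, links, home):
    """Sitemap URLs that no chain of internal links from the homepage reaches."""
    return sorted(set(sitemap_urls) - set(click_depths(links, home)))
```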
D: So does having a good internal linking structure mean that XML sitemaps aren't necessary?
K: Not really. As I mentioned, if you have a good internal linking structure and a smaller website, an XML sitemap isn't a priority. But some sites get more value from an XML sitemap than others. If you have a really large number of pages, you can have a good internal linking structure, but who doesn't want more value? Who doesn’t want search engines to get to their pages in time? So an XML sitemap is like a secondary precaution, while a good internal linking structure is the primary step.
D: Is an HTML Sitemap necessary if you have good internal links?
K: If users are still having trouble navigating your website, then an HTML sitemap makes sense.
D: Understood. So if you're having issues with getting URLs ranked, or perhaps with navigation, as you say, then an HTML sitemap could be good for both users and search engines. But if all of the pages you want indexed are indexed, then it's not necessary to have an HTML sitemap.
K: Yes, it is not necessary. It is nice to have in that case, but not a must-have.
D: Well, I've been your host, David Bain. You can find Katherine over at techseojournal.com. Katherine, thanks so much for being on the In Search SEO podcast.
K: Thank you for having me. It's been a pleasure.
D: And thank you for listening. Check out all the previous episodes and sign up for a free trial of the Rank Ranger platform over at rankranger.com.