May 6, 2021

Google Search Console showing "Sitemap could not be read" and "Couldn't fetch"

The issue

  1. We run an OpenCart website, which has a built-in default sitemap available at http://example.com/index.php?route=extension/feed/google_sitemap
  2. This sitemap has been submitted to Google Search Console → Sitemaps for years
  3. Google stopped fetching it about a year ago, and every attempt to re-submit it under the same URL (or its http/https and www/non-www variations) resulted in "Couldn't fetch" displayed in the list, as well as "Sitemap could not be read" on the drill-down page.

The confusion

  1. The support thread "Sitemap could not be read in new GSC" says it's a known bug:
    That the new console says 'couldnt fetch' is a bug in the console. Pending is the real status! Alas there is no real way (that I know of!) to tell between them. But can try using the URL Inspection on the sitemap url. it SHOULD say not indexed (it's a sitemap not a page, so shouldn't be indexed!), but can use the Live Test, to check if Googlebot CAN fetch it! Again as its a sitemap, NOT a page, DON'T use the 'Request Indexing' button! If Google can fetch it, it's most likely just Pending, which case just wait.
  2. The above kept me from experimenting with our sitemap, but after I had kept an eye on it for a few months the status still hadn't changed, so I decided to revisit the issue.

The solution

  1. Make your sitemap available under /sitemap.xml. I used the following Nginx rule: rewrite ^/sitemap.xml$ /index.php?route=extension/feed/google_sitemap break;
  2. Explicitly link to it using the robots.txt directive: Sitemap: https://www.example.com/sitemap.xml
  3. If you have a multistore (several domain names served by the same OpenCart copy), you may need a dynamic robots.txt that replies with the matching host, e.g. Sitemap: http<?=isset($_SERVER['HTTPS'])?'s':''?>://<?=$_SERVER['HTTP_HOST']?>/sitemap.xml. Make sure it's accessible under the default robots.txt name: rewrite ^/robots.txt$ /robots.php break;
  4. The newly submitted sitemap (essentially the same built-in OpenCart sitemap, submitted under a different URL) was fetched and showed Status=Success within a few minutes, so there was no need to actually wait.
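
For reference, the two rewrite rules from the steps above could sit together in the Nginx config roughly like this. This is only a sketch under a few assumptions: that the rules live at the server level, that an existing location block already passes .php requests to PHP, and that robots.php is the (multistore-only) script from step 3. The dots are escaped here so they match a literal "." rather than any character; the rest is the same as the rules quoted above.

```nginx
server {
    # ... existing listen / server_name / root directives stay as they are ...

    # Expose the built-in OpenCart sitemap under the conventional /sitemap.xml path.
    rewrite ^/sitemap\.xml$ /index.php?route=extension/feed/google_sitemap break;

    # Multistore only: answer /robots.txt from a dynamic robots.php script
    # (assumed to print the "Sitemap:" line for the requested host).
    rewrite ^/robots\.txt$ /robots.php break;

    # ... existing location blocks (including the PHP handler) stay unchanged ...
}
```

After reloading Nginx, opening /sitemap.xml and /robots.txt in a browser is a quick way to check that both respond before re-submitting the sitemap in Search Console.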
