Did you know that Google can choose to ignore a non self-referencing canonical URL if the content of the two pages doesn’t quite match? That leaves those pages indexed, increasing the risk of duplicate content and muddying the overall site quality signal. I suspected something was not quite right when one of the brands I work with went nuts on its canonicalisation strategy for a huge swathe of single/configurable products.
A few weeks down the line I noticed that a few URLs with a non self-referencing canonical were still receiving organic sessions. As you can imagine, I was scratching my head trying to understand why. I decided to do some research and found out I was not alone: it has happened across many websites.
One of the main reasons Google may choose to ignore a canonical is that the content on the two pages doesn’t match. Google treats rel=canonical as a hint rather than a binding directive, which would help to explain the case I touched on earlier in this post. The question now is: at what percentage would Google’s algorithms presume the on-page content is a match? 50%, 75%, and so on? Are the titles, descriptions, and headings contributing factors? I suppose some experimentation is needed to figure that one out. It would be a great study, so I might take that one up soon.
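If you fancy running that experiment yourself, a first step is putting a number on how much two pages overlap. Below is a minimal sketch in TypeScript that scores two blocks of page copy using Jaccard similarity over three-word shingles. To be clear, this metric is my own stand-in and not how Google measures a match; the function names and the shingle size are assumptions chosen purely for illustration.

```typescript
// Break text into overlapping word "shingles" (runs of `size` consecutive words).
function shingles(text: string, size: number = 3): Set<string> {
  const words: string[] = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const out = new Set<string>();
  for (let i = 0; i + size <= words.length; i++) {
    out.add(words.slice(i, i + size).join(" "));
  }
  return out;
}

// Jaccard similarity: shared shingles divided by all distinct shingles.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1; // two empty texts count as identical
  let shared = 0;
  a.forEach((s) => {
    if (b.has(s)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Score between 0 (nothing in common) and 1 (identical shingle sets).
// NOTE: an illustrative stand-in metric, not Google's actual method.
function contentSimilarity(textA: string, textB: string, size: number = 3): number {
  return jaccard(shingles(textA, size), shingles(textB, size));
}

// Two product descriptions that differ by a single word.
const score = contentSimilarity(
  "blue cotton shirt size m",
  "red cotton shirt size m"
);
console.log(score); // 0.5
```

Scoring each canonicalised page against its canonical target, then comparing the scores with whether Google honoured the canonical, would give a rough first view of where any threshold might sit.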
Another nugget of information: if you have a site with dozens or hundreds of pages with a non self-referencing canonical set, you can use Google Tag Manager (GTM) event tracking to listen out for organic sessions to those pages. That way you’ll be able to figure out whether Google has respected your canonical request.
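Here is a rough sketch of how that tracking could look. It checks whether the page’s canonical points somewhere else, checks whether the visitor arrived from a search engine, and if both are true pushes an event to the GTM dataLayer. The event name, the list of search hosts, and the helper function names are all my own assumptions, and the referrer check is deliberately crude. It is written in TypeScript, so it would need compiling to plain JavaScript before going into a GTM Custom HTML tag.

```typescript
// Hypothetical event name; use whatever fits your own GTM naming scheme.
const EVENT_NAME = "organic_hit_on_canonicalised_page";

// Referrer host fragments treated as organic search (an assumed, incomplete list).
const SEARCH_HOSTS = ["google.", "bing.", "duckduckgo.", "yahoo."];

// Normalise a URL for comparison: drop query string, hash and trailing slash.
function normalise(url: string): string {
  const u = new URL(url);
  return u.origin + u.pathname.replace(/\/+$/, "");
}

// True when a canonical exists and points at a different URL than the page.
function isNonSelfReferencing(pageUrl: string, canonicalHref: string | null): boolean {
  if (!canonicalHref) return false; // no canonical tag on the page
  const target = new URL(canonicalHref, pageUrl).href; // resolves relative hrefs
  return normalise(target) !== normalise(pageUrl);
}

// Crude organic check: does the referrer hostname look like a search engine?
function isOrganicReferrer(referrer: string): boolean {
  if (!referrer) return false;
  try {
    const host = new URL(referrer).hostname;
    return SEARCH_HOSTS.some((fragment) => host.includes(fragment));
  } catch {
    return false; // referrer was not a valid URL
  }
}

// Browser-only part: skipped when there is no document (e.g. under Node).
const g = globalThis as any;
if (typeof g.document !== "undefined") {
  const link = g.document.querySelector('link[rel="canonical"]');
  const canonicalHref: string | null = link ? link.getAttribute("href") : null;
  if (
    isNonSelfReferencing(g.location.href, canonicalHref) &&
    isOrganicReferrer(g.document.referrer)
  ) {
    g.dataLayer = g.dataLayer || [];
    g.dataLayer.push({
      event: EVENT_NAME,
      canonicalTarget: canonicalHref,
      pagePath: g.location.pathname,
    });
  }
}
```

In GTM you would then add a Custom Event trigger matching that event name and fire your analytics tag from it. Bear in mind the referrer is only a proxy for an organic landing, so treat the numbers as a signal to investigate rather than proof.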
Digging around further led me in the direction of this post: https://www.gsqi.com/marketing-blog/google-ignore-rel-canonical-different-content/
It’s certainly worth a read after you’ve finished up here. The article outlines in more depth how and why Google chooses to ignore a non self-referencing canonical. I’m happy to take questions on this or help your team out if you have the same issue.