I'm a digital marketer who came into the job through general marketing and content writing. My technical knowledge is still coming along, which I think is why this one has me stumped.
I do a regular SEO health check every two weeks for my clients' websites. This includes checking for errors or red flags in Google Search Console (GSC), among other things. A couple of months ago I helped set up a new website for a client (mostly the content). This week, when doing their check, I saw they had almost 60,000 pages indexed according to GSC's index status report. However, when I checked Google's SERPs for that domain, I could only find 20ish pages.
Considering the website is new and for a business based out of a small town, that seemed like a red flag. Looking into it, I saw some articles suggest that the problem could be due to "canonicalization, duplicate content, automatically generated pages, or that it has been hacked."
So here is what I have done so far, and what I've found.
The website is ecommerce, and the robots.txt file had not been fully completed to match the sites for the company's other locations. As a result, some of the dynamic inventory URLs for product listing pages were not disallowed. I have already fixed this, so the robots file now matches the other locations' sites.
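To double-check that the updated robots.txt really covers the dynamic inventory URLs, something like the rough Python sketch below could be used. The rules and URLs in it are made up for illustration (the real paths on the site will differ), and it relies on the standard library's `urllib.robotparser`, which does simple prefix matching and does not interpret wildcard patterns the way Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the real file's paths are
# not shown in this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /inventory/search
"""


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs from `urls` that the given rules disallow for `agent`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network call is made.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical sample: one dynamic listing URL and one normal product page.
sample = [
    "https://example.com/inventory/search?color=red&page=2",
    "https://example.com/products/blue-widget",
]

for url in blocked_urls(ROBOTS_TXT, sample):
    print("disallowed:", url)
```

Pasting in the live robots.txt text and a handful of real inventory URLs copied from GSC would show whether any dynamic URLs are still crawlable.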
I also checked the source code for the inventory pages (the website is built on an ecommerce website provider's platform; we did not develop it ourselves) and saw that the inventory pages did not have proper redirects for different URL forms. For example:
The website did not have redirects set up to the canonical URL for all inventory pages. I have put in requests for these redirects to be fixed.
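While waiting on the provider, a spot check along these lines might help confirm which inventory pages declare a canonical URL that differs from the URL they were reached at. This is only a sketch under assumptions: the URLs and HTML are invented, fetching the pages is left out, and it only reads the `rel="canonical"` link tag in the page source rather than testing the redirects themselves.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def canonical_mismatches(pages):
    """Given {url: html}, return {url: canonical} where the two differ."""
    mismatches = {}
    for url, html in pages.items():
        canonical = find_canonical(html)
        if canonical != url:
            mismatches[url] = canonical
    return mismatches


# Hypothetical pages: the second URL form points at the first as canonical.
pages = {
    "https://example.com/inventory/blue-widget":
        '<head><link rel="canonical" '
        'href="https://example.com/inventory/blue-widget"></head>',
    "https://example.com/inventory/blue-widget?sort=price":
        '<head><link rel="canonical" '
        'href="https://example.com/inventory/blue-widget"></head>',
}

for url, canonical in canonical_mismatches(pages).items():
    print(url, "->", canonical)
```

Any URL form that shows up in this list is one that should end up redirecting to (or at least consistently pointing at) the canonical URL once the provider's fix is in place; a value of `None` would mean the page declares no canonical at all.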
So that was a really long preface to my question: have I fixed the problem? I don't know if there is more I should be checking that would explain the 60,000 indexed pages in GSC. I'll find out in a few days, once GSC's data catches up to the dates the fixes were published, but I would like to know if there is more I should be doing in the meantime.