Why are there so many pages indexed?

by dmishardthough 6 replies
Hi there,

I'm a digital marketer who came into the job more through general marketing and content writing. My technical knowledge is still coming along, which I think is what is stumping me with this.

So the issue is this: I do a regular SEO health check every two weeks for my clients' websites. This includes checking for any errors or red flags in Google Search Console, among other things. A couple of months ago I helped set up a new website for a client (mostly the content). This week, when doing their check, I saw they had almost 60,000 pages indexed in GSC's Index Status report. However, when I checked Google's SERPs for that domain, I found only 20ish pages.

Considering the website is new and for a business based out of a small town, that seemed like a red flag. Looking into it, I saw some articles suggesting the problem could be due to "canonicalization, duplicate content, automatically generated pages, or that it has been hacked."

So here is what I have done so far, and what I've found.

The website is ecommerce, and the robots.txt file had not been fully completed to match the company's other website locations. As a result, some of the dynamic inventory URLs for product listing pages were not disallowed. I have already fixed this so the robots file matches the other businesses.
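A quick way to confirm that a robots.txt change covers the dynamic inventory URLs is to test sample URLs against the rules. The following is a minimal sketch using Python's standard `urllib.robotparser`. The rules and URL paths (`/inventory/search`, `/inventory/filter`) are hypothetical, since the actual file is not shown in this thread, and the standard-library parser only does plain prefix matching, so rules that rely on wildcards may behave differently for Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the site's real robots.txt is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /inventory/search
Disallow: /inventory/filter
"""


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


# Hypothetical sample URLs: one normal page and two dynamic listing URLs.
SAMPLE_URLS = [
    "https://www.domainname.com/inventory/",
    "https://www.domainname.com/inventory/search?make=x&page=2",
    "https://www.domainname.com/inventory/filter/color-red",
]

if __name__ == "__main__":
    for url in blocked_urls(ROBOTS_TXT, SAMPLE_URLS):
        print("disallowed:", url)
```

Pasting in the live robots.txt text and a handful of real inventory URLs shows which ones are still crawlable after the fix.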

I also checked the source code for the inventory pages (the website is through an ecommerce website provider, we did not develop the website ourselves) and saw that they did not have proper redirects on the inventory pages for different url forms. For example:

domainname.com/inventory/
www.domainname.com/inventory/

Neither form redirected to the canonical URL, and the same was true for all inventory pages. I have put in requests for these redirects to be fixed.
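To illustrate what the requested redirects should achieve, here is a small Python sketch that maps every variant of a URL (http or https, www or non-www) to one canonical form. It assumes, purely for illustration, that `https://www.domainname.com` is the canonical version; the thread does not say which form the site actually uses.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: https and the www host are canonical.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.domainname.com"


def canonical_url(url):
    """Map http/https and www/non-www variants of a URL to one canonical form."""
    # Accept scheme-less input such as "domainname.com/inventory/".
    parts = urlsplit(url if "://" in url else "//" + url)
    host = parts.netloc.lower()
    bare_host = host[4:] if host.startswith("www.") else host
    if bare_host != CANONICAL_HOST[4:]:
        return url  # a different site: leave it untouched
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, "")
    )


VARIANTS = [
    "domainname.com/inventory/",
    "www.domainname.com/inventory/",
    "http://domainname.com/inventory/",
    "https://www.domainname.com/inventory/",
]

if __name__ == "__main__":
    for variant in VARIANTS:
        print(variant, "->", canonical_url(variant))
```

Every variant should resolve to the same target. If the live site serves a page at each variant instead of redirecting, each variant can be treated as a separate URL.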

So that was a really long preface to my question: Have I fixed the problem? I don't know if there is more I should be checking that explains the 60,000 indexed pages in GSC. I'll find out in a few days when GSC's data catches up with the days that the fixes were published, but I would like to know if there is more I should be doing in the meantime.
#beginners area #indexed #pages
  • Profile picture of the author facebollywood
    Hey

    This happened to me too: GSC showed indexed pages that were not part of my website.
    There were only a few, though, so I did nothing and waited for Google to flag them. Once someone had clicked on those pages, Google showed them in GSC again; I resubmitted them there, and after some time Google removed them automatically.

    Hope this helps!
  • Profile picture of the author tritrain
    The count could include more than just the pages themselves: it may also cover media, or anything else that is served separately. Does the report show links to the indexed content? If so, take a look at some of it and see whether it is for seemingly random things.

    They may need to change how other content, such as media files, is served so that it is not indexed.
  • Profile picture of the author dmishardthough
    Annoyingly, it does not show any links, or at least I can't figure out how to find them despite looking it up. Do you know how I can find a list of indexed and noindexed content for the site? Are there any tools or programs I can use that can show that?
  • Profile picture of the author tritrain
    Yes, in Google Search Console you should see Index Status under Google Index.

    Make sure you have brought up the correct property (upper right of screen). Otherwise, you may be looking at a version that has not been properly indexed.

    If you do site:<domain> it should bring up all the content that is indexed. To see images, you need to click on the Images tab at the top of the screen, for example.
  • Profile picture of the author dmishardthough
    Hmm, I did the site:domain search for web and image results. For web results it only has 21 pages, which is what it should be. For image results it has far fewer results than comparable websites, so I don't think it is that.

    I can access the Index Status tool in Search Console and see the total number of indexed pages; that's what I used to see the nearly 60,000 indexed pages in the first place. But I don't see anywhere that shows me exactly which pages are being indexed. To make sure, I tried all the properties we have for the website (http, https, www, non-www), but only one showed any pages being indexed, and that is the one I was already looking at.

    I talked with a co-worker who said he had a similar problem with a few websites in the past. Apparently the problem went away after a little while, before he had figured out the cause or done anything about it. We're trying to think whether we made any major technical adjustments to all our client sites back then that might have fixed it without us realizing it at the time, but nothing is coming to mind.

    Thanks for your help!
  • Profile picture of the author DABK
    Using a WordPress site? Have plugins? You can end up with many image URLs just from the original setup (you know, thumb, small, medium, large). Also, if it is not properly set up, you can end up with a page for each category, and for each tag too.

    Or your site was taken over by someone who created tons of pages, then deleted them, or is now hiding them. Or you recently bought the domain name from someone whose site had been taken over by a spammer.