Google Web Crawl Problem

Hi Guys,

On checking Diagnostics for my website in Google's Webmaster Tools, I see the message "We had problems crawling the pages listed here". It then lists this page: http://MySite.com/index.html and says 100 pages are linking to it.

Well, firstly, the sites linking to it give this url: http://MySite.com - NOT http://MySite.com/index.html. I should know, I've built those links!

It's true I haven't got a page called http://MySite.com/index.html - but as I've said, the links DON'T point to that url anyway!

Can someone be good enough to help me out? This is very confusing!

Thanks,

Simon
#crawl #google #problem #web
  • Profile picture of the author angilina
    You can simply do a 301 redirect from http://MySite.com/index.html to http://MySite.com.

    I think that can solve the problem?
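For anyone who wants to see what such a redirect amounts to, here is a minimal Python sketch, assuming the site can run its own request handler. The names `redirect_target` and `RedirectHandler` are made up for illustration; on a typical shared host the same rule would normally live in the web server's configuration instead.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional


def redirect_target(path: str) -> Optional[str]:
    """Return the path to redirect to, or None if no redirect applies."""
    # Assumed rule for illustration: only the root index.html is redirected.
    if path == "/index.html":
        return "/"
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers /index.html with a 301 and everything else with a stub page."""

    def do_GET(self):
        target = redirect_target(self.path)
        if target is not None:
            # 301 marks the move as permanent, so crawlers can update the URL.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
            return
        # Placeholder body standing in for the real home page.
        body = b"home page"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Passing `RedirectHandler` to `http.server.HTTPServer` and requesting /index.html should then produce a 301 response with `Location: /`.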
    • Profile picture of the author Unexpected Error
      Originally Posted by angilina View Post

      You can simply do a 301 redirect from http://MySite.com/index.html to http://MySite.com.

      I think that can solve the problem?
      I don't think so, Angilina. Google wants to actually crawl http://MySite.com/index.html but can't - since that page doesn't exist! So a redirect wouldn't solve the problem that Google says it has.

      But the other thing, of course, is WHY it wants to crawl http://MySite.com/index.html from links that point to http://MySite.com!!!
      • Profile picture of the author Netcel
        I think a 301 redirect will work.

        In effect, you'll be telling Google that http://mysite.com/index.html is actually http://MySite.com, so it should stop trying to crawl http://mysite.com/index.html

        Originally Posted by Unexpected Error View Post

        I don't think so, Angilina. Google wants to actually crawl http://MySite.com/index.html but can't - since that page doesn't exist! So a redirect wouldn't solve the problem that Google says it has.

        But the other thing, of course, is WHY it wants to crawl http://MySite.com/index.html from links that point to http://MySite.com!!!
        • Profile picture of the author dburk
          Hi Simon,

          The Googlebot doesn't attempt to crawl pages unless there are links to them (except when testing for 404 response). While you might not have created those links you cannot stop other people from linking to pages that don't exist.

          Most of my websites do not have a file named index.html, and the Googlebot has never had a problem crawling those sites. I assume you have checked your site map and sitemap.xml files for this errant URL?

          Your server logs will show you the referrer that is linking to your non-existent URL. Contact them and ask them to update their links. You may also want to verify your own links by using Xenu's Link Sleuth (useful for finding broken links).
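To make the server-log check concrete, here is a rough Python sketch that counts referrers for requests to /index.html. It assumes the logs are in the common "combined" access log format; the sample lines, IP addresses and the `referrers_for` helper are invented for illustration, so adjust the pattern to whatever your host actually writes.

```python
import re
from collections import Counter

# Matches the request, status, size and referrer fields of a combined-format line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def referrers_for(lines, path="/index.html"):
    """Count the referrer values seen on requests for the given path."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("path") == path:
            counts[match.group("referrer")] += 1
    return counts


# Invented sample data, not taken from any real log.
sample = [
    '203.0.113.5 - - [10/Oct/2009:13:55:36 +0000] "GET /index.html HTTP/1.1" '
    '404 209 "http://example.org/links.html" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2009:13:56:01 +0000] "GET / HTTP/1.1" '
    '200 5120 "http://example.org/links.html" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2009:14:02:11 +0000] "GET /index.html HTTP/1.1" '
    '404 209 "-" "ExampleBot/1.0"',
]

counts = referrers_for(sample)
for referrer, hits in counts.most_common():
    print(hits, referrer)
```

A referrer of "-" means the request carried no referrer at all, which is common for crawler traffic, so those lines will not tell you which page holds the link.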
          • Profile picture of the author Unexpected Error
            Originally Posted by dburk View Post

            Hi Simon,

            The Googlebot doesn't attempt to crawl pages unless there are links to them (except when testing for 404 response). While you might not have created those links you cannot stop other people from linking to pages that don't exist.

            Most of my websites do not have a file named index.html, and the Googlebot has never had a problem crawling those sites. I assume you have checked your site map and sitemap.xml files for this errant URL?

            Your server logs will show you the referrer that is linking to your non-existent URL. Contact them and ask them to update their links. You may also want to verify your own links by using Xenu's Link Sleuth (useful for finding broken links).
            Thanks for your help. However, you are assuming that other people have added the link to the url http://mysite.com/index.html.

            They haven't. The links are MINE. I made them and the url I used was http://MySite.com. So why does google say the link is http://mysite.com/index.html when it certainly is not???
            • Profile picture of the author dburk
              Originally Posted by Unexpected Error View Post

              Thanks for your help. However, you are assuming that other people have added the link to the url http://mysite.com/index.html.

              They haven't. The links are MINE. I made them and the url I used was http://MySite.com. So why does google say the link is http://mysite.com/index.html when it certainly is not???
              I assume that it is a possibility. Why would you assume that the invalid backlinks had to be placed by yourself and that they couldn't be incorrect? That seems like a contradiction: if you didn't create invalid backlinks, then the only reasonable conclusion is that someone else did. :confused:

              How do you know that someone else has not linked to your website? You have no control over who links, or when and where they link. Have you researched your server logs to find out who the referrer might be?

              Have you used a tool like Link Sleuth to verify all links on your own site? There could be something there that wasn't put there by you; it happens to folks frequently. While it could be links placed by someone else, it's likely that the links were somehow inadvertently placed on your own website. Run the tool!

              The bottom line is that something somewhere is pointing to that URL; the Googlebot doesn't just fabricate URLs (except when testing for a valid 404 response).
              • Profile picture of the author Unexpected Error
                Originally Posted by dburk View Post

                I assume that it is a possibility. Why would you assume that the invalid backlinks had to be placed by yourself and that they couldn't be incorrect?
                Hi Don,

                I know, because I clicked on the links Google said pointed to http://www.MySite/index.html. They are links made by me. I clicked on the links Google said go to http://www.MySite/index.html and landed on my http://www.MySite.com home page.

                I appreciate all you say about broken links - but the links Google says go to my non-existent index page do not go there. This is an indisputable fact - I clicked those very same links myself and did not get a 404 message page - I got my http://www.MySite.com home page.
        • Profile picture of the author Unexpected Error
          Originally Posted by Netcel View Post

          I think a 301 redirect will work.

          In effect, you'll be telling Google that http://mysite.com/index.html is actually http://MySite.com, so it should stop trying to crawl http://mysite.com/index.html
          Ok, thanks very much.
  • Profile picture of the author tamtu
    By using Google's new canonical link element (rel="canonical") you can tell Google which URL is the preferred version of a page. Check that out.
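If you add the tag, a quick way to confirm it is really in the page is to parse the HTML and look for it. Here is a minimal Python sketch using only the standard library; the `CanonicalFinder` class, the `find_canonical` helper and the sample page are illustrative assumptions, not anything from Google.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Sample page for illustration; the tag belongs inside <head>.
page = (
    "<html><head>"
    '<link rel="canonical" href="http://MySite.com/">'
    "</head><body>Home</body></html>"
)
print(find_canonical(page))
```

Note that the canonical tag is a hint about which URL to prefer; it does not by itself make a missing page return anything other than a 404.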
    • Profile picture of the author Unexpected Error
      Originally Posted by tamtu View Post

      By using Google's new canonical link element (rel="canonical") you can tell Google which URL is the preferred version of a page. Check that out.
      Ok, thanks. I'll try it.
