robots.txt update clarification

3 replies
  • SEO
I am working on a very large, dynamically scripted site with many thousands of indexed pages. All of the URLs have changed, and I am reconstructing the robots.txt file.

My question is this: if I disallow a path whose pages Google has ALREADY indexed, that does NOT automatically remove the existing SERP entries, correct? Those entries would eventually drop out once bots stop crawling the pages, but only after the new URLs have been crawled? In other words, will Google see a disallowed path and say, 'go and immediately remove all these URLs from our index'? I want the old pages/URLs to drop off gradually as they get replaced by the new ones. Should I wait a while before disallowing an old path, until the new paths are in place? Or does robots.txt affect only the CRAWL?
#clarification #robotstxt #update
  • trevord92
    As far as I'm aware, robots.txt only affects the crawl - and even then, only if the robot chooses to honour it (Google will, others may not). So you're correct in thinking that old pages take a while to drop out of the index.
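    To illustrate that robots.txt governs only fetching, here is a minimal sketch of how a compliant crawler consults it, using Python's standard `urllib.robotparser`. The rules and URLs are hypothetical examples, not the poster's actual site.

```python
# A well-behaved crawler checks robots.txt before each fetch; nothing
# about this check touches pages already sitting in a search index.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# In practice you would call rp.set_url(...) and rp.read(); here we
# parse an inline ruleset for illustration.
rp.parse([
    "User-agent: *",
    "Disallow: /old-path/",
])

# A disallowed path is simply no longer fetched by compliant bots.
print(rp.can_fetch("*", "https://example.com/old-path/page.html"))  # False
print(rp.can_fetch("*", "https://example.com/new-path/page.html"))  # True
```

    Non-compliant bots skip this check entirely, which is why robots.txt is a crawl directive rather than an access control or de-indexing mechanism.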

    It may be quicker/better/easier to 301 (permanent) redirect the old pages to their new equivalents. That way, anyone finding your content in a current search will get taken to the latest page.

    On the scale you're talking about, that's probably a job for a techie.
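    If the old and new URL structures map to each other by pattern, a bulk 301 can sometimes be expressed in a single server rule. This is only a sketch assuming an Apache server with mod_rewrite enabled; the path names are hypothetical:

```apache
# Hypothetical example: permanently redirect everything under
# /old-section/ to the same slug under /new-section/.
RewriteEngine On
RewriteRule ^old-section/(.*)$ /new-section/$1 [R=301,L]
```

    Other servers (nginx, IIS) have equivalent mechanisms, which is part of why this is a job for someone familiar with the specific stack.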
  • DivyaRai
    Spiders might have already crawled your old web pages, so set up 301 redirects that send the old pages to the new ones. Learn about URL rewriting before applying the 301 redirects.
    • Profile picture of the author Steviebone
      Originally Posted by DivyaRai

      Spiders might have already crawled your old web pages, so set up 301 redirects that send the old pages to the new ones. Learn about URL rewriting before applying the 301 redirects.
      Of course spiders have already crawled the pages; how else would they be in the index? I am quite familiar with 301s and rewriting. A 301 is not possible for the script-driven functions governing hundreds of thousands of pages, and rewrite rules also assume certain operating systems and configurations. Thanks, though.

      The question was about how robots.txt is implemented.
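      Where server-level rewrite rules aren't an option, the dynamic script itself can emit the 301 before rendering anything. This is only a minimal sketch using a bare WSGI callable; the `map_old_to_new` helper and the path patterns are hypothetical stand-ins for whatever logic the site's scripts actually use:

```python
# Sketch: a dynamically scripted site issuing its own 301s, with no
# dependence on server config or operating system rewrite modules.

def map_old_to_new(path):
    # Hypothetical stand-in for the site's own old->new URL logic.
    return path.replace("/old/", "/new/")

def app(environ, start_response):
    path = environ["PATH_INFO"]
    if path.startswith("/old/"):
        # Permanent redirect: browsers and crawlers follow Location.
        start_response("301 Moved Permanently",
                       [("Location", map_old_to_new(path))])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"current page"]
```

      Because the redirect is produced by the application code, it works the same regardless of which web server or OS sits in front of it.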
