How to block URLs using robots.txt?

9 replies
  • SEO
Hi guys, I need urgent help. I want to block/disallow some broken links that show up in Google Webmaster Tools using robots.txt. I've searched everywhere on the web but couldn't find any correct information. If you know how, please help me.
#404 errors #block #block urls #broken links remove #robotstxt #urls #webmaster tools
  • Profile picture of the author HD Node
    To block all search engines from visiting certain pages on your site, use the following format:

    Code:
    User-agent: *
    Disallow: /pages/warrior/thisisthefirstpage.html
    Disallow: /pages/warrior/thisisthesecondpage.html
    If you just want to block Google, then replace 'User-agent: *' with 'User-agent: Googlebot'.
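    For example, keeping the same example paths as above, a Google-only block would look like this:

    Code:
    User-agent: Googlebot
    Disallow: /pages/warrior/thisisthefirstpage.html
    Disallow: /pages/warrior/thisisthesecondpage.html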
    • Profile picture of the author pushmon
      Originally Posted by HD Node View Post
      Thanks HD Node. I'd also like to know where to create and edit the robots.txt. I'm running a self-hosted WordPress site with the "Google XML Sitemap" plugin, which automatically generates the sitemap and robots.txt, and I can't find any robots.txt file on my server. Please help me solve this problem.
      • Profile picture of the author HD Node
        The robots.txt file should be placed in your root directory. Note that WordPress serves a "virtual" robots.txt when no physical file exists, which is why you don't see one on your server; if you upload a real robots.txt to the root, it will be used instead.
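        As a minimal sketch, assuming a typical cPanel-style host where the web root is public_html and your domain is example.com (both hypothetical, and the paths are the example ones from above):

        Code:
        # Save as robots.txt, upload to the web root
        # (e.g. public_html/robots.txt), then check that it
        # loads at http://www.example.com/robots.txt
        User-agent: *
        Disallow: /pages/warrior/thisisthefirstpage.html
        Disallow: /pages/warrior/thisisthesecondpage.html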
    • Profile picture of the author kaytav
      Originally Posted by HD Node View Post
      That's a perfect way to block pages from the crawlers. A robots.txt file helps a lot in keeping private pages, such as privacy policy or terms pages, from being crawled.
    • Profile picture of the author redchillies
      Originally Posted by HD Node View Post

      It is an ideal way of blocking the crawlers.
  • Profile picture of the author KirkMcD
    Don't block them. Use a 301 Redirect to send them to another page on your site.
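    If your site runs on Apache with .htaccess enabled, a minimal sketch of such a redirect (the replacement path here is hypothetical) would be:

    Code:
    # .htaccess in the web root (uses Apache mod_alias)
    # Permanently redirect the broken URL to a live page
    Redirect 301 /pages/warrior/thisisthefirstpage.html /pages/warrior/replacement.html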
  • Profile picture of the author FtechBlog
    According to Google AdSense, one of my articles was violating their rules and guidelines, so I was looking to block that URL. I have now blocked it in robots.txt as per the guidelines above. Is there anything else I should do?
  • Profile picture of the author samual james
    It would be better if you could find those broken links and fix them; otherwise, you can block the particular URL the broken link points to (as described above).
  • Profile picture of the author trevord92
    Just make sure you don't block individual product download pages by name, otherwise some human visitors will find them: robots.txt is publicly readable, so every URL listed in it is exposed. Block an entire directory instead and put your product download page inside it.

    And remember that robots.txt is only a suggestion: Google and the other major search engines will obey it, but other crawlers may be less scrupulous.
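    A minimal sketch of that directory-level approach, assuming a hypothetical /downloads/ directory:

    Code:
    # Only the directory name appears in the publicly
    # readable robots.txt; the file names inside it do not
    User-agent: *
    Disallow: /downloads/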