robots.txt file... need help

8 replies
Warriors,
I don't want a new site of mine to be indexed by any bot. What is the correct syntax in the robots.txt file to keep every page and everything else on my site from being indexed?

Thanks

Slap
  • askdanyal
    This will do it:

    User-agent: *
    Disallow: /
    • slappytheking
      So all I need to do is put those two lines into robots.txt and I'm done?
  • askdanyal
    Yes, this tells all search engine crawlers that honor robots.txt not to crawl your site.
  • sheenaroy
    User-agent: *
    Disallow: /

    This tells search engine crawlers not to crawl any pages or files on the website.
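If you want to check those two lines before uploading them, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the bot names, and the example.com URLs are only placeholders for illustration. It shows how a crawler that follows the rules would read the file; it says nothing about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set from this thread.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a rule-following crawler may fetch `url`.

    `is_allowed` is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder bot names and URL, for illustration only.
    for agent in ("Googlebot", "bingbot", "SomeOtherBot"):
        allowed = is_allowed(BLOCK_ALL, agent, "https://example.com/any/page.html")
        print(agent, "allowed" if allowed else "blocked")
```

An empty `Disallow:` value means the opposite (everything allowed), so the trailing slash matters.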
  • logoonlinepros1
    I hope this helps everyone. In robots.txt:

    User-agent: *
    Disallow: /

    And the robots meta tag, which goes in the head of each page:

    <html>
    <head>
    <title>...</title>
    <meta name="robots" content="noindex, nofollow">
    </head>
  • Webbro
    Yes, and the robots.txt file should be placed in the root directory of the site, just in case you were not sure.
    • dgmufasa
      Hello there,

      I had a similar question today about robots.txt (what a coincidence).

      Originally posted by Webbro:

      Yes, and the robots.txt file should be placed in the root directory of the site, just in case you were not sure.
      I have a domain (e.g. mydomain.com).

      I have also created subdomains (e.g. test1.mydomain.com, test2.mydomain.com) in these directories:

      directory:/.../public_html/test and
      directory:/.../public_html/test2

      I have installed WordPress on the subdomains. Should robots.txt go into /public_html/test1 and /public_html/test2, or into /.../public_html?

      TIA
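On the subdomain question: crawlers look for robots.txt separately on each hostname, at the path /robots.txt, so each subdomain needs its own copy in whatever directory serves as that subdomain's document root. Which directory that is depends on the hosting setup, so treat the mapping to /public_html/... as an assumption to verify. A rough Python sketch of where a crawler would look for a given page (the helper name `robots_txt_url` and the example URLs are made up for illustration):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for `page_url`.

    Hypothetical helper: keeps the scheme and host, replaces the path
    with /robots.txt, and drops any query string or fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Placeholder URLs based on the example hostnames in this thread.
    for url in (
        "http://mydomain.com/about",
        "http://test1.mydomain.com/blog/hello-world?preview=1",
        "http://test2.mydomain.com/",
    ):
        print(url, "->", robots_txt_url(url))
```

Fetching each printed URL in a browser is a quick way to confirm the file ended up in the right directory.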
  • RobinInTexas
    Just remember that there are hundreds of robots that ignore robots.txt.


    Here's a snippet from my .htaccess file that takes care of some of them, along with other traffic that I don't think is helpful. Anything on the list doesn't get to see any of the site, including the robots.txt file.

    The first RewriteCond below blocks a large range of IP addresses in China. I have almost 100 lines of IP ranges that I block based on identified web crawlers.

    Code:
    RewriteEngine On
    # Block an IP range (110.80.x.x to 110.87.x.x); the dots are escaped so they match literally
    RewriteCond %{REMOTE_ADDR} ^110\.8[0-7]\. [OR]
    # Block requests that send an empty User-Agent
    RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
    # Block user agents that match known crawlers and scraping libraries
    RewriteCond %{HTTP_USER_AGENT} _Bot/ [OR]
    RewriteCond %{HTTP_USER_AGENT} Baidu [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} feedparser [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} linkdex [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} magpie [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^php [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} YYSpider [NC]
    # Return 403 Forbidden for any request that matched a condition above
    RewriteRule .* - [F]
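To sanity-check which requests a rule list like this would catch before putting it on a live site, the conditions can be mirrored in Python. This is only a rough sketch, not how Apache evaluates them internally: it assumes the dots in the IP pattern are meant literally, treats the patterns marked [NC] as case-insensitive, and uses a made-up helper name `is_blocked` with made-up sample requests.

```python
import re

# Mirrors the REMOTE_ADDR condition (assumes literal dots: 110.80.x.x to 110.87.x.x).
IP_PATTERN = re.compile(r"^110\.8[0-7]\.")

# Case-sensitive User-Agent conditions (no NC flag in the snippet).
UA_PATTERNS = [re.compile(r"^$"), re.compile(r"_Bot/")]

# Case-insensitive User-Agent conditions (NC flag in the snippet).
UA_PATTERNS += [
    re.compile(pattern, re.IGNORECASE)
    for pattern in ("Baidu", "feedparser", "linkdex", "magpie", r"^php", "Yandex", "YYSpider")
]


def is_blocked(remote_addr: str, user_agent: str) -> bool:
    """Return True if any condition matches (the conditions are OR-ed together)."""
    if IP_PATTERN.search(remote_addr):
        return True
    return any(pattern.search(user_agent) for pattern in UA_PATTERNS)


if __name__ == "__main__":
    # Hypothetical sample requests, for illustration only.
    samples = [
        ("110.85.1.2", "Mozilla/5.0"),
        ("203.0.113.7", ""),
        ("203.0.113.7", "Mozilla/5.0 (compatible; YandexBot/3.0)"),
        ("203.0.113.7", "Mozilla/5.0 (Windows NT 10.0)"),
    ]
    for addr, agent in samples:
        print(addr, repr(agent), "blocked" if is_blocked(addr, agent) else "allowed")
```

Running a sample of real log entries through a check like this can show whether a pattern is broader than intended.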
    Signature

    Robin



    ...Even if you're on the right track, you'll get run over if you just set there.