How to block my site from Indexing on Google/Yahoo/MSN?

21 replies
Is it possible to do it through .htaccess?

Please advise. Thanks.
  • Profile picture of the author WritingFargo
    Use robots.txt. You can create this file from within your Webmaster Tools account.
  • Profile picture of the author PrincessJasmine
    I am not looking to sign up for Webmaster Tools for that site. Is there any other suggestion?

    What should I put in the .htaccess file?
    Thanks.
  • Profile picture of the author WritingFargo
    Well, just use the following code in your robots.txt and place the file in the website root.

    User-agent: *
    Disallow: /
    • Profile picture of the author PrincessJasmine
      Thanks.
      But will this block direct traffic as well (or even visitors following links from other sites)?
    • Profile picture of the author PrincessJasmine
      Originally Posted by WritingFargo View Post

      Well, just use the following code in your robots.txt and place the file in the website root.

      User-agent: *
      Disallow: /
      I know you said to put it only in the root,
      but suppose I have a few sub-folders. Should I put it into them as well?
      Thanks.
  • Profile picture of the author gamebak
    The best thing to do is to use robots.txt. Another method is to check whether the visitor is a Google robot and change the content served to it (this is called cloaking).
    • Profile picture of the author davidlarson87
      You can also use:
      <META name="robots" content="noindex, nofollow"/>
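
If you want to double-check that the tag actually made it into a page, here is a minimal sketch using only Python's standard html.parser module. The class name RobotsMetaFinder, the helper robots_directives, and the sample page are made up for illustration; they are not part of any tool mentioned in this thread.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META ...> matches too.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.append(directive)


def robots_directives(html):
    """Return the list of robots directives found in the given HTML text."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Hypothetical sample page containing the tag from the post above.
page = (
    "<html><head>"
    '<META name="robots" content="noindex, nofollow"/>'
    "</head><body>Hello</body></html>"
)
print(robots_directives(page))
```

A page without the tag simply gives back an empty list, so the same helper can be run over several saved pages to spot the ones you missed.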
  • Profile picture of the author scriptkid
    I know you said to put it only in the root,
    but suppose I have a few sub-folders. Should I put it into them as well?
    Thanks.
    You don't need to add it to all the sub-folders. Just add it to the root folder; that's enough. The rules in it apply to all the sub-folders too.

    Thanks.
    But will this block direct traffic as well (or even visitors following links from other sites)?
    No, it doesn't block direct traffic. It only tells search engine crawlers not to crawl your pages.
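
For anyone who would rather check this than take it on faith, here is a rough sketch using Python's standard urllib.robotparser module. It parses the two-line robots.txt from this thread in memory and asks whether a crawler may fetch a few paths. The host example.com, the sample paths, and the helper name build_parser are placeholders made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt suggested earlier in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Placeholder host and paths, including pages inside sub-folders.
sample_paths = ["/", "/index.html", "/subfolder/page.html", "/a/b/c/"]
for path in sample_paths:
    allowed = parser.can_fetch("Googlebot", "http://example.com" + path)
    print(path, "->", "allowed" if allowed else "blocked")
```

Every path comes back as blocked, sub-folders included, because a single rule at the root matches by prefix. Note that this only models what a well-behaved crawler does; it has nothing to do with ordinary visitors.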
  • Profile picture of the author RobKonrad
    Hey Princess,

    If you don't feel comfortable working on "raw" files in your root directory, use this plugin:

    WordPress › KB Robots.txt « WordPress Plugins

    That will let you add whatever you want to your robots.txt directly from WITHIN WordPress, without the need to edit any files by hand.

    Cheers,
    Rob
  • Profile picture of the author PrincessJasmine
    UPDATE:

    I made a sub-domain and placed the robots.txt on there.
    I don't want my sub-domain to be crawled/indexed by search engines.

    Since the robots.txt is placed in the sub-domain directory,
    will that affect my 'main domain' as well? I still want my main domain to be crawled and indexed.

    Thanks.
  • Profile picture of the author robby1
    Banned
    [DELETED]
    • Profile picture of the author PrincessJasmine
      Originally Posted by robby1 View Post

      Hi,
      just upload a robots.txt file to your site and list which parts of the site you don't want indexed by Google and other search engines.
      How do I do that?
      Right now, I have this in my robots.txt (in my sub-domain directory):


      User-agent: *
      Disallow: /

      Thanks again!
  • Profile picture of the author mrfriend
    In the WordPress dashboard, go to Settings -> Privacy and select "Ask search engines not to index this site." You are done.
  • Profile picture of the author PrincessJasmine
    Thanks. I am not using wordpress.

    Well, I thought this would be a simple question...
    All I want to know is whether a robots.txt in my sub-domain directory
    will impact the root directory as well.

    Thanks again!
    • Profile picture of the author drewhowell21
      If you're wanting to stop Google from crawling and indexing use this:

      Code:
      User-agent: *
      Disallow: /folder1/
      
      User-Agent: Googlebot
      Disallow: /folder1/
      Here folder1 would be the folder in which your subdomain is located (followed by a / ).
      • Profile picture of the author Karen Blundell
        Originally Posted by drewhowell21 View Post

        If you're wanting to stop Google from crawling and indexing use this:

        Code:
        User-agent: *
        Disallow: /folder1/
        
        User-Agent: Googlebot
        Disallow: /folder1/
        Here folder1 would be the folder in which your subdomain is located (followed by a / ).
        This is the correct way to do it!
    • Originally Posted by PrincessJasmine View Post


      Well, I thought this would be a simple question...
      All I want to know is whether a robots.txt in my sub-domain directory
      will impact the root directory as well.
      No, it will not affect your main webroot if it is in the subdomain root.
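
The reason is that crawlers look for robots.txt at the root of each host separately, so a subdomain and the main domain each get their own file. Here is a small sketch of that lookup, using only Python's standard urllib.parse. The function name robots_url and the example.com hosts are placeholders chosen for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt location that governs the given page URL.

    The file is always looked up at the root of the page's own
    scheme and host, never on a parent or sibling host.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Placeholder hosts: a subdomain and its main domain.
print(robots_url("http://sub.example.com/some/page.html"))
print(robots_url("http://example.com/some/page.html"))
```

The two calls print different robots.txt addresses, which is why a "Disallow: /" served from the subdomain leaves the main domain's crawling rules untouched.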
  • Profile picture of the author rainso0
    It’s actually super simple! First, you create a text file called robots.txt using Notepad or any text editor. Now let’s say you want to block your entire website from being indexed by the search engines. You would add these lines to your text file:

    User-agent: *
    Disallow: /

    The User-agent refers to the robot that is crawling your website, i.e. Google, Yahoo, etc. * means all robots. Note that a robot, such as a spam robot, can ignore your file altogether if it chooses to.
    Only use a robots.txt file to block content from being indexed by major search engines, not for hiding information. If someone comes to your website, a robots.txt file will not prevent them from accessing that webpage and viewing it. So just make sure you understand what the file does: it keeps your site from showing up in Google search results pages (Yahoo and MSN also).
  • Profile picture of the author christinejones
    Banned
    You can block or remove pages using a robots.txt file. A robots.txt file restricts access to your site by search engine robots that crawl the web.
  • Profile picture of the author Allcityloan
    Use index.txt file for restrict your site from search engine
    • Profile picture of the author saxatwork
      Originally Posted by Allcityloan View Post

      Use index.txt file for restrict your site from search engine
      "Index.txt" ??? I didn't know you could do that... how's that done???
  • Profile picture of the author JayWiz
    Here are the steps:
    1. Create robots.txt in your root folder
    2. Copy this code

    User-agent: *
    Disallow: /

    => To disallow for all folders

    User-agent: *
    Disallow: /myfolder1

    => To disallow for myfolder1 only

    User-agent: *
    Disallow: /private_file.html

    => To disallow for specific files

    You can learn more by searching on Google.
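
To see how the folder and file rules above behave, here is a quick sketch with Python's standard urllib.robotparser. It is only a local check of how a rule-following crawler would read the file. The helper name blocked_paths, the host example.com, and the sample paths are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the paths that the given robots.txt text disallows for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path
        for path in paths
        if not parser.can_fetch(agent, "http://example.com" + path)
    ]


# The folder-only and file-only rule sets from the steps above.
FOLDER_ONLY = "User-agent: *\nDisallow: /myfolder1\n"
FILE_ONLY = "User-agent: *\nDisallow: /private_file.html\n"

# Placeholder paths to test against each rule set.
paths = ["/", "/myfolder1/page.html", "/other/page.html", "/private_file.html"]

print(blocked_paths(FOLDER_ONLY, paths))
print(blocked_paths(FILE_ONLY, paths))
```

With the folder rule only the page under /myfolder1 is reported as blocked, and with the file rule only /private_file.html is, so the rest of the site stays open to crawlers.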