11 replies
  • SEO
I would like to know which parts of a website an SEO expert should disallow in the robots.txt file. Which pages are better not to show to search engines?
#file #robots.txt #robotstxt
  • johnvictor
    Robots.txt files tell search engine spiders how to crawl and index your content.

    * By default search engines are greedy. They want to index as much high quality information as they can, and they will assume that they can crawl everything unless you tell them otherwise.
    * If you specify data for all bots (*) and data for a specific bot (like GoogleBot) then the specific bot commands will be followed while that engine ignores the global/default bot commands.
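    The precedence rule above can be illustrated with a minimal robots.txt (the /private/ and /tmp/ paths here are hypothetical):

    User-agent: *
    Disallow: /private/

    User-agent: Googlebot
    Disallow: /tmp/

    Googlebot follows only its own group, so it may crawl /private/ but not /tmp/, while every other bot skips /private/ and ignores the Googlebot group.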
  • PromoDirect
    The following pages are commonly disallowed in robots.txt:

    1. Login page
    2. 404 error page
    3. Any other page that you don't want indexed.
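    For example, such a file might look like this (the paths are hypothetical and depend on your site's structure):

    User-agent: *
    Disallow: /login/
    Disallow: /404.html

    Keep in mind that robots.txt only blocks crawling; a disallowed URL can still appear in results if other sites link to it, so a noindex meta tag is the safer tool when the goal is true de-indexing.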
  • ghazia
    Internal search result pages can be blocked from being crawled. You can use this directive in robots.txt to block search pages with query strings:


    Disallow: /*?
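    A Disallow rule only takes effect inside a User-agent group, so the complete file would be:

    User-agent: *
    Disallow: /*?

    Note that the * wildcard in paths is an extension honored by the major engines such as Google and Bing; it was not part of the original robots.txt standard.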
  • paulgl
    Actually, the best practice is to not have one.

    But since bots look for them, a missing file will show up as an
    error in your web stats. The error is harmless, but webmasters
    don't like errors, so they panic and think they need one. If so,
    just leave it blank. This is 2012. There is really no need to
    block anything on a normal site. Google is smart enough to know
    what to show in SERPs, and when.
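    A minimal "allow everything" file that satisfies the bots and stops the log errors is simply:

    User-agent: *
    Disallow:

    An empty Disallow value blocks nothing, which is equivalent to having no robots.txt at all.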

    The exception would be if you have some type of a search
    feature that searches your site and gives posted results.
    Then, if you had boatloads of people doing this, these
    results may get indexed needlessly and just be useless bloat.

    Rarely does one actually need a robots.txt with anything in it.

    For fun, I just checked the Warrior Forum's. They don't even block
    search results. They don't block anything except that sometimes-pesky
    custom crawler from 80legs.

    Since the WF blocks nothing from google, I'd take that as a good
    example for what I was talking about.

    Paul
    • aygabtu
      If you have any AJAX calls to files, the files they call should be disallowed: basically any support files that a user would never navigate to in their browser. Yes, this lets bots know they exist, but it keeps Google etc. from trying to access or index them.
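      As a sketch, assuming your AJAX endpoints live under an /ajax/ directory (a hypothetical path):

      User-agent: *
      Disallow: /ajax/

      Grouping support scripts under one directory keeps the rule simple.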
  • SASA Techno
    Hi. In my opinion, the login page is the most common thing to disallow, since every website's users use one.
  • mileagedriver
    Disallow all private folders and files. That's it, done.
  • Gunpal5
    Disallow folders containing files you do not want Google to crawl and show when someone searches the web, like a thank-you page, etc.
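    For instance, to keep a thank-you page out of the crawl (the path is hypothetical):

    User-agent: *
    Disallow: /thank-you/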
  • John Conner
    Personally, I basically set up robots.txt like this:

    User-agent: *
    Disallow:
    Sitemap: http://www.xyz.com/sitemap.xml
  • CyborgX
    If you don't want search engines to crawl a web page, you can disallow it in robots.txt (note the file name is robots.txt, not robot.txt). There is no disadvantage to doing this.
  • andishm
    You should disallow any content you do not want to expose to search engines, such as your admin area files/folders.
