Blocked URLs in Google Webmaster Tools

4 replies
  • SEO
Hello Warriors,

Working on a dynamic website has been a challenge. I am happy to run into so many problems, because solving them is how I build expertise in the internet marketing field.

I updated my robots.txt file and uploaded it to my web server a couple of days ago.
In Google Webmaster Tools, under Crawl >> Blocked URLs, I can see the rules I added:

User-agent: *
Disallow: /load_cal_ajax/
Disallow: http://prod.example.com
Disallow: http://example.com/search/get_cat_by_city/

I am trying to deindex these pages/URLs from search.

But Blocked URLs shows 0.

I have blocked some URLs in the robots.txt file, yet Google is still indexing them, and GWT shows 0 blocked URLs.

Can someone tell me what the issue is here?

Thanks in advance!!
#blocked #google #tools #urls #webmaster
  • yukon
    Banned
    For the sub-domain, create a new robots.txt file & put that file in the root folder of the sub-domain. A sub-domain is basically a different site from the root domain.
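
    For example, using the domains from the first post (assuming prod.example.com is the sub-domain in question), each robots.txt lives at its own root and only controls that host:
    Code:
    http://example.com/robots.txt        <- controls example.com only
    http://prod.example.com/robots.txt   <- controls prod.example.com only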

    This should block all bots from the entire site.
    Code:
    User-agent: *
    Disallow: /
    As a secondary backup, add a noindex tag in the site header so it shows on every single page of the site; this assumes you want to noindex the entire site.

    Code:
    <meta name="robots" content="noindex">
    You can use that noindex meta tag on individual web pages if needed, for instance if you don't want to block every single page on the site; the robots.txt code above wouldn't fit that case because it's geared toward blocking the entire site (all pages on the site).

    Make sure you're adding the robots.txt file to the root directory of the site and/or sub-domain; that's the location bots look at to see what's in a robots.txt file, or if a robots.txt file even exists on the domain/sub-domain. Test your robots.txt file by pasting its URL into your browser (http://domain.com/robots.txt). If you can't see the robots.txt file load in your browser, then no bots will find it. That would also mean you've placed your robots.txt file in a folder other than the root directory (fix it).
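
    A quick way to double-check is Python's standard-library robotparser (a minimal sketch; the domain and paths below are the placeholders from the first post):
    Code:
    # Fetch the live robots.txt and test whether specific URLs are blocked.
    # example.com and the paths are placeholders; substitute your own.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("http://example.com/robots.txt")  # must sit at the site root
    rp.read()  # downloads and parses the file

    # can_fetch() returns False when the rules block that URL for user-agent "*"
    print(rp.can_fetch("*", "http://example.com/load_cal_ajax/"))
    print(rp.can_fetch("*", "http://example.com/search/get_cat_by_city/"))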

    Keep in mind it takes time to get Google to return to the pages you're trying to noindex. Google won't remove a web page from the SERPs until after it has returned to your site/page & found a noindex tag or a robots.txt file blocking Googlebot.

    You can help speed up the process by blocking Googlebot (robots.txt & sitewide noindex meta tag), then, once your site is ready, building an XML sitemap & submitting that sitemap to WMT. That way you're trying to get Googlebot to return to your site ASAP; it will see the pages are now blocked, which in turn speeds up the process of removing your site's pages from the Google SERPs.

    Recap, in this order:
    1. Block/noindex pages.
    2. Build/submit an XML sitemap to WMT (see the example below).
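
    A minimal XML sitemap just lists the URLs you want Googlebot to revisit (the URLs below are placeholders):
    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://example.com/</loc>
      </url>
      <url>
        <loc>http://example.com/search/get_cat_by_city/</loc>
      </url>
    </urlset>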

    Keep an eye on your Google cache dates to get an idea of how often Googlebot is visiting your pages. Doing a site:domain.com search every couple of days will show the number of indexed pages & how many pages are being removed from the SERPs.

    Don't be impatient; it takes time to remove a lot of pages from large sites. Also keep in mind Google has hundreds of data centers that are not synced, so it might take longer for pages to drop out of country-specific SERPs.

    Make sure you really want pages noindexed, because once it happens it's not always easy to get those pages reindexed. Even if you get them reindexed, they might not bounce back to their previous SERP positions.
  • jhakasseo
    Thanks for your reply.
    As mentioned above, I already have a robots.txt file in the root directory of the main domain (not on the sub-domain). I don't see why, in spite of the URLs being blocked in robots.txt, they are still indexed in Google. I did this 2 months ago, and I have a sitemap on the website as well.

    Also, how can I use a noindex tag on a directory?
    As far as I know, if I want to block a specific directory, I have to put it into robots.txt.
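
    For reference, a directory-level block only needs the path, not the full URL (these are the paths from the first post; the Disallow: http://... lines there won't match anything, because Disallow takes a root-relative path):
    Code:
    User-agent: *
    Disallow: /load_cal_ajax/
    Disallow: /search/get_cat_by_city/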
    • paulgl
      Subdomain robots.txt, and change it to simply:

      User-agent: *
      Disallow: /search

      That might do it.

      Thing is, nothing is 100% foolproof.

      Google would like you to tell it to ignore search results, as you would be doing them a favor. But you don't need to.

      Paul
      • jhakasseo
        Originally Posted by paulgl

        Subdomain robots.txt, and change it to simply:

        User-agent: *
        Disallow: /search

        That might do it.

        Thing is, nothing is 100% foolproof.

        Google would like you to tell it to ignore search results, as you would be doing them a favor. But you don't need to.

        Paul
        In spite of doing this, Google is still indexing the URLs and that directory.
        Is this a server issue? The website is hosted on Apache. Is there a different robots.txt for that?
