Correct Code in Robots.txt?

Hi there,

I deleted all my tag pages many weeks ago, but they still turn up in the SERPs as 404 pages. By tag pages I mean http://mywebsite.com/tags/any-other-line-here
I am trying to tell the search engines not to index my tag pages by using robots.txt.

Which of the two rules below is correct? Both seem to work...

User-agent: *
Disallow: /tags/

User-agent: *
Disallow: /tags/*


Thank you
  • lovboa
    If you deleted them, they will drop out of the SERPs on their own. It just takes a while for the index to update.

    There is no need to go into your robots.txt and tell the spiders not to index them. There's nothing to index; that's why they return a 404. It can take even longer than a month before they're completely removed.

    Don't worry about it.

    The only time you would block them in robots.txt is if the pages still exist but, for some reason, you don't want them crawled or indexed.
    • RedWaterDub
      When I say many weeks ago, I mean at least 12. There are over 2,000 of these 404 pages, and they have triggered a penalty from Google for a broken site (half the pages return 404). My rankings for all my good keywords have plummeted. I know this is the reason, because no other SEO work has been done.

      I am trying to use the URL removal request tool in Google Webmaster Tools to remove a directory, but it says the pages must return 404 and also be blocked by robots.txt before it will remove them and their cached copies from the index.
      So I still need to get the robots.txt rule right.

      Does anybody know which line is correct, please?
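Before submitting the directory to the removal tool, it may help to confirm locally that the rule really blocks the tag URLs. A minimal sketch using Python's standard-library `urllib.robotparser`; the helper `blocked_urls` is hypothetical and the URLs are the placeholder ones from this thread. It is used here only with the plain `/tags/` form, since this parser's handling of `*` wildcards should not be relied on to match Google's.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content to test (the plain prefix form from this thread).
ROBOTS_TXT = """\
User-agent: *
Disallow: /tags/
"""


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string, no network access
    return [url for url in urls if not parser.can_fetch(agent, url)]


urls = [
    "http://mywebsite.com/tags/any-other-line-here",
    "http://mywebsite.com/tags/",
    "http://mywebsite.com/some-post/",
]
print(blocked_urls(ROBOTS_TXT, urls))
```

Only the two `/tags/` URLs should come back as blocked; the ordinary post URL stays crawlable.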
  • hilarious89
    Is /tags/ an actual directory (folder)? If so, then

    User-agent: *
    Disallow: /tags/

    this code will work.
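For Google, the two rules in the question behave the same. Under the matching rules Google documents (since standardized as RFC 9309), a Disallow path is compared as a prefix of the URL path, `*` matches any run of characters, and a trailing `$` anchors the end, so a trailing `*` adds nothing. A minimal sketch of that matching, for illustration only; it is not Google's implementation, and `rule_matches` and the sample paths are hypothetical.

```python
import re


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Return True if a robots.txt Allow/Disallow path matches url_path.

    Sketch of the documented wildcard rules: prefix match, '*' matches
    any sequence of characters, and a trailing '$' anchors the end.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape everything literally, then turn each '*' back into '.*'.
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    if anchored:
        pattern += "$"
    # re.match anchors only at the start, which gives prefix semantics.
    return re.match(pattern, url_path) is not None


paths = ["/tags/", "/tags/any-other-line-here", "/tags/seo/page/2/", "/tagsfoo", "/about/"]
for path in paths:
    print(path, rule_matches("/tags/", path), rule_matches("/tags/*", path))
```

Both columns come out identical for every path, which is why both lines "seem to work". The shorter `Disallow: /tags/` is the conventional form.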
    • RedWaterDub
      I don't know what it is. It's the tags from WordPress. Each tag creates its own page, so I assumed it was a directory, but I'm not sure; maybe it isn't, and it's just a word added to each page's URL. I can't find a tags folder in my cPanel.
  • CodeShack
    They are not 'real' pages and folders; WordPress creates them on the fly.

    First, if a real folder or file matches the URL, that is what gets delivered.
    Otherwise, if a matching category, tag, page, date, etc. exists in the database, WordPress dynamically generates the page and delivers it to your visitor.
    If none of the above match, it delivers a 404.
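The lookup order above can be sketched as a small function. This is a simplified, hypothetical model for illustration, not WordPress's actual code (in a typical Apache setup, WordPress's rewrite rules only hand a request to WordPress when no real file or directory matches); `resolve` and all the data below are made up.

```python
# Hypothetical, simplified model of the lookup order described above.
# None of these names come from WordPress itself.

REAL_PATHS = {"/robots.txt", "/wp-content/uploads/"}  # files and folders on disk (made up)
ARCHIVE_BASES = {"tags": "tag", "category": "category"}  # URL base -> content type (assumed)
DATABASE = {  # content WordPress could build a page from (made up)
    "tag": {"seo", "wordpress"},
    "category": {"news"},
    "page": {"about"},
}


def resolve(url_path: str) -> tuple[int, str]:
    """Return (HTTP status, description) for a request path."""
    # 1. A real file or folder that matches the URL is served directly.
    if url_path in REAL_PATHS:
        return 200, "static file"
    # 2. Otherwise, look for matching content in the "database".
    parts = [p for p in url_path.split("/") if p]
    if len(parts) == 2 and parts[0] in ARCHIVE_BASES:
        kind = ARCHIVE_BASES[parts[0]]
        if parts[1] in DATABASE[kind]:
            return 200, f"generated {kind} archive: {parts[1]}"
    elif len(parts) == 1 and parts[0] in DATABASE["page"]:
        return 200, f"generated page: {parts[0]}"
    # 3. Nothing matched, so the visitor gets a 404.
    return 404, "not found"


for path in ["/robots.txt", "/tags/seo/", "/tags/deleted-tag/", "/about/"]:
    print(path, resolve(path))
```

A deleted tag falls through to step 3, which is why the old `/tags/...` URLs return 404 even though no `tags` folder ever existed on disk.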

    Blocking them in robots.txt is pointless and will also stop any future tag pages from being crawled. Tag pages can have good SEO value, so don't do that.

    What you need is a plugin for handling 404s. There are many; just search the plugins on the WordPress site. Some show a generic page, others search your content for possible alternatives, and others run a Google search and show the results. Others hide the fact that it's a 404, but that isn't advised, as it confuses Google.
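The difference between a helpful 404 page and one that "hides" the 404 comes down to the HTTP status code: the page can show suggestions, but it should still be sent with status 404 rather than 200 (Google calls the latter a "soft 404"). A minimal sketch using Python's standard WSGI interface; `PAGES`, `POSTS`, and the naive word-overlap `suggest` logic are hypothetical stand-ins for what a WordPress 404 plugin might do.

```python
# Hypothetical site content, standing in for what WordPress would look up.
PAGES = {"/": "Home", "/about/": "About"}
POSTS = ["/seo-basics/", "/robots-txt-guide/", "/wordpress-tips/"]


def words(path: str) -> set[str]:
    """Split a URL path into lowercase words on '/' and '-'."""
    return {w for w in path.lower().replace("-", "/").split("/") if w}


def suggest(path: str, limit: int = 3) -> list[str]:
    """Naive suggestions: posts whose URL shares at least one word with the path."""
    wanted = words(path)
    return [post for post in POSTS if wanted & words(post)][:limit]


def app(environ, start_response):
    """Minimal WSGI app that serves known pages and a helpful, honest 404."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        status = "200 OK"
        body = f"<h1>{PAGES[path]}</h1>"
    else:
        # Keep the real 404 status; sending "200 OK" here would be a soft 404.
        status = "404 Not Found"
        links = "".join(f'<li><a href="{p}">{p}</a></li>' for p in suggest(path))
        body = f"<h1>Page not found</h1><p>You might be looking for:</p><ul>{links}</ul>"
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]


def fetch(path: str) -> tuple[str, str]:
    """Call the app directly (no server) and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], body.decode("utf-8")


status, html = fetch("/tags/robots-txt/")
print(status)
print(html)
```

To try it in a browser, the app could be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.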

    The deleted pages will eventually drop out of the SERPs, but it can take a long time; I've experienced that myself.