Need help regarding Robots.txt Files & Link Crawling

2 replies
Hi there, I'm working on a website, https://rated-builders.com.
I'm doing some on-page tasks, and adding a robots.txt file is one of them.

While checking how many pages Google has crawled, I saw that a lot of unnecessary URLs have been picked up. I found the following types of irrelevant links:
https://rated-builders.com/emails/
https://rated-builders.com/service/index.php
https://rated-builders.com/service/login.php
https://rated-builders.com/service/create-job.php
https://rated-builders.com/emails/Ne...ewsletter.html
https://rated-builders.com/emails/Receipt/receipt.html
https://rated-builders.com/emails/St...tationery.html
https://rated-builders.com/emails/Pr...ouncement.html
https://rated-builders.com/emails/Si...ouncement.html
https://rated-builders.com/emails/Pr...0-%20Copy.html
https://rated-builders.com/emails/Pr...0Announcement/
https://rated-builders.com/emails/Newsletter/
https://rated-builders.com/emails/Receipt/
https://rated-builders.com/emails/Si...0Announcement/
https://rated-builders.com/emails/Stationery/

I want to block all of the links above from search engines. Can anyone show me the exact syntax I should use in my robots.txt file to disallow them?

I used the following syntax in my robots.txt file:

User-agent: *

Disallow:

Disallow:/emails/

Disallow:/login.php
Is the syntax I used wrong?
Please help!
#crawling #files #link #robotstxt
  • daniel27lt
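    Your robots.txt file is close, but two things stand out: the empty Disallow: line blocks nothing, and Disallow: /login.php only matches a file at the site root, not /service/login.php. If you want to disallow the subdirectories and files you listed, you should do this: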

    User-agent: *
    Disallow: /service/index.php
    Disallow: /service/login.php
    Disallow: /service/create-job.php
    Disallow: /emails/Newsletter/newsletter.html
    Disallow: /emails/Receipt/receipt.html
    Disallow: /emails/Stationery/stationery.html
    Disallow: /emails/Product%20Announcement/product-announcement.html
    Disallow: /emails/Simple%20Announcement/simple-announcement.html
    Disallow: /emails/Product%20Announcement/product-announcement%20-%20Copy.html
    Disallow: /emails/Product%20Announcement/
    Disallow: /emails/Newsletter/
    Disallow: /emails/Receipt/
    Disallow: /emails/Simple%20Announcement/
    Disallow: /emails/Stationery/
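
Before uploading the file, the rules can be checked locally. The sketch below uses Python's standard urllib.robotparser, which treats each Disallow value as a path prefix. It is not Google's parser, so treat the output as a sanity check rather than a guarantee of how any search engine will behave. CONSOLIDATED_RULES is a hypothetical shortened variant (not something posted in this thread) that relies on the single /emails/ directory rule to cover every file and folder beneath it. ROOT_LOGIN_RULE isolates the Disallow: /login.php line from the question to show that it does not reach /service/login.php. Only URLs that appear in full in the thread are tested.

```python
from urllib.robotparser import RobotFileParser

SITE = "https://rated-builders.com"

# Hypothetical consolidated rules: one directory rule covers all of /emails/.
CONSOLIDATED_RULES = """\
User-agent: *
Disallow: /emails/
Disallow: /service/index.php
Disallow: /service/login.php
Disallow: /service/create-job.php
"""

# The login rule from the original question, isolated for comparison.
ROOT_LOGIN_RULE = """\
User-agent: *
Disallow: /login.php
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


def is_blocked(parser: RobotFileParser, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may not fetch SITE + path under these rules."""
    return not parser.can_fetch(agent, SITE + path)


if __name__ == "__main__":
    consolidated = build_parser(CONSOLIDATED_RULES)
    paths = [
        "/",
        "/emails/",
        "/emails/Receipt/receipt.html",
        "/emails/Newsletter/",
        "/service/index.php",
        "/service/login.php",
        "/service/create-job.php",
    ]
    for path in paths:
        state = "blocked" if is_blocked(consolidated, path) else "allowed"
        print(f"{state:8} {path}")

    # A rule for /login.php is a prefix match from the site root,
    # so it does not apply to /service/login.php.
    root_only = build_parser(ROOT_LOGIN_RULE)
    print("root rule blocks /service/login.php:",
          is_blocked(root_only, "/service/login.php"))
```

Listing each file individually, as in the rules above, also works. The shorter form just avoids having to update robots.txt every time a new template is added under /emails/.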

    Also, if you can, you should remove the spaces from your URLs. The %20 in the paths above is the URL encoding of a space character.
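
To make the %20 point concrete, here is a small Python sketch using the standard urllib.parse module. It shows how a folder name containing a space turns into %20 in a URL and how it decodes back. The hyphenate helper is a hypothetical illustration of one way to rename such folders, not something the site already does.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a path; '/' is kept as-is by quote's default `safe`."""
    return quote(path)


def decode_path(path: str) -> str:
    """Turn percent-escapes such as %20 back into literal characters."""
    return unquote(path)


def hyphenate(name: str) -> str:
    """Hypothetical rename: replace spaces with hyphens to avoid %20."""
    return name.replace(" ", "-")


if __name__ == "__main__":
    print(encode_path("/emails/Simple Announcement/"))
    print(decode_path("/emails/Product%20Announcement/"))
    print(encode_path("/emails/" + hyphenate("Simple Announcement") + "/"))
```

If the folders are renamed this way, the Disallow lines that mention them need to be updated to match the new names.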

    Once the robots.txt file is complete, submit it to the search engines. Then, in your Google webmaster account, request removal of these URLs to tell Google you don't want them indexed. If the URLs aren't also listed in your robots.txt file, they will keep getting indexed, so both steps matter.

    Take a look at the screenshot below. This is where you can tell Google to remove indexed URLs.

    Screenshot by Lightshot

    I hope this has helped.
  • altonroot
    You can use this simple robots.txt builder tool. It's pretty easy to use.

    Robots.txt Generator