Robots.txt collector/scanner

1 reply
I've been trying to find software that would scan the robots.txt files of 500+ websites and show them all together, or something like that. I even found some Perl code for a robots.txt collector that lists robots.txt by site, but I don't know Perl. Basically I need to find out which sites are using Disallow: /. Does anybody know a good trick or piece of software to do this?

PERL CODE:

use strict;
use warnings;
use Socket;

# list of hostnames to check; URL is a placeholder
my @sites = qw( URL );

foreach my $site (@sites) {

    # resolve host name
    my $iaddr = inet_aton($site);
    unless ($iaddr) {
        print "Can't resolve $site\n";
        next;
    }
    my $paddr = sockaddr_in(80, $iaddr);
    my $proto = getprotobyname("tcp");

    unless (socket(SOCK, PF_INET, SOCK_STREAM, $proto)) {
        print "Can't create socket for $site: $!\n";
        next;
    }
    unless (connect(SOCK, $paddr)) {
        print "Can't connect to $site: $!\n";
        next;
    }

    # request robots.txt; Connection: close so the read below hits EOF
    select SOCK;
    $| = 1;
    print "GET /robots.txt HTTP/1.1\r\nHost: $site\r\nConnection: close\r\n\r\n";
    my @response = <SOCK>;
    select STDOUT;
    close SOCK;

    print "Response from $site was ", scalar(@response), " lines\n";

    # save the raw response (headers included) to a per-site file
    unless (open(FH, ">", "$site.robo")) {
        print "Can't write $site.robo: $!\n";
        next;
    }
    print FH @response;
    close FH;

}
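
For the "Disallow: /" question itself, here is a rough sketch of one approach. It assumes the LWP::Simple module is installed and that @sites holds bare hostnames (URL is the same placeholder as above). It fetches each site's robots.txt over plain HTTP and flags any file containing a blanket Disallow: / line, without checking which User-agent group the line sits under.

use strict;
use warnings;
use LWP::Simple qw(get);

# placeholder list of hostnames, same as in the collector above
my @sites = qw( URL );

foreach my $site (@sites) {
    # fetch the robots.txt over plain HTTP; get() returns undef on failure
    my $robots = get("http://$site/robots.txt");
    unless (defined $robots) {
        print "$site: no robots.txt (or fetch failed)\n";
        next;
    }
    # flag sites whose file contains a blanket "Disallow: /" line
    if ($robots =~ m{^\s*Disallow:\s*/\s*$}mi) {
        print "$site: uses Disallow: /\n";
    }
}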
#collector or scanner #robotstxt
  • ericsouthga replies:
    Oh yeah, I wrote some code to do that a while ago; you can contact me via PM to discuss. My code collected what was in the TXT file for purposes of security assessments and such.