Is this even possible...?

by WillR
14 replies
Hi,

I have a competitor's website and want to be able to check all the pages they have on their domain. A lot of the pages are not indexed in the search engines (so a Google search is not possible), and the pages are also not linked from other pages or a sitemap.

Are there any sneaky/ninja ways to find the pages on a URL/domain?

I know it's a long shot but thought I'd ask.
  • Reply by KloudStrife
    Originally Posted by WillR View Post

    Hi,

    I have a competitor's website and want to be able to check all the pages they have on their domain. A lot of the pages are not indexed in the search engines (so a Google search is not possible), and the pages are also not linked from other pages or a sitemap.

    Are there any sneaky/ninja ways to find the pages on a URL/domain?

    I know it's a long shot but thought I'd ask.
    The problem could be within the website's code; maybe his SEO methods are off or different from normal websites. Sometimes Google won't pick up most pages when there's too much media content or Flash involved.
    • Reply by WillR
      Originally Posted by KloudStrife View Post

      The problem could be within the website's code; maybe his SEO methods are off or different from normal websites. Sometimes Google won't pick up most pages when there's too much media content or Flash involved.
      There's no problem. They have pages they don't want indexed. My question is how do I find those pages.
      • Reply by KloudStrife
        Originally Posted by WillR View Post

        There's no problem. They have pages they don't want indexed. My question is how do I find those pages.
        Hmmm, good question. Perhaps look through the actual website code in the browser for a direct link to the next page.
        • Reply by WillR
          Originally Posted by KloudStrife View Post

          Hmmm, good question. Perhaps look through the actual website code in the browser for a direct link to the next page.
          Nope, as I said, they are not linked anywhere.

          It's a long shot. I don't think it's even possible but just thought I would ask as there are some smart cookies out there.
        • Reply by joseph7384
          Originally Posted by WillR View Post

          There's no problem. They have pages they don't want indexed. My question is how do I find those pages.
          Originally Posted by KloudStrife View Post

          Hmmm, good question. Perhaps look through the actual website code in the browser for a direct link to the next page.

          That won't work, as I myself have off-blog pages! I build my own squeeze pages in an editor and FTP them to a blog's domain.

          I would say it's nearly impossible to know every page if they aren't indexed. The best thing I can tell you is to get on your competitor's lists (as many lists as this guy may have) and study the sales funnel to see what pages he sends you to.
  • Reply by beasty513
    Originally Posted by WillR View Post

    Hi,

    I have a competitor's website and want to be able to check all the pages they have on their domain. A lot of the pages are not indexed in the search engines (so a Google search is not possible), and the pages are also not linked from other pages or a sitemap.

    Are there any sneaky/ninja ways to find the pages on a URL/domain?

    I know it's a long shot but thought I'd ask.

    It could be that he put a robots.txt file on the site to make the search engines' spiders ignore it, so it doesn't get indexed.

    You can go to Ahrefs or Majestic SEO, enter the main domain, and see what backlinks he has been building.
    • Reply by WillR
      Originally Posted by beasty513 View Post

      It could be that he put a robots.txt file on the site to make the search engines' spiders ignore it, so it doesn't get indexed.

      You can go to Ahrefs or Majestic SEO, enter the main domain, and see what backlinks he has been building.
      The pages are not indexed on purpose. They are not linked or indexed anywhere. That's my point.
  • Reply by savidge4
    Will,

    the SNEAKY way? Open the robots.txt file and see what they are hiding! DUH

    Originally Posted by WillR View Post

    Hi,

    I have a competitor's website and want to be able to check all the pages they have on their domain. A lot of the pages are not indexed in the search engines (so a Google search is not possible), and the pages are also not linked from other pages or a sitemap.

    Are there any sneaky/ninja ways to find the pages on a URL/domain?

    I know it's a long shot but thought I'd ask.
    Signature
    Success is an ACT not an idea
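The robots.txt suggestion above is easy to script: the file lives at a fixed path (/robots.txt) and its Disallow lines are plain text. A minimal Python sketch — the sample robots.txt body and its paths are made up for illustration:

```python
from urllib.request import urlopen  # for fetching a live file, shown in the comment below

def disallowed_paths(robots_txt: str) -> list[str]:
    """Extract the path from every Disallow: line in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # an empty Disallow means "allow everything", so skip it
                paths.append(path)
    return paths

# Demo with a made-up robots.txt body; against a live site you would instead do:
#   body = urlopen("http://example.com/robots.txt").read().decode()
sample = """User-agent: *
Disallow: /private/   # hypothetical hidden section
Disallow: /tmp-launch.html
Disallow:
"""
print(disallowed_paths(sample))  # → ['/private/', '/tmp-launch.html']
```

As the next reply points out, though, this only works if the site owner actually listed the hidden pages there.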
    • Reply by Dennis Gaskill
      Originally Posted by savidge4 View Post

      Will,

      the SNEAKY way? Open the robots.txt file and see what they are hiding! DUH
      If they are trying to hide them, it's unlikely they'd put them in their robots file for anyone to see, but you never know, I guess. I never put anything I'm trying to hide in my robots file. That's like advertising it. Plus there's no need to if there are no links to them.
      Signature

      Just when you think you've got it all figured out, someone changes the rules.

      • Reply by savidge4
        Ah, but the OP is asking to find pages that are not in the SERPs and don't have page links. It is VERY possible the site he basically wants to hack is USING the pages but simply has them marked nofollow. We all know that doing so on a page level is for the most part fruitless, so in most cases that would be done either in the robots.txt file, OR there stands the possibility of some server-side blocking in the .htaccess file.

        Originally Posted by Dennis Gaskill View Post

        If they are trying to hide them it's unlikely they'd put them in their robots file for anyone to see, but you never know, I guess. I never put anything I'm trying to hide in my robots file. That's like advertising them. Plus there's no need to if there are no links to them.
        Signature
        Success is an ACT not an idea
  • Reply by yukon
    Banned
    I know you already said the pages aren't linked to each other but I would still run Screaming Frog to see what pops up. SF will show you 500 pages/URLs for free on the trial version.
  • Reply by tpw
    You can get a PHP spider tool and simply tell your spider to ignore robots.txt.

    The PHP spider will crawl the site the same way Google does, by looking at the directory tree.
    Signature
    Bill Platt, Oklahoma USA, PlattPublishing.com
    Publish Coloring Books for Profit (WSOTD 7-30-2015)
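A spider (whether written in PHP or anything else) can only discover URLs that appear in pages it has already fetched; ignoring robots.txt just means it won't skip disallowed ones. A rough Python sketch of the link-following part — the start URL in real use would be the competitor's domain, and this is an illustration, not a production crawler:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for every link found in the HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]

def crawl(start_url: str, limit: int = 100) -> set[str]:
    """Breadth-first crawl that stays on the start URL's domain."""
    domain = urlparse(start_url).netloc
    seen, queue = {start_url}, [start_url]
    while queue and len(seen) < limit:
        url = queue.pop(0)
        try:
            html = urlopen(url).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip pages that fail to load
        for link in extract_links(html, url):
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen
```

Note that, as the reply below argues, this still only finds linked pages: a page with no inbound links anywhere stays invisible to any crawler, robots.txt or not.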
    • Reply by Doug9
      Originally Posted by tpw View Post

      The PHP spider will crawl the site the same way Google does, by looking at the directory tree.
      I don't think Google or anything else crawls sites by looking at the directory tree.
      The only way to get to a page on a web server is to supply an exact URL. This is usually provided by a link on the page that a user or Google can see.

      There is no way to see what directories or files are on the server unless this is purposely done by the owner.

      Web servers are specifically built so pages are not knowable unless the owner wants them to be known.
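That said, "supply an exact URL" cuts both ways: people sometimes guess common paths (thank-you pages, download pages, and so on) and check which requests come back with HTTP 200 rather than 404. A small Python sketch of that idea — the site, the candidate paths, and the fake status checker below are all hypothetical, and the status function is injected so the logic runs without a network:

```python
from urllib.parse import urljoin

def probe(base_url, candidate_paths, fetch_status):
    """Return the candidate URLs whose HTTP status indicates a real page.

    fetch_status is a callable taking a URL and returning an int status
    code; in real use it would wrap urllib.request.urlopen or similar.
    """
    found = []
    for path in candidate_paths:
        url = urljoin(base_url, path)
        if fetch_status(url) == 200:
            found.append(url)
    return found

# Demo with a fake status checker standing in for real HTTP requests.
fake_site = {"http://example.com/thankyou.html": 200}
hits = probe("http://example.com/",
             ["thankyou.html", "oto.html"],  # hypothetical guesses
             lambda url: fake_site.get(url, 404))
print(hits)  # → ['http://example.com/thankyou.html']
```

This only ever finds pages whose names you can guess, which is consistent with the point above: the server never volunteers its file list.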
      • Reply by yukon
        Banned
        Originally Posted by Doug9 View Post

        I don't think Google or anything else crawls sites by looking at the directory tree.
        The only way to get to a page on a web server is to supply an exact URL. This is usually provided by a link on the page that a user or Google can see.

        There is no way to see what directories or files are on the server unless this is purposely done by the owner.

        Web servers are specifically built so pages are not knowable unless the owner wants them to be known.
        Especially considering a large percentage of the web consists of dynamic CMS pages built on the fly.