How to get the full list of URLs for my site in a neat list?

11 replies
Title says it all really. I need the full list of URLs for my site in a neat list, so I can build some deep links. Anyone know of a simple tool for doing this? I've tried simply running my URL through ScrapeBox, but it doesn't give me all of them...
#full #list #neat #site #url
  • Profile picture of the author carrot
    What platform is your site based on?

If it's static HTML, then you would need some kind of crawler application, which is likely to be more hassle than manually compiling a list.
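For a static HTML site, the crawler can be a short script rather than a full application. A minimal sketch using only the Python standard library (the function names and the breadth-first approach here are illustrative, not from any particular tool):

```python
# Minimal same-site link crawler sketch. It stays on the start URL's
# domain and stops after `limit` pages. Names are illustrative.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html, base_url):
    """Return absolute URLs for every <a href> found in the page."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, link) for link in parser.links]

def crawl(start_url, limit=500):
    """Breadth-first crawl of one domain; returns a sorted URL list."""
    domain = urlparse(start_url).netloc
    seen, queue = {start_url}, [start_url]
    while queue and len(seen) < limit:
        url = queue.pop(0)
        try:
            page = urlopen(url).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip pages that fail to fetch or decode
        for link in extract_links(page, url):
            link = link.split("#")[0]  # drop fragment anchors
            if link not in seen and urlparse(link).netloc == domain:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

As the post above notes, this only finds pages that have at least one internal link pointing at them.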
    • Profile picture of the author Sandra Martinez
      I would probably start by googling

      site:yoursite.com

      and copying everything

or going to Yahoo Site Explorer: http://siteexplorer.search.yahoo.com

but what you are looking for is a scraper; here are some:

      Free domain url scraper Download - domain url scraper Files

I haven't used any of them. Grey/black-hat and some illegal techniques are associated with them.

      Sandra
    • Profile picture of the author japhyryder
It's WordPress, Carrot. Amazing how difficult this seemingly simple task is turning out to be! Thanks for the help.
      Signature
      www.lewesseo.com - 5 St James St, Lewes, BN71HR
      • Profile picture of the author magicmarcus
You can use the WordPress RSS feed to get the links, or if you are indexed in Google you can do site:yourdomain.com

Google gives a nice neat list if you are indexed.
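The RSS-feed idea above can be sketched in a few lines of Python standard library. A hedged sketch: the /feed/ path is the WordPress default but not guaranteed, and most feeds only include the latest handful of posts, so this alone won't cover a whole site:

```python
# Sketch: pull post URLs from a WordPress RSS 2.0 feed.
# feed_url below is a hypothetical example, not a real address.
import xml.etree.ElementTree as ET
from urllib.request import urlopen

def links_from_rss(rss_xml):
    """Return the <link> value of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("link") for item in root.iter("item")]

# feed_url = "http://www.yoursite.com/feed/"   # hypothetical URL
# print("\n".join(links_from_rss(urlopen(feed_url).read())))
```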
        • Profile picture of the author japhyryder
Thanks magicmarcus - I am indexed and I can see the individual pages on Google. But how to compile this all into a text document is the issue?
      • Profile picture of the author Dave Rodman
        Banned
I use OptiSpider. It's super fast and it will definitely capture all of the pages on your site, provided you have at least one link to each page.

        OptiTools Download Page

It's Leslie Rohde's tool and, by far, the best one out there. Most people use Google to search for indexed pages, but that won't give you the complete list of URLs. For instance, one of my sites has 1,000 pages indexed (per both the site: search and Webmaster Tools), but Google will only display the first 250.

OptiSpider will give you the complete list of URLs, how many internal links they have, the page title, and the anchor text used internally. All of which can be exported to a spreadsheet.

I went through this exercise a few months ago to make sure all my pages were properly indexed. At the time I had restructured my site and had a bunch of pages dropped.

        1) First I used SEO elite to pull a list of all the indexed URLs Google was displaying.

2) Then I used OptiSpider and pulled a complete list of URLs for my site.

3) I put them all in the same column and then used the "Remove Duplicates" feature in Excel. That let me see which pages were currently out of the index.

4) Then I created a 2nd column by copying the first, and did a Find/Replace so every row had the search query "info:http://www.mysite.com/page.html". That let me check whether each page was indexed (Google doesn't display all URLs in the search, so a page might still be indexed).

        5) I removed the ones that WERE indexed. And then developed a plan for getting the remaining ones indexed. About a month later, I had 800 additional pages indexed.
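Steps 3-5 above can be sketched without Excel by treating the two lists as sets. A minimal sketch; the URLs and helper names below are illustrative, not from the tools mentioned:

```python
# Compare the crawled URL list against the indexed URL list, then
# build the "info:" queries for whatever is left over.

def not_indexed(crawled, indexed):
    """URLs found by the crawler but missing from the indexed list."""
    return sorted(set(crawled) - set(indexed))

def info_queries(urls):
    """Google info: queries to double-check each URL by hand."""
    return ["info:" + url for url in urls]

# Illustrative data standing in for the exported spreadsheet columns:
crawled = ["http://www.mysite.com/a.html", "http://www.mysite.com/b.html"]
indexed = ["http://www.mysite.com/a.html"]
print(info_queries(not_indexed(crawled, indexed)))
# -> ['info:http://www.mysite.com/b.html']
```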
  • Profile picture of the author carrot
If just scraping site:yourdomain.com isn't working for you:

Install an XML sitemap WordPress plugin, then use the ScrapeBox addon 'Sitemap Scraper' to get all your URLs.
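Once the plugin has generated a sitemap, you can also read the URLs straight out of it with a few lines of Python instead of the ScrapeBox addon. A sketch, assuming the common sitemap.xml location at the site root (which isn't guaranteed):

```python
# Sketch: list every <loc> entry from an XML sitemap
# (sitemaps.org protocol, namespace version 0.9).
import xml.etree.ElementTree as ET
from urllib.request import urlopen

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_from_sitemap(sitemap_xml):
    """Return every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.iter(SITEMAP_NS + "loc")]

# Hypothetical usage:
# print("\n".join(urls_from_sitemap(
#     urlopen("http://www.yoursite.com/sitemap.xml").read())))
```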
  • Profile picture of the author japhyryder
Dave, that is awesome. Much appreciate this thoughtful reply.
  • Profile picture of the author tee_emm
    Can this also be done with a site built using Interspire Shopping Cart?
  • Profile picture of the author Andyhenry
    Here's a video I made about RSS that also shows how to get all your URLs - no matter how your site is made:
    YouTube - Create_RSS_Feed
    Signature

    nothing to see here.

  • Profile picture of the author HorseStall
Most sitemap generators will create a text file that lists all of the website's URLs.
