List all URLs in a site

9 replies
How can I extract all the usable links from a site? I have a website with about 150 posts on it. What is the easiest way to get all of the URLs from the site, but only the post/category URLs? I used GSA Search Engine Ranker and wound up with a bunch of URLs for tags and weird stuff.

How do I only extract the URLs that go to posts?
Thank you for your consideration.
#list #site #urls
  • Profile picture of the author Tim3
    Not sure about posts only, but I think there is a Firefox add-on that will do this. Unfortunately the name escapes me at the moment; perhaps someone else will know.

    However, the WordPress Broken Link Checker plug-in will give you a list of all your site URLs. I think you have to give it a while to collate after you activate it. :-)
  • Profile picture of the author ItWasLuck3
    Sounds like you're looking for a way to generate a sitemap.

    You mentioned posts... If you're using WordPress, I have no doubt that you can grab a sitemap plugin that lets you select which links get included or excluded when generating said file.
  • Profile picture of the author webzie
    Write a spider which reads in every HTML file from disk and outputs every "href" attribute of an "a" element (this can be done with a parser). Keep track of which links belong to which page (this is a common task for a MultiMap data structure). After this you can produce a mapping file which acts as the input for the 404 handler.
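
The spider idea above could be sketched roughly as follows in Python, using only the standard library. This is a minimal sketch, not the poster's own code: it assumes the pages have already been saved as `.html` files under one folder, and the names `HrefCollector`, `extract_hrefs` and `build_link_map` are made up for illustration. A dict of sets stands in for the MultiMap.

```python
from collections import defaultdict
from html.parser import HTMLParser
from pathlib import Path


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> element it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html_text):
    """Return all <a href> values found in one HTML document, in order."""
    collector = HrefCollector()
    collector.feed(html_text)
    collector.close()
    return collector.hrefs


def build_link_map(root_dir):
    """Map each saved page (by file path) to the set of links it contains.

    Assumption: pages are stored as *.html files somewhere under root_dir.
    """
    link_map = defaultdict(set)
    for page in sorted(Path(root_dir).rglob("*.html")):
        text = page.read_text(encoding="utf-8", errors="replace")
        for href in extract_hrefs(text):
            link_map[str(page)].add(href)
    return link_map
```

Calling `build_link_map("saved_site")` (a hypothetical folder name) would give one entry per page. Relative links come out exactly as written in the HTML, so they may still need resolving against the site's base URL.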
  • Profile picture of the author Microsys
    Use a site crawler tool to crawl your website. If you only want specific patterns of URLs (like posts), you can use "exclude" filters or "limit to" filters. Many tools, including mine (A1 Sitemap Generator / A1 Website Analyzer), can do what you want.
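
If you already have a raw URL list from a crawler, the same kind of exclude filter can be applied afterwards. Below is a rough Python sketch, not how any particular tool does it. The prefixes in `EXCLUDE_PREFIXES` are assumptions based on common WordPress permalink layouts (tags under `/tag/` and so on), and `keep_url` / `filter_urls` are hypothetical helper names; adjust them to your own site.

```python
from urllib.parse import urlparse

# Assumed path prefixes to drop; change these to match the site's permalinks.
EXCLUDE_PREFIXES = ("/tag/", "/author/", "/page/", "/wp-content/", "/wp-admin/", "/feed/")


def keep_url(url, site_host):
    """Return True for same-site URLs whose path is not under an excluded prefix."""
    parts = urlparse(url)
    if parts.netloc.lower() != site_host.lower():
        return False
    path = parts.path or "/"
    return not any(path.startswith(prefix) for prefix in EXCLUDE_PREFIXES)


def filter_urls(urls, site_host):
    """Drop excluded and off-site URLs, remove duplicates, keep first-seen order."""
    seen = set()
    kept = []
    for url in urls:
        if keep_url(url, site_host) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

For example, `filter_urls(crawled, "mydomain.com")` would keep post and category URLs on that host and drop tag pages, provided the site really uses those prefixes.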
  • Profile picture of the author jeffreyhuan
    Use the Google XML Sitemaps plugin (WordPress › Google XML Sitemaps « WordPress Plugins) to generate a sitemap for your site.

    You can choose to include only your posts in the sitemap and get your link list like this one.

    http://www.imgoldbox.com/sitemap.xml
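
Once a sitemap like that exists, turning it into a plain URL list takes only a few lines. Here is a small Python sketch, assuming the file follows the standard sitemaps.org format where each URL sits in a `<loc>` element; `urls_from_sitemap` is a made-up helper name, and it takes the XML as text or bytes that you have already downloaded or saved.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]
```

If the plugin produces a sitemap index instead of a single file, the `<loc>` entries will point at child sitemaps rather than posts, so the function would need to be run again on each of those.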
  • Profile picture of the author Beatinest
    You can google site:sitename.com and get all of the URLs Google has indexed. The only downside is you'll have to copy/paste each link you want.

    Not exactly sure how to do it in bulk so you can download the list, unless you use a sitemap. Yoast, for example, has a great way of presenting the URLs in a sitemap.
  • Profile picture of the author MrMonetize
    If you have Scrapebox, you can use it to extract all the indexed links from Google. Just use the correct operator like this, replacing it with your own domain name -

    site:mydomain.com

    If you don't have Scrapebox, simply use the above operator in Google, then click the cog on the right, then choose 'search settings', turn off instant results, then drag the slider to the right to display 100 SERP results instead of 10.

    Then use this plugin for Firefox - https://addons.mozilla.org/en-us/fir.../SnapLinksPlus

    You simply drag a square over all the SERP results and it will select all the links and give you the option of what to do with them. Copy to clipboard and then paste them into Notepad. I've just done a 150+ page site in less than 1 min.

    Good luck.
  • Profile picture of the author Tim3
    Bump on this old post because I found just what the OP wanted.
    This plug-in is a real nugget: ALL your WordPress post and/or page URLs instantly.

    WordPress › List all URLs « WordPress Plugins