URL Harvesting tool

by dracoboar

Posted: 12 years ago 7 replies

Hello,

I have tools that auto-bookmark the URLs I feed them. However, as my syndication network grows, grabbing these URLs manually is taking more time than it is worth.

Can anyone recommend a tool that will harvest URLs for me? Ideally, I would point it at a site and it would output a list of URLs it has not seen before.

Thank you in advance.

#harvesting #tool #url

janeiro82 12 years ago

I'm not quite sure what you are talking about... maybe "Scrapebox" could help?
- dracoboar 12 years ago
  
  Originally Posted by janeiro82
  
  I'm not quite sure what you are talking about... maybe "Scrapebox" could help?
  
  Basically, I am looking for a program that will crawl a website and output (to an Excel file, for example) the URLs that it has not seen before.
  
  For instance, a few of my articles auto-syndicate to a website. I then run the URL harvester, and it creates an Excel file with all the new URLs on that website, which I can then feed to my bookmarking tool.
  
  Thanks again
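For anyone who would rather script this than hunt for a tool, here is a minimal Python sketch of the "only show me URLs I have not seen before" idea. It is an illustration under some assumptions, not a finished crawler: it works on the HTML of a single page that you have already downloaded, it keeps only links on the same host, and it writes a CSV file (which Excel opens) rather than a native Excel file. The function names and the file names `seen_urls.txt` and `new_urls.csv` are made up for this example.

```python
import csv
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute same-host URLs from html, in page order, without duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop any "#fragment" part.
        url, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(url).netloc == host and url not in links:
            links.append(url)
    return links


def load_seen(path):
    """Read previously harvested URLs (one per line); empty set if the file is missing."""
    try:
        with open(path, encoding="utf-8") as f:
            return {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        return set()


def harvest_new(html, base_url, seen_path="seen_urls.txt", csv_path="new_urls.csv"):
    """Write unseen URLs to csv_path, remember them in seen_path, and return them."""
    seen = load_seen(seen_path)
    fresh = [u for u in extract_links(html, base_url) if u not in seen]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url"])
        writer.writerows([u] for u in fresh)
    with open(seen_path, "a", encoding="utf-8") as f:
        f.writelines(u + "\n" for u in fresh)
    return fresh
```

Running `harvest_new` twice on the same page should return the full list the first time and an empty list the second time, because the first run records everything it found in the "seen" file. Fetching the page itself, and following links deeper into the site, are left out on purpose.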
  

kingtana1 12 years ago

Scrapebox can harvest all pages of a website. You will want to place the domain name into the keywords field like this: site:domain.com

You would be harvesting the URLs from a search engine, so set the max connections for that engine to 5, set the timeout to 30 seconds, tick the box to use proxies, set results to 1000, and click Harvest.


yukon Banned 12 years ago

Originally Posted by kingtana1

The downside of that (site:domain) is that if Google doesn't know about the pages, neither will Scrapebox.


kingtana1 12 years ago

Originally Posted by yukon

The downside of that (site:domain) is that if Google doesn't know about the pages, neither will Scrapebox.

Thank you for clarifying; this is true. I'm sure you would agree that Scrapebox is a good tool.

You can use multiple search engines to harvest with it. To get the most out of it, it helps to learn more about the search operators for a given engine.

yukon Banned 12 years ago

I haven't tried this, but it might help:

Find broken links on your site with Xenu's Link Sleuth (TM)

JSProjects 12 years ago

If the site has an XML sitemap, Scrapebox can grab all of the URLs from it.
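
If you want to read a sitemap yourself rather than through Scrapebox, the idea can be sketched in a few lines of Python. This assumes the sitemap follows the standard sitemaps.org format (each address sits in a `<loc>` element) and that you already have the XML text in hand. The function name `urls_from_sitemap` is made up for this example, and this is not how Scrapebox itself does it.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return every <loc> value from a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
```

The resulting list can then be compared against a list of URLs you have already bookmarked, so that only the new ones are passed on. Note that for a sitemap index, the `<loc>` values point at further sitemap files rather than at pages.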