How to fetch data from website using php?

14 years ago

I am unsure what you really want here: do you want to simply fetch the data and post it to your site, clone the entire site, or set up a domain with some system/CMS behind it that fetches the content from another site?

Could you be a bit more specific and perhaps give us an example?

To put it simply: if you only want to fetch data, you can use PHP + DOM to extract the wanted data and repost it to your site; you could use a tool like HTTrack to clone an entire website as plain HTML, without any CMS or system behind it; or you could use a CMS and copy the data or feed it in automatically.
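For the "PHP + DOM" route, a minimal sketch might look like the following. It uses only PHP's bundled DOM extension (`DOMDocument`, `DOMXPath`); the helper name `extract_by_class` and the `post-title` class are hypothetical placeholders for whatever markup the target page actually uses.

```php
<?php
// Minimal sketch: parse an HTML string with PHP's DOM extension and collect
// the text of every element carrying a given CSS class.
// "post-title" below is a hypothetical class name, not from any real site.

function extract_by_class(string $html, string $className): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside a space-separated class list.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $className
    );

    $results = [];
    foreach ($xpath->query($query) as $node) {
        $results[] = trim($node->textContent);
    }
    return $results;
}

$html = '<div><h2 class="post-title">First movie</h2>'
      . '<h2 class="post-title big">Second movie</h2>'
      . '<p class="summary">Not a title</p></div>';

print_r(extract_by_class($html, 'post-title')); // First movie, Second movie
```

The extracted strings could then be reposted to your own site by whatever means your CMS offers.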

jeet020

14 years ago

Hi,
Thanks for the reply.

I want to fetch data from another site to my site, the way we generally use an RSS feed to get data and post it to our site. I am doing this on a few of my WordPress sites: I use a plugin to fetch data via the RSS feed, and whenever the main site is updated, the data on my site is updated too. So it becomes a complete autopilot site.

As above, I want to fetch data using a PHP function or script, not an RSS feed, so that whenever the data is updated on the main site, the script updates the data on my site too.
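One possible way to get the "update my site whenever the main site changes" behaviour without RSS is to hash each freshly fetched copy and compare it with the hash from the previous run. This is a sketch under assumptions: `content_changed` and the state file are hypothetical, and the scheduling (for example a cron job), the fetching and the reposting are left out.

```php
<?php
// Minimal sketch: decide whether freshly fetched content differs from the
// previous run by comparing SHA-256 hashes kept in a small state file.

function content_changed(string $content, string $stateFile): bool
{
    $newHash = hash('sha256', $content);
    $oldHash = is_file($stateFile) ? trim((string) file_get_contents($stateFile)) : '';

    if ($newHash === $oldHash) {
        return false;                        // nothing new since the last run
    }
    file_put_contents($stateFile, $newHash); // remember this version
    return true;
}

// Usage: an empty temporary file stands in for "no previous run".
$state = tempnam(sys_get_temp_dir(), 'hash');
var_dump(content_changed('<p>episode 1</p>', $state)); // bool(true)
var_dump(content_changed('<p>episode 1</p>', $state)); // bool(false)
var_dump(content_changed('<p>episode 2</p>', $state)); // bool(true)
unlink($state);
```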

Here is an example I found of exactly what I want to do on my site:

Example:

Main site:
3gpmobilemovies.com

Clone site (Full copied site from above site):
3gparena.in

Please help me if you can do this.
Thanks and regards.

SteveSRS

14 years ago

This is called web scraping, and it may or may not be legal, so be careful.

Secondly, it is not the easiest topic. You can do it the 'stupid' way with file_get_contents() in PHP.
But using cURL (just google it) is better. Once you have the site data, you have to use the DOM to analyze it and pick out the parts you want; this is the hardest part. If you aren't fairly advanced in PHP (or another language), I would recommend hiring somebody.
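A minimal cURL fetch in PHP could look like this. It assumes the cURL extension (ext-curl) is installed; the helper name `fetch_page`, the timeouts and the User-Agent string are illustrative choices, not values taken from this thread.

```php
<?php
// Minimal sketch: fetch a page with PHP's cURL extension (requires ext-curl).
// Timeouts and the User-Agent string are illustrative assumptions.

function fetch_page(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 10,    // seconds
        CURLOPT_TIMEOUT        => 30,    // seconds
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    ]);

    $body   = curl_exec($ch);
    $error  = curl_error($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no HTTP code was received
    // The handle is released when $ch goes out of scope.

    if ($body === false) {
        throw new RuntimeException("cURL request failed: $error");
    }
    if ($status >= 400) {
        throw new RuntimeException("HTTP status $status for $url");
    }
    return $body;
}

// Usage (placeholder URL):
// $html = fetch_page('https://example.com/');
```

The returned HTML would then go to a DOM parser for the "pick out the parts you want" step.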

Signature

Recurring Proven SEO Business Model Rent Out Your (Local) Sites and YT vids

Give you AD Funnel Super Power!
ADConnect.io - Lower CPC. Sky-Rocket Conversions!!

cgimaster

14 years ago

This plugin might work for you, but ideally you would write your own PHP script using cURL and DOM to fetch the wanted data and post it back.

WordPress › WP Web Scraper « WordPress Plugins

WebThinker

14 years ago

Yes, I agree with Steve: this is far more complicated than a simple command we can help you with. There are many things to consider (in some cases you might even be forced to spoof IP addresses, use different user agents, etc., so your script doesn't get blocked).

So my advice is also: hire somebody who will write such a script for you, custom-tailored to your needs.

Signature

SmartBetting is a CONCEPT for a fully automatized sports betting prediction system with a continuously learning and auto-correction algorithm that will make the predictions more and more accurate.

jeet020

14 years ago

Originally Posted by SteveSRS

This is called web scraping, and it may or may not be legal, so be careful.

Secondly, it is not the easiest topic. You can do it the 'stupid' way with file_get_contents() in PHP.
But using cURL (just google it) is better. Once you have the site data, you have to use the DOM to analyze it and pick out the parts you want; this is the hardest part. If you aren't fairly advanced in PHP (or another language), I would recommend hiring somebody.

Thanks for the help.

Originally Posted by WebThinker

Yes, I agree with Steve: this is far more complicated than a simple command we can help you with. There are many things to consider (in some cases you might even be forced to spoof IP addresses, use different user agents, etc., so your script doesn't get blocked).

So my advice is also: hire somebody who will write such a script for you, custom-tailored to your needs.

Okay. Can you suggest someone who can make this script for me?

Thanks to all of you for your help.

WebThinker

14 years ago

Originally Posted by jeet020

Thanks for the help.

Okay. Can you suggest someone who can make this script for me?

Thanks to all of you for your help.

Yes: me

Please PM me exactly which site you want to scrape, and I will get back to you with details (an estimate of how long it will take and how much it will cost).

Thanks.

Brandon Tanner

14 years ago

Originally Posted by WebThinker

Yes: me

Please PM me exactly which site you want to scrape, and I will get back to you with details (an estimate of how long it will take and how much it will cost).

Thanks.

So you're offering to create a web scraping script for someone, yet in your sig you sell a product to prevent web scrapers?

Ever heard of the term "conflict of interest"? lol

GamezFever

14 years ago

Originally Posted by Brandon Tanner

So you're offering to create a web scraping script for someone, yet in your sig you sell a product to prevent web scrapers?

Ever heard of the term "conflict of interest"? lol

HAHAHA wow.

FirstSocialApps

14 years ago

This is actually fairly easy, but scrapers are notorious for breaking.

Load the page with cURL and use a regex to pull out what you want. If you don't know what these two things are, you will have to read up a bit before you give it a go.
Signature
Create Facebook deals, sweepstakes, contests, polls, custom pages and more without a monthly fee!

Generate thousands of leads in any niche in under 60 seconds!

WebThinker

14 years ago

Touché, Brandon!

Hadn't even thought about it this way :)

cgimaster

14 years ago

Do not use REGEX to parse HTML:

html - RegEx match open tags except XHTML self-contained tags - Stack Overflow

Instead use a proper HTML Parser available within the programming/scripting language you are using.

For instance, in PHP you have DOM, in C# you have the HTML Agility Pack, and so on.
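To illustrate why a parser is more forgiving than a hand-written pattern, here is a sketch using PHP's DOM extension to collect link targets. The sample markup deliberately mixes attribute order, quoting style and letter case; `extract_links` is a hypothetical helper name.

```php
<?php
// Sketch: let the DOM parser handle attribute order, quoting style and
// letter case, which is what makes hand-written regexes fragile.

function extract_links(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {           // skip anchors without href
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

// Three links written three different ways, plus one anchor with no href.
$html = '<a href="/one">1</a>'
      . "<A class=x HREF='/two'>2</A>"
      . '<a title="t" href=/three>3</a>'
      . '<a name="no-href">4</a>';

print_r(extract_links($html)); // /one, /two, /three
```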

lordspace

14 years ago

Originally Posted by cgimaster

Do not use REGEX to parse HTML:

html - RegEx match open tags except XHTML self-contained tags - Stack Overflow

Instead use a proper HTML Parser available within the programming/scripting language you are using.

For instance, in PHP you have DOM, in C# you have the HTML Agility Pack, and so on.

If the HTML is not well structured, the parsing can break. With regex you can focus on a few tags with arbitrary attributes, and that's it.
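The targeted-regex approach described here might look like the sketch below, which pulls the `src` of every `<img>` tag regardless of its other attributes. The helper name is hypothetical, and the pattern assumes quoted `src` values.

```php
<?php
// Sketch of a targeted regex: the src of every <img> tag, whatever other
// attributes the tag carries. Limitations: unquoted src values are missed,
// a ">" inside another attribute's value breaks the match, and data-src
// would also be matched because "-" counts as a word boundary.

function extract_img_srcs(string $html): array
{
    $pattern = '/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[2]; // group 2 holds the URL between the quotes
}

$html = '<img alt="a" src="/a.jpg"><IMG width=10 SRC=\'/b.png\' /><p>src="/no.jpg"</p>';
print_r(extract_img_srcs($html)); // /a.jpg, /b.png
```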

Signature

Are you using WordPress? Have you tried qSandbox yet?

SteveSRS

14 years ago

@seodude, that post has no value AT ALL in this topic.

@FirstSocialApps
Yes, if you use regex for this you can be sure it will break, probably before you even use it for the first time.

@Brandon

LOL, nice catch.

But on the other hand, if he can create a system to prevent scraping (which isn't that hard to do, by the way, though it might also be a little bad for SEO), chances are he knows how to make scrapers too.

I've built many of those but don't have the time to help you. Just make sure it is done with a good, stable DOM parser that doesn't use too much of your server's memory (it is a pretty CPU-intensive job!). Simple HTML DOM (google it) is a very good one to start building a scraper with.

FirstSocialApps

14 years ago

Originally Posted by SteveSRS

@FirstSocialApps
Yes, if you use regex for this you can be sure it will break, probably before you even use it for the first time.

Steve, with respect, that comment makes no sense at all: if it breaks before the first time, then the script was never done, so it can't break, since it was never finished to begin with. See the paradox in your statement?

REGEX is fine depending on what you want to pull out of the content. For example, pulling out the title would work fine and is not likely to break.
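For the title case, a single narrow regex is usually enough. A sketch, assuming the page has one `<title>` element; `extract_title` is a hypothetical helper name.

```php
<?php
// Sketch: pull the page title with one narrow regex.
// Flags: i = case-insensitive tag names, s = "." also matches newlines.
// html_entity_decode() turns entities such as &amp; back into characters.

function extract_title(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null; // no <title> element found
}

echo extract_title("<html><head><TITLE>\n  Movies &amp; Trailers </TITLE></head></html>"); // Movies & Trailers
```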

Since the OP did not specify what he wants from the content, and is obviously new to this type of work, I provided him with the simplest solution.

cgimaster

14 years ago

Originally Posted by FirstSocialApps

Steve, with respect, that comment makes no sense at all: if it breaks before the first time, then the script was never done, so it can't break, since it was never finished to begin with. See the paradox in your statement?

REGEX is fine depending on what you want to pull out of the content. For example, pulling out the title would work fine and is not likely to break.

Since the OP did not specify what he wants from the content, and is obviously new to this type of work, I provided him with the simplest solution.

I don't see why you would want to reinvent the wheel when there are plenty of libraries for most scripting/programming languages that let you parse HTML without a bogus regex.

Regex is awesome; it's just not worth using for this anymore. It was a common approach 10+ years ago.

There are plenty of reasons not to use regex on HTML instead of an HTML-parsing library, including (but not limited to) the fact that you have to write and test each rule yourself, while the libraries have been developed for years and have gone through extensive testing and fixes.

FirstSocialApps

14 years ago

Originally Posted by cgimaster

I don't see why you would want to reinvent the wheel when there are plenty of libraries for most scripting/programming languages that let you parse HTML without a bogus regex.

I don't know why everything has to be such a big argument on this forum. It's almost as if everyone is more concerned with proving that their way is better than with actually helping the person who asks the question.

It's obviously not 'reinventing the wheel' when, as you just said, doing this with REGEX is a very old method. Also, as I said, if you want to pull out something very simple, it's not a bad way to go: fast, simple, and very little code.

I have already explained why I chose to tell the OP to do it this way: because it is the most simple method and he is obviously new to programming. Either you didn't read my posts, didn't understand them, or just ignored them.

I'm not going to argue over which is 'better', as better is a subjective term. Is better the way that gives the fastest execution, that requires the fewest lines of code, that is most reliable, that is most self-contained, that is most ... on and on and on.

In practice, a programmer must assign values to each of these things and weigh the options. In my answer to the OP I gave maximum weight to the option that is simplest to understand. It makes no sense to tell a newbie to parse the DOM with a library he has never heard of when he has only a basic understanding of what the DOM is and has never used a third-party library.

fortsolution

14 years ago

By getting HTML data from pages.

seowonder56

14 years ago

If you want to grab data or content from the site, you should use the cURL library to get it; then you can save the content to a file without any problem.
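If the goal is to keep the fetched content in a file, cURL can write the response body straight to an open file handle via `CURLOPT_FILE`. A sketch assuming ext-curl is available; `download_to_file` and any paths you pass it are hypothetical.

```php
<?php
// Sketch: download a page directly into a local file using CURLOPT_FILE.
// Requires ext-curl.

function download_to_file(string $url, string $destination): void
{
    $fp = fopen($destination, 'wb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $destination for writing");
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);           // write the body into $fp
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP >= 400 as failure

    $ok = curl_exec($ch);
    $error = curl_error($ch);
    unset($ch);   // release the handle (and its reference to $fp) first
    fclose($fp);  // then flush and close the file

    if ($ok === false) {
        throw new RuntimeException("Download failed: $error");
    }
}

// Usage (placeholder URL and path):
// download_to_file('https://example.com/', '/tmp/example.html');
```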

cgimaster

14 years ago

because it is the most simple method and he is obviously new to programming

All the more reason not to recommend REGEX: not only can it introduce a lot of bugs (depending on the regex) that you will have to rework, but REGEX is also not noob-friendly.

FirstSocialApps

14 years ago

I guess that depends on the noob: whether using REGEX is simpler for him than understanding the DOM and an unknown library.

SteveSRS

14 years ago

@FirstSocialApps

I have to agree with cgimaster. There are online tools that make regex a bit easier, but it is still a pain in the ass. Compare it with the Simple_HTML_DOM class: using the documentation you can do simple tasks in just a few lines (copy/paste), which is a lot easier than regex, which breaks very easily. That was also the point of my 'paradox statement'. I also put a smiley there because I was joking, to make a point about how much using regex for such tasks sucks.

FirstSocialApps

14 years ago

OK, I concede Simple_HTML_DOM is easier. And I was being a smart #$$ with the 'paradox statement'; that's why I put the smiley after it.

wizwebtechno

14 years ago

If you want to fetch data using PHP, you should use the Client URL library, which is called cURL in PHP; it can get you the content of the site.

wayfarer

14 years ago

I prefer splitting on a delimiter in many cases, but regular expressions can save you a huge amount of work if you understand them. Sometimes they are the right choice, like when you need to know boundaries.
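The two techniques mentioned here can be contrasted in a short sketch: `explode()` to take the text between two known delimiters, and a regex with `\b` word boundaries when boundaries matter. Both helper names, `between` and `count_whole_word`, are hypothetical.

```php
<?php
// 1) Splitting on delimiters: return the text between two known markers.
function between(string $text, string $start, string $end): ?string
{
    $parts = explode($start, $text, 2);
    if (count($parts) < 2) {
        return null;                  // start marker not found
    }
    $rest = explode($end, $parts[1], 2);
    return count($rest) === 2 ? $rest[0] : null; // null if end marker missing
}

// 2) A regex when boundaries matter: \b keeps "cat" from matching inside
//    "cats" or "concatenate"; preg_quote() escapes any special characters.
function count_whole_word(string $text, string $word): int
{
    return preg_match_all('/\b' . preg_quote($word, '/') . '\b/i', $text);
}

echo between('<span class="price">$9.99</span>', '<span class="price">', '</span>'), "\n"; // $9.99
echo count_whole_word('Cat, cats and concatenate. A cat!', 'cat'), "\n";                 // 2
```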

Signature

I build web things, server things. I help build the startup Veenome. | Remote Programming Jobs
