Help me with a script

by banel Posted: 15 years ago 11 replies

Hi guys, I need a script that can extract the ires div from a Google results page. I have this:

Code:

$url = 'hxxp://google.com/search?q=site:warriorforum.com&hl=en&num=100';
$m= file_get_contents ($url);
$specific_div = 'ires';
preg_match_all('#<ol\s*(?:id|class)\s*=\s*"'.preg_quote($specific_div).'">(.+?)</ol>#is', $m, $match);
print implode("<br>",$match[1]);

I used this a while ago in another script... I'm trying to make a scraper, but it seems the above code isn't working.

Help me please.

#programming #script

nmarley 15 years ago

Why not first download the web page and put the HTML into a file?

Then write a separate script to parse that file and extract the data you want.

This will accomplish 2 things:

1) Your development goes a lot faster, since you don't have to hit google every time you make changes to your regex.

2) You can actually see the data which you're searching through and change your regex accordingly (narrowing it down until you get what you need, etc).
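
A minimal sketch of that download-once workflow, in case it helps. The function name load_cached_html and the $fetch parameter are made up for illustration; by default it falls back to file_get_contents, as in the original post.

```php
<?php
// Hypothetical helper: return the page HTML, downloading it only when no saved copy exists.
// $fetch is an injectable downloader so the logic can be tried without touching the network.
function load_cached_html(string $url, string $cacheFile, ?callable $fetch = null): string
{
    if (is_file($cacheFile)) {
        return file_get_contents($cacheFile);   // reuse the saved copy
    }
    $fetch = $fetch ?? 'file_get_contents';      // default downloader, same call as in the question
    $html = $fetch($url);
    if ($html === false) {
        throw new RuntimeException("Could not download $url");
    }
    file_put_contents($cacheFile, $html);        // save it for the next run
    return $html;
}
```

Calling load_cached_html($url, 'results.html') once saves the page; later runs read results.html, so you can open that file in an editor and tune the regex against exactly what the script sees.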
badwolf 15 years ago

Are you sure the regexp is working? Why not give us a sample of the HTML so we can check?
- [1] reply
- banel 15 years ago
  
  Hi, thanks to both of you for replying.
  
  @nmarley, @badwolf: That is what I want to do, put the HTML into a file, but above I posted the actual link so you guys can see the source. So my HTML source file is the source of that link.
  
  I'm just having problems with regex.
Arbitbet 15 years ago

$url = 'hxxp://google.com/search?q=site:warriorforum.com&hl=en&num=100';
$m= file_get_contents ($url);
preg_match('/<div id=ires><ol>.*<\/ol><\/div>/Usi', $m, $matches);
print $matches[0];

please check.
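
For anyone wondering what the Usi at the end of that pattern does, here is a small sketch on a made-up string (not real Google output). U, s and i are standard PCRE modifiers in PHP; the sample markup and variable names are only for illustration.

```php
<?php
// Made-up sample: two <ol> lists, the first one spanning two lines.
$html = "<DIV id=ires><ol>first\nlist</ol></DIV><ol>second</ol>";

// U: quantifiers become lazy, so .* stops at the first </ol>
// s: the dot also matches newlines
// i: case-insensitive, so <div also matches <DIV
preg_match('/<div id=ires><ol>.*<\/ol>/Usi', $html, $lazy);

// Without U the same .* is greedy and runs on to the last </ol>
preg_match('/<div id=ires><ol>.*<\/ol>/si', $html, $greedy);

// Without s the dot cannot cross the newline inside the first list,
// so this pattern finds nothing and preg_match returns 0
$found = preg_match('/<div id=ires><ol>.*<\/ol>/Ui', $html, $noDotAll);

var_dump($lazy[0], $greedy[0], $found);
```

On a real results page the s modifier matters because the HTML contains line breaks, and U keeps the match from swallowing everything up to the last closing tag on the page.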
banel 15 years ago

@Arbitbet: Thank you sooo much!!! Can you recommend a site where I can learn more about preg_match? I want to be able to write my own scripts. As you can see, I only need simple tasks, and I don't want to come back here every time I have a problem.
Thanks again, man.
- [1] reply
- zapseo 15 years ago
  
  try PHP: Hypertext Preprocessor
banel 15 years ago

@Arbitbet: One more question: what if I want to get the h3 class="r" elements? I tried this, but it doesn't work.
preg_match('/<div id=ires><ol><li class=g><h3 class=r>.*<\/h3><\/li><\/ol><\/div>/Usi', $m, $matches);
- [1] reply
- Arbitbet 15 years ago
  
  That is wrong.
  
  I would do it like this:
  preg_match('/<div id=ires><ol>.*<\/ol><\/div>/Usi', $m, $matches); // scrape the ires div
  // before the next step, use var_dump($matches) to see what the last command returned
  preg_match_all('/<h3 class="r">.*<\/h3>/Usi', $matches[0], $temp); // scrape the h3 tags from the ires div; preg_match_all because the div has many h3 tags
  var_dump($temp[0]); // look at the result
  
  Divide and Conquer!
  
  Another way: you can use the DOM extension, see hxxp://php.net/manual/en/book.dom.php.
  
  For regular expressions, see hxxp://php.net/manual/en/book.pcre.php or the book "Mastering Regular Expressions" by Jeffrey Friedl.
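
A rough sketch of the DOM route, assuming the saved page has a div with id "ires" containing h3 elements with class "r" as discussed above. The sample HTML is made up and real markup may differ; DOMDocument and DOMXPath come from PHP's DOM extension.

```php
<?php
// Made-up sample standing in for the saved results page.
$html = '<html><body><div id="ires"><ol>'
      . '<li class="g"><h3 class="r"><a href="http://example.com/a">First result</a></h3></li>'
      . '<li class="g"><h3 class="r"><a href="http://example.com/b">Second result</a></h3></li>'
      . '</ol></div><h3 class="r">outside ires</h3></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely valid; collect warnings quietly
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Both steps in one query: h3 class="r" elements located inside the div with id "ires"
$nodes = $xpath->query('//div[@id="ires"]//h3[@class="r"]');

$titles = [];
foreach ($nodes as $h3) {
    $titles[] = trim($h3->textContent);   // visible text of each heading
}
print implode("\n", $titles) . "\n";
```

The parser does not care whether attributes are written as class=r or class="r", which is one less thing to get wrong than with a regex. The h3 outside the ires div is skipped because the query only looks inside that div.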
  