Help me with a script

Hi guys, I need a script that can extract the ires div from a Google results page. I have this:

Code:
$url = 'hxxp://google.com/search?q=site:warriorforum.com&hl=en&num=100';
$m= file_get_contents ($url);
$specific_div = 'ires';
preg_match_all('#<ol\s*(?:id|class)\s*=\s*"'.preg_quote($specific_div).'">(.+?)</ol>#is', $m, $match);
print implode("<br>",$match[1]);
I used this a while ago in another script. I tried to build a scraper with it, but the code above isn't working.

Help me please.
#programming #script
  • Why not first download the web page and put the HTML into a file?

    Then write a separate script to parse that file and extract the data you want.

    This will accomplish 2 things:

    1) Your development goes a lot faster, since you don't have to hit Google every time you change your regex.

    2) You can actually see the data which you're searching through and change your regex accordingly (narrowing it down until you get what you need, etc).
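
The download-once, parse-separately workflow suggested above could look roughly like the sketch below. The function names `fetch_cached` and `extract_ires` and the cache file path are hypothetical, chosen only for illustration, and the pattern assumes the results sit in a `<div id=ires>` wrapping a single `<ol>`, as in the snippets posted in this thread.

```php
<?php
// Hypothetical helpers for illustration; none of these names come from the thread.

/**
 * Return the HTML for $url, downloading it only when no cached copy exists.
 * Later runs read the local file, so regex changes can be tested offline.
 */
function fetch_cached(string $url, string $cacheFile): string
{
    if (!is_file($cacheFile)) {
        $html = file_get_contents($url);
        if ($html === false) {
            throw new RuntimeException("Could not download $url");
        }
        file_put_contents($cacheFile, $html);
    }

    $cached = file_get_contents($cacheFile);
    if ($cached === false) {
        throw new RuntimeException("Could not read $cacheFile");
    }
    return $cached;
}

/**
 * Return the whole ires div (including its tags), or null if it is not found.
 * Assumes the div directly wraps one <ol>; quotes around the id are optional.
 */
function extract_ires(string $html): ?string
{
    $pattern = '#<div id="?ires"?[^>]*>\s*<ol[^>]*>.*?</ol>\s*</div>#si';
    return preg_match($pattern, $html, $m) === 1 ? $m[0] : null;
}
```

A run would then be something like `extract_ires(fetch_cached($url, 'serp.html'))`, where `serp.html` is an arbitrary local file name; opening that file in an editor shows exactly what the pattern has to match.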
  • Are you sure the regexp is working? Why not give us a sample of the HTML so we can check?
    • Hi, thanks to both of you for replying.

      @nmarley, @badwolf: That is what I want to do, put the HTML into a file, but I posted the actual link above so you can see the source. My HTML source file is the source from that link.

      I'm just having problems with the regex.
  • $url = 'hxxp://google.com/search?q=site:warriorforum.com&hl=en&num=100';
    $m= file_get_contents ($url);
    preg_match('/<div id=ires><ol>.*<\/ol><\/div>/Usi', $m, $matches);
    print $matches[0];

    Please check.
  • @Arbitbet: Thank you so much! Can you recommend a site where I can learn more about preg_match? I want to be able to write my own scripts. As you can see, I only need simple tasks, and I don't want to come back here every time I have a problem.
    Thanks again, man.
  • @Arbitbet: One more question: what if I want to get the h3 class="r" elements? I tried this, but it doesn't work.
    preg_match('/<div id=ires><ol><li class=g><h3 class=r>.*<\/h3><\/li><\/ol><\/div>/Usi', $m, $matches);
    • This is wrong.

      I would do it like this:
      preg_match('/<div id=ires><ol>.*<\/ol><\/div>/Usi', $m, $matches); // step 1: grab the ires div
      // Before the next step, use var_dump($matches) to see what the previous call returned.
      preg_match_all('/<h3 class="r">.*<\/h3>/Usi', $matches[0], $temp); // step 2: grab the h3 tags inside the ires div; preg_match_all because the div contains many h3 tags
      var_dump($temp[0]); // inspect the result

      Divide and Conquer!

      Another way is to use the DOM extension: hxxp://php.net/manual/en/book.dom.php

      For regular expressions, see hxxp://php.net/manual/en/book.pcre.php or the book "Mastering Regular Expressions" by Jeffrey Friedl.
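
For the DOM route mentioned above, a minimal sketch might look like the following. It assumes PHP's DOM extension is installed and that result titles are `h3` elements with `class="r"` inside the element whose id is `ires`, as described in this thread; the function name `extract_result_titles` is hypothetical.

```php
<?php
// Hypothetical helper for illustration; the markup assumptions come from this thread.

/**
 * Return the text of every <h3 class="r"> found inside the element with id "ires".
 */
function extract_result_titles(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titles = [];
    foreach ($xpath->query('//div[@id="ires"]//h3[@class="r"]') as $h3) {
        $titles[] = trim($h3->textContent);
    }
    return $titles;
}
```

Unlike the regex version, the XPath query does not care whether attributes are quoted or how deeply the `h3` tags are nested, which tends to make it less fragile when the markup shifts slightly.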