What Language Are Bots Written In and How to Learn it?

15 replies
So I am not completely new to programming. I've taken about 5 college-level courses on C++, programming all the basic data structures and some small database/file processing.

I also know some PHP; for example, I have designed a small user authentication site.

But I am a bit lost when it comes to bots and other web automation programming.

Do you know how and where I can read up on this stuff? A good book, a good online tutorial? What language is commonly used for it? etc.

Thanks!
  • Profile picture of the author jaywilsonjr
    Most commonly, scrapers/bots are written in PHP/cURL; however, you can use other languages. If you are a complete noob (we all start that way), I highly recommend the book Webbots, Spiders, and Screen Scrapers by Michael Schrenk.

    It is an excellent starting point for any developer interested in learning how to create bots. Think of this book as the starting point of the rabbit hole. It is the BEST published book on the subject I have read (and I have several).

    Cheers!


    Jay
    Signature
    Got Tech Problems? PM me for quick help!

    Need a psd converted into a website? --> Check out my new offer | 1 FREE review copy left!
  • Profile picture of the author CAllenMartinson
    I googled this subject and there isn't much available. I did go to schrenk.com and watched the videos. Awesome stuff; I am definitely getting the book and taking the blue pill.
    • Profile picture of the author Tashi Mortier
      Hey thank you for this great book recommendation!

      There are several ways to do this. One would be to use the Firefox add-on Greasemonkey. It lets you write scripts that modify websites, but I've also seen bots written with it.

      cURL is great, too, and with PHP you have a simple scripting language to control it.

      I think you know what to do now!
      Signature

      Want to read my personal blog? Tashi Mortier

      • Profile picture of the author RyanAndrews
        One of the quickest ways to deploy a bot, in my experience, is to use the humble wget, called from any language of your choice.
        If you look at the man page, it can be configured any way you like.
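A minimal sketch of calling wget from another language, here Python. The function names, the output path, and the chosen defaults are assumptions made for illustration; the flags themselves (`--quiet`, `--tries`, `--timeout`, `--output-document`, `--user-agent`) are standard GNU wget options.

```python
import shutil
import subprocess


def build_wget_command(url, output_path, tries=3, timeout=10, user_agent=None):
    """Build the argument list for a single-page wget download."""
    cmd = [
        "wget",
        "--quiet",                             # no progress output
        f"--tries={tries}",                    # retry count
        f"--timeout={timeout}",                # network timeout in seconds
        f"--output-document={output_path}",    # where to save the page
    ]
    if user_agent:
        cmd.append(f"--user-agent={user_agent}")
    cmd.append(url)
    return cmd


def fetch_with_wget(url, output_path, **options):
    """Run wget and return True when it exits with status 0."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    result = subprocess.run(build_wget_command(url, output_path, **options))
    return result.returncode == 0
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when a URL contains characters such as `&`.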
  • Profile picture of the author McBob
    perl + WWW::Mechanize.
  • Profile picture of the author wayfarer
    Google's original spider was written in Python. Python is faster than PHP, and can handle multiple threads, which PHP does not support. If you need something that will run continuously, there's no way you want to be using PHP. If not, PHP is probably a bit easier to use, especially if you have experience with it.
    Signature
    I build web things, server things. I help build the startup Veenome. | Remote Programming Jobs
    • Profile picture of the author Tashi Mortier
      Originally posted by wayfarer:

      Google's original spider was written in Python. Python is faster than PHP, and can handle multiple threads, which PHP does not support. If you need something that will run continuously, there's no way you want to be using PHP. If not, PHP is probably a bit easier to use, especially if you have experience with it.

      You'd be surprised: there are actually people using PHP to program a server. Facebook runs on PHP, too.

      It's not the language that makes the difference, it's how you use it.
      • Profile picture of the author wayfarer
        Originally posted by Tashi Mortier:

        You'd be surprised: there are actually people using PHP to program a server. Facebook runs on PHP, too.

        It's not the language that makes the difference, it's how you use it.
        I'd be surprised??? PHP is my primary choice for web-development. But it doesn't support multi-threading and making it run continuously is a problem. Programming a spider typically has nothing to do with server programming. An HTTP server is a stateless machine: a request is made, information is processed, then the process dies.

        Of course PHP can be made to run via a cron job, or can make requests to external pages in response to an HTTP request, but because of the nature of HTTP, and PHP's single-threaded nature, the request will not finish until the process has completed. If you just want to make simple scrapes of content in response to user requests, a simple file_get_contents() and some pattern matching is all you need, but in my experience this can cause huge overhead if you start getting a lot of requests to the server.
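The "fetch a page and pattern-match it" approach mentioned above can be sketched as follows. This is written in Python rather than PHP, so `fetch_page` is only a rough analogue of `file_get_contents()`; both function names and the link-matching regular expression are assumptions chosen for illustration, and a regex like this is fine for quick scrapes but not a full HTML parser.

```python
import re
import urllib.request

# Matches the href value of anchor tags; good enough for simple pages.
LINK_PATTERN = re.compile(r'<a\s[^>]*href=["\']([^"\']+)["\']', re.IGNORECASE)


def fetch_page(url, timeout=10):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html):
    """Return every href value found in anchor tags, in document order."""
    return LINK_PATTERN.findall(html)
```

Typical use would be `extract_links(fetch_page(some_url))`. Note that the fetch blocks until the remote server answers, which is the overhead described above when this runs inside a web request.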

        Python on the other hand was basically designed for networking tasks, and doesn't need a server to operate. It can run non-stop in the background, forever, as long as it's programmed well.
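A rough sketch of what "run non-stop in the background" can look like: a loop that calls a task, survives individual failures, and sleeps between cycles. The names `run_bot`, `task`, and `max_cycles` are hypothetical; `max_cycles` exists only so the loop can be stopped in testing, and a real bot would pass `None` to run until killed.

```python
import time


def run_bot(task, interval_seconds=60.0, max_cycles=None):
    """Call task() repeatedly; one failed cycle does not stop the loop."""
    cycles = 0
    errors = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            task()
        except Exception as exc:
            # Log and carry on so a single bad page does not kill the bot.
            errors += 1
            print(f"cycle {cycles} failed: {exc}")
        cycles += 1
        time.sleep(interval_seconds)
    return cycles, errors
```

Started from a shell or a service manager, a script built around a loop like this needs no web server in front of it.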

        Java or Perl would also be good choices, if you're not looking for a "fire and die" process.
        • Profile picture of the author Tashi Mortier
          Well you can have a master PHP file that spawns child PHP processes etc. No problem there.

          I have done that to search the web for proxy servers and check them once.
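The master/child pattern described above, sketched in Python instead of PHP. `CHILD_CODE` and `run_children` are hypothetical names, and the child here only echoes its argument; in a real bot each child would be a separate worker script doing the actual check.

```python
import subprocess
import sys

# Hypothetical child job: stands in for a separate worker script.
CHILD_CODE = "import sys; print('checked ' + sys.argv[1])"


def run_children(targets):
    """Start one child process per target, then collect each child's output."""
    children = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, target],
            stdout=subprocess.PIPE,
            text=True,
        )
        for target in targets
    ]
    results = []
    for child in children:
        output, _ = child.communicate()  # waits for this child to finish
        results.append((child.returncode, output.strip()))
    return results
```

All children are started before any is waited on, so they run concurrently and the operating system does the scheduling.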
          • Profile picture of the author wayfarer
            Originally posted by Tashi Mortier:

            Well you can have a master PHP file that spawns child PHP processes etc. No problem there.

            I have done that to search the web for proxy servers and check them once.
            That's fine, but that has nothing to do with multi-threading. Multi-threading means using multiple cores on a processor. Still doesn't solve the "run continuously" problem. If it doesn't do that, it's not a "bot" as per the original question. It's fine for on the fly content scrapers though, nothing wrong with it.
            • Profile picture of the author Tashi Mortier
              Originally posted by wayfarer:

              That's fine, but that has nothing to do with multi-threading. Multi-threading means using multiple cores on a processor. Still doesn't solve the "run continuously" problem. If it doesn't do that, it's not a "bot" as per the original question. It's fine for on the fly content scrapers though, nothing wrong with it.
              That is not true, multi-threading means that you split up a running program into different independent threads. Now, you can either let the multi-threading get handled by the OS if you use different processes (just as with the Apache prefork module) or you can use different threads (requires you to either use the Windows or Unix API for those threads).
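To illustrate splitting one program into independent threads, here is a small Python sketch. `fake_fetch` is a stand-in that sleeps instead of making a real network request, and the function names, delay, and worker count are assumptions for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, delay=0.2):
    """Stand-in for a network request: wait, then return a result."""
    time.sleep(delay)
    return f"fetched {url}"


def fetch_all(urls, workers=4):
    """Run fake_fetch for every URL on a pool of threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_fetch, urls))
```

Because each thread spends its time waiting rather than computing, the waits overlap, so several slow requests can finish in roughly the time of one even on a single core. That is the typical situation for a bot, which is mostly waiting on remote servers.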

              You can also let a PHP script run continuously on the command line, even as a server. I have created a script for a browser game that would constantly process events and run the combat system scripts in a separate process.

              I think we should stop discussing this, though, because at the core we do agree and we are really taking this subject too far. I don't want to attack you in any way, wayfarer; I'm just excited that you can get nearly everything done in any programming language.
              • Profile picture of the author wayfarer
                Originally posted by Tashi Mortier:

                multi-threading means that you split up a running program into different independent threads.
                This is true, but my experience is that the new threads don't become useful unless you have more cores to work with.

                Anyway, you're right, we're taking this whole discussion too far.
  • Profile picture of the author ALee
    You can write a bot in any language of your choice. If you already know C++ and PHP, go for that! I wrote one in Visual Basic, but PHP would be better since it's server-side.
    • Profile picture of the author McBob
      Originally posted by ALee:

      You can write a bot in any language of your choice. If you already know C++ and PHP, go for that! I wrote one in Visual Basic, but PHP would be better since it's server-side.
      You can write programs that run on your server in PHP, Perl, Visual Basic, C/C++, asm,... It simply depends on your server.
  • Profile picture of the author Cliff_OBA
    I was enjoying the threaded PHP discussion. But it's probably nerdy enough as is.

    For the OP, lately I have been using Ruby for any 'bot' type programming. There are a ton of useful libraries, and the community support is massive. I am in the middle of porting one of my personal scripts to PHP so I can deploy on my hosting account for others to use, and have found it takes a little extra code to do the same thing. But that works great, too.

    At the end of the day, you want to choose a programming language you are comfortable with as most languages have the library support needed. Ones I have worked with:

    Java, C#, VB.NET, Python, Perl, PHP, Ruby, JavaScript (entirely possible, albeit difficult to scale to reasonable performance)

    No need to use all of them; any one would do. Maybe not JavaScript.

    Cliff