PHP/MySQL Experts : How to Generate a Unique Sequence of Numbers?

26 replies
I originally thought this would be quite simple but it now appears far from trivial. Perhaps I am missing something.

I want a way to create a sequence of 32-bit tokens that does not repeat and must be guaranteed unique across different sessions.

Preferably pseudo-random but sequential would be OK too.

Although MySQL does not support sequences one possible way is to create a table with an auto_increment column and for a new token insert NULL and then read back last_insert_id, but I would prefer to avoid an extra database write per session.

Any other ideas?

The tokens must be 32-bits long and the method work consistently on 32/64-bit servers without requiring any extensions or privileges beyond a typical default hosting account.

I'm beginning to think the above method may be the only way and would be interested in the thoughts of some more experienced PHP/MySQL developers.

Cheers,

Phil
#experts #generate #numbers #php or mysql #sequence #unique
  • Profile picture of the author jaimegm
    Ok you have to keep track of each session increasing your number and based on that continue with the next binary number, I mean session 1, [1 .. 32] session 2 [33 ..64] session 3 [65 .. 96] and so on, so for each number you have to return an array of numbers as described above and then convert them from decimals to binary and from binary to chars, that is it.
    Now let's make the formula, if N is any number (1,2,3,...)
    base=(N-1)*32;
    A[1]=base+1;
    A[2]=base+2;
    ...
    A[32]=base+32;
    That will make it
    Good luck
    {{ DiscussionBoard.errors[2499197].message }}
    • Profile picture of the author xiaophil
      Thanks for the reply, although I admit I don't understand the method you are suggesting.

      Maybe I didn't explain well enough. There is nothing special about the tokens, they are just numbers.

      If each session could get the new value from a sequence 1,2,3... that would be fine, I could hash them to make them look random.

      The problem is really twofold:

      1) Finding a unique sequence generator that doesn't repeat, but if it's sequential and not pseudo-random then that's not an issue.

      2) Atomically selecting the next value from the sequence across multiple simultaneous sessions.

      The MySQL auto_increment method is the only one I could come up with. The problem with that is it's impossible to get the next number without writing to the database first. Guess I will just have to take the hit.

      Cheers,

      Phil
      {{ DiscussionBoard.errors[2499459].message }}
  • Profile picture of the author jaimegm
    Why you don't take the seconds of the Greenwich time as a base plus milliseconds?
    I mean the seconds from 1970 up to the moment.
    {{ DiscussionBoard.errors[2499646].message }}
    • Profile picture of the author xiaophil
      It would be great to use time and I did consider this, but millisecond precision would steal 10 bits and leave only 22 for the seconds which would wrap in less than two months, and that's a bit soon.

      The chances of a collision are probably quite slim but I still wouldn't rest easy.

      If the token were larger, say 64-bits, MySQL has a UUID_SHORT function which is guaranteed unique.

      Hopefully one day sequences will be built into MySQL. Until then I still think the auto_increment column is the only sensible way forward. Either that or increase the token size.

      I miss PostgreSQL and Python. Looking forward to seeing them again.

      Cheers,

      Phil
      {{ DiscussionBoard.errors[2499844].message }}
      • Profile picture of the author wayfarer
        Originally Posted by xiaophil View Post

        The chances of a collision are probably quite slim but I still wouldn't rest easy.
        So concatenate the time() with a PHP session_id, then hash it. Get a session id like this:
        PHP Code:
        <?php
        session_start
        ();
        $current_session_id session_id();
        ?>
        Now stick it together with time() and hash it. You'll never get a collision this way, not if you try for thousands of years:
        PHP Code:
        <?php
        $unique_string 
        time().$current_session_id;
        $
        32_bit_sequence md5($unique_string);
        ?>
        I may not be totally clear what your aim is. Is there a reason you can't just use PHP's built in session management?
        Signature
        I build web things, server things. I help build the startup Veenome. | Remote Programming Jobs
        {{ DiscussionBoard.errors[2500156].message }}
        • Profile picture of the author xiaophil
          Originally Posted by MemberWing View Post

          unique_token = md5(time().microtime());
          Originally Posted by wayfarer View Post

          32_bit_sequence = md5(unique_string)
          Thanks for the replies guys but an MD5 digest is 128-bits.

          I may not be totally clear what your aim is. Is there a reason you can't just use PHP's built in session management?
          It's not for session management. I just mentioned sessions to remind that there may be simultaneous access to the sequence.

          The requirement is very simple: generate a unique 32-bit token, on demand.

          The fewer system resources used the better, although currently it's Hobson's choice because still the only way I can see to do it is the auto_increment method above.

          I racked my brain for some time over this one and would be very pleasantly surprised if there were a lighter weight solution available (on LAMP anyway).

          Cheers,

          Phil
          {{ DiscussionBoard.errors[2501536].message }}
          • Profile picture of the author AnthonyThomson
            Originally Posted by xiaophil View Post

            Thanks for the replies guys but an MD5 digest is 128-bits.
            Hi Phil,

            What do you think about taking the most/least significant 32 bits of a 128-bit MD5 hash?

            Kind regards,

            Anthony.
            {{ DiscussionBoard.errors[2505235].message }}
            • Profile picture of the author xiaophil
              Originally Posted by AnthonyThomson View Post

              What do you think about taking the most/least significant 32 bits of a 128-bit MD5 hash?
              Hi Anthony,

              I would expect a collision rate of maybe 1 in 1000.

              We need to be really careful when truncating hashes to so few bits. I'm not an expert in these things but think it has something to do with the Birthday Problem. I once actually did use a digest truncated to 64-bits but had enough information (and RAM) to compute all the permutations I expected and verify the absence of collisions.

              As an aside, I often see MD5 being misused as a magic 'uniqifier' when of course the output domain can only be as unique as the input domain. If that's already unique, and its elements are small, then we don't need a digest.

              Also (just for interest's sake) the PHP session ID is a hash of time and microtime, amongst other things. For PHP-5.3.3 anyone interested can take a look at line 371 of session.c

              I would really prefer a deterministic approach and am surprised at the apparently limited options of implementing such a seemingly trivial task on the current platform.

              Still think I must be missing a trick

              Cheers,

              Phil
              {{ DiscussionBoard.errors[2506790].message }}
              • Profile picture of the author AnthonyThomson
                Phil,

                I think the best way is to achieve your goal is to use some form of persistence, which unfortunately would involve DB transactions.

                The auto-increment approach you outlined in the initial post would work; let me suggest a method for generating a sequence of distinct 32-bit tokens, unique across different sessions:

                * Choose an initial value n, where 0≥n>2^32
                * Choose a prime number p, where p is co-prime with 2^32
                * Insert a row with the value of n to your MySQL table
                * For each token to be generated:
                - Read the value of n from your MySQL table
                - n:=n + p (mod 2^32)
                - Update the row in your MySQL table with the updated value of n
                - Issue the value of n as a token

                This method will generate a sequence of 2^32 tokens without repitition.

                Let me know if this sounds reasonable - feel free to PM me if you like.

                Kind regards,

                Anthony.
                {{ DiscussionBoard.errors[2506920].message }}
                • Profile picture of the author xiaophil
                  Hi Anthony,

                  Thanks for your reply.

                  n:=n + p (mod 2^32)
                  Should that be n:=n * p (mod 2^32)?

                  Anyway I ended up going with a straight sequence from the DB and applying an integer hash according to Knuth's method where p is the golden ratio. The effect is the same as you described only the result doesn't need to be saved.

                  The hash is computed within the DB from a single insert statement, atomically and without transactions using an on_duplicate hack. It's actual value is unknown however until selected so the net query cost is the same as your solution.

                  Some problems are far too interesting. While increasing the token length was an option (and a quick fix) the original requirement did not seem overly stringent. Plus I enjoy a challenge.

                  Thanks again for your input.

                  Phil.
                  {{ DiscussionBoard.errors[2507390].message }}
                  • Profile picture of the author AnthonyThomson
                    Phil,

                    Glad you found a solution to your problem.

                    I'm pretty sure that n:=(n+p)%(2^32) generates a sequence without repetition; thanks to one of the cool properties of prime numbers.

                    I could be wrong, but I think that n:=(n*p)%(2^32) isn't guaranteed to generate a sequence without repetition; say you were generating 4-bit keys, n:=(n*p)%(2^4) will unfortunately output a list of keys that repeats every 2 or 4 keys, depending on your choice of p .

                    Kind regards,

                    Anthony.
                    {{ DiscussionBoard.errors[2507560].message }}
                    • Profile picture of the author xiaophil
                      Originally Posted by AnthonyThomson View Post

                      I'm pretty sure that n:=(n+p)%(2^32) generates a sequence without repetition;
                      Yes, right you are. I'll remember that one

                      I could be wrong, but I think that n:=(n*p)%(2^32) isn't guaranteed to generate a sequence without repetition;
                      Not if n is accumulated, but for m = (n*p)%e where n is sequential and p is the golden ratio of e it seems to shuffle things really well. I didn't see any repeats up to 24-bits anyway.

                      Thanks again,

                      Phil
                      {{ DiscussionBoard.errors[2507860].message }}
          • Profile picture of the author wayfarer
            Originally Posted by xiaophil View Post

            Thanks for the replies guys but an MD5 digest is 128-bits.
            Ah, yes it is... An MD5 is 32 characters long... which is totally different, but what my brain was thinking of... lol
            Signature
            I build web things, server things. I help build the startup Veenome. | Remote Programming Jobs
            {{ DiscussionBoard.errors[2507629].message }}
            • Profile picture of the author xiaophil
              Originally Posted by wayfarer View Post

              An MD5 is 32 characters long...
              I tend to think in terms of binary - a hangover from hardware design I guess.

              If you printed a 128-bit number in base-16 (and left padded any zeros) then it could certainly be 32 characters long.

              5d41402abc4b2a76b9719d911017c592

              All of these are the same as the number above, assuming it's unsigned:

              1352024005257045452355345614731042005742622

              123957004363873451094272536567338222994

              5IR3T0OZOELRNAUHRWYU1XFGY

              10111010100000101000000001010101011110001001011001 01010011101101011100101110001100111011001000100010 000000101111100010110010010

              Cheers,

              Phil
              {{ DiscussionBoard.errors[2507941].message }}
  • Profile picture of the author MemberWing
    Originally Posted by xiaophil View Post

    I originally thought this would be quite simple but it now appears far from trivial. Perhaps I am missing something.

    I want a way to create a sequence of 32-bit tokens that does not repeat and must be guaranteed unique across different sessions.

    Preferably pseudo-random but sequential would be OK too.

    Although MySQL does not support sequences one possible way is to create a table with an auto_increment column and for a new token insert NULL and then read back last_insert_id, but I would prefer to avoid an extra database write per session.

    Any other ideas?

    The tokens must be 32-bits long and the method work consistently on 32/64-bit servers without requiring any extensions or privileges beyond a typical default hosting account.

    I'm beginning to think the above method may be the only way and would be interested in the thoughts of some more experienced PHP/MySQL developers.

    Cheers,

    Phil
    $unique_token = md5(time().microtime());

    Gleb
    {{ DiscussionBoard.errors[2500144].message }}
  • {{ DiscussionBoard.errors[2507247].message }}
    • Profile picture of the author xiaophil
      Originally Posted by KirkMcD View Post

      Microshell Emulating nextval() function to get sequence in MySQL
      Thanks Kirk,

      That's an interesting idea although it's a fair bit of overhead just to get a number and I am not sure about its atomicity.

      Also not sure if function creation is permitted by default on a typical shared host.

      I think it's aimed more at a compatibility layer for PostgreSQL sequences?

      Cheers,
      Phil
      {{ DiscussionBoard.errors[2507512].message }}
  • Profile picture of the author wayfarer
    I was thinking about this post, and queried this: crc32. I wasn't even aware of this hash type, but apparently it's been available in PHP for quite some time now. According to this page, the chance of a random collision is 1 / 2^32
    Signature
    I build web things, server things. I help build the startup Veenome. | Remote Programming Jobs
    {{ DiscussionBoard.errors[2514809].message }}
    • Profile picture of the author xiaophil
      Originally Posted by wayfarer View Post

      I was thinking about this post, and queried this: crc32. I wasn't even aware of this hash type, but apparently it's been available in PHP for quite some time now. According to this page, the chance of a random collision is 1 / 2^32
      Hi,

      I think there may be some confusion about CRC32 on the page you referenced.

      As it's not a 'perfect hash' a CRC32 will generate the same output even for many different inputs, and the chances of collision are much, much higher than the figure mentioned.

      I think generally a common point of confusion is the difference between 'random' and 'unique'.

      A good random number generator will repeat some numbers before using all the numbers in its range. That's what random is - each new selection is independent of all the previous selections, so having repeating numbers is normal and desirable for an RNG.

      A 'unique' number generator is a little different. The simplest of course is to just start counting 1,2,3... but if we want the numbers to be unpredictable then what is really needed is a shuffle where each number will only be used once. See the discussion I had with Anthony for some ways to achieve this.

      A unique sequence may look random, and the choice of number may be (pseudo) random, but the sequence itself is certainly not random.

      Even with 2^32 unique inputs, a CRC32 would repeat many, many times and the collision rate can be no better than random numbers which I still think would generally average around 1 in 1000 for 32-bits.

      Cheers,
      Phil
      {{ DiscussionBoard.errors[2518314].message }}
  • Profile picture of the author bucksuper
    Here you go man. If you just want numbers modify the function and remove the letters.

    <?
    //GET A RANDOM STRING
    echo createRandomString('25');


    //*****************************
    //FUNCTION: CREATE RANDOM STRING
    //*****************************
    function createRandomString($stringLength) {
    $chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVW XYZ0123456789";
    srand((double)microtime()*1000000);
    $i = 0;
    $pass = '' ;
    while ($i <= $stringLength) {
    $num = rand() % 33;
    $tmp = substr($chars, $num, 1);
    $randomNum = $randomNum . $tmp;
    $i++;
    }
    return $randomNum;
    }

    ?>

    Let me know if you need any help with it.
    Best of luck!
    Signature
    300% Traffic Return check out Trippy Wire Content Exchange! (100% FREE!)
    Trippy Wire Content Exchage Network

    Trippy Wire is looking for partners in these niches niches: Humor, Movies, Gaming, Celebrities, Entertainment, Sports, Food & Drinks, Hot Women, Technology & Transportation.
    {{ DiscussionBoard.errors[2516830].message }}
    • Profile picture of the author xiaophil
      Originally Posted by bucksuper View Post

      Here you go man. If you just want numbers modify the function and remove the letters.
      //*****************************
      //FUNCTION: CREATE RANDOM STRING
      //*****************************
      function createRandomString($stringLength) {
      $chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS TUVW XYZ0123456789";
      srand((double)microtime()*1000000);
      $i = 0;
      $pass = '' ;
      while ($i <= $stringLength) {
      $num = rand() % 33;
      $tmp = substr($chars, $num, 1);
      $randomNum = $randomNum . $tmp;
      $i++;
      }
      return $randomNum;
      }




      Hi, and thanks for your reply.

      Although using a PRNG is not suitable for my application I do have a few thoughts on the code you supplied and some suggestions for improvements if you are interested.

      1) The number of selectable characters in the code you posted is 62 but the code is only selecting from the first 33.

      2) PHP's rand() function is really a terrible PRNG. On some platforms it's default range is only 15-bits. mt_rand() using the Mersenne Twister algorithm is better, but still only 31 bits on some platforms, I believe.

      3) Seeding is done automatically since PHP 4.2. Reseeding every time is not necessary and probably won't make things any more random.

      4) If you rearranged the selectable string a bit you could have a generic function for generating strings of random numbers for any base from 1 to 62.

      Here's a function I use for generating pseudo-random keys, passwords and numbers of different bases.

      function _random_key($length,$base) {
      $chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklm nopqrstuvwxyz';
      $key = '';
      for ( $i=0; $i<$length; $i++ ) {
      $key .= $chars[ (mt_rand( 0, ($base-1) ))];
      }
      return $key;
      }

      (sorry about the formatting - the forum barfed on php or code tags, so plain text it is. That space after the 'm' in the chars string isn't really there either.)

      So for a random 32-bit number as hex...

      _random_key(8,16)

      or whatever length and base you want, up to base 62.

      Cheers,

      Phil
      {{ DiscussionBoard.errors[2518441].message }}
      • Profile picture of the author wayfarer
        Originally Posted by xiaophil View Post

        (sorry about the formatting - the forum barfed on php or code tags, so plain text it is. That space after the 'm' in the chars string isn't really there either.)
        Before you submit your post, scroll down to "Additional Options" and uncheck the "Automatically parse links in text". This option messes with PHP highlighting on this forum for whatever reason.
        Signature
        I build web things, server things. I help build the startup Veenome. | Remote Programming Jobs
        {{ DiscussionBoard.errors[2519835].message }}
  • Profile picture of the author Johnny Slater
    That code looks very familiar bucksuper... lol

    It's almost exactly the same as the code I had in SMP to generate random passwords. It looks like an advanced workup of what I had.. but really nice use of a random character generator.
    Signature

    {{ DiscussionBoard.errors[2517158].message }}
    • Profile picture of the author bucksuper
      Originally Posted by Johnny Slater View Post

      That code looks very familiar bucksuper... lol

      It's almost exactly the same as the code I had in SMP to generate random passwords. It looks like an advanced workup of what I had.. but really nice use of a random character generator.
      Johnny,
      What's SMP?
      I don't think I wrote that from scratch but I definetely modded it. Not sure but I've been using it for years.
      Signature
      300% Traffic Return check out Trippy Wire Content Exchange! (100% FREE!)
      Trippy Wire Content Exchage Network

      Trippy Wire is looking for partners in these niches niches: Humor, Movies, Gaming, Celebrities, Entertainment, Sports, Food & Drinks, Hot Women, Technology & Transportation.
      {{ DiscussionBoard.errors[2526326].message }}
  • Profile picture of the author Johnny Slater
    Buck, sorry haven't been in this area of the forum for a few days and missed your question. SMP is Simple Member Pro. It's a membership script I wrote 3 years ago.

    It does use some code from another script that I had PLR to, namely the admin and member logins and html formatting. The code you posted was almost line for line the same code that is in the common class that was in the code I modeled my script after. A lot of people do have rights to that particular block of code so it wouldn't surprise me if you did see it in another script some time in the past.
    Signature

    {{ DiscussionBoard.errors[2546067].message }}

Trending Topics