java - Randomly selecting k different numbers in a range -

May 15, 2010

i need select k elements randomly in range 0 n-1. n can upto 10^9. , k can range 1 n-1. can in o(n) time shuffling array containing values 0 n-1 , choosing first k elements it. when k small method both time , memory inefficient. there o(k) solution problem?

note : selected k numbers must distinct.

i'm thinking of solution. there 2 approaches can think of. let r set returned.

select random value in range , add r. keep on doing until |r| = k. process takes sum(n/i) n+1-k <= <= ntime , o(k) space.
insert 0 n-1 in array, shuffle it, take first k elements it. process takes o(n+k) time , space.

so given k can select preferable method in o(k) time.

modified fisher-yates algorithm

the shuffle solution can improved, since have shuffle first k elements of array. that's still o(n) because naïve shuffle implementation requires array of size n, needs initialized n values 0 n-1.

initialize value[n] {0..n-1} 0 k-1:   swap(value[i], value[random_in_range(i, n)]) result value[0..k-1]

to improve on that, can use kind of virtual array, consisting of 2 parts:

value: array of first k elements, resulting selection. initialized {0..k-1}
rest: sparse datastructure (a hashtable, example) capacity of k entries, containing remaining entries of array not own index. empty.

now can define function swaps elements i , j value array (note: i<k, guaranteed above algorithm):

# swap elements , j if j < k:   # both elements swapped in selection   tmp = value[i]; value[i] = value[j]; value[j] = tmp else if j in rest:   # element j has been swapped before   tmp = value[i]; value[i] = rest[j]; rest[j] = tmp else:   # value @ j still j, add virtual array   rest[j] = value[i]; value[i] = j

that uses o(k) space , time, value of k≤n.

three algorithm strategy

a simpler solution using o(k) memory keep hashtable of selected values, , generate values until hashtable contains k values, rejecting duplicates.

for small k, probability of randomly-selected element being duplicate insignificant , naïve hashtable simplest solution. on other hand, if k significant fraction of n, hash table wasted space, since simple bit vector of size n be sufficient record values have been seen. , large k, rejection algorithm take time sample fills up, , full vector needed shuffle not more space vector used hold sample.

as result, pragmatic solution use whichever of 3 solutions uses less space , time: values of k sufficiently large n-bit bitvector smaller hash table k entries not large probability of rejection significant (say, n/64≤k≤n/4), use bitvector. smaller values of k, use simple hashtable algorithm, , values of k close n , use fisher-yates shuffle on full n-element vector (but limited k steps).

since choose o(n) strategies in cases k>cn constant c, composite algorithm still o(k) time , space because constraint, n in o(k).

Search This Blog

Call

java - Randomly selecting k different numbers in a range -

modified fisher-yates algorithm

three algorithm strategy

Comments

Post a Comment

Popular posts from this blog

node.js - Using Node without global install -

php - CakePHP HttpSockets send array of paramms -

angularjs - ADAL JS Angular- WebAPI add a new role claim to the token -