java - Randomly selecting k different numbers in a range -
i need select k
elements randomly in range 0 n-1
. n
can upto 10^9. , k
can range 1 n-1
. can in o(n) time shuffling array containing values 0 n-1
, choosing first k
elements it. when k
small method both time , memory inefficient. there o(k) solution problem?
note : selected k
numbers must distinct.
i'm thinking of solution. there 2 approaches can think of. let r
set returned.
- select random value in range , add
r
. keep on doing until|r| = k
. process takessum(n/i) n+1-k <= <= n
time , o(k) space. - insert 0 n-1 in array, shuffle it, take first
k
elements it. process takes o(n+k) time , space.
so given k
can select preferable method in o(k) time.
modified fisher-yates algorithm
the shuffle solution can improved, since have shuffle first k elements of array. that's still o(n) because naïve shuffle implementation requires array of size n, needs initialized n values 0 n-1.
initialize value[n] {0..n-1} 0 k-1: swap(value[i], value[random_in_range(i, n)]) result value[0..k-1]
to improve on that, can use kind of virtual array, consisting of 2 parts:
value: array of first k elements, resulting selection. initialized {0..k-1}
rest: sparse datastructure (a hashtable, example) capacity of k entries, containing remaining entries of array not own index. empty.
now can define function swaps elements i , j value array (note: i<k, guaranteed above algorithm):
# swap elements , j if j < k: # both elements swapped in selection tmp = value[i]; value[i] = value[j]; value[j] = tmp else if j in rest: # element j has been swapped before tmp = value[i]; value[i] = rest[j]; rest[j] = tmp else: # value @ j still j, add virtual array rest[j] = value[i]; value[i] = j
that uses o(k) space , time, value of k≤n.
three algorithm strategy
a simpler solution using o(k) memory keep hashtable of selected values, , generate values until hashtable contains k values, rejecting duplicates.
for small k, probability of randomly-selected element being duplicate insignificant , naïve hashtable simplest solution. on other hand, if k significant fraction of n, hash table wasted space, since simple bit vector of size n be sufficient record values have been seen. , large k, rejection algorithm take time sample fills up, , full vector needed shuffle not more space vector used hold sample.
as result, pragmatic solution use whichever of 3 solutions uses less space , time: values of k sufficiently large n-bit bitvector smaller hash table k entries not large probability of rejection significant (say, n/64≤k≤n/4), use bitvector. smaller values of k, use simple hashtable algorithm, , values of k close n , use fisher-yates shuffle on full n-element vector (but limited k steps).
since choose o(n) strategies in cases k>cn constant c, composite algorithm still o(k) time , space because constraint, n in o(k).
Comments
Post a Comment