c++ - Is std::sort the best choice to do in-place sort for a huge array with limited integer value? -
i want sort array huge(millions or billions) elements, while values integers within small range(1 100 or 1 1000), in such case, std::sort , parallelized version __gnu_parallel::sort best choice me?
actually want sort vecotor of own class integer member representing processor index.
as there other member inside class, so, if 2 data have same integer member used comparing, might not regarded same data.
counting sort right choice if know range limited. if range [0,m) efficient way have vector in index represent element , value count. example:
vector<int> to_sort; vector<int> counts; (int : to_sort) { if (counts.size() < i) { counts.resize(i+1, 0); } counts[i]++; } note count @ i lazily initialized can resize once if know m.
if sorting objects field , distinct, can modify above as:
vector<t> to_sort; vector<vector<const t*>> count_sorted; (const t& t : to_sort) { const int = t.sort_field() if (count_sorted.size() < i) { count_sorted.resize(i+1, {}); } count_sorted[i].push_back(&t); } now main difference space requirements grow substantially because need store vectors of pointers. space complexity went o(m) o(n). time complexity same. note algorithm stable. code above assumes to_sort in scope during life cycle of count_sorted. if ts implement move semantics can store object , move them in. if need count_sorted outlive to_sort need or make copies.
if have range of type [-l, m), substance not change much, index represents value i + l , need know l beforehand.
finally, should trivial simulate iteration through sorted array iterating through counts array taking account value of count. if want stl iterators might need custom data structure encapsulates behavior.
note: in previous version of answer mentioned multiset way use data structure count sort. efficient in java implementations (i believe guava implementation efficient) not in c++ keys in rb tree repeated many times.
Comments
Post a Comment