Approximating large frequency moments with pick-and-drop sampling
Given data stream D={p1,p2,...,pm} of size m of numbers from {1,...,n}, the frequency of i is defined as fi=|{j:pj=i}|. The k-th frequency moment of D is defined as Fk=∑ni=1fki. We consider the problem of approximating frequency moments for k≥3. For any constant c we show an O(n1−2/klog(n)log(c)(n)) upper bound on the space complexity of the problem. Here log(c)(n) is the iterative log function. We make the following assumptions: n and m are polynomially far; approximation error ϵ and parameter k are constants. We observe a natural bijection between streams and special matrices. Our main technical contribution is a non-uniform sampling method on matrices. We call our method a *pick-and-drop sampling*; it samples a heavy element (i.e., element i with frequency Ω(Fk)) with probability Ω(1/n1−2/k) and gives approximation ~fi≥(1−ϵ)fi. In addition, the estimations never exceed the real values, that is ~fj≤fj for all j. As a result, we reduce the space complexity of finding a heavy element to O(n1−2/klog(n)) bits. We apply our method of recursive sketches and resolve the problem with O(n1−2/klog(n)log(c)(n)) bits. This is joint work with Rafail Ostrovsky.