Count sketch



Count sketch is a type of dimensionality reduction that is particularly efficient in statistics, machine learning and algorithms.[1][2] It was invented by Moses Charikar, Kevin Chen and Martin Farach-Colton in an effort to speed up the AMS sketch by Alon, Matias and Szegedy for approximating the frequency moments of streams[3] (these calculations require counting the number of occurrences of the distinct elements of the stream).

The sketch is nearly identical to the feature hashing algorithm by John Moody,[4] but differs in its use of hash functions with low dependence, which makes it more practical. To still achieve a high probability of success, the median trick is used to aggregate multiple count sketches, rather than the mean.

These properties allow its use for explicit kernel methods and bilinear pooling in neural networks, and make it a cornerstone of many numerical linear algebra algorithms.[5]

Intuitive explanation

The inventors of this data structure offer the following iterative explanation of its operation:

  • at the simplest level, the output of a single hash function s mapping stream elements q into {+1, −1} is feeding a single up/down counter C. After a single pass over the data, the frequency n(q) of a stream element q can be approximated, although extremely poorly, by the expected value E[C · s(q)];
  • a straightforward way to improve the variance of the previous estimate is to use an array of different hash functions s_i, each connected to its own counter C_i. For each element q, the equality E[C_i · s_i(q)] = n(q) still holds, so averaging across the range of i will tighten the approximation;
  • the previous construct still has a major deficiency: if a lower-frequency-but-still-important output element a exhibits a hash collision with a high-frequency element, the estimate of n(a) can be significantly affected. Avoiding this requires reducing the frequency of collision counter updates between any two distinct elements. This is achieved by replacing each counter C_i in the previous construct with an array of w counters (making the counter set into a two-dimensional matrix C_{i,j}), with the index j of the particular counter to be incremented or decremented selected via another set of hash functions h_i that map element q into the range {1..w}. Since E[C_{i,h_i(q)} · s_i(q)] = n(q), averaging across all values of i will work.
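The first step of this progression can be checked with a small simulation (an illustrative Python sketch, not part of the original construction): averaging C · s(q) over many independently drawn sign functions recovers the true frequency n(q), although each individual trial has high variance.

```python
import random

def single_counter_estimate(stream, q, trials=2000, seed=0):
    """Average of C * s(q) over many independent random sign functions s."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        signs = {}

        def s(x):  # lazily drawn sign hash s: elements -> {+1, -1}
            if x not in signs:
                signs[x] = rng.choice((1, -1))
            return signs[x]

        C = sum(s(x) for x in stream)  # a single up/down counter
        total += C * s(q)              # one (very noisy) estimate of n(q)
    return total / trials

stream = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
print(single_counter_estimate(stream, "a"))  # close to the true count 50
```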

Mathematical definition

1. For constants w and t (to be defined later) independently choose d = 2t + 1 random hash functions h_1, …, h_d and s_1, …, s_d such that h_i : [n] → [w] and s_i : [n] → {±1}. It is necessary that the hash families from which h_i and s_i are chosen be pairwise independent.

2. For each item q_i in the stream, add s_j(q_i) to the h_j(q_i)-th bucket of the j-th hash, for each j.

At the end of this process, one has w·d sums C_{i,j}, where

C_{i,j} = Σ_{k : h_i(k) = j} s_i(k).

To estimate the count of q, one computes the following value:

r_q = median_{i ∈ [d]} s_i(q) · C_{i, h_i(q)}.

The values s_i(q) · C_{i, h_i(q)} are unbiased estimates of how many times q has appeared in the stream.

The estimate r_q has variance O(min{m_1²/w², m_2²/w}), where m_1 is the length of the stream and m_2² = Σ_q (Σ_i [q_i = q])².[6]

Furthermore, r_q is guaranteed never to be more than 2·m_2/√w off from the true value, with probability 1 − e^{−O(t)}.
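Putting the two steps together, a minimal version of the data structure might look as follows (an illustrative Python sketch; the (a·x + b) mod p hashes below stand in for the pairwise independent hash families, and all names are our own):

```python
import random
import statistics

class CountSketch:
    """Illustrative count sketch: d rows of w counters each.  The linear
    (a*x + b) mod p hashes are stand-ins for pairwise independent families."""

    P = 2**31 - 1  # a Mersenne prime used as the hash modulus

    def __init__(self, d, w, seed=0):
        rng = random.Random(seed)
        self.d, self.w = d, w
        self.C = [[0] * w for _ in range(d)]
        # (a, b) coefficients for the bucket hashes h_i and sign hashes s_i
        self.ab_h = [(rng.randrange(1, self.P), rng.randrange(self.P)) for _ in range(d)]
        self.ab_s = [(rng.randrange(1, self.P), rng.randrange(self.P)) for _ in range(d)]

    def _h(self, i, x):  # bucket hash h_i : items -> [0, w)
        a, b = self.ab_h[i]
        return ((a * x + b) % self.P) % self.w

    def _s(self, i, x):  # sign hash s_i : items -> {-1, +1}
        a, b = self.ab_s[i]
        return 1 if ((a * x + b) % self.P) % 2 == 0 else -1

    def add(self, x, count=1):
        """Process `count` occurrences of item x (step 2 of the definition)."""
        for i in range(self.d):
            self.C[i][self._h(i, x)] += self._s(i, x) * count

    def estimate(self, x):
        """r_x = median over rows i of s_i(x) * C_{i, h_i(x)}."""
        return statistics.median(
            self._s(i, x) * self.C[i][self._h(i, x)] for i in range(self.d)
        )

sketch = CountSketch(d=5, w=256)
for _ in range(500):
    sketch.add(7)             # one heavy item...
for noise in range(100, 121):
    sketch.add(noise, 10)     # ...plus 21 light items
print(sketch.estimate(7))     # close to 500 with high probability
```

The median over the d rows is what makes a single colliding heavy item in one row harmless; an average would let it leak into every estimate.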

Vector formulation

Alternatively, count sketch can be seen as a linear mapping with a non-linear reconstruction function. Let M^(i) ∈ {−1, 0, 1}^{w×n}, for i ∈ [d], be a collection of d = 2t + 1 matrices, defined by

M^(i)_{h_i(j), j} = s_i(j)

for j ∈ [n], and 0 everywhere else.

Then a vector v ∈ ℝ^n is sketched by C^(i) = M^(i) v ∈ ℝ^w. To reconstruct v we take v*_j = median_i C^(i)_{h_i(j)} · s_i(j). This gives the same guarantees as stated above, if we take m_1 = ‖v‖_1 and m_2 = ‖v‖_2.
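The linear-map view can be checked numerically (an illustrative NumPy sketch; plain random hashes again stand in for pairwise independent families, and the test vector is our own):

```python
import numpy as np

rng = np.random.default_rng(0)
n, w, d = 100, 32, 5  # signal dimension, buckets per row, repetitions

# random bucket and sign hashes (stand-ins for pairwise independent families)
h = rng.integers(0, w, size=(d, n))
s = rng.choice([-1, 1], size=(d, n))

# M^(i) in {-1, 0, 1}^{w x n}: column j has its single nonzero, s_i(j),
# in row h_i(j)
M = np.zeros((d, w, n))
for i in range(d):
    M[i, h[i], np.arange(n)] = s[i]

v = np.zeros(n)
v[3], v[17] = 10.0, -4.0            # a sparse test vector

C = np.einsum('iwn,n->iw', M, v)    # linear part: C^(i) = M^(i) v

# non-linear reconstruction: v*_j = median_i C^(i)_{h_i(j)} * s_i(j)
v_star = np.median(C[np.arange(d)[:, None], h] * s, axis=0)
print(v_star[3], v_star[17])        # close to 10.0 and -4.0
```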

Relation to Tensor sketch

The count sketch projection of the outer product of two vectors is equivalent to the convolution of two component count sketches.

The count sketch computes a vector convolution

C^(1)x ∗ C^(2)x, where C^(1) and C^(2) are independent count sketch matrices and ∗ denotes convolution.

Pham and Pagh[7] show that this equals C(x ⊗ x^T) – a count sketch C of the outer product of vectors, where ⊗ denotes the Kronecker (outer) product.

The fast Fourier transform can be used to do fast convolution of count sketches. By using the face-splitting product[8][9][10] such structures can be computed much faster than with normal matrices.
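This identity can be verified directly for a single pair of sketches (an illustrative NumPy check, not from the source): under the derived hashes h(i,j) = h_1(i) + h_2(j) mod w and signs s(i,j) = s_1(i)·s_2(j), the sketch of the outer product coincides with the circular convolution of the two component sketches, computed here via the FFT.

```python
import numpy as np

rng = np.random.default_rng(1)
n, w = 8, 16
h1, h2 = rng.integers(0, w, n), rng.integers(0, w, n)
s1, s2 = rng.choice([-1, 1], n), rng.choice([-1, 1], n)
x, y = rng.normal(size=n), rng.normal(size=n)

def sketch(h, s, v):
    """One-row count sketch of vector v with bucket hash h and sign hash s."""
    c = np.zeros(w)
    np.add.at(c, h, s * v)
    return c

# count sketch of the outer product x y^T, using the derived hashes
# h(i,j) = (h1[i] + h2[j]) mod w and signs s(i,j) = s1[i] * s2[j]
outer = np.zeros(w)
for i in range(n):
    for j in range(n):
        outer[(h1[i] + h2[j]) % w] += s1[i] * s2[j] * x[i] * y[j]

# equals the circular convolution of the two component sketches (via FFT)
conv = np.fft.irfft(np.fft.rfft(sketch(h1, s1, x)) * np.fft.rfft(sketch(h2, s2, y)), w)
print(np.allclose(outer, conv))  # True
```

The FFT route costs O(w log w) rather than the O(n²) of forming the outer product explicitly, which is the point of combining count sketch with tensor products.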


References


  1. Faisal M. Algashaam; Kien Nguyen; Mohamed Alkanhal; Vinod Chandran; Wageeh Boles. "Multispectral Periocular Classification With Multimodal Compact Multi-Linear Pooling". IEEE Access, Vol. 5. 2017.
  2. Template:Cite web
  3. Alon, Noga; Matias, Yossi; Szegedy, Mario. "The space complexity of approximating the frequency moments." Journal of Computer and System Sciences 58.1 (1999): 137–147.
  4. Moody, John. "Fast learning in multi-resolution hierarchies." Advances in neural information processing systems. 1989.
  5. Woodruff, David P. "Sketching as a Tool for Numerical Linear Algebra." Foundations and Trends in Theoretical Computer Science 10.1–2 (2014): 1–157.
  6. Larsen, Kasper Green; Pagh, Rasmus; Tětek, Jakub. "CountSketches, Feature Hashing and the Median of Three." International Conference on Machine Learning. PMLR, 2021.
  7. Pham, Ninh; Pagh, Rasmus. "Fast and scalable polynomial kernels via explicit feature maps." Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013.
  8. Template:Cite journal
  9. Template:Cite journal
  10. Template:Cite journal