This package provides implementations of various one-pass algorithms for finding frequent items in data streams. The code is an extension of the MassDAL library. Implementations are by Graham Cormode.
In particular, it contains the following:
- Frequent Algorithm
- Lossy Counting, and variations
- Space Saving
- Greenwald & Khanna
- Quantile Digest
- Count Sketch
- Hierarchical Count-Min Sketch
- Combinatorial Group Testing
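
To give a feel for what a one-pass, counter-based method looks like, here is a minimal sketch of the idea behind the Frequent algorithm (often called Misra-Gries). It is not code from this package: the class name `FrequentSketch` and its methods are hypothetical, chosen only for illustration, and the package's own interface may look quite different. The sketch keeps at most `num_counters` counters. With k - 1 counters, any item occurring more than n/k times in a stream of length n is still tracked at the end, and each stored count underestimates the true count by at most n/k.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Illustrative sketch only: the class name and interface are hypothetical
// and are not taken from the package described here.
class FrequentSketch {
public:
    explicit FrequentSketch(std::size_t num_counters)
        : capacity_(num_counters) {}

    // Process one stream item in a single pass.
    void update(const std::string& item) {
        ++n_;
        auto it = counters_.find(item);
        if (it != counters_.end()) {
            // Item is already tracked: increment its counter.
            ++it->second;
        } else if (counters_.size() < capacity_) {
            // A counter is free: start tracking the item.
            counters_.emplace(item, 1);
        } else {
            // No free counter: decrement every counter and drop those
            // that reach zero. The new item itself is not stored.
            for (auto c = counters_.begin(); c != counters_.end();) {
                if (--c->second == 0) {
                    c = counters_.erase(c);
                } else {
                    ++c;
                }
            }
        }
    }

    // Lower bound on the item's true count (0 if it is not tracked).
    std::uint64_t estimate(const std::string& item) const {
        auto it = counters_.find(item);
        return it == counters_.end() ? 0 : it->second;
    }

    // Number of items seen so far.
    std::uint64_t stream_length() const { return n_; }

private:
    std::size_t capacity_;
    std::uint64_t n_ = 0;
    std::unordered_map<std::string, std::uint64_t> counters_;
};
```

As a usage example under the same assumptions, feeding the stream a, b, a, c, a into `FrequentSketch(2)` leaves only a tracked, with a stored count of 2 against a true count of 3. The decrement step here scans all counters, which keeps the sketch short. Implementations tuned for throughput typically use a more elaborate structure to avoid that scan.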
C++ source code v1.0 (Visual Studio 2005, gcc 3.4.6): [bzip2], [zip].
Finding Frequent Items in Data Streams [pdf],
G. Cormode, M. Hadjieleftheriou
Proc. of the International Conference on Very Large Data Bases (VLDB)
Auckland, New Zealand, August 2008.