Algorithm based on counting for mining frequent items over data stream

Ranwei Zhu, Peng Wang, Majin Liu

Research output: Contribution to journalArticlepeer-review


Mining frequent items over data stream has drawn great attention, and large amount of efficient algorithms have been proposed by many researchers over the past decades. Although the classical algorithms are well suited to find frequent items, usually they do not perform well when estimating items' approximate frequency. To solve this problem, we introduce a series of counter-based algorithms called SRoEC (segment rotative efficient count), SReEC (segment reserve efficient count) and RFreq (reserve frequent). They divide the counter used in classical algorithms and define operations for counters to improve the accuracy of item frequency and avoid the effect of low frequency items. As the experience shows, these algorithms can find Top-K items above the threshold correctly and return their approximate frequency as accurate as possible. Both analysis and experiments demonstrate that under same cost of space, these algorithms return higher count accuracy rate, lower frequency error rate and higher frequency reserve rate on both simulated data set and real data set when compared with the two best classical algorithms (frequent algorithm and space saving algorithm) nowadays. Amongst them, RFreq algorithm shows obvious advantages. What's more, the algorithms perform much better than classical ones when the data distribution is smooth.

Original languageEnglish
Pages (from-to)1803-1811
Number of pages9
JournalJisuanji Yanjiu yu Fazhan/Computer Research and Development
Issue number10
StatePublished - Oct 2011
Externally publishedYes


  • Data mining
  • Data stream
  • Frequency estimation
  • Top-K
  • Words frequent item


Dive into the research topics of 'Algorithm based on counting for mining frequent items over data stream'. Together they form a unique fingerprint.

Cite this