Tyumen State University Herald. Physical and Mathematical Modeling. Oil, Gas, Energy


Release:

Releases Archive. Tyumen State University Herald. Physical and Mathematical Sciences. Informatics (No. 7, 2013)

Title: 
Clustering algorithm for data streams with changing distribution parameters


About the author:

Olga V. Nissenbaum, Cand. Sci. (Phys.-Math.), Associate Professor, Information Security Department, Tyumen State University; o.v.nissenbaum@utmn.ru

Abstract:

The article presents a clustering algorithm for time-weighted data streams based on the dynamic EM algorithm. The algorithm can be used to cluster normally distributed data whose distribution parameters change over time, as happens in real dynamic systems such as computer systems or communication networks. The author reports the results of a computational experiment (based on a simulation model with normal cluster densities), which show that the proposed algorithm outperforms an algorithm without time-weighting factors, both in the percentage of misclassified points and in the accuracy of the estimated cluster parameters.
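The idea of weighting observations by time inside EM can be illustrated with a minimal sketch. The abstract does not give the weighting scheme or the update rules, so everything below is an assumption made for illustration and not the author's algorithm: the function name `time_weighted_em`, the exponential forgetting weight `exp(-decay * (t_max - t_i))`, the `decay` parameter, and the batch (rather than incremental) recomputation are all hypothetical. Setting `decay=0` gives ordinary, unweighted EM for a Gaussian mixture.

```python
import numpy as np

def time_weighted_em(X, t, k, decay=0.05, n_iter=50, seed=0):
    """Fit a k-component Gaussian mixture by EM, weighting point i by
    exp(-decay * (t_max - t_i)) (assumed forgetting scheme, not from the article)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.exp(-decay * (t.max() - t))            # newer points weigh more
    means = X[rng.choice(n, size=k, replace=False)].copy()
    base_cov = np.cov(X.T).reshape(d, d) + 1e-6 * np.eye(d)
    covs = np.stack([base_cov] * k)
    pis = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log responsibilities under the current Gaussian components
        log_r = np.empty((n, k))
        for j in range(k):
            diff = X - means[j]
            inv = np.linalg.inv(covs[j])
            _, logdet = np.linalg.slogdet(covs[j])
            maha = np.einsum("ni,ij,nj->n", diff, inv, diff)
            log_r[:, j] = np.log(pis[j]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilise before exponentiating
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: every sufficient statistic is multiplied by the time weight
        rw = r * w[:, None]
        nk = rw.sum(axis=0) + 1e-12
        pis = nk / nk.sum()
        means = (rw.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (rw[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return pis, means, covs, r.argmax(axis=1)

# Synthetic stream: two normal clusters whose centres drift linearly in time.
rng = np.random.default_rng(1)
t = np.arange(400, dtype=float)
centers = np.where(np.arange(400)[:, None] % 2 == 0, [0.0, 0.0], [10.0, 10.0])
drift = 0.01 * t[:, None] * np.array([1.0, -1.0])
X = centers + drift + rng.normal(size=(400, 2))

pis, means, covs, labels = time_weighted_em(X, t, k=2, decay=0.02)
```

In this sketch a larger `decay` makes the estimates follow recent data more closely at the cost of a smaller effective sample size; a true streaming variant would update the weighted sums incrementally instead of revisiting all points.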
