DOI: https://doi.org/10.20535/2411-1031.2018.6.1.153143

Streaming clustering algorithm for monitoring and condition's diagnostics of technical real-time systems

Dmytro Sharadkin

Abstract


This paper investigates the specific features of automating state monitoring and diagnostics in technical systems that operate in real time, in particular modern computer systems and networks. It is shown that existing methods solve monitoring and diagnostic tasks only with significant limitations, which stem mainly from the assumption that the basic characteristics of the monitored objects are stationary. The specific features of real-time systems require that the classification and clustering algorithms underlying modern monitoring and diagnostic tools run in a streaming mode while minimizing the amount of memory used. The execution time of these algorithms should be practically independent of the amount of data. They have to handle clusters of spherical form in the feature space and to remain effective when the statistical characteristics of the data flow change dynamically and the number of clusters in the sample is unknown and possibly variable. Outliers and anomalies in the data have to be detected and processed. An algorithm is proposed that maps the original data sample into a specially designed finite grid space and detects the cluster structure using both the fill density of the object description space and its metric properties. The properties of the algorithm and the dependence of its characteristics on the specified parameters are analyzed. A modification of the algorithm enables streaming data processing and allows the algorithm to be adapted easily without additional memory. An attenuation function is introduced to handle dynamic changes in cluster parameters; several variants of its specification are considered, and their influence on the performance of the proposed algorithm is analyzed.
The relative simplicity of the algorithm and the semantic transparency of its external parameters make it simple to configure for various application areas, including the detection and prevention of IT security incidents in computer systems and networks.
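
The ideas named in the abstract (mapping points onto a finite grid, using cell fill density together with grid adjacency, and attenuating old data) can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: the class name `GridStreamClusterer`, the parameters `cell_size`, `decay` and `dense_threshold`, the exponential form of the attenuation function, and the rule that adjacent dense cells form one cluster are all assumptions made for illustration only.

```python
import math
from collections import deque
from itertools import product


class GridStreamClusterer:
    """Illustrative grid-density stream clusterer (hypothetical, not the paper's method)."""

    def __init__(self, cell_size=1.0, decay=0.01, dense_threshold=2.0):
        self.cell_size = cell_size              # edge length of a grid cell (assumed parameter)
        self.decay = decay                      # rate of the assumed exponential attenuation
        self.dense_threshold = dense_threshold  # minimum attenuated weight of a "dense" cell
        self.cells = {}                         # cell index -> (weight, time of last update)

    def _cell(self, point):
        # Map a point of the feature space to the index of its grid cell.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def _weight_at(self, key, t):
        # Attenuate the stored weight from its last update time to time t.
        weight, last = self.cells[key]
        return weight * math.exp(-self.decay * (t - last))

    def update(self, point, t):
        # Constant-time update per point; memory grows with occupied cells, not with points.
        key = self._cell(point)
        old = self._weight_at(key, t) if key in self.cells else 0.0
        self.cells[key] = (old + 1.0, t)

    def clusters(self, t):
        # Label connected groups of dense cells (neighbours differ by at most 1 per axis).
        dense = {k for k in self.cells if self._weight_at(k, t) >= self.dense_threshold}
        labels, next_label = {}, 0
        for start in dense:
            if start in labels:
                continue
            labels[start] = next_label
            queue = deque([start])
            while queue:
                cell = queue.popleft()
                for offset in product((-1, 0, 1), repeat=len(cell)):
                    neighbour = tuple(c + o for c, o in zip(cell, offset))
                    if neighbour in dense and neighbour not in labels:
                        labels[neighbour] = next_label
                        queue.append(neighbour)
            next_label += 1
        return labels

    def is_outlier(self, point, t):
        # A point is flagged when its cell is absent or too sparse at time t.
        key = self._cell(point)
        return key not in self.cells or self._weight_at(key, t) < self.dense_threshold
```

In this sketch each point is processed once by `update(point, t)` and then discarded, so the per-point cost does not depend on how much data has already been seen. Calling `clusters(t)` at any moment returns a cell-to-label mapping for the cells that are still dense after attenuation, which lets clusters appear, drift, and fade as the stream changes.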


Keywords


Monitoring and diagnostics of real-time systems; detection of information security incidents; machine learning; classification; clustering; streaming data processing; dynamic changes of cluster form and position


ISSN 2411-1031 (Print), ISSN 2518-1033 (Online)