Streaming clustering algorithm for monitoring and condition diagnostics of real-time technical systems
Keywords: monitoring and diagnostics of real-time systems, detection of information security incidents, machine learning, classification, clustering, streaming data processing, dynamic changes of cluster shape and position
This paper investigates the specific features of automating state monitoring and diagnostics in technical systems that operate in real time, in particular modern computer systems and networks. It is shown that existing methods solve the monitoring and diagnostics problem only with significant limitations, which stem mainly from the assumption that the basic characteristics of the monitored objects are stationary. The specifics of real-time systems require that the classification and clustering algorithms underlying modern monitoring and diagnostic tools run in streaming mode while keeping memory usage to a minimum. Their execution time should be practically independent of the amount of data. They have to handle clusters of spherical form in the feature space and maintain performance when the statistical characteristics of the data flow change dynamically and the number of clusters in the sample is unknown and possibly variable. Outliers and anomalies in the data have to be detected and processed. An algorithm is proposed that maps the original data sample into a specially designed finite grid space and uses both the fill density of the object description space and its metric properties to detect the cluster structure. The properties of the algorithm and the dependence of its characteristics on the specified parameters are analyzed. A modification of the algorithm enables streaming data processing and easy adaptation without additional memory. To handle dynamic changes in cluster parameters, an attenuation function is introduced. Several variants of its specification are considered, and their influence on the performance of the proposed algorithm is analyzed.
The relative simplicity of the algorithm and the semantic transparency of its external parameters make it simple to configure for various application areas, including the detection and prevention of IT security incidents in computer systems and networks.
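The general idea described above (mapping stream points to grid cells, tracking cell density, and attenuating old data) can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name DecayingGrid, its parameters cell_size, decay_rate and dense_threshold, and the exponential form of the attenuation function are all assumptions made for illustration, since the abstract does not specify them. Clusters are taken here as groups of adjacent dense cells, and points in sparse cells are treated as outliers.

```python
import math
from collections import defaultdict
from itertools import product


class DecayingGrid:
    """Hypothetical sketch: grid cells holding exponentially attenuated point counts."""

    def __init__(self, cell_size, decay_rate, dense_threshold):
        self.cell_size = cell_size              # edge length of a grid cell (assumed)
        self.decay_rate = decay_rate            # rate in the assumed attenuation exp(-rate * dt)
        self.dense_threshold = dense_threshold  # minimum weight for a cell to count as dense
        self.weight = defaultdict(float)        # cell -> attenuated point count
        self.last_seen = {}                     # cell -> time of the last update

    def _cell(self, point):
        # Map a point in feature space to integer grid coordinates.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def _current_weight(self, cell, now):
        # Apply the attenuation lazily, only when the cell is read.
        dt = now - self.last_seen[cell]
        return self.weight[cell] * math.exp(-self.decay_rate * dt)

    def insert(self, point, now):
        # Memory grows with the number of occupied cells, not with the number of points.
        cell = self._cell(point)
        if cell in self.last_seen:
            self.weight[cell] = self._current_weight(cell, now)
        self.weight[cell] += 1.0
        self.last_seen[cell] = now

    def clusters(self, now):
        # Dense cells are those whose attenuated weight reaches the threshold.
        unvisited = {c for c in self.last_seen
                     if self._current_weight(c, now) >= self.dense_threshold}
        result = []
        while unvisited:
            seed = unvisited.pop()
            component, stack = {seed}, [seed]
            while stack:
                cell = stack.pop()
                # Visit all neighbouring cells (including diagonal ones).
                for offset in product((-1, 0, 1), repeat=len(cell)):
                    neighbour = tuple(c + o for c, o in zip(cell, offset))
                    if neighbour in unvisited:
                        unvisited.remove(neighbour)
                        component.add(neighbour)
                        stack.append(neighbour)
            result.append(component)
        return result


grid = DecayingGrid(cell_size=1.0, decay_rate=0.1, dense_threshold=2.0)
stream = [(0.2, 0.3), (0.5, 0.5), (0.7, 0.1), (1.2, 0.4), (1.5, 0.6),
          (10.1, 10.1), (10.5, 10.5), (5.5, 5.5)]
for p in stream:
    grid.insert(p, now=0.0)
found = grid.clusters(now=0.0)
```

In this sketch the number of clusters is not fixed in advance, and a cell that stops receiving points eventually drops below the threshold, so cluster shape and position can follow a drifting stream. Other attenuation functions could be substituted in _current_weight.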
Copyright (c) 2020 Collection "Information technology and security"
This work is licensed under a Creative Commons Attribution 4.0 International License.