Aggregation of information from diverse networks as the basis for training cyber security specialists on processing ultra large data sets
DOI:
https://doi.org/10.20535/2411-1031.2021.9.1.247256
Keywords:
big data, social networks, cybersecurity, information retrieval systems, data aggregation, data science, information technology
Abstract
This paper substantiates and presents the basic principles of training cybersecurity specialists to process large data sets so that they can solve complex unstructured tasks in the course of their functional responsibilities. The training builds on the achievements of Data Science in the field of cybersecurity: specialists acquire the necessary competencies and apply the latest information technologies based on methods of aggregating large amounts of data. The most common current technologies and tools in the field of cybersecurity are considered; together they give a fairly holistic view of what Data Science specialists use today. The tools needed to solve complex problems using big data are analyzed. The subject of the study comprises the fundamental provisions of the concept of “big data”; appropriate data models; architectural concepts for creating information systems for “big data”; big data analytics; and the practical application of big data processing results. The theoretical basis of the training is considered; it includes two sections, “Big Data: theoretical principles” and “Technological applications for big data”, which, in turn, are logically divided into ten. As a material and technical basis for students to acquire practical skills, a model based on the “CyberAggregator” system was created and described; the system is in operation and is continually improved as the list of tasks assigned to it expands. The CyberAggregator system consists of three main parts: a server for collecting and primary processing of information; an information retrieval server (search engine); and an interface server through which the service is provided to users and other systems via the API.
The system is based on technological components such as the Elasticsearch information retrieval system, the Kibana utility, the Neo4j graph database management system, JavaScript-based tools for visualizing results (D3.js), and network information scanning modules. The system implements functions such as building databases from selected information resources; maintaining full-text databases; detecting duplicate information messages with similar content; full-text search; analysis of text messages, determination of tonality (sentiment), and generation of analytical reports; integration with a geographic information system; data analysis and visualization; study of the dynamics of thematic information flows; and forecasting events based on the analysis of publication dynamics. The suggested approach allows students to acquire the competencies needed to effectively process large amounts of data from social networks, create systems for monitoring network information on cybersecurity, select relevant information from social networks, implement search engines, and carry out analytical research and forecasting.
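One of the listed functions, detection of messages that duplicate one another in content, can be illustrated with a minimal sketch. The abstract does not state which method CyberAggregator uses; the example below assumes a common textbook approach (word shingles compared by Jaccard similarity), and the function names, the shingle size, the threshold, and the sample messages are all hypothetical.

```python
from __future__ import annotations

import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(
    messages: list[str], threshold: float = 0.5
) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for every message pair at or above the threshold.

    The pairwise comparison is quadratic and only suitable for small
    batches; it is an illustration, not a scalable design.
    """
    sets = [shingles(m) for m in messages]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            score = jaccard(sets[i], sets[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


# Hypothetical sample messages, invented for illustration only.
messages = [
    "New phishing campaign targets bank customers via fake login pages",
    "A new phishing campaign targets bank customers via fake login pages today",
    "Vendor releases patch for critical router vulnerability",
]

duplicates = find_near_duplicates(messages)
for i, j, score in duplicates:
    print(f"messages {i} and {j} look similar (Jaccard = {score:.2f})")
```

For data volumes of the kind the abstract describes, an all-pairs comparison would likely be replaced by an approximate technique such as MinHash, or by similarity queries in the search engine itself; the sketch only shows the underlying idea.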
License
Copyright (c) 2021 Information Technology and Security
This work is licensed under a Creative Commons Attribution 4.0 International License.