System for analysing big data on cybersecurity issues from social media
Keywords: social media monitoring, cybersecurity, open-source intelligence, social media analysis, CyberAggregator
The paper proposes and substantiates approaches to building a corporate system for monitoring and analyzing social media on cybersecurity issues. The approaches are based on the concepts of Big Data, Data/Text Mining, Information Extraction, Complex Networks, and Cloud Computing. The components of the Elastic Stack, the Sphinx information retrieval system, the Neo4j graph database management system, and the Gephi graph analysis system are examined in detail. The main idea of a system for analyzing large amounts of social media data on cybersecurity issues is the simultaneous application of methods and tools for information retrieval, data analysis, and aggregation of information flows. The system should implement the following functions: building databases by collecting information from specified information resources; configuring automatic scanning and primary processing of information from websites and social networks; maintaining full-text databases; identifying messages that duplicate or closely resemble one another in content; full-text search; analysis of text messages, determination of sentiment (tonality), and generation of analytical reports; integration with a geographic information system; data analysis and visualization; study of the dynamics of thematic information flows; forecasting developments based on the dynamics of publications in social media; and providing multi-user access to the functional components of the system. The practical significance of the results lies in a working prototype of a system for content monitoring and analysis of social media on cybersecurity issues, ready to be used as a component of decision support systems in information security and cybersecurity. The interface of the prototype is described; it offers functions for searching, analyzing, and forecasting the appearance of information in social media. Central to the interface is a digest of the messages most relevant to the user's needs.
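As an illustration of the duplicate-identification function listed above, the following is a minimal sketch that compares messages by word shingles and Jaccard similarity. The abstract does not say which method the system uses, so the technique, the function names, the shingle size `k`, and the `threshold` value are all assumptions chosen for this example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a lower-cased message."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short messages are treated as a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(messages, threshold=0.5, k=3):
    """Return index pairs (i, j), i < j, whose similarity is >= threshold.

    The threshold is a hypothetical value, not taken from the paper.
    """
    sets = [shingles(m, k) for m in messages]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The pairwise comparison is quadratic in the number of messages, which is acceptable for a small digest; a production-scale system would more likely rely on an index or a hashing scheme such as MinHash.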
In the analytical mode, several tools present the analyzed data graphically: a time series of the number of messages matching the query per day, as well as views of the main topics and of clusters grouped by predefined reference words. The system provides modes for building networks of concepts (persons, brands) that correspond to individual messages and information sources, which make it possible to rank the concepts and explore the relationships between them.
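The analytical views described above can be sketched in a few lines. The example below assumes each message is a dictionary with a `date` string and a list of already extracted `concepts`; this record layout, the function names, and the choice of co-occurrence counting with weighted-degree ranking are assumptions made for illustration, not the system's documented implementation.

```python
from collections import Counter
from itertools import combinations


def daily_counts(messages):
    """Count messages per day, giving the points of a daily time series."""
    return Counter(m["date"] for m in messages)


def concept_network(messages):
    """Build weighted edges between concepts that co-occur in a message.

    Each edge key is an alphabetically ordered pair of concepts; the weight
    is the number of messages in which both concepts appear.
    """
    edges = Counter()
    for m in messages:
        for a, b in combinations(sorted(set(m["concepts"])), 2):
            edges[(a, b)] += 1
    return edges


def rank_concepts(edges):
    """Rank concepts by weighted degree (sum of incident edge weights)."""
    strength = Counter()
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return strength.most_common()
```

An edge list of this shape could be exported to a graph tool such as Gephi or loaded into a graph database such as Neo4j for further exploration of the relationships between concepts.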
Copyright (c) 2020 Information Technology and Security
This work is licensed under a Creative Commons Attribution 4.0 International License.