Searching technology for questionable records when creating the Unified registry of Ukrainian individuals identification
DOI:
https://doi.org/10.20535/2411-1031.2019.7.2.190539
Keywords:
Data accuracy, personal data, privacy, end-to-end identifier, registry, confidentiality.
Abstract
One of the most effective ways to protect personal data when building a Unified Identity Registry is to use an end-to-end identifier together with hash codes generated from combinations of personal data using one-way hash functions. The stage of creating the unified personal identification registry does not involve open personal data, so no personal data is allowed on the server; only unique identifiers and hash codes are stored there. In accordance with the principles of creating this registry, five required and fifteen optional types of personal data stored in the registry were analyzed and used to generate hash codes, together with combinations of personal data fields built on these types (ten different combinations were used in this work). An end-to-end identification technology has been developed that can track errors in personal data fields both when new data is entered and when the registry is searched. To evaluate the proposed technology, a set of 100,000 simulated individuals was used, with errors placed at random in the registry database fields that store required and optional personal data. The efficiency of the proposed technology was also verified by registering new persons in the registry. The technology has a high tolerance for errors and can correctly identify and associate an individual even when several personal data fields contain errors. Correct personal data, especially in the mandatory fields, remains crucial to avoid erroneous entries in the registry. Because the hash transformation is one-way, a doubtful record can be identified by applying hash operators to the hash codes calculated for particular combinations of personal data.
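The idea of hashing combinations of personal data fields and comparing only the hash codes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names, the four combinations (the work uses ten, which the abstract does not list), the choice of SHA-256, and the rule "some but not all combinations match means doubtful" are all assumptions made for illustration.

```python
import hashlib

# Hypothetical field combinations (assumption: the paper uses ten, not listed here).
COMBINATIONS = [
    ("surname", "given_name", "birth_date"),
    ("surname", "birth_date", "tax_id"),
    ("given_name", "birth_date", "tax_id"),
    ("surname", "given_name", "tax_id"),
]


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not change the hash."""
    return " ".join(value.strip().lower().split())


def combination_hashes(person: dict) -> dict:
    """Return one SHA-256 hash code per field combination; only these leave the client."""
    codes = {}
    for combo in COMBINATIONS:
        payload = "|".join(f"{field}={normalize(person[field])}" for field in combo)
        codes[combo] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return codes


def classify(stored: dict, candidate: dict) -> str:
    """Compare hash codes combination by combination (assumed decision rule)."""
    matches = sum(1 for combo in COMBINATIONS if stored[combo] == candidate[combo])
    if matches == len(COMBINATIONS):
        return "match"
    if matches > 0:
        return "doubtful"  # partial agreement suggests an error in some field
    return "no match"


# Simulated data, invented for this example.
registered = combination_hashes({
    "surname": "Shevchenko", "given_name": "Taras",
    "birth_date": "1990-03-09", "tax_id": "1234567890",
})
typo = combination_hashes({
    "surname": "Shevchenka", "given_name": "Taras",
    "birth_date": "1990-03-09", "tax_id": "1234567890",
})
other = combination_hashes({
    "surname": "Kovalenko", "given_name": "Olena",
    "birth_date": "1985-11-21", "tax_id": "9876543210",
})

print(classify(registered, typo))   # doubtful
print(classify(registered, other))  # no match
```

The record with a misspelled surname still agrees on the one combination that omits the surname, which is what lets a server holding only hash codes flag it as doubtful rather than register it as a new person. A production system would likely need a keyed hash rather than plain SHA-256, since personal data fields have low entropy and unkeyed hashes of them can be guessed by enumeration.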
License
Copyright (c) 2020 Collection "Information technology and security"
This work is licensed under a Creative Commons Attribution 4.0 International License.