An Analysis of Organizational Factors and their Relation with GDPR-Compliance: A Natural Language Processing Based Machine Learning Approach
This paper employs machine learning (ML) and natural language processing (NLP) techniques to examine the relationship between organizational factors of data processing entities, such as company size and headquarters location, and the GDPR compliance promises they disclose in their privacy policies. Our methodology comprises three main stages, each representing a key contribution. First, we developed five NLP-based classification models with precision scores of at least 0.908 to assess different GDPR compliance promises in privacy policies. Second, we collected a data set of 8,614 organizations in the EU, containing organizational information and the GDPR compliance promises derived from each organization’s privacy policy. Third, we analyzed which organizational factors correlate with these GDPR compliance promises. The findings reveal, among other things, that being a small or medium-sized enterprise correlates negatively with the disclosure of two GDPR privacy policy core requirements. Moreover, among headquarters locations, Denmark performs best, correlating positively with the disclosure of GDPR privacy policy core requirements, whereas Spain, Italy, and Slovenia correlate negatively with multiple requirements. This study contributes to the novel field of GDPR compliance, offering valuable insights for policymakers and practitioners to enhance data protection practices and mitigate non-compliance risks.
Keywords - general data protection regulation; data protection; privacy policy; natural language processing; machine learning.
Author(s) Blinded
Currently under review at the International Journal of Information Systems and Project Management (IJISPM)