Research: A Novel Natural Language Processing Approach for Analyzing Lengthy Terms and Condition Agreements
Awards: WESEF 2021 Participant, Bronze Medal ACT-SO 2021, 2nd Place Award for Computer Science WESEF 2022
Mentor: Sol Vitkin
Research Location: Systems Inc.
Privacy is a growing concern in an increasingly digital world, where very few users read the terms and conditions they accept. The “Notice and Choice” framework requires organizations to inform users about how their data is collected and used. Once a user accepts the terms, companies can exchange that user's data with third-party platforms for profit. Very few practices address these issues, and some have been described as “ineffective” and “unattainable”. The purpose of this research is to develop an improved automated process, based on computational linguistics, that informs users of a company's policies and of the agreement they accepted in haste. Data is obtained from the open-source website tosdr.org to help build a better and more efficient model for the experiment. Once the data is pulled down, it is cleaned to remove tags, symbols, and any other unnecessary text. Stop words are identified and removed to maintain the model's performance. The result, the website's statements together with its points system, is fed to the program as training data. Once trained, the model is tested on unseen phrases, and its performance is reported in a confusion matrix. The model was trained on a dataset of 1,026 statements from over 240 websites and reached an accuracy of about 0.8226. The confusion matrix shows that it correctly predicted 531 positive statements as positive and 313 negative statements as negative. False predictions were less common: 82 negative statements were classified as positive, and 100 positive statements were classified as negative. The results show that the model is effective at determining the tone of statements, which suggests that more precise automation, with a success rate above 90%, is attainable. Acknowledging the concerning ways companies handle consumer data is the first step toward solutions that grant individuals digital privacy and protection.
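The reported accuracy can be checked directly from the four confusion-matrix counts given above, which sum to the 1,026 statements in the dataset. The following is a minimal sketch of that arithmetic, not the project's actual code: the `ConfusionMatrix` class and its field names are hypothetical, and the precision and recall properties are standard derived metrics that the abstract itself does not report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical container for binary classification counts."""
    tp: int  # positive statements predicted as positive
    tn: int  # negative statements predicted as negative
    fp: int  # negative statements predicted as positive
    fn: int  # positive statements predicted as negative

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn

    @property
    def accuracy(self) -> float:
        # Share of all statements that were classified correctly.
        return (self.tp + self.tn) / self.total

    @property
    def precision(self) -> float:
        # Of the statements predicted positive, the share that really were.
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Of the truly positive statements, the share the model found.
        return self.tp / (self.tp + self.fn)


# Counts taken from the abstract.
reported = ConfusionMatrix(tp=531, tn=313, fp=82, fn=100)

print(reported.total)               # 1026
print(round(reported.accuracy, 4))  # 0.8226
print(f"precision={reported.precision:.4f} recall={reported.recall:.4f}")
```

The first two printed values match the dataset size and the accuracy stated in the abstract: 844 correct predictions out of 1,026 statements.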
About this Scientist: