IDENTIFYING HARMFUL TWEETS: COMPARATIVE ANALYSIS OF MACHINE LEARNING ALGORITHMS /
BANDAR KHALED MOHAMMED AL-KALADI ; SUPERVISOR, ASST. PROF. DR. KIAN JAZAYERI
- 46 sheets ; 30 cm + 1 CD-ROM
Thesis (MSc) - Cyprus International University. Institute of Graduate Studies and Research Information Technologies
Sentiment analysis on platforms such as Twitter provides real insight into public opinion and trends. This study evaluated one deep learning model, the Gated Recurrent Unit (GRU), and three machine learning algorithms, Support Vector Machine (SVM), Random Forest, and Naive Bayes, on the task of distinguishing normal tweets from harmful ones. The dataset, sourced from Twitter, was preprocessed to remove noise, including user mentions, hashtags, links, and stop words. The cleaned dataset was then split into training and testing sets, and each classifier was trained on the training set. Accuracy, precision, recall, F1-score, and confusion matrices were used to evaluate each model. Naive Bayes achieved an accuracy of 92.02%, with a precision of 0.96 and recall of 0.88 for normal tweets, and a precision of 0.89 and recall of 0.96 for harmful tweets. Random Forest achieved the highest accuracy, 97.28%, with a precision of 0.98 and recall of 0.97 for normal tweets, and a precision of 0.97 and recall of 0.98 for harmful tweets. SVM achieved an accuracy of 96.92%, with a precision of 0.96 and recall of 0.98 for normal tweets, and a precision of 0.98 and recall of 0.96 for harmful tweets. The GRU achieved an accuracy of 95.02%, with a precision of 0.98 and recall of 0.92 for normal tweets, and a precision of 0.92 and recall of 0.98 for harmful tweets. Each algorithm had advantages and disadvantages: Naive Bayes showed high precision but lower recall for normal tweets (0.96 versus 0.88), Random Forest displayed balanced precision and recall, SVM combined high accuracy with strong precision and recall, and the GRU handled sequential patterns successfully.
The findings highlight the relative advantages and limitations of deep learning and machine learning approaches to Twitter sentiment analysis, with Random Forest and Support Vector Machine emerging as the most effective of the evaluated methods.
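
The preprocessing and evaluation steps summarized in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names clean_tweet and precision_recall_f1, the regular expressions, and the small stop-word list are assumptions made only for this example, and the abstract does not state which stop-word list or tooling was actually used.

```python
import re

# Assumption: a tiny illustrative stop-word list. The thesis does not
# specify which list was used; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # user mentions
HASHTAG_RE = re.compile(r"#\w+")                # hashtags
NON_ALPHA_RE = re.compile(r"[^a-z\s]")          # digits, punctuation, symbols


def clean_tweet(text):
    """Lowercase a tweet and strip links, mentions, hashtags, and stop words."""
    text = text.lower()
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, NON_ALPHA_RE):
        text = pattern.sub(" ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall, and F1 from confusion-matrix counts.

    tp, fp, fn are the true positives, false positives, and false
    negatives for the class of interest (for example, harmful tweets).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up example tweet and counts, for illustration only.
    print(clean_tweet("@user Check this out https://t.co/abc #spam You are the WORST!"))
    print(precision_recall_f1(tp=8, fp=2, fn=2))
```

The cleaned strings would then be vectorized and passed to a classifier. The counts given to precision_recall_f1 here are invented and are unrelated to the results reported above.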