AU - Adi, Abdulwahab O.
AU - Supervisor: Çelebi, Erbuğ
TI - Document classification using naive Bayes algorithm
PY - 2014///
CY - Nicosia
PB - Cyprus International University
KW - Makine öğrenme
KW - Machine learning
KW - Bayes teoremi
KW - Bayes Theorem
N1 - CHAPTER ONE; INTRODUCTION; Objectives; Organization of Thesis; CHAPTER 2; LITERATURE REVIEW; Machine Learning; Supervised Learning; Unsupervised Learning; Semi-Supervised Learning; Reinforcement Learning; Transduction; Learning to Learn; Developmental Learning; PREVIOUS WORK DONE; Naive Bayes Classifier As A Spam Detector; Naive Bayes Classifier in Sentiment Analysis; Naive Bayes Classifier in Cancer Diagnosis; Naive Bayes Classifier in Plant Species Classification; CHAPTER 3; NAIVE BAYES CLASSIFIER; Bayes Theorem; Text Classification Simplified; Prior Probability, P(c); Likelihood Probability, P(d|c); Laplace Smoothing; Simple Text Classification Examples; CHAPTER 4; IMPLEMENTATION; Introduction; Java and NLP Libraries; Program Design; Experimental Setup; Loading the data set; Stop Word Removal; Tokenization; Stemming; Bag of Words Creation; Evaluation; Classification; Design Summary; CHAPTER 5; EVALUATION; Cross Validation Method; Comparison with other Classifier Applications; Icsiboost-bigram; Expected Maximum algorithm; Varied Training Set based Evaluation; RESULTS OF EVALUATION PROCEDURES; Cross Validation Method; Comparison with other Classifier Programs; Varied Training Set based Evaluation; CHAPTER 6; CONCLUSION AND FUTURE WORK; REFERENCES
N2 - 'ABSTRACT In this study, we implemented a naïve Bayes classifier in the Java language. The classifier was tested on the 20 Newsgroups data set, which is widely used to evaluate implementations of document categorization and clustering algorithms.
The ultimate objective is to better understand the algorithm as a means of automatic document categorization, and to consider new methods that could be proposed for future research. At the end of this research, we tested the performance of our implementation using three methods. Its accuracy was measured by comparing it with the accuracies of other algorithms on the same data set. The implementation performed as the theory predicts. We also concluded that the naïve Bayes classifier performs well among similar classifiers, but that it has its shortcomings as well. Keywords: Bayes Theorem, Supervised Learning, Document Classification, Naïve Bayes Classifier, Tokenization, Stemming, Machine Learning, Information Retrieval, Java '
ER -