Document classification using naive Bayes algorithm

Adi, Abdulwahab O.

Document classification using naive Bayes algorithm / Abdulwahab O. Adi; Supervisor: Erbuğ Çelebi. - Nicosia: Cyprus International University, 2014. - IX, 49 p.; figures; 30.5 cm; CD

Includes CD

CHAPTER 1: INTRODUCTION
  Objectives
  Organization of Thesis
CHAPTER 2: LITERATURE REVIEW
  Machine Learning
  Supervised Learning
  Unsupervised Learning
  Semi-Supervised Learning
  Reinforcement Learning
  Transduction
  Learning to Learn
  Developmental Learning
  PREVIOUS WORK DONE
  Naive Bayes Classifier as a Spam Detector
  Naive Bayes Classifier in Sentiment Analysis
  Naive Bayes Classifier in Cancer Diagnosis
  Naive Bayes Classifier in Plant Specie Classification
CHAPTER 3: NAIVE BAYES CLASSIFIER
  Bayes Theorem
  Text Classification Simplified
  Prior Probability, P(c)
  Likelihood Probability, P(d|c)
  Laplace Smoothening
  Simple Text Classification Examples
CHAPTER 4: IMPLEMENTATION
  Introduction
  Java and NLP Libraries
  Program Design
  Experimental Setup
  Loading the Data Set
  Stop Word Removal
  Tokenization
  Stemming
  Bag of Word Creation
  Evaluation
  Classification
  Design Summary
CHAPTER 5: EVALUATION
  Cross Validation Method
  Comparison with Other Classifier Application
  Icsiboost-bigram
  Expected Maximum Algorithm
  Varied Training Set Based Evaluation
  RESULTS OF EVALUATION PROCEDURES
  Cross Validation Method
  Comparison with Other Classifier Programs
  Varied Training Set Based Evaluation
CHAPTER 6: CONCLUSION AND FUTURE WORK
REFERENCES

ABSTRACT

In this study, we implemented a naïve Bayes classifier in Java. The classifier was tested on the 20 Newsgroups data set, which is widely used to evaluate implementations of document categorization and clustering algorithms. The ultimate objective is a better understanding of the algorithm as a means of automatic document categorization, and to be able to consider new methods that could be proposed in future research. We tested the performance of our implementation using three methods: cross-validation, comparison with other classifier programs, and evaluation with varied training sets. Accuracy was assessed by comparing it with the accuracies of other algorithms on the same data set. The classifier behaved as the theory predicts. We also conclude that the naïve Bayes classifier performs well among similar classifiers, but that it has shortcomings as well.

Keywords: Bayes Theorem, Supervised Learning, Document Classification, Naïve Bayes Classifier, Tokenization, Stemming, Machine Learning, Information Retrieval, Java


Subjects: Machine learning (Makine öğrenme); Bayes theorem (Bayes teoremi)