Document classification using naive Bayes algorithm / Abdulwahab O. Adi; Supervisor: Erbuğ Çelebi

Language: English
Publication details: Nicosia: Cyprus International University, 2014
Description: IX, 49 p., figure, 30.5 cm, CD
Content type:
  • text
Media type:
  • unmediated
Carrier type:
  • volume
Incomplete contents:
1 CHAPTER ONE
1 INTRODUCTION
3 Objectives
3 Organization of Thesis
4 CHAPTER 2
4 LITERATURE REVIEW
4 Machine Learning
5 Supervised Learning
7 Unsupervised Learning
7 Semi-Supervised Learning
8 Reinforcement Learning
8 Transduction
10 Learning to Learn
10 Developmental Learning
11 PREVIOUS WORK DONE
11 Naive Bayes Classifier As A Spam Detector
12 Naive Bayes Classifier in Sentiment Analysis
13 Naive Bayes Classifier in Cancer Diagnosis
14 Naive Bayes Classifier in Plant Species Classification
15 CHAPTER 3
15 NAIVE BAYES CLASSIFIER
15 Bayes Theorem
15 Text Classification Simplified
18 Prior Probability, P(c)
19 Likelihood Probability, P(d|c)
20 Laplace Smoothening
22 Simple Text Classification Examples
26 CHAPTER 4
26 IMPLEMENTATION
26 Introduction
26 Java and NLP Libraries
27 Program Design
27 Experimental Setup
28 Loading the data set
29 Stop Word Removal
30 Tokenization
33 Stemming
36 Bag of Word Creation
39 Evaluation
39 Classification
40 Design Summary
41 CHAPTER 5
41 EVALUATION
41 Cross Validation method
41 Comparison with other Classifier Application
42 Icsiboost-bigram
42 Expected Maximum algorithm
42 Varied Training Set based Evaluation
43 RESULTS OF EVALUATION PROCEDURES
43 Cross Validation Method
44 Comparison with other Classifier Programs
45 Varied Training Set based Evaluation
46 CHAPTER 6
46 CONCLUSION AND FUTURE WORK
47 REFERENCES
Abstract: In this study, we implemented a naïve Bayes classifier in the Java language. The classifier was tested on the 20 Newsgroups data set, which is widely used for testing document categorization and clustering algorithm implementations. The ultimate objective is to better understand the algorithm as a way of performing automatic document categorization, and to be able to consider new methods that could be proposed for future research. At the end of this research, we successfully tested the performance of our implementation using three methods. Accuracy was assessed by comparing it with the accuracies of other algorithms on the same data set. The classifier turned out to work as theory postulates in a normal academic setting. We were also able to conclude that the naïve Bayes classifier performs well among similar classifiers, but that it also has its shortcomings.
Keywords: Bayes Theorem, Supervised Learning, Document Classification, Naïve Bayes Classifier, Tokenization, Stemming, Machine Learning, Information Retrieval, Java
Material type: Thesis

Includes CD

