HCAB-SMOTE: A HYBRID CLUSTERED AFFINITIVE BORDERLINE SMOTE APPROACH FOR IMBALANCED DATA BINARY CLASSIFICATION / Hisham AL MAJZOUB; Supervisor: Öykü AKAYDIN; Co-supervisors: Islam ELGEDAWY, Mehtap KÖSE ULUKÖK
Description: p. VII, 102; color figures, tables, color graphics; 30.5 cm; CD
Content type: text
Media type: unmediated
Carrier type: volume
Item type | Current library | Collection | Call number | Status | Notes | Due date | Barcode | Holds
---|---|---|---|---|---|---|---|---
Thesis | CIU LIBRARY Thesis Collection | Thesis Collection | D 210 A46 2020 | Available | Management Information Systems Department | | T2061 |
Includes CD
Thesis (Ph.D.) - CYPRUS INTERNATIONAL UNIVERSITY, INSTITUTE OF GRADUATE STUDIES AND RESEARCH, MANAGEMENT INFORMATION SYSTEMS DEPARTMENT
Includes references (p. 96-102)
ABSTRACT
In this thesis, three algorithms are developed and implemented to optimize
the performance of machine learning oversampling algorithms, increasing
classification accuracy over datasets that suffer from the imbalanced class
problem. Machine learning uses historical data to reveal hidden patterns and
improve decision-making in business, medical, and other fields. However, it
faces many obstacles and challenges; one of them is a dataset structure issue
called the imbalanced class problem. In imbalanced datasets, the instances are
distributed unevenly between the classes, leading the classification algorithm
to act in a biased manner toward the class with the most instances and to
achieve low classification accuracy on instances falling in the minority class.
Most often, the goal of using machine learning is to capture the patterns of
the minority class so that the model can predict the class of new, unlabeled
instances, but this prediction achieves low accuracy when the dataset is
imbalanced. Different methods are available to reduce the effect of the
imbalanced class problem on the bias of the generated models, but increasing
classification accuracy with those methods requires deeply modifying the
original data: removing a large number of majority instances through
undersampling, or generating a huge number of new minority instances through
oversampling. The main focus of this thesis is to optimize the oversampling
algorithm SMOTE so that it increases the classification accuracy of the
intended class with minimal alteration of the data. The thesis first proposes
Affinitive Borderline-SMOTE (AB-SMOTE), which outperforms the former
Borderline-SMOTE in classification accuracy by oversampling new instances
within the borderline area instead of around it. It then develops Clustered
Affinitive Borderline-SMOTE (CAB-SMOTE), which partitions the borderline area
into smaller clusters and oversamples within these clusters, delivering higher
classification accuracy than AB-SMOTE on minority instances. Finally, the
thesis proposes Hybrid Clustered Affinitive Borderline-SMOTE (HCAB-SMOTE),
which combines an undersampling step that removes noisy borderline instances
from both the majority and minority classes with CAB-SMOTE oversampling,
obtaining the highest classification accuracy among the compared oversampling
techniques. These methods can therefore be used to improve the accuracy of
machine learning applications, making them more reliable for decision-making
processes that decrease costs and increase profits.
Keywords: Imbalanced Data ꞏ Borderline SMOTE ꞏ Oversampling ꞏ SMOTE ꞏ AB-SMOTE ꞏ CAB-SMOTE ꞏ HCAB-SMOTE ꞏ K-Means Clustering
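The SMOTE family of methods summarized in the abstract all rest on the same core operation: synthesizing new minority instances by interpolating between a minority instance and one of its nearest minority-class neighbours. As an illustration only — this is a minimal sketch of classic SMOTE interpolation, not the thesis's AB-, CAB-, or HCAB-SMOTE variants, and the function name `smote_sample` and its parameters are hypothetical:

```python
import numpy as np

def smote_sample(X_min, k=3, n_new=4, seed=None):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    For each new sample: pick a random minority instance, pick one of its
    k nearest minority neighbours, and place a point at a random position
    on the line segment between the two.
    """
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # pairwise distances among minority instances only
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]        # k nearest minority neighbours
    synth = []
    for _ in range(n_new):
        i = rng.integers(n)                  # random minority instance
        j = nn[i, rng.integers(k)]           # one of its neighbours
        gap = rng.random()                   # interpolation fraction in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)

# Toy minority class: four corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new = smote_sample(X_min, k=2, n_new=3, seed=0)
print(new.shape)  # → (3, 2)
```

Because every synthetic point lies on a segment between two existing minority points, SMOTE never leaves the convex hull of the minority class — which is precisely why the borderline variants above restrict where the interpolation happens rather than changing the interpolation itself.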