HCAB-SMOTE: A HYBRID CLUSTERED AFFINITIVE BORDERLINE SMOTE APPROACH FOR IMBALANCED DATA BINARY CLASSIFICATION / Hisham AL MAJZOUB; Supervisor: Öykü AKAYDIN; Co-supervisors: Islam ELGEDAWY, Mehtap KÖSE ULUKÖK

Description: p. VII, 102; color figures, tables, color graphics; 30.5 cm; CD
Content type:
  • text
Media type:
  • unmediated
Carrier type:
  • volume
Thesis note: Thesis (Ph.D) - CYPRUS INTERNATIONAL UNIVERSITY INSTITUTE GRADUATE STUDIES AND RESEARCH MANAGEMENT INFORMATION SYSTEMS DEPARTMENT
Material type: Thesis
Holdings
Material type | Current Library | Collection | Call Number | Status | Notes | Due Date | Barcode | Holds
Thesis | CIU LIBRARY | Tez Koleksiyonu (Thesis Collection) | D 210 A46 2020 | Available | Management Information Systems Department | | T2061 |
Total holds: 0

Includes CD


Includes references, p. 96-102

ABSTRACT

In this thesis, three algorithms are developed and implemented to optimize the performance of machine learning oversampling algorithms, thereby increasing the accuracy of classification tasks over datasets with an imbalanced class problem. Machine learning uses historical data to reveal hidden patterns and improve decision-making in business, medical, and other fields. However, it faces many obstacles and challenges; one of them is a dataset structure issue called the imbalanced class problem. In an imbalanced dataset, the distribution of instances is uneven between the classes, leading the classification algorithm to act in a biased manner toward the class with the most instances and to obtain low classification accuracy for instances falling in the minority class. Most often, the goal of using machine learning is to capture the patterns of the minority class so that the model can predict the class of new unlabeled instances, but this process achieves low accuracy when the dataset is imbalanced. Different methods are available to reduce the effect of the imbalanced class problem on the generated model's bias, but to increase classification accuracy these methods must deeply modify the original data, either by removing a large number of majority instances through undersampling or by generating a large number of new instances within the minority class through oversampling. The main focus of this thesis is to optimize the oversampling algorithm SMOTE so as to increase the classification accuracy of the intended class with minimal alteration of the data.

This thesis first proposes Affinitive Borderline-SMOTE (AB-SMOTE), which outperforms the classification accuracy of the earlier Borderline-SMOTE by oversampling new instances within the borderline area instead of oversampling the instances around it. The thesis then develops Clustered Affinitive Borderline-SMOTE (CAB-SMOTE), which partitions the borderline area into smaller clusters and oversamples within these clusters, delivering higher classification accuracy than AB-SMOTE in classifying the minority instances. Finally, the thesis proposes Hybrid Clustered Affinitive Borderline-SMOTE (HCAB-SMOTE), which combines an undersampling step that removes noisy borderline instances from the majority and minority classes with CAB-SMOTE oversampling, obtaining the highest classification accuracy among the compared oversampling techniques. These methods can therefore be used to improve the accuracy of machine learning applications, making them more reliable for decision-making processes that decrease cost and increase profit.
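The borderline-oversampling idea that AB-SMOTE builds on can be sketched in a few lines. The sketch below is illustrative only, not the thesis's implementation: it uses the common Borderline-SMOTE convention that a minority instance is "borderline" when at least half, but not all, of its k nearest neighbours belong to the majority class, followed by the standard SMOTE interpolation step. The function names and the toy dataset are hypothetical.

```python
import numpy as np

def is_borderline(X, y, idx, k=5):
    """A minority instance is 'borderline' (Borderline-SMOTE sense) when at
    least half of its k nearest neighbours are majority instances, but not
    all of them (an all-majority neighbourhood would mark it as noise)."""
    d = np.linalg.norm(X - X[idx], axis=1)   # Euclidean distance to all points
    nn = np.argsort(d)[1:k + 1]              # k nearest, skipping the point itself
    n_majority = np.sum(y[nn] == 0)          # class 0 = majority (assumed here)
    return k / 2 <= n_majority < k

def smote_interpolate(x, neighbor, rng):
    """Create one synthetic instance on the segment between two minority
    instances, as in the original SMOTE interpolation step."""
    return x + rng.random() * (neighbor - x)

# Toy imbalanced dataset: class 0 is the majority, class 1 the minority.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),   # 50 majority instances
               rng.normal(1.5, 0.5, size=(6, 2))])   # 6 minority instances
y = np.array([0] * 50 + [1] * 6)

# Identify borderline minority instances and oversample around them only.
minority_idx = np.where(y == 1)[0]
border = [i for i in minority_idx if is_borderline(X, y, i)]
synthetic = []
for i in border:
    # interpolate toward a randomly chosen other minority instance
    j = rng.choice([m for m in minority_idx if m != i])
    synthetic.append(smote_interpolate(X[i], X[j], rng))
print(f"{len(border)} borderline instances, {len(synthetic)} synthetic samples")
```

Restricting the interpolation to borderline instances concentrates the new samples where the classifier's decision boundary is actually contested, rather than deep inside the minority region.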
Keywords: Imbalanced Data · Borderline SMOTE · Oversampling · SMOTE · AB-SMOTE · CAB-SMOTE · HCAB-SMOTE · K-Means Clustering
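The clustering step that distinguishes CAB-SMOTE can be sketched in the same spirit: cluster the borderline minority instances (the keywords indicate K-Means is used) and interpolate only between members of the same cluster, so synthetic points stay inside each local region of the borderline area rather than bridging unrelated parts of it. The minimal k-means and the helper function below are hypothetical stand-ins, not the thesis code.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal k-means (Lloyd's algorithm), standing in for a library
    implementation; returns a cluster label for every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest centre
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centres; keep the old centre if a cluster empties
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def oversample_clusters(X_border, k, n_new, seed=0):
    """Cluster the borderline minority instances, then create synthetic
    instances only between members of the same cluster."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X_border, k, seed=seed)
    synthetic = []
    for _ in range(n_new):
        c = rng.integers(k)
        members = X_border[labels == c]
        if len(members) < 2:          # need two points to interpolate between
            continue
        a, b = members[rng.choice(len(members), size=2, replace=False)]
        synthetic.append(a + rng.random() * (b - a))
    return np.array(synthetic)

# Hypothetical borderline minority instances forming two loose groups.
rng = np.random.default_rng(1)
X_border = np.vstack([rng.normal(0, 0.3, (8, 2)), rng.normal(3, 0.3, (8, 2))])
new_points = oversample_clusters(X_border, k=2, n_new=10)
print(f"generated {len(new_points)} synthetic instances")
```

Without the clustering step, interpolating between two borderline points from opposite ends of the boundary could place a synthetic instance in the middle of the majority region; per-cluster interpolation avoids exactly that failure mode.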
