
Learning Imbalanced Data Sets with Noisy Replication

Citation

Dong, Ensheng. (2017). Learning Imbalanced Data Sets with Noisy Replication. Theses and Dissertations Collection, University of Idaho Library Digital Collections. https://www.lib.uidaho.edu/digital/etd/items/dong_idaho_0089n_11106.html

Title:
Learning Imbalanced Data Sets with Noisy Replication
Author:
Dong, Ensheng
Date:
2017
Keywords:
Imbalanced data; Machine learning; Noisy replication
Program:
Statistical Sciences
Subject Category:
Statistics; Mathematics
Abstract:

Previous research has shown the noisy replication method to be an effective approach for learning imbalanced binary data sets. This thesis extends the concept and examines its effectiveness in broader scenarios: several levels of sigma noise, a wide range of imbalance ratios (IR), eight commonly used machine learning models, both binary and multi-class data sets, the addition of both noise and anti-noise, and more than 60 simulated and real data sets. The thesis finds that, for some models (KNN, neural networks and C5.0, for instance), adding relatively small noise significantly improves the performance of the noisy replication method as the IR increases. It further shows that, in terms of ROC area and Kullback-Leibler distance, the noisy replication method is an ideal model-free approach for learning both binary and multi-class imbalanced data sets.
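The abstract does not spell out the procedure, so the following is only a minimal sketch of one common reading of noisy replication, not the thesis's code. It assumes that minority-class rows are resampled with replacement and that each copy is perturbed with Gaussian noise whose standard deviation is `sigma` times that feature's standard deviation within the minority class. It also assumes that IR is the majority-class count divided by the minority-class count. The function names `imbalance_ratio` and `noisy_replicate` are hypothetical, and "anti-noise" is not modeled.

```python
import random
import statistics
from collections import Counter


def imbalance_ratio(labels):
    """IR = largest class count / smallest class count (assumed convention)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def noisy_replicate(X, y, minority_label, n_new, sigma, seed=0):
    """Append n_new noisy copies of randomly chosen minority-class rows.

    Hypothetical sketch: each feature of a copied row receives Gaussian noise
    with standard deviation sigma * (that feature's minority-class stdev).
    With sigma = 0 this reduces to plain random oversampling.
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to estimate spread")
    # Per-feature spread of the minority class, used to scale the noise.
    spreads = [statistics.stdev(col) for col in zip(*minority)]
    new_X, new_y = [list(row) for row in X], list(y)
    for _ in range(n_new):
        base = rng.choice(minority)  # replicate an existing minority row
        noisy = [v + rng.gauss(0.0, sigma * s) for v, s in zip(base, spreads)]
        new_X.append(noisy)
        new_y.append(minority_label)
    return new_X, new_y


if __name__ == "__main__":
    X = [[0.0, 1.0], [0.2, 0.8], [0.1, 1.1], [0.3, 0.9], [2.0, 3.0], [2.2, 3.1]]
    y = [0, 0, 0, 0, 1, 1]
    print(imbalance_ratio(y))  # 2.0
    # Add just enough noisy minority copies to balance the classes.
    Xb, yb = noisy_replicate(X, y, minority_label=1, n_new=2, sigma=0.1)
    print(imbalance_ratio(yb))  # 1.0
```

In this sketch, `sigma` plays the role of the noise level that the abstract varies. Setting it to zero gives exact duplicates, while larger values spread the synthetic minority points further from the originals.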

Description:
Master's thesis, M.S., Statistical Sciences, University of Idaho, College of Graduate Studies, 2017
Major Professor:
Lee, Stephen S
Committee:
Wiest, Michelle M; Gao, Fuchang
Defense Date:
2017
Identifier:
Dong_idaho_0089N_11106
Type:
Text
Format Original:
PDF
Format:
application/pdf


Rights
Rights:
In Copyright - Educational Use Permitted. For more information, please contact University of Idaho Library Special Collections and Archives Department at libspec@uidaho.edu.
Standardized Rights:
http://rightsstatements.org/vocab/InC-EDU/1.0/