Classifying Imbalanced Data for DDoS Attack Detection

Citation

Alghamdi, Amal Ali. (2021-12). Classifying Imbalanced Data for DDoS Attack Detection. Theses and Dissertations Collection, University of Idaho Library Digital Collections. https://www.lib.uidaho.edu/digital/etd/items/alghamdi_idaho_0089n_12287.html

Title:

Classifying Imbalanced Data for DDoS Attack Detection

Author:

Alghamdi, Amal Ali

Date:

2021-12

Embargo Remove Date:

2023-12-21

Program:

Computer Science

Subject Category:

Computer science

Abstract:

In the first quarter of 2021, researchers witnessed over 2.8 million DDoS attacks, a 32% increase from the same period in 2020, as reported by Info-Security magazine on May 18, 2021. The magazine also noted that the number of attacks against educational institutions has increased by 41% over the past three quarters. DDoS has become a serious issue for many organizations and individuals. The evolution of networks has ushered in a level of complexity that is the enemy of security. Currently, attacks are more prevalent and at the same time more noticeable due to the variety of features that exist on networks, a consequence of the constant escalation between attackers and defenders. Machine learning algorithms (MLAs) have become a tool to help thicken the layers of defense. To be effective, MLAs must be trained in ways that provide high confidence for detection and prevention, which boils down to precision and accuracy (i.e., low false positives and/or high true positives). This work has developed a setup for establishing a measured intrusion detection system (IDS) that can help to better understand and identify the various unique features of a network in order to better prevent DoS and DDoS attacks from being successful.
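The measures named in the abstract (accuracy, precision, and false positive rate) can be illustrated with a minimal sketch. The function name and the confusion counts below are hypothetical and are not taken from the thesis; they only show how the three quantities relate to true/false positives and negatives.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, and false positive rate from confusion counts."""
    total = tp + fp + tn + fn
    # Accuracy: share of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted attacks that were real attacks.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # False positive rate: share of benign traffic flagged as an attack.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "fpr": fpr}


# Hypothetical counts for one attack class versus everything else.
m = binary_metrics(tp=90, fp=5, tn=895, fn=10)
print(m)
```

For a multiclass problem, the same counts would typically be computed once per class in a one-versus-rest fashion.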

The goal is to develop models that can predict (i.e., classify) different types of DoS/DDoS attacks with high precision and accuracy and with low false positive/negative rates. In addition to dealing with the multiclass classification and extreme class imbalance problems, the derived model leverages two feature selection techniques to reduce the number of features in the dataset and help improve the model's execution time, thereby reducing the IDS complexity. A combination of under-sampling and weight adjustment was applied to handle the imbalance problem. The extracted data was evaluated using supervised MLAs, including Random Forest, Decision Tree, Naive Bayes, Logistic Regression, and ensemble methods. Ensemble methods using supervised outcomes aim to improve the overall performance of the classification. The experiments utilized the popular benchmark NSL-KDD and CICIDS2017 datasets. Random Forest achieved the best performance results, reducing training and test time by 37%. In addition to solving the imbalance problem caused by feature selection, it improved accuracy by 6.25% and the false positive rate (FPR) by 21%. The Random Forest model achieved 99% accuracy and a false positive rate of 0.0001. Furthermore, it can detect minority classes with more than 80% accuracy.
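The combination of under-sampling and weight adjustment described above might look roughly like the following sketch. It is not the author's implementation: the function names, the per-class cap, the class labels, and the toy data are all assumptions made for illustration, and the weighting formula is the common inverse-frequency heuristic n / (k * count).

```python
import random
from collections import Counter, defaultdict


def undersample(rows, labels, max_per_class, seed=0):
    """Randomly cap every class at max_per_class samples (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) > max_per_class:
            # Only majority classes are reduced; small classes are kept whole.
            members = rng.sample(members, max_per_class)
        out_rows.extend(members)
        out_labels.extend([label] * len(members))
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Toy, made-up class distribution; labels are placeholders, not dataset statistics.
labels = ["benign"] * 1000 + ["dos_hulk"] * 100 + ["heartbleed"] * 5
rows = [[i] for i in range(len(labels))]

sub_rows, sub_labels = undersample(rows, labels, max_per_class=100)
weights = balanced_class_weights(sub_labels)
print(Counter(sub_labels), weights)
```

Under-sampling shrinks the majority class, while the weights compensate for the imbalance that remains among the rarest classes. A weight dictionary of this shape is the kind of input that many classifiers accept as a per-class weighting parameter.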

Description:

masters, M.Engr., Computer Science -- University of Idaho - College of Graduate Studies, 2021-12

Major Professor:

Sheldon, Frederick

Committee:

Marshall, Xiaogang; Song, Jia; Soule, Terry

Defense Date:

2021-12

Identifier:

Alghamdi_idaho_0089N_12287

Type:

Text

Format Original:

PDF

Format:

application/pdf

Rights

Rights: In Copyright - Educational Use Permitted. For more information, please contact University of Idaho Library Special Collections and Archives Department at libspec@uidaho.edu.
Standardized Rights: http://rightsstatements.org/vocab/InC-EDU/1.0/
