A Grid Partition-based Local Outlier Factor for Big Data Stream Processing

Citation

Alsini, Raed. (2021-05). A Grid Partition-based Local Outlier Factor for Big Data Stream Processing. Theses and Dissertations Collection, University of Idaho Library Digital Collections. https://www.lib.uidaho.edu/digital/etd/items/alsini_idaho_0089e_12090.html

Title:
A Grid Partition-based Local Outlier Factor for Big Data Stream Processing
Author:
Alsini, Raed
ORCID:
0000-0002-3163-575X
Date:
2021-05
Program:
Computer Science
Subject Category:
Computer science
Abstract:

Outlier detection is receiving significant attention in big data research. Detecting outliers is important in various applications such as communication, finance, fraud detection, and network intrusion detection. Because of their unique characteristics, such as large volume and high velocity, data streams pose a challenge to traditional outlier detection methods. Among density-based methods, the Local Outlier Factor (LOF) is one of the most suitable techniques for detecting outliers. However, it faces some challenges when dealing with data streams. One issue is that LOF requires the entire dataset, along with the distance values, to be stored in computer memory. Another is that any change to the dataset necessitates a significant recalculation from the beginning. To address these issues, this dissertation proposes a new method for detecting local outliers in data streams called the Grid Partition-based Local Outlier Factor (GP-LOF). We improve the GP-LOF algorithm further by adding another technique known as the Local Outlier Factor by Reachability Distance (LOFR). The improved algorithm is called the Grid Partition-based Local Outlier Factor by Reachability Distance (GP-LOFR). We tested both GP-LOF and GP-LOFR on several benchmark datasets. Both outperformed the Density Summarization Incremental Local Outlier Factor (DILOF) algorithm, which is the most representative algorithm in existing studies of data stream processing. We also worked with real-world concrete mixture datasets. In that work, a new algorithm called the Isolation Forest based on a sliding window for the Local Outlier Factor (IFS-LOF) was developed. IFS-LOF outperformed both LOF and LOF-Sliding Window (LOF-SW) in accuracy. In summary, the three new algorithms GP-LOF, GP-LOFR, and IFS-LOF are the major contributions of this PhD research.
All proposed algorithms work without any prior knowledge of data distributions and are capable of executing with limited computer memory. This PhD research makes a solid contribution to the field of local outlier detection in big data streams. In the near future, we will extend the developed algorithms by applying Evolutionary Computation (EC) methods to further improve accuracy and reduce execution time. Moreover, we will apply these algorithms to more real-world datasets.

Description:
doctoral, Ph.D., Computer Science -- University of Idaho - College of Graduate Studies, 2021-05
Major Professor:
Ma, Xiaogang
Committee:
Soule, Terence ; Sheldon, Frederick ; Ibrahim, Ahmed
Defense Date:
2021-05
Identifier:
Alsini_idaho_0089E_12090
Type:
Text
Format Original:
PDF
Format:
application/pdf

Rights
Rights:
In Copyright - Educational Use Permitted. For more information, please contact University of Idaho Library Special Collections and Archives Department at libspec@uidaho.edu.
Standardized Rights:
http://rightsstatements.org/vocab/InC-EDU/1.0/