Tensor decomposition is a powerful unsupervised machine learning method that enables the modeling of multi-dimensional data, including malware data. This thesis introduces a novel ensemble semi-supervised classification algorithm, named Random Forest of Tensors (RFoT), that utilizes tensor decomposition to extract complex and multi-faceted latent patterns from data. Our hybrid model combines multi-dimensional analysis with clustering to capture the sample groupings in the latent components, whose combinations distinguish malware from benign-ware. The patterns extracted from malware data with tensor decomposition depend on the configuration of the tensor, namely the selection of its dimensions, entry, and rank. To capture the unique perspectives of different tensor configurations, we employ the “wisdom of crowds” philosophy and use the decisions made by the majority of a randomly generated ensemble of tensors with varying dimensions, entries, and ranks. We show the capabilities of RFoT when classifying Windows Portable Executable (PE) malware and benign-ware.
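The ensemble idea described above can be illustrated with a minimal sketch. It covers only the random sampling of tensor configurations and the majority vote, and it is not the thesis implementation. The function names, the parameter defaults, the `ABSTAIN` label, and the `vote_fn` callback (a stand-in for building a tensor, decomposing it, and clustering its latent components) are all assumptions made for illustration.

```python
import random
from collections import Counter

ABSTAIN = -1  # assumed label for "this tensor configuration gives no vote"


def sample_configuration(features, rng, min_dims=2, max_dims=3, rank_range=(2, 10)):
    """Draw one random tensor configuration: dimension features, an entry feature, a rank."""
    n_dims = rng.randint(min_dims, max_dims)
    # Pick n_dims dimension features plus one distinct feature used as the tensor entry.
    chosen = rng.sample(features, n_dims + 1)
    return {
        "dimensions": chosen[:n_dims],
        "entry": chosen[n_dims],
        "rank": rng.randint(*rank_range),
    }


def majority_vote(votes):
    """Return the most common non-abstaining label, or ABSTAIN on no votes or a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def ensemble_predict(samples, features, vote_fn, n_tensors=25, seed=0):
    """Combine the per-sample votes of many randomly configured tensors.

    vote_fn(config, samples) is a hypothetical callback that must return one
    label (or ABSTAIN) per sample; the decomposition and clustering would
    happen inside it.
    """
    rng = random.Random(seed)
    configs = [sample_configuration(features, rng) for _ in range(n_tensors)]
    per_config = [vote_fn(cfg, samples) for cfg in configs]
    # zip(*per_config) regroups the votes so each tuple holds all votes for one sample.
    return [majority_vote(sample_votes) for sample_votes in zip(*per_config)]
```

In this sketch a tie or an all-abstain outcome leaves the sample unlabeled rather than forcing a decision; that choice is an assumption of the example, not a claim about how RFoT resolves such cases.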
Tensors, Machine Learning, Ensemble, Semi-supervised, Malware
Eren, M. E. Random Forest of Tensors (RFoT) Master’s Thesis. Master’s Thesis in Computer Science at the University of Maryland, Baltimore County Department of Computer Science and Electrical Engineering. 2022.
@misc{eren2022RFoTThesis,
title={Random Forest of Tensors (RFoT) Master's Thesis},
author={M. E. {Eren}},
year={2022},
note={Master's Thesis in Computer Science at the University of Maryland, Baltimore County Department of Computer Science and Electrical Engineering. 2022.}
}