Accurate labeling is important for detecting malware and for building reference datasets that can be used to evaluate machine learning (ML) based malware classification and clustering approaches. Labels obtained from Anti-Virus (AV) vendors (such as Kaspersky, Malwarebytes, and McAfee) are one source of information; however, despite ongoing research efforts, labeling remains inconsistent across AV vendors. Vendors use differing formats and naming conventions when reporting labels for malware samples, and the labels reported by any two vendors can disagree. We address this problem using CP-APR (CANDECOMP/PARAFAC Alternating Poisson Regression), a tensor decomposition method for unsupervised ML, to discover hidden patterns in how AV vendors report malware labels. Compared with traditional ML methods, tensor decomposition models the multi-dimensional properties of the data and produces interpretable results. The higher-dimensional representation of the AV scans enables the discovery of multi-faceted and complex details of those scans.
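To make the tensor representation concrete, the following is a minimal sketch of how AV scan results might be arranged as a sparse count tensor, the kind of nonnegative count data that CP-APR is designed to factorize. The choice of modes (sample, vendor, label token), the tokenization rule, the function names, and the scan records are all assumptions made for illustration; the abstract does not specify how the authors construct their tensor.

```python
import re
from collections import Counter


def tokenize(label):
    """Split an AV label into lowercase alphanumeric tokens (assumed rule)."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", label.lower()) if t]


def build_count_tensor(scans):
    """Build a sparse count tensor from (sample, vendor, label) triples.

    Returns (counts, index): counts maps (i, j, k) coordinates to positive
    integer counts, and index holds the name-to-position table of each mode.
    """
    index = {"sample": {}, "vendor": {}, "token": {}}
    counts = Counter()
    for sample, vendor, label in scans:
        # Assign the next free position the first time a name is seen.
        i = index["sample"].setdefault(sample, len(index["sample"]))
        j = index["vendor"].setdefault(vendor, len(index["vendor"]))
        for token in tokenize(label):
            k = index["token"].setdefault(token, len(index["token"]))
            counts[(i, j, k)] += 1
    return counts, index


# Hypothetical scan results, invented for illustration only.
scans = [
    ("sample_a", "VendorX", "Trojan.Win32.Zbot.abc"),
    ("sample_a", "VendorY", "Win32/Zbot!gen"),
    ("sample_b", "VendorX", "Trojan.Win32.Emotet.x"),
]
counts, index = build_count_tensor(scans)
shape = tuple(len(index[mode]) for mode in ("sample", "vendor", "token"))
```

The coordinate-to-count mapping is a sparse (coordinate-list) layout, which suits this data because most sample, vendor, and token combinations never occur. A decomposition library would typically take the coordinates and values as separate arrays; that conversion is omitted here because it depends on the library used.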
Tensors, Machine Learning, AV, Malware
Bhandary, P., Vieson, C., Kiendrebeogo, A., Adetunji, I., Joyce, R., Eren, M. E., and Nicholas, C. (2022). Malware Antivirus Scan Pattern Mining via Tensor Decomposition. Presented at the 13th Annual Malware Technical Exchange Meeting, Online, 2022.
@misc{Bhandary2022MTEM,
title={Malware Antivirus Scan Pattern Mining via Tensor Decomposition},
author={P. {Bhandary} and C. {Vieson} and A. {Kiendrebeogo} and I. {Adetunji} and R. {Joyce} and M. E. {Eren} and C. {Nicholas}},
year={2022},
note={Presented at the 13th Annual Malware Technical Exchange Meeting, Online, 2022}
}