Transformer-based Semantic Malware Analysis Framework (ARD/314)

ARD/314
Seed
01 / 04 / 2024 - 31 / 03 / 2025
2,758.275

Mr Iat-meng IEONG

Office of the Government Chief Information Officer
Lapcom Limited


In recent years, malware volume has increased significantly, reaching 5.5 billion attacks worldwide in 2022, most of them variants of existing malware. Growing complexity makes analysis harder, as attackers leverage technological advances to bypass detection (evasion techniques). The cyber threat landscape is further strained by inefficient detection, a global shortage of cybersecurity professionals, and a lack of automated and accurate analysis tools. Malicious use of generative AI to produce new variants complicates the landscape further.

The project aims to address these issues by developing a Transformer-based Semantic Malware Analysis Framework that analyses malware variants safely. The framework is intended to assist and streamline the analysis process, reducing the time-consuming work of cybersecurity experts. An optimized File Resolver, a streamlined pipeline for intelligence enrichment and evasion reversion, will be developed to facilitate machine learning analysis by normalizing its input. A Disassembly Module will be developed that accepts executables built for various CPU architectures and outputs CPU architecture-independent assembly code (LLVM-IR). A new Semantic Malware Similarity Analysis will also be developed, using labelled Code Fragments to train a Fine-Tuned Model (Malware Similarity Model). The similarity analysis focuses on extracting meaning (semantic features) in order to overcome the low robustness of conventional syntactic analysis and to achieve better detection accuracy.
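The contrast between syntactic and normalized comparison of code fragments can be illustrated with a minimal sketch. This is not the project's Malware Similarity Model: it assumes a simple stand-in in which IR-like fragments are tokenized, register names, symbols and numeric literals are replaced by placeholders, and the resulting token counts are compared by cosine similarity. The function names `normalize`, `cosine_similarity` and `fragment_similarity`, and the example fragments, are hypothetical. A fine-tuned Transformer would replace the token-count vectors with learned embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer for IR-like text: registers (%x), symbols (@f),
# integer literals, and plain identifiers such as opcodes and types.
TOKEN_PATTERN = re.compile(r"%[\w.]+|@[\w.]+|-?\d+|[A-Za-z_][\w.]*")


def normalize(fragment: str) -> list[str]:
    """Tokenize a fragment and replace variant-specific names with placeholders."""
    normalized = []
    for token in TOKEN_PATTERN.findall(fragment):
        if token.startswith("%"):
            normalized.append("REG")   # local register or value name
        elif token.startswith("@"):
            normalized.append("SYM")   # global symbol or function name
        elif re.fullmatch(r"-?\d+", token):
            normalized.append("NUM")   # integer literal
        else:
            normalized.append(token)   # opcode, type, or keyword kept as-is
    return normalized


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fragment_similarity(x: str, y: str) -> float:
    """Similarity of two fragments after placeholder normalization."""
    return cosine_similarity(Counter(normalize(x)), Counter(normalize(y)))


# Two fragments that differ only in register names and a constant,
# plus one unrelated fragment (all invented for illustration).
original = "%1 = add i32 %a, 5"
variant = "%tmp.7 = add i32 %x, 9"
unrelated = "call void @exit(i32 0)"

print(original == variant)                       # exact text match fails
print(fragment_similarity(original, variant))    # high after normalization
print(fragment_similarity(original, unrelated))  # noticeably lower
```

The two renamed fragments no longer match as strings, yet they score as near-identical once names and constants are abstracted away. That gap is the kind of robustness problem the abstract attributes to purely syntactic analysis.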