Deep Reinforcement Learning for Combined Cyber Threats

Threat
Cyberattacks are infiltrating systems through multiple access points and in many forms, including trojan horses, infected USB devices, spearphishing campaigns, and supply chain compromises.

By: Amr M. Saber

Expertise: Reinforcement learning (RL), deep learning, neural network design and implementation, ICS security, systems control.

Skills: Python, PyTorch, Stable Baselines, OpenAI Gym, Andes, NumPy, Matplotlib

This research project is the subject of the paper below, presented at the 2023 International Conference on Machine Learning and Applications (ICMLA):

A. S. Mohamed and D. Kundur, "Reinforcement Learning for Supply Chain Attacks Against Frequency and Voltage Control," in 22nd IEEE International Conference on Machine Learning and Applications (ICMLA) 2023.

The paper's code is also available at paperswithcode.com/paper/reinforcement-learning-for-supply-chain.

This research is an extension of this project.

Problem: Cyberattacks can exploit many channels to reach a target. Supply chain attacks, for instance, can infect devices before deployment, while spearphishing campaigns use infected emails, Internet links, and attachments to deliver malware (such as trojan horses, rootkits, and worms) to recipients' machines, from which it propagates to other networked machines. Such malware can lie dormant, waiting for an opportune moment to cause damage. Coordinated attacks that are activated simultaneously can have a compounding effect, particularly in critical infrastructure such as transportation, water supply and treatment, energy, and medical systems.

In industrial control systems, upon which these critical systems rely, the consequences can be even more severe.

Challenge: The primary challenge lies in our inability to simulate how multiple attacks might synergize to harm our systems.

Method: I employ deep reinforcement learning (RL) to develop intelligent agents that simulate how various threats can collaborate to harm the grid, and to assess how combined attacks amplify each other's impact. This method allows us to understand how distinct attackers leverage available information to coordinate their efforts and execute attacks synergistically.

Approach: The research involves creating an RL environment that simulates an industrial control system, specifically an electric grid, under attack by multiple threat actors. The environment is implemented in Python using OpenAI Gym. A Proximal Policy Optimization (PPO) agent with a neural network backbone developed in PyTorch is trained with Stable Baselines, and the electric grid model is built with Andes.
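To make the idea of a multi-attacker environment concrete, here is a minimal sketch of a Gym-style interface (`reset` and `step`, with the classic four-value return). It is not the paper's environment: the class name `ToyGridAttackEnv`, the helper `run_episode`, every parameter value, and the first-order frequency model are hypothetical stand-ins for the Andes grid model, and the fixed policies stand in for a trained PPO agent. The sketch uses only the Python standard library.

```python
class ToyGridAttackEnv:
    """Toy Gym-style environment (hypothetical; not the paper's Andes-based model).

    Each attacker injects a bounded load disturbance. The frequency deviation
    follows an assumed first-order model: M * d(df)/dt = -D * df - injection.
    """

    def __init__(self, n_attackers=1, max_injection=0.1, damping=1.0,
                 inertia=5.0, dt=0.1, horizon=200, trip_threshold=0.25):
        self.n_attackers = n_attackers
        self.max_injection = max_injection    # per-attacker disturbance limit
        self.damping = damping                # D in the assumed model
        self.inertia = inertia                # M in the assumed model
        self.dt = dt                          # Euler integration step
        self.horizon = horizon                # maximum steps per episode
        self.trip_threshold = trip_threshold  # |df| beyond this ends the episode
        self.freq_dev = 0.0
        self.t = 0

    def reset(self):
        """Start a new episode at nominal frequency and return the observation."""
        self.freq_dev = 0.0
        self.t = 0
        return [self.freq_dev]

    def step(self, actions):
        """Apply one action in [-1, 1] per attacker and advance one time step."""
        if len(actions) != self.n_attackers:
            raise ValueError("expected one action per attacker")
        # Clip each action, scale it, and sum the attackers' disturbances.
        injection = sum(self.max_injection * max(-1.0, min(1.0, a))
                        for a in actions)
        # Forward-Euler update of the assumed first-order dynamics.
        self.freq_dev += self.dt * (
            -self.damping * self.freq_dev - injection) / self.inertia
        self.t += 1
        tripped = abs(self.freq_dev) > self.trip_threshold
        done = tripped or self.t >= self.horizon
        reward = abs(self.freq_dev)  # attackers are rewarded for larger deviations
        return [self.freq_dev], reward, done, {"tripped": tripped}


def run_episode(env, policy):
    """Roll out one episode; return the peak |df| and whether the threshold tripped."""
    obs = env.reset()
    peak, done, info = 0.0, False, {"tripped": False}
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        peak = max(peak, reward)
    return peak, info["tripped"]


# One attacker versus three attackers, each always injecting its maximum.
peak_one, tripped_one = run_episode(
    ToyGridAttackEnv(n_attackers=1), lambda obs: [1.0])
peak_three, tripped_three = run_episode(
    ToyGridAttackEnv(n_attackers=3), lambda obs: [1.0, 1.0, 1.0])
print(peak_one, tripped_one, peak_three, tripped_three)
```

In this linear toy model the attackers' effects simply add, so the only escalation it captures is the crossing of the trip threshold once enough attackers participate. To train an agent with Stable Baselines, a class like this would additionally need to subclass the Gym environment base class and declare its observation and action spaces.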

Results: In the paper, we illustrate how combined attacks can amplify each other, presenting a case study of electricity flicker escalating to a blackout as more threat actors participate.