Published: April 7, 2023
About the research centre or Inria department
The Inria Centre at Rennes University is one of Inria's eight centres and has more than thirty research teams. The Inria Centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, technological research institutes, etc.
This thesis is carried out within the framework of the ANR PEPR project "Superviz".
Adversarial attacks against machine learning (ML) systems manipulate training and/or test examples to cause ML models to misclassify. As widely observed in computer vision and natural language processing research, slightly changing the pixel values of an input image or the words of an input text can drastically mislead the output of a machine learning classifier. This is known as an evasion attack. Similarly, injecting noise into the training data set of a target model can cause misclassification toward attacker-chosen classes, as in poisoning and backdoor attacks. As demonstrated in previous work, such adversarial risks can significantly harm the trustworthiness of machine learning systems. With the wide deployment of machine learning techniques in intrusion detection and classification, the vulnerability of machine learning systems becomes a critical factor for the utility and reliability of ML-based security practices. It is therefore necessary to assess the adversarial risk of machine learning in security data analysis and to discuss how to harden ML-based detection systems.
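To make the notion of an evasion attack concrete, here is a minimal sketch in plain Python. It assumes a hypothetical linear detector whose weights, bias and input are illustrative values chosen for this example, not taken from the project. For a linear model the gradient of the score with respect to the input is the weight vector, so moving each feature by a small step against the sign of its weight lowers the score, in the spirit of fast-gradient-sign perturbations.

```python
def score(w, b, x):
    """Linear detector score: positive means 'intrusion', otherwise 'benign'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Return 1 for 'intrusion' and 0 for 'benign'."""
    return 1 if score(w, b, x) > 0 else 0


def sign(v):
    """Sign of v as -1, 0 or 1."""
    return (v > 0) - (v < 0)


def evade(w, x, eps):
    """Shift every feature by at most eps in the direction that lowers the score."""
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]


# Hypothetical toy detector and input (illustrative values only).
w = [2.0, -1.0, 0.5]
b = -0.5
x = [0.6, 0.2, 0.4]

x_adv = evade(w, x, eps=0.3)
print(predict(w, b, x), predict(w, b, x_adv))
```

In this toy setting a per-feature change of 0.3 is enough to move the input across the decision boundary. Real intrusion data imposes constraints that this sketch ignores, such as features that an attacker cannot modify.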
The objective of this thesis
In this research, we focus on assessing the risk that adversarial threats pose to ML-based detection algorithms, and on designing and implementing mitigation solutions that defend against these attacks and strengthen ML-based detection applications.
Our study explores the following research problems:
- We first define realistic adversarial threat models against ML-based network intrusion detection systems. In particular, we focus on input perturbation at test time (evasion attack) and data perturbation at learning time (data poisoning attack). Unlike adversarial attacks on images, perturbing network and system security logs aims to evade detection while still ensuring that the intrusion succeeds. We therefore need to consider domain constraints on the perturbation patterns, such as which behavioral attributes of an intrusion incident can (or cannot) be modified or deleted. In addition, the input data of intrusion detection systems typically contains unstructured categorical attributes, e.g., short textual descriptions of security events. Perturbing these discrete attributes is inherently a combinatorial problem and remains open in adversarial learning research.
- We study the key factors that determine how vulnerable an ML-driven intrusion detection model is to adversarial attacks. We start by examining classical ML models, such as support vector machines and decision trees, and then extend our scope to more advanced models based on Deep Neural Networks (DNNs), especially approaches that take into account the temporal dimension of attacks (LSTMs, Transformers). We seek to answer the following questions: 1) What properties of the detection model (smoothness, model complexity, ability to generalize out of distribution, etc.) are responsible for the adversarial vulnerability of the ML-based detector? 2) Is the adversarial vulnerability also associated with the security data used to train the detection model (feature sensitivity, feature redundancy, data sparsity, etc.)? 3) How can quantified measures be defined to reflect the level of adversarial vulnerability of the ML-based detector?
- Our goal is to propose defense mechanisms, based on the identified risk factors, that make ML-based detection models robust by construction. For example, differential privacy has been shown to be an effective tool to improve the robustness of DNN-based classifiers. In addition, we investigate how to establish health-checking methods that identify potentially poisoned training and test inputs in ML-based intrusion detection services.
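The research problems above mention domain constraints on which attributes can be modified and the need for quantified vulnerability measures. The following plain-Python sketch illustrates one possible way to connect the two. It is an assumption made for illustration, not the method of the project. A hypothetical linear detector is attacked under a mask of mutable features, and the smallest per-feature budget that flips the verdict serves as a simple robustness proxy. The weights, input, mask and the function names `constrained_evade` and `min_evasion_budget` are all made up for this example. Bisection is valid here only because the score of a linear model decreases monotonically with the budget.

```python
def score(w, b, x):
    """Linear detector score: positive means 'intrusion', otherwise 'benign'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def constrained_evade(w, x, eps, mutable):
    """Perturb only the features flagged as mutable, each by at most eps."""
    out = []
    for wi, xi, m in zip(w, x, mutable):
        step = eps * ((wi > 0) - (wi < 0)) if m else 0.0
        out.append(xi - step)
    return out


def min_evasion_budget(w, b, x, mutable, eps_max=5.0, tol=1e-4):
    """Smallest per-feature budget that turns an 'intrusion' verdict into 'benign'.

    Assumes x is currently detected (score > 0). Returns None when no
    budget up to eps_max succeeds under the given mask.
    """
    if score(w, b, constrained_evade(w, x, eps_max, mutable)) > 0:
        return None
    lo, hi = 0.0, eps_max
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(w, b, constrained_evade(w, x, mid, mutable)) > 0:
            lo = mid  # still detected: the budget must grow
        else:
            hi = mid  # evaded: try a smaller budget
    return hi


# Hypothetical toy detector and input (illustrative values only).
w = [2.0, -1.0, 0.5]
b = -0.5
x = [0.6, 0.2, 0.4]

free = min_evasion_budget(w, b, x, [True, True, True])
constrained = min_evasion_budget(w, b, x, [False, True, True])
frozen = min_evasion_budget(w, b, x, [False, False, False])
print(free, constrained, frozen)
```

Freezing the most influential feature raises the required budget, and freezing every feature makes evasion impossible in this toy model. A larger minimal budget suggests a detector that is harder to evade under the assumed threat model.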
We propose to evaluate our approaches using publicly available network intrusion datasets collected from real devices. Two examples of data sources are CIC-IDS-2018 and DAPT2020. The former provides a large-scale labeled dataset containing network traffic of normal and intrusion behaviors. This dataset provides rich descriptions of network traffic profiles, e.g., pcap files, to facilitate the description of intrusion incidents. The latter provides a collection of network traffic simulating Advanced Persistent Threats (APTs). This is a good test bed for evaluating how ML-based intrusion detection models perform against commonly deployed APT techniques.
The candidate for this thesis is expected to have completed courses on Machine Learning and/or to have experience implementing Machine Learning algorithms in Python for practical data mining problems. In particular, expertise in PyTorch is required for the project. Theoretical developments are also expected, based on statistics, machine learning theory and approximation theory. Knowledge of intrusion detection systems is preferred.
M. A. Ayub, W. A. Johnson, D. A. Talbert and A. Siraj, "Model Evasion Attack on Intrusion Detection Systems using Adversarial Machine Learning," 2020 54th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, 2020, pp. 1-6, doi: 10.1109/CISS48834.2020.1570617116.
B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Srndic, P. Laskov, G. Giacinto and F. Roli, "Evasion Attacks against Machine Learning at Test Time," Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2013.
O. Suciu, R. Marginean, Y. Kaya, H. Daumé III and T. Dumitras, "When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks," 27th USENIX Security Symposium (USENIX Security 18), 2018, pp. 1299-1316, ISBN 978-1-939133-04-5.
N. Manoj and A. Blum, "Excess Capacity and Backdoor Poisoning," Neural Information Processing Systems, 2021.
CIC-IDS 2018 dataset: https://www.unb.ca/cic/datasets/ids-2018.html
DAPT-2020 dataset: https://sailik1991.github.io/files/DAPT_at_MLHat2020.pdf
This thesis will be conducted at Inria Rennes and co-supervised with AI researchers from the CEA List team in Paris. The monthly gross salary for the PhD candidate is around 2000 euros. Applicants should submit their resume, cover letter and letters of recommendation online.
Technical skills and level required: Machine Learning, Statistics, Information theory, PyTorch, Intrusion Detection
Languages: English
- Subsidized meals
- Partial reimbursement of public transport costs
- Possibility of teleworking (90 days per year) and flexible organization of working hours
- Partial payment of insurance costs
Monthly gross salary amounting to 2051 euros for the first and second years and 2158 euros for the third year