Introduction to Machine Learning (1/3)

Written by Wassim B., Cybersecurity Expert SQUAD

Nowadays, Network Functions Virtualization (NFV) [1] and Cloud-native Network Functions (CNFs) offer new ways of designing, scaling, and operating networks: customer-centric services can be elastically scaled and easily orchestrated, not only at the core but also at the edge of the network.

In particular, the gradual shift from legacy physical network functions to Virtualized Network Functions (VNFs) running on VMs and Cloud-native Network Functions (CNFs) running as containerized microservices promises a wider choice of deployment environments (private or public cloud infrastructure) and more efficient use of computing resources (data center or multi-access edge computing). 

In the midst of the recent softwarisation of networks, existing security solutions remain ill-equipped [2] to deal with threats and misbehaviors, which leaves virtualized networks highly vulnerable. While certain preventive technologies [3] have been designed to stop malicious or unexpected activities from threatening networked systems at the hardware and software levels, it is becoming vital to monitor and analyze the events occurring in today's networks to look for potential signs of faults or security breaches.

To this end, we introduce an anomaly detector that establishes the normal behavior of the networking system and identifies an anomaly as any deviation between the given observations and this pre-set behavior. For this purpose, our anomaly detector monitors some key indicators that reflect the state and behavior of the NFV/CNF environment.
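The idea of flagging deviations from an established normal behavior can be illustrated with a minimal sketch. It assumes a much simpler baseline than the detector described in this article (a static mean and standard deviation instead of a forecasting model), and the function names, the threshold `k`, and the CPU-load values are all hypothetical.

```python
from statistics import mean, stdev


def fit_baseline(normal_samples):
    """Summarize normal behavior as (mean, standard deviation)."""
    return mean(normal_samples), stdev(normal_samples)


def find_anomalies(observations, baseline, k=3.0):
    """Return the indices of observations that deviate from the
    baseline mean by more than k standard deviations."""
    mu, sigma = baseline
    return [i for i, x in enumerate(observations) if abs(x - mu) > k * sigma]


# Hypothetical CPU-load samples (percent) recorded during normal operation.
normal_cpu = [41.0, 43.5, 40.2, 42.8, 44.1, 39.7, 42.0, 43.3]
baseline = fit_baseline(normal_cpu)

# New observations: the third value is far outside the normal range.
new_cpu = [42.5, 40.9, 88.0, 43.1]
anomalies = find_anomalies(new_cpu, baseline)
print(anomalies)  # [2]
```

A static baseline like this cannot follow trends or daily patterns in the indicators, which is one reason to replace it with a forecasting model.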

While the vast majority of anomaly detectors are based on machine learning techniques [4], we leverage a statistical model called autoregressive integrated moving average (ARIMA) and its variants [5] to model and predict the behavior of the virtualized networking system.

The advantage is threefold. First, ARIMA and its variants require little training. By contrast, deep learning methods such as Recurrent Neural Networks (RNNs) [1] and Long Short-Term Memory (LSTM) networks [6] involve heavy training, tuning, and optimization of many network parameters.

As mentioned, ARIMA and its variants require little training and can therefore be fitted on small datasets. In particular, ARIMA models and forecasts the behavior of the virtualized environment by combining an autoregressive component, whose parameters are fitted to past observations, with a moving-average component, which has its own set of parameters and is applied to past forecast errors.
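To make the forecasting step more concrete, here is a minimal sketch of a heavily simplified special case, ARIMA(1,1,0): the series is differenced once, a single autoregressive coefficient is fitted by least squares, and the moving-average part is left out entirely. It is not the detector's actual implementation; the function names, the missing constant term, and the traffic values are assumptions made for illustration. In practice a library such as statsmodels provides a full ARIMA implementation.

```python
def difference(series):
    """First-order differencing: the 'integrated' step with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares estimate of phi in d_t = phi * d_(t-1) + e_t
    (no constant term, which is a simplifying assumption)."""
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0


def forecast_next(series, phi):
    """One-step-ahead forecast: last value plus the predicted next difference."""
    last_diff = series[-1] - series[-2]
    return series[-1] + phi * last_diff


def is_anomalous(observed, predicted, tolerance):
    """Flag an observation whose forecast error exceeds a fixed tolerance."""
    return abs(observed - predicted) > tolerance


# Hypothetical traffic-load samples (Mbit/s) with a steady upward trend.
traffic = [100.0, 104.0, 107.0, 111.0, 114.0, 118.0]
phi = fit_ar1(difference(traffic))
predicted = forecast_next(traffic, phi)

# Compare the next observations against the forecast (tolerance is hypothetical).
print(is_anomalous(122.0, predicted, tolerance=5.0))  # False
print(is_anomalous(160.0, predicted, tolerance=5.0))  # True
```

Only a handful of samples are needed to fit the single coefficient, which illustrates why this family of models gets by with small datasets.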

Second, anomaly detection is resource-consuming because it entails analyzing the behavior of the NFV/VNF system as a whole, on the basis of multiple indicators collected as time series (such as CPU load, network memory usage, packet loss, and traffic load, to name a few). In order to support the lightweight monitoring of multiple indicators, we leverage a variant of ARIMA called Vector Autoregressive Moving Average (VARMA), which deals with multivariate time series. Third, we complement our anomaly detector with a Security Information and Event Management (SIEM) system [5] that orchestrates the collection, correlation, and analysis of real-time events across heterogeneous sources; it is used to alert network administrators and to react, e.g., by rerouting traffic through an alternative link.
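For reference, the general form of a VARMA(p, q) model can be written as follows. The notation is the standard textbook one rather than anything specific to our detector: the vector y_t stacks the k monitored indicators at time t, c is a constant vector, the Phi_i and Theta_j are k-by-k coefficient matrices, and epsilon_t is the vector of forecast errors.

```latex
\mathbf{y}_t = \mathbf{c}
  + \sum_{i=1}^{p} \boldsymbol{\Phi}_i \, \mathbf{y}_{t-i}
  + \boldsymbol{\varepsilon}_t
  + \sum_{j=1}^{q} \boldsymbol{\Theta}_j \, \boldsymbol{\varepsilon}_{t-j}
```

Because the coefficient matrices are not restricted to be diagonal, each indicator's forecast can depend on the past values of the other indicators, so a single model covers all of them jointly.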

In order to evaluate the performance of our anomaly detector, we further set up a testbed in which a virtualized IP Multimedia Subsystem (vIMS) [7], [8] running in Kubernetes containers supports voice and video calls with high sound quality and minimal delay.

  • [1] Donovan, J., & Prabhu, K. (Eds.). (2017). Building the network of the future: Getting smarter, faster, and more flexible with a software centric approach. CRC Press.
  • [2] NFV White Paper, (2012), Network Functions Virtualisation–Introductory White
  • [3] Box, G. E., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis: forecasting and control. John Wiley & Sons.
  • [4] Agrawal, S., & Agrawal, J. (2015). Survey on anomaly detection using data mining techniques. Procedia Computer Science, 60, 708-713.
  • Chalapathy, R., & Chawla, S. (2019). Deep learning for anomaly detection: A survey. arXiv preprint arXiv:1901.03407.
  • [5] Brockwell, P. J., & Davis, R. A. (2016). Introduction to time series and forecasting. Springer.
  • [6] Roh, Y., Heo, G., & Whang, S. E. (2019). A survey on data collection for machine learning: a big data-AI integration perspective. IEEE Transactions on Knowledge and Data Engineering.
  • [7] Openimscore
  • [8] What is Kubernetes?