REFIT

MONITORING & RECONFIGURATION

Lead Partner: ATOS

Contributors: ATOS, HPE, IBM, JADS, ADPT, POLIMI

 

Pain / Gain

Data
  • Problem: Managing monitoring data that comes from different sources is not an easy task. In particular, when a network issue is identified, it is hard to pinpoint the application that is causing it. It is also difficult to select an appropriate set of deployment options for a given context (workload, user location) from the many options available while balancing non-functional requirements such as performance, cost, thermal energy, security risks, and privacy risks. We also address the dynamic discovery and use of deployment options as they become available, become unavailable, or change.
  • Solution: The SODALITE monitoring technology collects, stores and aggregates data to simplify access to it. For network monitoring, we use per-connection network metrics rather than only per-network-interface metrics.
  • Value: Unified collection and presentation of monitoring data simplifies the monitoring process on heterogeneous infrastructures and can ease decision-making. Skydive can provide per-connection details for finer-grained analysis. Dynamic monitoring, runtime resource discovery and autonomous refactoring of application deployments will change the way we manage heterogeneous (multi-cloud and HPC) environments.
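
The difference between per-interface and per-connection metrics can be illustrated with a minimal sketch. The `FlowRecord` type, its fields, and the sample values below are hypothetical and simplified; they are not the Skydive flow schema or its API. The sketch only shows why grouping by connection makes it easier to pinpoint the traffic source behind a busy interface.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class FlowRecord(NamedTuple):
    """Hypothetical, simplified flow record (assumption, not the Skydive schema)."""
    interface: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    bytes_sent: int


# A connection is identified here by the usual 5-tuple.
ConnectionKey = Tuple[str, int, str, int, str]


def bytes_per_interface(flows: List[FlowRecord]) -> Dict[str, int]:
    """Coarse view: total bytes per network interface."""
    totals: Dict[str, int] = defaultdict(int)
    for flow in flows:
        totals[flow.interface] += flow.bytes_sent
    return dict(totals)


def bytes_per_connection(flows: List[FlowRecord]) -> Dict[ConnectionKey, int]:
    """Fine-grained view: total bytes per connection 5-tuple."""
    totals: Dict[ConnectionKey, int] = defaultdict(int)
    for flow in flows:
        key = (flow.src_ip, flow.src_port, flow.dst_ip, flow.dst_port, flow.protocol)
        totals[key] += flow.bytes_sent
    return dict(totals)


def heaviest_connection(flows: List[FlowRecord]) -> Tuple[ConnectionKey, int]:
    """Return the connection that sent the most bytes, with its byte count."""
    per_connection = bytes_per_connection(flows)
    return max(per_connection.items(), key=lambda item: item[1])


# Illustrative sample data (made up for this sketch).
SAMPLE_FLOWS = [
    FlowRecord("eth0", "10.0.0.5", 40112, "10.0.0.9", 5432, "TCP", 1500),
    FlowRecord("eth0", "10.0.0.5", 40112, "10.0.0.9", 5432, "TCP", 2500),
    FlowRecord("eth0", "10.0.0.6", 51820, "10.0.0.9", 8080, "TCP", 90000),
]
```

With the sample data, the per-interface view reports a single total for `eth0`, while the per-connection view separates the two connections and shows which one dominates the traffic.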

Product

REFIT
  • Functionality: The monitoring layer collects various application-level and infrastructure-level metrics and events, and sends the collected metrics to the reconfiguration and refactoring layer. The network monitoring extracts flows from the Skydive Analyzer (via its API), processes them and sends the results upstream. The deployment refactoring capability, in turn, can detect violations of the application performance objectives/SLAs based on runtime monitoring data, and refactor the current deployment of an application to maintain or improve those objectives. New resources and deployment options can be dynamically discovered and used. The deployment refactoring can also identify and mitigate security and privacy vulnerabilities in a dynamic application deployment.
  • Technology: General monitoring uses a Prometheus exporter that provides monitoring metrics at the level of the Light-weight Runtime Environment (LRE). Network monitoring is based on the Skydive technology and its Flow Exporter, which provides a framework for building pipelines. The deployment refactoring functionality includes rule-based and machine-learning-based approaches to refactoring the deployment model of an application at runtime.
  • Status: SODALITE monitoring and reconfiguration provide a single entry point for dynamic monitoring of runtime data in multi-cloud environments (including OpenStack) and HPC clusters (managed by PBS Pro and SLURM schedulers), with an alerting mechanism that can be used to trigger the refactoring of deployments on several infrastructures to ensure the highest QoS level. During Y3, work will focus on further improving the metrics gathered, the visualization of monitored data, and some minor technical aspects of the reconfiguration and the integration of the different components.
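
The rule-based side of deployment refactoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SODALITE implementation: the `RefactoringRule` type, the metric names, the thresholds, and the deployment option names are all hypothetical. Each rule says that when a monitored metric exceeds a threshold, the deployment should move to a given option.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RefactoringRule:
    """Hypothetical rule: if `metric` exceeds `threshold`, switch to `target_option`."""
    metric: str
    threshold: float
    target_option: str


def violated_rules(metrics: Dict[str, float],
                   rules: List[RefactoringRule]) -> List[RefactoringRule]:
    """Return the rules whose metric is present and above its threshold."""
    return [
        rule for rule in rules
        if rule.metric in metrics and metrics[rule.metric] > rule.threshold
    ]


def select_deployment(current_option: str,
                      metrics: Dict[str, float],
                      rules: List[RefactoringRule]) -> str:
    """Keep the current deployment unless a rule is violated.

    Rules are evaluated in list order; the first violated rule wins.
    """
    hits = violated_rules(metrics, rules)
    return hits[0].target_option if hits else current_option


# Illustrative rules (metric names, thresholds and options are made up).
EXAMPLE_RULES = [
    RefactoringRule("latency_ms", 200.0, "vm-large"),
    RefactoringRule("cpu_utilization", 0.9, "vm-xlarge"),
]
```

For example, `select_deployment("vm-small", {"latency_ms": 250.0}, EXAMPLE_RULES)` selects `"vm-large"`, while metrics below every threshold leave the deployment unchanged. A machine-learning-based approach would replace the fixed thresholds with a learned predictor; that is outside the scope of this sketch.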

What's Unique

Data monitoring
  • Differentiator: Skydive monitoring is integrated with Prometheus, so that network traffic is shown in Prometheus and the open-source Skydive network protocol and topology software becomes part of a data-driven feedback loop. Regarding deployment refactoring, the differentiator lies in the high innovation potential of the work on deployment configuration selection, dynamic discovery of refactoring options, and detection and correction of performance anti-patterns in application deployments.
  • Innovation: Network monitoring information is used as part of the decision process to re-deploy applications, along with automated deployment and tracking of cloud exporters; the component is also responsible for managing node resources. Moreover, a deployment configuration selection methodology based on benchmarking, runtime monitoring, machine learning, and software product line techniques is put in place.
  • Partnerships: This layer brings together large industrial partners leading the monitoring work (ATOS and IBM) in collaboration with strong research institutions such as JADS, which leads the deployment refactoring.

Infographics