Datacenter Automation: High-Performance Computing Monitoring and Management


Andrea Bartolini and Luca Benini

Presentation title

Datacenter Automation: High-Performance Computing Monitoring and Management

Authors

Andrea Bartolini and Luca Benini

Institution(s)

Alma Mater Studiorum - Università di Bologna, Italy

Presentation type

Technical presentation

Abstract

In the collective imagination, embedded systems are the opposite of high-performance computing systems. The former are pervasive, ubiquitous, and inexpensive. The latter are the size of a warehouse, consume megawatts of power, and cost tens of millions of dollars. However, embedded systems play a central role in the evolution of today’s high-performance computing installations. Indeed, in the race toward exascale, high-performance computing systems face important challenges that limit their efficiency. Above all, power and energy consumption, fuelled by the end of Dennard scaling, are starting to limit the peak performance and cost-effectiveness of supercomputers. In addition, the reliability and security of computing systems, in HW as well as SW components, pose novel challenges for the management of the system at large. Embedded systems are at the heart of a quantum leap in energy and operational efficiency in this sector, where Internet of Things, Big Data, and Edge Computing technologies are the key ingredients of upcoming datacenter automation solutions. Already today, each compute node and processing element includes several embedded systems that oversee telemetry and power management. However, their architecture and composition are rudimentary, and at the scale of today’s datacenters they already show their limits: built-in telemetry and control policies fail to capture and react to the speed at which computational phases occur, and power management control policies are de facto disabled to avoid losing performance. Being able to collect, correlate, optimize, and promptly react is still a key challenge. To address this issue, a set of open standards and architectures is emerging that aims to break the barriers imposed by closed and proprietary solutions.
OpenPOWER, for example, brings key innovations across the computing spectrum that can be leveraged to create innovative solutions to the rising problems of power consumption control and real-time monitoring of power and performance. D.A.V.I.D.E. (Development for an Added Value Infrastructure Designed in Europe) is an innovative High-Performance Computing cluster designed by E4 Computer Engineering in collaboration with the DEI department of the University of Bologna for PRACE (Partnership for Advanced Computing in Europe). On top of best-in-class HW components, D.A.V.I.D.E. features custom embedded systems and innovative system middleware designed for fine-grained power and performance monitoring and cluster power capping. We will present the role of embedded systems in datacenter automation, detailing the results of designing embedded-system-based extensions to D.A.V.I.D.E. that target next-generation green and self-aware High-Performance Computing systems.


Additional material

  • Presentation slides: [pdf]
