White Papers – Power Efficiency

On this page you will find PRACE White Papers related to Power Efficiency.

Title: Investigating and Exploiting Application Dynamism For Energy-Efficient Exascale Computing

Authors: Venkatesh Kannan (a,*), Lubomír Ríha (b), Michael Gerndt (c), Anamika Chowdhury (c), Ondrej Vysocki (b), Martin Beseda (b), Horák David (b), Radim Sojka (b), Jakub Kruzik (b), Michael Lysaght (a)
(a) Irish Centre for High-End Computing, Dublin, Ireland
(b) IT4Innovations, Ostrava, Czech Republic
(c) Institut für Informatik, Technical University of Munich, Germany

Abstract: READEX is an EU Horizon 2020 FET-HPC project whose objective is to exploit the dynamism found in high-performance computing applications at runtime to achieve efficient computation on Exascale systems. In this paper, we describe the use of the READEX methodology to investigate the dynamic behaviour of PRACE-relevant applications at runtime, and describe how such application dynamism can be exploited to tune a range of application-, system software- and hardware-level parameters for improved performance and energy efficiency on current and future European extreme-scale systems.

Download paper: PDF

Title: A System for Energy Measurement on Accelerators (SEMA)

Authors: S. Muralidharan, M. Lysaght*
*Irish Centre for High-End Computing, Dublin, Ireland

Abstract: The fastest supercomputer in 2016 consumed over 17 MW for a mere 33 Petaflops of performance. This makes energy efficiency a crucial obstacle to overcome in order to make Exascale computing a reality. Over 50% of the performance of this supercomputer is due to the presence of accelerators or coprocessors that are designed to do certain kinds of mathematical operations in an energy-efficient manner. Currently, accelerators such as NVIDIA GPGPUs and Intel Xeon Phi dominate this market. However, with the addition of FPGAs from Altera and Xilinx in HPC clusters, more options for accelerators are becoming available. Furthermore, several EU-funded projects, including the Centre of Excellence and FET-HPC projects such as ESCAPE, READEX and ESiWACE, investigate HPC systems consisting of accelerators for Exascale-related applications. The dominance of such devices for energy-efficient computing makes it crucial to understand the relationship between power consumption, performance and application code in order to exploit them in the most effective manner.
The measurement of power consumption is supported at different levels of accuracy by different vendors through their platforms. However, the vendors share no standardized metrics, making it difficult to compare their measurements directly. Further, the measurements they support are coarse-grained, making it impossible to use them to profile the application code and to extract insights that help the programmer.
At the Irish Centre for High-End Computing (ICHEC), we have developed a System for Energy Measurement on Accelerators (SEMA) that allows measurement of any accelerator, or any number of them, with milliwatt accuracy and millisecond resolution. SEMA is based on the standard current shunt-based power measurement technique. Such a methodology is not new, and there has been prior work done within a lab environment. SEMA aims to be different: we focus as much on usability as on technical feasibility. To this end, we have developed a set of novel ideas that abstract the technical details and provide users with a very simple interface to measure energy and power. This interface can also be used to profile very short regions of code within the application. This allows extraction of insights into the performance and power consumption of different pieces of code at a level not possible before, thereby leading to a better understanding of code behavior and to optimizations. These insights can also be fed back into the design of more energy-efficient accelerator architectures in the future.
The current SEMA system is capable of performing power measurements on three different accelerators within a single system. However, we need to expand this system to work in a cluster environment in order to test large parallel applications. This requires improving the SEMA hardware with better integration into the host system and better programming infrastructure to manage the myriad of sensors. In the future, we hope such a system could be integrated as a standard interface in supercomputing clusters.
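As a rough illustration of the shunt-based technique the abstract refers to, the sketch below converts a shunt voltage drop into power and integrates power samples into energy. It is not SEMA's interface: the function names, the fixed sampling interval and the example values are all assumptions made for illustration.

```python
def shunt_power_watts(v_shunt, v_bus, r_shunt):
    """Power drawn through a shunt resistor (hypothetical helper).

    v_shunt: voltage drop across the shunt, in volts
    v_bus:   supply rail voltage, in volts
    r_shunt: shunt resistance, in ohms
    """
    current = v_shunt / r_shunt      # Ohm's law: I = V / R
    return v_bus * current           # P = V * I


def energy_joules(power_samples, dt):
    """Integrate evenly spaced power samples (watts) with the trapezoidal rule.

    dt is the assumed sampling interval in seconds, e.g. 0.001 for 1 ms.
    """
    total = 0.0
    for p0, p1 in zip(power_samples, power_samples[1:]):
        total += 0.5 * (p0 + p1) * dt
    return total


if __name__ == "__main__":
    # Illustrative values only: a 12 V rail, a 5 milliohm shunt, a 10 mV drop.
    p = shunt_power_watts(v_shunt=0.010, v_bus=12.0, r_shunt=0.005)
    print(p)
    print(energy_joules([p, p, p], dt=0.001))
```

Profiling a short code region then amounts to integrating only the samples that fall between the region's start and end timestamps.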

Download paper: PDF

Title: An Energy-centric Study of Conjugate Gradient Method

Authors: Konstantinos Nikas (1), Dimitris Siakavaras (1), Vasileios Karakasis (1), Jan Christian Meyer (2), and Lasse Natvig (3)
(1) Greek Research & Technology Network (GRNET), Greece
(2) High Performance Computing Section, IT Dept., NTNU, Norway
(3) Dept. of Computer and Information Science (IDI), NTNU, Norway

Abstract: This whitepaper focuses on the study of the conjugate gradient method and how storage formats for sparse matrices have a significant impact on its performance and energy footprint. We perform the evaluation on a 32-core NUMA platform that provides energy measurements for the processors and the main memory. Our study reveals interesting aspects of the execution of memory-bound applications on state-of-the-art multicore platforms which could be utilised by an automatic tuning process towards a more energy-efficient execution.
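To make the link between storage format and the conjugate gradient kernel concrete, here is a minimal sketch of the method over a matrix stored in CSR (compressed sparse row), a common baseline format. The abstract does not say which formats the paper evaluates, so the choice of CSR, the function names and the tolerance are assumptions.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix in CSR form (values, column indices, row pointers)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A stored in CSR."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0 initially
    p = list(r)              # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = csr_matvec(values, col_idx, row_ptr, p)   # the memory-bound step
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    # A = [[4, 1], [1, 3]] in CSR; b = [1, 2]
    print(conjugate_gradient([4.0, 1.0, 1.0, 3.0], [0, 1, 0, 1], [0, 2, 4], [1.0, 2.0]))
```

Each iteration is dominated by one sparse matrix-vector product, so the bytes the storage format moves per nonzero largely determine both run time and memory energy.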

Download paper: PDF

Title: Implementation of an Energy-Aware OmpSs Task Scheduling Policy

Authors: Jan Christian Meyer (a), Thomas B. Martinsen (b) and Lasse Natvig (b)
(a) High Performance Computing Section, IT Dept., NTNU, Trondheim, NO-7491, Norway
(b) Dept. of Computer and Information Science (IDI), NTNU, Trondheim, NO-7491, Norway

Abstract: The OmpSs programming model supports task-based parallelism in a similar manner to OpenMP. This whitepaper explores the possibility of implementing an energy-aware scheduling policy in the run-time component of the OmpSs programming model, to adapt task execution schedules for balancing energy efficiency with parallel performance. A high-level design description of a run-time scheduling plugin to achieve this is presented, as well as key results from studying its effectiveness with 4 performance metrics, using 17 application benchmarks. The results show that the approach can be leveraged to improve energy efficiency in scenarios where dynamic power accounts for a large component of total power consumption, with benefits that can be programmatically balanced against predicted performance loss.

Download paper: PDF

Title: Energy-efficient Sparse Matrix Auto-tuning with CSX

Authors: Jan Christian Meyer (a,*), Lasse Natvig (b), Vasileios Karakasis, Dimitris Siakavaras, and Konstantinos Nikas (c)
(a) High Performance Computing Section, IT Dept., NTNU, Norway
(b) Dept. of Computer and Information Science (IDI), NTNU, Norway
(c) School of ECE, NTUA, Greece

Abstract: This whitepaper describes the programming techniques used to develop an auto-tuning compression scheme for sparse matrices with respect to accelerating matrix-vector multiplication and minimizing its energy footprint, as well as a method for extracting a power profile from a corresponding implementation of the conjugate gradient method. Using two example systems, we show how these techniques can be leveraged to automatically detect a non-trivial local optimum in the execution parameter space, suggesting that it is feasible to integrate the energy efficiency evaluation of the automatic adaptation with the automatic tuning process.

Download paper: PDF

Title: Power instrumentation of task-based applications using model-specific registers on the Sandy Bridge architecture

Authors: Jan Christian Meyer (a) and Lasse Natvig (b)
(a) High Performance Computing Section, IT Dept., NTNU, Trondheim, NO-7491, Norway
(b) Dept. of Computer and Information Science (IDI), NTNU, Trondheim, NO-7491, Norway

Abstract: This whitepaper describes the technical side of research into the energy-efficiency tradeoffs of task-based execution with vectorization, through the application of recently available model-specific registers for counting energy use. It describes the mechanisms used to extract energy figures with respect to architectural and operating system concerns, and illustrates their utility in the process of collecting appropriate benchmark figures to examine such tradeoffs. A subset of the obtained results is presented as an example, highlighting both the potential and the limitations of the outlined measurement method.
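The sketch below shows how raw energy-counter readings of this kind are typically turned into joules, assuming the registers in question are Intel's RAPL counters (`MSR_RAPL_POWER_UNIT` at 0x606 and `MSR_PKG_ENERGY_STATUS` at 0x611), which the abstract does not name explicitly. The function names are hypothetical, and reading the registers themselves (which requires privileged access) is left out; only the decoding arithmetic is shown.

```python
def energy_unit_joules(power_unit_msr):
    """Decode the energy unit from a raw MSR_RAPL_POWER_UNIT value.

    Bits 12:8 hold the energy status unit (ESU); one counter tick
    corresponds to 1 / 2**ESU joules.
    """
    esu = (power_unit_msr >> 8) & 0x1F
    return 1.0 / (1 << esu)


def energy_delta_joules(raw_before, raw_after, unit_joules):
    """Energy consumed between two reads of a 32-bit energy status counter.

    The counter wraps around at 2**32, so the difference is taken modulo
    2**32. This is only correct if at most one wraparound occurred between
    the two reads, so the reads must be frequent enough.
    """
    ticks = (raw_after - raw_before) & 0xFFFFFFFF
    return ticks * unit_joules


if __name__ == "__main__":
    unit = energy_unit_joules(0x000A1003)   # example raw value, ESU = 16
    print(unit)
    print(energy_delta_joules(0xFFFFFFF0, 0x00000010, unit))  # spans a wrap
```

Bracketing a task with two counter reads and passing them to `energy_delta_joules` gives the energy attributed to that interval, subject to the counter's update granularity.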

Download paper: PDF