White Papers – Performance Prediction

On this page you will find PRACE White Papers related to Performance Prediction.

Title: Profiling and Tracing Tools for Performance Analysis of Large Scale Applications

Authors: Jerry Eriksson (b), Pedro Ojeda-May (b), Thomas Ponweiser (a,*), Thomas Steinreiter (a)
(a) RISC Software GmbH, Softwarepark 35, 4232 Hagenberg, Austria
(b) High Performance Computing Center North (HPC2N), MIT Huset, Umeå Universitet, 901 87 Umeå, Sweden

Abstract: The usage of modern profiling and tracing tools is vital for understanding program behaviour, performance bottlenecks and optimisation potentials in HPC applications. Despite their obvious benefits, such tools are still not that widely adopted within the HPC user community. The two main reasons for this are firstly unawareness and secondly the sometimes inhibitive complexity of getting started with these tools. In this work we aim to address this issue by presenting and comparing the capabilities of four different performance analysis tools, which are 1) HPCToolkit, 2) Extrae and Paraver, 3) SCALASCA and 4) the Intel Trace Analyzer and Collector (ITAC). The practical usage of these tools is demonstrated based on case studies on the widely used molecular dynamics simulation code GROMACS.

Download paper: PDF

Title: Using GPU Accelerators for improving Performance and Scalability in Material Physics Simulations

Authors: M. Hruszowiec (a), P. Potasz (b), A. Szymańska-Kwiecień (a), M. Uchroński (a)
(a) Wroclaw Centre of Networking and Supercomputing (WCSS), Wroclaw University of Science and Technology
(b) Department of Theoretical Physics, Wroclaw University of Science and Technology

Abstract: This work will focus on the parallel simulation of electron-electron interactions in materials with non-trivial topological order (i.e. Chern insulators). A system of interacting electrons can be solved by diagonalizing a many-body Hamiltonian matrix in a basis of configurations of electrons distributed among the possible single-particle energy levels, an approach known as the configuration interaction method. The number of possible configurations increases exponentially with the number of electrons and energy levels: 6 electrons occupying 24 energy levels correspond to a Hilbert space of dimension about 10^5, while 12 electrons give about 10^6 configurations. Solving such a problem requires effective computational methods and highly efficient optimization of the source code. The project will focus on many-body effects related to strongly interacting electrons on flat bands with non-trivial topology. Such systems are expected to be useful in the study and understanding of new topological phases of matter, and in the longer term may be used to design novel nanomaterials. GPU accelerators will be used to improve performance and scalability in the parallel simulation of electron-electron interactions in materials with non-trivial topological order.
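The orders of magnitude quoted in the abstract can be checked with a short calculation. The sketch below assumes that each energy level holds at most one electron, so that the number of configurations is the binomial coefficient "levels choose electrons"; this counting rule and the function name `n_configurations` are illustrative assumptions, not taken from the paper.

```python
from math import comb


def n_configurations(levels: int, electrons: int) -> int:
    """Number of ways to place `electrons` on `levels` single-particle levels.

    Assumption for illustration: at most one electron per level, so the
    count is the binomial coefficient C(levels, electrons).
    """
    return comb(levels, electrons)


if __name__ == "__main__":
    for electrons in (6, 12):
        dim = n_configurations(24, electrons)
        print(f"{electrons} electrons on 24 levels: {dim} configurations")
```

Under this assumption, 6 electrons on 24 levels give 134596 configurations (about 10^5) and 12 electrons give 2704156 (about 10^6), in line with the figures in the abstract.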

Download paper: PDF

Title: Performance Improvement in Kernels by Guiding Compiler Auto-Vectorization Heuristics

Authors: William Killian (a), Renato Miceli (a,*), EunJung Park (a), Marco Alvarez Vega (a), John Cavazos (a)
(a) University of Delaware, USA
(a) Irish Centre for High-End Computing (ICHEC), Ireland
(a) Université de Rennes, France

Abstract: Vectorization support in hardware continues to expand as we continue to rely on superscalar architectures. Unfortunately, compilers are not always able to generate optimal code for the hardware; detecting and generating vectorized code is extremely complex. Programmers can use a number of tools to aid in development and tuning, but most of these tools require expert or domain-specific knowledge to use. In this work we aim to provide techniques for determining the best way to optimize certain codes, with the end goal of guiding the compiler into generating optimized code without requiring expert knowledge from the developer.

Download paper: PDF

Title: Auto-tuning 2D Stencil Applications on Multi-core Parallel Machines

Authors: Zhengxiong Hou, Christian Perez
INRIA, LIP, ENS-Lyon, France

Abstract: On multi-core clusters or supercomputers, how to get good performance when running high performance computing (HPC) applications is a main concern. In this report, performance oriented auto-tuning strategies and experimental results are presented for stencil HPC applications on multi-core parallel machines. A typical 2D Jacobi benchmark is chosen as the experimental stencil application. The main tuning strategies include data partitioning within a multi-core node, number of threads within a multi-core node, data partitioning for a number of nodes, number of nodes in a multi-core cluster system. The results of the experiments are based on multi-core parallel machines from PRACE or Grid’5000, such as Curie, and Stremi cluster.
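For readers unfamiliar with the benchmark, the following is a minimal serial sketch of a 2D Jacobi sweep with a simple row-wise data partitioning, one of the kinds of tuning parameters the abstract mentions. It is not the code used in the report: the function names `jacobi_step` and `partition_rows`, the four-point averaging stencil, and the row-block splitting scheme are all assumptions made for illustration.

```python
def jacobi_step(grid):
    """Return a new grid after one Jacobi sweep.

    Each interior point becomes the average of its four neighbours;
    boundary values are copied unchanged (assumed fixed boundaries).
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
    return new


def partition_rows(n_rows, n_parts):
    """Split n_rows into n_parts contiguous (start, stop) blocks.

    Hypothetical helper: the first `n_rows % n_parts` blocks get one
    extra row so the block sizes differ by at most one.
    """
    base, extra = divmod(n_rows, n_parts)
    bounds, start = [], 0
    for p in range(n_parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


if __name__ == "__main__":
    grid = [[1.0] * 5 for _ in range(5)]
    for i in range(1, 4):
        for j in range(1, 4):
            grid[i][j] = 0.0
    grid = jacobi_step(grid)
    print(grid[2])
    print(partition_rows(10, 3))
```

In a parallel version, each block returned by `partition_rows` would be assigned to a thread or node, and the block count and shape are the sort of parameters an auto-tuner would search over.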

Download paper: PDF