White Papers – Other

On this page you will find PRACE White Papers in the general ("Other") category.

Title: Performance Assessment of Pipelined Conjugate Gradient method in Alya

Authors: Pedro Ojeda-May a, Jerry Eriksson a, Guillaume Houzeaux b and Ricard Borrell b*
a High Performance Computing Center North (HPC2N), MIT Huset, Umeå Universitet, 90187 Umeå, Sweden
b Barcelona Supercomputing Center, C/Jordi Girona 29, 08034-Barcelona, Spain

Abstract: One of the current trending topics in High Performance Computing is exascale computing. Although the hardware is not yet available, the software community is developing and updating codes so that they can use exascale architectures efficiently once these become available. Alya is one of the codes being developed towards exascale computing. It is part of the simulation packages of the Unified European Applications Benchmark Suite (UEABS) and the Accelerators Benchmark Suite of PRACE, and thus complies with the highest standards in HPC. Although Alya has demonstrated scalability up to hundreds of thousands of CPU cores, some expensive routines could affect its performance on exascale architectures. One of these routines is the conjugate gradient (CG) algorithm, which is relevant because it is called at each time step to solve a linear system of equations. The bottleneck in CG is the large number of collective communication calls. In particular, the preconditioned CG (PCG) already implemented in Alya uses two collective communications per iteration. In the present work, we developed and implemented a pipelined version of the PCG (PPCG) algorithm, which halves the number of collectives. We then used non-blocking MPI communications to reduce the waiting time during message exchange even further. The resulting implementation was analysed in detail using the Extrae/Paraver profiling tools. The PPCG implementation was tested by simulating the flow around a 3D sphere. Several tests with different numbers of processes and workloads were performed to assess the strong and weak scaling of the implemented algorithms. This work was carried out in the context of the PRACE Preparatory Access programme; the simulations were run on the MareNostrum 4 (MN4) supercomputer at the Barcelona Supercomputing Center (BSC).
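To illustrate how a pipelined PCG reduces the number of collectives, the sketch below implements the Ghysels–Vanroose variant of pipelined preconditioned CG in serial Python. It is an assumption-labelled illustration of the general technique and not Alya's implementation. The helpers `dot`, `axpy` and `pipelined_pcg`, as well as the tridiagonal test matrix and Jacobi preconditioner, are hypothetical. At the top of each iteration, all dot products are computed together; in an MPI code they would form a single non-blocking allreduce, overlapped with the preconditioner and matrix-vector product that follow.

```python
"""Serial sketch of pipelined preconditioned CG (Ghysels-Vanroose variant).

Assumption: illustrates the general technique, not Alya's code. In MPI, the
dot products at the top of each iteration would be fused into ONE
non-blocking allreduce, overlapped with the precond/matvec that follow.
"""
from typing import Callable, List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def axpy(alpha: float, x: Vec, y: Vec) -> Vec:
    """Return y + alpha * x as a new vector."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def pipelined_pcg(
    matvec: Callable[[Vec], Vec],
    precond: Callable[[Vec], Vec],
    b: Vec,
    tol: float = 1e-10,
    max_iter: int = 500,
) -> Tuple[Vec, int, int]:
    """Solve A x = b (A SPD); return (x, iterations, global reductions)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)              # r = b - A x with x = 0
    u = precond(r)           # u = M^{-1} r
    w = matvec(u)            # w = A u
    z, q, s, p = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    gamma_old = alpha_old = 1.0  # only used from the second iteration on
    b_norm = dot(b, b) ** 0.5
    reductions = 0
    for it in range(max_iter):
        # Single fused reduction (one MPI_Iallreduce in a parallel code).
        gamma, delta, rr = dot(r, u), dot(w, u), dot(r, r)
        reductions += 1
        if rr ** 0.5 <= tol * b_norm:
            return x, it, reductions
        # Local work that could overlap with the pending reduction.
        m = precond(w)
        nv = matvec(m)
        if it == 0:
            beta, alpha = 0.0, gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)
        # Recurrences keep s = A p, q = M^{-1} s, z = A q with no extra reductions.
        z = axpy(beta, z, nv)
        q = axpy(beta, q, m)
        s = axpy(beta, s, w)
        p = axpy(beta, p, u)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, s, r)
        u = axpy(-alpha, q, u)
        w = axpy(-alpha, z, w)
        gamma_old, alpha_old = gamma, alpha
    return x, max_iter, reductions


# Usage: hypothetical SPD system A = tridiag(-1, 4, -1) with a Jacobi preconditioner.
N = 50


def A(v: Vec) -> Vec:
    return [4.0 * v[i] - (v[i - 1] if i > 0 else 0.0) - (v[i + 1] if i < N - 1 else 0.0)
            for i in range(N)]


def jacobi(v: Vec) -> Vec:
    return [vi / 4.0 for vi in v]


x, iters, nred = pipelined_pcg(A, jacobi, [1.0] * N)
print(iters, nred)
```

In classical PCG, the two scalars γ = (r, u) and (p, Ap) are computed at different points in the iteration, so two separate global synchronisations are needed. Here, the extra recurrences for s, q and z allow both scalars to be derived from one reduction. The cost is more vector updates and, in floating point, a slightly larger gap between the recursive and true residuals.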

Download paper: PDF

Title: Evaluation of Linux Container and full virtualization for HPC Applications in PRACE 5IP

Authors: A. Azab a*, G. Muscianisi b, G. Wiber c, C. Fernandez d
a University of Oslo, Oslo, Norway
b CINECA - Interuniversity Consortium, Italy
c French Alternative Energies and Atomic Energy Commission (CEA), France
d Fundación Pública Galega Centro Tecnolóxico de Supercomputación de Galicia (CESGA), Spain

Abstract: Linux containers, which follow the build-once, run-anywhere principle, have gained considerable attention in the research community, where portability and reproducibility are key concerns. Unlike virtual machines (VMs), containers run on the underlying host OS kernel. The container filesystem can include all the non-default prerequisites needed to run the containerised application with unaltered performance. For that reason, containers are popular in HPC for parallel/MPI applications. Some use cases also involve abstraction layers. For example, MPI applications require the MPI version in the container to match the one on the host, and GPU applications require the underlying GPU drivers to be installed within the container filesystem. In short, containers can only abstract what is above the OS kernel, not below; consequently, portability is not fully guaranteed. Here we focus on PRACE-relevant HPC applications, including MPI and GPU applications, evaluated together with other collaborators from Europe and the USA. In addition to security and performance, the PRACE virtualisation-service activity is working on portability solutions, providing templates and guidelines for building portable containers. Fully virtualised workloads running as VM jobs are an interesting complement to containers. Such a solution is useful when a specific OS kernel or platform is required, and the management of fully virtualised workloads is also being considered and evaluated. Regarding security and performance, different container platforms (Docker, Singularity and uDocker) have been evaluated in this white paper, carried out under the PRACE-5IP virtualisation service.

Download paper: PDF

Title: The PRACE Data Analytics Service

Authors: Agnès Ansari a, Alberto Garcia Fernandez a, Bertrand Rigaud b, Marco Rorro c, Andreas Vroutsis d
a CNRS/IDRIS
b CNRS/CC-IN2P3
c CINECA

Abstract: This paper describes the work completed in the scope of PRACE 5IP/WP6 – Service 6 Data Analytics in Task 6.2: Design and Development of new Service prototypes.
Among the technical domains covered by the term "data analytics", we chose to focus on trends attracting growing interest among data scientists: machine learning and deep learning techniques. These techniques use automated algorithms and can analyse datasets faster than more conventional methods. We therefore evaluated how they can benefit from HPC environments with powerful CPUs and GPUs, both to manage model complexity and accelerate training and to handle growing volumes of training data.
We describe the PRACE Data Analytics service, which relies on a coherent set of components (frameworks, libraries, tools and additional features) to support, facilitate and promote data analytics activities in PRACE and to help users run their machine learning and deep learning tasks on the PRACE systems.
We then present results for a set of deep learning benchmarks and real use cases run with these components on the different PRACE architectures; these results confirm the components' efficiency.

Download paper: PDF

Title: Scalable Delft3D Flexible Mesh for Efficient Modelling of Shallow Water and Transport Processes

Authors: M. Mogé a,1, M. J. Russcher a,b, A. Emerson c, M. Genseberger b
a SURFsara, The Netherlands
b Deltares, The Netherlands
c CINECA, Italy

Abstract: D-Flow Flexible Mesh ("D-Flow FM") [1] is the hydrodynamic module of the Delft3D Flexible Mesh Suite [2]. Typical real-life applications require D-Flow FM to be more efficient and scalable on high performance computing systems, so we profiled and analysed it for representative test cases. In this paper, we discuss the conclusions of that profiling and analysis. We observed that, for specific models, D-Flow FM can run parallel simulations on up to a few hundred cores with good efficiency. However, D-Flow FM becomes MPI-bound when scaled up. To improve it further, we investigated the two optimisation strategies described below.
The parallelisation is based on mesh decomposition, and the use of deep halo regions may lead to significant load imbalance between partitions. We therefore first investigated different partitioning and repartitioning strategies to improve the load balance and thus reduce the time spent waiting on MPI communications. We obtained small performance gains in some cases, but further investigation and broader changes to the numerical methods would be needed to make this approach usable in the general case.
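The toy model below is a hypothetical illustration of why deep halos can unbalance an otherwise equal mesh decomposition and what a repartitioning step tries to achieve. It is not D-Flow FM's partitioner. It makes several assumptions: a 1D chain of cells split into contiguous partitions, and a cost per rank equal to its owned cells plus `halo_depth` ghost cells per neighbour, scaled by a hypothetical `halo_weight`. End partitions have one neighbour and interior partitions have two. As a result, an equal split of owned cells is already imbalanced, and the function `rebalance` compensates by giving end partitions more owned cells.

```python
"""Toy model of halo-induced load imbalance and a simple rebalancing.

Hypothetical 1D setting (not D-Flow FM's partitioner): the mesh is a chain of
cells split into contiguous partitions, and each partition also processes
halo_depth ghost cells per neighbour.
"""
from typing import List


def neighbour_counts(nparts: int) -> List[int]:
    """End partitions have one neighbour, interior partitions two."""
    return [int(k > 0) + int(k < nparts - 1) for k in range(nparts)]


def partition_costs(owned: List[int], halo_depth: int, halo_weight: float = 1.0) -> List[float]:
    """Work per rank = owned cells + weighted halo cells."""
    nbs = neighbour_counts(len(owned))
    return [n + halo_weight * nb * halo_depth for n, nb in zip(owned, nbs)]


def imbalance(costs: List[float]) -> float:
    """Max / mean load; 1.0 means perfectly balanced."""
    return max(costs) / (sum(costs) / len(costs))


def rebalance(total_cells: int, nparts: int, halo_depth: int, halo_weight: float = 1.0) -> List[int]:
    """Give partitions with fewer neighbours more owned cells so all costs match."""
    nbs = neighbour_counts(nparts)
    target = (total_cells + halo_weight * halo_depth * sum(nbs)) / nparts
    owned = [round(target - halo_weight * halo_depth * nb) for nb in nbs]
    owned[-1] += total_cells - sum(owned)  # absorb rounding so the total is preserved
    return owned


equal = [250] * 4
print(imbalance(partition_costs(equal, halo_depth=20)))  # about 1.036
balanced = rebalance(1000, 4, halo_depth=20)
print(balanced, imbalance(partition_costs(balanced, halo_depth=20)))  # [260, 240, 240, 260] 1.0
```

On a real unstructured 2D mesh, the same idea would correspond to weighting graph vertices before partitioning, and actual halo costs need not be proportional to halo size. The toy keeps only the arithmetic.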
As a second option, we tried a communication-hiding conjugate gradient method, PETSc's linear solver KSPPIPECG, to solve the linear system arising from the spatial discretisation. However, we were not able to obtain any performance improvement or to reproduce the speedup reported by the method's authors. The performance of this method turns out to be highly architecture- and compiler-dependent, which prevents its use in a more general-purpose code such as D-Flow FM.

Download paper: PDF

Title: Mini-Workshop on Preparing for PRACE Exascale systems 

Abstract: PRACE-5IP WP7 T7.2 organised a mini-workshop on Preparing for PRACE Exascale Systems on June 1, 2017, at Forschungszentrum Jülich (FZJ), Germany. This paper discusses the objectives, talks and outcomes of the workshop.

Download paper: PDF

Title: Polyhedra. HPC Optimisation of SDLPS distributed Simulator

Abstract: An unambiguous formal description of a model is one of the main challenges in simulation. In social simulation, model conceptualisation is usually done verbally, which creates potential for translation errors between the conceptual model and its codification in a programming language. Here we present a simulator based on SDLP and optimised for HPC environments that facilitates this process. Using SDLP for model description has two main advantages. First, its visual interface eases communication between the domain specialist and the programmer. Second, it provides a full and unambiguous codification of the model, which enables its execution in any environment. By optimising the simulator for HPC environments, the Polyhedra project has opened new possibilities for the computationally intensive models typical of social science applications.

Download paper: PDF