White Papers – Scalable Algorithms

On this page you will find PRACE White Papers related to Scalable Algorithms.

Title: Optimization of REDItools Package for investigating RNA Editing in Thousands of human deep sequencing Experiments

Authors: T. Flatia, S. Gioiosaa, G. Pesolea,c, T. Castrignanòb1, E. Picardia,c
a Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Bari, Italy
b SCAI, Cineca, Consorzio Interuniversitario di Supercalcolo, Roma, Italy
c Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari “A. Moro”, Bari, Italy

Abstract: RNA editing is a widespread post-transcriptional mechanism that alters primary RNA sequences through the insertion/ deletion or modification of specific nucleotides. In humans, RNA editing affects nuclear and cytoplasmic transcripts mainly by the deamination of adenosine (A) to inosine (I) through members of ADAR enzymes. A-to-I modifications increase transcriptome and proteome diversity, and contribute in modulating gene expression at RNA level. RNA editing by A-to-I change is prominent in non-coding regions containing Alu repetitive elements, whereas the list of ADAR substrates in protein coding genes is relatively small. RNA editing modifies several human neurotransmitter receptors and plays important roles in modulating their physiology. Indeed, its deregulation has been linked to a variety of human diseases, including neurological and neurodegenerative disorders, as well as cancer.
Current technologies for massive transcriptome sequencing, such as RNASeq, are providing accurate maps of transcriptional dynamics occurring in complex eukaryotic genomes, as the human one, and are facilitating the detection of post-transcriptional RNA editing modifications with unprecedented resolution. However, the computational detection of RNA editing events in RNAseq experiments is quite intensive, requiring the browsing of the human genome, position by position. To investigate RNA editing in very large cohort of RNAseq data, we have developed a novel algorithm called REDItools2.0. Here, we describe the core algorithm as well as optimization strategies used to efficiently analyze RNA editing in HPC systems.

Download paper: PDF

Title: Hybrid Iterative-Direct Solution Strategy for Improving the Scalability of Nuclear Waste Management Simulations

Authors: Emmanuel Agulloa, Luc Girauda, Matias Hastarana, Stéphane Lanterib, Laurent Lothc, Ludovic Moyab, Florent Pruvosta, Jean Romana and Olivier Rouchond
a Hiepacs project-team, Inria Bordeaux-Sud Ouest Research Center, Talence, France
b Nachos project-team, Inria Sophia Antipolis-Méditerranée, Sophia Antipolis, France
c ANDRA, Direction de la Recherche et du Développement, Chatenay-Malabry, France
d CINES, Montpellier, France

Abstract: We report on our work aiming at enabling large-scale simulations of nuclear waste management based on a simulation software combining a finite element discretization scheme formulated on a hybrid hexahedral-tetrahedral grid, and scalable sparse linear solvers. The enabling numerical tool is a domain decomposition solution strategy for the sparse system of linear equations resulting from the spatial discretization of the underlying PDEs (Partial Differential Equations). The PDEs at hand here are the Darcy’s equations. A concrete application is considered for assessing the benefits of using a domain decomposition sparse linear solution strategy as compared to the originally used approach, which is based on a more classical preconditioned iterative algorithm.

Download paper: PDF

Title: Optimising PICCANTE – an Open Source Particle-in-Cell Code
for Advanced Simulations on Tier-0 Systems

Authors: A. Sgattonia, L. Fedeliaa,b, S. Sinigardia,c, A. Marocchinod, A. Macchiaa,b,*, V. Weinberge, A. Karmakare
a CNR/INO Via Moruzzi 1, 56124 Pisa, Italy
b University of Pisa, Largo Pontecorvo 3, 56127 Pisa Italy
c University of Bologna, Dipartimento di Fisica e Astronomia, Via Irnerio 46, 40126, Bologna, Italy
d “La Sapienza” – Università di Roma, Italy
e Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities, 85748 Garching b. München, Germany

We present a detailed strong and weak scaling analysis of PICCANTE, an open source, massively parallel, fully-relativistic Particle-In-Cell (PIC) code. PIC codes are widely used in plasma physics and astrophysics to study the cases where kinetic effects are relevant. PICCANTE is primarily developed to study laser-plasma interaction.
Within a PRACE Preparatory Access Project, various revisions of different routines of the code have been analysed on the HPC systems JUQUEEN at Jülich Supercomputing Centre (JSC), Germany, and FERMI at CINECA, Italy, to improve scalability and I/O performance of the application. The diagnostic tool Scalasca is used to identify suboptimal routines. Different output strategies are discussed. The detailed strong and weak scaling behaviour of the improved code are presented in comparison with the original version of the code.

Download paper: PDF

Title: Large Scale Parallelized 3d Mesoscopic Simulations of the
Mechanical Response to Shear in Disordered Media

Authors: Luca Marradia,b, Dimitris Dellisc, Kirsten Martens*a,b
a Universite Grenoble Alpes, LIPHY, F-38000 Grenoble, France
b CNRS, LIPHY, F-38000 Grenoble, France
c Greek Research & Technology Network, Mesogeion 56, 11527, Athens, Greece

Abstract: In this paper we describe the development of a code that implements a coarse grained dynamics for the large scale modeleling of 3 dimensional athermal yielding and flow of disordered systems under externally applied steady shear. The stochastic lattice model for the heterogeneous flow response involves long range elastic interactions, that are resolved using fast Fourier techniques, implemented in parallel in an efficient and well scaling MPI algorithm.

Download paper: PDF

Title: Hybrid MIMD/SIMD High Order DGTD Solver for the Numerical Modeling of Light/Matter Interaction on the Nanoscale

Authors: S. Lanteriaa*, R. Légera, C. Scheid and J. Viquerata, T. Cabelb and G. Hautreuxb
a INRIA Sophia Antipolis – Méditerannée research center, France
b CINES, Montpellier, France

Abstract: This paper is concerned with the development of a scalable high order finite element type solver for the numerical modeling of light interaction with nanometer scale structures. From the mathematical modeling point of view, one has to deal with the differential system of Maxwell equations in the time domain, coupled to an appropriate differential model of the behavior of the underlying material (which can be a dielectric and/or a metal) at optical frequencies. For the numerical solution of the resulting system of differential equations, we have designed a high order DGTD (Discontinuous Galerkin Time-Domain) solver that has been adapted to hybrid MIMD/SIMD computing. Here we discuss about this later aspect and report on preliminary performance results on the Curie system of the PRACE research infrastructure.

Download paper: PDF

Title: Improving the Scalability of the Overlapping Fragments Method Code

Authors: Nenad Vukmirovića*, Petar Jovanovića, Marko Mladenovića
a Scientific Computing Laboratory, Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, 11080 Belgrade, Serbia

Abstract: The overlapping methods code for electronic structure calculations of large organic systems was benchmarked and profiled. Based on the profiling results, several performance and scalability bottlenecks caused by communication were identified. The code was then refactored to improve its performance with respect to these identified weak points. The modified code extends the maximal system size where good scaling is observed on Curie machine from 4,000 atoms to 16,000 atoms. On the other hand, we found that on Hermit machine the original code already exhibits rather good scaling and consequently the modified code leads to minor improvements only.

Download paper: PDF

Title: Scalability Improvement of the Projected Conjugate Gradient Method used in FETI Domain Decomposition Algorithms

Authors: Tomas Kozubeka, David Horaka, Vaclav Haplaa, Lubomir Rihaa
aIT4Innovations, VSB-Technical University of Ostrava (VSB-TUO)

Abstract: This report summarizes the results of the scalability improvements of the algorithms used in Total FETI TFETI). A performance evaluation of two new techniques is presented in this report: (1) a novel pipelined implementation of CG method in PETSc and (2) a MAGMA LU solver running on following many-cores accelerators: GPU Nvidia Tesla K20m and Intel MIC Xeon Phi 5110P.

Download paper: Download paper: PDF

Title: Scalable Parallel Nonlinear Parameter Optimization Algorithm with Parameter Pools

Authors: Ahmet Duranab* and Mehmet Tuncela,c
aIstanbul Technical University, National Center for High Performance Computing of Turkey (UHeM), Istanbul 34469, Turkey
bIstanbul Technical University,Department of Mathematics, Istanbul 34469, Turkey
cIstanbul Technical University, Informatics Institute, Istanbul 34469, Turkey

Abstract: In this project, we propose a new hybrid algorithm for parameter optimization and implement it using MPI. In particular, we study a scalable parallel nonlinear parameter optimization algorithm with parameter pools for a nonlinear dynamical system called the asset flow differential equations (AFDEs) in ?4. We generate time series pairs as proxy to market price and net asset value by using random walk simulation where the volatilities of the time series are similar to that of real closed-end funds traded on New York Stock Exchange (NYSE). When we apply the algorithm by using simulations for a set of time series, we observe that the computed optimal parameter values, average number of quasi-Newton iterations, the average nonlinear least squares errors, and the average maximum improvement factors can converge certain values within corresponding small ranges, after oscillations. Moreover, we tested for 64, 128, 256 and 512 cores using the 512 initial parameter vectors. We achieved speed-up for the time series to run up to 512 cores. The algorithm is applicable for parameter optimization of the related nonlinear dynamical system of differential equations with thousands of parameters as well.

Download paper: Download paper: PDF

Title: Massively parallel 3D Poisson equation solver formhybrid Intel Xeon – Xeon Phi HPC systems

Authors: Peicho Petkov, Damyan Grancharov, Stoyan Markov, Georgi Georgiev, Elena Lilkova, Nevena Ilieva, Leandar Litov
National Centre for Supercomputing Application, Bulgaria

Abstract: We have optimized an implementation of a massively parallel algorithm for solving the Poisson equation using a 27-stencil discretization scheme based on the stabilized biconjugate gradient method. The code is written in the C programming language with OpenMP parallelization. The main objective of this whitepaper lies in the optimization of the code for native Intel Xeon Phi execution, where we observe nearly linear scalability on the MIC architecture for the bigger problem sizes.

Download paper: Download paper: PDF

Title: Experiences with Parallel Multi-threaded Network Maximum Flow Algorithm

Authors: Seren Soner, Can Ozturan
Computer Engineering Department, Bogazici University, Istanbul, Turkey

Abstract: The problem of computing the maximum flow problem on capacitated networks arises in many application areas. In the area of heterogeneous computing, it arises in job or process scheduling when allocations of resources to jobs/processes need to be tuned. The maximum flow solver is difficult to parallelize. Highly optimized sequential version of maximum flow solvers such as those by Goldberg exists. This work describes how some of the concurrency problems are resolved in our existing Pmaxflow (Download paper: https:/)solver. Pmaxflow employs a parallel pre-flow push algorithm to compute the maximum flow. Results of various tests that compare Goldberg’s sequential solvers and Pmaxflow on a NUMA shared memory computer are presented.

Download paper: Download paper: PDF

Title: Optimized Collective Algorithm with Automated Runtime MPISub-group Creation

Authors: Chandan Basu, Johan Raber
National Supercomputer Center, Linköping University, SE-581 83 Linköping

Abstract: A network topology aware MPI all-to-all algorithm which was developed in earlier PRACE project is improved. The interface of the all-to-allroutine is made similar to MPI all-to-all routine. The routine is being tested for accuracy.
A new data classifier routine is developed to collect network characteristics data. The classifier works very well and in a scalable way whencategories are as well separated in feature space. A new data collection scheme is developed. The scalability of the data acquisition has beenpartly addressed but there are remaining issues to be solved.

Download paper: Download paper: PDF

Title: Solving Large Sparse Linear Systems using Asynchronous Multisplitting

Authors: Nick Brown, J. Mark Bull, Iain Bethune
EPCC, The University of Edinburgh, King’s Buildings, Mayfield Road, Edinburgh, EH9 3JZ, U.K

Abstract: Iterative methods for solving large sparse systems of linear equations are widely used in many HPC applications. Extreme scaling of these methods can be difficult, however, since global communication to form dot products is typically required at every iteration.
To try to overcome this limitation we propose a hybrid approach, where the matrix is partitioned into blocks. Within each block, we use a highly optimised (parallel) conventional solver, but we then couple the blocks together using block Jacobi or some other multisplitting technique that can be implemented in either a synchronous or an asynchronous fashion. This allows us to limit the block size to the point where the conventional iterative methods no longer scale, and to avoid global communication (and possibly synchronisation) across all processes.
Our block framework has been built to use PETSc, a popular scientific suite for solving sparse linear systems, as the synchronous intra-block solver, and we demonstrate results on up to 32768 cores of a Cray XE6 system. At this scale, the conventional solvers are still more efficient, though trends suggest that the hybrid approach may be beneficial at higher core counts.

Download paper: Download paper: PDF

Title: Structural Analysis of Large Sparse Matrices for Scalable Direct Solvers

Authors: Ahmet Durana,b, M. Serdar Celebia,c, Mehmet Tuncela,c, Figen Oztopraka,c
a Istanbul Technical University, National Center for High Performance Computing of Turkey (UHeM), Istanbul 34469, Turkey
bIstanbul Technical University,Department of Mathematics, Istanbul 34469, Turkey
cIstanbul Technical University, Informatics Institute, Istanbul 34469, Turkey

Abstract: It is significant to perform structural analysis of large sparse matrices in order to obtain scalable direct solvers. In this paper, we focus on spectral analysis of large sparse matrices. We believe that the approach for exception handling of challenging matrices via Gerschgorin circles and using tuned parameters is beneficial and practical to stabilize the performance of sparse direct solvers. Nearly defective matrices are among challenging matrices for the performance of solver. We observe that the usage of super-nodal storage parameters affects the number of fill-ins and memory usage accordingly.

Download paper: Download paper: PDF