20 May 2014 - 22 May 2014


PRACE Scientific and Industrial Conference 2014

In 2014, PRACE organised its first Scientific and Industrial Conference, the first edition of the PRACE days, under the motto "HPC for Innovation: when Science meets Industry". The conference combined the previously separate PRACE Scientific Conferences and PRACE Industrial Seminars and brought together experts from academia and industry who presented their advances in HPC-supported science and engineering.

The conference programme, consisting of keynote speeches, parallel sessions and a poster session, as well as a meeting of the PRACE User Forum, is posted here. A satellite event, entitled “Workshop on exascale and PRACE Prototypes”, was also organised on Monday 19 and Tuesday 20 May.

The PRACE Scientific and Industrial Awards as well as a prize for Best Poster were also presented.


PRACEdays14 Poster Session



PRACEdays14 Official Schedule



Plenary Session (1)

This session took place Tuesday 20 May 2014.

Title: In silico exploration of the most extreme scenarios in astrophysics and in the laboratory: From gamma ray bursters to ultra intense lasers


  • Luís O. Silva, Instituto Superior Técnico, Lisbon, Portugal

I will describe how massively parallel simulations are advancing our understanding of extreme scenarios in which ultra-intense flows of particles and light, in the laboratory and in astrophysics, combined with nonlinear relativistic effects, define the complex evolution of the system. After presenting the algorithms describing the collective dynamics of charged particles in intense fields, which allow the use of the largest supercomputers in the world, I will cover recent progress in relativistic shocks and cosmic-ray acceleration in extreme astrophysical events, advanced plasma-based accelerators for intense X-ray sources, and novel ion-acceleration mechanisms for cancer therapy and fusion energy. I will show how petaflop-scale simulations, combined with unique astronomical observatories and the emergence of multi-petawatt laser systems, are opening exciting new opportunities for innovation and new avenues for scientific discovery.


Title: ETP4HPC – European Technology Platform for HPC


  • Jean-François Lavignon, ETP4HPC

ETP4HPC, the European Technology Platform (ETP) for High-Performance Computing (HPC) (www.etp4hpc.eu), is an organisation led by European HPC technology providers with the objective of building a competitive HPC value chain in Europe. ETP4HPC also includes HPC research centres and end-users. It has issued a Strategic Research Agenda (SRA) which outlines the research priorities of European HPC on its way to achieving Exascale capabilities within the Horizon 2020 Programme. ETP4HPC is also one of the partners, together with the European Commission, of the Contractual Public-Private Partnership (cPPP) for HPC, the aim of which is to build a competitive HPC ecosystem in Europe based on the provision of technologies, infrastructure and applications.

ETP4HPC intends to play a key role in the coordination of the European HPC ecosystem. Our intention is to form a project team that will respond to the Commission’s FETHPC-2-2014 (Part A) Call on that topic.

The objective of this parallel session is to:

  • Outline the assumptions and suggestions of the SRA
  • Explain the concept of the cPPP and how it will affect the European HPC arena
  • Discuss the preparations for the coordination of the HPC strategy call mentioned above.


Title: PRACE and HPC Centers of Excellence working in synergy


  • Sergi Girona, PRACE
  • Leonardo Flores Añover, European Commission
  • Jean-François Lavignon, ETP4HPC

Under the Work Programme 2014 – 2015 of the new Horizon 2020 EU Research and Innovation programme, the European Commission launched Call EINFRA 5-2015, entitled “Centers of Excellence for Computing Applications”.

This Call invites the establishment of a limited number of Centers of Excellence (CoE) to ensure EU competitiveness in the application of HPC for addressing scientific, industrial or societal challenges. PRACE will co-operate with the HPC CoE, finding synergies in the efforts of both parties, including the identification of suitable applications for co-design initiatives relevant to the development of HPC technologies.

This session will present and explain Call EINFRA-5-2015 and open the floor to participants to identify and bring forward the services and possible synergies required.



Plenary Session (2)

This session took place Wednesday 21 May 2014 – 09:00 to 12:30.

Title: Building an Ecosystem to Accelerate Data-Driven Innovation


  • Francine Berman, Rensselaer Polytechnic Institute, United States

Digital data has transformed the world as we know it, creating a paradigm shift from information-poor to information-rich that impacts nearly every area of modern life. Nowhere is this more apparent than in the research community. Today, digital data from high performance computers, scientific instruments, sensors, audio and video, social network communications and many other sources are driving our ability to discover, innovate, and understand the world around us.

In order to best utilize this data, an ecosystem of technical, social and human infrastructure is needed to support digital research data now and in the future. In this talk, we discuss the opportunities and challenges for the stewardship and support of the digital data needed to drive research and innovation in today’s world.


Title: Drive safe, green and smart: HPC-Applications for sustainable mobility


  • Alexander F. Walser, Automotive Simulation Centre Stuttgart, Germany

The automotive industry is facing the challenge of sustainable mobility. This is a demanding task, characterized by globally increasing legal safety requirements, the need to improve fuel economy and to reduce CO2, noise emissions and pollutants, and growing consumer demands. In recent years, numerical simulation has made its way into the design phase of automotive development and production as a useful tool for faster problem analysis and for reducing cost and product development time. High-Performance Computing (HPC) is significant for competitiveness and innovation in the automotive industry. HPC is used where high computing power is needed to solve computationally intensive problems, e.g. computational fluid dynamics (external aerodynamics, coolant flow or in-cylinder combustion) and dynamic finite element analysis (crashworthiness and occupant safety simulation). New aspects such as cloud computing and big and smart data will increase the research and innovation challenges of HPC for the automotive industry. To optimize process chains, close methodical gaps and increase forecast quality, cooperation between science and industry through sustainable partnerships in industrial pre-competitive collaborative research is needed. As a pioneering cooperation between science and industry, the Automotive Simulation Center Stuttgart (asc(s) was founded in 2008. The asc(s business model is based on the competence-network principle. With its 23 members (OEMs, ISVs, IHVs, research facilities and individual members), the asc(s is a transfer platform setting trends for the interaction of science and industry in Europe. The asc(s offers an environment for developing new software applications, scalable algorithms and tools that make HPC systems easy to use and researchers highly innovative and productive. Linking specific practical projects with basic numerical research ensures the rapid economic availability of high-quality research results and provides new impulses for product development.


Title: European HPC strategy


  • Augusto Burgueño Arjona, European Commission, Belgium

With its communication on HPC of February 2012, the Commission committed to an ambitious action plan for European leadership in HPC. In May 2013, the Council invited the Commission to develop and elaborate its plans for HPC and to explore all possible support for academic and industrial research and innovation under Horizon 2020. Since then, the first calls of Horizon 2020 have been launched and the HPC Public-Private Partnership with ETP4HPC has been formally launched. There is, however, much more work ahead of us. In my presentation I will delineate the expected contributions of all HPC stakeholders to make the European Union’s vision on HPC a reality.


Computer Science

This session took place Wednesday 21 May 2014 – 13:30 to 15:30.

Title: Large Scale Graph Analytics Pipeline


  • Cristiano Malossi, IBM Research – Zurich, Rüschlikon, Switzerland


  • Yves Ineichen, IBM Research – Zurich, Rüschlikon, Switzerland
  • Costas Bekas, IBM Research – Zurich, Rüschlikon, Switzerland
  • Alessandro Curioni, IBM Research – Zurich, Rüschlikon, Switzerland

In recent years, graph analytics has become one of the most important and ubiquitous tools for a wide variety of research areas and applications. Indeed, modern applications such as ad hoc wireless telecommunication networks, or social networks, have dramatically increased the number of nodes of the involved graphs, which now routinely range in the tens of millions and reach into the billions in notable cases.

We developed novel near-linear (O(N)) methods for sparse graphs with N nodes to estimate:

  • The most important nodes in a graph, i.e. the subgraph centralities, and
  • Spectrograms, that is, the density of eigenvalues of the adjacency matrix of the graph in a given part of the spectrum.

The method to compute subgraph centralities employs stochastic estimation and Krylov subspace techniques to drastically reduce the complexity, which is typically O(N³) with standard methods. This technique allows centralities to be approximated quickly, scalably and accurately, and thereby opens the way for centrality-based big-data graph analytics that would have been nearly impossible with standard techniques. For example, it can be employed to identify possible bottlenecks in the European street network, with 51 million nodes, in only a couple of minutes on just 16 threads.
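The abstract names the ingredients (stochastic estimation plus Krylov-subspace evaluation) without spelling out the estimator. A minimal sketch of that combination, assuming a Hutchinson-style diagonal estimator combined with SciPy's Krylov-type matrix-exponential action; the function name, sample count and toy graph are illustrative, not from the talk:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import expm_multiply

def estimate_subgraph_centrality(A, num_samples=500, seed=0):
    """Estimate diag(exp(A)), the subgraph centralities, without ever
    forming the dense exp(A): probe with random +/-1 vectors and apply
    the matrix exponential via a Krylov-type method (expm_multiply)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    num = np.zeros(n)
    den = np.zeros(n)
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=n)
        num += v * expm_multiply(A, v)  # v ⊙ (exp(A) v)
        den += v * v
    return num / den  # elementwise ratio estimates the diagonal

# Toy example: adjacency matrix of a 5-node path graph.
A = sp.lil_matrix((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
centrality = estimate_subgraph_centrality(A.tocsr())
# Interior nodes take part in more closed walks than the end nodes,
# so their estimated centrality comes out higher.
```

Each probe costs only sparse matrix-vector products, which is what brings the cost down from the O(N³) of dense diagonalization towards near-linear in the number of edges.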

Spectrograms are powerful in capturing the essential structure of graphs and provide a natural, human-readable (low-dimensional) representation for comparison, for instance of graphs that are almost similar. Of course, this is a massive dimensionality reduction, yet at the same time the shape of the spectrogram yields a tremendous wealth of information.

In order to tackle the arising big-data challenges, efficient utilization of the available HPC resources is key. Both methods exhibit efficient parallelization on multiple hierarchical levels. For example, computing the spectrogram can be parallelized on three levels: bins and matrix-vector products can be computed independently, and each matrix-vector product can itself be computed in parallel. The combination of a highly scalable implementation and algorithmic improvements enables us to tackle big-data analytics problems that are nearly impossible to solve with standard techniques.
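For a graph small enough to diagonalize, the spectrogram can be computed exactly by binning the eigenvalues of the adjacency matrix; the talk's O(N) method replaces the diagonalization with stochastic estimates of the per-bin counts, which is what makes the bins independent. A toy sketch of the exact version (the function name is illustrative):

```python
import numpy as np

def spectrogram(A, num_bins=8):
    """Exact eigenvalue-density 'spectrogram' of a small graph: a
    histogram of the adjacency-matrix spectrum. Each bin count could be
    estimated independently, which is what makes the large-scale
    stochastic variant embarrassingly parallel over bins."""
    eigvals = np.linalg.eigvalsh(A)  # dense diagonalization; fine for small N
    counts, edges = np.histogram(eigvals, bins=num_bins,
                                 range=(eigvals.min(), eigvals.max()))
    return counts, edges

# Complete graph K4: its spectrum is {3, -1, -1, -1}, so the density
# piles up in the lowest bin with one outlier in the highest.
A = np.ones((4, 4)) - np.eye(4)
counts, edges = spectrogram(A, num_bins=4)
```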

A broad spectrum of applications in industrial and societal challenges can profit from fast graph analytics, for example routing and explorative visualization. We continuously focus our efforts to extend the coverage of our massively parallel graph analytics software stack to a variety of application domains in science and industry.


Title: Big model simulations and optimization through HPC: An effective way of improving performance for cloud-targeted services


  • Gino Perna, Enginsoft, Italy
  • Alberto Bassanese, Enginsoft, Italy
  • Stefano Odorizzi, Enginsoft, Italy
  • Carlo Janna, M3E, Italy

Woven fabric composites have been the object of much research into their mechanical properties since their introduction in aeronautic and industrial applications more than twenty years ago: their good conformability makes them the material of choice for complex geometries. Fatigue problems are very complicated because the fibers are bundled in yarns that are interlaced to form a specific pattern, so the complex geometry of the fabric architecture strongly affects which constituent fails first and how a local failure propagates until it causes the final failure of the entire lamina. By treating the problem only in terms of mean (macro) stresses at the laminate level, as if the material were homogeneous and anisotropic, it is not possible to capture the stress concentrations and the intra-laminar shear stresses within each component. Multi-scale analysis approaches are therefore the obvious way to link the macroscopic and microscopic structural behaviours of composite materials. However, numerous parameters control the final composite mechanical properties: typically the fiber architecture and volume fraction, and the mechanical properties of the fiber, the matrix and the fiber-matrix interface. FEA and continuously improving hardware performance, in particular today’s multi-core architectures, offer a convenient solution to the modelling problem by accounting for this inherently multi-scale structural nature, to the point that virtual prototyping can nowadays replace some of the physical tests required for the mechanical characterization of different material systems. Solving the problem and optimizing the whole structure requires a great number of computational cores, but one of the main obstacles is the performance of mechanical analysis codes, which must be improved to reach the level of CFD codes.

New conjugate gradient techniques are very promising in these scenarios for considerably cutting down computational time, thus leaving room for more analyses and optimization studies to maximize performance and design better and safer products.
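The conjugate gradient iteration itself is left undescribed; as a point of reference, a textbook unpreconditioned CG for a sparse symmetric positive-definite system looks like the sketch below. The toy stiffness matrix and tolerance are illustrative; the techniques alluded to in the talk are more advanced, preconditioned variants.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Plain conjugate gradient for A x = b with A symmetric positive
    definite. Each iteration costs one matrix-vector product, which is
    what makes CG attractive for large sparse FEA systems."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy SPD system: a 1-D Laplacian-like stiffness matrix.
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = conjugate_gradient(A, b)
```

In exact arithmetic CG converges in at most n iterations, and with a good preconditioner far fewer, which is where the runtime savings the authors mention come from.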

Title: Mont-Blanc – Engaging Industry in low-energy HPC technology design process


  • Alex Ramirez, Barcelona Supercomputing Center, Spain


  • Marcin Ostasz, Barcelona Supercomputing Center, Spain

The aim of the Mont-Blanc project has been to design a new type of computer architecture capable of setting future global High-Performance Computing (HPC) standards, built from energy efficient solutions used in embedded and mobile devices. This will help address the Grand Challenge of energy consumption and environment protection, as well as potentially help Europe achieve leadership in world-class HPC technologies and satisfy the European industry’s need for low-power HPC.

The project has been in operation since Oct 2011. The European Commission has recently granted an additional 8 million Euro to extend the project activities until 2016. This will enable further development of the OmpSs parallel programming model to automatically exploit multiple cluster nodes, transparent application checkpointing for fault tolerance, support for ARMv8 64-bit processors, and the initial design of the Mont-Blanc Exascale architecture. Several new partners have joined this second phase of Mont-Blanc, including Allinea, STMicroelectronics, INRIA, University of Bristol, and University of Stuttgart.

Mont-Blanc is looking for members of the European HPC industrial user eco-system to join its Industrial End-User Group (IUG). As the project produces novel HPC technologies and solutions (i.e. low-energy HPC), it will ask the members of the IUG to validate these products and provide feedback to the project in order to align its objectives and deliverables and to address issues such as end-user compatibility. An Industrial End-User Group coordinator has been appointed to coordinate this process. The IUG will consist of representatives of various industries, including, but not limited to, Automotive, Energy, Oil/Gas, Aerospace, Pharma, and Financial.

The objective of this session is to:

  • Familiarise the audience with the IUG: membership rules and obligations,
  • Explain the processes of testing the Mont-Blanc technology,
  • Share the latest project results,
  • Instigate other industrial organisations to join or work closely with the IUG, and
  • Collect feedback and suggestions in relation to the IUG.

The session will have two parts:

  • Technical – explaining the project, its achievements and the latest results available as above,
  • Moderated discussion on the current and future work of the IUG.


Life Sciences

This session took place Wednesday 21 May 2014 – 16:00 to 17:20.

Title: Numerical simulation of a sniff in the respiratory system


  • Hadrien Calmet, Barcelona Supercomputing Center, Spain

Direct numerical simulation (DNS) of the flow in the human nose and throat is a great challenge. As far as the authors know, this is the first time that DNS has been carried out in the whole respiratory system. Such a massive simulation is very useful for obtaining a high level of detail throughout the human nose and throat: the flow structure, the turbulence or the power spectrum can be post-processed anywhere along the airway. It also guarantees that the inflow along the airway is realistic, so simplified boundary conditions are not necessary.

Here, a subject-specific model of the domain, extending from the face to the third branch generation of the lung, is used to carry out the simulation. The model was extracted from Computed Tomography (CT) images. The inlet boundary condition is a time profile of the flow rate during a sniff (peaking at 30 l/min), modelled through statistical analysis of a few patients.

Two unstructured meshes with finely resolved boundary layers are used, with 44 million and 350 million elements respectively. The second is obtained from the first by a parallel uniform mesh-multiplication algorithm, resulting in a finer mesh. The second mesh is used for the detailed turbulence analysis and to verify that the resolution of the first is sufficient. Because its data are lighter to analyse, the first mesh is generally used for the description of the flow.

The complexity of the flow forces us to analyse each part of the large airways separately, which helps to explain the main characteristics and features of each region. The time scales differ between the nose and the throat, and so does the physics. In addition, a large number of turbulence statistics are computed, and the main features of the flow in each region are characterized via power spectra at a few probe points, compared between the two meshes.


Title: Large scale DFT simulation of a mesoporous silica based drug delivery system


  • Massimo Delle Piane, University of Torino, Department of Chemistry and NIS (Nanostructured Interfaces and Surfaces) Centre, Torino, Italy


  • Marta Corno, University of Torino, Department of Chemistry and NIS (Nanostructured Interfaces and Surfaces) Centre, Torino, Italy
  • Alfonso Pedone, University of Modena and Reggio Emilia, Department of Chemistry, Modena, Italy
  • Piero Ugliengo, University of Torino, Department of Chemistry and NIS (Nanostructured Interfaces and Surfaces) Centre, Torino, Italy

Mesoporous materials are characterized by an ordered pore network with high homogeneity in size and very high pore volume and surface area. Among silica-based mesoporous materials, MCM-41 is one of the most studied since it was proposed as a drug delivery system. Notwithstanding the relevance of this topic, the atomistic details of the specific interactions between the surfaces of these materials and drugs, and the energetics of adsorption, are almost unknown.

We resort to a computational ab-initio approach, based on periodic Density Functional Theory (DFT), to simulate the features of the MCM-41 mesoporous silica material with respect to adsorption of ibuprofen, starting from our previous models of a silica-drug system. We sampled the potential energy surface of the drug-silica system by docking the drug on different spots on the pore walls of a realistic MCM model. The drug loading was then gradually increased resulting in an almost complete surface coverage. Furthermore, we performed ab-initio molecular dynamics simulations to check the stability of the interaction and to investigate the drug mobility.

Through our simulations we demonstrated that ibuprofen adsorption seems to follow a quasi-Langmuirian model. Particularly, we revealed that dispersion (vdW) interactions play a crucial role in dictating the features of this drug/silica system. Finally, simulations of IR and NMR spectra provided useful information to interpret ambiguous experimental data.

Simulations of this size (up to almost 900-1000 atoms), at this accurate (and onerous) level of theory, were possible only thanks to the computational resources made available by the PRACE initiative. We have demonstrated that the evolution of HPC architectures and the continuous advancement in the development of more efficient computational chemistry codes have taken the Density Functional Theory approach out of the realm of “small” chemical systems, directly into a field that just a few years ago was the exclusive domain of the much less computationally demanding Molecular Mechanics methods. This opens the path to the accurate ab-initio simulation of complex chemical problems (in materials science and beyond) without many of the simplifications that were necessary in the recent past.


Chemistry / Materials Science

This session took place Wednesday 21 May 2014 – 13:30 to 15:30.

Title: Ab initio modelling of the adsorption in giant Metal-Organic Frameworks: From small molecules to drugs


  • Bartolomeo Civalleri, Department of Chemistry, University of Torino, Torino, Italy


  • M. Ferrabone, Department of Chemistry, University of Torino, Torino, Italy
  • R. Orlando, Department of Chemistry, University of Torino, Torino, Italy

Metal-Organic Frameworks (MOFs) are a new class of materials that are expected to have a huge impact on the development of next-generation technologies. They consist of inorganic nodes connected through organic linkers to form a porous three-dimensional framework. The combination of different nodes and linkers makes MOFs very versatile materials with promising applications in many fields, including gas adsorption, catalysis, photo-catalysis, drug delivery, sensing and nonlinear optics.

We will show results on the ab-initio modeling of the adsorptive capacity of the so-called giant MOFs. They possess pores of very large size and, in turn, a huge surface area. Among giant MOFs, the most representative is probably MIL-100. It ideally crystallizes in a non-primitive cubic lattice with 2788 atoms in the primitive cell. MIL-100 is characterized by a large number of coordinatively unsaturated metal atoms exposed at the inner surface of the pores, which are crucial in determining its adsorption capacity. In particular, we are investigating MIL-100 for its ability to capture carbon dioxide, one of the hottest topics in MOF research, and for the adsorption of large molecules such as drugs, for drug-delivery purposes. The project is ongoing and available results will be shown.

Giant MOFs, with thousands of atoms in the unit cell, represent a tremendous challenge for current ab-initio calculations. The use of Tier-0 computer resources provided by PRACE is essential to tackle this challenging problem. All calculations have been carried out with the B3LYP-D method using the massively parallel (MPP) version of the ab-initio code CRYSTAL (http://www.crystal.unito.it/).


Title: Ab Initio Quantum Chemistry on Graphics Processing Units: Rethinking Algorithms for Massively Parallel Architectures


  • Jörg Kussmann, University of Munich (LMU), Germany


  • Simon Maurer, University of Munich (LMU), Germany
  • Christian Ochsenfeld, University of Munich (LMU), Germany

Conventional ab initio calculations are limited in their application to molecular systems containing only a few hundred atoms due to their unfavorable scaling behavior, which is at least cubic [O(N³)] for the simplest mean-field approximations (Hartree-Fock, Kohn-Sham density functional theory). In the last two decades, a multitude of methods has been developed that reduce the scaling behavior to linear for systems with a significant HOMO-LUMO gap, allowing the computation of molecular properties of systems with more than 1000 atoms on single-processor machines.

The advent of general-purpose GPUs (GPGPU) in recent years promised significant speed-ups for scientific high-performance computing. However, quantum chemical methods seem to pose a particularly difficult case due to the heavy demand of computational resources. Thus, first implementations of the rate-determining integral routines on GPUs were strongly limited to very small basis sets and employed intermediate single-precision quantities. Furthermore, a straightforward and efficient adaptation of O(N) integral algorithms for GPUs is not possible due to their inherent book-keeping, branching, random memory access, and process interdependency.

We present general strategies and specific algorithms to efficiently utilize GPUs for electronic structure calculations with the focus on a fine-grained data organization for efficient workload distribution, reducing inter-process communication to a minimum, and minimizing the use of local memory.

Thus, we are able to use large basis sets and double-precision-only GPU kernels, in contrast to previously suggested algorithms. The benefits of our approach will be discussed for the example of the calculation of the exchange matrix, which is by far the most time-consuming step in SCF calculations.

Here, we recently proposed a linear-scaling scheme based on pre-selection (PreLinK) which has been proven to be highly suitable for massively parallel architectures.

Thus, we are able to perform SCF calculations on GPUs using larger basis sets to determine not only energies and gradients, but also static and dynamic higher-order properties like NMR shieldings or excitation energies. Apart from discussing the performance gain as compared to conventional ab initio calculations on a single server, we also compare different architectures based on CUDA, OpenCL, MPI/OpenMP, and MPI/CUDA.

Furthermore, we present what is, to our knowledge, the first efficient use of GPUs for post-HF methods beyond the mere use of GPUs for linear algebra operations, using the example of second-order Møller-Plesset perturbation theory (MP2).


Title: Shedding Light On Lithium/Air Batteries Using Millions of Threads On the BG/Q Supercomputer


  • Teodoro Laino, IBM Research – Zurich, Rüschlikon, Switzerland


  • V. Weber, IBM Research – Zurich, Rüschlikon, Switzerland
  • A. Curioni, IBM Research – Zurich, Rüschlikon, Switzerland

In 2009, IBM Research embarked on an extremely challenging project whose ultimate goal is to deliver a new type of battery that will allow an electric vehicle to drive 500 miles without intermediate recharging. The battery considered the most promising candidate for this goal is based on lithium and oxygen, commonly known as the Lithium/Air battery, potentially delivering energy densities one order of magnitude larger than state-of-the-art electrochemical cells.

With few exceptions, carbonate-based electrolytes, for instance propylene carbonate (PC) or ethylene carbonate (EC), have been the preferred choice for most experimental setups related to Lithium/Air batteries to date. By using massively parallel molecular dynamics simulations, we modeled the reactivity of a surface of Li2O2 in contact with liquid PC, revealing the high susceptibility of PC to chemical degradation by the peroxide anion.

Moreover, by using increasingly detailed and realistic simulations, we were able to provide an understanding of the molecular processes occurring at the cathode of the Li/Air cell, showing that the electrolyte holds the key role in non-aqueous Lithium/Air batteries in producing the appropriate reversible electrochemical reduction.

A crucial point when modeling such complex systems is the level of accuracy of DFT calculations, which is key for improving the predictive capabilities of molecular modeling studies and for addressing material discovery challenges.

In order to achieve a reliable level of accuracy, we implemented a novel parallelization scheme for the highly efficient evaluation of the Hartree–Fock exact exchange (HFX) in ab initio molecular dynamics simulations, specifically tailored for condensed-phase simulations. We show that our solution takes great advantage of the latest trends in HPC platforms, such as extreme threading, short vector instructions and highly dimensional interconnection networks. Indeed, all these trends are evident in the IBM Blue Gene/Q supercomputer. We demonstrate an unprecedented scalability up to 6,291,456 threads (96 BG/Q racks) with near-perfect parallel efficiency, which represents a more than 20-fold improvement over the current state of the art. In terms of time to solution, we achieved more than a 10-fold reduction in runtime with respect to directly comparable approaches.

By using the PBE0 hybrid functional (with HFX), so as to enhance the accuracy of DFT-based molecular dynamics, we characterized the reactivity of different classes of electrolytes with solid Li2O2. In this talk, we present an effective way to screen different solvents with respect to their intrinsic chemical stability versus Li2O2 solid particles [3]. Based on these results, we proposed alternative solvents with enhanced stability to ensure an appropriate reversible electrochemical reaction and, finally, to contribute to the optimization of a key technology for electric vehicles.


Environmental Science

This session took place Wednesday 21 May 2014 – 16:00 to 17:20.

Title: Next generation pan-European climate models for multi- and many-core architecture


  • Jun She, Danish Meteorological Institute


  • Jacob Weismann Poulsen, Danish Meteorological Institute
  • Per Berg, Danish Meteorological Institute
  • Lars Jonasson, Danish Meteorological Institute

To generate more consistent and accurate climate information for climate adaptation and mitigation, high-resolution coupled atmosphere-ocean-ice models are needed at large regional scales, e.g. pan-European and Arctic-North Atlantic scales. The computational load of these models can be hundreds of times heavier than that of current global coupled models (e.g. those used in IPCC AR5). The vision is to make the regional coupled models efficient on multi- and many-core architectures. To reach this goal, the most challenging part is the ocean model optimization, as the model domain is highly irregular, ranging from straits a few hundred metres wide to open ocean on a scale of a few thousand kilometres. Based on achievements made in the PRACE project ECOM-I (Next generation pan-European coupled climate-ocean model – phase 1), this presentation will show methods and results in optimizing a pan-European two-way nested ocean-ice model, focusing on coding standards, I/O, halo communication, load balance and multi-grid nesting. The optimization was tested on different architectures, e.g. Curie Thin nodes, CRAY XT5/XT6 and Xeon Phi. The results also show that different model setups lead to very different computational complexity. A single-domain setup for Baffin Bay shows scalability to 16,000 cores and an Amdahl ratio of >99.5%, whereas a pan-European setup with 10 interconnected nesting domains only scales to fewer than two thousand cores, with an Amdahl ratio of 92%. Key issues in evaluating the computational performance of models, such as run-to-run reproducibility, scalability, the Amdahl ratio and their relation to job size, the ratio of computational (wet) points and multi-grids, will be addressed. Finally, a roadmap for next-generation pan-European coupled climate models for many-core architectures is discussed.


Title: Optimizing an Earth Science Atmospheric Application with the OmpSs Programming Model


  • George S. Markomanolis, Barcelona Supercomputing Center, Spain

The Earth Sciences Department of the Barcelona Supercomputing Center (BSC) is working on the development of a new chemical weather forecasting system based on the NCEP/NMMB multiscale meteorological model. In collaboration with the National Centers for Environmental Prediction (NOAA/NCEP/EMC), the NASA Goddard Institute for Space Studies (NASA/GISS), and the University of California Irvine (UCI), the group is implementing aerosol and gas chemistry inlined within the NMMB model. The new modeling system, the NMMB/BSC Chemical Transport Model (NMMB/BSC-CTM), is a powerful tool for research into physico-chemical processes occurring in the atmosphere and their interactions. We present our efforts on porting and optimizing the NMMB/BSC-CTM model. This work is done under the Severo Ochoa programme, with the aim of preparing the model for large-scale experiments and increasing the resolution of the executed domain. However, achieving high scalability of our application requires optimizing various parts of the code. It is well known from the discussion about the exascale era that coprocessors will play an important role. Currently there are two main types of coprocessor: GPUs and the Intel Xeon Phi. In order to use both without rewriting most of the code, we use the OmpSs programming model, developed at BSC-CNS. Through this procedure we extend the usage of our model by porting part of the code to be executed on GPUs and Xeon Phi coprocessors. The performance analysis tool Paraver is used to identify the bottleneck functions. Afterwards, the corresponding code is ported to OpenCL, optimized for execution on GPUs and Xeon Phi respectively. We execute our model with various configurations in order to test it under extreme load by enabling the chemistry modules, which take into account many more species (water, aerosols, gases), and we observe that the bottleneck functions depend on each case. We solve load-balancing issues and, whenever possible, take advantage of the available cores on NVIDIA GPUs and the Intel Xeon Phi. To the best of our knowledge, the use of the OmpSs programming model on an earth science application intended for future operational use is unprecedented.


Automotive / Engineering

This session took place Wednesday 21 May – 13:30 to 15:30.

Title: INCITE in the International Research Community


  • Julia C. White, INCITE, United States

The Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program promotes unprecedented scientific and engineering simulations through extremely large awards of computer time on high-performance computers that are among the most powerful in the world. Successful INCITE projects deliver high-impact science that could not otherwise be achieved without access to leadership-class systems at the US Department of Energy’s Argonne and Oak Ridge Leadership Computing Facilities. INCITE does not distinguish between funding sources or country of affiliation, instead selecting the research of highest impact from the worldwide community of researchers.

Julia C. White, INCITE program manager, will highlight the history of the INCITE program and the role of international researchers over the program’s ten-year history. White will describe the importance of broad geographical diversity, not just among proposal applicants, but in the peer-review panels that assess applications and even in the INCITE program itself.

Paul Messina of Argonne National Laboratory will speak about industry use of leadership-class resources. White will focus on international access to these resources through the INCITE program.


Title: High fidelity multiphase simulations studying primary breakup


  • Mathis Bode, RWTH Aachen University, Germany

A variety of flows encountered in industrial configurations involve both liquid and gas. Systems to atomize liquid fuels, such as diesel injection systems, are one example. The performance of a particular technical design depends on a cascade of physical processes, originating from the nozzle-internal flow, potential cavitation, turbulence, and the mixing of a coherent liquid stream with a gaseous ambient environment. This mixing stage is critical, and the transfer occurring between liquid and gas is governed by the interface topology.

The most serious gap in our understanding of spray formation concerns primary breakup, yet it is also the first physical process that must be modeled. This means that uncertainties in the modeling of primary breakup propagate into, for example, the design and performance of atomizers in diesel combustion systems, all the way down to emission and pollutant formation.

Typical diesel injection systems have outlet diameters of the order of 100 micrometers, and the resulting smallest droplets and turbulent structures are much smaller still. This illustrates two of the major problems in studying primary breakup: first, experiments characterizing the atomization process are very difficult due to the small length scales; second, huge meshes are required for simulating primary breakup because of the necessity to resolve the broad spectrum of length scales in play within a single simulation. Thus, studying primary breakup is not possible without massively parallel code frameworks.

We use the CIAO code, which has already been run on up to 65,000 parallel cores on SuperMUC, in connection with recently developed, highly accurate interface-tracking methods. This so-called 3D unsplit forward/backward Volume-of-Fluid method, coupled to a level-set approach, overcomes the traditional issues of mass conservation and interface-curvature computation in multiphase simulations. Due to its robustness, it also enables the simulation of arbitrarily high density ratios.

In this project, a novel approach combining spatial and temporal jet simulations of multiphase flows is used to study primary breakup from first principles. The results of these high fidelity multiphase simulations are used to further the understanding and accurate modeling of primary breakup in turbulent spray formation of industrial relevance.


Title: Fluid saturation of hydrocarbon reservoirs and scattered waves: Numerical experiments and field study


  • Vladimir A. Tcheverda, Novosibirsk State University, Russia


  • V. Lisitsa, Novosibirsk State University, Russia
  • A. Merzlikina, Novosibirsk State University, Russia
  • G. Reshetova, Novosibirsk State University, Russia

Over the last decade the use of scattered waves has taken a significant place among the wide range of modern seismic techniques. But so far their main area of application is the spatial localization of clusters of subseismic-scale heterogeneities, such as cracks, fractures and caverns; in other words, these waves are used just to say “yes” or “no” to the presence of this microstructure. Therefore the main goal of our efforts within the framework of the PRACE Project Grant 2012071274 (supercomputer HERMIT at Stuttgart University) is to understand what kind of knowledge about the fine structure of a target object, such as a cavernous fractured reservoir, can be obtained from this constituent of the full seismic wave field. The key instrument for studying the scattering and diffraction of seismic waves in realistic models is full-scale numerical simulation. In order to correctly describe wave propagation in media with heterogeneities at both large scale (3D heterogeneous background) and fine scale (distribution of caverns and fracture corridors), we apply finite-difference schemes with local refinement in time and space. On this basis we are able to simulate wave propagation in very complicated, realistic models of 3D heterogeneous media with subseismic heterogeneities.

This simulation was done for a realistic digital model derived from all available data about a specific deposit. It turns out that fluid saturation has a very specific impact on the synthetic seismic image, which can be used as a predictive criterion in real-life data processing and interpretation. This criterion was confirmed by a real-life deep well.


Astrophysics and Mathematics

This session took place Wednesday 21 May 2014 – 16:00 to 17:20.

Title: EAGLE: Simulating the formation of the Universe


  • Richard Bower, Durham University, United Kingdom

The EAGLE (Evolution and Assembly of Galaxies and their Environments) project aims to create a realistic virtual universe on the PRACE computers. Through a suite of state-of-the-art hydrodynamic simulations, the calculations allow us to understand how the stars and galaxies we see today have grown out of the small quantum fluctuations seeded in the big bang. The simulations track and evolve dark matter and gas, including physical processes such as metal-dependent gas cooling, the formation of stars, the explosion of supernovae and the evolution of giant black holes. The resolution of the simulations is sufficient to resolve the onset of the Jeans instability in galactic disks, allowing us to study the formation of individual galaxies in detail. At the same time, the largest calculation simulates a volume that is 100 Mpc on each side, recreating the full range of galaxy environments from isolated dwarfs to dense, rich galaxy clusters.

During my talk I will explain why this is a formidable challenge. The physics of galaxy formation couples the large-scale force of gravity to the physics of star formation and black hole accretion. In principle, the simulation needs to cover a dynamic range of at least 10^8 in length scale (from 100 Mpc to 1 pc). To make matters worse, these scales are strongly coupled. While the small-scale phenomena are driven by large-scale collapse, the small scales also generate feedback, driving gas flows on large scales. Even with large computer-time allocations on the fastest computers available today, resolving this full range directly is impossible, and we must adopt a multi-scale approach.

A key philosophy of the EAGLE simulations has been to use the simplest possible sub-grid models for star formation and black hole accretion, and for feedback from supernovae and AGN. Using a stochastic approach, efficient feedback is achieved without hydrodynamic decoupling of resolution elements. The small number of parameters in these models is calibrated by requiring that the simulations match key observed properties of local galaxies. Having set the parameters using the local Universe, I will show that the simulations reproduce the observed evolution of galaxy properties extremely well.

The resulting universe provides us with deep insight into the formation of galaxies and black holes. In particular, we can use the simulations to understand the relationship between local galaxies and their progenitors at higher redshift and to understand the role of interactions between galaxies and the AGN that they host. I will present an overview of some of the most important results from the project, and discuss the computational challenges that we have met during the project. In particular, we found it necessary to develop a new flavour of the Smooth Particle Hydrodynamics (SPH) framework in order to avoid artificial surface tension terms.

The improved formulation has the potential to influence other areas of numerical astronomy and could also be used in more industrial applications such as turbine design or tsunami prevention where the SPH technique is commonly used.

The EAGLE project has shown that it is possible to simulate the Universe with unprecedented realism using an extremely simple approach to the multi-scale problem. It has allowed us to meet the grand challenge of understanding the origin of galaxies like our own Milky Way. I will briefly describe what can be learned from this novel approach to sub-grid physics and how it might be applied in other areas.


Title: A massively parallel solver for discrete Poisson-like problems


  • Yvan Notay, University of Brussels, Belgium

AGMG (AGgregation-based algebraic MultiGrid solver) is a software package that solves large sparse systems of linear equations; it is especially well suited for discretized partial differential equations. AGMG is an algebraic solver that can be used as a black box, and can thus substitute for direct solvers based on Gaussian elimination. It uses a method of the multigrid type, with coarse grids obtained automatically by aggregation of the unknowns. Sequential AGMG is scalable in the sense that the time needed to solve a system is (under known conditions) proportional to the number of unknowns.

AGMG has also been a parallel solver since the beginning of the project in 2008. Within the framework of a PRACE project, we faced the challenge of porting it to massively parallel systems with up to several hundred thousand cores. Some relatively simple yet not straightforward adaptations were needed. Thanks to them, we obtained excellent weak scalability results: when the size of the linear system to solve is increased proportionally to the number of cores, the time is at first essentially constant, and then increases only moderately, the penalty never exceeding a factor of 2 (this maximal factor is seen on JUQUEEN when using more than 370,000 cores, that is, more than 80% of the machine, ranked eighth in the TOP500 supercomputer list). More importantly, when considering scalability results, one should never forget that their relevance depends on the quality of the sequential code one starts from. Comparative tests show that, on a single node, our solver is more than 3 times faster than HYPRE, which is often considered the reference parallel solver for this type of linear system.
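AGMG's own sources are not reproduced here; as a toy sketch of the aggregation idea it builds on, the following two-grid cycle solves a 1D discrete Poisson problem with unsmoothed pairwise aggregation and damped-Jacobi smoothing (all names and parameter choices are illustrative, not AGMG's actual scheme):

```python
import numpy as np

def poisson1d(n):
    """Tridiagonal 1D Poisson matrix (Dirichlet boundaries)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def aggregation_prolongator(n):
    """Piecewise-constant prolongator that aggregates unknowns in pairs."""
    nc = (n + 1) // 2
    P = np.zeros((n, nc))
    for i in range(n):
        P[i, i // 2] = 1.0
    return P

def two_grid_solve(A, b, tol=1e-8, max_iter=500, omega=2.0 / 3.0):
    """Aggregation-based two-grid cycle with damped-Jacobi smoothing."""
    n = A.shape[0]
    P = aggregation_prolongator(n)
    Ac = P.T @ A @ P                           # Galerkin coarse operator
    Dinv = 1.0 / np.diag(A)
    x = np.zeros(n)
    for it in range(max_iter):
        x += omega * Dinv * (b - A @ x)        # pre-smoothing
        r = b - A @ x
        x += P @ np.linalg.solve(Ac, P.T @ r)  # coarse-grid correction
        x += omega * Dinv * (b - A @ x)        # post-smoothing
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, it + 1
    return x, max_iter
```

AGMG's scalability comes from applying this kind of aggregation hierarchy recursively, with far more sophisticated aggregation and cycling strategies than this two-level sketch.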



This session took place Wednesday 21 May 2014 – 13:30 to 15:30.

Title: The SHAPE Programme for Competitive SMEs in Europe


  • Giovanni Erbacci, PRACE 3IP WP5 leader, CINECA, (Italy)

The adoption of HPC technologies in order to perform wide numerical simulation activities, investigate complex phenomena and study new prototypes is crucial to help SMEs to innovate products, processes and services and thus to be more competitive.

SHAPE, the SME HPC Adoption Programme in Europe, is a new pan-European programme supported by PRACE. The Programme aims to raise awareness of HPC among European SMEs and provide them with the expertise necessary to take advantage of the innovation possibilities created by HPC, thus increasing their competitiveness. The programme allows SMEs to benefit from the expertise and knowledge developed within the top-class PRACE research infrastructure.

The programme aims to progressively deploy a set of complementary services for SMEs, such as information, training, and access to computational expertise for co-developing a concrete industrial project to be demonstrated using PRACE HPC resources.

The SHAPE Pilot is a trial programme launched to prove the viability and value of the SHAPE Programme, with the objective of refining the details of the initiative and preparing its launch in a fully operational way. The Pilot works with ten selected SMEs to introduce HPC-based tools and techniques into their business, operational or production environments.

This session presents some preliminary results of the Pilot, showing the work carried out together with the selected SMEs to adopt HPC solutions.


Title: Design improvement of a rotary turbine supply chamber through CFD analysis


  • Roberto Vadori, Thesan


  • Claudio Arlandini, CINECA

This work deals with the optimization of a volumetric machine. The machine is under active development, and a prototype is already working and fully monitored in an experimental mock-loop setup. This prototype operates under controlled conditions on a workbench, giving as output the efficiency of the machine itself. The main goal is to obtain increased efficiency through the design and realization of the moving chambers in which the fluid flows. To achieve this, extensive CFD modeling and simulation are required to perform virtual tests on different design solutions and to measure the physical quantities assessing the performance of a given geometry. The final goal is to design a better geometry for the different components, mainly the supply and exhaust chambers, cutting down the time and resources needed to realize a physical prototype and limiting physical realization to a single chosen geometry. The modeling should then allow, through an optimization strategy, parametric studies of the key design parameters of the moving chambers, in order to identify the main geometrical parameters that drive the optimal configuration.

High Performance Computing facilities and open-source tools such as OpenFOAM are therefore of capital importance for handling the complex physical model under consideration and for performing a sufficient number of design-configuration analyses.


Title: Electromagnetic simulation for large models using HPC


  • José-Maria Tamayo-Palau, NEXIO Simulation
  • Pascal de-Reseguir, NEXIO Simulation

Nexio Simulation has recently started migrating its electromagnetic simulation software from a version developed for regular personal computers (CAPITOLE-EM) to High Performance Computing systems (CAPITOLE-HPC). This has been possible thanks first to the French HPC-PME initiative and then to the European SHAPE project. The HPC-PME initiative is a project targeted at helping and encouraging small and medium-sized enterprises (SMEs) to adopt HPC. Under the SHAPE project we expect to scale up this initial step in terms of computational time, resource usage and optimization. Industry has become more and more demanding, asking for the simulation of very large problems. In particular, in the electromagnetic domain, one can very rapidly end up with full (dense) linear systems with several million unknowns. The solution of these systems requires matrix compression techniques based on the physics of the problem and on mathematical algorithms. When these techniques are not enough, the problem calls for HPC, with a good number of CPUs and a large amount of memory. The main workload in the migration to HPC systems is the parallelization of the code, optimizing machine usage as well as memory treatment depending on the architecture of the particular machine.

Title: Novel HPC technologies for rapid analysis in bioinformatics


  • Paul Walsh, Nsilico, Ireland

NSilico is an Irish-based SME that develops software for the life sciences sector, providing bioinformatics and medical informatics systems to a range of clients. One of the major challenges their users face is the exponential growth of high-throughput genomic sequence data and the associated computational demands of processing such data quickly and efficiently. Genomic sequences contain gigabytes of nucleotide data that require detailed comparison with similar sequences in order to determine the nature of functional, structural and evolutionary relationships. In this regard NSilico has been working with computational experts from CINES (France) and ICHEC (Ireland) under the PRACE SHAPE programme to address a key problem: the rapid alignment of short DNA sequences to reference genomes, by deploying the Smith-Waterman algorithm on an emerging many-core technology, the Intel Xeon Phi coprocessor. This presentation will give an overview of the technical challenges overcome during this project, the performance achievements and their implications, as well as our immensely positive experience of working with PRACE within this successful collaboration.
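The project's Xeon Phi port is a vectorized native implementation; as a reference for what the Smith-Waterman algorithm computes, here is a minimal scalar sketch of the local-alignment scoring recursion (scoring parameters are illustrative):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (Smith-Waterman DP)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # match/mismatch
                          H[i - 1][j] + gap,    # deletion
                          H[i][j - 1] + gap)    # insertion
            best = max(best, H[i][j])
    return best
```

The inner cell update is where the Xeon Phi pays off: anti-diagonals of H are independent and map naturally onto wide SIMD units and many cores.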


Title: HPC application to improve the comprehension of ballistic impacts behaviour on composite materials


  • Paolo Cavallo, AMET


  • Claudio Arlandini, CINECA

The damage phenomenon occurring on composite materials when subjected to a ballistic impact is a complex problem.

Therefore, understanding the influence of the parameters describing the material behavior is not a straightforward task; moreover, because these influences are mutually connected, designing a new structure with improved resistance to ballistic impacts is very hard. Only a massive use of DOE analyses, supported by suitable computing resources, can lead to a better understanding of the problem and to the identification of the parameters that most influence the physical phenomenon.

We present an overview of the methodology used in this research together with the first results obtained, and their relevance in the context of composite materials industrial manufacturing.


Title: PRACE SHAPE Project: OPTIMA pharma GmbH


  • Ralph Eisenschmid, OPTIMA pharma GmbH


  • Bärbel Große-Wöhrmann, HLRS

OPTIMA pharma produces and develops filling and packaging machines for pharmaceutical products. Sterile filling lines are enclosed in clean rooms, and detailed, reliable knowledge of the airflow inside the clean rooms would enhance the design of the filling machines and support the CAE work. The goal of this project is to simulate the airflow with OpenFOAM while meeting the requirements of industrial production.

We looked for the best strategy for the generation of very large meshes, including domain decomposition and reconstruction, using the standard tools provided by OpenFOAM. Then we tested and compared different turbulence models on large meshes and studied the scalability of the relevant OpenFOAM solvers. Overall, we found a compromise between the required mesh resolution and the feasible mesh size which allows reliable simulations of the airflow in the entire clean room. We found that serial tools like decomposePar become walltime- and memory-critical bottlenecks when performing CFD with OpenFOAM on grids larger than 50 M cells. Results will be presented in the talk.


Title: Testing LES turbulence models in race boat sail with the involvement of Juan Yacht Design


  • Herbert Owen, Barcelona Supercomputing Centre, Spain

Currently, race boat design depends more heavily on CFD modeling of turbulent free-surface flows than on tank and wind tunnel testing. Simulations are cheaper, faster and more reliable than traditional tests for boat design. Enhanced flow visualization and force decomposition provide much richer information than is measurable in tank tests, leading to a much better understanding of the flow phenomena. The early adoption of RANS CFD was a key competitive advantage in the design of America’s Cup and Volvo Ocean Race winning boats. Nowadays commercial RANS CFD codes have become standard practice, and more innovative simulation tools would provide a technological advantage. RANS models work well for most problems, but their accuracy is reduced when there are important regions of separated flow. This happens at the boat sails for certain wind directions. Large eddy simulation (LES) turbulence models are needed for such flows.

In this work, we test LES models implemented in the finite element CFD code Alya for the flow around boat sails in conditions where RANS models fail. Alya uses a Variational Multiscale formulation that can account for the LES modeling relying only on the numerical model. Alternatively, eddy-viscosity models such as the WALE model can be used. The results obtained with these models will be compared to results obtained with RANS on the same mesh, to give the company JYD a better idea of the advantages this new technology could bring to their work and the feasibility of incorporating it into their available tools.

Plenary Session (3)

This session took place Thursday 22 May 2014 – 09:00 to 12:30.

Title: Observing the bacterial membrane through molecular modeling and simulation


  • Matteo Dal Peraro, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL)

The physical and chemical characterization of biological membranes is of fundamental importance for understanding the functional role of lipid bilayers in shaping cells and organelles, steering vesicle trafficking and promoting cellular signaling. In bacteria this cellular envelope is highly complex, providing a robust barrier to permeation and mechanical stress and an active defense against external attack. With the constant emergence of drug-resistant strains posing a serious threat to global health, understanding the fine molecular details of the bacterial cell wall is of crucial importance to aid the development of innovative and more efficient antimicrobial drugs. In this context, molecular modeling and simulation stand as powerful resources to probe the properties of membranes at the atomistic level. In this talk I will present the efforts of my laboratory (i) to create better models of bacterial membrane constituents, (ii) to develop efficient tools for assembling realistic bacterial membrane systems, and (iii) to investigate their interactions with signaling protein complexes and antimicrobial peptides, exploiting the computational power of current HPC resources.


Title: Observations on the evolution of HPC for Science and Industry


  • Paul Messina, Argonne Leadership Computing Facility (ALCF) of Argonne National Laboratory

Scientific computing has advanced dramatically during the last four decades, despite several upheavals in computer architectures. The evolution of high-end computers in the next decade will again pose challenges as well as opportunities. The good news is that many applications are able to utilize today’s massive levels of parallelism, as will be shown by presenting a sampling of varied scientific, engineering, and industrial applications that are using high-end systems at the Argonne Leadership Computing Facility and other centers.

As we look towards the use of exascale computers, availability of application software and building blocks is as always a key factor. This is especially the case for industrial users but is also true for many academic and research laboratory users. Support is needed to enable the transition of widely used codes, programming frameworks, and libraries to new platforms and evolution of capabilities to support the increased complexity of the applications that are enabled by the more powerful systems.

Providing access to state-of-the-art systems, and training on their use, to interested industrial and academic researchers is an effective approach and should be used more widely. Training is also an important factor in enabling the productive use of HPC. Few university courses teach scientists and engineers how to use leading-edge HPC platforms effectively, apply software engineering practices, build and maintain community codes, find high-quality software tools and building blocks, and work in teams; yet all those skills are necessary in the use of HPC.

Finally, close involvement of applications experts in guiding the design of future hardware and software, supplemented by funding to address development of key technologies and features, has proven to be effective and will be needed more than ever in the exascale era and beyond.


PRACEdays14 Posters

The following posters were presented at the PRACEdays14 Poster Session.

Poster Title: Simulating an Electrodialysis Desalination Process with HPC

Poster Authors

  • Kannan Masilamani, Siemens AG, Corporate Technology, Erlangen, Germany; Simulation Techniques and Scientific Computing, University of Siegen, Germany
  • J. Zudrop, Simulation Techniques and Scientific Computing, University of Siegen, Germany
  • M. Johannink, Aachener Verfahrenstechnik – Process Systems Engineering, RWTH Aachen University Germany
  • H. Klimach, Simulation Techniques and Scientific Computing, University of Siegen, Germany
  • S. Roller, Simulation Techniques and Scientific Computing, University of Siegen, Germany

Electrodialysis can be used for efficient seawater desalination. For this, an electric field is used in combination with selective membranes to separate salt ions from the seawater. The membranes are kept apart by a complex spacer structure. Within the spacer-filled flow channel, the process involves the transport of ions and of the bulk mixture. A multi-species Lattice Boltzmann Method (LBM) for liquid mixtures is implemented in our highly scalable simulation framework, the Adaptable Poly-Engineering Simulator (APES), and deployed on High Performance Computing (HPC) systems to gain insight into this complex process.
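The multi-species liquid-mixture LBM of APES is far more involved; as a generic sketch of the LBM building blocks it shares, here is a single-species D2Q9 scheme with BGK collision and periodic boundaries (not the project's model; names are illustrative):

```python
import numpy as np

# D2Q9 lattice: discrete velocities and their weights
C = np.array([(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

def equilibrium(rho, ux, uy):
    """BGK equilibrium distributions for density rho and velocity (ux, uy)."""
    cu = np.einsum('qd,dxy->qxy', C, np.array([ux, uy]))
    usq = ux ** 2 + uy ** 2
    return W[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu ** 2 - 1.5 * usq)

def lbm_step(f, tau=0.8):
    """One streaming + BGK collision step with periodic boundaries."""
    # streaming: shift each population along its lattice velocity
    for q, (cx, cy) in enumerate(C):
        f[q] = np.roll(np.roll(f[q], cx, axis=0), cy, axis=1)
    # macroscopic moments (mass and momentum) from the populations
    rho = f.sum(axis=0)
    ux = np.einsum('q,qxy->xy', C[:, 0].astype(float), f) / rho
    uy = np.einsum('q,qxy->xy', C[:, 1].astype(float), f) / rho
    # BGK relaxation towards local equilibrium
    f += (equilibrium(rho, ux, uy) - f) / tau
    return f
```

Because streaming and collision are purely local per lattice site, the method parallelizes naturally over domain partitions, which is what makes LBM attractive for large HPC runs like the half-billion-element simulations mentioned below.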

For relevant results, it is necessary to simulate the full device at laboratory and industrial scales, which results in simulations with half a billion elements. A performance analysis of the method was done on the Cray XE6 system Hermit at HLRS, Stuttgart.


Poster Title: Perturbation-Response Scanning method reveals hot residues responsible for conformational transitions of human serum transferrin protein

Poster Authors

  • Haleh Abdizadeh, Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Istanbul
  • Ali Rana Atilgan, Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Istanbul
  • Canan Atilgan, Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Istanbul

Proteins usually undergo conformational changes between structurally different forms to fulfill their functions.

The large-scale allosteric conformational transitions are believed to involve key residues that mediate the conformational movements between different regions of the protein. In the present work, we have employed the Perturbation-Response Scanning (PRS) method, based on linear response theory, to predict the key residues involved in protein conformational transitions. The key functional sites are identified as the residues whose perturbation most strongly influences the conformational transition between the initial and target conformations. Ten different states of the human serum transferrin (hTF) protein in apo, holo and partially open forms under different initial conditions have been used as case studies to identify critical residues responsible for the closed, partially open and open transitions. The results show that the functionally important residues are mainly confined to highly specific regions. Interestingly, we observe a rich mixture of both conservation and variability within the identified sites. In addition, perturbation directionality is an important factor in recovering the conformational change, implying that highly selective binding must occur near these sites to invoke the necessary conformational change.
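In PRS, the linear response of the structure to a force F applied at one residue is predicted from the covariance matrix C as ΔR = C·F, and residues are ranked by how well their response overlaps the observed conformational change. A schematic sketch with synthetic data (function and variable names are ours, not the authors'):

```python
import numpy as np

def prs_scan(cov, dr_target, n_forces=30, rng=None):
    """Perturbation-Response Scanning sketch: apply random forces at each
    residue and score how well the linear response dR = C F overlaps the
    observed conformational change dr_target (both flattened 3N vectors)."""
    rng = np.random.default_rng() if rng is None else rng
    n_res = cov.shape[0] // 3
    scores = np.zeros(n_res)
    unit_target = dr_target / np.linalg.norm(dr_target)
    for i in range(n_res):
        best = 0.0
        for _ in range(n_forces):
            F = np.zeros(3 * n_res)
            F[3 * i:3 * i + 3] = rng.standard_normal(3)  # random force on residue i
            dR = cov @ F                                 # linear response
            overlap = abs(dR @ unit_target) / np.linalg.norm(dR)
            best = max(best, overlap)
        scores[i] = best
    return scores  # residues scoring near 1 are candidate "hot" residues
```

In a real application C comes from an elastic network model or MD covariance of the initial conformation, and dr_target from the difference between the initial and target structures.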

Moreover, our extensive Molecular Dynamics (MD) simulations of holo hTF at physiological and endosomal pH are in remarkable agreement with experimental observations. Our results indicate domain motions in the N-lobe as well as domain rigidity in the C-lobe at physiological pH. However, the C-lobe exhibits more flexible dynamics at low pH, achieved as a result of the protonation of pKa-upshifted residues. This flexibility in turn leads to the selective release of iron within this cellular compartment.


Poster Title: Old-fashioned CPU optimisation of a fluid simulation for investigating turbophoresis

Poster Authors

  • John Donners, SURFsara
  • Hans Kuerten, TU/e

Turbulent flows with embedded particles occur frequently in the environment and in industry. These flows have richer physics than flow of a single-phase fluid and new numerical simulation techniques have been developed in recent years. One of the main interests of this research is turbophoresis, the tendency of particles to migrate in the direction of decreasing turbulence. This principle tends to segregate particles in a turbulent flow toward the wall region and is expected to increase the deposition rate onto a surface. High-resolution simulations with a spectral model are used to correctly predict the particle equation of motion in models that do not resolve all turbulent scales.

Long integrations are required to reach statistical equilibrium of the higher-order moments of the particle velocities. Most of the runtime of the spectral model is taken up by Fourier transforms and collective communications. To reach the required performance, the MPI-only parallelization scheme was extended with the use of MPI datatypes, multi-threaded FFTW calls and OpenMP parallelization. To maximize efficiency, MPI communication and multi-threaded FFTW calls are overlapped: the master thread is used to complete the blocking collective communication, while computations are split across the other threads. To accomplish this overlap, the communication and computation of multiple variables are interleaved. When no communication is required, computations are split across all threads. The core count was increased by a factor of 5.2, while the total runtime could be reduced by a factor of 6.7.
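The overlap pattern described above can be sketched with plain threads: one "master" thread blocks in a collective while worker threads transform the variables that are already local. The snippet below is a pure-Python stand-in (a sleep simulates the blocking MPI collective, NumPy FFTs stand in for FFTW), not the actual Fortran/MPI code.

```python
import threading
import time
import numpy as np

# Master thread completes a (simulated) blocking collective while a
# worker thread runs FFTs on locally available variables concurrently.
def fake_collective(buf, results):
    time.sleep(0.05)           # stands in for a blocking MPI collective
    results["comm"] = buf * 2  # pretend the exchange transformed the data

def fft_work(arrays, results):
    results["fft"] = [np.fft.fft(a) for a in arrays]

data_to_exchange = np.ones(4)
local_arrays = [np.random.default_rng(i).standard_normal(64) for i in range(3)]
results = {}

comm = threading.Thread(target=fake_collective, args=(data_to_exchange, results))
work = threading.Thread(target=fft_work, args=(local_arrays, results))
comm.start(); work.start()
comm.join(); work.join()

print(len(results["fft"]))  # 3
```

In the real code the same idea applies with `MPI_THREAD_FUNNELED`: only the master thread calls MPI, so the collective and the multi-threaded FFTW calls proceed simultaneously.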

Faster simulations allow for a tighter loop of hypothesis building and testing, which results in faster scientific discovery. The parallelization techniques presented here require only relatively small modifications to the code, without introducing revolutionary new paradigms for accelerators. This keeps the focus of the scientist on the generation of knowledge.


Poster Title: Toward next stage of design method of polymer nano-composites by X-ray scattering analysis and large-scale simulations on supercomputers

Poster Authors

  • Katsumi Hagita, National Defense Academy of Japan

Polymer Nano-Composites (PNC), e.g. polymer films and tire rubber, are widely used in everyday life. The geometry of the nano-fillers plays an important role in tuning their function. Recent nano-science and technology allow molecular-level control of synthesis, such as various polymer branchings, modification of polymer chain ends and grafting to a substrate or a nano-particle, as well as observation of structures from the nanometer to the sub-micrometer scale. With the benefit of recent progress in massively parallel supercomputing, virtual experiments to study the effects of polymer architecture and nano-particle morphology can be performed for basic science on current top supercomputers, and will become possible for the R&D of industrial products on future top supercomputers. We propose an approach combining X-ray scattering analysis and large-scale simulations of a bead-spring model of PNC. An overview of our simulation model and approach, together with results, is shown in the poster presentation.
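The link between the simulation and the scattering analysis is the static structure factor, which can be computed directly from bead coordinates and compared against X-ray data. A minimal NumPy sketch, using hypothetical random bead positions rather than an actual bead-spring PNC snapshot:

```python
import numpy as np

# Static structure factor S(q) = |sum_j exp(i q.r_j)|^2 / N, evaluated
# from particle positions as one would for a bead-spring PNC snapshot.
rng = np.random.default_rng(1)
N = 200
r = rng.uniform(0.0, 10.0, size=(N, 3))  # toy bead coordinates

def structure_factor(r, q_vecs):
    phases = np.exp(1j * (r @ q_vecs.T))  # (N, nq) complex phases
    return np.abs(phases.sum(axis=0)) ** 2 / len(r)

# Sample S(q) along one axis for ten wave numbers
q = np.array([[q_mag, 0.0, 0.0] for q_mag in np.linspace(0.5, 5.0, 10)])
S = structure_factor(r, q)
print(S.shape)  # (10,)
```

In practice S(q) would be averaged over orientations and snapshots; the comparison of such curves against measured scattering profiles drives the model refinement described above.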

This work is partially supported by JHPCN (Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures) in Japan for the efficient and advanced use of networked supercomputers.

Poster Title: The development of scalable unstructured high-resolution discretisations for Large Eddy Simulation of turbomachinery flows

Poster Authors

  • Koen Hillewaert, Cenaero
  • Corentin Carton de Wiart, Cenaero

To enable designs with more reliable off-design operation, higher overall efficiency and lower environmental nuisance of jet engines, more precise CFD tools will be required to complement the currently available tools.

The industry state of the art in CFD is largely based on statistical turbulence modeling. Scale-resolving approaches, on the other hand, compute (large) turbulent flow structures directly, thereby removing turbulence modeling altogether (Direct Numerical Simulation/DNS) or reducing its scope to the smaller turbulent structures, which are more universal in nature (Large Eddy Simulation/LES). Given that statistical models are limited to the prediction of near-design aerodynamic performance, there is a need for scale-resolving approaches for the prediction of off-design aerodynamic performance, noise generation, combustion and transitional flows.

The stumbling block to industrial use of DNS and LES is the huge computational cost. The detailed representation of turbulent flow structures imposes huge resolution and accuracy requirements, unobtainable with the low-order discretisation methods currently used in industry. High-resolution codes used for the fundamental study of turbulence, on the other hand, are not sufficiently flexible to tackle real industrial geometries, and often do not provide possibilities for adaptive resolution, which could drastically enhance solution reliability. The combination of high performance computing with adaptive unstructured high-resolution codes promises a breakthrough in modeling capabilities.

This talk discusses recent developments of the discontinuous Galerkin method for the large-scale DNS and LES of turbomachinery flows. Due to its elementwise-defined discontinuous interpolation, this method features high accuracy on unstructured meshes, excellent serial and (strong) parallel performance, and high flexibility for adaptive resolution. The main focus of the talk will be the further assessment of the LES models on benchmark test cases, as well as the assessment of the benefits of the local order-adaptation currently pursued in the PRACE project ‘PadDLES’. Furthermore, serial and parallel efficiency optimisation will be discussed.


Poster Title: Accelerating Simulations of Hydrogen Rich Systems by a Factor of 2.5

Poster Authors

  • Himanshu Khandelia, University of Southern Denmark, Denmark

Biological molecules are hydrogen-rich. Fast vibrations of H-bonded atoms and angles limit the time-step in molecular dynamics simulations to 2 fs. We implement a method to improve the performance of all-atom lipid simulations by a factor of 2.5. We extend the virtual sites procedure to POPC lipids, thus permitting a time-step of 5 fs. We test our algorithm on a simple bilayer, on a small peptide in a membrane, and on a large transmembrane protein in a lipid bilayer, the latter requiring the use of HPC on HECToR. Membrane properties are mostly unaffected, and the reorientation of a small peptide in the membrane and the lipid and ion binding of a large membrane protein are unaffected by the new VS procedure.
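Why hydrogen vibrations cap the time-step can be seen from integrator stability: velocity Verlet is stable only for dt < 2/ω, i.e. about T/π for an oscillation period T. With an H-stretch period of roughly 10 fs (an illustrative number, not the paper's force field), 2 fs is safe while 5 fs diverges, which is why the fast hydrogen degrees of freedom must be removed (here via virtual sites) before a 5 fs step becomes possible.

```python
import numpy as np

# Velocity-Verlet on a single harmonic oscillator with a toy H-stretch
# period of T ~ 10 fs; stability requires omega * dt < 2.
T = 10.0                       # fs
omega = 2.0 * np.pi / T

def max_amplitude(dt, steps=200):
    x, v = 1.0, 0.0
    a = -omega**2 * x
    peak = abs(x)
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = -omega**2 * x
        v += 0.5 * (a + a_new) * dt
        a = a_new
        peak = max(peak, abs(x))
        if peak > 1e6:         # unstable: stop before overflow
            break
    return peak

print(max_amplitude(2.0) < 2.0)   # True: bounded, stable at 2 fs
print(max_amplitude(5.0) > 1e3)   # True: divergent at 5 fs
```

With the hydrogens replaced by virtual sites, the fastest remaining modes are slower, and the same stability criterion admits the 5 fs step used in the poster.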

The procedure is compatible with the previously implemented virtual sites method for proteins, thus allowing for VS simulations of protein-lipid complexes.

Currently, the method has been implemented for the CHARMM36 force field, and is applicable to other lipids, proteins and force fields, thus potentially accelerating molecular simulations of all lipid-containing biological complexes.


Poster Title: Self-consistent charge carrier mobility calculation in organic semiconductors with explicit polaron treatment

Poster Authors

  • Pascal Friederich, Karlsruhe Institute of Technology (KIT), Germany
  • Ivan Kondov, Karlsruhe Institute of Technology (KIT), Germany
  • Velimir Meded, Karlsruhe Institute of Technology (KIT), Germany
  • Tobias Neumann, Karlsruhe Institute of Technology (KIT), Germany
  • Franz Symalla, Karlsruhe Institute of Technology (KIT), Germany
  • Angela Poschlad, Karlsruhe Institute of Technology (KIT), Germany
  • Andrew Emerson, SuperComputing Applications and Innovation Dept, Cineca, Italy
  • Vadim Rodin, Sony Deutschland GmbH, Stuttgart Technology Center, Germany
  • Florian von Wrochem, Sony Deutschland GmbH, Stuttgart Technology Center, Germany
  • Wolfgang Wenzel, Karlsruhe Institute of Technology (KIT), Germany

Whole-device simulation of organic electronics is important for improving device performance. We present a multi-step simulation of electronic processes in organic light-emitting diodes (OLEDs) achieved by multi-scale modelling, i.e. by integrating different simulation techniques covering multiple length scales. A typical model with 3000 molecules consists of about 1000 pairs of charge hopping sites in the core region, which contains about 100 electrostatically interacting molecules. The energy levels of each site depend on the local electrostatic environment, yielding a significant contribution to the energy disorder. This effect is explicitly taken into account in the quantum mechanics sub-model in a self-consistent manner, which represents, however, a considerable computational challenge: the total number of computationally expensive density functional theory (DFT) calculations needed is very high (about 10^5). Each of these calculations is parallelized using the MPI library and scales up to 1024 Blue Gene/Q cores for small organic molecules of about 50-100 atoms. Next, data are exchanged between all contained molecules at each iteration of the self-consistency loop to update the electrostatic environment of each site. This requires that the quantum mechanics sub-model is executed on a high-performance computing system employing a special scheduling strategy for a second-level parallelisation of the model. In this study we use this procedure to investigate charge transport in thin films based on the experimentally known electron-conducting small molecule Alq3, but the same model can be applied to, for example, two-component organic guest/host systems.
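The fixed-point structure of such a self-consistency loop can be shown with a cheap stand-in: each site's energy is shifted by an "environment" term that depends on its neighbours' energies, and the loop iterates until nothing changes. In the real workflow each iteration requires DFT calculations; here a hypothetical analytic coupling replaces them purely for illustration.

```python
import numpy as np

# Toy self-consistent loop over hopping-site energies.  The coupling
# matrix and the tanh "environment response" are invented stand-ins for
# the electrostatic shifts computed with DFT in the actual model.
rng = np.random.default_rng(2)
n_sites = 50
E0 = rng.normal(0.0, 0.1, n_sites)          # bare site energies (eV)
idx = np.arange(n_sites)
coupling = 0.05 / (1.0 + np.abs(np.subtract.outer(idx, idx)))
np.fill_diagonal(coupling, 0.0)             # no self-interaction

E = E0.copy()
for iteration in range(200):
    # environment shift: each site polarised by its neighbours' energies
    E_new = E0 + coupling @ np.tanh(E)
    if np.max(np.abs(E_new - E)) < 1e-10:   # self-consistency reached
        break
    E = E_new

print(iteration < 200)  # True: the loop converges
```

Because the coupling here is a contraction, the iteration converges quickly; in the OLED model each such sweep costs thousands of DFT runs, which is what motivates the second-level parallelisation described above.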

Poster Title: CFD Simulations by Open Source Software

Poster Authors

  • Tomas Kozubek, National supercomputing center IT4Innovations, VSB – TU Ostrava, Czech Republic
  • Tomas Brzobohaty, National supercomputing center IT4Innovations, VSB – TU Ostrava, Czech Republic
  • Tomas Karasek, National supercomputing center IT4Innovations, VSB – TU Ostrava, Czech Republic

Demand from end users, who in many cases need to solve very complex problems, is and always has been the driving force for the development of new, efficient algorithms. This is even more apparent in the era of supercomputers. Nowadays, high performance computers give their users computational power that was unimaginable a few years ago, and the demand for algorithms able to tame and utilize this power has lately been driving the parallelization of existing algorithms and the development of new parallel ones.

This poster presents examples of engineering problems such as external aerodynamics, urban flow and thermodynamics solved on a High Performance Computing (HPC) platform. To obtain high-fidelity results, numerical models consisting of meshes with a huge number of cells have to be created, and as a consequence a large number of equations has to be solved to obtain the final solution. To do so in acceptable time, the supercomputer Anselm at the National Supercomputing Center IT4Innovations, Czech Republic, was employed. To emphasize the advantage of supercomputers in terms of computational time, scalability results for all cases are presented on the poster as well.

The deployment of open source codes on HPC systems, together with the development of new algorithms for solving large numbers of equations, will enable researchers and engineers to solve even more challenging problems in many areas and industries such as aerospace, automotive, biomechanics or urban flow.


Poster Title: GPGPU based Lanczos algorithm for large symmetric eigenvalue problems

Poster Authors

  • Vishal Mehta, Trinity College Dublin, Ireland

Eigenvalue problems are at the heart of many science and engineering applications. However, they are computationally expensive, especially when the eigenvalue systems are very large. Techniques such as power iteration, Arnoldi's algorithm and the Lanczos procedure are available when only a few of the largest or smallest eigenvalues are required.
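Of the techniques listed, power iteration is the simplest and shows why a few extremal eigenvalues are much cheaper than the full spectrum. A minimal NumPy sketch on a toy dense matrix with a known spectrum (the poster's Lanczos code targets large sparse systems on GPUs):

```python
import numpy as np

# Power iteration: repeatedly applying A and normalising converges to
# the eigenvector of the dominant eigenvalue.
rng = np.random.default_rng(0)
spectrum = np.concatenate([np.linspace(1.0, 5.0, 99), [10.0]])
Q, _ = np.linalg.qr(rng.standard_normal((100, 100)))  # random orthogonal basis
A = Q @ np.diag(spectrum) @ Q.T                       # symmetric, lambda_max = 10

v = rng.standard_normal(100)
for _ in range(100):
    v = A @ v
    v /= np.linalg.norm(v)
lam = v @ A @ v            # Rayleigh-quotient estimate of lambda_max

print(round(lam, 6))  # 10.0
```

Lanczos generalises this idea by keeping the whole Krylov subspace spanned by successive products, which yields several extremal eigenvalues at once instead of only the dominant one.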

The use of GPGPUs for these computations is challenging. The CUDA computing model and PTX assembly from Nvidia provide a flexible environment for a programmer to push the hardware to its limits.

An implicitly restarted Lanczos method has been developed for an NVIDIA GPU, providing a notable speed-up over a standard shared-memory OpenMP implementation. Its salient features include Householder transformations for the QR decomposition and Sturm sequence techniques for the eigenvalues of the symmetric tridiagonal matrix. The memory levels, such as shared memory, caches and registers, have been used efficiently, along with highly efficient PTX assembly. The PTX assembly optimization includes reducing the number of registers in use by managing the assembly instructions responsible for spurious shared-memory initializations and spurious movement of values between registers.
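The Sturm sequence technique mentioned above is easy to demonstrate: for a symmetric tridiagonal matrix with diagonal d and off-diagonal e, the number of negative pivots in the shifted LDLᵀ recurrence equals the number of eigenvalues below the shift, which is the basis for bisection. A small CPU-side sketch (the poster implements this on the GPU):

```python
import numpy as np

# Count eigenvalues of tridiag(d, e) below shift x via Sturm/LDL^T pivots.
def count_eigs_below(d, e, x):
    count = 0
    q = d[0] - x
    if q < 0:
        count += 1
    for i in range(1, len(d)):
        q = d[i] - x - e[i - 1] ** 2 / (q if q != 0 else 1e-300)
        if q < 0:
            count += 1
    return count

# Toy tridiagonal matrix: 2 on the diagonal, -1 off-diagonal
n = 10
d = np.full(n, 2.0)
e = np.full(n - 1, -1.0)
exact = np.linalg.eigvalsh(np.diag(d) + np.diag(e, 1) + np.diag(e, -1))

x = 1.9
print(count_eigs_below(d, e, x), int(np.sum(exact < x)))  # 5 5
```

Repeating the count while bisecting on x isolates each eigenvalue to arbitrary precision without ever forming the full matrix, which maps well onto many independent GPU threads.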


Poster Title: Car body design in crash: A new optimization challenge

Poster Authors

  • Marc Pariente, Renault SAS
  • Thuy Vuong, Yves Tourbier, Jean Christophe Allain, IRT System X; ESI-Group

The presentation will focus on the results of a PRACE HPC project initiated in March 2013 and completed in March 2014. The purpose of the project is the optimal design of a vehicle body to reach safety objectives, with means and targets representative of what automakers will use in the next 3-5 years. The project consists of two complementary phases:

  • The development of a crash numerical model integrating a more precise representation of the physics than current models (about 20 MFE, calculated on 1024 cores within 24 hrs)
  • The use of this model in a design study using optimization techniques in large dimension (about 100 parameters), representative of the combinatorial aspects of industrial problems, such as the re-use of existing parts in the design of a new vehicle

An application of model reduction techniques in crash will help to draw conclusions on the prospects for large-scale optimization problems with heavy numerical simulations.


Poster Title: Harnessing Performance Variability for HPC Applications

Poster Authors

  • Antonio Portero, IT4Innovations National Supercomputer Center, Czech Republic

The overall goal of the HARPA project is to provide HPC-oriented architectures with efficient mechanisms to offer performance-dependability guarantees in the presence of unreliable, time-dependent variations and aging throughout the lifetime of the system. This will be done by utilizing both proactive (in the absence of hard failures) and reactive (in the presence of hard failures) techniques. The term “performance-dependability guarantee” refers to time-criticality (i.e., meeting deadlines) and, in the case of HPC, a predefined bound on the performance deviation from the nominal specifications. The promise is to achieve this reliability guarantee with a reasonable energy overhead (e.g. less than 10% on average). A significant improvement is hence achieved compared to the state of the art, which currently provides guarantees at the cost of at least 50% overhead. In addition, we will provide better flexibility in platform design while still achieving power savings of at least 20%. To the best of our knowledge, this is the first project to attempt a holistic approach to providing dependable performance guarantees in HPC systems. This is done while taking into account various non-functional factors, such as timing, reliability, power, and aging effects.

The HARPA project aims to address several scientific challenges in this direction:

    • Shaving margins. Similar to the circuit-level Razor technique, but with different techniques at the microarchitecture and middleware levels, our aim is to introduce margin-shaving concepts into aspects of a system that are typically over-provisioned for the worst case.
    • A more predictable system with real-time guarantees, where needed. The different monitors, knobs, and the HARPA engine will make the target system more predictable and pro-actively act on performance variability prior to hard failures.
    • Implementation of effective platform monitors and knobs. HARPA will select the appropriate monitors and knobs and their correct implementation to reduce efficiency and performance overheads.

Technical Approach: HARPA Engine Overview
The figure shows the main concepts of the HARPA architecture and the main components of an architecture that can provide performance-dependability guarantees. The main elements that distinguish a HARPA-enabled system are: (i) monitors and knobs, (ii) user requirements and (iii) the HARPA engine. The HARPA engine actuates the knobs to bias the execution flow as desired, based on the state of the system and the performance (timing/throughput) requirements of the application.

The concepts to be developed within the HARPA context address HPC. More specifically, from the HPC domain we will use a disaster and flood management simulation.

Web page: www.harpa-project.eu


Poster Title: Engineering simulations at CSUC

Poster Authors

  • Pere Puigdomènech, Consorci de Serveis Universitaris de Catalunya (CSUC)
  • David Tur, Consorci de Serveis Universitaris de Catalunya (CSUC)
  • Alfred Gil, Consorci de Serveis Universitaris de Catalunya (CSUC)
  • Cristian Gomollon, Consorci de Serveis Universitaris de Catalunya (CSUC)

The Consorci de Serveis Universitaris de Catalunya (CSUC) shares academic, scientific, library, knowledge-transfer and management services among its associated entities to improve effectiveness and efficiency by enhancing synergies and economies of scale. The center provides services to public and private universities, research centers and institutes, offering a wide range of services such as supercomputing, communications, advanced communications, library resources, digital repositories, e-administration and shared services.

The HPC & applications area of CSUC offers its knowledge to academic and industrial users, providing technical and scientific support so that they can obtain the maximum benefit from the use of the HPC systems.

The poster will present benchmark results of the most-used industrial codes, showing performance behaviour in real cases from:

  • Ansys FLUENT 14: Truck_111m: flow around a truck body (DES, 111e6 elements) and Donaldson LES (LES, 20e6 elements)
  • Pamcrash 2012: Barrier: entire car crash model (3e6 elements)
  • ABAQUS 6.12 Explicit and Implicit: cylinder head-block linear elastic analysis (5e6 elements) and wave propagation (10e6 elements)
  • STAR-CCM+ 7.02: aeroacoustic model (60e6 elements)
  • OpenFOAM 2.0.0: motorbike fluid dynamics (RANS, 70e6 elements)


Poster Title: Solving Large non-Symmetric Eigenvalue problems using GPUs

Poster Authors

  • Teemu Rantalaiho, Department of Physics and Helsinki Institute of Physics, University of Helsinki, Finland
  • David J. Weir, Department of Physics and Helsinki Institute of Physics, University of Helsinki, Finland
  • Joni M. Suorsa, Department of Physics and Helsinki Institute of Physics, University of Helsinki, Finland

We present an implementation of the Implicitly Restarted Arnoldi Method (IRAM) with deflation, optimized for CUDA-capable graphics processing units. The resulting code has been published online and is free to use, with two levels of APIs that can be tailored to many needs. IRAM is a Krylov subspace method that can be used to extract part of the eigenvalue/eigenvector spectrum of a large non-symmetric (non-Hermitian) matrix. Our use case was the extraction of the low-lying eigenvalue distribution of the Wilson-Dirac operator in the context of Lattice QCD; the large amount of computation needed for a single calculation, combined with our already CUDA-capable QCD code, warranted a custom solution for IRAM. Our approach followed the strategy of our QCD code, where the abstraction of parallel algorithms allows us to decouple the actual scientific code from the underlying hardware. This way one can run the same code on both CPUs and GPUs, greatly reducing development time, which is one of the key performance metrics in production codes.

Benchmarks on a single Tesla K20m GPU (ECC on, 175 GB/s memory bandwidth) show that our algorithm runs about 18.5 times faster than ARPACK++ on a single core of a Xeon X5650 @ 2.67 GHz (32 GB/s) for a sparse (QCD) matrix of size 786432, with about 6 percent of the time spent in matrix-vector multiplies (on the GPU). On this use case the GPU code achieved 146 GB/s, which is 83 percent of the theoretical peak memory bandwidth. Our code supports multiple GPUs through MPI and scales well as long as there is enough work to fill the GPUs.


Poster Title: High Performance Computing aspects of acoustic simulations of an air-intake system in OpenFOAM

Poster Authors

  • Jan Schmalz, University of Duisburg-Essen, Chair of Mechanics and Robotics, Duisburg, Germany
  • Wojciech Kowalczyk, University of Duisburg-Essen, Chair of Mechanics and Robotics, Duisburg, Germany

Air-intake systems of combustion engines emit sound mainly due to turbulence, but the acoustic parameters and the sound emission are often not considered until a prototype exists. Unfortunately, changes of concept are hardly feasible at that stage of the development process. Numerical methods, such as finite volume methods for computational fluid dynamics, applied to virtual prototypes are helpful tools during the early stages of product development. Concerning the acoustic behavior, commonly used computational fluid dynamics methods are extended to compute, e.g., the sound pressure level in the far field at a specific observer point. The resulting data are comparable to the results of common acoustic measurements.
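The far-field quantity compared against measurements is typically the sound pressure level, SPL = 20·log10(p_rms / p_ref) with p_ref = 20 µPa in air. The sketch below uses a hypothetical 440 Hz tone as the observer-point pressure signal; in the actual workflow p(t) would come from evaluating the acoustic analogy at the observer point.

```python
import numpy as np

# Sound pressure level from a pressure time series at an observer point.
p_ref = 20e-6                               # Pa, standard hearing-threshold reference
t = np.linspace(0.0, 1.0, 48000, endpoint=False)
p = 0.2 * np.sin(2 * np.pi * 440.0 * t)     # hypothetical 0.2 Pa, 440 Hz tone

p_rms = np.sqrt(np.mean(p ** 2))            # RMS pressure
spl = 20.0 * np.log10(p_rms / p_ref)        # dB re 20 uPa
print(round(spl, 1))  # 77.0
```

This is the post-processing step that makes the simulated far-field signal directly comparable to microphone measurements.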

In this paper the open source computational fluid dynamics framework OpenFOAM is used to solve the complex fluid dynamics of an air-intake system of a combustion engine. Due to the numerical approach used, it also has in principle the functionality to solve aeroacoustic problems. A computational aeroacoustics (CAA) approach based on acoustic analogies is implemented in OpenFOAM 2.1.1. This novel approach is mainly based on Curle's acoustic analogy, where the surfaces within the computational domain are rigid and stationary. The CAA approach is added to the originally distributed transient incompressible and compressible application solvers, pisoFoam and rhoPimpleFoam respectively, which are both already parallelized and able to run on several compute cores.

The presented method takes into account the possibility and availability of high performance computing resources. It provides the advantage of computing the flow fields, acoustic sources and the corresponding sound propagation in an extended near field on a single mesh, which can be done during the first phases of product development. The specific behavior of the parallel computation of acoustic fields in an HPC environment will be discussed by means of the aforementioned computing case of an air-intake system.


Poster Title: Linear Algebra Library for Heterogeneous Computing in Scientific Discovery

Poster Authors

  • Thomas Soddemann, Fraunhofer SCAI, Germany

Current hardware configurations are evolving into highly heterogeneous environments that combine traditional CPU-based systems with accelerator boards. Obtaining good performance on such systems is challenging and implies code adaptation, integration of new components and the use of different libraries.

Application domains from various industrial fields, including aerospace, automotive, engineering and oil & gas exploration, can often be reduced to simulations solving big sparse linear systems of equations, which can be challenging, e.g. due to numerical stability and scalability.

The Library for Accelerated Math Applications (LAMA) addresses both: it supports new and changing hardware systems through efficient backends for various architectures, and accelerates calculations through a wide set of linear solvers. LAMA offers full sparse BLAS functionality with maximum flexibility in hardware and software decisions at the same time. The configuration of the whole LAMA environment can be set up with a Domain Specific Language and can therefore be reconfigured at run time. Solvers, distributions and matrix formats are exchangeable concepts, and users can switch between compute locations, e.g. GPU or Intel® MIC. As new hardware architectures and features are hitting the market at much shorter intervals than ever before, it will be necessary to rely on flexible software technologies to adapt to these changes and to maintain existing methods in time to benefit from them and stay competitive.


Workshop on exascale and PRACE prototypes

Lessons learned from Mont-Blanc, DEEP and PRACE

The “Workshop on exascale and PRACE prototypes” took place on the 19 and 20 May in Barcelona. Almost 60 attendees came together to discuss five different prototypes, alternative cooling technologies and heat re-use.

Understanding the growing relationship among computer architectures, new programming models and new cooling technologies in order to progress towards exascale supercomputing was identified as one of the new issues to be confronted.

Alex Ramirez, coordinator of the Mont-Blanc project and one of the speakers at the workshop, explained that “supercomputing is no longer a matter of assembling shiny and powerful pieces of hardware: it involves several important aspects, from programming techniques up to building constructions for power-efficient computer facilities, crossing very different areas of science, computer science and engineering”. He also pointed out that “having in the same room experts from all over Europe sharing their experiences in several of these fields is a great moment of enrichment for the European supercomputing community”.

The agenda of the workshop was as follows: