45th Forum: What Precision for HPC?

Given the public health situation, the Forum will be held online on 24 and 26 November 2020. Technical details will be announced later.

24 November 2020, 10:00-12:00

Introduction (5″)
E. Audit, Maison de la simulation

Representation of numbers in today’s applications (30″)
David Defour (University of Perpignan)

Abstract Scientific programs rely comfortably on floating-point numbers, thanks to the few experts in the domain who continue to address the potential issues that arise from their use. These issues are often overlooked, for several reasons. First, most major issues have been addressed before they could be blown out of proportion; in other words, incidents like the Pentium bug and the crash of the Ariane 5 rocket have not occurred in a long time, which misleads us into believing that those problems belong to the past. Second, it would seem that developers and software users are not aware of how numbers can affect their daily use of software, and do not see the need to familiarize themselves with the subject.

For many years, to avoid numerical issues, the common rule of thumb among developers of HPC software has been to rely on oversized formats such as double precision. This was acceptable as long as the hardware could sustain adequate performance for such applications and datasets. This was the era of the “free lunch” in numerical resources. It is no longer satisfactory, for two opposite reasons. At the low end, some applications (e.g. AI) obtain significant speedups by using numbers represented in reduced formats. At the high end, the Exascale era, which is just around the corner, means running software on larger problems with longer chains of floating-point operations. In this case, double precision might not be able to contain the resulting growth in numerical errors.

In this talk, we will give an overview of how numerical quantities are handled in both software and hardware, the numerical issues and trade-offs that users and developers face, and the classes of solutions available to you.
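To make the trade-off between formats concrete, here is a minimal sketch (not taken from the talk) that accumulates the same value many times in binary32 and in binary64 and compares each result with a correctly rounded reference. The helper names `to_f32`, `sum_f32`, `sum_f64` and `accumulation_errors` are hypothetical, and binary32 is emulated by rounding through the standard `struct` module rather than by using real single-precision hardware.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f64(values):
    """Naive left-to-right accumulation in binary64 (Python's native float)."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def sum_f32(values):
    """Same loop, but every operand and partial sum is rounded to binary32."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


def accumulation_errors(n: int = 100_000, term: float = 0.1):
    """Return (binary32 error, binary64 error) for summing `term` n times."""
    values = [term] * n
    reference = math.fsum(values)  # correctly rounded sum of the binary64 inputs
    return abs(sum_f32(values) - reference), abs(sum_f64(values) - reference)


if __name__ == "__main__":
    err32, err64 = accumulation_errors()
    print(f"binary32 accumulation error: {err32:.3e}")
    print(f"binary64 accumulation error: {err64:.3e}")
```

The longer the chain of operations, the more the reduced format drifts, which is the tension between the low-end and high-end scenarios described in the abstract.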

Bio David Defour is an associate professor at the University of Perpignan. He has served as scientific coordinator of the regional HPC center MESO@LR for the past 8 years, and his research interests include computer arithmetic and computer architecture. For the past 20 years, he has been developing solutions for “unconventional” arithmetic targeting multicore architectures, and more specifically GPUs.

Fast and Cautious: Numerical Portability, Reproducibility, and Mixed Precision (20″)
Eric Petit (Intel)

Abstract The strong trend driven by AI workloads towards lower-precision datatypes provides a strong motivation for HPC to use sub-64-bit FP arithmetic. It is now safe to assume that FP32 and BF16 are the required formats for neural network training, while inference can work with even fewer bits. The business predicted for AI will require future systems to support these usage scenarios with the highest performance and efficiency. This is poised to result in a shift from mainly FP64-optimized CPUs towards much higher throughput for smaller datatypes such as FP16, BF16 and FP32.
Existing and upcoming throughput-oriented processor features, such as larger vectors, dedicated SIMD extensions, and new GPU-inspired processors for HPC and AI, make the use of lower-precision formats much more attractive, since it translates almost directly into linear gains in performance and energy efficiency and reduces requirements on memory volume and bandwidth. The main challenges to overcome are achieving the prescribed accuracy of results and numerical stability, particularly for implementations of iterative algorithms. Support from analysis and emulation tools will be needed to help identify which parts of complex HPC codes can safely be modified to use lower precision. Indeed, when such changes occur, many industrial HPC codes reveal unknown bugs or are unable to take full advantage of the latest hardware accelerators. With the ECR Lab (CEA-UVSQ-Intel), we propose the Verificarlo framework (https://github.com/verificarlo/verificarlo), which addresses floating-point verification, debugging, and optimization, including mixed-precision usage. It pursues three objectives: reproducibility, portability across HW and SW, and performance optimization.

Bio Dr. Eric Petit joined Intel in 2016 and is now part of the Math Library Pathfinding team in Hillsboro, OR. He received his Ph.D. in 2009 from the University of Rennes, working at INRIA on compiler technology for GPGPU. After 2 years at the University of Perpignan working on computer arithmetic, he spent 6 years as a senior researcher at the University of Versailles Saint-Quentin, where he led a team participating in various EU projects. In 2015, together with Dr. Pablo Oliveira from UVSQ, he launched the Verificarlo project within the ECR lab (CEA-UVSQ-Intel). Dr. Petit’s research interests lie in proposing and leveraging new hardware and software platforms to accelerate numerical computing. His current focus is on computer arithmetic and innovative runtime environments.

Reduced Numerical Precision in Weather and Climate Models (15″)
Peter Dueben (ECMWF)

Abstract In atmosphere models, values of relevant physical parameters are often uncertain by more than 100%, and weather forecast skill decreases significantly after a couple of days. Still, numerical operations are typically calculated with 15 decimal digits of numerical precision for real numbers. If we reduce numerical precision, we can reduce power consumption and increase computational performance significantly. Savings can be reinvested to allow simulations at higher resolution.
We aim to reduce numerical precision to the minimal level that can be justified by the information content in the different components of weather and climate models. But how can we identify the optimal precision for a complex model with chaotic dynamics? We found that a comparison between the impact of rounding errors and the influence of sub-grid-scale variability can provide valuable information, and that the influence of rounding errors can actually be beneficial for simulations since variability is increased. We have performed multiple studies that investigate the use of reduced numerical precision for atmospheric applications of different complexity (from Lorenz’95 to global circulation models) and studied the trade-off between numerical precision and performance.

Bio Peter is the Coordinator of machine learning and AI activities at ECMWF and holds a University Research Fellowship of the Royal Society that allows him to follow his research interests in the areas of numerical weather and climate modelling, machine learning, and high-performance computing. Before moving to ECMWF, he wrote his PhD thesis at the Max-Planck Institute for Meteorology in Hamburg, Germany, on the development of a finite element dynamical core for Earth System models. During his subsequent postdoctoral position with Professor Tim Palmer at the University of Oxford, he focused on the study of reduced numerical precision to speed up simulations of Earth System models.

Scalable polarizable molecular dynamics using Tinker-HP: massively parallel implementations on CPUs and GPUs (15″)
Jean-Philip Piquemal (Laboratoire de Chimie Théorique, Sorbonne Université)

Abstract Tinker-HP is a CPU-based, double-precision, massively parallel package dedicated to long polarizable molecular dynamics simulations and to polarizable QM/MM. Tinker-HP is an evolution of the popular Tinker package (http://tinker-hp.ip2ct.upmc.fr/) that retains its simplicity of use but brings new capabilities, making it possible to perform very long molecular dynamics simulations on modern supercomputers that use thousands of cores. Tinker-HP offers a high-performance, scalable computing environment for polarizable force fields, giving access to large systems of up to millions of atoms. I will present the performance and scalability of the software in the context of the AMOEBA force field and show upcoming features such as the “fully polarizable” QM/MM capabilities. As the present implementation is clearly devoted to petascale applications, the applicability of such an approach to future exascale machines will be presented, and future directions of Tinker-HP will be discussed, including the new GPU-based implementation that uses mixed precision.

Bio Jean-Philip Piquemal is a Distinguished Professor (Professeur de classe exceptionnelle) of theoretical chemistry at Sorbonne Université and Director of the Laboratoire de Chimie Théorique (LCT) of Sorbonne Université (UMR CNRS 7616). He is also a junior member of the IUF. Recently, he was part of the teams awarded the Atos-Joseph Fourier Prize in high-performance computing as well as an ERC Synergy grant for the project Extreme-Scale Mathematics for Computational Chemistry.

26 November 2020, 10:00-12:00

Introduction (5″)
E. Audit, Maison de la simulation

EDF Paul Caseau Prize: Space-Time Parallel Strategies for the Numerical Simulation of Turbulent Flows (10″)
Thibaut Lunet (University of Geneva)

Abstract Unsteady turbulent flow simulations using the Navier-Stokes equations are complex and computationally demanding problems, especially when using Direct Numerical Simulation (DNS) for highly accurate solutions. The development of supercomputer architectures over the last century has made it possible to use massively space-parallel computation to perform DNS of extremely large size (e.g. DNS of Turbulent Channel Flow by Lee and Moser, 2015, with up to 600 billion degrees of freedom). However, the new supercomputer architectures available in the next decade will be characterized by computational power that increases through a larger number of cores rather than through significantly higher CPU frequency (e.g. Summit, the current top supercomputing system, with 2.5 million cores). Hence most current-generation CFD software will face critical efficiency issues if restricted to massive spatial parallelization (O(10^{7-8}) cores).

For six decades, an alternative to exclusive space parallelization has been investigated. It consists in adding a parallel decomposition in the time dimension, known as Parallelization in Time (PinT). It has received renewed attention in the last two decades with the invention of the Parareal algorithm (Lions, Maday and Turinici), and the development of other PinT algorithms has shown that they could be an attractive way to enhance efficiency on multi-core architectures.

In this talk, we introduce the basic ideas of PinT algorithms and present a short state of the art of current solutions with their associated results. Then, we detail the main challenges in applying PinT algorithms to enable space-time parallelization for large-scale DNS of turbulent flows, and illustrate them with some applications. Finally, we conclude with prospects for future developments towards the generalized use of PinT methods within next-generation CFD software.
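As an illustration of the basic idea behind Parareal, the following minimal sketch applies the standard update U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k) to the scalar test equation dy/dt = -y. It is not the speaker's code: the function names, the choice of explicit Euler for both the coarse propagator G and the fine propagator F, and all parameter values are assumptions made for the example, and the fine solves that would run in parallel are executed here in a plain loop.

```python
def euler(y, t0, t1, steps, lam=-1.0):
    """Explicit Euler for dy/dt = lam * y from t0 to t1 using `steps` sub-steps."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        y += h * lam * y
    return y


def slice_bounds(t_end, n_slices, n):
    """Start and end times of time slice n."""
    return t_end * n / n_slices, t_end * (n + 1) / n_slices


def serial_fine(y0, t_end, n_slices, fine_steps=100):
    """Reference: the fine propagator applied sequentially over all slices."""
    ys = [y0]
    for n in range(n_slices):
        t0, t1 = slice_bounds(t_end, n_slices, n)
        ys.append(euler(ys[-1], t0, t1, fine_steps))
    return ys


def parareal(y0, t_end, n_slices, n_iter, coarse_steps=1, fine_steps=100):
    """Return the slice-boundary values after `n_iter` Parareal iterations."""
    bounds = [slice_bounds(t_end, n_slices, n) for n in range(n_slices)]

    def coarse(y, n):
        return euler(y, bounds[n][0], bounds[n][1], coarse_steps)

    def fine(y, n):
        return euler(y, bounds[n][0], bounds[n][1], fine_steps)

    # Iteration 0: cheap sequential sweep with the coarse propagator only.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n], n))

    for _ in range(n_iter):
        # These solves only depend on the previous iterate, so each slice
        # could be handled by a different processor.
        fine_old = [fine(u[n], n) for n in range(n_slices)]
        coarse_old = [coarse(u[n], n) for n in range(n_slices)]
        # Sequential correction sweep (cheap: coarse propagator only).
        u_new = [y0]
        for n in range(n_slices):
            u_new.append(coarse(u_new[n], n) + fine_old[n] - coarse_old[n])
        u = u_new
    return u


if __name__ == "__main__":
    reference = serial_fine(1.0, 2.0, 10)
    for k in (0, 1, 2, 10):
        error = abs(parareal(1.0, 2.0, 10, k)[-1] - reference[-1])
        print(f"iterations={k:2d}  error at final time={error:.3e}")
```

After as many iterations as there are time slices, the result matches the sequential fine solution up to rounding, so the method only pays off when far fewer iterations are enough.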

Bio Thibaut Lunet is a postdoctoral researcher at the University of Geneva, in the team of M. Gander. He received a Ph.D. in Applied Mathematics and Computational Fluid Dynamics after conducting his doctoral research at ISAE-Supaero and CERFACS (Toulouse), supervised by S. Gratton, J. Bodart and X. Vasseur. The thesis, which focuses on the development of space-time parallel strategies for turbulent flow simulation, was awarded the Paul Caseau Prize by EDF and the French Academy of Technology in November 2020.

Numerical Debugging and Optimization of High-Performance Scientific Computing Codes (15″)
François Févotte (TriScale innov)

Abstract The analysis of Floating-Point-related issues in HPC codes is becoming a topic of major interest: parallel computing and code optimization often break the reproducibility of numerical results across machines, compilers and even executions of the same program. At the same time, optimizing the use of FP precision is often key to achieving high performance on modern hardware: using smaller-precision FP numbers reduces memory bandwidth usage and increases the number of simultaneous FP operations performed by a single SIMD instruction. However, it is important to keep in mind that optimizing mixed-precision programs should always be considered as a search for the optimal balance between the accuracy of results and run times. The quantification of FP-related losses of accuracy is an essential part of this process.
This talk presents how the Verrou tool can help during all stages of the FP analysis of HPC codes. Examples involving industrial software such as code_aster illustrate how Verrou’s implementation of stochastic arithmetic can be used to detect, reproduce and quantify FP-related errors (diagnosis) and relate such errors with parts of the analyzed source code (debugging).
Later stages of the process will also be briefly mentioned. For software that is shown to be stable, Verrou can emulate the use of reduced precision, thereby allowing the same techniques and tools to be reused for mixed-precision optimization. Where numerical instabilities are found, algorithmic techniques can be used to mitigate their effect; we show in particular how compensated summation and dot product algorithms can be implemented with minimal loss of performance on modern hardware.
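For readers unfamiliar with compensated summation, here is a minimal sketch of one well-known variant, Neumaier's improvement of Kahan summation. It is a plain scalar illustration and makes no claim about how the talk's implementations are written or optimized; the function name `neumaier_sum` is chosen for this example, and the compensated dot product is not covered.

```python
def neumaier_sum(values):
    """Sum floats while carrying a running compensation for lost low-order bits."""
    total = 0.0
    compensation = 0.0
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            # Low-order bits of x were lost when it was added to total.
            compensation += (total - t) + x
        else:
            # Low-order bits of total were lost instead.
            compensation += (x - t) + total
        total = t
    return total + compensation


if __name__ == "__main__":
    data = [1.0, 1e100, 1.0, -1e100]
    print(sum(data))           # naive left-to-right sum: 0.0
    print(neumaier_sum(data))  # compensated sum: 2.0
```

The compensated version costs a few extra additions per term, and it recovers here the two unit terms that the naive sum absorbs into 1e100.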

Bio François Févotte is co-founder and Chief Scientist of TriScale innov, a start-up dedicated to technical and scientific computing. Prior to that, he graduated in 2008 with a PhD in applied mathematics from the CEA and spent more than ten years with a team dedicated to numerical analysis and modeling at EDF R&D. François’ approach aims at achieving high performance in scientific software by focusing the effort where it matters most. This includes using state-of-the-art numerical techniques in combination with FP-aware algorithms and implementations targeting modern hardware architectures.

The need of precision level in medical physics simulations (15″)
Julien Bert (CHRU Brest – LaTIM)

Abstract Monte Carlo simulations (MCS) play a key role in medical applications, both for imaging and radiotherapy, by accurately modelling the different physical processes and interactions between particles and matter. However, MCS are also associated with long execution times, which is one of the major issues preventing their use in routine clinical practice for both image reconstruction and dosimetry applications. Within this context, we are developing methods to speed up and parallelize MCS, especially using GPUs. Results from MCS need different levels of precision depending on the target application. In addition, precision also depends on the algorithms used inside the MCS core engine to perform physics calculations and particle propagation. This presentation will discuss the different computing precision needs in medical physics applications. We will discuss the advantages and drawbacks of having multiple levels of precision or a single one within the same simulation software.

Bio J. Bert received a Ph.D. in control engineering in 2007, and a Habilitation to conduct research (HDR) in health technologies in 2018. He holds a permanent research scientist position at the Brest Regional University Hospital and is a member of the LaTIM – INSERM UMR1101. His main research interest is image-guided therapy, especially in medical physics. This includes medical applications in external beam and intra-operative radiotherapy as well as in X-ray guided interventional radiology. Within this context, he leads a research group of 20 people, which is involved in several national and European research projects working in multidisciplinary domains: treatment planning systems, Monte Carlo simulation, image processing, robotics, computer vision and virtual reality.

Precision for QCD (Nvidia's QUDA library) (15″)
Mathias Wagner (Nvidia)

Precision auto-tuning and control of accuracy in high performance simulations (20″)
Fabienne Jézéquel (Panthéon-Assas University)

Abstract In the context of high performance computing, new architectures, becoming more and more parallel, offer higher floating-point computing power. Thus, the size of the problems considered (and with it, the number of operations) increases, becoming a possible cause for increased uncertainty. As such, estimating the reliability of a result at a reasonable cost is of major importance for numerical software.

In this talk we present an overview of different approaches for accuracy analysis (guaranteed or probabilistic ones) and the related software. We also describe methods to improve the accuracy of results. We present the principles of Discrete Stochastic Arithmetic (DSA), which enables one to estimate rounding errors in simulation codes. DSA can be used to control the accuracy of programs in half, single, double and/or quadruple precision via the CADNA library, and also in arbitrary precision via the SAM library. Thanks to DSA, accuracy estimation and the detection of numerical instabilities can be performed in parallel codes on CPU and on GPU. Most numerical simulations are performed in double precision, and this can be costly in terms of computing time, memory transfer and energy consumption. We present tools for floating-point auto-tuning that aim at reducing the numerical formats used in simulation programs.
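The idea of estimating rounding errors stochastically can be sketched in a few lines, under strong simplifying assumptions. The code below does not use CADNA or SAM and does not reproduce their API: it crudely imitates random rounding by moving each intermediate result to a neighbouring binary64 value (`math.nextafter`, Python 3.9+), runs the computation three times, and estimates the number of decimal digits shared by the samples with the Student-t formula C = log10(sqrt(N) |mean| / (sigma * tau)) used in the CESTAC method that underlies DSA. All function names are hypothetical.

```python
import math
import random

# Student's t quantile for a 95% two-sided interval with 2 degrees of freedom,
# i.e. the value that goes with N = 3 samples.
STUDENT_T_95_2DOF = 4.303


def perturb(x, rng):
    """Crude stand-in for random rounding: step to the upper or lower neighbour."""
    return math.nextafter(x, math.inf if rng.random() < 0.5 else -math.inf)


def perturbed_sum(values, rng, offset=0.0):
    """Sum `values`, then subtract `offset`, perturbing every intermediate result."""
    acc = 0.0
    for v in values:
        acc = perturb(acc + v, rng)
    return perturb(acc - offset, rng)


def estimated_digits(samples):
    """Estimate how many decimal digits the three samples have in common."""
    if len(samples) != 3:
        raise ValueError("the Student constant above assumes exactly 3 samples")
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    if var == 0.0:
        return 15.95  # samples agree exactly: capped at binary64 precision
    if mean == 0.0:
        return 0.0
    digits = math.log10(math.sqrt(n) * abs(mean) / (math.sqrt(var) * STUDENT_T_95_2DOF))
    return max(0.0, digits)


def demo(seed=0):
    """Compare a benign sum with the same sum followed by a cancellation."""
    rng = random.Random(seed)
    values = [0.1] * 1000
    stable = [perturbed_sum(values, rng) for _ in range(3)]
    cancelling = [perturbed_sum(values, rng, offset=100.0) for _ in range(3)]
    return estimated_digits(stable), estimated_digits(cancelling)


if __name__ == "__main__":
    stable_digits, cancelling_digits = demo()
    print(f"plain sum:         about {stable_digits:.1f} digits")
    print(f"sum minus 100.0:   about {cancelling_digits:.1f} digits")
```

The plain sum keeps most of its digits, whereas subtracting 100.0 leaves a result made almost entirely of accumulated rounding noise, which is the kind of numerical instability such tools are designed to flag.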

Bio Fabienne Jézéquel is Associate Professor in Computer Science at Panthéon-Assas University in Paris, France. She leads the PEQUAN (PErformance and QUality of Algorithms for Numerical applications) team in the Computer Science Laboratory LIP6 of Sorbonne University in Paris. She received a PhD in 1996 and an HDR (Habilitation à Diriger des Recherches) in 2005 from Pierre-and-Marie Curie University in Paris. Her work centers on designing efficient and reliable numerical algorithms for various parallel architectures. She is particularly interested in optimizing convergence criteria of iterative algorithms by taking rounding errors into account.