45th Forum: Which Precisions for HPC?

Thursday, April 2, at the Maison de la Simulation (how to get there)

Due to the Covid-19 epidemic, the Forum is postponed to a later date

The registration form is here.

9:15 – 9:30 Introduction

E. Audit, Maison de la Simulation

9:30 – 10:15 Representation of numbers in today’s applications

David Defour (University of Perpignan)

Abstract

Scientific programs rely comfortably on floating-point numbers, thanks to the few experts in the domain who continue to address the potential issues that arise from their usage. These issues go unconsidered for several reasons. The first is that most major issues have been addressed before being blown out of proportion; in other words, incidents like the Pentium bug and the crash of the Ariane 5 rocket have not occurred in a long time, which misleads us into believing that such problems belong to the past. The second is that developers and software users seem unaware of, and do not see the need to familiarize themselves with, how number representations can affect their daily use of software.

For many years, the common rule of thumb used by developers of HPC software to avoid numerical issues has been to rely on oversized formats such as double precision. This was acceptable as long as the hardware could sustain adequate performance for such applications and datasets; it corresponds to the era of the “free lunch” of numerical resources. It is no longer satisfactory, for two opposite reasons. On the low-end side, some applications (e.g., AI) obtain significant speedups by using numbers represented in reduced formats. On the high-end side, the exascale era just around the corner means software running on larger problems with longer chains of floating-point operations; there, double precision might not be able to absorb the resulting growth in numerical error.
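
To make the trade-off concrete, here is a minimal Python/NumPy sketch (our illustration, not taken from the talk): the textbook one-pass variance formula cancels catastrophically in single precision, while double precision still absorbs the error.

```python
import numpy as np

def var_naive(x, dtype):
    # Textbook one-pass variance E[x^2] - E[x]^2: subtracts two nearly
    # equal large numbers, so it cancels catastrophically in low precision.
    x = x.astype(dtype)
    n = dtype(len(x))
    return float((x * x).sum() / n - (x.sum() / n) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(loc=1e4, scale=1.0, size=100_000)  # large mean, variance ~1

print("reference:", float(x.var()))               # stable two-pass formula
print("float64  :", var_naive(x, np.float64))     # small error remains
print("float32  :", var_naive(x, np.float32))     # cancellation destroys it
```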

In this talk, we will give an overview of how numerical quantities are handled in both software and hardware, of the numerical issues and trade-offs that users and developers face, and of the classes of solutions on offer.

Bio

David Defour is an associate professor at the University of Perpignan. For the past 8 years he has served as scientific coordinator of the regional HPC center MESO@LR, and his research interests include computer arithmetic and computer architecture. For the past 20 years, he has been developing solutions for “unconventional” arithmetics targeting multicore architectures, and more specifically GPUs.

10:15 – 10:45 Precision auto-tuning and control of accuracy in high performance simulations

Fabienne Jézéquel (Panthéon-Assas University)

Abstract

In the context of high performance computing, new architectures, which are becoming more and more parallel, offer higher floating-point computing power. Thus, the size of the problems considered (and with it, the number of operations) increases, becoming a possible cause of increased uncertainty. Estimating the reliability of a result at a reasonable cost is therefore of major importance for numerical software. In this talk we describe the principles of Discrete Stochastic Arithmetic (DSA), which enables one to estimate rounding errors in simulation codes. DSA can be used to control the accuracy of programs in half, single, double and/or quadruple precision via the CADNA library (http://cadna.lip6.fr), and also in arbitrary precision via the SAM library (http://www-pequan.lip6.fr/~jezequel/SAM). Thanks to DSA, accuracy estimation and the detection of numerical instabilities can be performed in parallel codes on CPUs and GPUs. Most numerical simulations are performed in double precision, which can be costly in terms of computing time, memory transfer and energy consumption. We also present the PROMISE tool (PRecision OptiMISE, http://promise.lip6.fr), which aims at reducing the number of double-precision variable declarations in numerical programs in favor of single-precision ones, while taking into account a requested accuracy of the results.
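
To give a flavor of the idea, here is a toy Python sketch of the random rounding underlying DSA (the `rround` helper is our hypothetical stand-in; CADNA actually instruments every floating-point operation and applies Student's test to the samples):

```python
import math
import random
import statistics

def rround(x):
    # Toy random rounding: perturb the result of an operation by one ulp
    # up or down with equal probability (the CESTAC idea behind DSA).
    return x + math.ulp(x) if random.random() < 0.5 else x - math.ulp(x)

def p(x):
    # (x - 1)^7 evaluated from its expanded coefficients via Horner's rule,
    # with random rounding after each operation; severe cancellation at x ~ 1.
    acc = 0.0
    for c in (1.0, -7.0, 21.0, -35.0, 35.0, -21.0, 7.0, -1.0):
        acc = rround(rround(acc * x) + c)
    return acc

samples = [p(1.0001) for _ in range(3)]   # DSA classically uses 3 runs
mean = statistics.fmean(samples)
spread = statistics.stdev(samples)
if spread == 0.0:
    digits = 15.0                         # samples agree to full precision
elif mean == 0.0:
    digits = 0.0
else:
    digits = max(0.0, math.log10(abs(mean) / spread))
print(samples)
print(f"estimated significant digits: {digits:.1f}")
```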

Bio

Fabienne Jézéquel is an Associate Professor in Computer Science at Panthéon-Assas University in Paris, France. She leads the PEQUAN (PErformance and QUality of Algorithms for Numerical applications) team in the computer science laboratory LIP6 of Sorbonne University in Paris. She received a PhD in 1996 and an HDR (Habilitation à Diriger des Recherches) in 2005 from Pierre and Marie Curie University in Paris. Her work centers on designing efficient and reliable numerical algorithms for various parallel architectures. She is particularly interested in optimizing the convergence criteria of iterative algorithms by taking rounding errors into account.

10:45 – 11:15 Coffee break

11:15 – 11:45 Numerical Debugging and Optimization of High-Performance Scientific Computing Codes

François Févotte (TriScale innov)

Abstract

The analysis of floating-point-related issues in HPC codes is becoming a topic of major interest: parallel computing and code optimization often break the reproducibility of numerical results across machines, compilers and even executions of the same program. At the same time, optimizing the use of FP precision is often key to achieving high performance on modern hardware: using smaller-precision FP numbers reduces memory bandwidth usage and increases the number of simultaneous FP operations performed by a single SIMD instruction. However, it is important to keep in mind that optimizing mixed-precision programs should always be treated as a search for the optimal balance between result accuracy and run time. Quantifying FP-related losses of accuracy is an essential part of this process.

This talk presents how the Verrou tool can help during all stages of the FP analysis of HPC codes. Examples involving industrial software such as code_aster illustrate how Verrou’s implementation of stochastic arithmetic can be used to detect, reproduce and quantify FP-related errors (diagnosis) and to relate such errors to parts of the analyzed source code (debugging).

Later stages of the process will also be briefly mentioned. For software that is shown to be stable, Verrou can emulate the use of reduced precision, thereby allowing the same techniques and tools to be reused for mixed-precision optimization. Where numerical instabilities are found, algorithmic techniques can be used to mitigate their effect; we show in particular how compensated summation and dot product algorithms can be implemented with minimal loss of performance on modern hardware.
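
As a foretaste of those mitigation techniques, here is a minimal Python/NumPy sketch of compensated (Kahan) summation (our illustration; the implementations discussed in the talk target C++ and SIMD hardware):

```python
import numpy as np

def kahan_sum(values):
    # Compensated (Kahan) summation: c captures the low-order bits lost by
    # each addition and feeds them back into the next one.
    s = np.float32(0.0)
    c = np.float32(0.0)
    for v in values:
        y = v - c
        t = s + y
        c = (t - s) - y        # rounding error of the addition s + y
        s = t
    return s

rng = np.random.default_rng(42)
x = (rng.standard_normal(50_000) * 1e-3).astype(np.float32)
exact = x.astype(np.float64).sum()

naive = np.float32(0.0)
for v in x:                    # plain sequential float32 accumulation
    naive += v

print("naive float32 error :", abs(float(naive) - exact))
print("kahan float32 error :", abs(float(kahan_sum(x)) - exact))
```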

Bio

François Févotte is co-founder and Chief Scientist of TriScale innov, a start-up dedicated to technical and scientific computing. Prior to that, he graduated in 2008 with a PhD in applied mathematics from the CEA and spent more than ten years in a team dedicated to numerical analysis and modeling at EDF R&D. François’ approach aims at achieving high performance in scientific software by focusing the effort where it matters most, combining state-of-the-art numerical techniques with FP-aware algorithms and implementations targeting modern hardware architectures.

11:45 – 12:15 Accelerating scientific discovery with CUDA mixed precision architecture

François Courteille (NVIDIA)

Abstract

Availability of reduced precision floating-point arithmetic, which provides advantages in speed, energy, communication costs and memory usage over single and double precisions is changing the landscape of HPC. Initially motivated by deep learning the hardware support of IEEE half precision and bfloat16 arithmetic is opening a new processing path in scientific computing by enabling mixed precision algorithms that work in single or double precision but carry out part of a computation in reduced precision. In this talk after briefly touching tools to analyze application precision sensitivity we will introduce NVIDIA hardware (Tensor cores) architecture and the software environment to seamlessly implement and deploy mixed precision applications in HPC and/or Machine Learning fields; use cases will be presented.
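
A canonical pattern behind such mixed-precision environments is iterative refinement: factor the system in a fast low precision, then recover high accuracy through cheap residual corrections. The NumPy sketch below illustrates the principle only, with float32 standing in for the FP16/TF32 Tensor Core stage; it is not NVIDIA's API:

```python
import numpy as np

def refined_solve(A, b, iters=5):
    # Solve Ax = b: 'factorization' stage in float32, residuals in float64.
    # A real implementation factors A32 once (LU) and reuses the factors;
    # np.linalg.solve refactors on each call, which preserves the numerics.
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                   # float64 residual
        d = np.linalg.solve(A32, r.astype(np.float32))  # cheap correction
        x += d.astype(np.float64)
    return x

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)         # well-conditioned
x_true = rng.standard_normal(n)
b = A @ x_true

x32 = np.linalg.solve(A.astype(np.float32), b.astype(np.float32))
print("float32 solve error :", np.linalg.norm(x32 - x_true))
print("refined solve error :", np.linalg.norm(refined_solve(A, b) - x_true))
```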

Bio

François Courteille is a principal solution architect at NVIDIA, working with customers to develop accelerated high performance computing and machine learning solutions. He is particularly focused on applications from the education, research and energy industry verticals. Prior to joining NVIDIA, François spent three decades as a technical leader for HPC companies (Control Data Corporation, Evans & Sutherland, Convex, NEC Corporation), where he ported and tuned a broad portfolio of HPC application software on large-scale parallel and vector systems. He holds an MS degree in Computer Science from the Institut National des Sciences Appliquées (INSA) de Lyon, France.

12:15 – 12:45 Fast and Cautious: Numerical Portability, Reproducibility, and Mixed Precision

Eric Petit (Intel)

Abstract

The strong trend driven by AI workloads towards lower-precision datatypes provides a strong motivation for HPC to use sub-64-bit FP arithmetic. It is now safe to assume that FP32 and BF16 are the required formats for neural network training, while inference can work with even fewer bits. The business predicted for AI will require future systems to support these usage scenarios with the highest performance and efficiency. This is poised to result in a shift from mainly FP64-optimized CPUs towards much higher throughput for smaller datatypes like FP16, BF16 and FP32.

Existing and upcoming throughput-oriented processor features, such as larger vectors, dedicated SIMD extensions, and new GPU-inspired processors for HPC and AI, make the use of lower-precision formats much more attractive, since it translates almost directly into linear gains in performance and energy efficiency and reduces requirements on memory volume and bandwidth. Achieving the prescribed accuracy of results and numerical stability, particularly for implementations of iterative algorithms, are the main challenges to overcome, and support from analysis and emulation tools will be needed to help identify which parts of complex HPC codes can safely be modified to use lower precision. Indeed, when such changes occur, many industrial HPC codes reveal unknown bugs or prove unable to take full advantage of the latest hardware accelerators. With the ECR Lab (CEA-UVSQ-Intel), we propose the Verificarlo framework (https://github.com/verificarlo/verificarlo), which addresses floating-point verification, debugging and optimization, including mixed-precision usage. It pursues three objectives: reproducibility, portability across hardware and software, and performance optimization.
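
A basic building block of such emulation is shortening the significand of a wider IEEE format in software; the sketch below (our illustration of the general technique, not Verificarlo's implementation) emulates bfloat16, which is essentially a float32 with only 7 significand bits:

```python
import numpy as np

def to_bfloat16(x):
    # Emulate bfloat16 by zeroing the low 16 bits of the float32 encoding
    # (truncation; real hardware usually rounds to nearest even).
    u = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (u & np.uint32(0xFFFF0000)).view(np.float32)

x = np.array([3.14159265, 1.0e38, -2.718], dtype=np.float32)
print(to_bfloat16(x))      # 8 exponent bits kept, only 7 significand bits
print(np.float16(1.0e38))  # IEEE FP16 overflows to inf at ~65504, while
                           # bfloat16 keeps float32's dynamic range
```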

Bio

Eric Petit joined Intel in 2016 as a senior research engineer in the Exascale Computing Research Lab (CEA-UVSQ-Intel). He received his Ph.D. from the University of Rennes in 2009, working at INRIA on compiler technology for GPGPU. After 2 years at the University of Perpignan working on computer arithmetic, he spent 6 years as a senior researcher at the University of Versailles Saint-Quentin, where he led a team participating in various EU projects. Dr. Petit’s current research interest is preparing HPC applications for future exascale systems, with a focus on computer arithmetic and innovative runtime environments.

12:45 – 14:00 Lunch

14:00 – 14:15 Space-Time Parallel Strategies for the Numerical Simulation of Turbulent Flows

Thibaut Lunet (University of Geneva)

Abstract

Unsteady turbulent flow simulations based on the Navier-Stokes equations are complex and computationally demanding problems, especially when Direct Numerical Simulation (DNS) is used for highly accurate solutions. The development of supercomputer architectures over the last decades has made it possible to use massive spatial parallelism to perform DNS of extremely large size (e.g., the DNS of turbulent channel flow by Lee and Moser, 2015, with up to 600 billion degrees of freedom). However, the supercomputer architectures arriving in the next decade will derive their increased computational power from a larger number of cores rather than from significantly increased CPU frequency (e.g., Summit, the current top supercomputing system, has 2.5 million cores). Hence most current-generation CFD software will face critical efficiency issues if bound to massive spatial parallelization alone (O(10^{7-8}) cores).

For six decades, an alternative to exclusive spatial parallelization has been investigated, which consists in adding a parallel decomposition along the time dimension, namely Parallelization in Time (PinT). It has received renewed attention in the last two decades with the invention of the Parareal algorithm (Lions, Maday and Turinici), and the development of other PinT algorithms has shown that they can be an attractive way to enhance efficiency on many-core architectures.
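
To fix ideas, Parareal combines a cheap sequential coarse propagator G with accurate fine propagators F applied in parallel across time windows, through the correction U_{k+1}^{n+1} = G(U_{k+1}^n) + F(U_k^n) − G(U_k^n). Below is a toy Python sketch on the scalar ODE y' = λy, with Euler propagators standing in for real flow solvers (our illustration, not from the talk):

```python
import numpy as np

lam, T, N, K = -1.0, 5.0, 50, 5   # decay rate, horizon, windows, iterations
dT = T / N

def G(y, dt):                     # coarse propagator: one explicit Euler step
    return y * (1.0 + lam * dt)

def F(y, dt, m=100):              # fine propagator: m Euler sub-steps
    h = dt / m
    for _ in range(m):
        y = y * (1.0 + lam * h)
    return y

U = np.empty(N + 1)
U[0] = 1.0
for n in range(N):                # initial guess: sequential coarse sweep
    U[n + 1] = G(U[n], dT)

for k in range(K):
    Fk = [F(U[n], dT) for n in range(N)]   # fine solves: parallel in time
    Gk = [G(U[n], dT) for n in range(N)]
    for n in range(N):                     # sequential coarse correction
        U[n + 1] = G(U[n], dT) + Fk[n] - Gk[n]
    err = abs(U[-1] - np.exp(lam * T))
    print(f"iteration {k + 1}: error at t=T is {err:.2e}")
```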

In this talk, we introduce the basic ideas of PinT algorithms and present a short state of the art of current solutions with their associated results. We then detail the main challenges in applying PinT algorithms to enable space-time parallelization for large-scale DNS of turbulent flows, and illustrate them with some applications. Finally, we conclude with prospects for future developments towards the generalized use of PinT methods within the next generation of CFD software.

Bio

Thibaut Lunet is a postdoctoral researcher at the University of Geneva, in the team of M. Gander. He received a Ph.D. in Applied Mathematics and Computational Fluid Dynamics after a doctorate conducted at ISAE-Supaero and CERFACS (Toulouse), supervised by S. Gratton, J. Bodart and X. Vasseur. His thesis, focusing on the development of space-time parallel strategies for turbulent flow simulation, was awarded the Paul Caseau Prize by EDF and the French Academy of Technology in November 2020.

14:15 – 14:30 Point Europe

Jean-Philippe Nominé (CEA)

14:30 – 14:45 Plan stratégique et PRACE 3

Stéphane Requena (Genci)

14:45 – 15:15 Scalable polarizable molecular dynamics using Tinker-HP: massively parallel implementations on CPUs and GPUs

Jean-Philip Piquemal (Laboratoire de Chimie Théorique, Sorbonne Université)

Abstract

Tinker-HP is a CPU-based, double-precision, massively parallel package dedicated to long polarizable molecular dynamics simulations and to polarizable QM/MM. Tinker-HP is an evolution of the popular Tinker package (http://tinker-hp.ip2ct.upmc.fr/) that preserves its simplicity of use but brings new capabilities, allowing very long molecular dynamics simulations on modern supercomputers using thousands of cores. Tinker-HP offers a high-performance, scalable computing environment for polarizable force fields, giving access to large systems of up to millions of atoms. I will present the performance and scalability of the software in the context of the AMOEBA force field and show upcoming features such as the “fully polarizable” QM/MM capabilities. As the present implementation is clearly devoted to petascale applications, the applicability of the approach to future exascale machines will be examined and future directions of Tinker-HP discussed, including the new GPU-based implementation that uses mixed precision.

Bio

Jean-Philip Piquemal is a professor (classe exceptionnelle) of theoretical chemistry at Sorbonne Université and Director of its Laboratoire de Chimie Théorique (LCT, UMR CNRS 7616). He is also a junior member of the IUF. Recently, he was part of the teams awarded the Atos-Joseph Fourier Prize in high performance computing, as well as of an ERC Synergy grant for the project Extreme-Scale Mathematics for Computational Chemistry.

15:15 – 15:45 Reduced Numerical Precision in Weather and Climate Models

Peter Dueben (ECMWF)

Abstract

In atmosphere models, the values of relevant physical parameters are often uncertain by more than 100%, and weather forecast skill decreases significantly after a couple of days. Still, numerical operations are typically calculated with 15 decimal digits of precision for real numbers. If we reduce numerical precision, we can reduce power consumption and increase computational performance significantly, and the savings can be reinvested to allow simulations at higher resolution.

We aim to reduce numerical precision to the minimal level that can be justified by the information content in the different components of weather and climate models. But how can we identify the optimal precision for a complex model with chaotic dynamics? We found that a comparison between the impact of rounding errors and the influence of sub-grid-scale variability provides valuable information, and that the influence of rounding errors can actually be beneficial for simulations, since variability is increased. We have performed multiple studies investigating the use of reduced numerical precision for atmospheric applications of different complexity (from Lorenz’95 to global circulation models) and studied the trade-off between numerical precision and performance.
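
In the spirit of those Lorenz’95 experiments, the toy Python sketch below runs the Lorenz’96 model at three precisions (our illustration with simple forward-Euler stepping; the actual studies use proper integrators and forecast-skill metrics):

```python
import numpy as np

def lorenz96_step(x, dt, F=8.0):
    # One forward-Euler step of the Lorenz'96 model on a ring of K variables:
    # dx_k/dt = (x_{k+1} - x_{k-2}) * x_{k-1} - x_k + F
    d = (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F
    return x + dt * d

def run(dtype, steps=2000, dt=0.005, K=40):
    rng = np.random.default_rng(1)
    x = (8.0 + rng.standard_normal(K)).astype(dtype)
    for _ in range(steps):
        x = lorenz96_step(x, dtype(dt)).astype(dtype)
    return x

x64, x32, x16 = run(np.float64), run(np.float32), run(np.float16)
# Rounding differences grow like any small perturbation in a chaotic system;
# the question is whether they stay below the model's intrinsic uncertainty.
print("float32 vs float64 spread:", np.max(np.abs(x32.astype(np.float64) - x64)))
print("float16 vs float64 spread:", np.max(np.abs(x16.astype(np.float64) - x64)))
```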

Bio

Peter is the Coordinator of machine learning and AI activities at ECMWF and holds a University Research Fellowship of the Royal Society that allows him to pursue his research interests in numerical weather and climate modelling, machine learning, and high-performance computing. Before moving to ECMWF, he wrote his PhD thesis at the Max Planck Institute for Meteorology in Hamburg, Germany, on the development of a finite-element dynamical core for Earth System models. During the subsequent postdoctoral position with Professor Tim Palmer at the University of Oxford, he focused on the study of reduced numerical precision to speed up simulations of Earth System models.

15:45 – 16:15 Coffee break

16:15 – 16:45 The need for precision levels in medical physics simulations

Julien Bert (CHRU Brest – LaTIM)

Abstract

Monte Carlo simulations (MCS) play a key role in medical applications, both in imaging and in radiotherapy, by accurately modelling the different physical processes and interactions between particles and matter. However, MCS are also associated with long execution times, which is one of the major issues preventing their use in routine clinical practice for both image reconstruction and dosimetry. Within this context, we are developing methods to speed up and parallelize MCS, especially using GPUs. Results from MCS need different levels of precision according to the target application. In addition, precision also depends on the algorithms used inside the MCS core engine for physics calculation and particle propagation. This presentation will discuss the different needs for computing precision in medical physics applications, and the advantages and drawbacks of having multiple levels, or a single level, of precision within the same simulation software.
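
As a toy illustration of why the precision level matters (our sketch under simplified assumptions, not code from an actual MCS engine): scoring millions of tiny energy deposits into a single-precision accumulator silently drops the smallest contributions once the running total grows.

```python
import numpy as np

rng = np.random.default_rng(7)
# 2*10^7 tiny energy deposits landing in one voxel (arbitrary units)
dep = rng.exponential(scale=1e-5, size=20_000_000).astype(np.float32)

ref = dep.astype(np.float64).sum()                 # float64 reference
# np.add.accumulate reproduces the strictly sequential order of a
# scoring loop, unlike np.sum's more accurate pairwise reduction.
seq32 = np.add.accumulate(dep, dtype=np.float32)[-1]

print(f"float64 total: {ref:.6f}")
print(f"float32 total: {seq32:.6f}  "
      f"(relative error {abs(seq32 - ref) / ref:.1e})")
```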

Bio

J. Bert received a Ph.D. in control engineering in 2007 and a Habilitation à Diriger des Recherches (HDR) in health technologies in 2018. He holds a permanent research scientist position at the Brest Regional University Hospital and is a member of LaTIM – INSERM UMR1101. His main research interest is image-guided therapy, especially in medical physics. This includes medical applications in external-beam and intra-operative radiotherapy, as well as X-ray-guided interventional radiology. Within this context, he leads a research group of 20 people, involved in several national and European research projects and working across multiple domains: treatment planning systems, Monte Carlo simulation, image processing, robotics, computer vision and virtual reality.