42nd Forum: AI for HPC and HPC for AI

6 November 2018, CNRS, rue Michel-Ange, Paris

  • Introduction

    • Introduction (15 min): Opening of the day (Antoine Petit, Chairman and CEO of CNRS)
  • 9:30 – 10:30 Introduction to AI (Nicolas Vayatis, CMLA)

    Abstract:

    The third wave of artificial intelligence is far stronger than the previous ones, because the building blocks of automation can now rely on three key elements: data corpora that have finally been assembled, growing computing resources, and the maturity of theories and algorithms in machine learning. On the internet, whose flagship features are embodied by digital engines (information retrieval, translation, targeting, recommendation), the greatly increased capabilities of algorithms respond in a remarkably relevant way to the everyday uses of every internet user, whatever their age, social background, or cultural or geographical roots. In the physical world, after rails and the air, it has now been demonstrated that roads too can be travelled by vehicles without a human driver. However, in certain specific fields such as engineering and simulation, but also health or education, AI-based achievements are still at the proof-of-concept stage and struggle to scale up. It is worth understanding what obstacles stand in the way of industrialisation, as well as the technological and scientific challenges that this scaling up raises. The role of domain experts in this process will also be discussed.

  • 10:30 – 11:00 Realtime capable first-principle-based fusion reactor turbulence modeling using neural networks (Karel van de Plassche, DIFFER)


    Predicting particle and energy transport in fusion reactors is essential for interpreting current-day fusion experiments and for extrapolating to future reactors. In fusion-relevant plasmas, turbulence is the main transport channel, and calculating this turbulent transport is computationally expensive. Fortunately, reduced turbulence models have been successful in reproducing experimental profiles in many cases, offering a speedup of six orders of magnitude compared to their nonlinear counterparts. However, these models are still not fast enough to be real-time capable. We sketch a pathway towards circumventing the conflicting constraints of accuracy and tractability in turbulence modelling, towards real-time capability. We use the QuaLiKiz reduced model [Bourdelle PPCF 2016] to generate a large database of turbulent fluxes. Neural networks are then trained on this dataset, offering a surrogate model that, when coupled to the control-oriented fast tokamak simulator RAPTOR, is able to simulate 1 second of plasma evolution in 10 CPU seconds, four orders of magnitude faster than the original QuaLiKiz model.


    Karel van de Plassche is a fusion researcher and software engineer focusing on applying machine learning for creating surrogate models within fusion modeling frameworks, employing HPC for turbulence model dataset generation and neural network training. He gained his MSc in 2018 in Science and Technology of Nuclear Fusion at the Eindhoven University of Technology, concurrent with working on Software Defined Networking at the startup PhotonX within the COSIGN EU-FP7 project. He is currently employed at the Dutch Institute for Fundamental Energy Research (DIFFER), in the Integrated Modelling and Transport Group.

  • 11:00 – 11:30 Break

  • 11:30 – 12:00 Machine Learning applied on time series of HPC metrics (Théo Saillant, CEA and CMLA-ENS Paris-Saclay)

    Abstract:

    In the 2020-2023 timeframe, the largest supercomputing centers should have scaled up to exascale computing power. In this perspective, it becomes more and more complex to overcome the end of Moore's law for CPU frequency, and more and more challenging to analyse all the data produced.

    Machine Learning provides smart methodologies to process big data and can provide the appropriate tools to improve the computing center design and monitoring.

    CEA gathers continuous production data in its multi-petascale computing infrastructures, via HPC metrics about the use of systems.

    In our application we focus on the study of byte-level (raw) communication data between cluster nodes, so as to group the computing nodes by tasks and at the same time reveal different phases of a job.

    Machine Learning appears to be particularly appropriate here. We are developing specific algorithms based on signal processing and convex optimisation to automate this task. This makes it possible to obtain more relevant statistics about the jobs submitted on the supercomputer.

    More generally, AI offers many opportunities in HPC monitoring, for practical improvements of production, and possibly upfront setup and tuning of supercomputing infrastructures.

    Bio:

    Théo Saillant is currently a PhD student at CEA and CMLA (ENS Paris-Saclay), under the supervision of Nicolas Vayatis and Jean-Christophe Weill. He works on Machine Learning methods for HPC applications on CEA computing centers. He holds an engineering degree from CentraleSupelec and gained his MVA MSc in Applied Math and Learning in 2016 at ENS Paris-Saclay.

  • 12:00 – 12:30 HPC needs for AI (Stéphane Canu, LITIS)


    In this talk, we will present the needs of the AI research community, in particular those of machine learning, whose needs are the greatest, in terms of computing resources. We will begin by giving a brief presentation of research in artificial intelligence and its experimental specificity. We will then illustrate computing and storage needs through significant examples and societal challenges. We will end by presenting some existing international computing centers devoted to AI.


    Stéphane Canu is a professor at the LITIS research laboratory and the computer science department of the National Institute of Applied Sciences of Rouen (INSA). He was the Dean of the Department of Computer Science, which he created in 1998, until 2003, when he was appointed Director of the IT Services and Facilities Unit. In 2004, he spent a sabbatical year with the machine learning group at ANU/NICTA (Canberra), working with Alex Smola and Bob Williamson. Over the last five years, he has published about thirty papers in conference proceedings or journals in the fields of theory, algorithms and applications using kernel machine learning algorithms and deep learning. His research focuses on deep learning, kernel machines, regularization, machine learning applied to signal processing and optimization for machine learning.

  • 12:30 – 13:00 GENCI update (Stéphane Requena, GENCI)

  • 14:15 – 14:45 Machine learning and the post-Dennard era of climate simulation (V. Balaji, Princeton University, Visiting Scientist, LSCE)


    Conventional computational hardware has reached some physical limits: the phenomenon known as ‘Dennard scaling’, which gave rise to Moore’s Law and many cycles of exponential growth in computing capacity, has come to an end. The consequence is that we now anticipate a computing future of increased concurrency and slower arithmetic. Earth system models, which are weak-scaling and memory-bandwidth-bound, face a particular challenge given their complexity in physical-chemical-biological space, to which mapping single algorithms or approaches is not possible. A particular aspect of such ‘multi-scale multi-physics’ models that is under-appreciated is that they are built using a combination of local process-level and global system-level observational constraints, for which the calibration process itself remains a substantial computational challenge. In this talk, we examine approaches to Earth system modeling in the post-Dennard era. The possibilities include following the industry trend toward machine learning and building models that learn; stochastic methods and emulators for fast exploration of uncertainty; and using fewer bits of precision, among others. The talk will present ideas, challenges and the future of Earth system models as we prepare for a post-Dennard future.


    Dr. V. Balaji (https://www.gfdl.noaa.gov/v-balaji-homepage/) has headed the Modeling Systems Division at NOAA/GFDL since 2004, with appointments in Princeton University’s Cooperative Institute for Modeling the Earth System (CIMES), and as associate faculty at the Princeton Institute for Computational Science and Engineering (PICSciE) and the Princeton Environmental Institute (PEI). With a background in physics and climate science, he has also become an expert in the area of parallel computing and scientific infrastructure. He serves on the Scientific Advisory Board of the Max-Planck Institute for Meteorology in Hamburg, and the National Center for Atmospheric Research. He is a sought-after speaker and lecturer and is committed to providing training in the use of climate models in developing nations, leading workshops for advanced students and researchers in South Africa and India.

    In 2017, he was among the first recipients of French President Macron’s ‘Make Our Planet Great Again’ award marking the second anniversary of the Paris Climate Accord.

  • 14:45 – 15:15 Post-K: A Game Changing Supercomputer for Convergence of HPC and Big Data / AI (Satoshi Matsuoka, Director Riken-CCS / Professor, Tokyo Institute of Technology)


    With the rapid rise of Big Data and AI as a new breed of high-performance workloads on supercomputers, we need to accommodate them at scale, hence the need for R&D on HW and SW infrastructures where traditional simulation-based HPC and BD/AI would converge, in a BYTES-oriented fashion. The TSUBAME3 supercomputer at Tokyo Institute of Technology, which came online in Aug. 2017, embodies various BYTES-oriented features to allow such convergence to happen at scale, including significant scalable horizontal bandwidth as well as support for deep memory hierarchy and capacity, along with high flops in low-precision arithmetic for deep learning. TSUBAME3’s technologies have been commoditized to construct one of the world’s largest BD/AI-focused open and public computing infrastructures, called ABCI (AI-Based Bridging Infrastructure), hosted by AIST-AIRC (AI Research Center), the largest publicly funded AI research center in Japan. Although not a supercomputer for HPC, it ranks No. 1 in Japan and No. 5 in the world on Linpack, delivers 550 AI-Petaflops, and is extremely energy efficient, with a novel warm-water cooling pod design. Finally, Post-K is the flagship next-generation national supercomputer being developed jointly by Riken and Fujitsu. Post-K will have hyperscale-class resources in one exascale machine, with well over 100,000 nodes and a number of server-class Arm CPU cores approaching 10 million. Post-K is slated to perform 100 times faster on some key applications compared to its predecessor, the K-Computer, and will also likely be the premier big data and AI/ML infrastructure. Currently, we are conducting research to scale deep learning to more than 10,000 nodes on Post-K, where we would obtain near top GPU-class performance on each node.

  • 15:45 – 16:45 Big Data Challenge in Human Brain Research (Katrin Amunts, Institute of Neuroscience and Medicine)


    The human brain has a multi-level organisation and high complexity. New approaches are necessary to decode the brain with its 86 billion nerve cells, which form complex networks. Elucidating brain architecture at the level of nerve cells and their axons while preserving the topography of the whole organ makes it necessary to analyse data sets of several petabytes per brain, which should be actively accessible while minimizing their transport. Thus, ultra-high resolution models pose massive challenges in terms of data processing, visualisation and analysis. The Human Brain Project addresses such data challenges. It creates a cutting-edge European infrastructure to enable cloud-based collaboration among researchers coming from different disciplines around the world, and develops platforms with databases, workflow systems, petabyte storage, and supercomputers, opening new perspectives for decoding the human brain.

  • 16:45 – 17:15 AI models for agriculture/botany (Alexis Joly, Pl@ntNet)


    Automated identification of plants and animals has improved considerably in the last few years, in particular thanks to recent advances in deep learning. In 2017, a challenge on 10,000 plant species (PlantCLEF) resulted in impressive performances, with accuracy values reaching 90%. One of the most popular plant identification applications, Pl@ntNet, nowadays works on 18K plant species. It has millions of users all over the world and already has a strong societal impact in several domains including education, landscape management and agriculture. The big challenge, now, is to train such systems at the scale of the world’s biodiversity. Therefore, we built a training set of about 12M images illustrating 275K species. Training a convolutional neural network on such a large dataset can take up to several months on a single node equipped with four recent GPUs. Moreover, to select the best performing architecture and optimize the hyper-parameters, it is often necessary to train several such networks. Overall, this becomes a highly intensive computational task that has to be distributed on large HPC infrastructures. To address this problem, we used the deep learning framework Intel CAFFE coupled with the Intel MLSL library. The experiments were carried out on two French national supercomputers, with access provided by GENCI. The first experiment was carried out on Occigen@CINES, a 3.5 Pflop/s Tier-1 cluster based on Broadwell-14cores@2.6GHz nodes. The second uses the Tier-0 «Joliot-Curie»@TGCC, a BULL-Sequana-X1000 cluster integrating 1656 Intel Skylake8168-24cores@2.7GHz nodes. We will report our experience using these two platforms.


    Alexis Joly is a computer scientist at Inria working on multimedia information retrieval challenges with related interests in representation learning, computer vision and data management. He received his PhD degree in Computer Science in 2005 from the University of La Rochelle. He was involved in the steering board of several European projects (CHORUS+ coordination action, MUSCLE Network of Excellence, VITALAS & GLOCAL Integrated Projects) and many national initiatives related to audiovisual archives, web user-generated content and biodiversity informatics. Since 2011, he has been co-leader of the Pl@ntNet project, which develops a platform with millions of users dedicated to automated plant identification and monitoring. Since 2014, he has been the PI of the LifeCLEF international research platform dedicated to the computer-assisted identification of living organisms (involving tens of research groups worldwide). Lately, he co-edited a Springer book on Multimedia Tools and Applications for Environmental & Biodiversity Informatics (involving about 50 contributors from all over the world). More generally, he regularly serves on scientific program and organising committees for international journals (Ecological informatics, TPAMI, Trans. on Multimedia, CVIU, MTAP) and conferences (ACM Multimedia, ACM ICMR, CVPR, CLEF). He has co-authored a large number of scientific publications in these venues.

Forum sponsored by Intel