© Simon Raffeiner/SCC

Artificial intelligence (AI) has become an indispensable part of our lives and is helping to shape the future: in the development of self-driving cars and autonomous robot systems, the discovery of new materials, the improvement of climate models, the optimization of energy systems, and the fight against the corona pandemic. AI and machine learning (ML) play a central role in all of these areas. Scientists at the Karlsruhe Institute of Technology (KIT) in Germany have been researching both fields for a long time.

To advance this research further, the Helmholtz Artificial Intelligence Cooperation Unit (HAICU) has become the first location in Europe to commission NVIDIA's new DGX A100 AI system. The Steinbuch Centre for Computing (SCC) at KIT, where the new high-performance computer Hochleistungsrechner Karlsruhe (HoreKa) is currently being built, has also entered into a partnership with NVIDIA, the market leader in graphics processors. The partnership gives KIT access to the company's most advanced AI systems.

Five quadrillion computing operations per second

The aim is "to identify and exploit similarities between applications and to advance the development of new methods," the researchers explain. "Above all, this requires extremely high computing power," says Martin Frank, director at the Steinbuch Centre for Computing (SCC) at KIT and professor at KIT's Institute for Applied and Numerical Mathematics (IANM). He adds: "Conventional computer systems reach their limits when training an AI on large datasets. However, many AI algorithms can be accelerated by special hardware. Access to such computer systems gives our researchers a decisive competitive edge today."

The three newly installed DGX A100 computer systems are high-performance servers, each with eight NVIDIA A100 Tensor Core GPUs. Each system delivers 5 AI-PetaFLOP/s, that is, five quadrillion computing operations per second. According to the scientists, this is five times faster than the predecessor, the NVIDIA V100, which was previously NVIDIA's fastest model. In addition, the new processors have significantly larger and faster memory, and the bandwidth of the special NVLink network between the individual chips has been increased to 600 gigabytes per second. "The researchers are now able to train significantly larger neural networks than before, with even larger amounts of data, in a much shorter time," Frank states.
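The figures above can be checked with a little arithmetic. The sketch below takes the 5 AI-PetaFLOP/s figure as applying to each system of eight GPUs, as the cover caption states. The per-GPU value assumes the total is split evenly across the eight GPUs, and the three-system total is simple multiplication, not a measured or announced figure. "AI" FLOP/s figures of this kind refer to reduced-precision Tensor Core arithmetic, not double-precision performance.

```python
# Unit arithmetic for the DGX A100 figures quoted in the text (a sketch).
# Assumption: 5 AI-PetaFLOP/s is the figure for ONE system of eight GPUs.
PETA = 10**15  # one quadrillion
TERA = 10**12

AI_PFLOPS_PER_SYSTEM = 5  # from the text
GPUS_PER_SYSTEM = 8       # from the text
SYSTEMS = 3               # from the text

ops_per_second_per_system = AI_PFLOPS_PER_SYSTEM * PETA
# Assumption: the per-system total is divided evenly among the GPUs.
ops_per_second_per_gpu = ops_per_second_per_system // GPUS_PER_SYSTEM
# Naive sum over all three systems (illustrative, not an announced figure).
ops_per_second_all_systems = ops_per_second_per_system * SYSTEMS

print(f"per system: {ops_per_second_per_system:.0e} operations/s")
print(f"per GPU:    {ops_per_second_per_gpu // TERA} TFLOP/s")
print(f"all three:  {ops_per_second_all_systems // PETA} PFLOP/s")
```

Integer arithmetic keeps the results exact: 5 quadrillion operations per second per system, 625 TFLOP/s per GPU under the even-split assumption, and 15 PFLOP/s if the three systems are simply added together.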

Help in the fight against the corona pandemic

With NVIDIA's new systems, researchers can optimize their applications for KIT's next-generation supercomputer, HoreKa, which is expected to be one of the ten fastest computers in Europe when it goes into operation in summer 2021. "Artificial intelligence and machine learning can drastically accelerate research in all application areas, that is, wherever the pressing problems of humanity can be solved," explains Marc Hamilton, NVIDIA vice president of development. "Our new DGX A100 systems with Tensor Core GPUs and NVIDIA Mellanox HDR InfiniBand connections support this accelerated research and will propel scientific progress across a broad range of key research areas."

KIT's new AI systems could also be used in the fight against the corona pandemic, for example to speed up the detection of infection hotspots, to predict spread patterns, or to support medical personnel in analyzing X-ray images. KIT and the Helmholtz Association already run related AI research initiatives.

Cover picture: The new DGX A100 computer systems are high-performance servers each with eight NVIDIA A100 Tensor Core GPUs. Together, the eight accelerators provide a computing power of 5 AI-PetaFLOP/s (Photo: Simon Raffeiner/SCC)