Cerebras CS-1 System Integrated Into Lassen Supercomputer

A new case study carried out by Cerebras in partnership with Lawrence Livermore National Laboratory (LLNL) details how the Cerebras CS-1 system was integrated into LLNL’s Lassen supercomputer to enable advances in nuclear fusion simulations. 

LLNL is a federal research facility in Livermore, California, and it is primarily funded by the US Department of Energy’s National Nuclear Security Administration (NNSA). According to LLNL, its mission is to strengthen US security by developing and applying world-class science, technology and engineering. 

The laboratory houses the National Ignition Facility (NIF), which carries out nuclear fusion research with the most powerful laser in the world. Inertial confinement fusion experiments are expensive and time-consuming, however, so the lab also runs simulated experiments on the Lassen supercomputer with a multi-physics software package called HYDRA. HYDRA models are validated against real-world data from NIF, which makes them more accurate at predicting the outcomes of physical experiments. 

One component of HYDRA, called CRETIN, models atomic kinetics and radiation, predicting how atoms will behave under given conditions. CRETIN can account for tens of percent of HYDRA’s total compute load.

By replacing CRETIN with a deep neural network (DNN) model, known as the CRETIN-surrogate, the LLNL researchers can reduce this computational burden. 

Cerebras CS-1 System

LLNL chose the Cerebras CS-1 system to perform CRETIN-surrogate inference. The system was integrated with the Lassen supercomputer, and installation took less than 20 hours, during which Cerebras technicians also installed a “cooling shell” along with the mechanical support rails and hardware. 

Machine learning software engineers worked with their LLNL colleagues to write a C++ API that allows HYDRA code to call the CRETIN-surrogate model. The model relies on an autoencoder to compress the input data into lower-dimensional representations, which are then processed by a predictive model built with DJINN, a novel deep neural network algorithm. DJINN automatically chooses an appropriate neural network architecture for the given data, with no manual tuning required from the user.
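
To make the workflow concrete, the following is a minimal, hypothetical C++ sketch of how a host simulation might hand a batch of inputs to a surrogate model through such an API. None of the names (SurrogateClient, infer, update_kinetics_with_surrogate) come from the actual LLNL or Cerebras interface, and the inference call is a local stub so the example compiles and runs on its own.

```cpp
// Hypothetical sketch of a host code handing work to a surrogate model
// through a C++ API. None of these names come from the actual LLNL or
// Cerebras interface; the "inference" is a local stub so the example
// compiles and runs on its own.
#include <iostream>
#include <vector>

// Stand-in for a client that forwards batches to the accelerator.
class SurrogateClient {
public:
    // In a real system this would serialize the batch, send it over the
    // network link to the accelerator, and return the model's outputs.
    // Here it simply echoes the inputs to keep the sketch self-contained.
    std::vector<double> infer(const std::vector<double>& batch) const {
        return batch;
    }
};

// The expensive atomic-kinetics call is replaced by a batched inference
// request: inputs for many zones go out at once, predictions come back.
void update_kinetics_with_surrogate(const SurrogateClient& client,
                                    std::vector<double>& zone_states) {
    zone_states = client.infer(zone_states);
}

int main() {
    SurrogateClient client;
    std::vector<double> zones(8, 1.0);            // toy "simulation state"
    update_kinetics_with_surrogate(client, zones);
    std::cout << "updated " << zones.size() << " zones\n";
    return 0;
}
```

In the real integration, the inference call would stream the batch over the network link to the accelerator, where the autoencoder and DJINN stages run; the host code only sees raw inputs going out and predictions coming back.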

Results of the Case Study

The early results demonstrated that the combination of the Lassen system and the Cerebras accelerator is extremely efficient. By plugging the CS-1 into Lassen’s InfiniBand network, the team achieved 1.2 terabits per second of bandwidth to the CS-1 system.

With its 18GB of on-chip SRAM coupled to 400,000 AI compute cores, the CS-1 system was able to run many instances of the relatively compact DNN model in parallel. Through this combination of bandwidth and compute horsepower, HYDRA was able to perform inference on 18 million samples every second. 

All of this means that, with the Cerebras system, LLNL can now run experiments that were previously computationally intractable, and doing so requires only a simple integration at a fraction of the cost. 

The research will now focus on steering simulations and providing insight into them while they are running, enabling researchers to monitor a run and halt it if it is not working well. Each run’s results then become part of the model’s training set, so the model can be trained continuously. This makes it possible to create an “active learning” model that could optimize future runs by picking the parameters and initial boundary conditions for the next experiment.
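
As a loose illustration of that idea, here is a small, self-contained C++ sketch of an active learning loop in which each run’s result joins a growing training set and a simple heuristic picks the next parameter to try. The scoring function, the selection rule, and every name in it are illustrative assumptions rather than LLNL’s actual method.

```cpp
// Hypothetical sketch of an active learning loop: each run's result is
// added to the training set, and the next run's parameter is chosen
// from what has been observed so far. The heuristic below is a naive
// stand-in; a real system would query the trained model instead.
#include <iostream>
#include <vector>

struct RunResult {
    double parameter;  // input setting used for the run
    double score;      // how well the run turned out
};

// Pick the next parameter to try: nudge the best-scoring parameter seen
// so far (purely illustrative selection rule).
double propose_next_parameter(const std::vector<RunResult>& history) {
    double best_param = 0.5, best_score = -1.0;
    for (const RunResult& r : history) {
        if (r.score > best_score) {
            best_score = r.score;
            best_param = r.parameter;
        }
    }
    return best_param + 0.05;
}

int main() {
    std::vector<RunResult> training_set;
    double parameter = 0.5;
    for (int run = 0; run < 5; ++run) {
        // Placeholder for running the simulation and scoring its outcome.
        double score = 1.0 - (parameter - 0.7) * (parameter - 0.7);
        std::cout << "run " << run << ": parameter=" << parameter
                  << " score=" << score << "\n";
        training_set.push_back({parameter, score});       // result joins the training set
        parameter = propose_next_parameter(training_set); // steer the next run
    }
    return 0;
}
```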
