In January 2017, one of my graduate students and I introduced a novel approach to search for and detect gravitational waves (Daniel George & EA Huerta, Phys. Rev. D 97, 044039). The idea consisted of using convolutional neural networks to enable rapid classification and regression of gravitational wave signals in noisy time-series data streams. In that foundational article, we introduced several ideas that have played a central role in the design, training and use of deep learning for gravitational wave astrophysics. The main findings of that study were that, in the context of simulated waveforms embedded in Gaussian noise, deep learning significantly outperformed conventional machine learning methods (random forest, nearest neighbors, support vector machine, Markov model, among others), and was as sensitive as template matching while being several orders of magnitude faster, at a fraction of the computational cost. Indeed, a single, inexpensive graphics processing unit (GPU) was enough to process simulated advanced LIGO data faster than real time. I presented these results at the 2017 Winter Conference “The Dawning Era of Gravitational-Wave Astrophysics” at the Aspen Center for Physics. I still recall numerous, engaging conversations with colleagues about the use of deep learning, and what our findings meant.
By June 2017, we had constructed the first artificial intelligence (AI) model that was capable of doing regression and parameter estimation of real gravitational wave events reported by advanced LIGO (Daniel George & EA Huerta, Physics Letters B, 778 (2018) 64-70). We were elated to find that AI was capable of handling the non-Gaussian and non-stationary nature of advanced LIGO noise. Once again, we found that AI was as sensitive as template matching, but far more computationally efficient. These findings provided strong reassurance that the use of AI for gravitational wave astrophysics was no longer a haze on the horizon. It was an idea that was taking shape and gaining momentum across the world, since researchers in Europe and Asia were independently validating our results.
We then moved on to the next challenge. Our results had only been tested assuming a 2-D signal manifold (the masses of the binary components), and our AI models were reporting 1 misclassification for every 200 seconds of searched data. Certainly, there was room for improvement. Taking our disruptive ideas into sustained innovation required a number of elements.
Reducing time-to-insight. We realized that training AI models that describe quasi-circular, spinning, non-precessing compact binary systems (equivalent to the 4-D signal manifold considered for low-latency gravitational wave searches) would require novel advances at the interface of AI and supercomputing. This is because the training dataset needed to densely sample a 4-D signal manifold contained millions of modeled waveforms, and our benchmarks indicated that this computing challenge translated into an entire month of training using one NVIDIA V100 GPU. To address this challenge, we started using the Hardware-Accelerated Learning (HAL) deep learning cluster at the National Center for Supercomputing Applications. Using the entire cluster, 64 NVIDIA V100 GPUs, we were able to reduce the training stage to 12 hours. Since we were already using physics-inspired optimization schemes, we required additional computing power to test a number of ideas. I was fortunate to get access to the Summit supercomputer in 2020, first through a Director’s Discretionary Allocation and then through the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. Working with colleagues at Oak Ridge National Laboratory and NVIDIA, we were able to deploy new optimizers on Summit to accelerate the training of AI models using thousands of NVIDIA V100 GPUs. Once we achieved the milestone of training physics-inspired AI models that describe a 4-D signal manifold in just over one hour using 1536 NVIDIA V100 GPUs on Summit, we focused on developing algorithms to process long batches of advanced LIGO noise.
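The training times quoted above imply rough speedup and parallel-efficiency figures. The short sketch below works through that arithmetic. It assumes that "an entire month" means 30 days (720 hours) and rounds "just over one hour" to 1 hour; both roundings, and the function names, are assumptions made for this illustration rather than benchmark values.

```python
# Back-of-the-envelope scaling estimate from the training times quoted above.
# Assumptions (not benchmark values): "an entire month" = 30 days = 720 hours,
# and "just over one hour" is rounded to 1.0 hour.

def speedup(baseline_hours: float, parallel_hours: float) -> float:
    """How many times faster the parallel run is than the single-GPU baseline."""
    return baseline_hours / parallel_hours


def parallel_efficiency(baseline_hours: float, parallel_hours: float, n_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup(baseline_hours, parallel_hours) / n_gpus


SINGLE_GPU_HOURS = 30 * 24  # assumed: one month of training on one V100

runs = {
    "HAL, 64 V100 GPUs": (12.0, 64),
    "Summit, 1536 V100 GPUs": (1.0, 1536),
}

for name, (hours, n_gpus) in runs.items():
    s = speedup(SINGLE_GPU_HOURS, hours)
    e = parallel_efficiency(SINGLE_GPU_HOURS, hours, n_gpus)
    print(f"{name}: ~{s:.0f}x speedup, ~{e:.0%} parallel efficiency")
```

Under these assumptions, the 64-GPU run corresponds to roughly a 60x speedup (about 94% of ideal scaling) and the 1536-GPU run to roughly 720x (about 47%). The two runs used different systems and optimizers, so these figures are indicative only.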
Improving sensitivity. We tested several ideas to improve the performance of our AI models when applied to process hours-, days-, and eventually months-long advanced LIGO datasets. We eventually found that using an ensemble of four AI models was sufficient to reduce the number of misclassifications to zero when we processed the Hanford and Livingston datasets that span the entire month of August 2017. We also found that our AI ensemble was able to identify all four binary black hole mergers previously identified in that dataset.
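One simple way to picture how an ensemble can suppress misclassifications is a consensus rule: a candidate is kept only if every model flags it at nearly the same time. The sketch below illustrates that idea with made-up trigger times. The function name `consensus_triggers`, the 0.5-second tolerance and the example data are all hypothetical, and this is not the post-processing used in our published ensemble.

```python
# Illustrative consensus rule for an ensemble of detectors.
# Assumption: each model outputs a list of trigger times (in seconds).
# The tolerance and the example data below are hypothetical.

def consensus_triggers(per_model_times, tolerance=0.5):
    """Return triggers from the first model that every other model
    also reports within `tolerance` seconds."""
    if not per_model_times:
        return []
    reference, others = per_model_times[0], per_model_times[1:]
    kept = []
    for t in reference:
        # Keep t only if each remaining model has a trigger close to it.
        if all(any(abs(t - u) <= tolerance for u in times) for times in others):
            kept.append(t)
    return kept


# Four hypothetical models: all agree near t = 1000 s; the remaining
# triggers are isolated false alarms that no other model confirms.
model_outputs = [
    [120.0, 1000.1],
    [1000.0, 2300.4],
    [999.9, 3100.0],
    [1000.2],
]

print(consensus_triggers(model_outputs))  # [1000.1]
```

Because each model tends to make its mistakes at different times, requiring agreement removes isolated false alarms while retaining candidates that all models confirm.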
Computational efficiency and scalability. Our fully trained AI models for 4-D signal manifolds were still able to process advanced LIGO noise faster than real-time. Furthermore, we developed algorithms to accelerate AI inference. Using the entire HAL cluster, our AI ensemble processed an entire month of advanced LIGO noise in less than 7 minutes.
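To put "faster than real-time" in numbers: August has 31 days, so processing that month in less than 7 minutes sets a lower bound on the speedup over real time. The arithmetic is sketched below, treating 7 minutes as the upper limit quoted above; the variable names are illustrative.

```python
# Lower bound on the inference speedup over real time.
# Inputs from the text: one month of data (August, 31 days) processed
# in less than 7 minutes. Variable names are illustrative.

DATA_SPAN_MINUTES = 31 * 24 * 60   # 44,640 minutes of detector data
PROCESSING_MINUTES = 7             # upper limit quoted in the text

realtime_factor = DATA_SPAN_MINUTES / PROCESSING_MINUTES
print(f"At least {realtime_factor:,.0f}x faster than real time")
```

This gives a factor of at least about 6,377 when the entire HAL cluster is used.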
Open source and reproducibility. We released our AI ensemble and post-processing software through Argonne National Laboratory’s Data and Learning Hub for Science (DLHub), a repository that enables researchers to share and invoke published models on distributed computing resources. Argonne and University of Chicago scientists (Maksim Levental, Ryan Chard, Ben Blaiszik, and Ian Foster) connected DLHub and HAL through a novel distributed computing service (funcX), and reproduced our results by conducting an independent gravitational wave search using our AI ensemble and post-processing software.
Bringing together advances in AI, distributed supercomputing, modern computing environments and scientific data infrastructure into a unified, end-to-end framework for AI-driven gravitational wave detection was an exhilarating journey. The highlights of this interdisciplinary and multi-institutional research were recently published in Huerta et al., Accelerated, scalable and reproducible AI-driven gravitational wave detection. Nat Astron (2021).
While our article provides a clear description of how to harness disparate elements into a consistent AI framework, the actual process was highly non-linear. I cherish the weekly, and sometimes biweekly, meetings with my graduate (Asad Khan, Minyang Tian and Wei Wei) and undergraduate (Xiaobo Huang and Maeve Heflin) students. Their diverse expertise and perspectives–physics, maths and computer science–as well as their spectacular creative sparks are clearly imprinted in this project.
The take-home message from this study is that we are no longer arguing about the feasibility of using AI to address computational grand challenges. The combination of AI and innovative computing is moving so fast that researchers are working towards the automation of reproducible and interpretable AI for big data experiments. Strategic data infrastructure investments such as DLHub and funcX, together with the combination of exascale and edge computing, will continue to accelerate AI advances at an ever-increasing pace within the next few years. This is the time to be bold, to be creative, and to nimbly work across disciplines to enable transformational AI breakthroughs.