How a Deep Learning Algorithm discovered 8 New SETI candidates

In July 2021, the Breakthrough Listen Initiative performed a deep-learning-based SETI search for radio technosignatures, uncovering 8 new signals previously missed by classical techniques. This is the story of how we are using artificial intelligence in the search for extraterrestrial intelligence.
Published in Astronomy

On an uneventful evening in August 2021, I was on an arduous four-day cross-country drive from Vancouver to Toronto with my family when I decided to check some preliminary results from an algorithm that I had set to run while I was away. I hooked up to the spotty wifi of some motel in the middle of Manitoba and began scrolling.

That summer I was working on a deep-learning-based search algorithm for radio technosignatures to help investigate the prevalence of extraterrestrial intelligence (ETI) around nearby stars. I was building a new addition to our classical search algorithms, which are now older than my parents. The goal for this shiny new algorithm was to run faster and to produce better candidates by leveraging AI and modern computer vision techniques. Still, I was expecting to find radio frequency interference (RFI), the junk that my algorithm had been returning for months. Instead, I found something much more interesting.

Examples of "junk" that my algorithm was initially picking up.

My algorithm started to find signals, most importantly ones that closely matched simulated ETI signals. When I first saw this I dismissed it. I closed my laptop and headed to bed, exhausted by the thought of the two more days of driving that awaited my family.

Figure a) shows a completely simulated, ideal (perfect) SETI signal. Figure b) shows a real observation we detected.

When I got back to Toronto I started compiling my results with the help of my colleague Leo Rizk. My algorithm had returned 30,000 results, each of which I had to inspect manually. As the undergrad and the one who built this thing, I suppose this was a rite of passage. In total we had searched through 150 TB of data from 820 nearby stars, a dataset that had previously been searched in 2017 with classical techniques and labelled as devoid of interesting signals. I began reviewing all the results by eye and there it was again.

That same signal. Weird. Then there it was again, but this time it looked different. This one came from a different star. Then again, and again. I began writing them down. Soon my list had grown to more than 10 rather suspicious-looking signals. I thought this had to be interference, or that it must already have been picked up by previous searches. Looking them up in our database, I found no matches. I told my supervisor Cherry Ng about this and we were both confused. Were we the first to ever look at these signals?

The figure shows the top 8 candidates and what they look like in our data. Note that they look similar to our ideal SETI candidate!

Funnily enough, these looked almost perfect. Many of the signals had all the key characteristics we were looking for. 

  1. The signals were narrowband, meaning they had a narrow spectral width, on the order of just a few Hz. This is important because natural phenomena are much more broadband.
  2. The signals had non-zero drift rates, which means the signals had a slope in the time-frequency plot. This could indicate that the signal's origin had some acceleration relative to our receivers, and hence was not local to the radio observatory.
  3. The signals appeared in ON-source observations and not in OFF-source observations. If a signal originates from a specific celestial source, it appears when we point our telescope toward the target and disappears when we look away. Human radio interference usually appears in both ON and OFF observations because its source is close by.
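The three checks above can be sketched as a simple filter. This is only an illustration and not the pipeline from our paper: the `Hit` record, the `passes_checks` function and both threshold values are hypothetical names and numbers chosen for the example. The drift helper uses the standard non-relativistic Doppler relation (drift rate = rest frequency × relative acceleration / speed of light).

```python
from dataclasses import dataclass
from typing import Sequence

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def doppler_drift_hz_per_s(rest_freq_hz: float, accel_m_per_s2: float) -> float:
    """Non-relativistic Doppler drift rate: df/dt = f0 * a / c."""
    return rest_freq_hz * accel_m_per_s2 / SPEED_OF_LIGHT_M_PER_S


@dataclass
class Hit:
    """One detected signal summarised over an ON/OFF cadence (hypothetical schema)."""
    bandwidth_hz: float
    drift_rate_hz_per_s: float
    seen_in_on: Sequence[bool]   # one flag per ON-source pointing
    seen_in_off: Sequence[bool]  # one flag per OFF-source pointing


def passes_checks(hit: Hit,
                  max_bandwidth_hz: float = 10.0,          # assumed cut-off
                  min_abs_drift_hz_per_s: float = 1e-3) -> bool:  # assumed cut-off
    # 1. Narrowband: only a few Hz wide.
    narrowband = hit.bandwidth_hz <= max_bandwidth_hz
    # 2. Non-zero drift: the transmitter accelerates relative to the receiver.
    drifting = abs(hit.drift_rate_hz_per_s) >= min_abs_drift_hz_per_s
    # 3. Present in every ON pointing and absent from every OFF pointing.
    on_only = (bool(hit.seen_in_on) and all(hit.seen_in_on)
               and not any(hit.seen_in_off))
    return narrowband and drifting and on_only


candidate = Hit(3.0, 0.2, [True, True, True], [False, False, False])
local_rfi = Hit(3.0, 0.0, [True, True, True], [True, True, True])
print(passes_checks(candidate), passes_checks(local_rfi))
```

For a sense of scale, a relative acceleration of roughly 0.03 m/s² (about the centripetal acceleration at Earth's equator) at 1.42 GHz gives a drift of a little under 0.2 Hz/s with this formula.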

We were able to rule a few signals out that didn't pass our visual checks, but ultimately we were still left with eight signals of interest - the eight appearing in our manuscript.

When we showed these to our colleagues in the Breakthrough Listen program, we were still scratching our heads. These were all different signals, with different drift rates, originating from different stars, and they hadn't been picked up by our classical algorithms? This was news. Here we had successfully demonstrated, for the first time, a complete end-to-end search algorithm using deep learning that discovered signals that no classical algorithm had been able to pick up. It finally worked!

This project began nearly two years ago. Back then, I was still in high school, sitting in my senior computer science class. I was given a final software project, the goal of which was to come up with an idea and pair up with classmates to work on an app or program to solve a problem. I had taught myself machine learning in 11th grade and, having an interest in SETI and astronomy, I proposed this idea to my classmates. Unfortunately, I only received strange stares, so I decided to do it alone.

I worked tirelessly, and eventually I had built what became the basis for this paper’s work. At the end of 2019 and into 2020, I began cold emailing everyone at the UC Berkeley SETI group, and after a few encouraging exchanges I had faith in my direction. You can still find my high school project on GitHub here.

Fundamentally, what I came up with is a way of combining unsupervised and supervised learning with a novel transfer learning method. I found that regular supervised models were too restrictive in searching for signals of interest. They found only candidates that matched the simulated signals they were trained on, and couldn’t generalise to arbitrary anomalies. The unsupervised methods, on the other hand, were uncontrollable: they flagged anything even slightly unusual as anomalous, and so returned mostly junk. I found that by intermittently swapping the weights between a supervised and an unsupervised model during training, we could get the best of both worlds. In the algorithm ultimately implemented in this paper, this semi-supervised technique evolved into an autoencoder plus random forest technique. Although my high school experiments were unsuccessful, mostly because I was running code locally on my laptop, the groundwork had been set.
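To make the "classifier on top of learned features" idea concrete, here is a toy sketch of the second stage only. It assumes an encoder has already turned each observation into a short latent vector; the `latents` values and labels below are made up for illustration. A small bagged ensemble of depth-1 trees (decision stumps) stands in for a random forest, which in practice would use deeper trees and random feature subsets. This is not the code from our paper.

```python
import random


def fit_stump(points, labels):
    """Depth-1 tree: the (feature, threshold) split with the fewest training errors."""
    best = None
    for feature in range(len(points[0])):
        for threshold in sorted({p[feature] for p in points}):
            # Rule: predict 1 (signal of interest) when the feature is >= threshold.
            errors = sum((p[feature] >= threshold) != bool(y)
                         for p, y in zip(points, labels))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]


def fit_forest(points, labels, n_trees=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(points)) for _ in points]
        forest.append(fit_stump([points[i] for i in idx],
                                [labels[i] for i in idx]))
    return forest


def forest_predict(forest, point):
    """Majority vote over all stumps."""
    votes = sum(point[feature] >= threshold for feature, threshold in forest)
    return int(2 * votes > len(forest))


# Hypothetical 2-D latent vectors from an already-trained encoder.
# Label 1 = resembles a simulated ETI signal, label 0 = RFI or noise.
latents = [(0.1, 0.5), (0.2, 0.1), (0.0, 0.9), (0.3, 0.4),
           (2.0, 0.3), (2.2, 0.8), (1.9, 0.1), (2.5, 0.6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(latents, labels)
print(forest_predict(forest, (0.15, 0.5)), forest_predict(forest, (3.0, 0.5)))
```

With this toy data the ensemble should label the first point 0 and the second 1, because only the first latent dimension separates the two classes. The appeal of the design is that the encoder can be trained without labels on real data, while the small classifier keeps the search anchored to what a signal of interest should look like.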

I stuck with this project, and when I graduated from high school I began working with the Breakthrough Listen team, where I was supervised by Dr. Steve Croft and Dr. Cherry Ng. In 2021, I received funding for this project from the Laidlaw Foundation, and with the support of my supervisors I was off. I spent two months battling RFI, and after orchestrating an armada of 12 GPUs running non-stop, full throttle, for two weeks, we came out of the trenches with the results in our paper: a successful search for technosignatures using deep learning. We found candidates that no other algorithm had previously found.

Looking forward, we are now scaling this search effort to 1 million stars with the MeerKAT telescope and beyond. We believe that work like this will help accelerate the rate at which we’re able to make discoveries in our grand effort to answer the question “are we alone in the universe?”. Although I, like many others, have wondered if we’ll ever find that elusive technosignature needle in the vast haystack of anthropogenic interference, I hope that readers of our paper will agree that the new capabilities provided by deep learning give grounds for new excitement and optimism in the search for extraterrestrial intelligence.

To learn more technical details about the project please see here

Acknowledgements

I couldn't have done this work without my wonderful co-authors. Special thanks to Dr. Cherry Ng for supervising me and for co-writing our paper. Thanks to Leandro Rizk for his instrumental help in developing the visualisations we see in our paper. And thanks to Dr. Steve Croft, Dr. Andrew Siemion and the entire UC Berkeley SETI research centre for the helpful comments and for taking a chance on that high school kid who was just having fun.

Breakthrough Listen is managed by the Breakthrough Initiatives, sponsored by the Breakthrough Prize Foundation. We are grateful to the staff of the Green Bank Observatory for their help with installation and commissioning of the Breakthrough Listen backend instrument and extensive support during Breakthrough Listen observations. I (Peter Ma) was supported by the Laidlaw Foundation, which has funded this project as part of the undergraduate research and leadership funding initiative. We thank Yuhong Chen for his helpful discussion on the machine learning framework. I would also like to thank Dr. Laurance Doyle and Dr. Sarah Marzen for their kind support, generous guidance and encouragement when I first began my research career.

Contact

Find out more about Breakthrough Initiatives here. Find out more about my personal research here.
