Sharing my experience publishing the article "HorusEye: a self-supervised foundation model for generalizable X-ray tomography restoration" in Nature Computational Science


Our paper, HorusEye: a self-supervised foundation model for generalizable X-ray tomography restoration (Nat Comput Sci (2026). https://doi.org/10.1038/s43588-026-00973-3), has finally been published in Nature Computational Science. As the first author, I want to take this opportunity to share the journey behind the paper, from writing to publication, as well as some of my own reflections along the way. I hope this post offers something beyond the paper itself: a few thoughts and lessons that may resonate in a different way.

Origins and Motivation

The starting point of this work was actually quite far from what it eventually became.

It goes back to the first project I worked on after starting my PhD, around 2021: pulmonary artery and vein segmentation (see Chu, Y., Luo, G., Zhou, L. et al. Deep learning-driven pulmonary artery and vein segmentation reveals demography-associated vasculature anatomical differences. Nature Communications 16, 2262 (2025)). At that time, the methodology was already mostly in place when our hospital collaborators sent us a new batch of data and asked us to test the model on it.

That dataset was extremely noisy. The segmentation results were dramatically worse than what we had seen during testing. Many vessel branches appeared fragmented or broken, far beyond anything we had expected. We first tried to improve the robustness of the segmentation model itself, but none of those attempts really worked. In the end, we began to wonder whether denoising the images before segmentation might be a better direction.

After digging into the literature, we found that CT denoising was already a fairly mature field, with many open-source methods available. So we went to GitHub, tried several representative approaches, and quickly realized that none of them worked well on this dataset either. Much of the noise remained, and the downstream segmentation problem was still unresolved. That was the moment we started asking a deeper question: why were these methods failing?

One obvious reason was that most existing methods relied on the log-Poisson noise assumption. Roughly speaking, this assumes that photon attenuation through matter follows a Poisson process, and that the resulting noise can therefore be modeled accordingly. But as we looked more carefully across different datasets, we found that this assumption often did not hold. Real-world noise was usually much more complex, and the correlations between pixels were often much stronger than what log-Poisson noise would suggest. That observation pushed us toward self-supervised denoising, in the hope of finding a different way forward.
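To make the assumption concrete, here is a minimal sketch (not code from the paper) of the standard log-Poisson noise model: detected photon counts are assumed Poisson-distributed, and the log-transform that converts counts to attenuation values yields signal-dependent noise that is independent across pixels. All numbers here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

I0 = 1e4                         # hypothetical incident photon count per detector pixel
mu = np.full((64, 64), 0.5)      # hypothetical line integrals of attenuation

expected = I0 * np.exp(-mu)      # Beer-Lambert law: expected detected counts
counts = rng.poisson(expected)   # Poisson photon statistics
counts = np.maximum(counts, 1)   # guard against log(0)

sinogram = -np.log(counts / I0)  # noisy line integrals in the log domain
noise = sinogram - mu            # residual noise under this model
```

Under this model the noise at each pixel is independent of its neighbors; the observation that drove us forward was precisely that real scanner noise is often spatially correlated in ways this model cannot capture.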

The Turning Point

At the time, Deep Image Prior (DIP) was very popular. Its key observation is that convolutional networks fit smooth, continuous image structure much more readily than noise, and this inductive bias had been successfully applied to many restoration problems.

But in CT, the problem is that the noise itself can also exhibit a surprising amount of continuity. So DIP did not work as well as we had hoped. That led us to ask a more specific question: what aspect of the noise is actually discontinuous?

A natural answer was inter-slice noise.

We started to think about transferring the DIP idea from within-slice modeling to across-slice modeling, and that was where the core idea of this paper was born.

In simple terms, we wanted to use DIP to separate noise from a sequence of continuous slices. This led to the first module: we take the slices above and below as input, and the model predicts the middle slice. The residual between the prediction and the original image then largely captures the noise components that are discontinuous across adjacent slices.
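The first module can be sketched in a few lines. In this toy version (names and the predictor are illustrative, not the paper's architecture), a simple average of the slices above and below stands in for the learned network that predicts the middle slice; the residual then isolates components that are discontinuous across slices.

```python
import numpy as np

def extract_interslice_residual(volume, predict_middle=None):
    """Predict each slice from its neighbors; treat the residual as candidate noise.

    `predict_middle` stands in for the learned model. The default, a plain
    average of the slices above and below, is a placeholder only.
    """
    if predict_middle is None:
        predict_middle = lambda below, above: 0.5 * (below + above)

    residual = np.zeros_like(volume)
    for z in range(1, volume.shape[0] - 1):
        pred = predict_middle(volume[z - 1], volume[z + 1])
        residual[z] = volume[z] - pred  # components discontinuous across slices
    return residual

# toy volume: a signal that varies linearly across slices + slice-independent noise
rng = np.random.default_rng(1)
signal = np.linspace(0, 1, 16)[:, None, None] * np.ones((16, 32, 32))
noisy = signal + 0.1 * rng.standard_normal((16, 32, 32))
res = extract_interslice_residual(noisy)
```

For a signal that is linear in the slice direction, the neighbor average cancels the signal exactly, so the interior residual is (almost) pure noise; for real anatomy the learned predictor has to do this cancellation.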

Of course, that residual inevitably contains prediction errors as well. So we applied a high-frequency filter to the residual to remove the low-frequency part of the prediction error. After that, what remained could be treated, to a large extent, as noise extracted directly from real data.
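As a rough illustration of that filtering step, here is a simple radial high-pass filter in the Fourier domain. This is a hypothetical stand-in; the actual filter design in the paper may differ, and the cutoff value here is arbitrary.

```python
import numpy as np

def highpass_2d(img, cutoff_frac=0.05):
    """Suppress frequency components below a radial cutoff (cycles/sample)."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy**2 + fx**2)
    mask = radius > cutoff_frac  # zero out the low-frequency band
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))

# residual = smooth, low-frequency prediction error + broadband pixel noise
rng = np.random.default_rng(2)
x = np.linspace(0, 2 * np.pi, 64)
smooth_error = 0.5 * np.sin(x)[None, :] * np.ones((64, 64))
noise = 0.1 * rng.standard_normal((64, 64))
residual = smooth_error + noise

filtered = highpass_2d(residual)
# the low-frequency prediction error is largely suppressed,
# leaving something close to the injected noise
```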

To further preserve structure, we introduced a second module: a direct denoising module. We add the extracted noise to clean images, which turns the training process into something closer to a standard denoising setup.
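The pairing step for this second module is conceptually simple and can be sketched as follows (function and variable names are illustrative): each clean image is paired with a noisy version built by adding noise drawn from the extracted-noise pool, giving standard (input, target) supervision.

```python
import numpy as np

def make_training_pairs(clean_images, noise_bank, rng):
    """Build (noisy input, clean target) pairs for a direct denoising model,
    using noise that was extracted from real data rather than simulated."""
    pairs = []
    for clean in clean_images:
        noise = noise_bank[rng.integers(len(noise_bank))]  # sample real extracted noise
        pairs.append((clean + noise, clean))               # (input, target)
    return pairs

rng = np.random.default_rng(3)
clean_images = [rng.random((32, 32)) for _ in range(4)]
noise_bank = [0.05 * rng.standard_normal((32, 32)) for _ in range(10)]
pairs = make_training_pairs(clean_images, noise_bank, rng)
```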

However, the whole idea depends on one crucial step being accurate: extracting noise by predicting the target slice from its neighbors. And that is exactly where things can go wrong, because the extracted “noise” may still contain some continuous components inherited from adjacent slices.

To improve this, we introduced a mutual positive-feedback loop for co-refinement. The intuition is fairly simple. By injecting the extracted noise back into the process, we can artificially amplify inter-slice discontinuity in the noise, making the model more sensitive to continuous structural information and therefore better at isolating truly discontinuous noise. At the same time, this co-refinement process also acts as a form of data augmentation, which is important because the number of clean images available for the direct denoising module was actually quite limited. Empirically, the co-refinement strategy did help. It created a positive loop that further improved the model’s capability (see Supplementary Note 6). The full training process is described in detail in the paper, so I will not repeat it all here.
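To convey the shape of that loop, here is a toy sketch under my own simplifying assumptions: noise is extracted, re-injected with its slices shuffled (so the injected noise is maximally discontinuous across neighbors), and extracted again. The shuffling strategy, the placeholder extractor, and all names are illustrative, not the paper's exact procedure.

```python
import numpy as np

def extract_noise(vol):
    # placeholder extractor: residual of a neighbor-average slice prediction
    res = np.zeros_like(vol)
    res[1:-1] = vol[1:-1] - 0.5 * (vol[:-2] + vol[2:])
    return res

def corefine(volume, n_rounds=3, amplify=1.0, seed=0):
    """Toy co-refinement loop: each round amplifies inter-slice discontinuity
    by injecting shuffled extracted noise, then re-extracts."""
    rng = np.random.default_rng(seed)
    noise_bank = []
    current = volume
    for _ in range(n_rounds):
        noise = extract_noise(current)
        noise_bank.append(noise)
        # shuffle the slice order of the extracted noise before re-injection
        shuffled = noise[rng.permutation(noise.shape[0])]
        current = volume + amplify * shuffled
    return noise_bank

rng = np.random.default_rng(4)
volume = rng.standard_normal((8, 16, 16))
bank = corefine(volume, n_rounds=2)
```

The growing `noise_bank` also hints at the data-augmentation side effect mentioned above: each round yields additional noise samples for the direct denoising module.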

The Long and Winding Road of Writing and Publishing

Once the full pipeline was in place, the results were much better than we had expected.

The denoising performance was very strong, and when applied to that noisy dataset, it led to a clear improvement in artery-vein segmentation as well. At the time, our advisor believed the biggest highlight of the work was that it challenged the traditional Poisson-based assumption, so we aimed high in terms of journals.

Looking back now, CT denoising is a much deeper field than we realized then. Our understanding of it was still quite limited, and on top of that, we had very little experience writing papers. The result was predictable: the manuscript was rejected again and again. To be honest, I also felt that although the methodology was effective, its level of “novelty” was still somewhat limited compared with many methods papers published at top conferences. That period left me feeling lost for quite a long time. I had invested an enormous amount of effort, but I no longer knew where the project should go next.

Later, we became interested in the Nature Methods paper “Pretraining a foundation model for generalizable fluorescence microscopy-based image restoration.” It was also about restoration, but framed as a foundation model and connected to multiple downstream tasks. That got us thinking: could we do something similar?

In fact, from the very beginning, we had already noticed that our methodology could potentially generalize across many CT modalities, including veterinary CT and even some micro-CT applications such as fossils and minerals. We also reached out to veterinary hospitals in China to seek data support. We believed this work could be naturally extended into a foundation-model framework, so we started to push in that direction: collecting as many modalities as possible, expanding the scope from conventional CT to X-ray tomography, and evaluating the model on many downstream tasks.

The results turned out to be promising, and that eventually led to the current paper.

When we first submitted to Nature Computational Science, we were genuinely excited after the first round. But when the reviews came back, it was a brutal rejection. The main issue was that one reviewer questioned the credibility of some of our results, along with the validity of several claimed innovations.

We had originally prepared a demo to showcase the method, but later removed it because of concerns about future commercialization, and decided to release the code instead. To respond to the reviewers, we ended up writing nearly 200 pages of rebuttal and adding a huge number of new experiments (many of which are now in the supplementary materials). In the end, that was what finally convinced them.

Reflections and Limitations

There is no shortage of CT denoising work in the literature, and our contribution certainly has its own limitations. That this paper was eventually published owed something to luck as well, especially since it caught the last wave of excitement around foundation models.

From today’s perspective, I still would not call the methodology perfect.

The first issue is the long-standing problem of fidelity in image restoration. Although we designed the framework with more constraints to preserve structure, some information loss is still inevitable during denoising. A very typical example is trabecular bone: its texture can look very similar to noise, and existing denoising methods often struggle to distinguish the two. As a result, over-smoothing can appear in spinal regions.

Another issue is that the overall training framework is still fairly complex, which means its robustness could be improved further. In practice, we also found that the model is sensitive to the choice of CT window width and window level. Some structures become much easier to detect under certain settings, while under a larger window width they may become over-smoothed. For this reason, in our collaborations with hospitals, we usually recommend treating the denoising model as an optional denoising layer rather than a fixed preprocessing step, so that clinicians can choose between denoised and original images based on their needs.
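For readers less familiar with CT display conventions, the window width/level sensitivity above refers to the standard linear mapping from Hounsfield units to display intensities; structures whose HU range falls near the window edges look very different as the window changes. A minimal sketch of that standard mapping (the specific level/width values below are just common textbook choices):

```python
import numpy as np

def apply_window(hu, level, width):
    """Standard CT window/level mapping from Hounsfield units to [0, 1]:
    values below level - width/2 clip to 0, above level + width/2 to 1."""
    lo = level - width / 2.0
    hi = level + width / 2.0
    return np.clip((hu - lo) / (hi - lo), 0.0, 1.0)

hu = np.array([-1000.0, 40.0, 400.0])            # air, soft tissue, bone (typical HU)
soft = apply_window(hu, level=40, width=400)     # narrow soft-tissue window
bone = apply_window(hu, level=300, width=1500)   # wide bone window
```

Soft tissue at 40 HU sits mid-range in the narrow window but is compressed toward the dark end of the wide one, which is the kind of setting-dependent appearance shift that made us recommend keeping the original images available alongside the denoised ones.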

These are, in many ways, common limitations of image-domain denoising and restoration methods. As for projection-domain reconstruction methods, we did not consider them at the time because of the original application scenario. Combining image-domain and projection-domain approaches is something worth exploring in future work, and it is also becoming a mainstream direction in CT denoising research.
