Behind the Paper

What if We Identified Causes From Their Effects? — Assimilative Causal Inference

Assimilative causal inference is a new mathematical framework in which causes are traced backwards from their observed effects, inverting the classical paradigm of predictive causality.

Where causation is concerned, a grain of wise subjectivity tells us more about the real world than any amount of objectivity.

― Judea Pearl, The Book of Why: The New Science of Cause and Effect

When a significant event occurs, such as the formation of a tornado, a sudden climate anomaly, or a regime shift in a complex system, we can often observe its consequences clearly. Yet, identifying the precise mechanisms that produced it is one of the most difficult topics in science, dating all the way back to Aristotle. The fundamental challenge is that:

We can directly observe the effects, but not their causes.

Based on this conundrum, assimilative causal inference (ACI) aims to answer the following:

Can we identify the causes from their observed effects?

ACI proposes a new way to study causality in high-dimensional complex systems in real-time.

What is causal inference and why does it matter?

Causal inference seeks to quantify how changes in one part of a system produce, at least in part, measurable responses in other variables, so as to determine cause-and-effect relationships.

Understanding the true causal structure of nature is of paramount importance. For researchers, it is an essential prerequisite for developing surrogate models that skilfully and effectively capture realistic properties and dynamics of systems in geophysics, neuroscience, and engineering. For governments and the public, it supports decision-making and risk assessment in socioeconomic matters, as well as disaster preparedness against overwhelming and unforeseen events.

How do we usually infer causal relationships?

Traditionally, causality is studied in a forward manner. Using either observational data or dynamical models, system states are extrapolated forward in time. The hope is that, by evolving the causes forward in time, we can predict and capture their genuine future effects, an approach described as predictive causality.
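A classical instance of predictive causality is Granger causality: if a variable's past improves forecasts of another variable beyond that variable's own past, the first is said to Granger-cause the second. Below is a minimal sketch on synthetic data (the coefficients and series are purely illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data in which x drives y: past x helps predict future y.
T = 2000
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.5)

# Predict y_t from its own past alone, then from its past plus x's past.
Y = y[1:]
own = np.column_stack([y[:-1]])
both = np.column_stack([y[:-1], x[:-1]])

res_own = Y - own @ np.linalg.lstsq(own, Y, rcond=None)[0]
res_both = Y - both @ np.linalg.lstsq(both, Y, rcond=None)[0]

# If adding x's past shrinks the prediction error, x "Granger-causes" y.
print(res_own.var(), res_both.var())
```

Note that this is entirely forward-looking: the verdict rests on how well the candidate cause predicts the future of the effect.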

However, real-world systems are rarely fully observable, with causal drivers either hidden, poorly measured, or entirely inaccessible. Furthermore, data-based methods often provide results in an aggregated form, without temporal information on how relationships evolve.

On the other hand, model-driven approaches are susceptible to model error, which can easily bias causal discovery. In addition, scaling to high-dimensional systems is often intractable, with computational needs growing exponentially with the number of variables, the well-known curse of dimensionality.

How is assimilative causal inference different?

ACI aims to address both the philosophical and operational shortcomings of classical causal inference. 

Rather than forecasting effects from their causes, wouldn't it be more natural to do the opposite? What if we instead used data from the observed effects to work backwards and identify their candidate causes? What if we assessed how such observational information refines the estimation of earlier system states of the underlying model, so as to identify the causes that must have generated said effects? In essence, how can we frame causal inference as an inverse problem, where effects are interpolated onto their causes?

Conceptually, we combine two complementary sources of information for causal discovery:

  • Observational data, representing the "objective" reality.
  • A dynamical model, which constitutes our "subjective" prior understanding of the system's causal structure.

The framework that enables this is data assimilation, the mathematical tool behind ACI (from which it also inherits its name). Using Bayes' theorem, observations and physical forecasts are efficiently integrated to produce improved estimates for the variables we cannot observe in practice.
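As a toy illustration of this Bayesian blending, consider a single Gaussian state variable: the model forecast supplies the prior, the observation supplies the likelihood, and the posterior is sharper than either source alone (all numbers below are illustrative assumptions):

```python
# Gaussian prior from the model forecast (our "subjective" understanding).
prior_mean, prior_var = 1.0, 4.0

# Noisy observation of the state (the "objective" reality).
obs, obs_var = 2.0, 1.0

# Bayes' theorem for Gaussians: posterior precision is the sum of the prior
# and observation precisions; the mean is a precision-weighted average.
post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
post_mean = post_var * (prior_mean / prior_var + obs / obs_var)

print(post_mean, post_var)  # the posterior variance is below both inputs
```

This precision-weighted update is the elementary building block that data assimilation applies, step by step, to entire high-dimensional system states.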

How was ACI developed?

The origins of ACI emerged during earlier work by our research group on a real-time framework for data assimilation. In that paper, we developed a pipeline that adaptively and retroactively improves the estimated states of a dynamical system as new partial observations become available, a procedure known as online smoothing. Specifically, during a case study of a coarse-grained model for atmospheric variability, it became apparent that when extreme events were detected in the observables, their strong signal led to a significant improvement in the estimation of the past unobserved variables responsible for their generation.

In effect, future observations of the effects revealed information about the state of their causal drivers in the past. In other words, observing the consequence improved our knowledge of its cause over the relevant time window. This is mathematically known as uncertainty reduction.

This observation formed the basis of ACI: we* developed a principled framework for causal inference that uniquely identifies instantaneous causal relationships in high-dimensional complex systems, as well as the dynamic interplay between cause-and-effect roles in real time, without requiring observations from candidate causes.

Can you provide an intuitive example of ACI?

Of course! An illustrative example is provided in Panel (a) of Figure 1. Consider a complex Earth-system model in which we can only observe the large-scale variables. For example, satellite imagery of Arctic ice may reveal the location of ice floes over time, but the ocean currents that move them are unobserved.

By combining ice-location data with model forecasts, data assimilation estimates and reconstructs the hidden ocean state responsible for the observed motion. There are two options for estimating the hidden states:

  1. Use only past and present observations (filter solution).
  2. Together with the past and current data, additionally incorporate future observations in which the causal impact has manifested as an observed effect (smoother solution).

In the latter case, the bias and uncertainty in the inferred hidden state decrease substantially! This reveals how, at that time, the unobserved variables are the instantaneous cause of the observables under the ACI-based viewpoint of causality!
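The filter-versus-smoother contrast can be sketched with a linear-Gaussian toy system, in which a hidden "cause" x drives an observed "effect" y, using a standard Kalman filter and a Rauch–Tung–Striebel smoother. The system and all coefficients below are illustrative assumptions, not the Earth-system model of the figure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear-Gaussian toy system: a hidden "cause" x drives an observed "effect" y.
A = np.array([[0.9, 0.0],   # x_{t+1} = 0.9 x_t + noise
              [0.5, 0.8]])  # y_{t+1} = 0.5 x_t + 0.8 y_t + noise
Q = 0.1 * np.eye(2)         # model-noise covariance
H = np.array([[0.0, 1.0]])  # we observe only y
R = np.array([[0.05]])      # observation-noise covariance

T = 50
z = np.zeros((T, 2))
obs = np.zeros(T)
for t in range(1, T):
    z[t] = A @ z[t - 1] + rng.multivariate_normal([0.0, 0.0], Q)
    obs[t] = z[t, 1] + rng.normal(0.0, np.sqrt(R[0, 0]))  # noisy y only

# --- Kalman filter: uses only past and present observations ---
mf = np.zeros((T, 2)); Pf = np.zeros((T, 2, 2))  # filtered means/covariances
mp = np.zeros((T, 2)); Pp = np.zeros((T, 2, 2))  # one-step predictions
Pf[0] = np.eye(2)
for t in range(1, T):
    mp[t] = A @ mf[t - 1]
    Pp[t] = A @ Pf[t - 1] @ A.T + Q
    K = Pp[t] @ H.T @ np.linalg.inv(H @ Pp[t] @ H.T + R)  # Kalman gain
    mf[t] = mp[t] + K @ (obs[t] - H @ mp[t])
    Pf[t] = (np.eye(2) - K @ H) @ Pp[t]

# --- RTS smoother: additionally folds in future observations ---
ms = mf.copy(); Ps = Pf.copy()
for t in range(T - 2, -1, -1):
    G = Pf[t] @ A.T @ np.linalg.inv(Pp[t + 1])  # smoother gain
    ms[t] = mf[t] + G @ (ms[t + 1] - mp[t + 1])
    Ps[t] = Pf[t] + G @ (Ps[t + 1] - Pp[t + 1]) @ G.T

# Posterior variance of the hidden cause x at a mid-trajectory time:
t0 = 25
print("filter   var of x:", Pf[t0, 0, 0])
print("smoother var of x:", Ps[t0, 0, 0])
```

Because the smoother folds in future observations of y, its posterior variance for the hidden x at earlier times comes out smaller than the filter's, which is exactly the kind of uncertainty reduction described above.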

Does ACI provide anything new beyond a reinterpretation of causality?

Yes! ACI goes beyond just determining whether Y caused X at time t. Crucially, it provides rigorous metrics for describing the forward and backward causal influence range (CIR) of a relationship. These metrics respectively answer two fundamental questions:

  1. For how long will this causal effect persist?
  2. When did the causal precursors for this event emerge?

By explicitly adding a time dimension to causality, ACI is the first general framework to unify prediction and attribution. It creates new avenues for forecasting the emergence and persistence of extreme events, thus advancing both science and disaster preparedness. Panel (b) of Figure 1 provides a high-level overview of the forward and backward CIRs, in the context of a real-world scenario (tornado forecast and attribution).
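As a crude stand-in for the forward-CIR idea (this is not the paper's metric, only an intuition-builder), one can ask, in a linear toy system, how long a unit perturbation of a hidden cause remains visible in an observed effect before decaying below a threshold:

```python
import numpy as np

# Illustrative toy system (not from the paper): perturb the hidden cause x at
# t = 0 and count the steps for which its imprint on the observed effect y
# stays above a threshold -- a rough proxy for how long the effect persists.
A = np.array([[0.9, 0.0],   # x_{t+1} = 0.9 x_t
              [0.5, 0.8]])  # y_{t+1} = 0.5 x_t + 0.8 y_t
z = np.array([1.0, 0.0])    # unit perturbation of the cause, effect at rest

threshold = 0.05
forward_cir = 0
for t in range(1, 200):
    z = A @ z
    if abs(z[1]) > threshold:
        forward_cir = t  # last time the effect is still detectable

print("forward influence window (steps):", forward_cir)
```

The backward question is the mirror image: looking back from an observed event, over what window do its causal precursors remain identifiable.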

Sounds interesting! How can I learn more about ACI?

Read our paper that was recently published in Nature Communications and the follow-up work that formalises the forward and backward formulations of the CIR! Furthermore, the following YouTube video provides a short introduction to ACI, explaining its operational details via animations, without complex mathematics. Both English and Greek closed captions are available.


Let us know what you think about ACI! Share your opinions or questions in the comments below, and don't hesitate to reach out via email for further inquiries.

*In memory of our co-author, Erik Bollt, who unexpectedly passed away on December 7, 2025.