Are all "fishing expeditions" bad?

The work presented in “Familial confounding in associations between maternal health and autism” adds to a large number of etiological studies of autism that focus on the role of early-life environmental factors in the development of the condition. While the causes of autism appear to be largely genetic, neither common nor rare genetic variation has been able to fully account for autism risk. In addition, although the recent surge in autism rates coincides with improvements in detection and increased awareness, questions about the possible contribution of environmental factors to autism etiology remain.
Although autism is usually not diagnosed until early childhood, studies suggest that the relevant neurodevelopmental processes begin before birth. This has motivated a number of epidemiological studies investigating the impact of maternal health during pregnancy on a child’s likelihood of autism. Most of these studies examined one or a few specific diagnoses, focusing predominantly on a handful of maternal conditions. For example, the associations of maternal depression, diabetes, and obesity with child autism have been replicated by multiple large-scale studies across different settings.
Building on those studies, our goal was to generate a more comprehensive picture of the links between maternal health in pregnancy and child autism risk by testing the associations between autism and every maternal diagnosis of sufficient frequency in the population. An important component of our approach was to carefully account for the extensive comorbidity between maternal conditions. For example, depression and obesity often co-occur, and this should be accounted for when estimating their independent effects on autism risk. We hoped that our multi-diagnosis approach would tell us whether the earlier, more targeted approaches had missed anything important about the effects of maternal health on autism and, conversely, whether some of their findings may have been driven by comorbidity between different conditions in the mother.
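To make the comorbidity point concrete, here is a minimal sketch with invented numbers that do not come from any real cohort. It assumes a hypothetical population of 100,000 mothers in which obesity doubles autism risk, while depression has no effect of its own but is more common among obese mothers. Stratifying by obesity and pooling with a Mantel-Haenszel risk ratio is a simple textbook stand-in for adjustment, chosen for illustration; it is not necessarily the adjustment strategy used in the study, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Counts within one level of the comorbid condition (here: obesity)."""
    exposed_cases: int     # depressed mothers whose child has autism
    exposed_total: int     # all depressed mothers in the stratum
    unexposed_cases: int   # non-depressed mothers whose child has autism
    unexposed_total: int   # all non-depressed mothers in the stratum


def make_stratum(n_depressed: int, n_not_depressed: int, autism_per_1000: int) -> Stratum:
    # Autism risk depends only on the stratum (obesity), not on depression.
    return Stratum(
        exposed_cases=n_depressed * autism_per_1000 // 1000,
        exposed_total=n_depressed,
        unexposed_cases=n_not_depressed * autism_per_1000 // 1000,
        unexposed_total=n_not_depressed,
    )


def crude_risk_ratio(strata: list[Stratum]) -> float:
    """Risk ratio for depression after collapsing over obesity (no adjustment)."""
    a = sum(s.exposed_cases for s in strata)
    n1 = sum(s.exposed_total for s in strata)
    c = sum(s.unexposed_cases for s in strata)
    n0 = sum(s.unexposed_total for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata: list[Stratum]) -> float:
    """Risk ratio for depression pooled across obesity strata."""
    num = den = 0.0
    for s in strata:
        n = s.exposed_total + s.unexposed_total
        num += s.exposed_cases * s.unexposed_total / n
        den += s.unexposed_cases * s.exposed_total / n
    return num / den


# Hypothetical numbers: depression affects 30% of obese vs 10% of non-obese
# mothers; autism risk is 20 per 1,000 with obesity vs 10 per 1,000 without.
obese = make_stratum(n_depressed=6_000, n_not_depressed=14_000, autism_per_1000=20)
not_obese = make_stratum(n_depressed=8_000, n_not_depressed=72_000, autism_per_1000=10)
strata = [obese, not_obese]

print(f"crude RR for depression:  {crude_risk_ratio(strata):.2f}")            # 1.23
print(f"obesity-adjusted MH RR:   {mantel_haenszel_risk_ratio(strata):.2f}")  # 1.00
```

The crude comparison suggests a roughly 23% higher autism risk among depressed mothers, but the association vanishes once obesity is accounted for. A study that examined depression alone, without the comorbid condition, could report exactly this kind of spurious signal.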
Testing multiple associations in epidemiological studies can be met with skepticism and even branded a “fishing expedition”. However, carefully designed, comprehensive analyses across a spectrum of exposures can not only increase the likelihood of novel discovery but also enhance the rigor of the results (with false-positive rates properly controlled) and allow for internal benchmarking, which enables agnostic comparison of expected and unexpected effects. Critically, such an approach remains rooted in the hypothesis-driven scientific method, in which both broad and highly specific hypotheses are essential to the process of discovery, as we elaborate below.
The rigor of the results. The true likelihood that an association between a maternal diagnosis and autism is a false positive does not depend on how many associations are reported in a single scientific paper. For example, the “true-ness” of the effect of maternal diabetes on autism is the same whether it is reported on its own or together with the estimated effects of other maternal diagnoses. Most researchers would agree with the intuition that “slicing” results into individual papers, each reporting a single association just below the threshold for nominal significance, is likely to multiply the number of false-positive associations in the literature. Following this reasoning, narrowly focused studies may underestimate the necessary correction for multiple testing by focusing on an experiment-wise error rate (based on the number of tests performed in a study) rather than a family-wise error rate (based on the number of possible tests, bounded by a meaningful factor). Our systematic approach brings the experiment- and family-wise error rates into closer alignment, reducing the probability of a false-positive finding. This approach does open up the possibility of data dredging and selective reporting of a few nominally significant associations, but a researcher who wants to do bad science can always find a way. The transparent reporting of systematic, cross-disorder analyses, with analysis code made publicly available, reduces rather than increases the likelihood of such practices tarnishing the scientific literature.
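The arithmetic behind this argument can be sketched in a few lines, assuming independent tests of true null hypotheses and a conventional per-test α of 0.05; the count of 100 diagnoses is hypothetical. The Bonferroni correction appears here only because it is the simplest way to control the family-wise error rate. It is not necessarily the correction applied in the study, and with correlated maternal diagnoses it is conservative.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests,
    each of a true null hypothesis, run at per-test level alpha."""
    return 1 - (1 - alpha) ** m


def expected_false_positives(alpha: float, m: int) -> float:
    """Expected number of false positives among m true-null tests."""
    return alpha * m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test threshold that keeps the family-wise error rate <= alpha."""
    return alpha / m


alpha = 0.05
m = 100  # hypothetical number of maternal diagnoses tested

# A single-diagnosis paper looks "safe" at the 5% level...
print(f"{family_wise_error_rate(alpha, 1):.3f}")     # 0.050
# ...but 100 such papers (or one paper with 100 tests) share one family:
print(f"{family_wise_error_rate(alpha, m):.3f}")     # 0.994
print(f"{expected_false_positives(alpha, m):.1f}")   # 5.0
# Correcting for the whole family brings the overall rate back under 5%.
threshold = bonferroni_threshold(alpha, m)
print(f"{threshold:.4f}")                             # 0.0005
print(f"{family_wise_error_rate(threshold, m):.3f}")  # 0.049
```

Whether the 100 tests appear in one paper or in 100 separate papers, the chance that at least one of them is a false positive is the same. A single systematic analysis simply makes the size of the family visible, so the significance threshold can be set to match it.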
Context. Typically, associations between different maternal conditions and autism are estimated across multiple studies with varying samples and analytical approaches, which makes it difficult to compare the autism risk associated with different maternal diagnoses. In contrast, all associations in our study were estimated in the same cohort, using the same iterative adjustment strategy and sensitivity testing. This allowed us, for example, to infer which diagnoses are associated with a higher likelihood of autism than others, and to track the distribution of effect sizes, and thus detect any global inflation of the results that could arise from selection or other biases. Our strategy also revealed patterns of results that would otherwise be missed: for example, the enrichment of significant associations in a few specific diagnostic categories (psychiatric, obstetric, cardiometabolic), or the broad effects of adjusting for maternal healthcare utilization (the diagnoses rendered non-significant by this adjustment were mostly mild conditions, non-specific diagnoses and symptoms, a pattern not observable in single-diagnosis studies). Lastly, and perhaps most critically, by adjusting for the co-occurrence of different diagnoses, we could account for the broader context in which each diagnosis presents itself, ensuring that the diagnoses highlighted in our study are associated with child autism independently of their comorbidity with other diagnoses.
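One common way to check for global inflation, borrowed here from genome-wide association studies, is a genomic-control-style inflation factor λ: the median observed 1-df chi-square statistic divided by its expected median under the null. This is an assumption for illustration, not necessarily how inflation was assessed in the study, and the z-scores below are hypothetical. A λ near 1 is consistent with no systematic inflation, while a shared bias that scales every test statistic pushes λ upward. Note that λ also rises when many exposures have true effects, so it flags, rather than proves, bias.

```python
from statistics import NormalDist, median

STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df, i.e. (z at the 75th percentile)^2.
CHI2_1DF_MEDIAN = STD_NORMAL.inv_cdf(0.75) ** 2


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def inflation_factor(p_values: list[float]) -> float:
    """Median observed 1-df chi-square divided by its expected null median."""
    chi2 = [STD_NORMAL.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical z-scores for 999 exposures with no true effects:
# evenly spaced quantiles of the standard normal distribution.
n = 999
null_z = [STD_NORMAL.inv_cdf((i + 0.5) / n) for i in range(n)]
# The same exposures after a shared bias inflates every z-score by 10%.
biased_z = [1.1 * z for z in null_z]

print(round(inflation_factor([two_sided_p(z) for z in null_z]), 2))    # 1.0
print(round(inflation_factor([two_sided_p(z) for z in biased_z]), 2))  # 1.21
```

Because every exposure is analyzed in the same cohort with the same pipeline, a check like this is only meaningful across the full set of results; it has no equivalent for a single-diagnosis study.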
Hypothesis testing. Such all-encompassing testing in epidemiology is sometimes dismissed as a “fishing expedition”, with concerns about whether the results can be translated into meaningful knowledge. These concerns can be valid, especially when study design is careless, reporting is selective, or research findings are interpreted superficially. In itself, however, such an approach is consistent with the scientific method, which relies on testing falsifiable hypotheses. Prior evidence allows us to test more targeted hypotheses, but to get there, the knowledge base must first be established iteratively. The specificity of hypotheses therefore increases as knowledge accumulates, but both broad and targeted hypotheses are essential and valid, each with its place at a different stage of generating knowledge. Our study tested a broad hypothesis that certain maternal conditions in pregnancy are associated with autism risk, which, in the second phase, we were able to follow up with further questions about the role of familial factors in those associations. We found that multiple maternal diagnoses are associated with autism in the child, with the majority of these associations attributable to familial factors such as genetics. These results feed into many more specific hypotheses (which familial factors are involved, and how they exert their effects), which we hope will contribute to robust knowledge of the effects of maternal health in pregnancy on autism risk.
Importantly, the systematic approach we implemented also has limitations that must be acknowledged, including, for example, difficulty adjusting for factors specific to a given exposure. In addition, such studies should never be treated as an endpoint in themselves; they will only be of value when they are systematically and thoroughly followed up with more targeted approaches.
As health data grow exponentially in this era of ‘big data’, our approach to epidemiology must evolve to take advantage of the information hidden within them, while ensuring that we continue to benefit from the hypothesis-testing frameworks that made so many past discoveries possible.
Acknowledgment: thank you to Vahe Khachadourian, the brilliant lead author on the paper, and Paul O’Reilly, a co-author, for discussing these ideas with me over the years, challenging me when I went wrong, and providing valuable comments on this piece.
Nature Medicine