Fatal Flaws are Ingrained in Laboratory Animal Research - But who cares?

When we began looking closely at laboratory animal experiments, we found that despite the enormous time, cost, and care involved, studies had biased designs. The problem appeared systemic. We set out to understand the extent, causes and prevention of flawed designs in laboratory animal research.

Published in Cancer, Microbiology, and Neuroscience

To determine the prevalence of biased laboratory animal research designs in the published literature, we conducted a structured review of animal experiments published in North America and Europe in 2022 (Scientific Reports https://rdcu.be/eCisi). These studies involved sophisticated techniques, talented scientists, and thousands of laboratory animals. And yet, when we examined how the experiments were designed, we found a sobering truth: not even one followed the principles of rigorous experimental design.

At first, this was hard to believe. But the deeper we looked, the clearer it became: the flaws were not subtle details buried in statistical jargon. They were fundamental problems in how the experiments were set up in the first place.

Three Problems That Undermine Reliability

1. Randomization and Blinding

In any valid experiment, chance must decide which subject goes into which group. Randomization helps ensure groups are balanced and comparable. And blinding protects against the human tendency — conscious or unconscious — to see what we expect to see. Yet in no study we reviewed were animals fully randomized, with all researchers and analysts completely unaware of the groups they were working with. That leaves results open to bias before the data are even analyzed.
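As a concrete illustration, here is a minimal sketch in Python of how full randomization with blinding codes might be implemented. This is our own hypothetical example, not a protocol from any of the reviewed studies; all names are made up, and the treatment key would be held by a third party until the analysis is locked.

```python
import random

def randomize_and_blind(animal_ids, treatments, seed=None):
    """Randomly allocate animals to treatments and return blinded labels.

    Handlers and analysts see only codes such as 'GRP-A'; the key mapping
    codes back to treatments stays sealed until unblinding.
    """
    rng = random.Random(seed)
    ids = list(animal_ids)
    rng.shuffle(ids)  # chance, not the experimenter, decides group membership
    codes = {t: f"GRP-{chr(65 + i)}" for i, t in enumerate(treatments)}
    allocation = {animal: codes[treatments[i % len(treatments)]]
                  for i, animal in enumerate(ids)}
    key = {code: t for t, code in codes.items()}  # held by a third party
    return allocation, key

allocation, key = randomize_and_blind(range(1, 13), ["control", "drug"], seed=7)
```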

2. The Cage Effect

Animals living together in the same cage influence each other and share the same micro-environment. No two cages are ever exactly alike. If one cage of animals is assigned to each treatment, then the effects of treatment and cage become inseparable. It’s like comparing two schools by testing just one classroom in each — you can’t tell whether differences come from the schools or the classrooms. This “completely confounded design” makes treatment effects impossible to assess.

A better solution is the Randomized Complete Block Design (RCBD), where animals from different treatment groups are mixed within cages. In this setup, each cage acts as a fair test, allowing valid comparisons between treatments while using fewer animals overall. If commingling is not possible, a valid but less efficient and more expensive alternative is to randomly assign multiple cages of animals to each treatment, with the cage as the experimental unit.
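To make the RCBD concrete, here is a minimal sketch (hypothetical cage counts and treatment names, written in Python) of how such an allocation could be generated, with each cage acting as a block that houses exactly one animal per treatment:

```python
import random

def rcbd_allocation(n_cages, treatments, seed=None):
    """Randomized Complete Block Design: every cage (block) houses one
    animal per treatment, re-randomizing the order within each cage."""
    rng = random.Random(seed)
    design, animal = {}, 1
    for cage in range(1, n_cages + 1):
        order = list(treatments)
        rng.shuffle(order)  # independent randomization within each block
        design[f"cage-{cage}"] = [(f"animal-{animal + i}", trt)
                                  for i, trt in enumerate(order)]
        animal += len(order)
    return design

for cage, occupants in rcbd_allocation(4, ["control", "low", "high"], seed=1).items():
    print(cage, occupants)
```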

3. The Wrong Unit of Analysis

Finally, we found that many studies analyzed data as though each animal were independent, even when treatments were applied to entire cages. This practice, called pseudoreplication, is like baking a single batch of cookies and treating each cookie as an independent test of the recipe. It artificially inflates the sample size and makes results look more convincing than they really are.
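A short sketch with made-up numbers shows how much the choice of unit matters. When treatment is applied to whole cages, the cage mean, not the individual animal, is the replicate (Python, with hypothetical data):

```python
import numpy as np
from scipy import stats

# Hypothetical outcomes: two cages per treatment, five mice per cage,
# with the treatment applied to the whole cage.
cages = {
    "control": [[2.1, 2.3, 2.0, 2.4, 2.2], [2.8, 2.9, 3.0, 2.7, 2.8]],
    "treated": [[1.9, 2.0, 1.8, 2.1, 2.0], [1.5, 1.6, 1.4, 1.7, 1.6]],
}

# Pseudoreplication: pooling all mice pretends n = 10 per group.
pooled = {g: np.concatenate(c) for g, c in cages.items()}
p_pooled = stats.ttest_ind(pooled["control"], pooled["treated"]).pvalue

# Correct unit: the cage, so only n = 2 cage means per group.
means = {g: [np.mean(vals) for vals in c] for g, c in cages.items()}
p_cage = stats.ttest_ind(means["control"], means["treated"]).pvalue

print(f"pooled mice: p = {p_pooled:.4f}; cage means: p = {p_cage:.4f}")
```

With these invented numbers, the pooled test looks far more convincing than the honest cage-level test, which has only two true replicates per group.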

Why This Matters

These flaws undermine the credibility of the entire field of laboratory animal research. And the consequences ripple outward:

  • Ethical costs: Millions of animals are used each year in research, with the expectation that their use will generate knowledge to advance health. If the designs are invalid, animals suffer without producing trustworthy science.
  • Scientific and economic costs: Time, money, and talent are poured into experiments that cannot deliver reliable results.
  • Societal costs: Treatments that look promising in flawed animal studies often fail in human trials — delaying progress and sometimes exposing volunteers to unnecessary risk.

It is no wonder that laboratory animal science faces a reproducibility crisis. Many of its roots can be traced to how experiments are designed.

The Way Forward

The solutions are not new. A century ago, statistician R. A. Fisher laid out the principles of sound experimental design: randomization, replication, and blocking. Applied correctly, these principles solve the problems we found. The Randomized Complete Block Design, in particular, explicitly accounts for cage effects, ensures random allocation, permits valid statistical analysis, and requires the fewest animals.
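To show what blocking looks like at the analysis stage, here is a hedged sketch with hypothetical data (Python; the pandas and statsmodels libraries are assumed to be available). The cage enters the model as a block factor, so between-cage variation is removed before the treatment effect is tested:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical RCBD data: four cages (blocks), each housing one animal
# per treatment, commingled as described above.
df = pd.DataFrame({
    "cage":      ["c1", "c1", "c2", "c2", "c3", "c3", "c4", "c4"],
    "treatment": ["ctl", "drug", "ctl", "drug", "ctl", "drug", "ctl", "drug"],
    "outcome":   [5.1, 4.2, 5.6, 4.5, 4.9, 4.0, 5.3, 4.4],
})

# Including C(cage) implements Fisher's blocking principle: cage-to-cage
# variation is accounted for before the treatment effect is tested.
model = smf.ols("outcome ~ C(treatment) + C(cage)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```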

But technical fixes alone are not enough. Change must happen at every level:

  • Researchers must design experiments using rigorous block designs, proper randomization, and blinding.
  • Funding agencies and ethics boards must demand clear documentation of study designs and train reviewers to spot invalid ones.
  • Journals must enforce standards that prevent fatally flawed experiments from being published.
  • Universities must create graduate programs to train a new generation of experts in experimental design and analysis who can work alongside research teams, funding bodies, ethics committees, and editorial boards.

A Call to Action

Science depends on public trust. That trust is eroded when people learn that most animal experiments are invalid by design. By addressing these flaws, we can make animal research more ethical, efficient, and impactful. We can ensure that when animals are used, their contribution genuinely advances both human and animal health.

The tools to fix this are already in our hands. What’s needed now is the collective will to use them.

What Do You Think?

Billions of dollars and millions of lives, human and animal, depend on the credibility of preclinical research.

Our review shows that virtually all published comparative laboratory animal studies suffer from biased designs. This conclusion is hard to dispute. But if anyone believes the current situation can be defended, we want to hear that defence, especially from funding agencies, research organizations, ethics boards, and journal editors, who are in the best position to mandate change.

And if no credible defence exists, then the harder questions remain: what must be done, and who will take responsibility?

We believe this calls for an open, frank discussion. What do you think? How can we, as a community, address these flaws and ensure that animal experiments truly deliver reliable knowledge?

 


Jens Hoffmann (about 1 month ago)

Interesting discussion, but not very helpful for cancer models in mice.

One general aspect of cancer research needs to be considered: cancer is a heterogeneous disease, and we have learned not to translate results from one tumor model in mice to the general cancer patient population.

In cancer research it is now fairly standard to test compounds in a panel of models, similar to phase II clinical trials. A single-mouse study design is frequently used for this, and randomization is no longer an important topic. Given this trial design, it is no longer necessary to compare groups with statistical tests; it is more important to evaluate treatment effects on tumors (response) critically, using adapted RECIST criteria.

Blinding would definitely be helpful, but it is frequently not possible because the treatments are different.

Please do not house mice with different treatments in one cage. Compounds are excreted by mice in urine and feces and can be taken up by other mice. I have seen studies fail because the other compound was detected in the plasma of a mouse in the same cage.

The unit of analysis is definitely relevant, but in a totally different sense. The statistician instead has to calculate how many different tumor models need to be used to allow comparison of response rates with the standard of care (SoC) in clinical use.

Hugh G.G. Townsend (about 1 month ago)

Thank you for taking the time to share these important perspectives from the cancer research field. We are pleased to have the opportunity to address your challenges, and we agree with several of the points you raise, particularly that:

  • Cancer is indeed a heterogeneous disease, and no single mouse model can represent the diversity of human tumors. Using a panel of models is a sensible way to address this.
  • Housing design matters: co-housing mice assigned to different treatments may bias results if there is a risk of clinically significant cross-contamination with pharmaceutical products through urine and faeces. Under these circumstances, treatment must be randomly assigned to multiple cages, each housing one or more mice. One mouse per cage is costly and not considered humane; more than one mouse per cage is more humane but even more costly. So commingling is best unless you know that cross-contamination will adversely affect the results. If treatment is assigned to the cage, then the sample size is the number of cages assigned to each treatment, not the number of mice. Treating the individual animal as the unit of analysis produces pseudoreplication (a small numerical sketch follows this list).
  • Sample sizes and model selection must address heterogeneity and generalizability to the clinical setting.
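To illustrate the cost of ignoring this point, here is a back-of-the-envelope sketch using the standard design-effect formula (hypothetical numbers, not data from our review): correlated cage-mates contribute far less independent information than their headcount suggests.

```python
def effective_sample_size(n_cages, mice_per_cage, icc):
    """Effective number of independent observations when cage-mates are
    correlated; the design effect 1 + (m - 1) * icc discounts the
    nominal sample size (icc = intra-cage correlation)."""
    deff = 1 + (mice_per_cage - 1) * icc
    return n_cages * mice_per_cage / deff

# Four cages of five mice with a moderate intra-cage correlation behave
# like roughly seven independent animals, not the nominal twenty.
print(effective_sample_size(n_cages=4, mice_per_cage=5, icc=0.5))  # ~6.7
```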

At the same time, we respectfully disagree with the suggestion that randomization, blinding, controlling for confounding, and statistical testing are unnecessary in this context. Panels of models are only as reliable as the individual experiments that comprise them. If each study within the panel is biased in design or execution, the conclusions across the panel will also be biased.

  • Randomization remains essential to avoid allocation bias, even in multi-model trials. Without it, treatment effects may be confounded by systematic differences in how animals are assigned. Even in single-mouse studies, randomization to cage and treatment is essential to ensure that the treatment groups are balanced with respect to all confounders that cannot be controlled through analysis. Further, any other procedures that may bias outcomes (e.g. cage placement, sample processing, histology) must also be addressed through randomization.
  • Blinding is generally feasible at the stage of outcome assessment (e.g., tumor measurement, histology, imaging), even if treatment administration cannot be blinded. This greatly reduces the risk of detection bias. Having said this, failure to blind investigators during any procedure that may influence outcome assessment, for any reason (e.g. because it is challenging, inconvenient or simply not possible), increases the risk of bias. As with randomization, blinding must be complete: everyone who has any potential to influence the study outcome, including the statistician, must be blind to group assignment.
  • Statistical analysis is still required to determine whether observed differences are larger than expected by chance. Descriptive evaluation alone is insufficient when making efficacy claims, especially in studies intended to inform human trials.
  • The cage effect will be present in every laboratory animal study and must be addressed appropriately during both the design and the analysis of every study.
  • Correct unit of analysis – This is addressed above. Pseudoreplication was present in all but a small number of the studies that we have reviewed.

We therefore believe that rigorous experimental design principles apply equally to cancer models as to other preclinical studies. Panels of tumor models can improve representativeness, but studies that do not address the above within each model will yield unreliable results and conclusions. As emphasized in our paper, such studies are not ethical because they waste animal lives, resources, public funds (in many cases) and lead to the miseducation of young scientists.

We hope this is helpful and look forward to further discussions.

Steve Acres (about 1 month ago)

I think this paper makes a number of essential points about pre-clinical trial design, backed up by two analytical studies of recent publications. The necessity of using randomization, blinding, and the correct unit of analysis is well explained and supported, including for mouse cancer models. I believe that most MSc and PhD programs require students to take a course in statistics. Perhaps a course specifically on study design should also be required. In the paper, I think more emphasis might have been put on the need to start with a clearly stated null hypothesis, and on situations when one might not be needed. Secondly, some discussion of when a normal distribution of the data cannot be assumed, and when and where non-parametric statistics might be useful, would have been welcome.

Overall a useful and timely contribution.

Hugh G.G. Townsend (about 1 month ago)

Thank you very much for your thoughtful and constructive comments.

We agree that graduate students in the biomedical sciences need a reasonable knowledge of statistics. A basic grounding in statistics is essential for designing unbiased studies that produce data suitable for analysis. This does not mean every investigator must become a statistician. Rather, it means they should have enough understanding to know when expert support is needed, and to be able to critically evaluate the validity of studies on which their own work depends. We also agree that dedicated training in study design would be an invaluable, if not essential, addition to their graduate education. It would greatly strengthen the long-term success of future researchers.

Your point about the importance of a clearly stated null hypothesis is well taken. In reviewing the literature, however, we found no examples where laboratory animal studies specified a null hypothesis. One reason is that these studies are seldom structured as true experiments, testing a single factor against a single outcome. Instead, they often include multiple factors and outcomes—many chosen only after data collection. In this sense, most laboratory animal studies resemble observational studies, which, if properly designed, are useful for generating ideas but not for formal hypothesis testing.

We also appreciate your suggestion about recognizing when data may not follow a normal distribution and when non-parametric methods might be more appropriate. We did not explore this in our paper because, in practice, we rarely found studies where the design allowed for a valid statistical analysis of any kind. Our main goal was to highlight design flaws that make proper analysis impossible from the outset. Once data from more rigorously designed studies are available, we believe a next step will be to examine how statistical methods—including parametric and non-parametric approaches—are applied or misapplied in laboratory animal research.

Amanda Barabas (about 1 month ago)

Very valuable work and worth highlighting. A couple of thoughts:

I have seen several presentations on this very topic at the American Association for Laboratory Animal Science national meeting. I attend this meeting because I have an interest in animal welfare. However, I wonder how many investigators make up the audience at this meeting; it is primarily aimed at facility care staff/managers and veterinarians/vet techs. Is this topic presented at other meetings? For instance, since this study's data consisted of vaccine research, will this be presented at vaccine/immunology conferences? Do you plan to review work in other fields and potentially present at other conferences? In general I don't think this message is getting to the right audience. Hopefully this publication is a step in the right direction.

Secondly, as a manuscript reviewer, I find that journal editors really need to take the ARRIVE guidelines into more consideration. I have reviewed for journals that at minimum encourage using the ARRIVE Essential 10. Some have a statement asking the authors if they followed these guidelines. They check "yes", and yet I often point out several issues related to study design/statistics reporting that are in the Essential 10 but not in the manuscripts. In my opinion, authors should be filling out a form reporting which lines in their manuscripts fulfill the ARRIVE guidelines. I actually beta tested the 2.0 guidelines with a manuscript of mine and filled out a very inclusive table of where each point was located. It took a little time, but it made the manuscript more transparent.

Hugh G.G. Townsend (about 1 month ago)

Thank you very much for your comments. These are very important points.

Getting this message out and overcoming deeply ingrained approaches to study design that stretch back more than 100 years will be very challenging. Our initial action has been to publish our paper and, through it, give the scientific community a clear description of our position on this issue. In the coming days we will issue a joint press release aimed at getting the attention of the scientific community and the general public. At the same time, we are in the process of opening discussions with major funding agencies, as we believe they can act as a major control point to ensure that poorly designed research does not receive their support. To keep our thinking practical, we are meeting with various laboratory groups and ethics boards to get their reaction to our ideas and recommendations. We are also meeting with university officials to discuss the establishment of training programs aimed at producing experts in study design and analysis. Where we go after that is still being considered, and we will certainly pay attention to your recommendations and any others that we receive.

We are fully supportive of the ARRIVE guidelines, though our emphasis differs in important ways. ARRIVE focuses on the publication process, while our concern is with the planning stage of research. For this reason, we are more prescriptive and less flexible about critical issues related to bias, such as blinding, randomization, addressing cage effects, and using the correct unit of analysis. With respect to blinding and randomization, this distinction makes sense: no study design can anticipate every possible source of bias, so some level of imperfection will always need to be tolerated in the reporting of research.

Amanda Barabas (about 1 month ago)

I see. By the time you're at the ARRIVE guide stage, it's too late.

What are your thoughts on the PREPARE guide, and on having funding agencies require something similar? https://journals.sagepub.com/doi/10.1177/0023677217724823

Section 4 seems like it could be helpful, although the wording is a bit vague at this point.  I think it's a good starting point at least. 

Hugh G.G. Townsend (about 1 month ago)

Thank you for drawing attention to PREPARE: Guidelines for planning animal research and testing, an important document and associated website that offer much valuable guidance for researchers during the design and conduct of animal studies. Of particular relevance to our paper is the detailed supplemental information linked at norecopa.no, which expands on Section 4: Experimental design and statistical analysis.

Our paper, however, focuses specifically on the four fatal flaws that compromise the validity of most, if not all, laboratory animal experiments. If these flaws—related to failures in full randomization, full blinding, control of cage effect, and the use of the correct unit of analysis—are not rigorously addressed at the design and analysis stages, the resulting data are inherently unreliable and cannot support valid statistical inference.

Given the opportunity, we will strongly recommend that funding agencies verify that these core design elements are in place before a study proceeds to peer review. Without this foundation, even the best-intentioned initiatives such as PREPARE and ARRIVE cannot ensure that results will be credible, repeatable, reproducible and translatable to human and veterinary medicine.
