Since the beginning of the pandemic, a number of analyses have suggested that standard preparedness measures did not predict COVID-19 mortality or other outcomes. These studies examined the WHO’s States Parties Self-Assessment Annual Reporting (SPAR) tool, the Joint External Evaluation (JEE) tool, and the Global Health Security Index (GHSI). Such analyses led the Independent Panel for Pandemic Preparedness and Response to say that “the failure of these metrics to be predictive demonstrates the need for a fundamental reassessment which better aligns preparedness measurement with operational capacities in real-world stress situations, including the points at which coordination structures and decision-making may fail.”
Having studied tools for measuring and assessing public health emergency preparedness, we were concerned that the first part of the Independent Panel’s conclusion, the claim that the metrics failed to be predictive, was misleading. These analyses essentially ask the wrong question: evaluating metrics involves more than asking whether they predict outcomes.
Nevertheless, we agree about the need for better measurement tools that capture operational capabilities, including coordination and decision-making. Understanding the distinction between these two points requires a measurement science perspective. This well-established paradigm begins with the fundamental question of “why measure,” with accountability, quality improvement, and resource allocation and mobilization as the most common uses. Answers to this question, in turn, help frame approaches to the remaining questions: “what to measure,” “how to measure” (i.e., how to collect data to populate measures), and how to assess “how well measures work” (i.e., their validity, reliability, feasibility, and utility).
Our recent paper in Globalization and Health identifies two reasons why standard measures failed to predict COVID-19 outcomes. First, there are statistical problems: the quality and comparability of data across countries, the focus on simplistic outcome measures, and the inadequacy of simple cross-sectional study designs all call into question the validity of these analyses.
Second, there is a serious conceptual issue: conflating resilience with preparedness. Various communities of practice use these terms in different ways, but we think of resilience as the ability of individuals, communities, and systems to adapt to and recover from adversity, including public health emergencies. From this perspective, current preparedness measures fail to predict pandemic outcomes because they do not adequately capture variations in the effective political leadership needed to activate existing systems and instill confidence in the government’s response, or differences in the interpersonal and institutional trust needed to mount fast, adaptable responses.
Preparedness metrics, on the other hand, are not intended primarily to predict outcomes. Rather, their purpose is to hold countries accountable for their obligations under the International Health Regulations (IHR) and to identify gaps in preparedness systems so they can be addressed. Thus, the validity of the SPAR and JEE tools and the GHSI depends on how well these metrics identify gaps in preparedness capacities and capabilities that are necessary, but not sufficient, to guarantee good outcomes. Although governance and trust are critical, the international community can only hold countries accountable for their treaty obligations.
While our paper was in review, Jorge Ledesma and colleagues published an analysis that addressed both of our concerns. First, speaking to our technical concerns, they used age-adjusted excess mortality over two years (2020 and 2021), rather than a limited window, as the outcome measure. Second, with this statistical approach they found quite different results from the other analyses: higher GHSI scores were associated with lower COVID-19 deaths, as one would expect from a good measure of preparedness.
Moreover, Ledesma and colleagues further analyzed the results in a multivariate regression exploring the impact of each of the six components of the GHSI. Controlling for the other components, they found that the “risk environment score” had a stronger relationship to COVID-19 mortality than any of the others. This score includes government effectiveness, public confidence in governance, trust in medical and health advice, and related factors. The other five components are more traditional preparedness measures; the risk environment score at least partially captures the governance and social trust that are necessary for a successful response.
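To make this kind of component-level analysis concrete, here is a minimal sketch of such a regression. It is not Ledesma and colleagues’ actual code; the column names and all data are synthetic placeholders we invented for illustration, standing in for country-level age-standardized excess mortality and the six GHSI component scores.

```python
# Sketch of a component-level regression in the spirit of Ledesma et al.
# All data are synthetic placeholders; column names are our assumptions.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 180  # roughly the number of countries with GHSI scores

# Six GHSI component scores (0-100), synthetic for illustration only
components = ["prevention", "detection", "response",
              "health_system", "norms", "risk_environment"]
X = pd.DataFrame(rng.uniform(0, 100, size=(n, 6)), columns=components)

# Synthetic outcome: age-standardized excess mortality per 100,000
# (2020-21), constructed so a better risk environment lowers mortality
y = (300 - 1.5 * X["risk_environment"] - 0.3 * X["detection"]
     + rng.normal(0, 40, n))

# OLS with all six components entered jointly, so each coefficient is
# adjusted for the other five ("controlling for the other components")
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())
```

Entering all six components jointly, rather than one at a time, is what allows the relative strength of the risk environment score to emerge after adjustment for the more traditional preparedness components.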
Still, although our paper explains why preparedness measures did not predict COVID-19 outcomes, we agree with the Independent Panel’s call for measurement tools better aligned with operational capacities in real-world stress situations. This is where measurement science’s consideration of “what to measure” and “how to measure” comes in.
What to measure: ensuring accountability under the IHR requires going beyond summary JEE or GHSI scores to identify specific gaps in operational capabilities. One lesson of the pandemic, for instance, was that the standard measures did not capture all of the relevant constructs. The SPAR tool and the GHSI have both since been updated to reflect lessons learned in the pandemic about, for instance, the need for enhanced surveillance. Similarly, the European Union’s template for triennial reporting on preparedness in the Member States now reflects the need to scale up testing and laboratory efforts, not just the availability of surveillance systems.
How to measure: we cannot improve systems without knowing how they operate and why they might fail. This requires qualitative assessments as well as quantitative indicators, and fortunately, several tools have been developed in recent years to address this need. Rigorous methods for After-Action and Intra-Action Reviews were used effectively during the pandemic to examine how systems performed under stress. Simulation exercises and stress tests have also emerged as tools to assess preparedness capabilities.
Along these lines, Prevent Epidemics has developed the 7-1-7 process as an organizing principle, target, and accountability metric. The process focuses on three specific time milestones: 7 days to detect a suspected outbreak, 1 day to inform national public health authorities and initiate an investigation, and 7 days to initiate an effective response. The idea is that continuously evaluating and improving timeliness can surface performance bottlenecks and help improve both detection speed and response quality. WHO is currently working to incorporate this approach, which it labels Early Action Review, with a One Health focus.
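To illustrate how the three milestones translate into an accountability metric, here is a minimal sketch for a single outbreak. The field names, dates, and helper function are hypothetical illustrations of ours, not part of the official 7-1-7 toolkit.

```python
# Minimal sketch of computing 7-1-7 timeliness metrics for one outbreak.
# Field names and example dates are hypothetical, not an official schema.
from dataclasses import dataclass
from datetime import date

@dataclass
class OutbreakTimeline:
    emergence: date        # estimated date the outbreak began
    detection: date        # date a suspected outbreak was detected
    notification: date     # date national authorities were informed
    response_start: date   # date early response actions began

def assess_717(t: OutbreakTimeline) -> dict:
    """Compare each interval (in days) against its 7-1-7 target."""
    intervals = {
        "detect (target 7d)": (t.detection - t.emergence).days,
        "notify (target 1d)": (t.notification - t.detection).days,
        "respond (target 7d)": (t.response_start - t.notification).days,
    }
    targets = [7, 1, 7]
    return {name: {"days": days, "met": days <= target}
            for (name, days), target in zip(intervals.items(), targets)}

# Example: detection took 9 days, so that milestone flags a bottleneck
timeline = OutbreakTimeline(date(2024, 3, 1), date(2024, 3, 10),
                            date(2024, 3, 11), date(2024, 3, 15))
for name, result in assess_717(timeline).items():
    print(name, result)
```

In practice, each missed target would prompt exactly the kind of bottleneck analysis the 7-1-7 process is designed to trigger, turning a simple timeliness measure into a driver of continuous improvement.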
No single score – or indeed any particular tool – can capture every important dimension or perfectly predict outcomes. But taken together, new tools for measuring and assessing public health emergency preparedness offer opportunities to ensure countries’ accountability to one another and to improve the global, national, and local preparedness systems that are essential for improving global health security.