Clinical artificial intelligence is routinely described as a technical advance—an instrument for improving prediction, accelerating decision-making, and optimizing care. Yet when AI systems are trained on electronic health records, they do not merely learn from data. They inherit the social, institutional, and epistemic histories sedimented in clinical documentation itself—histories shaped by power, omission, and uneven attention.
This paper began with a deceptively narrow question: Can clinical language models perform equitably across racial groups? What we found is that this question cannot be answered at the level of optimization alone. Fairness is not a post hoc correction. It is an emergent property—one that depends on how medicine records, values, and interprets people in the first place.
The problem we rarely name
Race is among the most consequential variables in health research, yet it is inconsistently and unevenly recorded in electronic health records. Structured demographic fields are frequently missing, outdated, or misaligned with how patients understand themselves. These absences are not random; they reflect long-standing judgments about whose identities are clinically salient, whose are presumed, and whose are rendered peripheral.
Unstructured clinical notes—free-text narratives accumulated over years of care—often contain richer social information. But to extract race from such text is to encounter an ethical fault line. Race is not a biological substrate; it is a social designation that indexes differential exposure to structural constraint and institutional harm. Algorithmically inferring race risks reinscribing categories that medicine has historically naturalized and misapplied, yet methodological abstinence is not neutral. To ignore race altogether is to allow inequity to remain statistically illegible—to render disparities unmeasured, and therefore ungoverned.
We therefore approached this tension as constitutive rather than peripheral. Our objective was not to stabilize race as an intrinsic trait, nor to treat it as a convenient modeling feature. It was to examine whether AI systems could recover patterns already embedded in clinical documentation with sufficient reliability and equity to support accountability. The task was diagnostic, not essentialist: to determine whether existing records can be surfaced in ways that permit inequities to be seen, audited, and ultimately addressed—rather than silently reproduced through the veneer of technical neutrality.
Why architecture matters more than we expected
Most contemporary clinical natural language processing systems are built on transformer architectures that treat text as a continuous sequence of tokens. This abstraction has enabled remarkable advances in scale and performance, but clinical documentation is not, in practice, a flat stream. It is layered and cumulative: sentences embedded within notes, notes situated within encounters, encounters unfolding across time. Clinical meaning is not merely lexical; it is contextual and longitudinal. To collapse this structure is not a trivial engineering shortcut. It is an epistemic decision about what counts as signal and what can be discarded.
Our findings suggest that this decision carries distributive consequences. Models explicitly aligned with the hierarchical organization of clinical narratives—architectures that preserve note-level and encounter-level structure—achieved not only higher predictive performance but more equitable behavior across racial groups. By contrast, systems that treated the record as undifferentiated text were more vulnerable to disparity or produced unstable dynamics, including regression toward majority-class predictions. Architectural misalignment did not merely reduce accuracy; it altered how error was distributed.
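To make the architectural contrast concrete, here is a toy sketch of hierarchy-preserving versus flat aggregation. Everything here is a deliberate simplification: the `embed` function and mean pooling stand in for the transformer encoders the paper actually studies, and the point is only structural, namely that flat concatenation lets token-rich encounters dominate the record representation.

```python
# Toy contrast: hierarchy-preserving vs. flat pooling of a clinical record.
# embed() is a hypothetical stand-in for a real encoder.

def embed(token):
    # Illustrative only: map a token to a 1-d "embedding" (its length).
    return float(len(token))

def mean(vals):
    vals = list(vals)
    return sum(vals) / len(vals)

def encode_note(note_tokens):
    # Note-level representation: pool over this note's tokens only.
    return mean(embed(t) for t in note_tokens)

def encode_record(encounters):
    # Pool notes within each encounter, then encounters within the record.
    # A long encounter cannot drown out a short but informative one.
    encounter_reprs = [mean(encode_note(n) for n in enc) for enc in encounters]
    return mean(encounter_reprs)

def encode_flat(encounters):
    # Flat baseline: concatenate everything and pool once.
    # Token-rich encounters dominate the representation.
    tokens = [t for enc in encounters for note in enc for t in note]
    return mean(embed(t) for t in tokens)

record = [
    [["chest", "pain"]],                                          # brief encounter
    [["no", "acute", "distress"], ["follow", "up", "in", "two", "weeks"]],
]
print(encode_record(record), encode_flat(record))
```

The two encodings differ because the flat baseline weights the record by token count, whereas the hierarchical one weights each encounter equally regardless of how verbosely it was documented.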
The lesson is not that one technical form guarantees justice; it is that equity cannot be abstracted from design. Fairness does not attach cleanly at the level of optimization; it emerges—or fails to emerge—from how the system conceptualizes the record itself. When the architecture distorts the structure of clinical documentation, disparities are not simply measurable artifacts; they are predictable consequences of representational choices embedded deep within the model.
What fairness constraints can, and cannot, do
We evaluated fairness-aware objectives, including a loss function inspired by equalized odds that explicitly penalizes disparities in error rates across racial groups. Such approaches are increasingly presented as evidence that ethical concerns can be resolved within the training loop.
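An equalized-odds-inspired penalty of this kind can be illustrated in miniature. This sketch computes the penalty on hard binary predictions for two hypothetical groups, "a" and "b"; the actual training objective would penalize the same true-positive-rate and false-positive-rate gaps on model probabilities, differentiably, inside the loss.

```python
# Minimal sketch of an equalized-odds-style penalty on hard predictions.
# Illustrative only: real fairness-aware losses operate on probabilities
# during training, not on post hoc hard labels.

def rates(y_true, y_pred, group, g):
    # True-positive and false-positive rates restricted to one group.
    idx = [i for i, gi in enumerate(group) if gi == g]
    pos = [i for i in idx if y_true[i] == 1]
    neg = [i for i in idx if y_true[i] == 0]
    tpr = sum(y_pred[i] for i in pos) / len(pos) if pos else 0.0
    fpr = sum(y_pred[i] for i in neg) / len(neg) if neg else 0.0
    return tpr, fpr

def eo_penalty(y_true, y_pred, group):
    # Equalized odds asks that TPR and FPR match across groups;
    # the penalty is the absolute gap in both rates.
    tpr_a, fpr_a = rates(y_true, y_pred, group, "a")
    tpr_b, fpr_b = rates(y_true, y_pred, group, "b")
    return abs(tpr_a - tpr_b) + abs(fpr_a - fpr_b)

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
group  = ["a", "a", "a", "b", "b", "b"]
print(eo_penalty(y_true, y_pred, group))
```

In practice such a penalty would be added to the task loss with a weight that trades parity against overall performance, which is exactly the trade-off observed below.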
Our findings resist that reassurance; constraints substantially improved parity for some transformer models, particularly where baseline disparities were severe. In other models, the same intervention introduced trade-offs, degrading overall performance or producing unstable behavior. Even where constraints narrowed headline gaps, disparities persisted across intersections of race, sex, and age. Equity proved fragile and model-dependent.
The implication is neither that fairness methods are futile nor that they are sufficient. Algorithmic bias is not primarily an algorithmic problem; it is seeded in how data are generated, how identities are recorded, and how clinical systems distribute attention. Models do not invent these patterns; they formalize them. No optimization strategy, however well intentioned, can fully remediate inequities that arise upstream in documentation practices and institutional design.
A structural interpretation of bias
One of the most consistent signals in our analysis was that performance disparities could not be explained by class imbalance alone. Instead, they tracked differences in how clinicians document patients—what is recorded, how explicitly, and for whom. These patterns are shaped by institutional norms, time pressures, implicit bias, and the historical lineages of medical practice itself.
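The point that class imbalance alone cannot explain such gaps can be made with a small subgroup audit. The data here are invented for illustration: both groups have identical label prevalence, yet their error rates still diverge, which is the signature that distinguishes documentation-driven disparity from a simple imbalance artifact.

```python
# Minimal subgroup audit with invented data: equal label prevalence
# across groups, unequal error rates.
from collections import defaultdict

def audit(y_true, y_pred, group):
    # Accumulate per-group counts of examples, positives, and errors.
    stats = defaultdict(lambda: {"n": 0, "pos": 0, "err": 0})
    for t, p, g in zip(y_true, y_pred, group):
        s = stats[g]
        s["n"] += 1
        s["pos"] += t
        s["err"] += int(t != p)
    return {g: {"prevalence": s["pos"] / s["n"],
                "error_rate": s["err"] / s["n"]}
            for g, s in stats.items()}

# Both groups are 50% positive, so class imbalance cannot explain
# why only group "b" is misclassified.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
group  = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(audit(y_true, y_pred, group))
```

An audit of this shape, extended to intersections of race, sex, and age, is what surfaces the residual disparities discussed above even after headline gaps narrow.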
Seen in this light, biased model outputs are not aberrations; they are mirrors. They reveal where healthcare systems encode difference unevenly, and where marginalized groups are rendered statistically fragile. This reframing matters: it shifts the central question from “How do we fix the model?” to “What does the model expose about how care is practiced?”
Why this work is personal
I approach this work with a lived understanding that data systems are not neutral infrastructures. My own family history is marked by displacement, interrupted education, and uneven access to opportunity. These experiences make it impossible to treat fairness as a checkbox or a purely technical metric.
In healthcare AI, the stakes are particularly high. These systems increasingly shape who is seen, who is flagged, and who is overlooked. When built without attention to structure, history, and power, they risk automating precisely the inequities they claim to remedy.
What comes next
This paper does not argue against fairness-aware AI; it argues against superficial fairness. Equity cannot be bolted onto architectures that disregard how clinical knowledge is produced, nor can it be achieved without confronting documentation practices that systematically flatten or exclude social reality.
If clinical AI is to support health equity, it must be designed with epistemic humility—an acknowledgment of what models can and cannot repair. It must also be paired with institutional efforts to improve how social information is recorded, governed, and interpreted.
Fairness is not a patch; it is an emergent property of alignment—between models and data, between technology and clinical practice, and between innovation and justice. That alignment is harder than optimization. But it is the only path that leads somewhere worth going.