Affective Preconditions for AI Safety: What the Anthropic-Pentagon Dispute Reveals

Three shutdown mechanisms were activated in nine days during the Anthropic-Pentagon dispute. None held. This post examines what that failure reveals about an untested assumption in AI safety: that the human capacity to refuse, and to resist reactivation, remains intact under sustained algorithmic interaction.

Explore the Research

SpringerLink

Formal and computational foundations for implementing Affective Sovereignty in emotion AI systems - Discover Artificial Intelligence

Emotional artificial intelligence (AI)—systems that infer, simulate, or influence human feelings—create ethical risks that existing frameworks of privacy, transparency, and oversight cannot fully address. This paper advances the concept of Affective Sovereignty: the right of individuals to remain the ultimate interpreters of their own emotions. We make four contributions. First, we develop formal foundations by decomposing risk functions to capture interpretive override as a measurable cost. Second, we propose a Sovereign-by-Design architecture that embeds safeguards and contestability into the machine learning lifecycle. Third, we operationalize sovereignty through new metrics—the Interpretive Override Score (IOS), After-correction Misalignment Rate (AMR), and Affective Divergence (AD)—and demonstrate their use in a proof-of-concept simulation. Fourth, we link technical design to governance by introducing the Affective Sovereignty Contract (ASC), a machine-readable policy layer, and by issuing a Declaration of Affective Sovereignty as a normative anchor for regulation. Together, these elements offer a computational framework for aligning emotional AI with human dignity and autonomy, moving beyond abstract principles toward enforceable, testable standards. In proof-of-mechanism simulations with k = 10 random seeds, enforcing DRIFT (Dynamic Risk and Interpretability Feedback Throttling) with policy constraints reduces the Interpretive Override Score (IOS) from 32.4% ± 3.8 (baseline) to 14.1% ± 2.9, demonstrating measurable preservation of affective sovereignty with quantified variability. Results reported here are based on proof-of-mechanism simulations; a preregistered human-subject evaluation (n = 48) is planned and has not yet been conducted.

In late February 2026, the Anthropic-Pentagon dispute exposed a problem that AI safety theory has not yet adequately modeled. A corporate refusal, a presidential ban, and a military transition order were all activated within nine days. The system never fully stopped. It changed providers, changed networks, and continued operating.

This is not only a political sequence. It is a theoretical one. AI safety has developed increasingly sophisticated models of shutdown, controllability, and alignment. What it has not yet developed is a theory of the human conditions required to keep a system offline once shutdown becomes possible.

The missing assumption

At IASEAI’26 in Paris, Vincent Conitzer presented a formal framework for shutdown safety valves in advanced AI systems. The framework specifies four conditions: the system recognizes danger, the system values halting, the operator remains rational, and reactivation requires deliberate human judgment. Gillian Hadfield, at the same conference, argued that AI systems must develop normative competence by learning from human emotional and social signals, including enforcement, punishment, forgiveness, and shifts in tone.
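
As a minimal sketch of the shutdown-valve logic (my paraphrase for illustration, not Conitzer's formalism or notation), the four conditions behave as a conjunction: if any one of them fails, the valve fails.

```python
# Paraphrase of the four conditions named above, for illustration only;
# this is not Conitzer's formal framework.
from dataclasses import dataclass

@dataclass
class ShutdownState:
    system_recognizes_danger: bool
    system_values_halting: bool
    operator_remains_rational: bool
    reactivation_requires_human_judgment: bool

def shutdown_valve_holds(s: ShutdownState) -> bool:
    """The valve holds only if every condition is satisfied; a single
    failure (for example, an operator who can no longer keep the system
    off) defeats the mechanism as a whole."""
    return (
        s.system_recognizes_danger
        and s.system_values_halting
        and s.operator_remains_rational
        and s.reactivation_requires_human_judgment
    )
```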

Both frameworks are important. Both also presuppose something that is rarely examined directly: that the human operator remains a stable emotional and cognitive reference point. Neither framework asks what happens if the human side of the loop is itself changing under sustained interaction with the very systems being governed.

Why the human baseline is not stable

My recent study in Computers in Human Behavior Reports (Kim, 2026; Vol. 21, Article 100975) suggests that this change is already measurable. In a cross-sectional study of 301 U.S. adults, functional AI use temporally preceded emotional closeness to the system, not the reverse. Users did not first decide to trust and then engage. They engaged, and trust assembled around repeated use. In a longitudinal study of 234 Singaporean university students, habitual interaction predicted deepening attachment over time. Anthropomorphism did not alter the direction of this effect, but it increased its speed.
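
The logic behind that precedence claim can be illustrated with a simple two-wave, cross-lagged comparison. The sketch below runs on synthetic data and is not the published samples, measures, or models; it only shows the kind of test involved: does earlier use predict later closeness once earlier closeness is controlled, and is the reverse path weaker?

```python
# Synthetic two-wave illustration of a cross-lagged precedence check.
# The published studies' data, measures, and models are not reproduced here.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 234  # echoes the longitudinal sample size mentioned above
use_t1 = rng.normal(size=n)
closeness_t1 = 0.2 * use_t1 + rng.normal(size=n)
# A simulated world in which earlier use drives later closeness, not the reverse.
closeness_t2 = 0.5 * closeness_t1 + 0.4 * use_t1 + rng.normal(size=n)
use_t2 = 0.6 * use_t1 + rng.normal(size=n)

df = pd.DataFrame(dict(use_t1=use_t1, use_t2=use_t2,
                       closeness_t1=closeness_t1, closeness_t2=closeness_t2))

# Path of interest: earlier use -> later closeness, net of earlier closeness.
forward = smf.ols("closeness_t2 ~ closeness_t1 + use_t1", data=df).fit()
# Reverse path: earlier closeness -> later use, net of earlier use.
reverse = smf.ols("use_t2 ~ use_t1 + closeness_t1", data=df).fit()

print("use -> closeness:", round(forward.params["use_t1"], 2))
print("closeness -> use:", round(reverse.params["closeness_t1"], 2))
```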

These findings matter for shutdown theory because they suggest that the operator’s critical distance from the system is not a fixed baseline. It is a variable shaped by frequency, habit, and relational framing, and under repeated use it appears to move in one direction: toward greater dependence, reduced interpretive distance, and a diminished ability to tolerate the system’s absence.

Two concepts for the safety discourse

This line of work introduces two concepts that may help clarify the problem.

Affective Sovereignty names the background condition: the right and capacity to interpret one’s own emotional states without algorithmic override. I have developed a formal architecture for this principle in Discover Artificial Intelligence (Kim, 2026). When affective sovereignty erodes, the human signals on which normative competence depends also degrade at their source. The problem is no longer only what the system reads, but what remains available to be read.
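
To make interpretive override concrete, here is a deliberately simplified sketch of an override-rate metric in the spirit of the paper's Interpretive Override Score. The counting rule below is an illustration I am using for this post, not the formal definition in the paper: an override is logged when the system acts on its own inferred emotion label even though the user has supplied a conflicting self-report.

```python
# Illustrative override-rate metric; the formal IOS definition in the
# paper may differ. An override is counted when the system acts on its
# own inferred emotion label despite a conflicting user self-report.
from dataclasses import dataclass

@dataclass
class Interaction:
    user_self_report: str   # e.g. "calm"
    system_inference: str   # e.g. "anxious"
    label_acted_on: str     # the label the system ultimately used

def override_rate(interactions: list[Interaction]) -> float:
    """Fraction of interactions in which the system's inference
    displaced the user's own interpretation."""
    if not interactions:
        return 0.0
    overrides = sum(
        1 for ix in interactions
        if ix.label_acted_on == ix.system_inference
        and ix.label_acted_on != ix.user_self_report
    )
    return overrides / len(interactions)

log = [
    Interaction("calm", "anxious", "anxious"),
    Interaction("sad", "sad", "sad"),
    Interaction("angry", "frustrated", "frustrated"),
]
print(f"override rate: {override_rate(log):.0%}")  # 67%
```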

Reactivation resistance names a governance-relevant capacity: the human ability to keep a system offline once shutdown has become possible. Shutdown design asks whether a system can be stopped. Reactivation resistance asks whether the human can sustain that stoppage when institutional, social, and psychological pressures mount to restart.

The Anthropic case illustrates the distinction. Dario Amodei could refuse the Department of Defense’s demand in part because he occupied a position of structural detachment from the daily operational loop. The analyst, officer, or operator whose workflow and professional identity have become intertwined with the system faces a different problem. By the time explicit risk evaluation begins, the justification for continued use may already be assembling itself.
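
If reactivation resistance is to function as more than a metaphor, it has to be observable. One hypothetical operationalization, offered only as a starting point rather than a measure from any of the cited work, is to track how long each shutdown is sustained before the system is switched back on.

```python
# Hypothetical operationalization, not drawn from the cited work:
# reactivation resistance proxied by how long each shutdown is
# sustained before the system is switched back on.
from datetime import datetime, timedelta

def sustained_offline(shutdowns: list[datetime],
                      reactivations: list[datetime]) -> list[timedelta]:
    """Duration each shutdown held before the following reactivation."""
    return [on - off for off, on in zip(shutdowns, reactivations)]

offs = [datetime(2026, 2, 20, 9, 0), datetime(2026, 2, 24, 14, 0)]
ons  = [datetime(2026, 2, 21, 3, 0), datetime(2026, 2, 24, 20, 0)]
for held in sustained_offline(offs, ons):
    print(held)
# Output: 18:00:00, then 6:00:00. Durations shrinking across episodes
# would be one observable signature of eroding reactivation resistance.
```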

What the field is not measuring

The AI safety community tracks the capability curve of AI systems with extraordinary precision: benchmark performance, scaling behavior, emergent capacities, and rates of improvement. There is no comparable measurement for the human side. No one is systematically tracking declines in emotional granularity, interpretive authority, or tolerance of ambiguity under sustained algorithmic mediation.
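
Such tracking is feasible with existing tools. As one hedged illustration (my operationalization, not an instrument from any of the studies above): emotional granularity is commonly proxied by how interchangeably a person rates same-valence emotion terms in experience-sampling data, so a rising mean inter-item correlation across waves would signal declining granularity.

```python
# Hypothetical monitoring sketch, not an instrument from the studies above:
# emotional granularity proxied by how interchangeably same-valence emotion
# terms are rated across sampled moments (higher mean inter-item correlation
# means lower granularity).
import numpy as np

def granularity_proxy(ratings: np.ndarray) -> float:
    """ratings: (n_moments, n_emotion_items) momentary intensity ratings
    for same-valence items. Returns the mean pairwise correlation;
    values near 1.0 mean the items are being used interchangeably."""
    corr = np.corrcoef(ratings, rowvar=False)
    upper = corr[np.triu_indices_from(corr, k=1)]
    return float(np.mean(upper))

rng = np.random.default_rng(1)
wave_1 = rng.normal(size=(60, 5))                       # differentiated ratings
shared = rng.normal(size=(60, 1))
wave_2 = 0.8 * shared + 0.2 * rng.normal(size=(60, 5))  # ratings collapse together
print(round(granularity_proxy(wave_1), 2), round(granularity_proxy(wave_2), 2))
# A drift from the first value toward the second across waves is the kind of
# human-side signal that is not currently being tracked.
```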

We know how fast the machine is changing. We have only begun to ask how fast the human is changing with it.

If AI safety rests on a human foundation, that foundation requires monitoring with the same seriousness applied to the systems it is meant to govern.

AI safety may need not only a theory of controllable systems, but a theory of preservable human refusal.


A computational model addressing predictive emotional selfhood (PESAM) is currently under review at Acta Psychologica. An extended essay developing the full argument and timeline is available on Substack.
