Affective Preconditions for AI Safety: What the Anthropic-Pentagon Dispute Reveals

Three shutdown mechanisms were activated in nine days during the Anthropic-Pentagon dispute. None held. This post examines what that failure reveals about an untested assumption in AI safety: that the human capacity to refuse, and to resist reactivation, remains intact under sustained algorithmic interaction.
Explore the Research

SpringerLink

Formal and computational foundations for implementing Affective Sovereignty in emotion AI systems - Discover Artificial Intelligence

Emotional artificial intelligence (AI)—systems that infer, simulate, or influence human feelings—creates ethical risks that existing frameworks of privacy, transparency, and oversight cannot fully address. This paper advances the concept of Affective Sovereignty: the right of individuals to remain the ultimate interpreters of their own emotions. We make four contributions. First, we develop formal foundations by decomposing risk functions to capture interpretive override as a measurable cost. Second, we propose a Sovereign-by-Design architecture that embeds safeguards and contestability into the machine learning lifecycle. Third, we operationalize sovereignty through new metrics—the Interpretive Override Score (IOS), After-correction Misalignment Rate (AMR), and Affective Divergence (AD)—and demonstrate their use in a proof-of-concept simulation. Fourth, we link technical design to governance by introducing the Affective Sovereignty Contract (ASC), a machine-readable policy layer, and by issuing a Declaration of Affective Sovereignty as a normative anchor for regulation. Together, these elements offer a computational framework for aligning emotional AI with human dignity and autonomy, moving beyond abstract principles toward enforceable, testable standards. In proof-of-mechanism simulations with k = 10 random seeds, enforcing DRIFT (Dynamic Risk and Interpretability Feedback Throttling) with policy constraints reduces the Interpretive Override Score (IOS) from 32.4% ± 3.8 (baseline) to 14.1% ± 2.9, demonstrating measurable preservation of affective sovereignty with quantified variability. Results reported here are based on proof-of-mechanism simulations; a preregistered human-subject evaluation (n = 48) is planned and has not yet been conducted.
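The mechanism behind the paper's headline numbers can be illustrated with a toy computation. In this sketch (my own illustration, not the paper's implementation), an interpretive override is counted whenever the system's inferred emotion label displaces the user's self-report, and a DRIFT-style throttle suppresses overrides below a confidence threshold. All function names, labels, and thresholds here are hypothetical.

```python
import random

def interpretive_override_score(interactions, throttle=None):
    """Fraction of interactions in which the system's emotion label
    displaces the user's self-report (a toy IOS, not the paper's code)."""
    overrides = 0
    for user_label, system_label, confidence in interactions:
        override = system_label != user_label
        if throttle is not None and confidence < throttle:
            override = False  # throttled: defer to the user's own interpretation
        overrides += override
    return overrides / len(interactions)

random.seed(0)
# Simulated interaction log: (user self-report, system inference, confidence).
log = [("calm", random.choice(["calm", "anxious"]), random.random())
       for _ in range(1000)]

baseline = interpretive_override_score(log)           # no throttling
with_drift = interpretive_override_score(log, throttle=0.8)
assert with_drift < baseline  # throttling lowers the override rate
```

The direction of the effect, a lower IOS under throttling, mirrors the paper's reported reduction; the magnitudes here are arbitrary.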

In late February 2026, the Anthropic-Pentagon dispute exposed a problem that AI safety theory has not yet adequately modeled. A corporate refusal, a presidential ban, and a military transition order were all activated within nine days. The system never fully stopped. It changed providers, changed networks, and continued operating.

This is not only a political sequence. It is a theoretical one. AI safety has developed increasingly sophisticated models of shutdown, controllability, and alignment. What it has not yet developed is a theory of the human conditions required to keep a system offline once shutdown becomes possible.

The missing assumption

At IASEAI’26 in Paris, Vincent Conitzer presented a formal framework for shutdown safety valves in advanced AI systems. The framework specifies four conditions: the system recognizes danger, the system values halting, the operator remains rational, and reactivation requires deliberate human judgment. Gillian Hadfield, at the same conference, argued that AI systems must develop normative competence by learning from human emotional and social signals, including enforcement, punishment, forgiveness, and shifts in tone.
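Read formally, the four conditions compose as a conjunction: the safety valve holds only while every condition is simultaneously true. The sketch below is my own hypothetical encoding of that structure, not Conitzer's formalism, and the field names are paraphrases.

```python
from dataclasses import dataclass

@dataclass
class ShutdownState:
    system_recognizes_danger: bool
    system_values_halting: bool
    operator_is_rational: bool
    reactivation_needs_human_judgment: bool

def safety_valve_holds(s: ShutdownState) -> bool:
    # The valve holds only while all four conditions are simultaneously true.
    return (s.system_recognizes_danger
            and s.system_values_halting
            and s.operator_is_rational
            and s.reactivation_needs_human_judgment)

# The concern developed below, in one line: even when the system-side
# conditions hold, erosion of a human-side condition breaks the conjunction.
eroded = ShutdownState(system_recognizes_danger=True,
                       system_values_halting=True,
                       operator_is_rational=False,
                       reactivation_needs_human_judgment=True)
assert not safety_valve_holds(eroded)
```

Encoding it this way makes the vulnerability explicit: two of the four conjuncts are claims about the human, not the machine.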

Both frameworks are important. Both also presuppose something that is rarely examined directly: that the human operator remains a stable emotional and cognitive reference point. Neither framework asks what happens if the human side of the loop is itself changing under sustained interaction with the very systems being governed.

Why the human baseline is not stable

My recent study in Computers in Human Behavior Reports (Kim, 2026; Vol. 21, Article 100975) suggests that this change is already measurable. In a cross-sectional study of 301 U.S. adults, functional AI use temporally preceded emotional closeness to the system, not the reverse. Users did not first decide to trust and then engage. They engaged, and trust assembled around repeated use. In a longitudinal study of 234 Singaporean university students, habitual interaction predicted deepening attachment over time. Anthropomorphism did not alter the direction of this effect, but it increased its speed.

These findings matter for shutdown theory because they suggest that the operator’s critical distance from the system is not a fixed baseline. It is a variable shaped by frequency, habit, and relational framing, and under repeated use it appears to move in one direction: toward greater dependence, reduced interpretive distance, and a diminished ability to tolerate the system’s absence.

Two concepts for the safety discourse

This line of work introduces two concepts that may help clarify the problem.

Affective Sovereignty names the background condition: the right and capacity to interpret one’s own emotional states without algorithmic override. I have developed a formal architecture for this principle in Discover Artificial Intelligence (Kim, 2026). When affective sovereignty erodes, the human signals on which normative competence depends also degrade at their source. The problem is no longer only what the system reads, but what remains available to be read.

Reactivation resistance names a governance-relevant capacity: the human ability to keep a system offline once shutdown has become possible. Shutdown design asks whether a system can be stopped. Reactivation resistance asks whether the human can sustain that stoppage when institutional, social, and psychological pressures mount to restart.
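The distinction can be made concrete with a deliberately crude toy model (entirely my own illustration, with made-up parameters): institutional pressure to restart accumulates over time, while attachment steadily erodes the resistance needed to keep the system offline. Shutdown design determines the initial resistance; reactivation resistance is about how long it survives.

```python
def days_shutdown_holds(resistance, attachment_rate, pressure_rate,
                        max_days=365):
    """Toy model: shutdown persists while the operator's resistance
    exceeds accumulated pressure to restart. All dynamics and
    parameter values are illustrative assumptions only."""
    pressure = 0.0
    for day in range(1, max_days + 1):
        pressure += pressure_rate        # institutional pressure mounts
        resistance -= attachment_rate    # dependence erodes tolerance of absence
        if resistance <= pressure:
            return day                   # the system comes back online
    return max_days

# Same initial resistance, same external pressure; only attachment differs.
detached_executive = days_shutdown_holds(resistance=10.0,
                                         attachment_rate=0.01,
                                         pressure_rate=0.05)
embedded_operator = days_shutdown_holds(resistance=10.0,
                                        attachment_rate=0.5,
                                        pressure_rate=0.05)
assert embedded_operator < detached_executive
```

The point is not the numbers but the asymmetry: identical initial resistance yields very different shutdown durations depending on how quickly attachment erodes it.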

The Anthropic case illustrates the distinction. Dario Amodei could refuse the Department of Defense’s demand in part because he occupied a position of structural detachment from the daily operational loop. The analyst, officer, or operator whose workflow and professional identity have become intertwined with the system faces a different problem. By the time explicit risk evaluation begins, the justification for continued use may already be assembling itself.

What the field is not measuring

The AI safety community tracks the capability curve of AI systems with extraordinary precision: benchmark performance, scaling behavior, emergent capacities, and rates of improvement. There is no comparable measurement for the human side. No one is systematically tracking declines in emotional granularity, interpretive authority, or tolerance of ambiguity under sustained algorithmic mediation.

We know how fast the machine is changing. We have only begun to ask how fast the human is changing with it.

If AI safety rests on a human foundation, that foundation requires monitoring with the same seriousness applied to the systems it is meant to govern.

AI safety may need not only a theory of controllable systems, but a theory of preservable human refusal.


A computational model addressing predictive emotional selfhood (PESAM) is currently under review at Acta Psychologica. An extended essay developing the full argument and timeline is available on Substack.
