Affective Preconditions for AI Safety: What the Anthropic-Pentagon Dispute Reveals

Three shutdown mechanisms were activated in nine days during the Anthropic-Pentagon dispute. None held. This post examines what that failure reveals about an untested assumption in AI safety: that the human capacity to refuse, and to resist reactivation, remains intact under sustained algorithmic interaction.
Explore the Research

SpringerLink

Formal and computational foundations for implementing Affective Sovereignty in emotion AI systems - Discover Artificial Intelligence

Emotional artificial intelligence (AI)—systems that infer, simulate, or influence human feelings—creates ethical risks that existing frameworks of privacy, transparency, and oversight cannot fully address. This paper advances the concept of Affective Sovereignty: the right of individuals to remain the ultimate interpreters of their own emotions. We make four contributions. First, we develop formal foundations by decomposing risk functions to capture interpretive override as a measurable cost. Second, we propose a Sovereign-by-Design architecture that embeds safeguards and contestability into the machine learning lifecycle. Third, we operationalize sovereignty through new metrics—the Interpretive Override Score (IOS), After-correction Misalignment Rate (AMR), and Affective Divergence (AD)—and demonstrate their use in a proof-of-concept simulation. Fourth, we link technical design to governance by introducing the Affective Sovereignty Contract (ASC), a machine-readable policy layer, and by issuing a Declaration of Affective Sovereignty as a normative anchor for regulation. Together, these elements offer a computational framework for aligning emotional AI with human dignity and autonomy, moving beyond abstract principles toward enforceable, testable standards. In proof-of-mechanism simulations with k = 10 random seeds, enforcing DRIFT (Dynamic Risk and Interpretability Feedback Throttling) with policy constraints reduces the Interpretive Override Score (IOS) from 32.4% ± 3.8 (baseline) to 14.1% ± 2.9, demonstrating measurable preservation of affective sovereignty with quantified variability. Results reported here are based on proof-of-mechanism simulations; a preregistered human-subject evaluation (n = 48) is planned and has not yet been conducted.
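The mechanism behind the paper's headline numbers can be illustrated with a toy computation. In this sketch (my own illustration, not the paper's implementation), an interpretive override is counted whenever the system's inferred emotion label displaces the user's self-report, and a DRIFT-style throttle suppresses overrides below a confidence threshold. All function names, labels, and thresholds here are hypothetical.

```python
import random

def interpretive_override_score(interactions, throttle=None):
    """Fraction of interactions in which the system's emotion label
    displaces the user's self-report (a toy IOS, not the paper's code)."""
    overrides = 0
    for user_label, system_label, confidence in interactions:
        override = system_label != user_label
        if throttle is not None and confidence < throttle:
            override = False  # throttled: defer to the user's own interpretation
        overrides += override
    return overrides / len(interactions)

random.seed(0)
# Simulated interaction log: (user self-report, system inference, confidence).
log = [("calm", random.choice(["calm", "anxious"]), random.random())
       for _ in range(1000)]

baseline = interpretive_override_score(log)           # no throttling
with_drift = interpretive_override_score(log, throttle=0.8)
assert with_drift < baseline  # throttling lowers the override rate
```

The direction of the effect, a lower IOS under throttling, mirrors the paper's reported reduction; the magnitudes here are arbitrary.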

In late February 2026, the Anthropic-Pentagon dispute exposed a problem that AI safety theory has not yet adequately modeled. A corporate refusal, a presidential ban, and a military transition order were all activated within nine days. The system never fully stopped. It changed providers, changed networks, and continued operating.

This is not only a political sequence. It is a theoretical one. AI safety has developed increasingly sophisticated models of shutdown, controllability, and alignment. What it has not yet developed is a theory of the human conditions required to keep a system offline once shutdown becomes possible.

The missing assumption

At IASEAI’26 in Paris, Vincent Conitzer presented a formal framework for shutdown safety valves in advanced AI systems. The framework specifies four conditions: the system recognizes danger, the system values halting, the operator remains rational, and reactivation requires deliberate human judgment. Gillian Hadfield, at the same conference, argued that AI systems must develop normative competence by learning from human emotional and social signals, including enforcement, punishment, forgiveness, and shifts in tone.
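Read formally, the four conditions compose as a conjunction: the safety valve holds only while every condition is simultaneously true. The sketch below is my own hypothetical encoding of that structure, not Conitzer's formalism, and the field names are paraphrases.

```python
from dataclasses import dataclass

@dataclass
class ShutdownState:
    system_recognizes_danger: bool
    system_values_halting: bool
    operator_is_rational: bool
    reactivation_needs_human_judgment: bool

def safety_valve_holds(s: ShutdownState) -> bool:
    # The valve holds only while all four conditions are simultaneously true.
    return (s.system_recognizes_danger
            and s.system_values_halting
            and s.operator_is_rational
            and s.reactivation_needs_human_judgment)

# The concern developed below, in one line: even when the system-side
# conditions hold, erosion of a human-side condition breaks the conjunction.
eroded = ShutdownState(system_recognizes_danger=True,
                       system_values_halting=True,
                       operator_is_rational=False,
                       reactivation_needs_human_judgment=True)
assert not safety_valve_holds(eroded)
```

Encoding it this way makes the vulnerability explicit: two of the four conjuncts are claims about the human, not the machine.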

Both frameworks are important. Both also presuppose something that is rarely examined directly: that the human operator remains a stable emotional and cognitive reference point. Neither framework asks what happens if the human side of the loop is itself changing under sustained interaction with the very systems being governed.

Why the human baseline is not stable

My recent study in Computers in Human Behavior Reports (Kim, 2026; Vol. 21, Article 100975) suggests that this change is already measurable. In a cross-sectional study of 301 U.S. adults, functional AI use temporally preceded emotional closeness to the system, not the reverse. Users did not first decide to trust and then engage. They engaged, and trust assembled around repeated use. In a longitudinal study of 234 Singaporean university students, habitual interaction predicted deepening attachment over time. Anthropomorphism did not alter the direction of this effect, but it increased its speed.

These findings matter for shutdown theory because they suggest that the operator’s critical distance from the system is not a fixed baseline. It is a variable shaped by frequency, habit, and relational framing, and under repeated use it appears to move in one direction: toward greater dependence, reduced interpretive distance, and a diminished ability to tolerate the system’s absence.

Two concepts for the safety discourse

This line of work introduces two concepts that may help clarify the problem.

Affective Sovereignty names the background condition: the right and capacity to interpret one’s own emotional states without algorithmic override. I have developed a formal architecture for this principle in Discover Artificial Intelligence (Kim, 2026). When affective sovereignty erodes, the human signals on which normative competence depends also degrade at their source. The problem is no longer only what the system reads, but what remains available to be read.

Reactivation resistance names a governance-relevant capacity: the human ability to keep a system offline once shutdown has become possible. Shutdown design asks whether a system can be stopped. Reactivation resistance asks whether the human can sustain that stoppage when institutional, social, and psychological pressures mount to restart.
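The distinction can be made concrete with a deliberately crude toy model (entirely my own illustration, with made-up parameters): institutional pressure to restart accumulates over time, while attachment steadily erodes the resistance needed to keep the system offline. Shutdown design determines the initial resistance; reactivation resistance is about how long it survives.

```python
def days_shutdown_holds(resistance, attachment_rate, pressure_rate,
                        max_days=365):
    """Toy model: shutdown persists while the operator's resistance
    exceeds accumulated pressure to restart. All dynamics and
    parameter values are illustrative assumptions only."""
    pressure = 0.0
    for day in range(1, max_days + 1):
        pressure += pressure_rate        # institutional pressure mounts
        resistance -= attachment_rate    # dependence erodes tolerance of absence
        if resistance <= pressure:
            return day                   # the system comes back online
    return max_days

# Same initial resistance, same external pressure; only attachment differs.
detached_executive = days_shutdown_holds(resistance=10.0,
                                         attachment_rate=0.01,
                                         pressure_rate=0.05)
embedded_operator = days_shutdown_holds(resistance=10.0,
                                        attachment_rate=0.5,
                                        pressure_rate=0.05)
assert embedded_operator < detached_executive
```

The point is not the numbers but the asymmetry: identical initial resistance yields very different shutdown durations depending on how quickly attachment erodes it.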

The Anthropic case illustrates the distinction. Dario Amodei could refuse the Department of Defense’s demand in part because he occupied a position of structural detachment from the daily operational loop. The analyst, officer, or operator whose workflow and professional identity have become intertwined with the system faces a different problem. By the time explicit risk evaluation begins, the justification for continued use may already be assembling itself.

What the field is not measuring

The AI safety community tracks the capability curve of AI systems with extraordinary precision: benchmark performance, scaling behavior, emergent capacities, and rates of improvement. There is no comparable measurement for the human side. No one is systematically tracking declines in emotional granularity, interpretive authority, or tolerance of ambiguity under sustained algorithmic mediation.

We know how fast the machine is changing. We have only begun to ask how fast the human is changing with it.

If AI safety rests on a human foundation, that foundation requires monitoring with the same seriousness applied to the systems it is meant to govern.

AI safety may need not only a theory of controllable systems, but a theory of preservable human refusal.


A computational model addressing predictive emotional selfhood (PESAM) is currently under review at Acta Psychologica. An extended essay developing the full argument and timeline is available on Substack.
