Affective Preconditions for AI Safety: What the Anthropic-Pentagon Dispute Reveals

Three shutdown mechanisms were activated in nine days during the Anthropic-Pentagon dispute. None held. This post examines what that failure reveals about an untested assumption in AI safety: that the human capacity to refuse, and to resist reactivation, remains intact under sustained algorithmic interaction.

Explore the Research

SpringerLink

Formal and computational foundations for implementing Affective Sovereignty in emotion AI systems - Discover Artificial Intelligence

Emotional artificial intelligence (AI)—systems that infer, simulate, or influence human feelings—create ethical risks that existing frameworks of privacy, transparency, and oversight cannot fully address. This paper advances the concept of Affective Sovereignty: the right of individuals to remain the ultimate interpreters of their own emotions. We make four contributions. First, we develop formal foundations by decomposing risk functions to capture interpretive override as a measurable cost. Second, we propose a Sovereign-by-Design architecture that embeds safeguards and contestability into the machine learning lifecycle. Third, we operationalize sovereignty through new metrics—the Interpretive Override Score (IOS), After-correction Misalignment Rate (AMR), and Affective Divergence (AD)—and demonstrate their use in a proof-of-concept simulation. Fourth, we link technical design to governance by introducing the Affective Sovereignty Contract (ASC), a machine-readable policy layer, and by issuing a Declaration of Affective Sovereignty as a normative anchor for regulation. Together, these elements offer a computational framework for aligning emotional AI with human dignity and autonomy, moving beyond abstract principles toward enforceable, testable standards. In proof-of-mechanism simulations with k = 10 random seeds, enforcing DRIFT (Dynamic Risk and Interpretability Feedback Throttling) with policy constraints reduces the Interpretive Override Score (IOS) from 32.4% ± 3.8 (baseline) to 14.1% ± 2.9, demonstrating measurable preservation of affective sovereignty with quantified variability. Results reported here are based on proof-of-mechanism simulations; a preregistered human-subject evaluation (n = 48) is planned and has not yet been conducted.

In late February 2026, the Anthropic-Pentagon dispute exposed a problem that AI safety theory has not yet adequately modeled. A corporate refusal, a presidential ban, and a military transition order were all activated within nine days. The system never fully stopped. It changed providers, changed networks, and continued operating.

This is not only a political sequence. It is a theoretical one. AI safety has developed increasingly sophisticated models of shutdown, controllability, and alignment. What it has not yet developed is a theory of the human conditions required to keep a system offline once shutdown becomes possible.

The missing assumption

At IASEAI’26 in Paris, Vincent Conitzer presented a formal framework for shutdown safety valves in advanced AI systems. The framework specifies four conditions: the system recognizes danger, the system values halting, the operator remains rational, and reactivation requires deliberate human judgment. Gillian Hadfield, at the same conference, argued that AI systems must develop normative competence by learning from human emotional and social signals, including enforcement, punishment, forgiveness, and shifts in tone.
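
As a minimal sketch of the shutdown-valve logic (my paraphrase for illustration, not Conitzer's formalism or notation), the four conditions behave as a conjunction: if any one of them fails, the valve fails.

```python
# Paraphrase of the four conditions named above, for illustration only;
# this is not Conitzer's formal framework.
from dataclasses import dataclass

@dataclass
class ShutdownState:
    system_recognizes_danger: bool
    system_values_halting: bool
    operator_remains_rational: bool
    reactivation_requires_human_judgment: bool

def shutdown_valve_holds(s: ShutdownState) -> bool:
    """The valve holds only if every condition is satisfied; a single
    failure (for example, an operator who can no longer keep the system
    off) defeats the mechanism as a whole."""
    return (
        s.system_recognizes_danger
        and s.system_values_halting
        and s.operator_remains_rational
        and s.reactivation_requires_human_judgment
    )
```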

Both frameworks are important. Both also presuppose something that is rarely examined directly: that the human operator remains a stable emotional and cognitive reference point. Neither framework asks what happens if the human side of the loop is itself changing under sustained interaction with the very systems being governed.

Why the human baseline is not stable

My recent study in Computers in Human Behavior Reports (Kim, 2026; Vol. 21, Article 100975) suggests that this change is already measurable. In a cross-sectional study of 301 U.S. adults, functional AI use temporally preceded emotional closeness to the system, not the reverse. Users did not first decide to trust and then engage. They engaged, and trust assembled around repeated use. In a longitudinal study of 234 Singaporean university students, habitual interaction predicted deepening attachment over time. Anthropomorphism did not alter the direction of this effect, but it increased its speed.
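
The logic behind that precedence claim can be illustrated with a simple two-wave, cross-lagged comparison. The sketch below runs on synthetic data and is not the published samples, measures, or models; it only shows the kind of test involved: does earlier use predict later closeness once earlier closeness is controlled, and is the reverse path weaker?

```python
# Synthetic two-wave illustration of a cross-lagged precedence check.
# The published studies' data, measures, and models are not reproduced here.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 234  # echoes the longitudinal sample size mentioned above
use_t1 = rng.normal(size=n)
closeness_t1 = 0.2 * use_t1 + rng.normal(size=n)
# A simulated world in which earlier use drives later closeness, not the reverse.
closeness_t2 = 0.5 * closeness_t1 + 0.4 * use_t1 + rng.normal(size=n)
use_t2 = 0.6 * use_t1 + rng.normal(size=n)

df = pd.DataFrame(dict(use_t1=use_t1, use_t2=use_t2,
                       closeness_t1=closeness_t1, closeness_t2=closeness_t2))

# Path of interest: earlier use -> later closeness, net of earlier closeness.
forward = smf.ols("closeness_t2 ~ closeness_t1 + use_t1", data=df).fit()
# Reverse path: earlier closeness -> later use, net of earlier use.
reverse = smf.ols("use_t2 ~ use_t1 + closeness_t1", data=df).fit()

print("use -> closeness:", round(forward.params["use_t1"], 2))
print("closeness -> use:", round(reverse.params["closeness_t1"], 2))
```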

These findings matter for shutdown theory because they suggest that the operator’s critical distance from the system is not a fixed baseline. It is a variable shaped by frequency, habit, and relational framing, and under repeated use it appears to move in one direction: toward greater dependence, reduced interpretive distance, and a diminished ability to tolerate the system’s absence.

Two concepts for the safety discourse

This line of work introduces two concepts that may help clarify the problem.

Affective Sovereignty names the background condition: the right and capacity to interpret one’s own emotional states without algorithmic override. I have developed a formal architecture for this principle in Discover Artificial Intelligence (Kim, 2026). When affective sovereignty erodes, the human signals on which normative competence depends also degrade at their source. The problem is no longer only what the system reads, but what remains available to be read.
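
To make interpretive override concrete, here is a deliberately simplified sketch of an override-rate metric in the spirit of the paper's Interpretive Override Score. The counting rule below is an illustration I am using for this post, not the formal definition in the paper: an override is logged when the system acts on its own inferred emotion label even though the user has supplied a conflicting self-report.

```python
# Illustrative override-rate metric; the formal IOS definition in the
# paper may differ. An override is counted when the system acts on its
# own inferred emotion label despite a conflicting user self-report.
from dataclasses import dataclass

@dataclass
class Interaction:
    user_self_report: str   # e.g. "calm"
    system_inference: str   # e.g. "anxious"
    label_acted_on: str     # the label the system ultimately used

def override_rate(interactions: list[Interaction]) -> float:
    """Fraction of interactions in which the system's inference
    displaced the user's own interpretation."""
    if not interactions:
        return 0.0
    overrides = sum(
        1 for ix in interactions
        if ix.label_acted_on == ix.system_inference
        and ix.label_acted_on != ix.user_self_report
    )
    return overrides / len(interactions)

log = [
    Interaction("calm", "anxious", "anxious"),
    Interaction("sad", "sad", "sad"),
    Interaction("angry", "frustrated", "frustrated"),
]
print(f"override rate: {override_rate(log):.0%}")  # 67%
```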

Reactivation resistance names a governance-relevant capacity: the human ability to keep a system offline once shutdown has become possible. Shutdown design asks whether a system can be stopped. Reactivation resistance asks whether the human can sustain that stoppage when institutional, social, and psychological pressures mount to restart.

The Anthropic case illustrates the distinction. Dario Amodei could refuse the Department of Defense’s demand in part because he occupied a position of structural detachment from the daily operational loop. The analyst, officer, or operator whose workflow and professional identity have become intertwined with the system faces a different problem. By the time explicit risk evaluation begins, the justification for continued use may already be assembling itself.
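
If reactivation resistance is to function as more than a metaphor, it has to be observable. One hypothetical operationalization, offered only as a starting point rather than a measure from any of the cited work, is to track how long each shutdown is sustained before the system is switched back on.

```python
# Hypothetical operationalization, not drawn from the cited work:
# reactivation resistance proxied by how long each shutdown is
# sustained before the system is switched back on.
from datetime import datetime, timedelta

def sustained_offline(shutdowns: list[datetime],
                      reactivations: list[datetime]) -> list[timedelta]:
    """Duration each shutdown held before the following reactivation."""
    return [on - off for off, on in zip(shutdowns, reactivations)]

offs = [datetime(2026, 2, 20, 9, 0), datetime(2026, 2, 24, 14, 0)]
ons  = [datetime(2026, 2, 21, 3, 0), datetime(2026, 2, 24, 20, 0)]
for held in sustained_offline(offs, ons):
    print(held)
# Output: 18:00:00, then 6:00:00. Durations shrinking across episodes
# would be one observable signature of eroding reactivation resistance.
```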

What the field is not measuring

The AI safety community tracks the capability curve of AI systems with extraordinary precision: benchmark performance, scaling behavior, emergent capacities, and rates of improvement. There is no comparable measurement for the human side. No one is systematically tracking declines in emotional granularity, interpretive authority, or tolerance of ambiguity under sustained algorithmic mediation.
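
Such tracking is feasible with existing tools. As one hedged illustration (my operationalization, not an instrument from any of the studies above): emotional granularity is commonly proxied by how interchangeably a person rates same-valence emotion terms in experience-sampling data, so a rising mean inter-item correlation across waves would signal declining granularity.

```python
# Hypothetical monitoring sketch, not an instrument from the studies above:
# emotional granularity proxied by how interchangeably same-valence emotion
# terms are rated across sampled moments (higher mean inter-item correlation
# means lower granularity).
import numpy as np

def granularity_proxy(ratings: np.ndarray) -> float:
    """ratings: (n_moments, n_emotion_items) momentary intensity ratings
    for same-valence items. Returns the mean pairwise correlation;
    values near 1.0 mean the items are being used interchangeably."""
    corr = np.corrcoef(ratings, rowvar=False)
    upper = corr[np.triu_indices_from(corr, k=1)]
    return float(np.mean(upper))

rng = np.random.default_rng(1)
wave_1 = rng.normal(size=(60, 5))                       # differentiated ratings
shared = rng.normal(size=(60, 1))
wave_2 = 0.8 * shared + 0.2 * rng.normal(size=(60, 5))  # ratings collapse together
print(round(granularity_proxy(wave_1), 2), round(granularity_proxy(wave_2), 2))
# A drift from the first value toward the second across waves is the kind of
# human-side signal that is not currently being tracked.
```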

We know how fast the machine is changing. We have only begun to ask how fast the human is changing with it.

If AI safety rests on a human foundation, that foundation requires monitoring with the same seriousness applied to the systems it is meant to govern.

AI safety may need not only a theory of controllable systems, but a theory of preservable human refusal.


A computational model addressing predictive emotional selfhood (PESAM) is currently under review at Acta Psychologica. An extended essay developing the full argument and timeline is available on Substack.
