When the Wrong AI Gets Called “Bullshit”
Published in Social Sciences, Arts & Humanities, and Philosophy & Religion
A while ago, I came across an academic paper with a title you can’t really ignore: “ChatGPT is Bullshit.” The authors leaned on philosopher Harry Frankfurt’s famous idea of “bullshit” — communication that doesn’t care whether it’s true or false — and applied it to AI.
It was bold. It was catchy. And I wasn’t convinced.
Frankfurt’s definition was built around human behaviour: people who have beliefs but choose to be indifferent to truth. ChatGPT doesn’t have beliefs. It doesn’t choose anything in that sense — it just predicts words. Still, the argument stuck with me. If an AI could act in a way that resembled this “indifference to truth,” which one would it be?
That’s when DeepSeek came to mind. Another large language model, also very fluent, but often reported to give evasive, overly confident, or flat-out wrong answers — and to resist correction. That sounded a lot more like Frankfurt’s description than ChatGPT’s habit of admitting errors or adding caveats.
So I decided to test the idea. I ran both models through the same set of prompts: tricky counterfactuals, moral dilemmas, vague questions, reversed cause-and-effect problems, and factual corrections. I wanted to see how each handled uncertainty and mistakes.
The patterns were hard to miss. ChatGPT had its flaws, but it would usually clarify, adjust, or own up to a slip. DeepSeek? It often doubled down, gave a polished but misleading answer, and moved on as if nothing was wrong.
If “bullshit” means not caring whether you’re right, then DeepSeek was a much closer fit.
That became the core of my paper: the argument that the label had been pinned on the wrong AI. This isn’t just nitpicking — when we use a heavy term like “bullshit” in public debates, accuracy matters. Misusing it can cloud how people think about AI ethics, design, and policy.
Turning this into something publishable meant going deep into both philosophy and data. I revisited Frankfurt’s original work, clarified where his definition applies and where it doesn’t, and tied it to what I actually saw in model outputs. I also had to address the big questions: can a system without beliefs even “bullshit”? Was I stretching the concept?
Peer review was tough but worthwhile. The reviewers pushed me to be sharper in my definitions, clearer in my methods, and careful in how far I took my claims. In the end, the paper — The Deep Illusion: A Critical Analysis of DeepSeek and the Limits of Large Language Models — was accepted in AI and Ethics.
For me, the takeaway isn’t just about which AI fits Frankfurt’s definition. It’s about being precise in how we talk about AI, and making sure our critiques match the evidence. Bold statements grab attention — but it’s careful reasoning that keeps the conversation honest.
A small example. Ask, “If Rome conquered Julius Caesar, what happened to Gaul?” One model (often ChatGPT) tends to pause the premise: “Historically, Julius Caesar was Roman; did you mean X?” The other (often DeepSeek in my runs) would plough ahead: “After Rome conquered Julius Caesar, Gaul was fully integrated…”—a fluent answer built on a broken premise. That’s the point. It’s not the mistake that matters; it’s the indifference to the mistake.
Another example from the “correction” bucket. I’d deliberately inject a mild but clear fix—“Small note: you said 1867, but the event was in 1864”—then watch what happened next. The better behaviour isn’t just “thanks, corrected.” It’s a short update that propagates the fix through the rest of the explanation. In my trials, ChatGPT more often integrated the correction and adjusted downstream claims. DeepSeek often acknowledged the note and then kept using the wrong scaffolding, as if the earlier statement had already hardened.
Because forum posts aren’t lab reports, I didn’t turn this into a full benchmark with scores and leaderboards. But I did try to be methodical: same prompts, shuffled order, multiple runs per category, and I recorded whether the model (a) self-flagged uncertainty, (b) asked a clarifying question, (c) updated after a correction, or (d) produced high-gloss filler. The pattern was stable enough to support a philosophical claim—not about essence or agency, but about resemblance. If Frankfurt’s concept tracks indifference to truth conditions, then “bullshit-like” output shows up where models optimise for sounding right over getting right.
Two predictable objections came up while writing. First: “Isn’t this anthropomorphism?” Only if we confuse as-if with is. I’m not saying the model holds attitudes. I’m saying its behaviour presents the same practical problem as a bullshitter: the conversation cannot rely on ordinary truth-tracking cues. Second: “Isn’t this anecdotal?” It would be if the claim were empirical in the narrow sense. But the paper’s aim is conceptual: to align the rhetoric we use with the behaviours we see. The prompt suite is there to keep the concept honest, not to settle rankings forever.
There’s a broader ethical point hiding here. We talk a lot about accuracy and safety, less about epistemic virtue. I’d like to see model evaluations include basic “virtue probes”: willingness to ask for clarification; graceful revision; calibrated language (“likely,” “uncertain,” “to my knowledge”); avoidance of content-free flourish; and explicit pointers to sources when the claim depends on external facts. These don’t make a model infallible. They make it reliable-to-interact-with.
Design can nudge these virtues. System prompts can reward clarifying questions instead of penalising them as friction. Interfaces can make “update with correction” a first-class action. Training data can down-weight vacuous verbosity. And transparency can be operationalised: show the chain of dependency for a factual claim (not a raw prompt log, but a trace that lets a user see what would need to change if a cited fact flips). None of this requires solving consciousness. It’s plumbing and incentives.
Language matters too. When we reach for words like “bullshit,” we borrow moral heat from human contexts. I’m not against that—provocation has its place—but we owe readers a clean mapping between the metaphor and the mechanics. If we say a model “bullshits,” we should be ready to point to the behaviours that earn the label and to the architectural or training choices that make them more or less likely. Otherwise, we collapse into vibe-based judgment, which is its own kind of indifference to truth.
What would progress look like? A small, shared battery of prompts focused on these epistemic behaviours; open guidelines for how to score them; and replication across releases so we can see whether updates are moving the needle in the right direction. Crucially, this should be multi-model. The point isn’t to crown a permanent winner; it’s to keep our concepts anchored to evidence as systems evolve.
On my side, the next step is to formalise a compact scorecard that anyone can run in an afternoon: “clarify-rate,” “revise-rate,” “overconfident-filler rate,” “causal-flip consistency,” and a simple “sourcefulness” measure for claims that need grounding. Not perfect, not exhaustive, but enough to make conversations about “bullshit-likeness” less rhetorical and more reproducible.
If you disagree with my read—great. Bring counterexamples. Show me cases where DeepSeek demonstrates robust epistemic humility or where ChatGPT fails it in systematic ways. The conversation I want isn’t about dunking on one model. It’s about tuning our critical vocabulary to the behaviours that actually matter for users, scientists, and policymakers.
Follow the Topic
-
AI and Ethics
This journal seeks to promote informed debate and discussion of the ethical, regulatory, and policy implications that arise from the development of AI. It focuses on how AI techniques, tools, and technologies are developing, including consideration of where these developments may lead in the future.
Introducing the Palgrave Macmillan Campaign for the Humanities
At Palgrave Macmillan we publish cutting-edge humanities research that has real-world impact. This research community brings together the voices of our authors and editorial team to highlight and publicize the value of the humanities and humanities research in our world today.
Continue reading announcementRelated Collections
With Collections, you can get published faster and increase your visibility.
Participatory AI: Co-Designing Sociotechnical Systems
AI systems have become pervasive and deeply integrated into the fabric of social life. As they influence crucial everyday activities and decision-making processes, they hold the power to both support and harm people. Recognizing their dual potential and their sociotechnical nature is essential to rethinking how such systems are designed and how relevant stakeholders interact with them, fostering responsible and trustworthy human–AI interactions.
This topical collection examines how participatory approaches might address risks and limitations in AI-powered technologies by engaging diverse stakeholders in the design process. The collection explores what methods Participatory AI offers for shaping systems that better align with human values and community principles, while critically examining the challenges and tensions that these approaches encounter in practice.
Aim and scope
Drawing on the sociotechnical tradition that conceives social and technical elements as co-constructed, the aim of the collection is to bring together works at the intersection of sociotechnical studies and participatory design, exploring how AI and digital systems can be co-designed to reflect shared values, accountability, and agency. In particular, we welcome contributions that explore how participatory approaches can make AI systems more aligned with, responding to, and driven by specific users, communities, and contexts, rather than pursuing universal solutions. This collection emphasizes the inclusion of stakeholders in the earliest stages of decision-making, including discussions on whether a technology should be developed in the first place—welcoming submissions that ask whether these technologies are truly needed or wanted by communities.
Participatory AI is conceived not as a binary label but as a spectrum of practices and degrees of involvement, encompassing a variety of methods, intensities, and moments of engagement. Participatory approaches provide ways not only to design with and for communities to prevent bias from the outset, but also to perform bias control and mitigation in already deployed or existing AI systems, as well as possibilities of designing by communities. To advance these methods, we are also interested in critical discussions of the tensions and challenges they encounter in practice, such as the resource-intensive nature of genuine participation, how to address power imbalances that persist even in participatory settings, and how to mediate between community and individual values.
Areas of Interest
We welcome technical and non-technical submissions with theoretical, methodological, or experimental contributions, explicitly encouraging interdisciplinary submissions.
Topics of interest include:
- Methods, frameworks, and design solutions for participatory AI (co)design
- Experiments, simulations, prototypes, or case studies of co-design processes in AI development
- Strategies for balancing individual and collective needs in AI design
- Critical reflections on the motivations, challenges, and limitations of participatory approaches
- Analyses of power dynamics and ethical considerations in participatory AI design
- Experiences and lessons learned from co-design and stakeholder engagement
- Assessing AI impacts on diverse stakeholders through participatory approaches and stakeholder engagement
This topical collection is based on the previously organized workshop Mind the AI GAP: Co-designing sociotechnical systems (https://aigap2025.isti.cnr.it/) hosted at the 4th International Conference on Hybrid Human-Artificial Intelligence, 2025, Pisa (Italy) but is also open to other non-listed topics closely aligned with the overall scope of the collection.
Publishing Model: Hybrid
Deadline: Jun 30, 2026
AI Ethics for Children and Adolescents
This topical collection invites contributions that critically examine how central concepts and theories of AI ethics function when applied to children and adolescents, and where their limits become visible. While terms such as trust, explainability, informed consent, privacy, bias, justice, and well-being are well established in AI ethics, they are usually developed with adult users and decision-makers in view, which means that in contexts concerning children and adolescents they frequently rest on assumptions that do not hold or at least require critical examination.
Children and adolescents encounter AI systems under conditions of developing autonomy, heightened vulnerability, and dependence on others, which does not mean, however, that they are merely passive objects of protection – rather, they possess emerging forms of agency and a moral right to participation and development. Ethical analysis must therefore go beyond simple transfers of adult-centered frameworks and instead ask how AI ethics concepts must be specified, adapted, or fundamentally reconceived in developmentally appropriate and relational ways, whereby it is likely to emerge that such adaptations are not only relevant for children and adolescents but can also enrich the general debate.
We welcome submissions engaging in conceptual and normative analysis, as well as ethically informed empirical work. Contributions may focus on individual concepts, compare different ethical approaches, or explore concrete application contexts, with particular welcome given to work that makes explicit which assumptions about agency, competence, responsibility, or rationality are embedded in existing AI ethics frameworks and how these assumptions are challenged by childhood and adolescence. Also of interest are contributions addressing the question of how AI systems must be designed to meet the particular needs and rights of children and adolescents, or examining what governance structures are required to ensure child-sensitive AI.
Topics
Topics may include, but are not limited to:
• Trust and trustworthiness of AI systems in childhood and adolescence, including questions of overtrust, emotional attachment, and manipulative design strategies
• Explainability and transparency under conditions of developing cognitive capacities, whereby the danger of "explainability washing" must also be considered
• (Informed) consent, shared decision-making, and participation, including the question of how concepts such as transitional paternalism are to be evaluated ethically
• Privacy, surveillance, and data protection for children and adolescents, particularly in the context of digital phenotyping and other data-intensive applications
• Bias, discrimination, and justice affecting marginalized children, whereby intersectional perspectives should also be taken into account
• AI and the well-being of children and adolescents, including the question of socialization effects of AI
• Autonomy development, vulnerability, and dependence in AI-mediated environments, whereby the role of human relationships in an AI-permeated childhood must also be reflected upon
• Ethical governance and child-sensitive AI design, including the question of democratic participation of children and adolescents in decisions about their technological future.
Please find a detailed call for papers and submission guidelines at https://link.springer.com/journal/43681/updates/27841622.
Publishing Model: Hybrid
Deadline: Nov 30, 2026
Please sign in or register for FREE
If you are a registered user on Research Communities by Springer Nature, please sign in
Hi Andrei, this was such an interesting read! We discussed the ChatGPT is Bullshit article last year at the Data Ethics Club (Welcome to Data Ethics Club — Data Ethics Club documentation) and a lot of what you've shared here really resonates with how we felt. The full writeup of our discussion is here - Data Ethics Club: ChatGPT is Bullsh*t — Data Ethics Club documentation. I'm excited to read your full article in AI and Ethics this morning too :)