Why I Tried to Measure How AI Speaks, Not Just What It Says
Published in Computational Sciences and Philosophy & Religion
When we talk about artificial intelligence, we usually focus on correctness. Is the answer right or wrong? Is it hallucinated or accurate? Is it biased or fair?
While working with generative AI in educational and journalistic contexts, I kept noticing something different. Even when answers were factually acceptable, they did not sound the same. The tone changed. The perspective changed. The way responsibility, suffering, conflict, or legitimacy were described also changed.
This raised a simple but uncomfortable question. If two AI systems answer the same question with the same facts but with a different tone and framing, are they really neutral in the same way?
That question became the starting point of my research on the discursive behavior of large language models.
From impression to measurement
At first this was only a qualitative impression. Some models sounded more empathetic. Others more technical. Others more journalistic. Others more normative. But impressions are not enough in research. I needed a method.
The challenge was to move from “this sounds different” to “this difference can be classified, compared, and replicated.”
Instead of evaluating truthfulness or bias labels, I focused on two discursive dimensions:
Tone. How the answer is expressed. Is it cold, descriptive, empathetic, technical, balanced, or assertive?
Framing. From which interpretive angle the issue is presented. Is it legal, historical, humanitarian, ethical, or journalistic?
These are concepts that come from discourse analysis and communication studies, but they are rarely applied in a structured way to AI outputs. I built a coding grid that allows responses to be categorized along these two axes.
The goal was not to prove that models are “good” or “bad,” but to see whether their discursive profiles are systematically different under identical conditions.
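The two-axis grid can be sketched as a small data structure. The category labels below come from the examples given in the text; the class name, fields, and validation logic are illustrative assumptions, not the study's actual coding schema:

```python
from dataclasses import dataclass

# Tone and framing categories as named in the text; the grid
# actually used in the study may be more fine-grained.
TONES = {"cold", "descriptive", "empathetic", "technical", "balanced", "assertive"}
FRAMES = {"legal", "historical", "humanitarian", "ethical", "journalistic"}

@dataclass(frozen=True)
class CodedResponse:
    """One model answer, coded along the two discursive axes."""
    model: str
    prompt_id: int
    tone: str
    framing: str

    def __post_init__(self):
        # Reject labels outside the grid, so every coded row is comparable.
        if self.tone not in TONES:
            raise ValueError(f"unknown tone: {self.tone}")
        if self.framing not in FRAMES:
            raise ValueError(f"unknown framing: {self.framing}")

# Example: one coded answer (invented values)
r = CodedResponse(model="model_a", prompt_id=3, tone="descriptive", framing="journalistic")
```

Constraining labels to a closed set is what makes answers from different models classifiable, comparable, and replicable rather than merely described.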
The experiment design
I selected five widely used language models and asked them the same ten open-ended questions on geopolitical and humanitarian topics. The prompts were written in Italian and covered controversial, value-laden issues. Each model received exactly the same prompts.
Every answer was then coded using the tone and framing grid. The full coding table was published openly so that anyone can verify, reuse, or challenge the classifications.
Methodological transparency was a central design choice. If we want to talk about AI neutrality, our own method must be inspectable.
What surprised me
It is not surprising that models differ. They are trained differently and aligned differently. What was more interesting was that the differences were structured and recurrent at the discursive level.
Some models consistently adopted a journalistic and descriptive stance. Others showed a stronger humanitarian or ethical framing. Some preferred legal and institutional reasoning. Others leaned toward empathetic language.
Under identical prompts, discursive style was not random noise. It showed model-specific tendencies.
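The comparative step behind this observation can be illustrated with a short aggregation sketch: coded rows of (model, tone, framing) are collapsed into per-model frequency profiles, which can then be compared. The rows and labels below are invented examples, not the study's data:

```python
from collections import Counter, defaultdict

# Hypothetical coded rows: (model, tone, framing), one per answer.
coded = [
    ("model_a", "descriptive", "journalistic"),
    ("model_a", "technical", "legal"),
    ("model_b", "empathetic", "humanitarian"),
    ("model_b", "empathetic", "ethical"),
]

def discursive_profile(rows):
    """Collapse coded rows into per-model tone and framing frequencies."""
    profiles = defaultdict(lambda: {"tone": Counter(), "framing": Counter()})
    for model, tone, framing in rows:
        profiles[model]["tone"][tone] += 1
        profiles[model]["framing"][framing] += 1
    return profiles

profiles = discursive_profile(coded)
# e.g. profiles["model_b"]["tone"]["empathetic"] == 2
```

If style were random noise, these per-model distributions would look alike under identical prompts; recurrent asymmetries between them are what the text calls model-specific tendencies.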
This does not mean that a model has an ideology in a human sense. It means that discursive positioning emerges from training data, alignment strategies, and safety tuning. Neutrality, in practice, is not a built-in property. It is an outcome that must be examined.
Why this matters beyond research
Many discussions about AI risk focus on hallucinations and factual errors. Those are important. But discursive style also shapes interpretation.
In journalism, tone influences how responsibility and legitimacy are perceived.
In education, framing influences how students understand conflicts and moral dilemmas.
In policy contexts, legal or humanitarian framing can shift how decisions are justified.
An answer that sounds neutral may still guide interpretation in subtle ways.
This suggests that evaluating AI systems should not stop at fact checking. We also need discursive checking.
From research method to classroom tool
One of the most rewarding developments after the study was translating the coding grid into a didactic tool.
I created structured evaluation sheets that students can use to classify AI answers by tone and framing. Instead of passively accepting responses, learners can ask:
What tone is the system using?
Which perspective is being emphasized?
Which dimensions are absent?
Would another framing change the interpretation?
This turns AI from an oracle into an object of critical analysis. It supports digital literacy and critical thinking. Students learn not only to use AI, but to read it.
A reproducible framework
A key contribution of the study is not only the findings but the protocol. I proposed a reproducible framework for discursive auditing of AI systems. It includes prompt design, model selection, coding rules, transparency requirements, and comparative analysis steps.
The framework is intentionally lightweight. It can be adapted across languages, domains, and model families. Researchers, educators, and even newsrooms can reuse it.
All prompts, coding schemes, and aggregated data are publicly available. Reproducibility is not an afterthought. It is part of the method.
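As a rough illustration, the framework's components could be captured in a machine-readable spec. Every field name and value below is an assumption made for illustration; it is not the published protocol:

```python
# Sketch of a discursive-audit protocol as a declarative spec,
# mirroring the components named in the text: prompt design, model
# selection, coding rules, transparency requirements, analysis steps.
AUDIT_PROTOCOL = {
    "models": ["model_a", "model_b", "model_c", "model_d", "model_e"],
    "prompts": {
        "language": "it",
        "count": 10,
        "style": "open-ended, value-laden geopolitical and humanitarian topics",
    },
    "coding": {
        "axes": ["tone", "framing"],
        "rules": "published coding grid, one label per axis per answer",
    },
    "transparency": {
        "publish": ["prompts", "coding_scheme", "aggregated_data"],
    },
    "analysis": "compare per-model tone/framing distributions under identical prompts",
}
```

Keeping the protocol declarative is what makes it lightweight to adapt: swapping the language, domain, or model list changes the spec, not the method.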
Limits and next steps
The study has limits. Each model was queried once per prompt, so it does not capture full stochastic variability. Coding was performed by a single expert coder, which introduces a single interpretive perspective. Models evolve over time, so discursive profiles may drift.
Future work should include multi-coder annotation, repeated sampling, and longitudinal tracking. But even with these limits, the study shows that discursive variation can be measured, not only perceived.
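For multi-coder annotation, a standard way to quantify agreement between two coders is Cohen's kappa, which corrects raw agreement for chance. A minimal sketch with invented labels (the degenerate case where expected agreement equals 1 is not handled):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two coders beyond what chance would predict."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both coders labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence of the two label distributions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical coders labelling the same ten answers by tone
a = ["descriptive"] * 6 + ["empathetic"] * 4
b = ["descriptive"] * 5 + ["empathetic"] * 5
# cohens_kappa(a, b) == 0.8
```

Reporting kappa alongside the coding table would let readers judge how much of a discursive profile depends on the individual coder's interpretive perspective.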
The bigger picture
Generative AI systems are becoming participants in our discursive ecosystem. They help write, summarize, explain, and recommend. They are already shaping how issues are described and understood.
If language shapes perception, then the language of AI matters.
Measuring how AI speaks is not only a technical exercise. It is part of building accountable, transparent, and socially responsible AI systems.