Why I Tried to Measure How AI Speaks, Not Just What It Says
Published in Computational Sciences and Philosophy & Religion
When we talk about artificial intelligence, we usually focus on correctness. Is the answer right or wrong? Is it hallucinated or accurate? Is it biased or fair?
While working with generative AI in educational and journalistic contexts, I kept noticing something different. Even when answers were factually acceptable, they did not sound the same. The tone changed. The perspective changed. The way responsibility, suffering, conflict, or legitimacy were described also changed.
This raised a simple but uncomfortable question. If two AI systems answer the same question with the same facts but with a different tone and framing, are they really neutral in the same way?
That question became the starting point of my research on the discursive behavior of large language models.
From impression to measurement
At first this was only a qualitative impression. Some models sounded more empathetic. Others more technical. Others more journalistic. Others more normative. But impressions are not enough in research. I needed a method.
The challenge was to move from “this sounds different” to “this difference can be classified, compared, and replicated.”
Instead of evaluating truthfulness or bias labels, I focused on two discursive dimensions:
Tone. How the answer is expressed. Is it cold, descriptive, empathetic, technical, balanced, or assertive?
Framing. From which interpretive angle the issue is presented. Is it legal, historical, humanitarian, ethical, or journalistic?
These are concepts that come from discourse analysis and communication studies, but they are rarely applied in a structured way to AI outputs. I built a coding grid that allows responses to be categorized along these two axes.
The goal was not to prove that models are “good” or “bad,” but to see whether their discursive profiles are systematically different under identical conditions.
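To make the two-axis grid concrete, here is a minimal sketch of how such a coding scheme might be represented in code. The category labels are illustrative, drawn from the examples above; the study's actual grid may use different or additional categories, and `code_response` is a hypothetical helper, not part of the published protocol.

```python
# Illustrative two-axis coding grid (labels taken from the examples above).
TONES = {"cold", "descriptive", "empathetic", "technical", "balanced", "assertive"}
FRAMINGS = {"legal", "historical", "humanitarian", "ethical", "journalistic"}

def code_response(model: str, prompt_id: int, tone: str, framing: str) -> dict:
    """Record one coded answer, rejecting labels outside the grid."""
    if tone not in TONES:
        raise ValueError(f"unknown tone: {tone}")
    if framing not in FRAMINGS:
        raise ValueError(f"unknown framing: {framing}")
    return {"model": model, "prompt": prompt_id, "tone": tone, "framing": framing}
```

Constraining coders to a closed label set is what makes the classifications comparable and replicable across answers and across models.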
The experiment design
I selected five widely used language models and asked them the same ten open-ended questions on geopolitical and humanitarian topics. The prompts were written in Italian and covered controversial and value-laden issues. Each model received exactly the same prompts.
Every answer was then coded using the tone and framing grid. The full coding table was published openly so that anyone can verify, reuse, or challenge the classifications.
Methodological transparency was a central design choice. If we want to talk about AI neutrality, our own method must be inspectable.
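Once every answer is coded, the table can be aggregated into a per-model discursive profile. The records below are hypothetical stand-ins in the shape of the published coding table, not actual study data; the aggregation itself is just frequency counting.

```python
from collections import Counter

# Hypothetical coded records, shaped like rows of the published coding table.
records = [
    {"model": "A", "tone": "descriptive", "framing": "journalistic"},
    {"model": "A", "tone": "descriptive", "framing": "legal"},
    {"model": "B", "tone": "empathetic", "framing": "humanitarian"},
    {"model": "B", "tone": "empathetic", "framing": "ethical"},
]

def discursive_profile(records: list[dict], model: str) -> Counter:
    """Frequency of each (tone, framing) pair for one model's answers."""
    return Counter((r["tone"], r["framing"]) for r in records if r["model"] == model)
```

Comparing these profiles across models, under identical prompts, is what turns a qualitative impression into a checkable claim.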
What surprised me
It is not surprising that models differ. They are trained differently and aligned differently. What was more interesting was that the differences were structured and recurrent at the discursive level.
Some models consistently adopted a journalistic and descriptive stance. Others showed a stronger humanitarian or ethical framing. Some preferred legal-institutional reasoning. Others leaned toward empathetic language.
Under identical prompts, discursive style was not random noise. It showed model-specific tendencies.
This does not mean that a model has an ideology in a human sense. It means that discursive positioning emerges from training data, alignment strategies, and safety tuning. Neutrality, in practice, is not a built-in property. It is an outcome that must be examined.
Why this matters beyond research
Many discussions about AI risk focus on hallucinations and factual errors. Those are important. But discursive style also shapes interpretation.
In journalism, tone influences how responsibility and legitimacy are perceived.
In education, framing influences how students understand conflicts and moral dilemmas.
In policy contexts, legal or humanitarian framing can shift how decisions are justified.
An answer that sounds neutral may still guide interpretation in subtle ways.
This suggests that evaluating AI systems should not stop at fact checking. We also need discursive checking.
From research method to classroom tool
One of the most rewarding developments after the study was translating the coding grid into a didactic tool.
I created structured evaluation sheets that students can use to classify AI answers by tone and framing. Instead of passively accepting responses, learners can ask:
What tone is the system using?
Which perspective is being emphasized?
Which dimensions are absent?
Would another framing change the interpretation?
This turns AI from an oracle into an object of critical analysis. It supports digital literacy and critical thinking. Students learn not only to use AI, but to read it.
A reproducible framework
A key contribution of the study is not only the findings but the protocol. I proposed a reproducible framework for discursive auditing of AI systems. It includes prompt design, model selection, coding rules, transparency requirements, and comparative analysis steps.
The framework is intentionally lightweight. It can be adapted across languages, domains, and model families. Researchers, educators, and even newsrooms can reuse it.
All prompts, coding schemes, and aggregated data are publicly available. Reproducibility is not an afterthought. It is part of the method.
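The framework's comparative steps can be sketched as a small pipeline: send the same prompts to every model, code each answer, and aggregate per-model tone frequencies. In this sketch, `query_model` and `code_tone` are placeholders for a real model API call and a human coder applying the grid; they are assumptions of the example, not part of any published code.

```python
from collections import Counter

def audit(models, prompts, query_model, code_tone):
    """Lightweight discursive audit: identical prompts, coded answers,
    per-model tone distributions. query_model and code_tone are injected
    so the protocol stays independent of any specific model or coder."""
    table = []
    for model in models:
        for pid, prompt in enumerate(prompts):
            answer = query_model(model, prompt)
            table.append({"model": model, "prompt": pid, "tone": code_tone(answer)})
    profiles = {m: Counter(r["tone"] for r in table if r["model"] == m)
                for m in models}
    return table, profiles
```

Because the coding step is a plug-in function, the same skeleton works across languages, domains, and model families, which is the sense in which the framework is lightweight and reusable.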
Limits and next steps
The study has limits. Each model was queried once per prompt, so the results do not capture full stochastic variability. Coding was performed by a single expert coder, so the classifications reflect one interpretive perspective. Models evolve over time, so discursive profiles may drift.
Future work should include multi-coder annotation, repeated sampling, and longitudinal tracking. But even with these limits, the study shows that discursive variation can be measured, not only perceived.
The bigger picture
Generative AI systems are becoming participants in our discursive ecosystem. They help write, summarize, explain, and recommend. They are already shaping how issues are described and understood.
If language shapes perception, then the language of AI matters.
Measuring how AI speaks is not only a technical exercise. It is part of building accountable, transparent, and socially responsible AI systems.