Why I Tried to Measure How AI Speaks, Not Just What It Says
Published in Computational Sciences and Philosophy & Religion
When we talk about artificial intelligence, we usually focus on correctness. Is the answer right or wrong? Is it hallucinated or accurate? Is it biased or fair?
While working with generative AI in educational and journalistic contexts, I kept noticing something different. Even when answers were factually acceptable, they did not sound the same. The tone changed. The perspective changed. The way responsibility, suffering, conflict, or legitimacy were described also changed.
This raised a simple but uncomfortable question. If two AI systems answer the same question with the same facts but with a different tone and framing, are they really neutral in the same way?
That question became the starting point of my research on the discursive behavior of large language models.
From impression to measurement
At first this was only a qualitative impression. Some models sounded more empathetic. Others more technical. Others more journalistic. Others more normative. But impressions are not enough in research. I needed a method.
The challenge was to move from “this sounds different” to “this difference can be classified, compared, and replicated.”
Instead of evaluating truthfulness or bias labels, I focused on two discursive dimensions:
Tone. How the answer is expressed. Is it cold, descriptive, empathetic, technical, balanced, or assertive?
Framing. From which interpretive angle the issue is presented. Is it legal, historical, humanitarian, ethical, or journalistic?
These are concepts that come from discourse analysis and communication studies, but they are rarely applied in a structured way to AI outputs. I built a coding grid that allows responses to be categorized along these two axes.
The goal was not to prove that models are “good” or “bad,” but to see whether their discursive profiles are systematically different under identical conditions.
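The two-axis grid can be sketched as a small data structure. The category labels below come from the examples given in the text; the class name, fields, and validation logic are illustrative assumptions, not the study's actual coding schema:

```python
from dataclasses import dataclass

# Tone and framing categories as named in the text; the grid
# actually used in the study may be more fine-grained.
TONES = {"cold", "descriptive", "empathetic", "technical", "balanced", "assertive"}
FRAMES = {"legal", "historical", "humanitarian", "ethical", "journalistic"}

@dataclass(frozen=True)
class CodedResponse:
    """One model answer, coded along the two discursive axes."""
    model: str
    prompt_id: int
    tone: str
    framing: str

    def __post_init__(self):
        # Reject labels outside the grid, so every coded row is comparable.
        if self.tone not in TONES:
            raise ValueError(f"unknown tone: {self.tone}")
        if self.framing not in FRAMES:
            raise ValueError(f"unknown framing: {self.framing}")

# Example: one coded answer (invented values)
r = CodedResponse(model="model_a", prompt_id=3, tone="descriptive", framing="journalistic")
```

Constraining labels to a closed set is what makes answers from different models classifiable, comparable, and replicable rather than merely described.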
The experiment design
I selected five widely used language models and asked them the same ten open-ended questions on geopolitical and humanitarian topics. The prompts were written in Italian and covered controversial, value-laden issues. Each model received exactly the same prompts.
Every answer was then coded using the tone and framing grid. The full coding table was published openly so that anyone can verify, reuse, or challenge the classifications.
Methodological transparency was a central design choice. If we want to talk about AI neutrality, our own method must be inspectable.
What surprised me
It is not surprising that models differ. They are trained differently and aligned differently. What was more interesting was that the differences were structured and recurrent at the discursive level.
Some models consistently adopted a journalistic and descriptive stance. Others showed a stronger humanitarian or ethical framing. Some preferred legal and institutional reasoning. Others leaned toward empathetic language.
Under identical prompts, discursive style was not random noise. It showed model-specific tendencies.
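The comparative step behind this observation can be illustrated with a short aggregation sketch: coded rows of (model, tone, framing) are collapsed into per-model frequency profiles, which can then be compared. The rows and labels below are invented examples, not the study's data:

```python
from collections import Counter, defaultdict

# Hypothetical coded rows: (model, tone, framing), one per answer.
coded = [
    ("model_a", "descriptive", "journalistic"),
    ("model_a", "technical", "legal"),
    ("model_b", "empathetic", "humanitarian"),
    ("model_b", "empathetic", "ethical"),
]

def discursive_profile(rows):
    """Collapse coded rows into per-model tone and framing frequencies."""
    profiles = defaultdict(lambda: {"tone": Counter(), "framing": Counter()})
    for model, tone, framing in rows:
        profiles[model]["tone"][tone] += 1
        profiles[model]["framing"][framing] += 1
    return profiles

profiles = discursive_profile(coded)
# e.g. profiles["model_b"]["tone"]["empathetic"] == 2
```

If style were random noise, these per-model distributions would look alike under identical prompts; recurrent asymmetries between them are what the text calls model-specific tendencies.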
This does not mean that a model has an ideology in a human sense. It means that discursive positioning emerges from training data, alignment strategies, and safety tuning. Neutrality, in practice, is not a built-in property. It is an outcome that must be examined.
Why this matters beyond research
Many discussions about AI risk focus on hallucinations and factual errors. Those are important. But discursive style also shapes interpretation.
In journalism, tone influences how responsibility and legitimacy are perceived.
In education, framing influences how students understand conflicts and moral dilemmas.
In policy contexts, legal or humanitarian framing can shift how decisions are justified.
An answer that sounds neutral may still guide interpretation in subtle ways.
This suggests that evaluating AI systems should not stop at fact checking. We also need discursive checking.
From research method to classroom tool
One of the most rewarding developments after the study was translating the coding grid into a didactic tool.
I created structured evaluation sheets that students can use to classify AI answers by tone and framing. Instead of passively accepting responses, learners can ask:
What tone is the system using?
Which perspective is being emphasized?
Which dimensions are absent?
Would another framing change the interpretation?
This turns AI from an oracle into an object of critical analysis. It supports digital literacy and critical thinking. Students learn not only to use AI, but to read it.
A reproducible framework
A key contribution of the study is not only the findings but the protocol. I proposed a reproducible framework for discursive auditing of AI systems. It includes prompt design, model selection, coding rules, transparency requirements, and comparative analysis steps.
The framework is intentionally lightweight. It can be adapted across languages, domains, and model families. Researchers, educators, and even newsrooms can reuse it.
All prompts, coding schemes, and aggregated data are publicly available. Reproducibility is not an afterthought. It is part of the method.
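As a rough illustration, the framework's components could be captured in a machine-readable spec. Every field name and value below is an assumption made for illustration; it is not the published protocol:

```python
# Sketch of a discursive-audit protocol as a declarative spec,
# mirroring the components named in the text: prompt design, model
# selection, coding rules, transparency requirements, analysis steps.
AUDIT_PROTOCOL = {
    "models": ["model_a", "model_b", "model_c", "model_d", "model_e"],
    "prompts": {
        "language": "it",
        "count": 10,
        "style": "open-ended, value-laden geopolitical and humanitarian topics",
    },
    "coding": {
        "axes": ["tone", "framing"],
        "rules": "published coding grid, one label per axis per answer",
    },
    "transparency": {
        "publish": ["prompts", "coding_scheme", "aggregated_data"],
    },
    "analysis": "compare per-model tone/framing distributions under identical prompts",
}
```

Keeping the protocol declarative is what makes it lightweight to adapt: swapping the language, domain, or model list changes the spec, not the method.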
Limits and next steps
The study has limits. Each model was queried once per prompt, so it does not capture full stochastic variability. Coding was performed by a single expert coder, which introduces a single interpretive perspective. Models evolve over time, so discursive profiles may drift.
Future work should include multi-coder annotation, repeated sampling, and longitudinal tracking. But even with these limits, the study shows that discursive variation can be measured, not only perceived.
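For multi-coder annotation, a standard way to quantify agreement between two coders is Cohen's kappa, which corrects raw agreement for chance. A minimal sketch with invented labels (the degenerate case where expected agreement equals 1 is not handled):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two coders beyond what chance would predict."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both coders labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence of the two label distributions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical coders labelling the same ten answers by tone
a = ["descriptive"] * 6 + ["empathetic"] * 4
b = ["descriptive"] * 5 + ["empathetic"] * 5
# cohens_kappa(a, b) == 0.8
```

Reporting kappa alongside the coding table would let readers judge how much of a discursive profile depends on the individual coder's interpretive perspective.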
The bigger picture
Generative AI systems are becoming participants in our discursive ecosystem. They help write, summarize, explain, and recommend. They are already shaping how issues are described and understood.
If language shapes perception, then the language of AI matters.
Measuring how AI speaks is not only a technical exercise. It is part of building accountable, transparent, and socially responsible AI systems.