Behind the Paper

Can generative AI really improve language learning? What 51 studies tell us

Generative AI is rapidly entering language classrooms worldwide. But does it actually help students learn? Drawing on evidence from 51 studies, our meta-analysis reveals when, how, and for whom AI tools make the biggest difference.

Published in Computational Sciences and Education

Feb 25, 2026

Mirka Saarela, Prabha M. Kumarage & Sachini Gunasekara

Generative Artificial Intelligence (GenAI) tools have rapidly entered language classrooms, study groups, and self-study routines around the world. Students use them to practice writing, simulate conversations, receive instant feedback, and even brainstorm ideas in a new language. Teachers experiment with them as tutors, feedback providers, and lesson design assistants.

But amid the excitement, a crucial question remains: Do these tools actually improve language learning?

In our recent meta-analysis, we set out to answer that question systematically. Rather than focusing on one classroom or one tool, we synthesized evidence from 51 empirical studies, representing 175 independent effect sizes, to evaluate the overall impact of GenAI on second and foreign language learning.

Our goal was simple: move beyond anecdotes and isolated case studies, and provide a clear, evidence-based picture of what GenAI is doing in language education.
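
For readers unfamiliar with how a meta-analysis combines studies, the following is a minimal sketch of one common approach, random-effects pooling with the DerSimonian-Laird estimator. It is an illustration under stated assumptions, not the analysis code from the paper: the function name, the effect sizes, and the variances are all hypothetical, and a real analysis of 175 effect sizes drawn from 51 studies would also need to account for several effects coming from the same study.

```python
import math

def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling of study effect sizes."""
    # Fixed-effect weights are the inverse of each study's sampling variance.
    w = [1.0 / v for v in variances]
    fixed_mean = sum(wi * g for wi, g in zip(w, effects)) / sum(w)

    # Cochran's Q: how much the studies disagree beyond sampling error.
    q = sum(wi * (g - fixed_mean) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, floored at zero

    # Random-effects weights add tau^2 to every study's variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2

# Hypothetical effect sizes (Hedges' g) and variances, NOT data from the paper.
effects = [0.9, 1.2, 0.4, 0.8]
variances = [0.04, 0.09, 0.05, 0.06]
pooled, ci, tau2 = pool_random_effects(effects, variances)
print(f"pooled g = {pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), tau^2 = {tau2:.3f}")
```

The random-effects model is a natural fit when studies differ in tools, learners, and settings, because it assumes the true effect varies from study to study rather than being a single fixed value.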

What we found: Large and meaningful effects

Across studies, we found that GenAI tools produce large, statistically significant positive effects on language learning outcomes.

These benefits were visible in two broad areas:

- Language proficiency outcomes, such as writing quality, vocabulary acquisition, speaking performance, and grammar.
- Affective–cognitive outcomes, including learner confidence, reduced anxiety, and self-regulated learning.

In other words, GenAI tools appear to help learners not only perform better, but also feel more capable and engaged.

However, the story does not end there. The effects were not identical across contexts. One of the strengths of a meta-analysis is that it allows us to explore why results differ from one study to another.

When does GenAI work best? The role of context

We examined several “moderator” variables, that is, factors that might influence how effective GenAI tools are. Four patterns stood out.

1. Informal settings show especially strong effects

GenAI tools were particularly powerful in informal learning environments, such as self-directed study outside the classroom. In these contexts, learners often use AI as a conversational partner, writing assistant, or on-demand tutor.

This finding suggests that GenAI may be especially effective when learners have autonomy and can integrate it flexibly into their own routines.

2. Productive skills benefit more than receptive skills

The strongest gains were observed in productive skills, such as writing and speaking. This makes sense: GenAI excels at generating language, modeling responses, and providing feedback. These features naturally support output-focused practice.

3. Less commonly taught languages may benefit disproportionately

Interestingly, we observed stronger effects in studies involving less commonly taught languages. In many traditional contexts, learners of these languages have limited access to conversation partners or learning materials. GenAI tools may help close that gap by providing scalable, always-available interaction.

4. Intervention duration and learner characteristics matter

Not all interventions were equally long, and not all learners were the same. The impact of GenAI varied depending on how long it was used, who the learners were, and how the intervention was structured. This highlights that technology alone is not a magic solution—design and context still matter enormously.
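
Moderator comparisons like the four above are often run as subgroup analyses: studies are grouped by a characteristic, each group is pooled separately, and a between-group statistic checks whether the group means differ by more than chance. The sketch below illustrates the idea under simplifying assumptions. It uses fixed-effect weights within each subgroup, the function names are hypothetical, and the "formal" and "informal" numbers are invented for illustration rather than taken from the paper.

```python
def inverse_variance_mean(effects, variances):
    """Inverse-variance weighted mean and the variance of that mean."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    return mean, 1.0 / sum(weights)

def subgroup_summary(studies):
    """studies: list of (subgroup_label, effect_size, variance) tuples."""
    groups = {}
    for label, g, v in studies:
        groups.setdefault(label, []).append((g, v))

    # Pool each subgroup on its own.
    summary = {}
    for label, rows in groups.items():
        summary[label] = inverse_variance_mean(
            [g for g, _ in rows], [v for _, v in rows]
        )

    # Q_between: weighted squared distance of subgroup means from the overall mean.
    # It is compared against a chi-square distribution with (groups - 1) degrees
    # of freedom; with two subgroups, values above about 3.84 are significant
    # at the 5% level.
    overall, _ = inverse_variance_mean(
        [m for m, _ in summary.values()], [v for _, v in summary.values()]
    )
    q_between = sum((m - overall) ** 2 / v for m, v in summary.values())
    return summary, q_between

# Hypothetical studies, NOT data from the paper.
studies = [
    ("formal", 0.6, 0.05), ("formal", 0.7, 0.04), ("formal", 0.5, 0.06),
    ("informal", 1.1, 0.08), ("informal", 1.3, 0.07),
]
summary, q_between = subgroup_summary(studies)
for label, (mean, var) in summary.items():
    print(f"{label}: pooled g = {mean:.2f} (SE = {var ** 0.5:.2f})")
print(f"Q_between = {q_between:.2f}")
```

A large between-group statistic only says that the subgroups differ. It does not by itself show that the moderator caused the difference, since subgroups of studies can differ in other ways as well.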

Why might GenAI be so effective?

From a theoretical perspective, our findings align with sociocultural and interactionist theories of language learning. These frameworks emphasize that language develops through meaningful interaction, feedback, and scaffolded practice.

GenAI tools offer several affordances that support this process:

- Immediate, adaptive feedback
- Interactive dialogue simulation
- Low-pressure practice environments
- Opportunities for repeated output and revision

For many learners, practicing with AI can feel less intimidating than speaking in front of peers. This lower anxiety may partially explain the positive affective outcomes we observed.

At the same time, it is important to note that most existing studies focus on relatively short-term interventions. We still know less about the long-term developmental impact of sustained GenAI use.

What surprised us

As researchers, we were struck by how heavily the existing studies concentrate on writing and on short experimental designs. While the results are encouraging, the field is still developing.

Many studies rely on:

- Short interventions
- Self-reported measures
- A single-skill focus (often writing)

This means that although the overall effects are strong, we must interpret them carefully. The field would benefit from:

- Longitudinal research
- Coverage of more diverse language skills (especially speaking and listening)
- More rigorous experimental designs
- Research in varied cultural and educational contexts

What does this mean for teachers and learners?

Our results suggest that GenAI tools can be powerful complements to language instruction. They are not replacements for teachers, but they can expand opportunities for practice, feedback, and engagement.

For teachers, this means:

- Thoughtful integration matters more than mere adoption.
- Design choices (how long, how structured, for which skill) shape outcomes.
- Informal and autonomous uses can be particularly fruitful.

For learners, it suggests that using GenAI strategically (for example, drafting and revising writing, practicing conversations, or exploring vocabulary in context) can support both skill development and confidence.

Where do we go from here?

The rapid rise of GenAI in education presents both opportunity and responsibility. The technology is evolving quickly, often faster than research can keep pace with it.

Our meta-analysis provides encouraging evidence that GenAI can meaningfully support language proficiency and affective development. Yet it also highlights the need for:

- More longitudinal evidence
- Greater methodological rigor
- Clear pedagogical frameworks for responsible use

As researchers, our hope is not only to measure impact but to contribute to a deeper understanding of how and why GenAI works in language learning. The next phase of research should move beyond “Does it work?” to “Under what conditions does it work best—and for whom?”

Language learning has always been deeply human. What GenAI appears to offer is not a replacement for that humanity, but a new interactive space where learners can experiment, make mistakes, receive feedback, and grow.

The challenge now is to ensure that this powerful tool is used thoughtfully, equitably, and in ways that truly support learners around the world.

Reference:

Saarela, M., Gunasekara, S. & Kumarage, P. A meta-analysis of generative AI effects on language proficiency and affective–cognitive outcomes in language learning. Discov Computing 29, 116 (2026). https://doi.org/10.1007/s10791-026-10015-1
