Generative Artificial Intelligence (GenAI) tools have rapidly entered language classrooms, study groups, and self-study routines around the world. Students use them to practice writing, simulate conversations, receive instant feedback, and even brainstorm ideas in a new language. Teachers experiment with them as tutors, feedback providers, and lesson design assistants.
But amid the excitement, a crucial question remains: Do these tools actually improve language learning?
In our recent meta-analysis, we set out to answer that question systematically. Rather than focusing on one classroom or one tool, we synthesized evidence from 51 empirical studies, representing 175 independent effect sizes, to evaluate the overall impact of GenAI on second and foreign language learning.
Our goal was simple: move beyond anecdotes and isolated case studies, and provide a clear, evidence-based picture of what GenAI is doing in language education.
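For readers curious about what "synthesizing" effect sizes involves, the sketch below shows one common way to pool them: a random-effects model with the DerSimonian-Laird estimator. It is an illustration only, not necessarily the model used in our paper. The function name `pool_random_effects` and the example numbers are hypothetical, and the sketch treats every effect size as independent, ignoring the fact that several effect sizes can come from the same study.

```python
import math

def pool_random_effects(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model.

    effects:   standardized effect sizes (e.g. Hedges' g), one per comparison
    variances: sampling variance of each effect size
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and mean, needed for Q.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # Between-study variance (tau^2), truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "q": q,
    }

# Hypothetical effect sizes and variances, for illustration only.
example = pool_random_effects([0.9, 0.4, 1.2, 0.7], [0.04, 0.06, 0.10, 0.05])
print(example)
```

The random-effects choice matters here: it assumes that the true effect differs from study to study, which fits a literature that spans different tools, languages, and learners.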
What we found: Large and meaningful effects
Across studies, we found that GenAI tools produce large, statistically significant positive effects on language learning outcomes.
These benefits were visible in two broad areas:
- Language proficiency outcomes, such as writing quality, vocabulary acquisition, speaking performance, and grammar.
- Affective–cognitive outcomes, including learner confidence, reduced anxiety, and self-regulated learning.
In other words, GenAI tools appear to help learners not only perform better, but also feel more capable and engaged.
However, the story does not end there. The effects were not identical across contexts. One of the strengths of a meta-analysis is that it allows us to explore why results differ from one study to another.
When does GenAI work best? The role of context
We examined several “moderator” variables—factors that might influence how effective GenAI tools are. Four patterns stood out.
1. Informal settings show especially strong effects
GenAI tools were particularly powerful in informal learning environments, such as self-directed study outside the classroom. In these contexts, learners often use AI as a conversational partner, writing assistant, or on-demand tutor.
This finding suggests that GenAI may be especially effective when learners have autonomy and can integrate it flexibly into their own routines.
2. Productive skills benefit more than receptive skills
The strongest gains were observed in productive skills, such as writing and speaking. This makes sense: GenAI excels at generating language, modeling responses, and providing feedback. These features naturally support output-focused practice.
3. Less commonly taught languages may benefit disproportionately
Interestingly, we observed stronger effects in studies involving less commonly taught languages. In many traditional contexts, learners of these languages have limited access to conversation partners or learning materials. GenAI tools may help close that gap by providing scalable, always-available interaction.
4. Intervention duration and learner characteristics matter
Not all interventions were equally long, and not all learners were the same. The impact of GenAI varied depending on how long it was used, who the learners were, and how the intervention was structured. This highlights that technology alone is not a magic solution—design and context still matter enormously.
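A moderator analysis of this kind can be pictured as grouping effect sizes by a study characteristic and comparing the pooled means. The sketch below is a simplified illustration, not our actual analysis code. It uses plain inverse-variance weighting within each subgroup, and the function names, the setting labels, and the numbers are all hypothetical.

```python
from collections import defaultdict

def inverse_variance_mean(effects, variances):
    """Return the inverse-variance weighted mean and the total weight."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    return mean, total

def subgroup_summary(records):
    """Pool effect sizes within each moderator level.

    records: iterable of (label, effect, variance) tuples
    Returns per-subgroup summaries and the between-group Q statistic.
    """
    groups = defaultdict(list)
    for label, effect, variance in records:
        groups[label].append((effect, variance))

    summary = {}
    for label, rows in groups.items():
        mean, weight = inverse_variance_mean(
            [e for e, _ in rows], [v for _, v in rows]
        )
        summary[label] = {"k": len(rows), "mean": mean, "weight": weight}

    # Between-group Q: weighted squared deviations of the subgroup means
    # from the grand mean. Larger values suggest the moderator matters.
    grand_weight = sum(s["weight"] for s in summary.values())
    grand_mean = sum(s["weight"] * s["mean"] for s in summary.values()) / grand_weight
    q_between = sum(
        s["weight"] * (s["mean"] - grand_mean) ** 2 for s in summary.values()
    )
    return summary, q_between

# Hypothetical records: (learning setting, effect size, sampling variance).
records = [
    ("informal", 1.1, 0.08),
    ("informal", 0.9, 0.05),
    ("formal", 0.6, 0.04),
    ("formal", 0.5, 0.06),
]
summary, q_between = subgroup_summary(records)
print(summary, q_between)
```

With as many subgroups as there are moderator levels, the between-group Q would typically be compared against a chi-square distribution with (number of subgroups minus one) degrees of freedom.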
Why might GenAI be so effective?
From a theoretical perspective, our findings align with sociocultural and interactionist theories of language learning. These frameworks emphasize that language develops through meaningful interaction, feedback, and scaffolded practice.
GenAI tools offer several affordances that support this process:
- Immediate, adaptive feedback
- Interactive dialogue simulation
- Low-pressure practice environments
- Opportunities for repeated output and revision
For many learners, practicing with AI can feel less intimidating than speaking in front of peers. This lower anxiety may partially explain the positive affective outcomes we observed.
At the same time, it is important to note that most existing studies focus on relatively short-term interventions. We still know far less about the long-term developmental impact of sustained GenAI use than about its short-term effects.
What surprised us
One pattern that struck us as researchers was how heavily the existing studies concentrate on writing and on short experimental designs. While the results are encouraging, the field is still developing.
Many studies rely on:
- Short interventions
- Self-reported measures
- A single-skill focus (often writing)
This means that although the overall effects are strong, we must interpret them carefully. The field would benefit from:
- Longitudinal research
- Coverage of more diverse language skills (especially speaking and listening)
- More rigorous experimental designs
- Research in varied cultural and educational contexts
What does this mean for teachers and learners?
Our results suggest that GenAI tools can be powerful complements to language instruction. They are not replacements for teachers, but they can expand opportunities for practice, feedback, and engagement.
For teachers, this means:
- Thoughtful integration matters more than mere adoption.
- Design choices (how long, how structured, for which skill) shape outcomes.
- Informal and autonomous uses can be particularly fruitful.
For learners, it suggests that using GenAI strategically (for example, drafting and revising writing, practicing conversations, or exploring vocabulary in context) can support both skill development and confidence.
Where do we go from here?
The rapid rise of GenAI in education presents both opportunity and responsibility. The technology is evolving quickly, often faster than research can keep up.
Our meta-analysis provides encouraging evidence that GenAI can meaningfully support language proficiency and affective development. Yet it also highlights the need for:
- More longitudinal evidence
- Greater methodological rigor
- Clear pedagogical frameworks for responsible use
As researchers, our hope is not only to measure impact but to contribute to a deeper understanding of how and why GenAI works in language learning. The next phase of research should move beyond “Does it work?” to “Under what conditions does it work best—and for whom?”
Language learning has always been deeply human. What GenAI appears to offer is not a replacement for that humanity, but a new interactive space where learners can experiment, make mistakes, receive feedback, and grow.
The challenge now is to ensure that this powerful tool is used thoughtfully, equitably, and in ways that truly support learners around the world.
Reference:
Saarela, M., Gunasekara, S. & Kumarage, P. A meta-analysis of generative AI effects on language proficiency and affective–cognitive outcomes in language learning. Discov Computing 29, 116 (2026). https://doi.org/10.1007/s10791-026-10015-1