T3 Talk2Text - A model for near real-time voice transcription in virtual group meetings

Group projects thrive on communication, but how can students revisit discussions effortlessly? We present T3 Talk2Text, an open-source tool for near real-time meeting transcription that supports reflection and collaboration analysis. Discover the tech behind it!

Explore the Research


T3 Talk2Text – A model for near real-time voice transcription in virtual group meetings - Discover Education

Group projects are an important part of many educational programs. In these projects, groupware tools like shared workspaces, shared editors, and synchronous or asynchronous communication tools (video conferencing, chat, email, forum) are often utilized. For synchronous collaboration, video conferencing systems are often used since they allow for direct and effortless informal communication via speech and video. If the group is to reflect on the content and way of communication and collaboration within and across meetings, the group needs access to the development of artifacts as well as the conversation of the meetings. While some video conferencing systems allow recording of a meeting, accessing content and dialogue structure requires real-time viewing and detailed note-taking. To reduce this effort, a video chat should include a transcription functionality that is able to support students in an online group work session by providing a video chat with automatic creation of a transcript. For this purpose, a video communication tool named T3 was developed that enables communication between learners and generates a transcript through an integrated automatic speech recognition (ASR) system. The transcript can be used to reflect on the conversations in order to recall the topics discussed or to identify group work problems, such as insufficient participation, coordination and collaboration problems. The implementation of T3 uses voice activity detection, WebRTC and ASR models to maintain a high level of quality in the transcription process. Initial functionality tests demonstrated the ability of T3 to create accurate group discussion transcripts, making it attractive for students and teachers to assess and improve communication and collaboration, and for researchers studying group discussions.

The Challenge: Capturing Group Discussions for Reflection

Group projects are a cornerstone of modern education, but remote collaboration introduces challenges like unequal participation, miscommunication, and the difficulty of recalling past discussions. Video conferencing tools facilitate real-time interaction, and some can record meetings, but a recording alone does not make a conversation easy to search or review. Manual note-taking is cumbersome, and transcribing recordings after a meeting is time-consuming.

We asked: How can we provide students and educators with an effortless way to capture and reflect on group discussions in near real-time?

Introducing T3 Talk2Text


Our solution, T3 Talk2Text, is an open-source web application that integrates:

  • WebRTC-based video conferencing (peer-to-peer, no expensive licenses)
  • Automatic speech recognition (ASR) with near real-time transcription via OpenAI’s Whisper model
  • On-demand summaries (using Llama3 for AI-generated insights)

Unlike commercial tools (e.g., Microsoft Teams), T3 prioritizes privacy (self-hosted), multilingual support, and accessibility (works on any device with a browser).

Figure: User interface of T3 in use

Key Innovations

  1. Voice Activity Detection (VAD) Pipeline

    • Filters background noise and segments speech for accurate ASR input.

    • Achieves a word error rate (WER) of 8.04% in German, comparable to human transcribers.

  2. Dynamic Transcript Formats
    Users can download transcripts as:

    • PDFs (messenger-style, with speaker alignment)

    • CSVs (structured for analysis)

    • AI summaries (condensed key points)

    Figure: Possible output formats of the dialog protocol

  3. Scalable Architecture

    • Lightweight SQLite storage for transcripts.

    • Peer-to-peer media streaming reduces server load.
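
For readers unfamiliar with the WER figure quoted above: it is the number of word substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation, not the evaluation code used for T3. The function name `word_error_rate` and the normalization (lowercasing and splitting on whitespace) are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only two rows of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only, not data from the T3 evaluation.
reference = "wir treffen uns morgen um zehn uhr"
hypothesis = "wir treffen uns morgen um zehn"
print(f"{word_error_rate(reference, hypothesis):.2%}")  # one deletion in seven words: 14.29%
```

Under this definition, a WER of 8.04% corresponds to roughly eight word-level errors per hundred reference words.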

Behind the Scenes: Overcoming Technical Hurdles

Challenge 1: Real-Time Processing
WebRTC’s low-latency streams were ideal for video chat but required careful buffering to feed Whisper ASR without delays. Our VAD component ensured only speech segments were processed, optimizing resource use.
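
To illustrate the idea of passing only speech segments on to the ASR model, here is a minimal energy-based VAD sketch. It is a simple stand-in, not the VAD component used in T3, which this post does not specify. The function names, the 480-sample frame (30 ms at an assumed 16 kHz sample rate), the RMS threshold, and the "hangover" of tolerated silent frames are all hypothetical choices for illustration.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_segments(
    samples: Sequence[float],
    frame_len: int = 480,       # assumed: 30 ms at 16 kHz
    threshold: float = 0.02,    # assumed RMS level separating speech from noise
    hangover_frames: int = 3,   # short pauses tolerated inside one segment
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of regions treated as speech."""
    segments: List[Tuple[int, int]] = []
    start = None
    silent_run = 0
    n_frames = len(samples) // frame_len

    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = k * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run > hangover_frames:
                # Close the segment at the end of the last loud frame.
                segments.append((start, (k - silent_run + 1) * frame_len))
                start = None
                silent_run = 0

    if start is not None:
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments


# Synthetic signal: silence, a short tone standing in for speech, silence.
tone = [0.5 * math.sin(2 * math.pi * 220 * n / 16000) for n in range(1440)]
signal = [0.0] * 960 + tone + [0.0] * 4800
print(speech_segments(signal))
```

Each returned segment could then be buffered and handed to the ASR model as one unit, so that silence and background noise never reach it. Production systems usually rely on a trained VAD model rather than a fixed energy threshold.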

Challenge 2: Multilingual Support
Whisper’s multilingual capabilities let T3 adapt to diverse classrooms, though future work will explore fine-tuning for non-native accents.

Challenge 3: Privacy-First Design
All data stays on institutional servers, and temporary audio files are deleted post-transcription, which is critical for GDPR compliance.
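
The delete-after-transcription behavior described above can be sketched as follows. This is an illustration of the pattern under stated assumptions, not T3's actual code: `transcribe_and_discard` and `fake_transcriber` are hypothetical names, and the transcriber is passed in as a plain function so the example runs without an ASR model installed.

```python
import os
import tempfile
from typing import Callable, List


def transcribe_and_discard(audio_bytes: bytes, transcribe: Callable[[str], str]) -> str:
    """Write audio to a temporary file, transcribe it, and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        return transcribe(path)
    finally:
        # Runs on success and on failure, so no audio is left on disk.
        if os.path.exists(path):
            os.remove(path)


# Hypothetical stand-in for the real ASR call; it records the path it was given.
seen_paths: List[str] = []


def fake_transcriber(path: str) -> str:
    seen_paths.append(path)
    return "hallo zusammen"


text = transcribe_and_discard(b"\x00\x01", fake_transcriber)
print(text, os.path.exists(seen_paths[0]))
```

Putting the deletion in a `finally` block means the audio is removed even when transcription raises an error, which is the case most easily overlooked.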

Impact and Future Directions

Initial tests with student groups showed that T3 integrated seamlessly into discussions without disrupting collaboration. Educators highlighted its potential for:

  • Identifying participation gaps (via speaker-labeled transcripts).

  • Conflict resolution (revisiting past dialogue).

  • Research (analyzing communication patterns across courses).
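
As a small example of the first use case, a speaker-labeled CSV transcript can be turned into per-speaker shares of the conversation. The sketch below is illustrative only: the column names `speaker` and `text` are assumptions, since the exact layout of T3's CSV export is not described here, and the sample rows are invented.

```python
import csv
import io
from collections import Counter
from typing import Dict


def participation_shares(csv_text: str) -> Dict[str, float]:
    """Share of all spoken words contributed by each speaker."""
    words: Counter = Counter()
    # Assumed columns: "speaker" and "text".
    for row in csv.DictReader(io.StringIO(csv_text)):
        words[row["speaker"]] += len(row["text"].split())
    total = sum(words.values())
    if total == 0:
        return {}
    return {speaker: count / total for speaker, count in words.items()}


# Invented sample transcript, not real meeting data.
sample = """speaker,text
Anna,Sollen wir mit der Gliederung anfangen
Ben,Ja
Anna,Ich übernehme die Einleitung und das Fazit
Cem,Ich mache die Auswertung
"""

for speaker, share in participation_shares(sample).items():
    print(f"{speaker}: {share:.0%}")
```

Word share is only one rough indicator of participation. Turn counts or speaking time could be computed the same way if the export includes timestamps.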

Next steps include:

  • Deploying Whisper Large Turbo for faster, even more accurate transcriptions.

  • Longitudinal studies in university courses to measure learning outcomes.

Try It Out!

T3 is open-source and available for institutions to adapt. We welcome collaborations to explore its use in classrooms and beyond.

Read the full paper: The Paper
Code repository: GitHub
Contact: Thomas.Kasakowskij@fernuni-hagen.de
