The Challenge: Capturing Group Discussions for Reflection
Group projects are a cornerstone of modern education, but remote collaboration introduces challenges like unequal participation, miscommunication, and the difficulty of recalling past discussions. While video conferencing tools facilitate real-time interaction, they lack built-in support for documenting conversations. Manual note-taking is cumbersome, and post-meeting transcriptions from recordings are time-consuming.
We asked: How can we provide students and educators with an effortless way to capture and reflect on group discussions in near real-time?
Introducing T3 Talk2Text
Our solution, T3 Talk2Text, is an open-source web application that integrates:
- WebRTC-based video conferencing (peer-to-peer, no expensive licenses)
- Automatic speech recognition (ASR) with real-time transcription via OpenAI’s Whisper model
- On-demand summaries (generated with Llama 3)
Unlike commercial tools (e.g., Microsoft Teams), T3 prioritizes privacy (self-hosted), multilingual support, and accessibility (works on any device with a browser).
Key Innovations
- Voice Activity Detection (VAD) Pipeline
  - Filters background noise and segments speech for accurate ASR input.
  - Achieves an 8.04% Word Error Rate (WER) in German, comparable to human transcribers.
- Dynamic Transcript Formats
  Users can download transcripts as:
  - PDFs (messenger-style, with speaker alignment)
  - CSVs (structured for analysis)
  - AI summaries (condensed key points)
- Scalable Architecture
  - Lightweight SQLite storage for transcripts.
  - Peer-to-peer media streaming reduces server load.
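To make the storage and export items in the list above more concrete, here is a minimal sketch of how transcript segments could be kept in SQLite and exported as CSV. The table layout, the column names, and the helper functions (`open_store`, `add_segment`, `export_csv`) are assumptions made for illustration; they are not taken from the T3 code base.

```python
import csv
import io
import sqlite3

# Hypothetical schema: one row per transcribed speech segment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS segments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    meeting_id TEXT NOT NULL,
    speaker TEXT NOT NULL,
    start_s REAL NOT NULL,
    text TEXT NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the transcript database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_segment(conn, meeting_id, speaker, start_s, text):
    """Store one transcribed segment; the 'with' block commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO segments (meeting_id, speaker, start_s, text) "
            "VALUES (?, ?, ?, ?)",
            (meeting_id, speaker, start_s, text),
        )


def export_csv(conn, meeting_id):
    """Return one meeting's transcript as CSV text, ordered by start time."""
    rows = conn.execute(
        "SELECT speaker, start_s, text FROM segments "
        "WHERE meeting_id = ? ORDER BY start_s",
        (meeting_id,),
    )
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["speaker", "start_s", "text"])
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `export_csv(conn, "m1")` after a few `add_segment` calls yields a header row followed by one line per segment. Using the `csv` module rather than manual string joining keeps commas and quotes inside the spoken text from breaking the file.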
Behind the Scenes: Overcoming Technical Hurdles
Challenge 1: Real-Time Processing
WebRTC’s low-latency streams were ideal for video chat but required careful buffering to feed Whisper ASR without delays. Our VAD component ensured only speech segments were processed, optimizing resource use.
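The idea of passing only speech to the recognizer can be illustrated with a deliberately simple energy-threshold detector. This is a stand-in for illustration only: the VAD used in T3 is not described here and may work quite differently, and the function names, the `threshold` value, and the `max_silence` parameter are all assumptions.

```python
from typing import Iterable, Iterator, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one non-empty audio frame (samples in [-1, 1])."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def speech_segments(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,   # assumed energy cut-off between silence and speech
    max_silence: int = 3,      # assumed number of silent frames that ends a segment
) -> Iterator[List[Sequence[float]]]:
    """Group consecutive speech frames into segments.

    Frames below the threshold are dropped. A segment is emitted once
    `max_silence` silent frames in a row follow speech, or when the
    stream ends while a segment is still open.
    """
    segment: List[Sequence[float]] = []
    silent_run = 0
    for frame in frames:
        if rms(frame) >= threshold:
            segment.append(frame)
            silent_run = 0
        elif segment:
            silent_run += 1
            if silent_run >= max_silence:
                yield segment
                segment = []
                silent_run = 0
    if segment:
        yield segment
```

Each yielded segment could then be handed to the ASR model as one unit, so silence between turns never reaches the recognizer. Tolerating a few silent frames before closing a segment keeps short pauses within a sentence from splitting it.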
Challenge 2: Multilingual Support
Whisper’s multilingual capabilities let T3 adapt to diverse classrooms, though future work will explore fine-tuning for non-native accents.
Challenge 3: Privacy-First Design
All data stays on institutional servers, and temporary audio files are deleted post-transcription, which is critical for GDPR compliance.
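One way to guarantee that temporary audio never outlives its transcription is to tie the deletion to a `finally` block. The sketch below assumes a hypothetical `transcribe` callable that takes a file path and returns text; it does not reflect how T3 itself is implemented.

```python
import os
import tempfile
from typing import Callable


def transcribe_and_discard(audio_bytes: bytes, transcribe: Callable[[str], str]) -> str:
    """Write audio to a temporary file, transcribe it, then always delete the file.

    `transcribe` is a hypothetical callable (path -> text) standing in for
    the ASR step.
    """
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio_bytes)
        return transcribe(path)
    finally:
        # Runs on success and on error, so no audio file is left behind.
        if os.path.exists(path):
            os.remove(path)
```

Because the cleanup sits in `finally`, the file is removed even when the transcription step raises an exception.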
Impact and Future Directions
Initial tests with student groups showed T3 seamlessly integrated into discussions without disrupting collaboration. Educators highlighted its potential for:
- Identifying participation gaps (via speaker-labeled transcripts).
- Conflict resolution (revisiting past dialogue).
- Research (analyzing communication patterns across courses).
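As a rough illustration of the first point, a speaker-labeled transcript makes it straightforward to estimate each person's share of the conversation. The sketch below counts words per speaker; the input shape (pairs of speaker and text) and the function name are assumptions, and word share is only one possible proxy for participation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def participation_share(segments: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each speaker to their fraction of all words spoken.

    `segments` is an iterable of (speaker, text) pairs, e.g. rows read
    from a transcript CSV. Returns an empty dict if no words were spoken.
    """
    words: Counter = Counter()
    for speaker, text in segments:
        words[speaker] += len(text.split())
    total = sum(words.values())
    if total == 0:
        return {}
    return {speaker: count / total for speaker, count in words.items()}
```

A group in which one member's share stays close to zero across several meetings would be a natural candidate for a check-in.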
Next steps include:
- Deploying Whisper Large Turbo for faster, even more accurate transcriptions.
- Longitudinal studies in university courses to measure learning outcomes.
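Comparing a new model against the WER figure reported earlier requires computing the metric the same way each time. WER is the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the ASR output, divided by the number of reference words. The following is a minimal sketch using whitespace tokenization; the evaluation behind the reported figure may normalize text differently (casing, punctuation), so treat this as an assumption-laden illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, the reference "das ist ein kleiner test" against the hypothesis "das ist kleiner text" has one deletion and one substitution over five reference words, giving a WER of 0.4.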
Try It Out!
T3 is open-source and available for institutions to adapt. We welcome collaborations to explore its use in classrooms and beyond.
Read the full paper: The Paper
Code repository: GitHub
Contact: Thomas.Kasakowskij@fernuni-hagen.de