Motivation:
In today’s globalized society, second language acquisition (SLA) is essential not only for academic or career advancement but also for integration, cross-cultural understanding, and migration. Mastering a second language enables communication across cultural boundaries and supports participation in diverse social contexts.
SLA is a complex cognitive process, and speaking is often the most challenging skill to develop. Unlike reading or writing, speaking demands real-time production, immediate comprehension, and dynamic interaction.
To address these challenges, we present ConversAR, a Mixed Reality system that combines Generative AI, embodied agents, scene recognition, and generative 3D props to situate group conversations in the learner’s physical environment. Informed by language acquisition experts, ConversAR aims to foster engagement, reduce speaking anxiety, and expand opportunities for authentic group dialogue.
System Diagram
System:
Based on our formative study, we identified five design goals (DG1–DG5) for ConversAR. First, the system should foster confidence by enabling safe, low-pressure group conversations with multiple non-player characters (NPCs) that feel supportive rather than evaluative (DG1). Second, it should provide corrective feedback in encouraging ways, such as recasts or implicit reformulations, that help learners self-adjust without embarrassment (DG2). Third, conversations should be realistic and contextualized by incorporating objects from the learner’s physical environment, promoting situated learning (DG3).
Fourth, to sustain dialogue and engagement, ConversAR should dynamically generate virtual props aligned with learners’ interests and conversation topics, serving as concrete anchors for interaction (DG4). Finally, the system should adapt in real time to each learner’s proficiency level, adjusting vocabulary, sentence structure, and strategies like circumlocution to keep conversations accessible yet challenging and to support gradual skill development (DG5). Together, these design goals position ConversAR as a tool for confidence-building, immersive, and adaptive language practice.
Conversation Example and Flow
Approach:
We conducted two complementary studies to inform and evaluate ConversAR. First, we interviewed 10 SLA educators, recruited through social media and outreach to institutions in the U.S. and Mexico. Participants had 3–35 years of experience teaching Spanish, English, Chinese, Italian, Korean, and French across CEFR levels A1–C2 (Table 1). In 45–60 minute semi-structured Zoom interviews, they discussed speaking challenges, limitations of digital tools, and instructional strategies; reviewed an early ConversAR prototype; and brainstormed potential features. Three authors transcribed and thematically analyzed the interviews, which yielded four emergent themes.
We then conducted a controlled lab study with 21 language learners (8 male, 13 female; 11 Hispanic/Latinx) to evaluate usability, support, and usefulness. Participants, recruited through language schools and university social media, were at least 18 years old, actively studying a second language, and had basic conversational proficiency. The study was IRB-approved and addressed three questions:
(1) Can learners engage in group conversations using ConversAR?
(2) How useful is it for practicing with physical and virtual objects?
(3) What challenges arise in group practice?
The study involved two tasks. In Task 1, “Getting to Know You,” a single NPC assessed each participant’s proficiency using CEFR criteria and collected their personal interests. In Task 2, learners engaged in a 20-minute group conversation with two NPCs, in which topics adapted to their interests, proficiency, and detected real-world objects. ConversAR also generated 3D props to anchor discussion and enrich interaction.
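To make the two-task flow concrete, the following is a minimal illustrative sketch of how the output of Task 1 (CEFR level and interests) might be combined with detected objects to seed the Task 2 group conversation. It is not ConversAR's actual implementation: the names `LearnerProfile` and `build_group_context`, the dictionary fields, and the way topics are combined are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field

# The six CEFR proficiency levels, from beginner (A1) to mastery (C2).
CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")


@dataclass
class LearnerProfile:
    """Hypothetical record of what Task 1 collects: level and interests."""
    cefr_level: str
    interests: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject anything that is not a valid CEFR level.
        if self.cefr_level not in CEFR_LEVELS:
            raise ValueError(f"unknown CEFR level: {self.cefr_level}")


def build_group_context(
    profile: LearnerProfile,
    detected_objects: list[str],
    npc_count: int = 2,
) -> dict:
    """Combine the three Task 2 inputs (interests, proficiency, detected
    objects) into one context record. Assumed structure, for illustration."""
    # Candidate topics: the learner's interests first, then nearby objects.
    topics = list(profile.interests)
    topics += [f"the {obj} nearby" for obj in detected_objects]
    return {
        "npc_count": npc_count,
        "cefr_level": profile.cefr_level,
        "candidate_topics": topics,
    }
```

For example, `build_group_context(LearnerProfile("B1", ["cooking"]), ["guitar"])` would yield a context for two NPCs at level B1 with the candidate topics "cooking" and "the guitar nearby". How the real system ranks topics or passes them to its generative models is not specified in this summary.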