
CMU Study: Audio Enhances AI’s Engagement and Human Touch

Imagine interacting with an AI assistant that feels as if it is physically present in the room with you, even though it exists only through sound. This is the innovative approach being explored by researchers at Carnegie Mellon University, aiming to transform how we engage with audio-only AI systems in smart glasses and other screen-free technologies.

Creating Audio-Only Interfaces

The team from CMU’s School of Computer Science, alongside experts from the Department of Psychology, is focused on developing an interface that uses audio cues alone to create the illusion of a chatbot’s physical presence. By collaborating with various universities, they aim to engage users more fully through sound.



David Lindlbauer

Assistant Professor David Lindlbauer of the Human-Computer Interaction Institute (HCII) frames it this way: “The question becomes, ‘If I had an AI assistant, what would happen if I made the audio component more like an actual human?’” The findings were indeed surprising.

Utilizing Sound for Immersive Experiences

Humans predominantly use vision to interact with their surroundings, which has led to considerable research into visual interfaces like avatars or robots, explains HCII Ph.D. student Yi Fei Cheng. However, in scenarios where visual equipment is not feasible, audio-only interfaces become crucial, such as with smart glasses that have microphones and cameras but lack displays.

The researchers employed spatialization and Foley effects to mimic the presence of an AI in a room. Spatialization involves placing the AI’s voice in the room so it seems to move and perform tasks. Foley effects, known from film and TV, added sounds like typing, paper rustling, and pouring water to enhance realism.
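The article does not detail how the researchers implemented spatialization, but the general idea of placing a voice at a virtual position is often approximated by varying per-channel loudness with the source's angle. As a rough illustration only (not the study's actual system), a minimal constant-power stereo pan might look like this:

```python
import math

def constant_power_pan(sample: float, azimuth_deg: float) -> tuple[float, float]:
    """Pan a mono sample to stereo using a constant-power law.

    azimuth_deg: virtual source angle, -90 (full left) to +90 (full right).
    Returns (left, right) channel samples whose combined power is constant,
    so the source seems to move without getting louder or quieter.
    """
    # Map azimuth to a pan angle in [0, pi/2]: 0 = full left, pi/2 = full right.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    left = sample * math.cos(theta)
    right = sample * math.sin(theta)
    return left, right

# A source directly ahead (0 degrees) splits power equally between channels.
l, r = constant_power_pan(1.0, 0.0)
```

Real spatial audio systems go further, using head-related transfer functions and inter-channel delays to convey elevation and distance, but the gain-panning idea above captures the basic mechanism.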



Laurie Heller

Psychology Professor Laurie Heller notes, “When a movie star sits down on a bar stool in his leather jacket, you expect a leathery rustle and a squeaky bar stool and the sounds of his hands hitting the bar. These sounds happen in real life, and if they aren’t part of the movie soundtrack, it doesn’t seem realistic. It doesn’t immerse you.”

Testing Human-Like Audio Interfaces

Participants in the study interacted with AI agents using various combinations of spatialization and Foley effects. Initially, they familiarized themselves with the room’s layout and audio cues, including items like a laptop and books. During the conversation with the AI agent, the illusion of the AI moving around and performing tasks was created. Post-interaction, participants provided feedback through questionnaires and interviews.

Lindlbauer shared, “We found that, yes, the audio interface made the AI assistant seem more humanlike. We have statistically clear results demonstrating that adding spatial and Foley effects increases your engagement.”

Interestingly, the humanlike interface led users to expect the AI to adhere to social norms. Participants perceived multitasking by the AI, such as talking while typing, as inattentiveness or rudeness. “As soon as the participants felt like their agent was engaged in something else… they considered this rude,” Lindlbauer remarked.

Cheng suggested that aligning audio cues more closely with conversations could mitigate this perceived distraction. Despite the office-specific setup, Lindlbauer believes a generalized system could achieve similar engagement levels with less spatial dependency.

Although participants might focus on sound sources like a typing noise, this did not detract from the immersive effect. Heller commented, “Based on the data from this study, the sounds still had an effect on people consistent with another human being there.”

The research findings will be presented at the upcoming Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI 2026) in Barcelona. Contributors include Alexander Wang, a Ph.D. student at HCII, and collaborators from KAIST, the University of Sydney, and the University of Michigan.

