Research

My research centers on reliably leveraging audio in multimodal models while mitigating cross-modal interference.

ACPO
Don't Let the Video Speak: Audio-Contrastive Preference Optimization for Audio-Visual Language Models
Ami Baid, Zihui Xue, Kristen Grauman
arXiv 2026   [paper] [project page]
Action2Sound
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman
ECCV 2024, Oral   [paper] [project page]
Self-Supervised Visual-Acoustic Matching
Arjun Somayazulu, Changan Chen, Kristen Grauman
NeurIPS 2023   [paper] [project page]

Internships

Other Projects