Research

My research centers on audio-visual multimodal learning, with an emphasis on reliably leveraging audio while mitigating noise and cross-modal confusion.

Undergraduate Honors Thesis (in progress)
Developing a preference optimization framework to reduce video-driven audio hallucination in audio-visual language models.
Action2Sound
ECCV 2024, Oral
An ambient-aware video-to-audio generation approach that explicitly disentangles action sounds from ambient sounds. [link]
Self-Supervised Visual-Acoustic Matching
NeurIPS 2023
A self-supervised visual-acoustic matching method that re-synthesizes audio to match a target scene's acoustics. [link]

Internships

Other Projects