Ami Baid

Studying computer science + math as a Turing Scholar at UT Austin. I'm graduating in spring 2026 and starting my Master's in CS at Stanford in the fall! 🌲
Undergraduate researcher in the UT Austin Computer Vision Lab, advised by Professor Kristen Grauman.
I am excited about developing intelligent systems that can understand and reason over information from diverse modalities.

Email / GitHub

Research

My research centers on audio-visual multimodal learning, with a focus on reliably leveraging audio while mitigating noise and cross-modal confusion.

Undergraduate Honors Thesis (in progress)

Developing a preference optimization framework to reduce video-driven audio hallucination in audio-visual language models.

Action2Sound

ECCV 2024, Oral

An ambient-aware video-to-audio generation approach that explicitly disentangles action sounds from ambient sounds. [link]

Self-Supervised Visual-Acoustic Matching

NeurIPS 2023

A self-supervised visual-acoustic matching method that re-synthesizes audio to match a target scene's acoustics. [link]

Engineering intern @ Stripe (summer 2025): extended Stripe's LLM-based compliance detection system to support image understanding on merchant websites.
Software engineering intern @ Salesforce (summer 2024): automated a key workflow in Salesforce's internal Temporal platform and contributed to the open-source Terraform Temporal provider.

Gaze-centered Egocentric Video Representations: built a gaze-aware preprocessing pipeline that reallocates resolution around gaze, improving efficiency in egocentric video QA. [GitHub]