Xilin Jiang
Notre-Dame de la Garde, Marseille, France
I am a PhD candidate at Columbia University, in Electrical Engineering and the Zuckerman Institute. I am a member of the Neural Acoustic Processing Lab, supervised by Professor Nima Mesgarani. I build AI models and agents that listen and speak like humans. Recently, I have become interested in how multimodal language models perceive and reason about the world, and in how to better align their behavior and intentions with human ears, eyes, and minds.
I am currently searching for 2026 research internships and collaborations.
Education
- Columbia University, New York, NY
  Fall 2022 ~ Now 💻☕
  Ph.D. candidate after joint M.S. in Electrical Engineering
- University of Illinois Urbana–Champaign, IL
  Fall 2018 ~ Fall 2021
  B.S. in Computer Engineering, with the Bronze Tablet 🏅🎓
Internship
- Microsoft Research, Redmond, WA
  Summer 2025, Research Intern
  Mentors: Sebastian Braun & Hannes Gamper
  Project: Sci-Phi: A Large Language Model Spatial Audio Descriptor
- Amazon, Palo Alto, CA
  Summer 2021 & 2022, SDE Intern
News
| Date | News |
|---|---|
| Jan 27, 2026 | AVMeme Exam is public: A Multimodal Multilingual Multicultural Benchmark for LLMs’ Contextual and Cultural Knowledge and Thinking |
| Jan 17, 2026 | A paper I mentored, SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models, was accepted to ICASSP 2026 🎉 |
| Jan 13, 2026 | My MSR internship paper, Sci-Phi: A Large Language Model Spatial Audio Descriptor, was accepted to the IEEE Open Journal of Signal Processing 🎉 |
| Nov 07, 2025 | DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis was accepted to AAAI 2026 🎉 |
| Oct 14, 2025 | Bridging Ears & Eyes, on cross-modal distillation between audio and visual LLMs, won the Best Paper award 🥇 at WASPAA 2025 |