papers

  1. MeanFlow-TSE: One-Step Generative Target Speaker Extraction with Mean Flow
    Riki Shimizu, Xilin Jiang, and Nima Mesgarani
    arXiv preprint arXiv:2512.18572, 2025
  2. Speaker Identity is Robustly Encoded in Spatial Patterns of Intracranial EEG for Attention Decoding
    Sukru Samet Dindar*, Xilin Jiang*, Vishal Choudhari, Stephan Bickel, Ashesh Mehta, Catherine Schevon, Guy M McKhann, Daniel Friedman, Adeen Flinker, and Nima Mesgarani
    bioRxiv, 2025
  3. Sci-Phi: A Large Language Model Spatial Audio Descriptor
    Xilin Jiang, Hannes Gamper, and Sebastian Braun
    arXiv preprint arXiv:2510.05542, 2025
  4. SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models
    Qiaolin Wang, Xilin Jiang, Linyang He, Junkai Wu, and Nima Mesgarani
    arXiv preprint arXiv:2509.15661, 2025
  5. Layer-wise minimal pair probing reveals contextual grammatical-conceptual hierarchy in speech representations
    Linyang He, Qiaolin Wang, Xilin Jiang, and Nima Mesgarani
    In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
  6. DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis
    Yinghao Aaron Li*, Xilin Jiang*, Fei Tao, Cheng Niu, Kaifeng Xu, Juntong Song, and Nima Mesgarani
    In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026
  7. Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
    Xilin Jiang*, Junkai Wu*, Vishal Choudhari, and Nima Mesgarani
    Best Paper 🥇 in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2025
  8. AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
    Xilin Jiang*, Sukru Samet Dindar*, Vishal Choudhari, Stephan Bickel, Ashesh Mehta, Guy M McKhann, Daniel Friedman, Adeen Flinker, and Nima Mesgarani
    In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025
  9. ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior
    Zhongweiyang Xu, Xulin Fan, Zhong-Qiu Wang, Xilin Jiang, and Romit Roy Choudhury
    In Forty-second International Conference on Machine Learning (ICML), 2025
  10. Exploring finetuned audio-LLM on heart murmur features
    Adrian Florea, Xilin Jiang, Nima Mesgarani, and Xiaofan Jiang
    Smart Health, 2025
  11. Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
    Xilin Jiang*, Yinghao Aaron Li*, Adrian Nicolas Florea, Cong Han, and Nima Mesgarani
    In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
  12. StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
    Yinghao Aaron Li, Xilin Jiang, Cong Han, and Nima Mesgarani
    In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Apr 2025
  13. Just ASR + LLM? A Study on Speech Large Language Models’ Ability to Identify And Understand Speaker in Spoken Dialogue
    Junkai Wu, Xulin Fan, Bo-Ru Lu, Xilin Jiang, Nima Mesgarani, Mark Hasegawa-Johnson, and Mari Ostendorf
    In 2024 IEEE Spoken Language Technology Workshop (SLT), 2024
  14. SSAMBA: Self-Supervised Audio Representation Learning With Mamba State Space Model
    Siavash Shams, Sukru Samet Dindar, Xilin Jiang, and Nima Mesgarani
    In 2024 IEEE Spoken Language Technology Workshop (SLT), 2024
  15. StyleTalker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
    Yinghao Aaron Li*, Xilin Jiang*, Jordan Darefsky, Ge Zhu, and Nima Mesgarani
    In First Conference on Language Modeling (COLM), 2024
  16. Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation
    Xilin Jiang, Cong Han, and Nima Mesgarani
    In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
  17. Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience
    Xilin Jiang, Cong Han, Yinghao Aaron Li, and Nima Mesgarani
    IEEE Journal of Selected Topics in Signal Processing, 2025
  18. Exploring self-supervised contrastive learning of spatial sound event representation
    Xilin Jiang, Cong Han, Yinghao Aaron Li, and Nima Mesgarani
    In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
  19. HiFTNet: A fast high-quality neural vocoder with harmonic-plus-noise filter and inverse short time Fourier transform
    Yinghao Aaron Li, Cong Han, Xilin Jiang, and Nima Mesgarani
    arXiv preprint arXiv:2309.09493, 2023
  20. Phoneme-level BERT for enhanced prosody of text-to-speech with grapheme predictions
    Yinghao Aaron Li, Cong Han, Xilin Jiang, and Nima Mesgarani
    In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
  21. DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
    Xilin Jiang, Yinghao Aaron Li, and Nima Mesgarani
    In Proc. Interspeech 2023, 2023
  22. Learning Representations for New Sound Classes With Continual Self-Supervised Learning
    Zhepei Wang, Cem Subakan, Xilin Jiang, Junkai Wu, Efthymios Tzinis, Mirco Ravanelli, and Paris Smaragdis
    IEEE Signal Processing Letters, 2022
  23. Compute and memory efficient universal sound source separation
    Efthymios Tzinis, Zhepei Wang, Xilin Jiang, and Paris Smaragdis
    Journal of Signal Processing Systems, 2022