papers
- MeanFlow-TSE: One-Step Generative Target Speaker Extraction with Mean Flow. arXiv preprint arXiv:2512.18572, 2025
- Speaker Identity is Robustly Encoded in Spatial Patterns of Intracranial EEG for Attention Decoding. bioRxiv, 2025
- Sci-Phi: A Large Language Model Spatial Audio Descriptor. arXiv preprint arXiv:2510.05542, 2025
- SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models. arXiv preprint arXiv:2509.15661, 2025
- Layer-wise minimal pair probing reveals contextual grammatical-conceptual hierarchy in speech representations. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
- DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026
- Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation. Best Paper in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2025
- AAD-LLM: Neural Attention-Driven Auditory Scene Understanding. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025
- ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior. In Forty-second International Conference on Machine Learning, 2025
- Exploring finetuned audio-LLM on heart murmur features. Smart Health, 2025
- Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis. In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Apr 2025
- Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue. In 2024 IEEE Spoken Language Technology Workshop (SLT), 2024
- SSAMBA: Self-Supervised Audio Representation Learning With Mamba State Space Model. In 2024 IEEE Spoken Language Technology Workshop (SLT), 2024
- StyleTalker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation. In First Conference on Language Modeling, 2024
- Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation. In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
- Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience. IEEE Journal of Selected Topics in Signal Processing, 2025
- Exploring self-supervised contrastive learning of spatial sound event representation. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
- HiFTNet: A fast high-quality neural vocoder with harmonic-plus-noise filter and inverse short time Fourier transform. arXiv preprint arXiv:2309.09493, 2023
- Phoneme-level BERT for enhanced prosody of text-to-speech with grapheme predictions. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
- DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes. In Proc. Interspeech 2023, 2023
- Learning Representations for New Sound Classes With Continual Self-Supervised Learning. IEEE Signal Processing Letters, 2022
- Compute and memory efficient universal sound source separation. Journal of Signal Processing Systems, 2022