I am a postdoctoral researcher at the Technical University of Munich (TUM), working with Prof. Björn Schuller. I received my Ph.D. from the National University of Singapore, supervised by Prof. Li Haizhou and Prof. Robby T. Tan. Prior to that, I received my Master’s degree from Zhejiang University in 2019, supervised by Prof. Li Ping and Prof. Ren Qinyuan. I received my Bachelor’s degree from Northeastern University (China) in 2016.
My research interests are audio-visual speech recognition, talking face generation, and audio-visual sound source localization. I have 15 papers published or under review at top international conferences and journals, including CVPR, TASLP, TNNLS, TMM, AAAI, ICRA, and ICASSP.
📜 Research Area
- Audio-Visual Speech Processing: audio-visual speech recognition; sound source localization
- Video Synthesis: talking face generation
🔥 Special Sessions & Challenges
- We are organising a special session at ICASSP 2026 on Multimodal Ambient Scene Perception, Understanding, and Modeling. We warmly welcome submissions related to this topic. Further details are available in the Poster.
- The 1st MPDD Challenge: Multimodal Personality-aware Depression Detection, ACM MM 2025, Dublin, Website.
💼 Employment
- 2024.04 - Now, Postdoctoral Researcher at MRI, Technical University of Munich, Germany.
🏫 Education
- 2019.08 - 2024.02, Ph.D. in Electrical and Computer Engineering, National University of Singapore, Singapore.
- 2016.08 - 2019.06, M.Sc. in Control Engineering, Zhejiang University, China.
- 2012.09 - 2016.06, B.Eng. in Automation, Northeastern University, China.
📝 Publication
2025
- The First MPDD Challenge: Multimodal Personality-aware Depression Detection, Changzeng Fu, Zelin Fu, Qi Zhang, Xinhe Kuang, Jiacheng Dong, Kaifeng Su, Yikai Su, Wenbo Shi, Junfeng Yao, Yuliang Zhao, Shiqi Zhao, Jiadong Wang, Siyang Song, Chaoran Liu, Yuichiro Yoshikawa, Björn Schuller, Hiroshi Ishiguro, ACM Multimedia, 2025.
- Human-Inspired Computing for Robust and Efficient Audio-Visual Speech Recognition, Qianhui Liu, Jiadong Wang*, Yang Wang, Xin Yang, Gang Pan, Haizhou Li, Transactions on Computers, 2025.
- Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention, Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang*, Haizhou Li, TASLP, 2025.
- C2 AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction, Wenxuan Wu, Xueyuan Chen, Shuai Wang, Jiadong Wang, Lingwei Meng, Xixin Wu, Helen Meng, Haizhou Li, JSTSP, 2025.
2024
- Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection, Xinyuan Qian, Xianghu Yue, Jiadong Wang, Huiping Zhuang, Haizhou Li, SPL, 2024.
- Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception, Jiadong Wang, Xinyuan Qian, Haizhou Li, TASLP, 2024.
- Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training, Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang*, Haizhou Li, TMM, 2024.
- Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal, Yeying Jin, Xin Li, Jiadong Wang, Yan Zhang, Malu Zhang, ECCV, 2024.
- Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition, Jiadong Wang, Zexu Pan, Malu Zhang, Robby T. Tan, Haizhou Li, AAAI, 2024.
2023
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert, Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li, CVPR, 2023.
2022
- A Hybrid Learning Framework for Deep Spiking Neural Networks with One-Spike Temporal Coding, Jiadong Wang, Jibin Wu, Malu Zhang, Qi Liu, Haizhou Li, ICASSP, 2022.
- Audio-Visual Cross-Attention Network for Robotic Speaker Tracking, Xinyuan Qian, Zhengdong Wang, Jiadong Wang*, Guohui Guan, Haizhou Li, TASLP, 2022.
2021
- GCC-PHAT with speech-oriented attention for robotic sound source localization, Jiadong Wang, Xinyuan Qian, Zihan Pan, Malu Zhang, Haizhou Li, ICRA, 2021.
- Rectified linear postsynaptic potential function for backpropagation in deep spiking neural networks, Malu Zhang, Jiadong Wang*, Jibin Wu, Ammar Belatreche, Burin Amornpaisannon, Zhixuan Zhang, Venkata Pavan Kumar Miriyala, Hong Qu, Yansong Chua, Trevor E Carlson, Haizhou Li, TNNLS, 2021.
- Multi-tone phase coding of interaural time difference for sound source localization with spiking neural networks, Zihan Pan, Malu Zhang, Jibin Wu, Jiadong Wang, Haizhou Li, TASLP, 2021.
- Three-Dimensional Speaker Localization: Audio-Refined Visual Scaling Factor Estimation, Xinyuan Qian, Qi Liu, Jiadong Wang, Haizhou Li, SPL, 2021.
- Multi-target DoA estimation with an audio-visual fusion mechanism, Xinyuan Qian, Maulik Madhavi, Zexu Pan, Jiadong Wang, Haizhou Li, ICASSP, 2021.
💻 Open Source Code
👔 Internship and Visiting Experience
- 2018.07 - 2018.12, Visiting Student, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore.
- 2022.02 - 2022.08, Visiting Student, The Chinese University of Hong Kong, Shenzhen (CUHKSZ), China.
- 2023.08 - 2024.03, Research Assistant, The Chinese University of Hong Kong, Shenzhen, China.
Reviewer
- Reviewer for CVPR, ICCV, ECCV, ACM MM, TMM, SIGGRAPH Asia, IROS.