I am a postdoctoral researcher at the Technical University of Munich (TUM), under Prof. Björn Schuller. I received my Ph.D. from the National University of Singapore, supervised by Prof. Li Haizhou and Prof. Robby T. Tan. Before that, I received my Master’s degree from Zhejiang University in 2019, supervised by Prof. Li Ping and Prof. Ren Qinyuan, and my Bachelor’s degree from Northeastern University (China) in 2016.
My research interests are audio-visual speech recognition, talking face generation, and audio-visual sound source localization. I have 15 papers published or under review at top international conferences and journals, including CVPR, TASLP, TNNLS, TMM, AAAI, ICRA, and ICASSP.
📜 Research Area
Audio-Visual Speech Processing: audio-visual speech recognition; sound source localization
Video Synthesis: talking face generation
💼 Employment
- 2024.04 - Present, Postdoctoral Researcher at MRI, Technical University of Munich, Germany.
- 2023.08 - 2024.03, Research Assistant at the Chinese University of Hong Kong, Shenzhen, China.
🏫 Education
- 2019.08 - 2024.02, Ph.D. in Electrical and Computer Engineering, National University of Singapore, Singapore.
- 2016.08 - 2019.06, M.Sc. in Control Engineering, Zhejiang University, China.
- 2012.09 - 2016.06, B.Eng. in Automation, Northeastern University, China.
📝 Publications
2024
- Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training, Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang*, Haizhou Li, TMM, 2024.
- Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal, Yeying Jin, Xin Li, Jiadong Wang, Yan Zhang, Malu Zhang, ECCV, 2024.
- Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition, Jiadong Wang, Zexu Pan, Malu Zhang, Robby T. Tan, Haizhou Li, AAAI, 2024.
2023
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert, Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li, CVPR, 2023.
2022
- Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception, Jiadong Wang, Xinyuan Qian, Haizhou Li, TASLP (under review), 2022.
- A Hybrid Learning Framework for Deep Spiking Neural Networks with One-Spike Temporal Coding, Jiadong Wang, Jibin Wu, Malu Zhang, Qi Liu, Haizhou Li, ICASSP, 2022.
- Audio-Visual Cross-Attention Network for Robotic Speaker Tracking, Xinyuan Qian, Zhengdong Wang, Jiadong Wang*, Guohui Guan, Haizhou Li, TASLP, 2022.
2021
- GCC-PHAT with speech-oriented attention for robotic sound source localization, Jiadong Wang, Xinyuan Qian, Zihan Pan, Malu Zhang, Haizhou Li, ICRA, 2021.
- Rectified linear postsynaptic potential function for backpropagation in deep spiking neural networks, Malu Zhang, Jiadong Wang*, Jibin Wu, Ammar Belatreche, Burin Amornpaisannon, Zhixuan Zhang, Venkata Pavan Kumar Miriyala, Hong Qu, Yansong Chua, Trevor E Carlson, Haizhou Li, TNNLS, 2021.
- Multi-tone phase coding of interaural time difference for sound source localization with spiking neural networks, Zihan Pan, Malu Zhang, Jibin Wu, Jiadong Wang, Haizhou Li, TASLP, 2021.
- Three-Dimensional Speaker Localization: Audio-Refined Visual Scaling Factor Estimation, Xinyuan Qian, Qi Liu, Jiadong Wang, Haizhou Li, Signal Processing Letters, 2021.
- Multi-target DoA estimation with an audio-visual fusion mechanism, Xinyuan Qian, Maulik Madhavi, Zexu Pan, Jiadong Wang, Haizhou Li, ICASSP, 2021.
💻 Open Source Code
👔 Internship and Visiting Experience
- 2018.07 - 2018.12, Visiting Student, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore.
- 2022.02 - 2022.08, Visiting Student, Chinese University of Hong Kong, Shenzhen (CUHKSZ), China.
Reviewer
- Reviewer for ICCV, ECCV, ACM MM, TMM, SIGGRAPH Asia, IROS.