Talk Title: Listen to the Pixel and See the Sound: From Audio-Visual Sound Separation to Localization
Time: Friday, May 13, 2022, 7:30–9:30 PM
Platform: Tencent Meeting, ID: 243-434-980
Speaker: Zengjie Song, Institute of Automation, Chinese Academy of Sciences
Abstract:
Visual and audio modalities are highly correlated, yet they reflect different properties of the sounding objects. By leveraging this strong correlation, we can predict the semantics of one modality from the other with remarkable performance. In this talk, I will (1) present a brief overview of the audio-visual learning landscape; (2) elaborate on two specific tasks, namely audio-visual sound separation and localization, along with our recent progress on these tasks; and (3) discuss a number of interesting directions for future research.
Speaker Bio:
Zengjie Song is currently a Postdoctoral Researcher at the Institute of Automation, Chinese Academy of Sciences (CASIA), working with Prof. Tieniu Tan and Prof. Zhaoxiang Zhang. He received his PhD in statistics and his BS in applied mathematics from Xi'an Jiaotong University, in 2020 and 2013, respectively, both under the supervision of Prof. Jiangshe Zhang. From 2017 to 2018, he was a visiting PhD student at the University of Illinois at Urbana-Champaign, hosted by Prof. Oluwasanmi Koyejo. His research interests include predictive coding, multimodal learning, and generative models, with a particular emphasis on the intersection of computer vision and computational neuroscience.