Exploring temporal representations by leveraging attention-based bidirectional LSTM-RNNs for multi-modal emotion recognition
Institution: 1. College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China; 2. School of Artificial Intelligence and the Hebei Provincial Key Laboratory of Big Data Computing, Hebei University of Technology, Tianjin 300401, China
Abstract: Emotion recognition helps automatically perceive a user's emotional response to multimedia content through implicit annotation, which in turn supports effective user-centric services. Physiological-signal-based approaches have increasingly attracted researchers' attention because of their objectivity in representing emotion. Conventional approaches to emotion recognition have mostly focused on extracting various kinds of hand-crafted features. However, hand-crafted features require domain knowledge for the specific task, and designing proper features can be time-consuming. Exploring the most effective physiological temporal feature representation for emotion recognition has therefore become the core problem of most work in this area. In this paper, we propose a multimodal attention-based BLSTM network framework for efficient emotion recognition. First, the raw physiological signals from each channel are transformed into spectrogram images to capture their time and frequency information. Second, attention-based Bidirectional Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) are used to automatically learn the best temporal features. The learned deep features are then fed into a deep neural network (DNN) that predicts the emotion probability for each channel. Finally, a decision-level fusion strategy predicts the final emotion. Experimental results on the AMIGOS dataset show that our method outperforms other state-of-the-art methods.
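The two ingredients named in the abstract that are easy to illustrate in isolation are the attention mechanism over BLSTM hidden states (a softmax-weighted temporal summary) and the decision-level fusion across channels. The following is a minimal numpy sketch, not the authors' implementation: the hidden states `H`, the scoring vector `w`, and the per-channel probabilities are all made-up placeholders standing in for the outputs of trained networks.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(H, w):
    # H: (T, d) hidden states from a BLSTM over T time steps.
    # w: (d,) learned scoring vector (hypothetical placeholder here).
    scores = H @ w                  # one relevance score per time step, shape (T,)
    alpha = softmax(scores)         # attention weights over time, sum to 1
    return alpha @ H, alpha         # (d,) attention-weighted temporal summary

def fuse_decisions(channel_probs):
    # Decision-level fusion: average the per-channel class probabilities.
    # channel_probs: (n_channels, n_classes)
    return np.mean(channel_probs, axis=0)

rng = np.random.default_rng(0)
H = rng.normal(size=(10, 8))        # 10 time steps, 8-dim BLSTM outputs (dummy data)
w = rng.normal(size=8)
summary, alpha = attention_pool(H, w)

# Two channels, two emotion classes (dummy per-channel DNN outputs).
probs = fuse_decisions(np.array([[0.7, 0.3],
                                 [0.5, 0.5]]))   # -> [0.6, 0.4]
```

In a full pipeline each channel's spectrogram would be fed through its own attention-based BLSTM and DNN to produce the per-channel probabilities that `fuse_decisions` averages; other fusion rules (e.g. weighted voting) plug into the same slot.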
Indexed in ScienceDirect and other databases.