Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2021, Vol. 57 ›› Issue (1): 7-15. DOI: 10.13209/j.0479-8023.2020.085


A Multi-modal Sentiment Recognition Method Based on Multi-task Learning

LIN Zijie1, LONG Yunfei2, DU Jiachen1, XU Ruifeng1,†   

1. School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen 518055; 2. School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ
  • Received: 2020-06-08  Revised: 2020-08-14  Online: 2021-01-20  Published: 2021-01-20
  • Contact: XU Ruifeng, E-mail: xuruifeng@hit.edu.cn
  • Funding: Supported by the National Natural Science Foundation of China (61876053, 61632011, 62006062), the Shenzhen Basic Research Discipline Layout Project (JCYJ20180507183527919, JCYJ20180507183608379), the Guangdong Special Research Project for COVID-19 Prevention and Control (2020KZDZX1224), and the Shenzhen Key Technology Research Project (JSGG20170817140856618)

Abstract:

In order to learn more sentiment-oriented video and speech representations through auxiliary tasks, and to improve the effect of multi-modal fusion, this paper proposes a multi-modal sentiment recognition method based on multi-task learning. A multi-modal shared layer is used to learn the sentiment information of the visual and acoustic modalities. Experiments on the MOSI and MOSEI datasets show that adding two auxiliary single-modal sentiment recognition tasks enables the model to learn more effective single-modal sentiment representations, and improves sentiment recognition accuracy over the best-performing single-task model by 0.8% and 2.5% on the two datasets, respectively.
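To make the multi-task setup concrete, below is a minimal PyTorch sketch, not the authors' implementation: the encoder choice (LSTM), the feature dimensions, the L1 training objective, and the loss weights alpha and beta are all illustrative assumptions. It shows only the overall pattern the abstract describes: a shared layer applied to the visual and acoustic representations, a main multimodal sentiment head, and two auxiliary single-modal sentiment heads trained jointly.

import torch
import torch.nn as nn

class MultiTaskSentimentModel(nn.Module):
    """Illustrative sketch: a shared layer projects the visual and acoustic
    representations, a main head predicts sentiment from the fused multimodal
    vector, and two auxiliary heads predict sentiment from each modality."""

    def __init__(self, text_dim=300, visual_dim=47, acoustic_dim=74, hidden_dim=128):
        super().__init__()
        # Per-modality sequence encoders (dimensions are placeholders)
        self.text_enc = nn.LSTM(text_dim, hidden_dim, batch_first=True)
        self.visual_enc = nn.LSTM(visual_dim, hidden_dim, batch_first=True)
        self.acoustic_enc = nn.LSTM(acoustic_dim, hidden_dim, batch_first=True)
        # Multimodal shared layer for the non-verbal modalities
        self.shared = nn.Linear(hidden_dim, hidden_dim)
        # Main task head: sentiment from the fused multimodal representation
        self.main_head = nn.Linear(3 * hidden_dim, 1)
        # Auxiliary task heads: sentiment from visual / acoustic alone
        self.visual_head = nn.Linear(hidden_dim, 1)
        self.acoustic_head = nn.Linear(hidden_dim, 1)

    def forward(self, text, visual, acoustic):
        # Use the final LSTM hidden state as each modality's utterance vector
        _, (t, _) = self.text_enc(text)
        _, (v, _) = self.visual_enc(visual)
        _, (a, _) = self.acoustic_enc(acoustic)
        t, v, a = t[-1], v[-1], a[-1]
        # The shared layer ties the auxiliary tasks to the main task
        v_s = torch.tanh(self.shared(v))
        a_s = torch.tanh(self.shared(a))
        fused = torch.cat([t, v_s, a_s], dim=-1)
        return self.main_head(fused), self.visual_head(v_s), self.acoustic_head(a_s)

def multitask_loss(y_main, y_vis, y_ac, target, alpha=0.3, beta=0.3):
    # Joint objective: main multimodal loss plus weighted auxiliary losses
    mae = nn.L1Loss()
    return mae(y_main, target) + alpha * mae(y_vis, target) + beta * mae(y_ac, target)

All three losses are backpropagated jointly, so gradients from the auxiliary single-modal tasks push the shared layer toward more sentiment-discriminative visual and acoustic representations, which is the intended benefit for the main fusion task.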

Key words: multi-modal information, sentiment recognition, multi-modal fusion, multi-task learning
