Multimodal Emotion Recognition Based on Hierarchical Fusion Strategy and Contextual Information Embedding

Acta Scientiarum Naturalium Universitatis Pekinensis, 2024, Vol. 60, Issue (3): 393-402. DOI: 10.13209/j.0479-8023.2024.034

SUN Minglong, OUYANG Chunping^†, LIU Yongbin, REN Lin

School of Computing, University of South China, Hengyang 421200

Received: 2023-05-19 Revised: 2023-07-30 Online: 2024-05-20 Published: 2024-05-20
Contact: OUYANG Chunping, E-mail: ouyangcp(at)126.com
Funding: Supported by the Natural Science Foundation of Hunan Province (2022JJ30495) and the Key Scientific Research Project of the Hunan Provincial Department of Education (22A0316)

Abstract

Most existing multimodal fusion strategies simply concatenate the features of different modalities, ignoring the need for fusion tailored to the inherent characteristics of each modality. In addition, at the recognition stage, treating the emotion of each utterance in isolation, without considering its emotional state in the context of the preceding and following utterances, can lead to recognition errors. To address these problems, this paper proposes a multimodal emotion recognition method based on a hierarchical fusion strategy and contextual information embedding. The hierarchical fusion strategy fuses the modal features one after another in a progressive, layer-by-layer manner, which reduces noise interference from individual modalities and resolves inconsistent expression across modalities. The method also exploits the contextual information of the fused modality, considering the emotional representation of each utterance within its surrounding context to improve recognition performance. On the binary emotion recognition task, the method improves accuracy by 1.54% over the state-of-the-art (SOTA) model; on the multi-class emotion recognition task, it improves the F1 score by 2.79% over the SOTA model.

Key words: hierarchical fusion, noise interference, context information embedding
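
The contrast drawn in the abstract between simple concatenation and progressive, hierarchical fusion, followed by context embedding over neighbouring utterances, can be illustrated with a minimal sketch. It is not the authors' implementation. The fusion order (text, then audio, then video), the fixed scalar gates, the neighbour-averaging window, and all function names are assumptions chosen for illustration; the abstract does not specify any of these, and a trained model would learn such weights rather than fix them by hand.

```python
from typing import List, Sequence

Vector = List[float]


def concat_fusion(text: Vector, audio: Vector, video: Vector) -> Vector:
    """Baseline: simple concatenation of the three modality vectors."""
    return list(text) + list(audio) + list(video)


def gated_merge(primary: Vector, secondary: Vector, gate: float) -> Vector:
    """Blend two equal-length vectors; `gate` is the weight given to `secondary`.

    A small gate limits how much noise from `secondary` reaches the result.
    (Hypothetical helper: a fixed scalar stands in for a learned weight.)
    """
    if len(primary) != len(secondary):
        raise ValueError("vectors must have the same length")
    return [(1.0 - gate) * p + gate * s for p, s in zip(primary, secondary)]


def hierarchical_fusion(text: Vector, audio: Vector, video: Vector,
                        audio_gate: float = 0.3,
                        video_gate: float = 0.2) -> Vector:
    """Fuse modalities step by step instead of all at once.

    Assumed order: text and audio are merged first, then video is merged
    into that intermediate result.
    """
    stage_one = gated_merge(text, audio, audio_gate)
    return gated_merge(stage_one, video, video_gate)


def embed_context(utterances: Sequence[Vector], window: int = 1) -> List[Vector]:
    """Replace each fused utterance vector by the mean of itself and its
    neighbours within `window` positions before and after it."""
    contextual = []
    for i, current in enumerate(utterances):
        lo = max(0, i - window)
        hi = min(len(utterances), i + window + 1)
        neighbours = utterances[lo:hi]
        contextual.append([
            sum(vec[d] for vec in neighbours) / len(neighbours)
            for d in range(len(current))
        ])
    return contextual
```

In use, each utterance in a dialogue would first be passed through `hierarchical_fusion`, and the resulting sequence of fused vectors through `embed_context`, so that the vector handed to a classifier reflects both the fused modalities and the surrounding utterances.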

SUN Minglong, OUYANG Chunping, LIU Yongbin, REN Lin. Multimodal Emotion Recognition Based on Hierarchical Fusion Strategy and Contextual Information Embedding[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2024, 60(3): 393-402.


Link to this article: https://xbna.pku.edu.cn/CN/10.13209/j.0479-8023.2024.034

https://xbna.pku.edu.cn/CN/Y2024/V60/I3/393
