Acta Scientiarum Naturalium Universitatis Pekinensis, 2020, 56(1): 105–111. DOI: 10.13209/j.0479-8023.2019.097


User Profiling Based on Multimodal Fusion Technology

ZHANG Zhuang 1, FENG Xiaonian 2, QIAN Tieyun 1,†

  1. School of Computer Science, Wuhan University, Wuhan 430072
  2. China Power Finance Co., Ltd., Beijing 100005
  • Received: 2019-05-21  Revised: 2019-09-22  Online: 2020-01-20  Published: 2020-01-20
  • Contact: QIAN Tieyun, E-mail: qty(at)whu.edu.cn
  • Supported by the National Natural Science Foundation of China (61572376)


Abstract:

Existing studies in user profiling fail to fully exploit the information available in each modality. This paper proposes a cross-modal learning approach and designs a user profiling model based on multimodal fusion. First, a Stacking ensemble method is adopted to fuse multiple cross-modal joint representation networks and to learn the corresponding model combinations. An attention mechanism is then introduced so that the model can learn how differently each modality's representation contributes to the prediction. The resulting model has a carefully designed network structure and objective function, and produces a joint feature representation built from feature-level and decision-level fusion, which merges the related features of the different modalities. Extensive experiments on real-world datasets show that the proposed model outperforms the state-of-the-art baselines.
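To make the feature-level, attention-weighted fusion step concrete, the following Python (PyTorch) code is a minimal sketch, not the authors' implementation: the module name AttentionFusion, the dimensions, and the single-layer softmax scoring are all illustrative assumptions. Each modality's representation receives a relevance score, the scores are normalized into attention weights, and the weighted sum forms the joint representation.

```python
# Minimal sketch of attention-weighted feature-level fusion (illustrative only;
# not the paper's implementation).
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse per-modality representations into one joint vector.

    Each modality representation is scored, the scores are normalized with
    softmax into attention weights, and the weighted sum is the fused
    representation, letting the model learn how much each modality
    contributes to the prediction.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one relevance score per modality vector

    def forward(self, reps: torch.Tensor) -> torch.Tensor:
        # reps: (batch, n_modalities, dim), e.g. outputs of text/image/behavior encoders
        weights = torch.softmax(self.score(reps).squeeze(-1), dim=-1)  # (batch, n_modalities)
        return (weights.unsqueeze(-1) * reps).sum(dim=1)               # (batch, dim)

# Usage: fuse three 128-dimensional modality representations for 4 users.
fusion = AttentionFusion(dim=128)
joint = fusion(torch.randn(4, 3, 128))  # joint.shape == (4, 128)
```

In a pipeline like the one described above, a representation such as `joint` would feed a classifier whose outputs are then combined with other base models at the decision level; the Stacking side is sketched after the keyword list below.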

Key words: user profiling, model combination, stacking, cross-modal joint representation learning, multi-layer and multi-level model fusion
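The decision-level side, the Stacking ensemble named in the abstract and keywords, can be sketched with scikit-learn. This is a generic illustration of Stacking, not the authors' model combination: the base learners, the toy data, and the binary profile attribute are assumptions made for the example.

```python
# Generic Stacking sketch (illustrative only; not the paper's model combination).
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy data: 200 users, 20 features (imagine two modalities' features side by side).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.integers(0, 2, size=200)  # a binary profile attribute, e.g. gender

# Cross-validated predictions of the base models become inputs to a
# meta-learner, which learns how to combine them -- decision-level fusion.
stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)), ("svm", SVC(probability=True))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X, y)
print(stack.predict(X[:5]))  # predicted profile attribute for the first 5 users
```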