Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2019, Vol. 55 ›› Issue (1): 37-46. DOI: 10.13209/j.0479-8023.2018.063


Cross-Domain Sentiment Classification Combining Representation Learning and Transfer Learning

廖祥文1,2,3,†, 吴晓静1,2, 桂林1, 黄锦辉4, 陈国龙1,2   

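  1. School of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116
    2. Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing (Fuzhou University), Fuzhou 350116
    3. Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University), Fuzhou 350116
    4. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong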
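  • Received: 2018-04-15 Revised: 2018-08-18 Published: 2019-01-20 Released: 2019-01-20
  • Corresponding author: LIAO Xiangwen, E-mail: liaoxw(at)fzu.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (61772135, U1605251), the Open Fund of the CAS Key Laboratory of Network Data Science and Technology (CASNDST201708, CASNDST201606), the Director's Fund of the Key Laboratory of Trustworthy Distributed Computing and Service, Ministry of Education (2017KF01), and the CERNET Next Generation Internet Technology Innovation Project (NGII20160501)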

Cross-Domain Sentiment Classification Based on Representation Learning and Transfer Learning

LIAO Xiangwen1,2,3,†, WU Xiaojing1,2, GUI Lin1, HUANG Jinhui4, CHEN Guolong1,2   

  1. School of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116
    2. Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing (Fuzhou University), Fuzhou 350116
    3. Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University), Fuzhou 350116
    4. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
  • Received: 2018-04-15 Revised: 2018-08-18 Online: 2019-01-20 Published: 2019-01-20
  • Contact: LIAO Xiangwen, E-mail: liaoxw(at)fzu.edu.cn

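Abstract:

Existing cross-domain sentiment classification methods use text representations that ignore the sentiment information carried by important words and sentences, and they suffer from negative transfer during the transfer process. To address these problems, a cross-domain sentiment classification method is proposed that combines text representation learning with a transfer learning algorithm. First, texts are initialized with low-dimensional dense word vectors, and a hierarchical attention network models the sentiment information of important words and sentences, learning document-level distributed representations for the source and target domains. Next, a class-noise estimation method is applied to the data transferred from the source domain to detect and remove negative-transfer samples and to select high-quality samples that expand the target-domain training set. Finally, a support vector machine is trained to classify the sentiment of target-domain texts. Results of two experiments on large-scale public datasets show that, compared with the baseline methods, the proposed method reduces the root mean square error by 1.5% and 1.0%, respectively, indicating that it can effectively improve cross-domain sentiment classification performance.

Keywords: text representation learning, transfer learning, class-noise estimation, cross-domain, sentiment classification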
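
The sample-selection step can be illustrated with a minimal sketch. It assumes a simple k-nearest-neighbour disagreement score as the class-noise estimate, which is a stand-in chosen for illustration and not necessarily the estimator used in the paper. The function names, the threshold, and the toy vectors are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def class_noise_score(sample, label, reference, k=3):
    """Fraction of the k nearest reference samples whose label differs from `label`.

    Assumption: this neighbour-disagreement score stands in for a class-noise estimate.
    """
    nearest = sorted(reference, key=lambda item: euclidean(sample, item[0]))[:k]
    disagree = sum(1 for _, ref_label in nearest if ref_label != label)
    return disagree / len(nearest)

def filter_transferred(source, target, k=3, threshold=0.5):
    """Keep source samples whose estimated noise is at most `threshold` (hypothetical value)."""
    return [(x, y) for x, y in source
            if class_noise_score(x, y, target, k) <= threshold]

# Toy 2-D "document vectors" with labels (1 = positive, 0 = negative).
target = [([0.9, 0.8], 1), ([1.0, 1.1], 1), ([0.8, 1.0], 1),
          ([-1.0, -0.9], 0), ([-0.8, -1.1], 0), ([-1.1, -1.0], 0)]
source = [([1.0, 0.9], 1),    # label agrees with nearby target samples
          ([-0.9, -1.0], 1),  # label conflicts with its neighbours
          ([-1.0, -1.0], 0)]  # label agrees with nearby target samples

kept = filter_transferred(source, target)
expanded_training_set = target + kept
```

In this toy setup the conflicting source sample is dropped and the other two are added to the target-domain training set. The expanded set would then be passed to a classifier such as the support vector machine named in the abstract.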

Abstract:

Most existing cross-domain sentiment classification methods are not expressive enough to capture rich text representations, and the class noise accumulated during transfer can cause negative transfer, which degrades performance. To address these issues, the authors propose a method that combines textual representation learning with a transfer learning algorithm for cross-domain sentiment classification. The method first builds a hierarchical attention network to generate document representations that carry local semantic information. The authors then use a class-noise estimation algorithm to detect and remove negative-transfer samples among the transferred samples. Finally, the sentiment classifier is trained on an expanded dataset consisting of target-domain samples and samples transferred from the source domain. In two experiments on large-scale product review datasets, the proposed method reduces the RMSE of cross-domain sentiment classification by 1.5% and 1.0%, respectively, compared with the baselines.

Key words: textual representation learning, transfer learning, class-noise estimation, cross-domain, sentiment classification
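
The idea behind a hierarchical attention network can be sketched as two rounds of attention pooling: words are pooled into sentence vectors, and sentence vectors are pooled into one document vector. The sketch below is only an illustration under strong assumptions. It omits the learned encoders a real network would have, and the context vectors `word_context` and `sentence_context` are fixed, hypothetical values rather than trained parameters.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention_pool(vectors, context):
    """Weighted average of `vectors`; weights are the softmax of dot products with `context`."""
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights

def document_vector(document, word_context, sentence_context):
    """Pool words into sentence vectors, then sentences into a single document vector."""
    sentence_vectors = [attention_pool(words, word_context)[0] for words in document]
    return attention_pool(sentence_vectors, sentence_context)

# Toy document: two sentences made of 2-D word vectors (hypothetical values).
document = [
    [[0.1, 0.0], [2.0, 0.5], [0.0, 0.1]],
    [[0.0, 0.2], [0.1, 0.1]],
]
word_context = [1.0, 0.0]      # hypothetical; learned in a real network
sentence_context = [1.0, 0.0]  # hypothetical; learned in a real network

doc_vec, sentence_weights = document_vector(document, word_context, sentence_context)
```

Here the first sentence contains a word that aligns strongly with the word-level context vector, so it receives the larger sentence-level weight and dominates the document vector.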