Cross-Domain Sentiment Classification Based on Representation Learning and Transfer Learning

Acta Scientiarum Naturalium Universitatis Pekinensis, 2019, Vol. 55, Issue 1: 37-46. DOI: 10.13209/j.0479-8023.2018.063


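LIAO Xiangwen^1,2,3,†, WU Xiaojing^1,2, GUI Lin^1, HUANG Jinhui^4, CHEN Guolong^1,2

1. School of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116
2. Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing (Fuzhou University), Fuzhou 350116
3. Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University), Fuzhou 350116
4. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong

Received: 2018-04-15  Revised: 2018-08-18  Online: 2019-01-20  Published: 2019-01-20
Corresponding author: LIAO Xiangwen, E-mail: liaoxw(at)fzu.edu.cn
Funding:
Supported by the National Natural Science Foundation of China (61772135, U1605251), the Open Fund of the CAS Key Laboratory of Network Data Science and Technology (CASNDST201708, CASNDST201606), the Director Fund of the Key Laboratory of Trustworthy Distributed Computing and Service, Ministry of Education (2017KF01), and the CERNET Next Generation Internet Technology Innovation Project (NGII20160501).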


Abstract

Chinese version:

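Existing cross-domain sentiment classification methods have two problems: their text representation features ignore the sentiment information carried by important words and sentences, and negative transfer occurs during the transfer process. To address these problems, this paper proposes a cross-domain sentiment classification method that combines text representation learning with a transfer learning algorithm. First, texts are initialized with low-dimensional dense word vectors, and a hierarchical attention network models the sentiment information of the important words and sentences in each text, thereby learning document-level distributed representations for both the source and the target domain. Next, a class-noise estimation method inspects the data transferred from the source domain, removing negative-transfer samples and selecting high-quality samples to augment the target-domain training set. Finally, a support vector machine is trained to classify the sentiment of target-domain texts. Two experiments on large-scale public datasets show that, compared with the baseline methods, the proposed method reduces the root mean square error (RMSE) by 1.5% and 1.0%, respectively, indicating that it effectively improves cross-domain sentiment classification performance.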
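The first step described above turns word vectors into a document vector through a hierarchical attention network. The sketch below is a minimal, dependency-free illustration of attention pooling in the style of the standard hierarchical attention network (words pooled into sentence vectors, then sentences pooled into a document vector). It is not the authors' code: it omits the bidirectional GRU encoders that normally precede each attention layer, and the names `attention_pool`, `document_vector`, `W`, `b`, `context`, the sizes `DIM`/`ATT`, and the random parameter values are all hypothetical.

```python
import math
import random


def attention_pool(vectors, W, b, context):
    """One attention layer in the style of a hierarchical attention network:
    u_t = tanh(W h_t + b), alpha_t = softmax(u_t . context),
    pooled = sum_t alpha_t * h_t. Returns (pooled, alphas)."""
    # Project every input vector h_t into the attention space.
    projected = [
        [math.tanh(sum(w * h_j for w, h_j in zip(row, h)) + b_i) for row, b_i in zip(W, b)]
        for h in vectors
    ]
    # Score each position against the (normally learned) context vector.
    scores = [sum(u_i * c_i for u_i, c_i in zip(u, context)) for u in projected]
    # Numerically stable softmax over positions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Attention-weighted sum of the original input vectors.
    dim = len(vectors[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, vectors)) for d in range(dim)]
    return pooled, alphas


def document_vector(doc, word_params, sent_params):
    """doc is a list of sentences; each sentence is a list of word vectors.
    Words are pooled into sentence vectors, then sentences into one document vector."""
    sentence_vectors = [attention_pool(sentence, *word_params)[0] for sentence in doc]
    doc_vec, _ = attention_pool(sentence_vectors, *sent_params)
    return doc_vec


# Hypothetical sizes and randomly initialised (untrained) parameters.
random.seed(0)
DIM, ATT = 4, 3


def random_params():
    W = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(ATT)]
    b = [0.0] * ATT
    context = [random.uniform(-1, 1) for _ in range(ATT)]
    return W, b, context


word_params, sent_params = random_params(), random_params()
# Toy document: two sentences of 5 and 3 random "word vectors".
doc = [[[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(n)] for n in (5, 3)]
vec = document_vector(doc, word_params, sent_params)
print(len(vec))  # 4
```

In a trained model `W`, `b` and `context` would be learned jointly with the encoders and the classifier; here they are random, so the attention weights carry no sentiment meaning and only the data flow is illustrated.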

Keywords: text representation learning, transfer learning, class-noise estimation, cross-domain, sentiment classification

English version:

Most existing cross-domain sentiment classification methods are not expressive enough to capture rich text representations, and the class noise accumulated during the transfer process can lead to negative transfer, which may degrade performance. To address these issues, the authors propose a method that combines textual representation learning with a transfer learning algorithm for cross-domain sentiment classification. The method first builds a hierarchical attention network to generate document representations that capture local semantic information. The authors then apply a class-noise estimation algorithm to detect negative-transfer samples among the samples transferred from the source domain and remove them. Finally, a sentiment classifier is trained on an expanded dataset consisting of the target-domain samples and the retained source-domain samples. Two experiments on large-scale product review datasets show that, compared with the baselines, the proposed method reduces the RMSE of cross-domain sentiment classification by 1.5% and 1.0%, respectively.
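The abstract names class-noise estimation as the filter for negative-transfer samples but does not specify the estimator. As a hedged illustration only, the sketch below swaps in a simple k-nearest-neighbour agreement test: a transferred source sample is treated as noisy when most of its most similar labelled target-domain documents carry a different label. The names `cosine`, `estimate_class_noise`, `filter_transferred`, and the parameters `k` and `threshold` are hypothetical, and the document vectors are assumed to come from the representation-learning step.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def estimate_class_noise(sample, label, reference, k=3):
    """Hypothetical noise score: the fraction of the k reference documents most
    similar to `sample` whose label disagrees with `label` (higher = noisier)."""
    if not reference:
        raise ValueError("reference set must contain labelled target-domain documents")
    neighbours = sorted(reference, key=lambda item: cosine(sample, item[0]), reverse=True)[:k]
    disagree = sum(1 for _, ref_label in neighbours if ref_label != label)
    return disagree / len(neighbours)


def filter_transferred(source, target, k=3, threshold=0.5):
    """Keep source samples whose estimated noise is below `threshold` and append
    them to the labelled target data, giving the expanded training set."""
    kept = [(x, y) for x, y in source if estimate_class_noise(x, y, target, k) < threshold]
    return target + kept, kept


# Toy 2-d "document vectors": label 1 = positive, 0 = negative.
target = [([1.0, 0.1], 1), ([0.9, 0.2], 1), ([0.1, 1.0], 0), ([0.2, 0.9], 0)]
source = [([0.95, 0.15], 1),  # agrees with its nearest target neighbours
          ([0.85, 0.1], 0),   # labelled negative but surrounded by positives
          ([0.15, 0.95], 0)]  # agrees with its nearest target neighbours
expanded, kept = filter_transferred(source, target, k=2)
print(len(kept), len(expanded))  # 2 6
```

The second source sample is dropped because both of its nearest target neighbours are positive. In the described method the expanded set would then be used to train the final classifier (a support vector machine, according to the Chinese version of the abstract), which this sketch does not include.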



LIAO Xiangwen, WU Xiaojing, GUI Lin, HUANG Jinhui, CHEN Guolong. Cross-Domain Sentiment Classification Based on Representation Learning and Transfer Learning[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2019, 55(1): 37-46.


Link to this article: https://xbna.pku.edu.cn/CN/10.13209/j.0479-8023.2018.063

https://xbna.pku.edu.cn/CN/Y2019/V55/I1/37

