Acta Scientiarum Naturalium Universitatis Pekinensis

基于自动编码特征的汉语解释性意见句识别

贺宇,潘达,付国宏   

  1. 黑龙江大学计算机科学技术学院, 哈尔滨 150080
  • 收稿日期:2014-06-28 出版日期:2015-03-20 发布日期:2015-03-20

Chinese Explanatory Opinionated Sentence Recognition Based on Auto-Encoding Features

HE Yu, PAN Da, FU Guohong   

  1. School of Computer Science and Technology, Heilongjiang University, Harbin 150080
  • Received: 2014-06-28 Online: 2015-03-20 Published: 2015-03-20

摘要: 提出一种基于自动编码特征的汉语解释性意见句识别的分类方法。首先从汽车和手机两个领域的产品评论中构造一个解释性意见语料库, 然后采用分类的方法进行解释性意见句识别。特别地, 采用自动编码技术表示和学习解释性意见句分类的词向量特征。最后, 在支持向量机框架下通过实验优选解释性词向量维度, 并与一些传统特征表示方法进行比较。实验结果表明, 与传统的卡方、信息增益和TF-IDF及其组合方法相比, 自动编码特征的引入能有效提升汉语解释性意见句识别性能。

关键词: 意见挖掘, 解释性意见句识别, 自动编码

Abstract: A classification method based on auto-encoding features is presented for Chinese explanatory opinionated sentence recognition. First, an explanatory opinion corpus is built from online product reviews in the car and cellphone domains. Word embeddings are then learned from the product reviews using an auto-encoding technique. Finally, the learned word embeddings are used as features for explanatory opinionated sentence classification within a support vector machine framework, where the embedding dimensionality is selected experimentally. Experimental results show that word embeddings are more effective for this task than traditional feature representations such as chi-square, information gain, TF-IDF, and their combinations.

Key words: opinion mining, explanatory opinionated sentence recognition, auto-encoding
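
The pipeline summarized in the abstract (word vectors used as features for a support vector machine classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy vocabulary and its hand-picked 3-dimensional vectors stand in for embeddings that would be learned by auto-encoding, sentences are represented by averaging their word vectors, and the linear SVM is trained with plain subgradient descent on the regularized hinge loss rather than with any particular SVM toolkit.

```python
# Hypothetical word vectors (assumption): stand-ins for embeddings that
# would be learned by auto-encoding from product reviews.
EMBEDDINGS = {
    "because": [0.9, 0.1, 0.0],
    "battery": [0.2, 0.8, 0.1],
    "drains":  [0.7, 0.3, 0.2],
    "fast":    [0.6, 0.2, 0.1],
    "great":   [0.0, 0.1, 0.9],
    "phone":   [0.1, 0.7, 0.3],
    "love":    [0.1, 0.0, 0.8],
}
DIM = 3


def sentence_vector(tokens):
    """Average the vectors of known tokens (zero vector if none are known)."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def score(w, b, x):
    """Signed distance-like score of x under the linear model (w, b)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss (linear SVM)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * score(w, b, x)
            # Shrink the weights (gradient of the L2 penalty).
            w = [(1.0 - lr * lam) * wj for wj in w]
            # Hinge loss is active only when the margin is below 1.
            if margin < 1.0:
                w = [wj + lr * label * xj for wj, xj in zip(w, x)]
                b += lr * label
    return w, b


def predict(w, b, tokens):
    """Return +1 (explanatory) or -1 (non-explanatory) for a token list."""
    return 1 if score(w, b, sentence_vector(tokens)) >= 0 else -1


# Toy training set: +1 = explanatory opinion, -1 = non-explanatory opinion.
train_sentences = [
    (["because", "battery", "drains", "fast"], 1),
    (["because", "phone", "drains"], 1),
    (["great", "phone"], -1),
    (["love", "phone"], -1),
]
X = [sentence_vector(tokens) for tokens, _ in train_sentences]
y = [label for _, label in train_sentences]
w, b = train_linear_svm(X, y)

label_explanatory = predict(w, b, ["because", "fast"])
label_plain = predict(w, b, ["great", "love"])
```

In a realistic setting the embedding dimensionality would be a tuned hyperparameter, as the abstract notes, and averaging is only one of several ways to turn word vectors into a sentence-level feature vector.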
