Acta Scientiarum Naturalium Universitatis Pekinensis

Unsupervised Text Sentiment Analysis Based on a Topic-Sentiment Mixture Model

孙艳,周学广,付伟   

  1. Department of Information Security, Naval University of Engineering, Wuhan 430033
  • Received: 2012-06-02 Published: 2013-01-20 Released: 2013-01-20

Unsupervised Topic and Sentiment Unification Model for Sentiment Analysis

SUN Yan, ZHOU Xueguang, FU Wei   

  1. Department of Information Security, Naval University of Engineering, Wuhan 430033;
  • Received: 2012-06-02 Online: 2013-01-20 Published: 2013-01-20

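Abstract: Supervised and semi-supervised text sentiment analysis suffers from the difficulty of obtaining labeled samples. To address this, a sentiment model is incorporated into the LDA model to form an unsupervised topic-sentiment mixture model (the UTSU model). The UTSU model samples a sentiment label for each sentence and a topic label for each word. Without any labeled samples, it yields the topic-sentiment words of each topic, which are then used to classify the sentiment of a document collection. Comparative sentiment classification experiments show that the UTSU model performs slightly worse than supervised sentiment classification methods but best among the unsupervised methods: its overall sentiment classification metric is about 2% higher than that of the ASUM model and about 16% higher than that of the JST model.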

Keywords: topic model, LDA, sentiment analysis, mixture model

Abstract: Supervised and semi-supervised sentiment classification methods need labeled corpora for classifier training. To solve this problem, an unsupervised topic and sentiment unification model (UTSU model) is proposed based on the LDA model. The UTSU model imposes the constraint that all words in a sentence are generated from one sentiment, while each word is generated from its own topic. This constraint conforms to how sentiment is expressed in language and does not limit the topic relations among words. The UTSU model is completely unsupervised: it needs neither labeled corpora nor sentiment seed words. Sentiment classification experiments show that the UTSU model comes close to supervised classification methods and outperforms other topic and sentiment unification models. It improves the F1 score of sentiment classification by about 2% over the ASUM model and by about 16% over the JST model.

Key words: topic model, latent Dirichlet allocation (LDA), sentiment analysis, unification model
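
The sampling constraint described in the abstract (one sentiment label per sentence, one topic label per word) can be illustrated with a minimal generative sketch. This is not the authors' implementation. The symmetric Dirichlet priors, the hyperparameter names `alpha`, `beta` and `gamma`, the per-document topic distribution, and the function names are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs, rng):
    """Draw an index according to the given probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def make_word_distributions(n_sentiments, n_topics, vocab_size, beta, rng):
    """Assumed: one word distribution per (sentiment, topic) pair."""
    return [[dirichlet([beta] * vocab_size, rng) for _ in range(n_topics)]
            for _ in range(n_sentiments)]


def generate_document(sentence_lengths, phi, alpha=1.0, gamma=1.0, rng=None):
    """Generate one document as a list of (sentiment, [(topic, word_id), ...])."""
    rng = rng or random.Random(0)
    n_sentiments, n_topics = len(phi), len(phi[0])
    pi = dirichlet([gamma] * n_sentiments, rng)   # assumed per-document sentiment mix
    theta = dirichlet([alpha] * n_topics, rng)    # assumed per-document topic mix
    document = []
    for length in sentence_lengths:
        s = categorical(pi, rng)                  # one sentiment for the whole sentence
        words = []
        for _ in range(length):
            z = categorical(theta, rng)           # one topic per word
            w = categorical(phi[s][z], rng)       # word drawn from the (s, z) distribution
            words.append((z, w))
        document.append((s, words))
    return document


rng = random.Random(42)
phi = make_word_distributions(n_sentiments=2, n_topics=3, vocab_size=20, beta=0.5, rng=rng)
doc = generate_document([4, 6, 5], phi, rng=rng)
for sentiment, words in doc:
    print(sentiment, words)
```

Each printed line shows a single sentiment index shared by the whole sentence, followed by (topic, word id) pairs whose topics may differ from word to word. Inference would run in the opposite direction, estimating these labels from observed words, which this sketch does not attempt.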

Chinese Library Classification (CLC) number: