Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2018, Vol. 54 ›› Issue (4): 687-692. DOI: 10.13209/j.0479-8023.2018.011


游客微博主题情感分析方法比较研究

刘思叶, 田原, 冯雨宁, 庄育龙   

  1. 北京大学遥感与地理信息系统研究所, 北京100871
  • 收稿日期:2017-05-05 修回日期:2017-06-07 出版日期:2018-07-20 发布日期:2018-07-20
  • 通讯作者: 田原, E-mail: tianyuanpku(at)pku.edu.cn
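  • Supported by:
    National Key Research and Development Program of China (2018YFB0505500, 2018YFB0505504) and the Open Research Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing ((16)重02)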

Comparison of Tourist Thematic Sentiment Analysis Methods Based on Weibo Data

LIU Siye, TIAN Yuan, FENG Yuning, ZHUANG Yulong   

  1. Institute of Remote Sensing and Geographical Information System, Peking University, Beijing 100871
  • Received:2017-05-05 Revised:2017-06-07 Online:2018-07-20 Published:2018-07-20
  • Contact: TIAN Yuan, E-mail: tianyuanpku(at)pku.edu.cn

摘要:

针对饮食、娱乐、购物、景观、交通和住宿6个旅游主题, 基于机器学习方法, 开展游客微博主题情感分析方法比较研究。以人工标注的53140条赴日游客微博为数据基础, 应用两种机器学习模型开展建模实验, 并分析不同特征对建模效果的影响。实验结果显示, 两种模型的建模效果良好, 适用于游客微博主题情感分析, 其中最大熵模型效果略优于支持向量机。研究还表明, 在词特征的基础上引入表情符号和主题词进行特征扩展, 可以提高模型的建模效果。

关键词: 主题情感分析, 游客微博, 最大熵模型, 支持向量机

Abstract:

Six tourism themes (diet, entertainment, shopping, view, transportation, and accommodation) are selected for a comparison of machine-learning methods for thematic sentiment analysis of tourist Weibo posts. A dataset of 53,140 manually labeled Weibo posts published by Chinese tourists in Japan serves as the case study. Two machine learning models, the Maximum Entropy model and the Support Vector Machine, are trained on this dataset, and the effect of different features on model performance is analyzed. Both models perform well, which indicates that machine learning models are suitable for tourist thematic sentiment analysis; the Maximum Entropy model slightly outperforms the Support Vector Machine. The experiments also show that extending traditional word features with emoticons and thematic words improves model performance.

Key words: thematic sentiment analysis, tourist Weibo, Maximum Entropy model, Support Vector Machine (SVM)
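
The feature extension described in the abstract (word features supplemented with emoticons and thematic words) can be illustrated with a minimal sketch. This is not the authors' implementation: the theme lexicon `THEME_WORDS`, the bracketed emoticon format, the feature-name prefixes, and the function `extract_features` are all assumptions made for illustration, and English tokens stand in for Chinese text, which would first require word segmentation.

```python
import re
from collections import Counter

# Hypothetical theme lexicon; the paper's actual thematic word lists are not given here.
THEME_WORDS = {
    "diet": {"ramen", "sushi"},
    "transportation": {"subway", "train"},
}

# Assumption: emoticons appear in the text as bracketed names such as "[smile]".
EMOTICON_RE = re.compile(r"\[([^\[\]\s]+)\]")


def extract_features(text, use_emoticons=True, use_theme_words=True):
    """Return a sparse feature dict: word counts plus optional extensions."""
    emoticons = EMOTICON_RE.findall(text)
    # Strip emoticon markup before tokenizing so emoticon names are not counted as words.
    plain = EMOTICON_RE.sub(" ", text).lower()
    words = re.findall(r"[a-z']+", plain)

    # Baseline: bag-of-words counts.
    features = Counter(f"word={w}" for w in words)

    # Extension 1: one feature per emoticon occurrence.
    if use_emoticons:
        features.update(f"emoticon={e}" for e in emoticons)

    # Extension 2: per-theme count of words found in the theme lexicon.
    if use_theme_words:
        for theme, lexicon in THEME_WORDS.items():
            hits = sum(1 for w in words if w in lexicon)
            if hits:
                features[f"theme={theme}"] = hits

    return dict(features)
```

Feature dicts of this kind could then be vectorized and passed to a Maximum Entropy (multinomial logistic regression) or Support Vector Machine classifier; toggling `use_emoticons` and `use_theme_words` gives the baseline and extended feature sets for comparison.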

CLC Number: