Acta Scientiarum Naturalium Universitatis Pekinensis, 2018, Vol. 54, Issue 4: 687-692. DOI: 10.13209/j.0479-8023.2018.011
LIU Siye, TIAN Yuan†, FENG Yuning, ZHUANG Yulong
刘思叶, 田原†, 冯雨宁, 庄育龙
Abstract:
Six tourism themes (diet, entertainment, shopping, scenery, transportation, and accommodation) are selected for thematic sentiment analysis. A total of 53,140 Weibo posts published by Chinese tourists in Japan are collected and manually labeled as the case-study dataset. Two machine learning models, a Maximum Entropy model and a Support Vector Machine, are trained on this dataset, and the influence of different features on model performance is analyzed. Both models perform well, with the Maximum Entropy model slightly outperforming the Support Vector Machine. The results indicate that machine learning models are suitable for thematic sentiment analysis of tourist Weibo posts. The experiment also shows that both models can be improved by adding emoticons and thematic words to the traditional word features.
Key words: thematic sentiment analysis, Weibo of tourists, Maximum Entropy, Support Vector Machine (SVM)
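The feature extension described in the abstract (word features supplemented by emoticons and thematic words) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon `THEME_WORDS`, the bracketed emoticon pattern, the function name `extract_features`, and the whitespace-style tokenization are all assumptions made for illustration. Real Weibo text would need a Chinese word segmenter and the paper's own thematic word lists.

```python
import re
from collections import Counter

# Hypothetical thematic lexicon; the paper's actual word lists are not given here.
THEME_WORDS = {
    "diet": {"sushi", "ramen"},
    "transportation": {"subway", "shinkansen"},
    "shopping": {"mall", "outlet"},
}

# Assumption: emoticons appear as bracketed names such as [happy].
EMOTICON_RE = re.compile(r"\[([^\[\]\s]+)\]")


def extract_features(text):
    """Build a feature bag from words, emoticons, and thematic-word flags."""
    emoticons = EMOTICON_RE.findall(text)
    # Remove emoticons before tokenizing so they are not counted as words.
    plain = EMOTICON_RE.sub(" ", text)
    words = re.findall(r"\w+", plain.lower())

    features = Counter()
    for word in words:
        features["word=" + word] += 1
    for emoticon in emoticons:
        features["emoticon=" + emoticon.lower()] += 1
    # One binary flag per theme whose lexicon matches any word in the post.
    for theme, lexicon in THEME_WORDS.items():
        if any(word in lexicon for word in words):
            features["theme=" + theme] = 1
    return features


post = "The ramen was great [happy] but the subway was crowded [sad]"
features = extract_features(post)
```

A feature bag of this kind could then be vectorized and passed to a Maximum Entropy classifier (equivalent to multinomial logistic regression) or to a Support Vector Machine, so that the effect of adding the emoticon and theme features can be compared against word features alone.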
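Abstract (translated from the Chinese):
For six tourism themes (diet, entertainment, shopping, scenery, transportation, and accommodation), this study compares machine-learning-based methods for thematic sentiment analysis of tourists' Weibo posts. Using 53,140 manually labeled Weibo posts by tourists visiting Japan as the data basis, two machine learning models are applied in modeling experiments, and the influence of different features on modeling performance is analyzed. The experimental results show that both models perform well and are suitable for thematic sentiment analysis of tourist Weibo posts, with the Maximum Entropy model slightly outperforming the Support Vector Machine. The study also shows that extending the word features with emoticons and thematic words improves model performance.
Key words (translated from the Chinese): thematic sentiment analysis, tourist Weibo, Maximum Entropy model, Support Vector Machine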
CLC Number: F590
LIU Siye, TIAN Yuan, FENG Yuning, ZHUANG Yulong. Comparison of Tourist Thematic Sentiment Analysis Methods Based on Weibo Data[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2018, 54(4): 687-692.
刘思叶, 田原, 冯雨宁, 庄育龙. 游客微博主题情感分析方法比较研究[J]. 北京大学学报自然科学版, 2018, 54(4): 687-692.
URL: https://xbna.pku.edu.cn/EN/10.13209/j.0479-8023.2018.011
https://xbna.pku.edu.cn/EN/Y2018/V54/I4/687