A Weibo Bot-users Identification Model Based on Random Forest

Acta Scientiarum Naturalium Universitatis Pekinensis

LIU Kan¹, YUAN Yunying¹, LIU Ping²

1. School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430074; 2. School of Information Management, Wuhan University, Wuhan 430072.

Received: 2014-07-27 Online: 2015-03-20 Published: 2015-03-20

Chinese title (translated): Research on Identifying Bot Users on Weibo Based on Random Forest Classification

Abstract

Bot users spread rumors and false information widely, mislead public opinion, and seriously disrupt the normal online environment. Focusing on bot users on Weibo, and taking into account their high degree of automation, strong ability to disguise themselves, and targeted posting, this paper analyzes bot-user characteristics along four dimensions: behavior pattern, post content, user relationships, and posting platform. Eight indicators (information entropy, content repetition rate, reputation, mutual, mention ratio, comment ratio, message, and numofplatform) are used to construct a feature vector for each user, and an identification model based on the random forest algorithm is designed to recognize bot users. Validated on a real Sina Weibo dataset, the model achieves an accuracy of 96.7%, showing that it distinguishes bot users from ordinary users effectively.
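The abstract names information entropy and content repetition rate as features but does not define them on this page. The Python sketch below shows one plausible way to compute them, under assumptions that are ours rather than the authors': entropy is taken over the gaps between a user's consecutive posts, binned into 60-second buckets, and the repetition rate is the share of posts whose exact text already appeared earlier in the same timeline. The function names (`information_entropy`, `posting_interval_entropy`, `content_repetition_rate`) and the example timelines are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch: these feature definitions are assumptions,
# not the definitions used in the paper.


def information_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())


def posting_interval_entropy(timestamps, bucket_seconds=60):
    """Assumed behavior feature: entropy of the gaps between consecutive
    posts, each gap binned into `bucket_seconds`-wide buckets.
    A bot posting on a fixed schedule scores 0."""
    ts = sorted(timestamps)
    gaps = [(b - a) // bucket_seconds for a, b in zip(ts, ts[1:])]
    return information_entropy(gaps)


def content_repetition_rate(posts):
    """Assumed content feature: fraction of posts whose exact text
    already appeared earlier in the same user's timeline."""
    if not posts:
        return 0.0
    return (len(posts) - len(set(posts))) / len(posts)


# Made-up timelines (Unix timestamps in seconds).
bot_times = [0, 300, 600, 900]          # one post every 5 minutes
human_times = [0, 60, 300, 320, 1000]   # irregular gaps
print(posting_interval_entropy(bot_times), posting_interval_entropy(human_times))  # → 0.0 2.0
print(content_repetition_rate(["buy now", "buy now", "buy now", "hello"]))  # → 0.5
```

Under these assumptions, low interval entropy and a high repetition rate both point toward automated accounts; the bucket width is a tunable choice that controls how strictly "regular" posting is judged.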

Key words: bot-users, Weibo, random forest

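Abstract (Chinese version, translated): Bot users on the Internet spread large numbers of rumors and false information and mislead public opinion, seriously harming the online environment. Taking bot users on Weibo as the research object, and considering their high degree of automation, strong disguise capability, and targeted posting, this paper analyzes bot-user feature indicators along four dimensions: behavior pattern, post content, user relationships, and posting platform. Eight indicators, including information entropy and content repetition rate, are used to build the feature vector of each Weibo user, and a random forest algorithm is used to design a model that identifies bot users on Weibo. Finally, validation on a real Sina Weibo dataset shows that the model identifies bot users with an accuracy of 96.7% and can effectively distinguish bot users from ordinary users.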
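The identification step trains a random forest on the user feature vectors. The stdlib-only Python sketch below illustrates the generic technique: each tree is a CART-style tree grown on a bootstrap sample, splitting on Gini impurity while considering only a random subset of features at each node, and the forest predicts by majority vote. It is not the authors' implementation; the hyperparameters (`n_trees`, `max_depth`, the square-root feature-subset size), the class and function names, and the two-feature toy data are assumptions for illustration. In practice a library class such as scikit-learn's `RandomForestClassifier` would be the usual choice.

```python
import math
import random
from collections import Counter

# Hypothetical sketch of a generic random forest, not the paper's code.


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini
    impurity among the candidate features, or None if no split helps."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth, max_depth, n_sub, rng):
    """Leaves are class labels; inner nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    # Random-forest step: only a random subset of features is tried per node.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, n_sub, rng),
        build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, n_sub, rng),
    )


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForest:
    def __init__(self, n_trees=25, max_depth=5, seed=0):
        self.n_trees, self.max_depth, self.seed = n_trees, max_depth, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_sub = max(1, int(math.sqrt(len(X[0]))))
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: draw len(X) rows with replacement.
            idx = [rng.randrange(len(X)) for _ in range(len(X))]
            self.trees.append(
                build_tree([X[i] for i in idx], [y[i] for i in idx], 0, self.max_depth, n_sub, rng)
            )
        return self

    def predict(self, x):
        # Majority vote over all trees.
        return Counter(predict_tree(tree, x) for tree in self.trees).most_common(1)[0][0]


# Made-up toy data: [entropy, content repetition rate]; label 1 = bot, 0 = ordinary user.
X = [[0.1, 0.9], [0.2, 0.8], [0.0, 0.95], [0.3, 0.7],
     [2.5, 0.1], [3.0, 0.05], [2.8, 0.0], [2.2, 0.2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = RandomForest(n_trees=25, max_depth=5, seed=0).fit(X, y)
print(forest.predict([0.15, 0.85]), forest.predict([2.6, 0.1]))
```

With eight features, as in the paper's feature vector, this sketch's default would try int(sqrt(8)) = 2 randomly chosen features at each node; that per-node randomness, together with bootstrapping, is what decorrelates the trees before voting.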

CLC Number: TP391

LIU Kan, YUAN Yunying, LIU Ping. A Weibo Bot-users Identification Model Based on Random Forest [J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2015, 51(2): 289.

URL: https://xbna.pku.edu.cn/EN/Y2015/V51/I2/289
