An Individual-Group-Merchant Relation Model for Identifying Online Fake Reviews

doi:10.13209/j.0479-8023.2017.033

Abstract

This paper proposes an individual-group-merchant relation model (IGMRM) for automatically identifying fake reviews on e-commerce platforms. Rather than analyzing review content, the model focuses on the behavioral characteristics of fake reviewers. It defines three sets of indicators: individual indicators, group indicators, and merchant indicators. To validate the model, an empirical study of fake review identification was carried out on data from a Chinese e-commerce platform. The test data comprise 97,804 reviews posted from 9,558 distinct IP addresses and relating to 93 online stores. The model achieves F1-measures of 82.62%, 59.26%, and 95.12% in identifying fake reviewers, online merchants engaged in credit manipulation, and fake reviewer groups, respectively. The baseline methods, a logistic regression classifier and a K-nearest-neighbor classifier built on review content, achieve F1-measures of 52.63% and 76.75%, respectively, in identifying fake reviewers. IGMRM therefore outperforms these traditional content-based methods in identifying fake reviewers.

Key words: credit manipulation, fake review identification, user behavior modeling, IGMRM

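Abstract (translated from the Chinese):

Combining the content and behavioral characteristics of review stakeholders, this paper proposes an individual-group-merchant relation model (IGMRM). Experiments use sample data of 97,804 reviews from 9,558 distinct IP addresses across 93 stores. The results show that IGMRM achieves F1 values of 82.62%, 59.26%, and 95.12% in identifying fake reviewers, stores with credit manipulation, and fake reviewer groups, respectively. With a logistic regression model and a K-nearest-neighbor model based on review content as baseline classifiers, the F1 values for identifying fake reviewers are 52.63% and 76.75%, respectively, indicating that IGMRM outperforms traditional methods in identifying fake reviewers.

Key words (translated from the Chinese): credit manipulation, fake review identification, behavior modeling, IGMRM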

CLC Number:

TP391

Chuanming YU, Bolin FENG, Yuheng ZUO, Baiyun CHEN, Lu AN. An Individual-Group-Merchant Relation Model for Identifying Online Fake Reviews[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2017, 53(2): 262-272.

余传明, 冯博琳, 左宇恒, 陈百云, 安璐. 基于个人-群体-商户关系模型的虚假评论识别研究[J]. 北京大学学报自然科学版, 2017, 53(2): 262-272.

URL: https://xbna.pku.edu.cn/EN/10.13209/j.0479-8023.2017.033

https://xbna.pku.edu.cn/EN/Y2017/V53/I2/262

Indicator	Estimate	Std. error	z value	Pr(>|z|)
Intercept	-3.5174	0.5886	-5.976	2.29×10^-9***
RUR	-5.4879	1.9562	-2.805	5.03×10^-3**
URW	2.5755	1.2074	2.133	3.291×10^-2*
RR	50.6619	11.9434	4.424	2.22×10^-5***
RCS	2.3308	1.0796	2.159	3.085×10^-2*
RTW	3.5696	0.8075	4.421	9.85×10^-6***
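
The z values and p-values in the coefficient table above can be cross-checked from the estimates and standard errors alone. The following is a minimal sketch, assuming the table reports Wald tests (z = estimate / standard error, with a two-sided p-value under the standard normal distribution); the source does not state this explicitly, and the names `coefficients` and `wald_test` are illustrative. Under that assumption, the recomputed z for RR comes out near 4.242, which agrees with the printed p-value of 2.22×10^-5 but not with the printed z value of 4.424, so the printed z may contain transposed digits.

```python
import math

# (estimate, standard error) pairs copied from the coefficient table.
coefficients = {
    "Intercept": (-3.5174, 0.5886),
    "RUR": (-5.4879, 1.9562),
    "URW": (2.5755, 1.2074),
    "RR": (50.6619, 11.9434),
    "RCS": (2.3308, 1.0796),
    "RTW": (3.5696, 0.8075),
}


def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value.

    Assumption: the table's z and Pr(>|z|) columns are Wald tests.
    """
    z = estimate / std_error
    # For a standard normal variable Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


results = {name: wald_test(est, se) for name, (est, se) in coefficients.items()}

for name, (z, p) in results.items():
    print(f"{name:<10}{z:>8.3f}{p:>12.2e}")
```

Only the standard library is used, so the check can be rerun without a statistics package.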

Object	k	P	R	F1
Individual	90	0.7992	0.7543	0.7737
	92	0.8044	0.7719	0.7863
	94	0.8103	0.7894	0.7989
	96	0.8168	0.8070	0.8116
	98	0.8245	0.8245	0.8245
	100	0.8152	0.8245	0.8195
	102	0.8052	0.8245	0.8137
	104	0.8165	0.8421	0.8262
Merchant	30	0.5471	0.4909	0.5094
	32	0.5690	0.5272	0.5424
	34	0.5907	0.5636	0.5744
	36	0.5774	0.5636	0.5698
	38	0.5636	0.5636	0.5636
	40	0.5866	0.6000	0.5926
	42	0.5721	0.6000	0.5833
	44	0.5553	0.6000	0.5717
Group	66	0.9786	0.8292	0.8874
	68	0.9630	0.8292	0.8863
	70	0.9637	0.8536	0.9010
	72	0.9644	0.8780	0.9154
	74	0.9654	0.9024	0.9298
	76	0.9668	0.9268	0.9441
	78	0.9505	0.9268	0.9385
	80	0.9512	0.9512	0.9512
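
The F1 values quoted in the abstract (82.62%, 59.26%, and 95.12%) correspond to the best row of each block in the table above. A minimal sketch of that selection follows. It assumes that the truncated k labels in the individual block run 94, 96, 98, 100, 102, 104, continuing the step of 2 seen in the other blocks, and it treats k simply as whatever cutoff parameter the table varies. The names `results` and `best_by_f1` are illustrative and do not come from the paper.

```python
# (k, precision, recall, F1) rows copied from the table.
# Assumption: the individual block's k values run 90..104 in steps of 2.
results = {
    "individual": [
        (90, 0.7992, 0.7543, 0.7737), (92, 0.8044, 0.7719, 0.7863),
        (94, 0.8103, 0.7894, 0.7989), (96, 0.8168, 0.8070, 0.8116),
        (98, 0.8245, 0.8245, 0.8245), (100, 0.8152, 0.8245, 0.8195),
        (102, 0.8052, 0.8245, 0.8137), (104, 0.8165, 0.8421, 0.8262),
    ],
    "merchant": [
        (30, 0.5471, 0.4909, 0.5094), (32, 0.5690, 0.5272, 0.5424),
        (34, 0.5907, 0.5636, 0.5744), (36, 0.5774, 0.5636, 0.5698),
        (38, 0.5636, 0.5636, 0.5636), (40, 0.5866, 0.6000, 0.5926),
        (42, 0.5721, 0.6000, 0.5833), (44, 0.5553, 0.6000, 0.5717),
    ],
    "group": [
        (66, 0.9786, 0.8292, 0.8874), (68, 0.9630, 0.8292, 0.8863),
        (70, 0.9637, 0.8536, 0.9010), (72, 0.9644, 0.8780, 0.9154),
        (74, 0.9654, 0.9024, 0.9298), (76, 0.9668, 0.9268, 0.9441),
        (78, 0.9505, 0.9268, 0.9385), (80, 0.9512, 0.9512, 0.9512),
    ],
}


def best_by_f1(rows):
    """Return the (k, P, R, F1) row with the highest reported F1."""
    return max(rows, key=lambda row: row[3])


best = {obj: best_by_f1(rows) for obj, rows in results.items()}

for obj, (k, p, r, f1) in best.items():
    print(f"{obj:<12}k={k:<5}P={p:.4f}  R={r:.4f}  F1={f1:.4f}")
```

The sketch takes the reported F1 column as given rather than recomputing it from P and R, because the printed F1 values do not always equal the harmonic mean of the printed P and R (they may be averaged over runs, which the table does not say).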

Merchant ID	f(m)	Rank
12**73	7.461918×10^-1	1
7**09	5.370122×10^-1	2
14**68	3.934663×10^-1	3
10**30	2.677272×10^-5	4
7**72	1.714189×10^-5	5
7**96	1.453285×10^-5	6
12**22	1.339835×10^-5	7
12**29	1.184025×10^-5	8
12**76	1.018929×10^-5	9
10**92	8.861665×10^-6	10
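
In the ranking above, the top three merchants have f(m) values several orders of magnitude larger than the rest. One way to make that gap explicit is to look for the largest ratio between consecutive scores. The sketch below does this under the assumption that f(m) is a merchant-level score on which larger values rank higher. The largest-ratio cutoff is an illustrative heuristic, not a rule taken from the paper, and the names `scores` and `top_cluster` are hypothetical.

```python
# (masked merchant ID, f(m)) pairs copied from the ranking table, in rank order.
scores = [
    ("12**73", 7.461918e-1),
    ("7**09", 5.370122e-1),
    ("14**68", 3.934663e-1),
    ("10**30", 2.677272e-5),
    ("7**72", 1.714189e-5),
    ("7**96", 1.453285e-5),
    ("12**22", 1.339835e-5),
    ("12**29", 1.184025e-5),
    ("12**76", 1.018929e-5),
    ("10**92", 8.861665e-6),
]

# Ratio of each score to the next one down the ranking.
ratios = [scores[i][1] / scores[i + 1][1] for i in range(len(scores) - 1)]

# Illustrative heuristic: cut the list at the largest relative drop.
cut = max(range(len(ratios)), key=ratios.__getitem__) + 1
top_cluster = [merchant for merchant, _ in scores[:cut]]

print(f"largest drop after rank {cut}: ratio {ratios[cut - 1]:.0f}")
print("merchants above the drop:", top_cluster)
```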