Acta Scientiarum Naturalium Universitatis Pekinensis, 2016, Vol. 52, Issue 1: 104-112. DOI: 10.13209/j.0479-8023.2016.018
FU Jiabing, DONG Shoubin
付佳兵, 董守斌
Abstract:
Existing approaches to constructing news story chains focus only on topic similarity and document importance, and largely ignore the chain's logical coherence and explainability. They also face growing algorithmic complexity as news collections expand exponentially. To address these shortcomings, a method is proposed for constructing a story chain from the word coverage perspective. News comments are used to locate the turning points of each event. Documents are modeled logically using the ideas of topic similarity and sparse difference together with robust principal component analysis (RPCA). Random walk and graph traversal are then used to quantify and generate an explainable, logically coherent story chain. A double-blind experiment shows that the proposed method outperforms the other algorithms compared.
Key words: story chain, word coverage, explainable, RPCA, random walk
CLC Number: TP391
FU Jiabing, DONG Shoubin. Constructing a News Story Chain from Word Coverage Perspective[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2016, 52(1): 104-112.
付佳兵, 董守斌. 一种基于词覆盖的新闻事件脉络链构建方法[J]. 北京大学学报(自然科学版), 2016, 52(1): 104-112.
URL: https://xbna.pku.edu.cn/EN/10.13209/j.0479-8023.2016.018
https://xbna.pku.edu.cn/EN/Y2016/V52/I1/104