Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2016, Vol. 52 ›› Issue (1): 104-112. DOI: 10.13209/j.0479-8023.2016.018


一种基于词覆盖的新闻事件脉络链构建方法

付佳兵, 董守斌   

  1. 华南理工大学广东省计算机网络重点实验室, 广州 510640
  • 收稿日期:2015-06-19 出版日期:2016-01-20 发布日期:2016-01-20
  • 通讯作者: 董守斌, E-mail: sbdong(at)scut.edu.cn
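  • Supported by:
    Guangdong Province Frontier and Key Technology Innovation Special Fund (2014B010112006) and Guangdong Province Industry-University-Research Provincial-Ministerial Cooperation Special Fund (2013B090500087)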

Constructing a News Story Chain from Word Coverage Perspective

FU Jiabing, DONG Shoubin

  1. Guangdong Key Laboratory of Communications, South China University of Technology, Guangzhou 510640
  • Received: 2015-06-19 Online: 2016-01-20 Published: 2016-01-20
  • Contact: DONG Shoubin, E-mail: sbdong(at)scut.edu.cn

摘要:

针对目前构建新闻脉络链只关注新闻脉络链的主题相似性和文档重要性, 而忽略新闻脉络链逻辑连贯性和可解释性的不足, 以及新闻数据集合指数级增长带来的算法复杂度问题, 从词覆盖的角度提出一种新闻脉络链构建方法, 利用新闻的评论信息来定位新闻事件转折点, 用主题相似与稀疏差异的思想以及RPCA 方法对文档进行逻辑建模, 利用随机游走以及图遍历的方法, 量化并生成可解释且具有很好逻辑连贯性的脉络链。双盲实验表明, 与其他算法相比, 该方法取得较好的效果。

关键词: 新闻脉络, 词覆盖, 可解释, 健壮主成分分析, 随机游走

Abstract:

Existing methods for constructing news story chains focus only on topical similarity and document importance, and largely ignore the logical coherence and explainability of the resulting chain. They also suffer from the algorithmic complexity brought about by the exponential growth of news collections. To address these shortcomings, this paper proposes a method for constructing news story chains from a word coverage perspective. The method uses reader comments on news articles to locate the turning points of a news event, models the logical relations among documents using the ideas of topical similarity and sparse difference together with robust principal component analysis (RPCA), and applies random walk and graph traversal to quantify and generate a story chain that is explainable and logically coherent. Double-blind experiments show that the proposed method achieves better results than the other algorithms compared.

Key words: story chain, word coverage, explainability, robust principal component analysis (RPCA), random walk

CLC Number: