北京大学学报(自然科学版)

中文新闻关键事件的主题句识别

王伟1,3,赵东岩1,2,赵伟1   

  1. 北京大学计算科学与技术研究所, 北京 100871; 2. 计算语言学教育部重点实验室, 北京 100871; 3. 武警工程学院电子技术系, 西安 710086
  • 收稿日期:2010-09-10 出版日期:2011-09-20 发布日期:2011-09-20

Identification of Topic Sentence about Key Event in Chinese News

WANG Wei1,3, ZHAO Dongyan1,2, ZHAO Wei1   

  1. Institute of Computer Science and Technology, Peking University, Beijing 100871; 2. Key Laboratory of Computational Linguistics (MOE), Peking University, Beijing 100871; 3. Department of Electronic Technology, Engineering College of Armed Police Force, Xi’an 710086
  • Received: 2010-09-10 Online: 2011-09-20 Published: 2011-09-20

摘要: 提出在单文档中通过提取主题句以获取关键事件信息的思想。根据新闻的体裁特点, 分析了新闻报道与事件的关系, 以及新闻标题在内容、形式和语言方面的特征。提出利用标题的提示性信息提取主题句来描述新闻关键事件的方法。该方法首先对新闻标题按信息含量进行分类, 然后结合新闻句子的词频、长度、位置、与标题的相似度等特征计算句子的重要性。实验表明, 该方法能够准确提取新闻主题句, 为进一步抽取事件信息打好了基础。

关键词: 计算机应用, 中文信息处理, 自然语言处理, 自动文摘, 事件抽取, 新闻标题

Abstract: The authors propose an approach for extracting, from a single news article, the topic sentence that describes its key event. Considering the genre characteristics of news, they study the relation between news articles and the key events reported in them, as well as the characteristics of news headlines in three aspects: information, form, and language. A method based on the informational cues of the headline is used to extract the topic sentence that contains the key event information of a news story. The method first classifies a news headline as informative or non-informative, and then combines textual and semantic features of each sentence, such as word frequency, sentence length, position in the text, and word co-occurrence with the headline, to evaluate the importance of each sentence and select the most important one as the topic sentence. Experimental results show that the method identifies topic sentences accurately and lays a good foundation for subsequent event information extraction.

Key words: computer application, Chinese information processing, natural language processing, automatic summarization, event extraction, news headline
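
The sentence-scoring step outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `topic_sentence`, the feature formulas, and the weights are all assumptions made for illustration, the headline classification is supplied by the caller as a boolean rather than computed, and the input is assumed to be already word-segmented.

```python
from collections import Counter


def topic_sentence(headline, sentences, headline_informative=True,
                   weights=(0.25, 0.15, 0.25, 0.35)):
    """Pick the most important sentence of a news article.

    headline  -- list of tokens (assumed to be pre-segmented)
    sentences -- list of token lists, in document order
    headline_informative -- result of a headline classifier (hypothetical
        input here; the abstract does not spell out the classifier)
    weights   -- (frequency, length, position, headline overlap);
        illustrative values, not taken from the paper
    Returns (index of the best sentence, list of scores).
    """
    if not sentences:
        raise ValueError("need at least one sentence")
    w_freq, w_len, w_pos, w_title = weights
    if not headline_informative:
        # Assumption: a non-informative headline gives no usable cue.
        w_title = 0.0

    tf = Counter(tok for sent in sentences for tok in sent)
    max_tf = max(tf.values(), default=1)
    max_len = max(len(sent) for sent in sentences) or 1
    head = set(headline)
    n = len(sentences)

    scores = []
    for i, sent in enumerate(sentences):
        words = set(sent)
        # Average document frequency of the sentence's words, scaled to [0, 1].
        freq = sum(tf[t] for t in sent) / (len(sent) * max_tf) if sent else 0.0
        # Longer sentences score higher, relative to the longest one.
        length = len(sent) / max_len
        # Earlier sentences score higher.
        position = 1.0 - i / n
        # Jaccard overlap of words shared with the headline.
        union = words | head
        title = len(words & head) / len(union) if union else 0.0
        scores.append(w_freq * freq + w_len * length
                      + w_pos * position + w_title * title)

    best = max(range(n), key=scores.__getitem__)
    return best, scores


headline = ["earthquake", "hits", "city"]
sentences = [
    ["officials", "spoke", "today"],
    ["an", "earthquake", "hits", "the", "city", "today"],
    ["weather", "was", "mild"],
]
best, scores = topic_sentence(headline, sentences)
print(best, sentences[best])
```

With an informative headline, the overlap feature pulls the selection toward the sentence that repeats the headline's words; passing `headline_informative=False` drops that feature and leaves frequency, length, and position to decide.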
