北京大学学报(自然科学版) ›› 2016, Vol. 52 ›› Issue (1): 65-74. DOI: 10.13209/j.0479-8023.2016.014

基于排序方法的汉语句际关系树自动分析

吴云芳1, 万富强1, 徐艺峰1, 吕学强2

  1. 计算语言学教育部重点实验室, 北京大学, 北京 100871
  2. 网络文化与数字传播北京市重点实验室, 北京信息科技大学, 北京 100192
  • 收稿日期:2015-06-04 出版日期:2016-01-20 发布日期:2016-01-20
  • 通讯作者: 吴云芳, E-mail: wuyf(at)pku.edu.cn
  • 基金资助:

    国家自然科学基金(61371129)、国家重点基础研究发展计划(2014CB340504)、国家社会科学基金重大项目(12&ZD227)和网络文化与数字传播北京市重点实验室开放课题 (ICDD201302)资助

A New Ranking Method for Chinese Discourse Tree Building

WU Yunfang1, WAN Fuqiang1, XU Yifeng1, LÜ Xueqiang2   

  1. Key Laboratory of Computational Linguistics (MOE), Peking University, Beijing 100871
  2. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100192
  • Received:2015-06-04 Online:2016-01-20 Published:2016-01-20
  • Contact: WU Yunfang, E-mail: wuyf(at)pku.edu.cn

摘要:

提出一种自动分析汉语小句级句际关系树的新方法。在修辞结构理论体系下, 构建一个汉语句际关系标注语料库。不同于传统的只关心相邻两个单元的方法, 提出一种类排序模型(SVM-R), 自动构建汉语句际关系的树结构, 旨在把握相邻3 个单元之间的关联强度。实验结果表明, 所提出的SVM-R模型对句际关系树的分析显著优于传统方法。最后提出并验证了丰富的、适合于汉语句际关系分析的语言特征。

关键词: 句际关系树构建, 排序方法, 汉语句际关系语料库

Abstract:
This paper proposes a novel method for sentence-level Chinese discourse tree building. The authors
construct a Chinese discourse annotated corpus in the framework of Rhetorical Structure Theory, and propose a
ranking-like SVM (SVM-R) model to automatically build the tree structure. The model captures the relative
association strength among three consecutive text spans rather than between only two adjacent spans. The experimental
results show that the proposed SVM-R method significantly outperforms state-of-the-art methods in discourse parsing
accuracy. It is also demonstrated that the features useful for discourse tree building are consistent with Chinese
language characteristics.

Key words: discourse tree building, ranking method, Chinese discourse annotated corpus
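
The abstract does not spell out how SVM-R decisions are turned into a tree, so the following is only an illustrative sketch of one possible reading: given three consecutive spans (a, b, c), a scorer says whether the middle span b binds more tightly to a or to c, and spans are merged greedily bottom-up. The names `build_tree`, `toy_score`, and `first_leaf` are hypothetical, the greedy decoding is an assumption, and the toy scorer is a stand-in for a trained model, not the authors' SVM-R.

```python
from typing import Callable, List, Tuple, Union

# A tree is either a leaf (a text unit) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Scorer over a triple (left, middle, right): a positive value means the
# middle span attaches to the left span, otherwise it attaches to the right.
TripleScorer = Callable[[Tree, Tree, Tree], float]


def build_tree(units: List[str], score: TripleScorer) -> Tree:
    """Greedy bottom-up merging driven by triple-wise scores (assumed decoding)."""
    if not units:
        raise ValueError("at least one text unit is required")
    spans: List[Tree] = list(units)
    while len(spans) > 2:
        # Find the triple whose decision is the most confident (largest |score|).
        best_i, best_s = 0, score(spans[0], spans[1], spans[2])
        for i in range(1, len(spans) - 2):
            s = score(spans[i], spans[i + 1], spans[i + 2])
            if abs(s) > abs(best_s):
                best_i, best_s = i, s
        # Merge the middle span with its preferred neighbour.
        j = best_i if best_s > 0 else best_i + 1
        spans[j:j + 2] = [(spans[j], spans[j + 1])]
    return spans[0] if len(spans) == 1 else (spans[0], spans[1])


def first_leaf(tree: Tree) -> str:
    """Return the leftmost leaf of a tree."""
    while isinstance(tree, tuple):
        tree = tree[0]
    return tree


def toy_score(left: Tree, middle: Tree, right: Tree) -> float:
    """Hypothetical scorer: units listed here attach leftward, all others rightward."""
    attaches_left = {"b"}
    return 1.0 if first_leaf(middle) in attaches_left else -1.0


tree = build_tree(["a", "b", "c", "d"], toy_score)
print(tree)
```

In a real system the scorer would be replaced by a learned model over linguistic features of the three spans; the merging loop itself would not need to change.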

中图分类号: