北京大学学报(自然科学版) ›› 2016, Vol. 52 ›› Issue (1): 58-64. DOI: 10.13209/j.0479-8023.2016.005

基于规则的依存树库错误自动检测与分析

史林林, 邱立坤, 亢世勇   

  1. 鲁东大学文学院, 烟台 264025
  • 收稿日期:2015-06-19 出版日期:2016-01-20 发布日期:2016-01-20
  • 通讯作者: 邱立坤, E-mail: qiulikun(at)pku.edu.cn
  • 基金资助:
    国家自然科学基金(61572245, 61103089, 61272215)资助

Rule-Based Detection and Analysis of Annotation Errors in Dependency Treebank

SHI Linlin, QIU Likun, KANG Shiyong

  1. School of Chinese Language and Literature, Ludong University, Yantai 264025
  • Received: 2015-06-19 Online: 2016-01-20 Published: 2016-01-20
  • Contact: QIU Likun, E-mail: qiulikun(at)pku.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61572245, 61103089, 61272215)

摘要:

尝试将依存树转化为短语结构树, 并基于规则的方法自动检测出人工标注结果中的错误。将该方法应用于已经过两遍人工校对的北京大学多视图依存树库, 从50275个句法树中发现1529处错误, 正确率为100%。进一步, 所有错误可以分为3个层次: 分词错误、词性与句法角色不符、句法角色错标。该方法可以有效提高依存树库的质量, 并且适用于各类型的依存树库。

关键词: 树库, 词性, 句法角色, 错误检测

Abstract:

The authors attempt to transform dependency trees into phrase structure trees and to detect annotation errors automatically using manually written rules. The method is applied to the Peking University Multi-view Chinese Treebank (PMT). Although PMT had already been manually checked twice, the method detects 1529 errors among its 50275 sentences, with a precision of 100%. The errors fall into three types: word segmentation errors, mismatches between POS and syntactic role, and syntactic role errors. The method can further improve treebank quality and can be applied to other dependency treebanks.

Key words: treebank, part of speech, syntactic role, error detection
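
One of the three error types named in the abstract, a mismatch between POS and syntactic role, can be illustrated with a minimal sketch. This is not the authors' rule set: the `Token` fields, the tag names (`SBJ`, `OBJ`, `ADV`, `n`, `r`, `d`, ...) and the `ALLOWED_POS` table are hypothetical placeholders, and the conversion from dependency trees to phrase structure trees is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    index: int  # 1-based position in the sentence
    form: str   # surface word
    pos: str    # part-of-speech tag (hypothetical tag set)
    head: int   # index of the head token, 0 for the root
    role: str   # syntactic role / dependency label (hypothetical label set)


# Hypothetical rule table: for each syntactic role, the POS tags it may carry.
# Roles missing from the table are not checked at all.
ALLOWED_POS = {
    "SBJ": {"n", "r"},
    "OBJ": {"n", "r"},
    "ADV": {"d", "a", "p"},
}


def find_mismatches(sentence):
    """Return (index, form, pos, role) for tokens whose POS breaks a rule."""
    errors = []
    for tok in sentence:
        allowed = ALLOWED_POS.get(tok.role)
        if allowed is not None and tok.pos not in allowed:
            errors.append((tok.index, tok.form, tok.pos, tok.role))
    return errors


# Toy sentence; the third token is deliberately mislabeled (an adverb as object).
sentence = [
    Token(1, "他", "r", 2, "SBJ"),
    Token(2, "喜欢", "v", 0, "ROOT"),
    Token(3, "很", "d", 2, "OBJ"),
]
flagged = find_mismatches(sentence)
```

A table-driven check like this only reports tokens that break an explicit rule, so its precision depends entirely on how conservative the table is; any real rule set would have to come from the treebank's own annotation guidelines.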

中图分类号: