Acta Scientiarum Naturalium Universitatis Pekinensis, 2016, Vol. 52, Issue (1): 58-64. DOI: 10.13209/j.0479-8023.2016.005
Rule-Based Detection and Analysis of Annotation Errors in Dependency Treebank
SHI Linlin, QIU Likun, KANG Shiyong
史林林, 邱立坤, 亢世勇
Abstract:
The authors convert dependency trees into phrase structure trees and automatically detect annotation errors using hand-written rules. The method is applied to the Peking University Multi-view Chinese Treebank (PMT). Although PMT had already been manually proofread twice, the method detects 1529 errors in its 50275 sentences with a precision of 100%. The errors fall into three types: word segmentation errors, mismatches between part of speech (POS) and syntactic role, and syntactic role errors. The method can further improve treebank quality and can be applied to other dependency treebanks.
Key words: treebank, part of speech, syntactic role, error detection
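The abstract does not specify the conversion procedure or the rule set, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple head-projection conversion (each head and its dependents form one phrase, labelled from the head's POS) and one hypothetical rule type that flags POS/syntactic-role mismatches. The `Token` class, the relation labels (`SUBJ`, `ADV`, `OBJ`, `ROOT`), and the `ALLOWED_POS` table are all illustrative assumptions; the POS tags follow PKU-style tags such as `n`, `v`, `d`.

```python
from dataclasses import dataclass

@dataclass
class Token:
    idx: int   # 1-based position in the sentence
    word: str
    pos: str   # POS tag (PKU-style tags assumed, e.g. "n", "v", "d")
    head: int  # index of the head token; 0 marks the root
    rel: str   # dependency label (hypothetical label set)

def children(tokens, h):
    """Return the dependents of the token at position h."""
    return [t for t in tokens if t.head == h]

def project(tokens, tok):
    """Head projection: a head and its dependents become one bracketed phrase.
    Assumes a well-formed tree (no cycles)."""
    deps = children(tokens, tok.idx)
    if not deps:
        return f"({tok.pos} {tok.word})"
    # Keep the head and its dependents in surface word order.
    parts = sorted(deps + [tok], key=lambda t: t.idx)
    inner = " ".join(
        f"({t.pos} {t.word})" if t is tok else project(tokens, t) for t in parts
    )
    return f"({tok.pos.upper()}P {inner})"

def convert(tokens):
    """Convert a dependency tree into a bracketed phrase structure string."""
    roots = [t for t in tokens if t.head == 0]
    return " ".join(project(tokens, r) for r in roots)

# Hypothetical rule table: POS tags acceptable for a given relation.
ALLOWED_POS = {
    "SUBJ": {"n", "nr", "ns", "r"},
    "ADV": {"d", "p", "t"},
}

def check(tokens):
    """Flag tokens whose POS is incompatible with their syntactic role."""
    errors = []
    for t in tokens:
        allowed = ALLOWED_POS.get(t.rel)
        if allowed is not None and t.pos not in allowed:
            errors.append((t.idx, t.word, t.pos, t.rel))
    return errors

if __name__ == "__main__":
    # "我 喜欢 苹果" (I like apples), annotated consistently.
    good = [Token(1, "我", "r", 2, "SUBJ"), Token(2, "喜欢", "v", 0, "ROOT"),
            Token(3, "苹果", "n", 2, "OBJ")]
    print(convert(good))  # (VP (r 我) (v 喜欢) (n 苹果))
    print(check(good))    # []
    # An adverb wrongly labelled as subject is flagged.
    bad = [Token(1, "很", "d", 2, "SUBJ"), Token(2, "好", "a", 0, "ROOT")]
    print(check(bad))     # [(1, '很', 'd', 'SUBJ')]
```

A rule-based checker of this kind only reports what its rules cover, which is consistent with high precision. A realistic rule set would also inspect the converted phrase structure (for example, which phrase types may attach where) rather than single tokens.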
CLC Number: TP391
SHI Linlin, QIU Likun, KANG Shiyong. Rule-Based Detection and Analysis of Annotation Errors in Dependency Treebank[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2016, 52(1): 58-64.
URL: https://xbna.pku.edu.cn/EN/10.13209/j.0479-8023.2016.005
https://xbna.pku.edu.cn/EN/Y2016/V52/I1/58