Acta Scientiarum Naturalium Universitatis Pekinensis
GAO Enting1, CHAO Jiayuan2, LI Zhenghua2
高恩婷1,巢佳媛2,李正华2
Abstract: The authors propose an annotation conversion method that uses multiple resources for POS tagging. The method converts source-side annotations to the target-side annotation standard and then combines the converted data with the target-side data to obtain a larger training set. Two innovative strategies are proposed. The first exploits reliability information of the guide features during conversion. The second uses ambiguous labelings to improve the quality of the converted data. Results demonstrate that the first strategy helps annotation conversion, while the second has little effect on it.
Key words: annotation conversion, conditional random field, POS tagging
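Abstract (translated from the Chinese): This work studies POS tagging with a multi-resource conversion method, which converts the annotations of source-side resources to conform to the target-side annotation guidelines and then merges the converted resources with the target-side resources to enlarge the training data. Two innovations are made: during conversion, the confidence information of the guide features is additionally exploited; in the converted resources, an ambiguous-labeling representation is used to reduce erroneous annotations. Experiments show that the confidence information effectively helps conversion, whereas the ambiguous-labeling representation has little effect.
Key words (translated from the Chinese): POS annotation conversion, conditional random field, POS tagging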
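The two strategies named in the abstract can be illustrated with a rough sketch. The abstract does not say how reliability is computed or how ambiguous labelings are built, so everything below is an assumption made for illustration: reliability is taken to be a source-side tagger probability discretized into buckets, and an ambiguous labeling is taken to be the smallest set of top-ranked tags whose probability mass reaches a threshold. The function names, the tags `NN`, `VV`, `JJ`, and the threshold value are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def reliability_bucket(prob: float, n_buckets: int = 5) -> int:
    """Map a probability in [0, 1] to an integer bucket 0..n_buckets-1."""
    return min(int(prob * n_buckets), n_buckets - 1)


def guide_features(word: str, source_tag: str, source_prob: float) -> List[str]:
    """Feature strings for one token (hypothetical templates).

    The plain guide feature carries the source-side tag; the second guide
    feature pairs that tag with a reliability bucket, so a sequence model
    could learn to trust confident source-side tags more.
    """
    return [
        f"w={word}",
        f"guide={source_tag}",
        f"guide={source_tag}|rel={reliability_bucket(source_prob)}",
    ]


def ambiguous_labels(marginals: Dict[str, float], threshold: float = 0.8) -> Set[str]:
    """Keep the top-ranked tags until their probability mass reaches threshold.

    A confident token keeps one tag; an uncertain token keeps several
    (assumed construction, not the paper's stated one).
    """
    kept: Set[str] = set()
    mass = 0.0
    for tag, p in sorted(marginals.items(), key=lambda kv: kv[1], reverse=True):
        kept.add(tag)
        mass += p
        if mass >= threshold:
            break
    return kept


if __name__ == "__main__":
    print(guide_features("book", "NN", 0.95))
    print(ambiguous_labels({"NN": 0.55, "VV": 0.40, "JJ": 0.05}))
```

In a setup like this, the feature strings would be passed to a conditional random field trainer, and tokens with more than one kept tag would be treated as partially labeled during training.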
CLC Number: TP391
GAO Enting,CHAO Jiayuan,LI Zhenghua. Conversion of Multiple Resources for POS Tagging[J]. Acta Scientiarum Naturalium Universitatis Pekinensis.
高恩婷,巢佳媛,李正华. 面向词性标注的多资源转化研究[J]. 北京大学学报(自然科学版).
URL: https://xbna.pku.edu.cn/EN/
https://xbna.pku.edu.cn/EN/Y2015/V51/I2/328