Acta Scientiarum Naturalium Universitatis Pekinensis, 2017, Vol. 53, Issue (3): 412-420. DOI: 10.13209/j.0479-8023.2017.005


Combining Multiple Models and High-Confidence Dictionary for Event Nugget Detection

Yadong CHEN, Yu HONG, Xiaobin WANG, Xuerong YANG, Jianmin YAO, Qiaoming ZHU

  1. Jiangsu Provincial Key Laboratory of Computer Information Processing Technology, Soochow University, Suzhou 215006
  • Received: 2015-11-26  Revised: 2016-03-23  Online: 2017-05-20  Published: 2017-05-20
  • Funding:
    Supported by the National Natural Science Foundation of China (61373097, 61272259, 61272260)

Abstract:

This paper proposes an event nugget detection method that combines multiple models with a high-confidence dictionary. High-confidence dictionary features are added separately to a maximum entropy (ME) model and a conditional random field (CRF) model, and the outputs of the two models are then combined, with the aim of improving both the recall and the overall performance of trigger detection. For event realis recognition, the method additionally uses, among other features, the lexical distance (the difference in token positions) and the dependency-path distance between a trigger and a negation or speculation word, in order to improve realis recognition. Experimental results show that, compared with using the maximum entropy model alone, the proposed method improves F1 by 6.43% on trigger (event nugget) detection and by 1.69% on event realis recognition.

Key words: event nugget detection, maximum entropy model, conditional random fields, high-confidence dictionary
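The abstract does not spell out how the dictionary feature is encoded or how the ME and CRF outputs are combined. The sketch below is one plausible reading, not the authors' implementation: it assumes a token-level dictionary-match feature that both models could share, and a union-style fusion that keeps a trigger if either model predicts it, which is a simple way to favor recall. The dictionary entries, the function names `dictionary_features` and `fuse_predictions`, and the rule that the CRF label wins on a type conflict are all hypothetical.

```python
from typing import Dict, List

# Hypothetical high-confidence dictionary (trigger word -> event type).
# The real dictionary and how it was built are not described in the abstract.
HIGH_CONF_DICT: Dict[str, str] = {
    "attacked": "Conflict.Attack",
    "married": "Life.Marry",
    "elected": "Personnel.Elect",
}


def dictionary_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Features for token i; the same dict could feed an ME classifier or a CRF tagger."""
    word = tokens[i].lower()
    feats = {"word": word}
    if word in HIGH_CONF_DICT:
        # Dictionary hit: flag it and expose the dictionary's event type.
        feats["dict_match"] = "1"
        feats["dict_type"] = HIGH_CONF_DICT[word]
    else:
        feats["dict_match"] = "0"
    return feats


def fuse_predictions(me_preds: Dict[int, str], crf_preds: Dict[int, str]) -> Dict[int, str]:
    """Assumed fusion rule: keep a trigger if either model predicts it.

    Inputs map token index -> predicted event type. A union of the two
    prediction sets can only add triggers, which tends to raise recall.
    On a type conflict at the same token, the CRF label wins (arbitrary choice).
    """
    fused = dict(me_preds)
    fused.update(crf_preds)
    return fused
```

For example, `fuse_predictions({1: "Conflict.Attack"}, {5: "Life.Die"})` returns both triggers, `{1: "Conflict.Attack", 5: "Life.Die"}`. A union is the simplest recall-oriented choice; voting or confidence-weighted combination would give back some of that recall in exchange for precision.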
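The realis features named in the abstract are two distances between a trigger and a negation or speculation cue: how far apart they are in the token sequence, and how many arcs separate them in the dependency tree. A minimal sketch of computing both, assuming the parse is given as a list of head indices (-1 for the root); the cue word lists, the function `cue_distance_features`, the feature names, and the choice of taking the nearest cue are illustrative assumptions rather than the paper's exact definitions.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical cue lexicons; the paper's actual word lists are not given.
NEGATION_CUES: Set[str] = {"no", "not", "never", "n't"}
SPECULATION_CUES: Set[str] = {"may", "might", "could", "possibly", "reportedly"}


def dep_distance(heads: List[int], a: int, b: int) -> Optional[int]:
    """Number of edges on the dependency path between tokens a and b.

    heads[i] is the index of token i's head (-1 for the root). The tree is
    treated as undirected and searched breadth-first from a.
    """
    adj: List[List[int]] = [[] for _ in heads]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # b is unreachable, i.e. the parse is not a single tree


def cue_distance_features(tokens: List[str], heads: List[int], trigger: int,
                          cues: Set[str], prefix: str) -> Dict[str, int]:
    """Lexical and dependency distances from the trigger to the nearest cue of one kind."""
    positions = [i for i, tok in enumerate(tokens) if tok.lower() in cues]
    feats = {f"{prefix}_present": int(bool(positions))}
    if not positions:
        return feats
    # Lexical distance: difference in token positions.
    feats[f"{prefix}_lex_dist"] = min(abs(i - trigger) for i in positions)
    # Dependency distance: shortest path length in the parse tree.
    dep = [d for d in (dep_distance(heads, trigger, i) for i in positions) if d is not None]
    if dep:
        feats[f"{prefix}_dep_dist"] = min(dep)
    return feats
```

With the hand-written parse `tokens = ["No", "one", "was", "killed", "in", "the", "attack"]`, `heads = [1, 3, 3, -1, 6, 6, 3]` and the trigger "killed" at index 3, `cue_distance_features(tokens, heads, 3, NEGATION_CUES, "neg")` gives `{"neg_present": 1, "neg_lex_dist": 3, "neg_dep_dist": 2}`: the cue is three tokens away but only two arcs away (killed → one → No), which is why the two distances can carry different information.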