Acta Scientiarum Naturalium Universitatis Pekinensis (北京大学学报, 自然科学版)

Pronoun Resolution Based on Deep Learning (Chinese title: 基于Deep Learning的代词指代消解)

XI Xuefeng (奚雪峰)1,2, ZHOU Guodong (周国栋)1

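  1. Natural Language Processing Laboratory, School of Computer Science and Technology, Soochow University, Suzhou 215006; 2. Department of Computer Science and Engineering, Suzhou University of Science and Technology, Suzhou 215009
  • Received: 2013-06-24 Online: 2014-01-20 Published: 2014-01-20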

Pronoun Resolution Based on Deep Learning

XI Xuefeng1,2, ZHOU Guodong1   

  1. Natural Language Processing Laboratory, School of Computer Science and Technology, Soochow University, Suzhou 215006; 2. Department of Computer Science and Engineering, Suzhou University of Science and Technology, Suzhou 215009
  • Received: 2013-06-24 Online: 2014-01-20 Published: 2014-01-20

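Abstract (translated from the Chinese original): Coreference resolution has long been a core problem in natural language processing. This paper proposes a semantic-feature-based coreference resolution method that uses the deep learning mechanism of the DBN (deep belief nets) model. The DBN model consists of multiple layers of unsupervised RBM (restricted Boltzmann machine) networks and one layer of supervised BP (back-propagation) network. The RBM networks ensure that the mapping of feature vectors is optimal, and the final BP layer classifies the feature vectors output by the RBM networks, thereby training a coreference resolution classifier. Tests on the ACE04 English corpus and the ACE05 Chinese corpus show that increasing the number of RBM training layers improves system performance. In addition, introducing abstraction layering of the feature set also has a positive effect on system performance.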

Key words (translated from the Chinese original): pronoun resolution, deep learning, deep semantic features

Abstract: Coreference resolution is a fundamental task in natural language processing. This paper proposes a coreference resolution system based on a deep learning model, the deep belief nets (DBN), to detect and classify coreference relationships between an anaphor and its antecedent. The DBN classifier combines several unsupervised learning networks, called RBMs (restricted Boltzmann machines), with one supervised learning network, called BP (back-propagation). The RBM layers preserve as much information as possible when feature vectors are passed to the next layer, and the BP layer is trained to classify the features generated by the last RBM layer. Experiments are conducted on the ACE 2004 English NWIRE corpus and the ACE 2005 Chinese NWIRE corpus. The results show that increasing the number of RBM training layers and adding an abstraction layer for the feature set both improve the performance of the coreference resolution system.

Key words: pronoun resolution, deep learning, deep semantic features
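
The architecture described in the abstract, unsupervised RBM layers stacked beneath a supervised output layer, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The feature names, layer sizes, learning rates, and toy data are all hypothetical, the RBMs are trained with one-step contrastive divergence (CD-1) as an assumed training rule, and the supervised step is simplified to training only the output layer by gradient descent rather than back-propagating through the whole network.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1 (an assumption)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def cd1_update(self, v0, lr):
        # Positive phase, one Gibbs step, negative phase.
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])


def pretrain_stack(data, layer_sizes, epochs, lr, rng):
    # Greedy layer-wise pretraining: each RBM is trained on the
    # hidden-unit probabilities produced by the layer below it.
    rbms, layer_input, n_in = [], data, len(data[0])
    for n_hidden in layer_sizes:
        rbm = RBM(n_in, n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v, lr)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
        rbms.append(rbm)
        n_in = n_hidden
    return rbms


def encode(rbms, v):
    # Pass a feature vector up through every RBM layer.
    for rbm in rbms:
        v = rbm.hidden_probs(v)
    return v


def train_output_layer(features, labels, epochs, lr):
    # Simplified supervised step: logistic output unit trained by
    # gradient descent on the top-level RBM features only.
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = y - sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


def predict(rbms, w, b, v):
    x = encode(rbms, v)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


rng = random.Random(0)
# Hypothetical binary features for each (pronoun, candidate antecedent) pair:
# [gender_match, number_match, same_sentence, candidate_is_subject]
pairs = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0],
         [0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = coreferent, 0 = not coreferent (toy labels)

rbms = pretrain_stack(pairs, layer_sizes=[3, 2], epochs=50, lr=0.1, rng=rng)
top_features = [encode(rbms, v) for v in pairs]
w, b = train_output_layer(top_features, labels, epochs=200, lr=0.5)
print([round(predict(rbms, w, b, v), 2) for v in pairs])
```

Changing `layer_sizes` (for example from `[3]` to `[3, 2]`) is how this sketch would vary the number of RBM layers, the factor the abstract reports as affecting system performance. The toy data is far too small to reproduce any such effect.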
