Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2017, Vol. 53 ›› Issue (2): 255-261. DOI: 10.13209/j.0479-8023.2017.032

• Original Article •

A Sentence Segmentation Method for Ancient Chinese Texts Based on Recurrent Neural Network

Boli WANG1, Xiaodong SHI1,2,3, Jinsong SU4

    1. Department of Cognitive Science, Xiamen University, Xiamen 361005
    2. Collaborative Innovation Center for Peaceful Development of Cross-Strait Relations, Xiamen University, Xiamen 361005
    3. Fujian Province Key Laboratory for Brain-inspired Computing, Xiamen 361005
    4. Software School, Xiamen University, Xiamen 361005
  • Received: 2016-07-29 Revised: 2016-10-07 Online: 2017-03-20 Published: 2017-03-20
  • Contact: Xiaodong SHI

  • Supported by:
    the Ministry of Education special project "Intelligent Conversion System between Simplified and Traditional Chinese Characters", the National Key Technology R&D Program (2012BAH14F03), the Research Fund for the Doctoral Program of the Ministry of Education (20130121110040), the National Natural Science Foundation of China (61573294), and the CCF Open Project on Chinese Information Technology (CCF2015-01-01)

Abstract:

This paper proposes an automatic sentence segmentation method for ancient Chinese texts based on a recurrent neural network (RNN). The method uses a bidirectional RNN with gated recurrent units (GRU). During decoding, it combines the probability distribution output by the network with state transition probabilities and a length penalty to improve segmentation accuracy. Experiments on a large-scale corpus of ancient Chinese texts show that the proposed method achieves a higher F1 score than traditional methods.

Key words: ancient Chinese, sentence segmentation, recurrent neural network
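
The decoding idea in the abstract (network output probabilities combined with state transition probabilities and a length penalty) can be illustrated with a small dynamic-programming sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption made for illustration: the two-label scheme (break / no break after each character), the function name `segment`, the parameters `alpha`, `soft_len` and `max_len`, and the linear form of the penalty are all hypothetical. The probabilities `p_break` would come from a trained bidirectional GRU tagger, which is not shown here.

```python
import math


def segment(text, p_break, trans, alpha=0.1, soft_len=10, max_len=30):
    """Split `text` into sentences by maximising a combined log-score.

    Assumptions (illustrative, not taken from the paper):
      p_break[i]  -- model probability that a break follows text[i]
      trans[a][b] -- probability of label b after label a
                     (0 = no break, 1 = break)
      alpha, soft_len -- a segment longer than soft_len pays
                         alpha * (length - soft_len)
      max_len     -- hard cap on segment length, keeps the search cheap
    The last character is always treated as a sentence end.
    """
    n = len(text)
    if n != len(p_break):
        raise ValueError("need one break probability per character")

    def log(x):
        return math.log(max(x, 1e-12))  # clamp to avoid log(0)

    # stay[k] = sum of log(1 - p_break[i]) for i < k (score of not breaking)
    stay = [0.0]
    for p in p_break:
        stay.append(stay[-1] + log(1.0 - p))

    # best[j]: best score for text[:j] when a break follows text[j - 1]
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            length = j - i
            # Labels inside the segment text[i:j] are 0, ..., 0, 1.
            emit = stay[j - 1] - stay[i] + log(p_break[j - 1])
            # Transition from the previous segment's final break label.
            enter = log(trans[1][1 if length == 1 else 0]) if i > 0 else 0.0
            # Transitions inside the segment: 0->0 repeated, then 0->1.
            inner = 0.0
            if length > 1:
                inner = (length - 2) * log(trans[0][0]) + log(trans[0][1])
            penalty = alpha * max(0, length - soft_len)
            score = best[i] + emit + enter + inner - penalty
            if score > best[j]:
                best[j], back[j] = score, i

    # Follow the back-pointers from the end of the text.
    pieces = []
    j = n
    while j > 0:
        pieces.append(text[back[j]:j])
        j = back[j]
    return pieces[::-1]


# Made-up probabilities for a short unpunctuated passage.
probs = [0.05, 0.05, 0.1, 0.2, 0.9, 0.05, 0.1, 0.2, 0.95]
trans = [[0.8, 0.2], [0.9, 0.1]]
print(segment("学而时习之不亦说乎", probs, trans))
# → ['学而时习之', '不亦说乎']
```

In this sketch a larger `alpha` or smaller `soft_len` pushes the decoder toward shorter sentences even where the network is unsure, which is one plausible way a length penalty could interact with the network's output.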

CLC Number: