Acta Scientiarum Naturalium Universitatis Pekinensis (北京大学学报(自然科学版))

Translation Similarity Model Based on Bilingual Compositional Semantics

WANG Chaochao (王超超), XIONG Deyi (熊德意), ZHANG Min (张民)

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006
  • Received: 2014-06-30 Online: 2015-03-20 Published: 2015-03-20

Abstract: The authors propose a translation similarity model based on bilingual compositional semantics, which integrates a bilingual semantic similarity feature into the decoding process to improve translation quality. In the proposed model, monolingual compositional vectors for phrases are first obtained on the source and target sides separately, using a distributional approach. A neural network then projects these monolingual vectors into a shared semantic space, yielding bilingual compositional vectors. In this space, the translation similarity between each source phrase and its corresponding target phrase is computed from their compositional vectors and integrated into the decoder as a new feature. Experiments on the Chinese-to-English NIST06 and NIST08 test sets show that the proposed model significantly outperforms the baseline, by 0.56 and 0.42 BLEU points respectively.
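The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the composition function, the network architecture, or the similarity measure, so everything below is an assumption made for illustration: phrases are composed by summing word vectors, the source phrase vector is mapped into the target vector space by a single hypothetical tanh layer (`W`, `b`), and similarity is measured by cosine. The word vectors and weights are random toy values, not trained parameters.

```python
import numpy as np

def compose(word_vectors):
    """Additive composition: sum the word vectors of a phrase.
    (Assumed composition function; the abstract does not name one.)"""
    return np.sum(word_vectors, axis=0)

def project(vec, W, b):
    """Hypothetical one-layer tanh projection into the shared space."""
    return np.tanh(W @ vec + b)

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def translation_similarity(src_words, tgt_words, W, b):
    """Similarity between a source phrase and its candidate translation."""
    src_phrase = compose(src_words)   # monolingual source phrase vector
    tgt_phrase = compose(tgt_words)   # monolingual target phrase vector
    # Map the source vector into the target space, then compare there.
    return cosine(project(src_phrase, W, b), tgt_phrase)

# Toy example: random vectors stand in for learned word representations.
rng = np.random.default_rng(0)
dim = 4
src_words = rng.normal(size=(2, dim))  # a two-word source phrase
tgt_words = rng.normal(size=(3, dim))  # a three-word target phrase
W = rng.normal(size=(dim, dim))        # untrained projection weights
b = np.zeros(dim)

score = translation_similarity(src_words, tgt_words, W, b)
print(f"similarity feature: {score:.3f}")
```

In a log-linear decoder, a score of this kind would typically enter as one more weighted feature for each phrase pair, alongside the existing translation and language model features.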

Key words: semantic compositionality, machine translation, distributed representations, neural network
