北京大学学报(自然科学版)

结合语义的位置语言模型

余伟1,王明文1,万剑怡1,左家莉2   

  1. 江西师范大学计算机信息工程学院, 南昌 330027; 2. 江西师范大学初等教育学院, 南昌 330027
  • 收稿日期:2012-03-01 出版日期:2013-03-20 发布日期:2013-03-20

Positional Language Models with Semantic Information

YU Wei1, WANG Mingwen1, WAN Jianyi1, ZUO Jiali2   

  1. School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330027; 2. School of Elementary Education, Jiangxi Normal University, Nanchang 330027
  • Received:2012-03-01 Online:2013-03-20 Published:2013-03-20

摘要: 针对位置语言模型没有考虑词与词之间语义关系的问题, 提出一种结合语义的位置语言模型。首先采用高斯核函数来度量词与词之间的位置关系; 然后提出一种平滑互信息的技术来度量词与词之间的语义关系, 证明了平滑互信息能够有效解决大量词对之间无法通过互信息来计算转移概率的问题; 还证明了位置语言模型是结合语义位置语言模型的一个特例; 最后将结合语义的位置语言模型应用于信息检索, 得到一个基于该模型的检索模型。实验结果表明, 基于该模型的检索模型在性能方面要优于基于位置语言模型的检索模型。

关键词: 位置语言模型, 平滑, 互信息, 信息检索, 语义关系

Abstract: Positional language models do not consider the semantic relationships between words at different positions. To address this problem, the authors propose a positional language model with semantic information. First, a Gaussian kernel function is used to measure the positional relationship between words. Second, a technique called “smoothed mutual information” is proposed to measure the semantic relationship between words, and it is proved that smoothed mutual information effectively solves the problem that, for a large number of word pairs, the transition probability cannot be computed from mutual information alone. It is also proved that the positional language model is a special case of the positional language model with semantic information. Finally, the new model is applied to information retrieval, yielding a retrieval model based on it. Experimental results show that this retrieval model outperforms the retrieval model based on positional language models.

Key words: positional language models, smoothing, mutual information, information retrieval, semantic relation
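
The positional component described in the abstract can be illustrated with a minimal sketch. It assumes the Gaussian kernel form commonly used in positional language models, k(i, j) = exp(-(i - j)^2 / (2σ^2)); the abstract does not give the exact kernel form or the value of σ, so the function names and the default `sigma` below are illustrative assumptions rather than the authors' implementation. The smoothed mutual information component is not sketched, because its definition is not stated in the abstract.

```python
import math
from collections import defaultdict


def gaussian_kernel(i, j, sigma=2.0):
    """Weight with which a word at position j contributes to position i.

    Assumed form: exp(-(i - j)^2 / (2 * sigma^2)); sigma is a hypothetical default.
    """
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))


def propagated_counts(tokens, position, sigma=2.0):
    """Virtual word counts at `position`: each occurrence is discounted by distance."""
    counts = defaultdict(float)
    for j, word in enumerate(tokens):
        counts[word] += gaussian_kernel(position, j, sigma)
    return dict(counts)


def positional_lm(tokens, position, sigma=2.0):
    """Normalize the propagated counts into an unsmoothed word distribution."""
    counts = propagated_counts(tokens, position, sigma)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}
```

For example, `positional_lm(["a", "b", "a", "c"], 0, sigma=1.0)` gives more probability mass to "b" (one position away) than to "c" (three positions away), which is the distance-sensitive behaviour the kernel is meant to capture.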
