Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2019, Vol. 55 ›› Issue (1): 55-64. DOI: 10.13209/j.0479-8023.2018.068

Semantic Search on Non-Factoid Questions for Domain-Specific Question Answering Systems

QIU Yu1,2,3, CHENG Li1,2,3,†, Daniyal Alghazzawi4   

  1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011
  2. University of Chinese Academy of Sciences, Beijing 100049
  3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011
  4. Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21493
  • Received: 2018-06-29 Revised: 2018-08-10 Online: 2019-01-20 Published: 2019-01-20
  • Contact: CHENG Li, E-mail: chengli(at)ms.xjb.ac.cn

Chinese title: 特定领域问答系统中基于语义检索的非事实型问题研究

Authors (Chinese names): 仇瑜 (QIU Yu), 程力 (CHENG Li), Daniyal Alghazzawi
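  • Funding:
    Supported by the "Western Light" Talent Training Program of the Chinese Academy of Sciences (2017-XBZG-BR-001), the National "Thousand Talents Program" (Y32H251201), and the Director's Fund of the Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences (2015RC007)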

Abstract:

A semantic retrieval method is proposed for extracting answer sentences to non-factoid questions from tax regulations and cases. First, a domain knowledge base is used to generate semantic annotations for questions, regulations, and cases. Second, a filtering step removes irrelevant cases from the answer candidates. Third, a semantic similarity measure is used for answer extraction. Finally, a ranking model is proposed to optimize the retrieved results. To validate the method, a series of experiments was performed on a real-life dataset. The experimental results show a noticeable improvement in accuracy and efficiency over the baseline methods.

Key words: question answering system, non-factoid question, domain knowledge base, semantic search, learning to rank
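
The annotate, filter, and score steps named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `KNOWLEDGE_BASE`, the function names, the `min_overlap` threshold, and the use of Jaccard overlap of concepts as the similarity measure. The abstract does not specify the authors' knowledge base or similarity function, so this is not their implementation.

```python
import re
from typing import List, Set, Tuple

# Hypothetical toy knowledge base: surface word -> domain concept.
# A real tax-domain knowledge base would be far larger and structured.
KNOWLEDGE_BASE = {
    "vat": "ValueAddedTax",
    "invoice": "Invoice",
    "deduct": "Deduction",
    "deducted": "Deduction",
    "deduction": "Deduction",
    "export": "Export",
    "refund": "TaxRefund",
    "rebate": "TaxRefund",
}


def annotate(text: str) -> Set[str]:
    """Semantic annotation: map each known word in the text to its concept."""
    words = re.findall(r"[a-z]+", text.lower())
    return {KNOWLEDGE_BASE[w] for w in words if w in KNOWLEDGE_BASE}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Concept-set similarity (an assumed stand-in for the paper's measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(question: str, cases: List[str],
             min_overlap: int = 1) -> List[Tuple[float, str]]:
    """Filter out cases sharing too few concepts, then sort by similarity."""
    q_concepts = annotate(question)
    scored = []
    for case in cases:
        c_concepts = annotate(case)
        if len(q_concepts & c_concepts) < min_overlap:
            continue  # filtering step: irrelevant case is dropped
        scored.append((jaccard(q_concepts, c_concepts), case))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Called with a question such as "Can input VAT on an invoice be deducted?", `retrieve` drops any case that shares no concept with the question and returns the remaining cases in descending order of concept overlap.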

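Abstract (translated from the Chinese version):

For non-factoid questions in the finance and taxation domain, a semantic retrieval method is proposed to extract answers. First, a domain knowledge base is used to semantically annotate questions and domain documents, and semantic similarity features are introduced to improve the retrieval accuracy for regulations and cases. Second, a learning-to-rank algorithm combines multiple features of the domain text to optimize the regulation retrieval results. Finally, regulation features are used to filter the case retrieval results, and the corresponding answers are extracted from similar cases. Tests on a real-world dataset show that the method significantly improves accuracy and efficiency over the baseline methods.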
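
The idea of learning a ranking function that combines several features can be sketched with a pairwise perceptron. This is a generic stand-in and not the rank model used in the paper, which the abstract does not specify. The two features (a lexical score and a semantic-similarity score) and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Linear score of a feature vector under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(pairs: List[Tuple[Vector, Vector]],
                   epochs: int = 10) -> List[float]:
    """Pairwise perceptron: each pair is (better_features, worse_features).

    Whenever the current weights fail to score the better item strictly
    higher, the weights move toward the feature difference.
    """
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        updated = False
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(w, diff) <= 0:
                w = [wi + di for wi, di in zip(w, diff)]
                updated = True
        if not updated:
            break  # every training pair is already ordered correctly
    return w


def rank(w: Vector,
         candidates: List[Tuple[str, Vector]]) -> List[Tuple[str, Vector]]:
    """Sort (id, features) candidates by descending learned score."""
    return sorted(candidates, key=lambda c: dot(w, c[1]), reverse=True)
```

Each training pair states only that one candidate should outrank another for the same query, so the learner needs relative preferences instead of absolute relevance labels.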