Acta Scientiarum Naturalium Universitatis Pekinensis, 2024, Vol. 60, Issue (1): 62-70. DOI: 10.13209/j.0479-8023.2023.073

Interpretable Biomedical Reasoning via Deep Fusion of Knowledge Graph and Pre-trained Language Models

XU Yinxin1, YANG Zongbao1, LIN Yuchen1, HU Jinlong1,2, DONG Shoubin1,2,†   

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641
  2. Zhongshan Institute of Modern Industrial Technology of SCUT, Zhongshan 528437
  • Received: 2023-05-10 Revised: 2023-07-31 Online: 2024-01-20 Published: 2024-01-20
  • Contact: DONG Shoubin, E-mail: sbdong(at)scut.edu.cn

基于知识图谱和预训练语言模型深度融合的可解释生物医学推理

徐寅鑫1, 杨宗保1, 林宇晨1, 胡金龙1,2, 董守斌1,2,†   

  1. 华南理工大学计算机科学与工程学院, 广州 510641
  2. 中山市华南理工大学现代产业技术研究院, 中山 528437
  • 通讯作者: 董守斌, E-mail: sbdong(at)scut.edu.cn
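  • Funding: Supported by the Zhongshan special innovation fund for the introduction of high-end scientific research institutions (2019AG031)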

Abstract:

Joint reasoning over pre-trained language models (LMs) and knowledge graphs (KGs) has not yet performed well in the biomedical domain, where terminology is expressed in diverse ways, semantics are ambiguous, and knowledge graphs contain a large amount of noise. This paper proposes DF-GNN, an interpretable reasoning method for the biomedical domain. DF-GNN unifies the entity representations of text and knowledge graph, constructs subgraphs from a large biomedical knowledge base and denoises them, and improves how text and subgraph entities exchange information by adding direct interaction between corresponding text and subgraph nodes, so that information from the two modalities is deeply fused. In addition, path information from the knowledge graph is used to make the model's reasoning process interpretable. Results on the public datasets MedQA-USMLE and MedMCQA show that, compared with existing joint reasoning models for the biomedical domain, DF-GNN leverages structured knowledge more reliably for reasoning and provides interpretability.

Key words: biomedical domain, pre-trained language model, knowledge graph, joint reasoning
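
The abstract states that knowledge graph path information is used to explain the model's reasoning, but it does not describe the procedure. The following is only a minimal sketch of one way such paths could be surfaced, assuming a toy graph of (head, relation, tail) triples and a plain breadth-first search. The entities, relations, and the function names `build_adjacency`, `find_paths`, and `explain` are hypothetical and are not taken from DF-GNN.

```python
from collections import deque

# Toy knowledge graph as (head, relation, tail) triples. The entities and
# relations are invented for illustration; they are not from the paper.
TRIPLES = [
    ("fever", "symptom_of", "influenza"),
    ("cough", "symptom_of", "influenza"),
    ("influenza", "treated_by", "oseltamivir"),
    ("fever", "symptom_of", "malaria"),
    ("malaria", "treated_by", "chloroquine"),
]


def build_adjacency(triples):
    """Index outgoing edges by head entity."""
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))
    return adjacency


def find_paths(adjacency, sources, target, max_hops=2):
    """Breadth-first search for every path of at most max_hops edges
    that leads from any source entity to the target entity."""
    paths = []
    # A path alternates entities and relations: [e0, r0, e1, r1, e2, ...]
    queue = deque((source, [source]) for source in sources)
    while queue:
        node, path = queue.popleft()
        if node == target and len(path) > 1:
            paths.append(path)
            continue
        hops = len(path) // 2  # number of edges already on this path
        if hops >= max_hops:
            continue
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in path[::2]:  # do not revisit an entity
                queue.append((neighbor, path + [relation, neighbor]))
    return paths


def explain(question_entities, answer_entity, triples=TRIPLES):
    """Render each connecting path as a human-readable chain."""
    adjacency = build_adjacency(triples)
    found = find_paths(adjacency, question_entities, answer_entity)
    return [" -> ".join(path) for path in found]
```

With this toy graph, `explain(["fever", "cough"], "oseltamivir")` returns one chain per question entity, each passing through `influenza`. A real system would presumably rank or filter such paths using model scores, which this sketch does not attempt.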

摘要:

基于预训练语言模型(LM)和知识图谱(KG)的联合推理在应用于生物医学领域时, 因其专业术语表示方式多样、语义歧义以及知识图谱存在大量噪声等问题, 联合推理模型并未取得较好的效果。基于此, 提出一种面向生物医学领域的可解释推理方法DF-GNN。该方法统一了文本和知识图谱的实体表示方式, 利用大型生物医学知识库构造子图并进行去噪, 改进文本和子图实体的信息交互方式, 增加对应文本和子图节点的直接交互, 使得两个模态的信息能够深度融合。同时, 利用知识图谱的路径信息对模型推理过程提供了可解释性。在公开数据集MedQA-USMLE和MedMCQA上的测试结果表明, 与现有的生物医学领域联合推理模型相比, DF-GNN可以更可靠地利用结构化知识进行推理并提供解释性。

关键词: 生物医学, 预训练语言模型, 知识图谱, 联合推理