Acta Scientiarum Naturalium Universitatis Pekinensis, 2025, Vol. 61, Issue (5): 860-868. DOI: 10.13209/j.0479-8023.2024.125

Entity Relation Extraction Based on Span Representation for Tibetan Medicine Literature

ZHOU Qing1,2,3, YONG Tso1,2,3,†, LAMAO Dongzhi1,2,3, NYIMA Trashi1,2,3   

  1. School of Information Science and Technology, Tibet University, Lhasa 850000
  2. Key Laboratory of Tibetan Information Technology and Artificial Intelligence of Tibet, Lhasa 850000
  3. Engineering Research Center of Tibetan Information Technology, Ministry of Education, Lhasa 850000
  • Received: 2024-07-13 Revised: 2024-11-21 Online: 2025-09-20 Published: 2025-09-20
  • Contact: YONG Tso, E-mail: yongtso(at)163.com

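  • Supported by:
    National Natural Science Foundation of China (62566060), the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0116100), a project of the Science and Technology Department of the Tibet Autonomous Region (XZ202401JD0010), and the Lhasa Key Science and Technology Program special project (LAKJ202526)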

Abstract:

Traditional entity relation extraction methods are difficult to apply directly to Tibetan medicine because of its specialized terminology, the scarcity of text resources, and the complexity of processing the language. This paper proposes a method based on span representation for extracting entity relations from Tibetan medical literature. The method encodes text with span representations and the TibetanAI_ALBERT_v2.0 pre-trained language model, and enumerates potential candidate entities so that nested entities, which are otherwise insufficiently recognized, can be identified. In addition, KL divergence is introduced to constrain the inconsistency of the model between the training and inference stages. Experimental results on TibetanAI_TMDisRE_v1.0, an entity relation extraction dataset for the Tibetan medicine domain, show that the proposed method achieves a significant performance improvement, with precision, recall, and F1 reaching 84.85%, 77.35%, and 80.81%, respectively.

Key words: Tibetan, Tibetan medicine, entity relation extraction, joint extraction
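
The two ideas named in the abstract, enumerating candidate spans and penalizing inconsistency with KL divergence, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the `max_width` limit, and the symmetric form of the KL term are assumptions made for illustration, and the abstract does not say which two distributions are compared (one common choice is the outputs of two dropout-perturbed forward passes over the same input).

```python
import math


def enumerate_spans(tokens, max_width):
    """List every candidate span as (start, end), end inclusive.

    Overlapping and nested spans are all kept, which is what lets a
    span-based model score nested entities independently.
    max_width is a hypothetical cap on span length in tokens.
    """
    spans = []
    for start in range(len(tokens)):
        last = min(start + max_width, len(tokens))
        for end in range(start, last):
            spans.append((start, end))
    return spans


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p).

    Assumed form of the consistency term: it is zero when the two
    predicted label distributions agree and grows as they diverge.
    """
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


if __name__ == "__main__":
    tokens = ["t1", "t2", "t3"]  # placeholder tokens, not real data
    print(enumerate_spans(tokens, max_width=2))
    print(symmetric_kl([0.9, 0.1], [0.5, 0.5]))
```

In a training loop, a term like `symmetric_kl` would typically be added to the classification loss with a weighting coefficient; the abstract does not state the weighting used.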
