Acta Scientiarum Naturalium Universitatis Pekinensis, 2025, Vol. 61, Issue (3): 440-450. DOI: 10.13209/j.0479-8023.2025.038

End-to-End Spanning Tibetan Semantic Role Labeling Based on Graph Parsing

BAN Mabao1,2, LUO Peng3,4, Thupten Tsering3,4,5, Nyima Tashi1,2,4,†, CAI Rangjia3,4,†, YU Yongbin1,†   

  1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054
  2. School of Information Science and Technology, Tibet University, Lhasa 850000
  3. College of Computer Science and Technology, Qinghai Normal University, Xining 810016
  4. The State Key Laboratory of Tibetan Intelligence, Qinghai Normal University, Xining 810008
  5. YiBin Institute of UESTC, YiBin 644000
  • Received: 2024-04-22 Revised: 2024-09-02 Online: 2025-05-20 Published: 2025-05-20
  • Contact: Nyima Tashi, E-mail: niqiongda(at)163.com; CAI Rangjia, E-mail: zwxxzx(at)163.com; YU Yongbin, E-mail: ybyu(at)uestc.edu.cn

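  • Funding: Supported by the Sichuan Natural Science Foundation Youth Fund (25QNJJ3501), the Open Project of the State Key Laboratory of Tibetan Intelligence (2024-Z-001), the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0116100), and the National Natural Science Foundation of China (62306158).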

Abstract:

Semantic role labeling (SRL), an essential step toward semantic understanding, is widely applied in machine translation, information extraction, and question answering systems. Building on existing Tibetan semantic annotation schemes and methods, and drawing on the more mature SRL methods for English and Chinese, this paper proposes an end-to-end span-based SRL method for Tibetan based on graph parsing. The method converts span-based Tibetan SRL into a word-based graph parsing task in two stages: conversion of SRL annotations into a graph, and recovery of SRL annotations from the graph. In the first stage, a Tibetan pre-trained language model (TiUniLM) provides dynamic word embeddings, a predicate indicator P is introduced so that predicates are specified automatically, and a long short-term memory network with a gating mechanism (GM-LSTM) further models sequential features. The second stage applies constrained Viterbi decoding to correct invalid graphs. Experiments on TSRLD-Span show that the proposed method achieves a best F1 score of 89.69% on the test set, a significant improvement over the baseline model, which indicates that the method is effective.
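
The two-stage idea (spans to a word-level graph, then constrained decoding back to spans) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual edge-label scheme or constraint set, so everything here is an assumption made for illustration: the BIES-style edge labels, the function names `spans_to_edges`, `allowed`, `constrained_viterbi`, and `labels_to_spans`, and the toy scores. It is not the authors' implementation.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end inclusive, role)
Edge = Tuple[int, int, str]  # (predicate index, word index, label)


def spans_to_edges(pred: int, spans: List[Span]) -> List[Edge]:
    """Encode argument spans as predicate->word edges (assumed BIES scheme)."""
    edges: List[Edge] = []
    for start, end, role in spans:
        if start == end:
            edges.append((pred, start, f"S-{role}"))
            continue
        edges.append((pred, start, f"B-{role}"))
        for i in range(start + 1, end):
            edges.append((pred, i, f"I-{role}"))
        edges.append((pred, end, f"E-{role}"))
    return edges


def allowed(prev: str, cur: str) -> bool:
    """Transition constraint: an open span must continue with the same role."""
    if prev[0] in "BI":
        return cur[0] in "IE" and cur[2:] == prev[2:]
    return cur[0] in "OBS"  # after O, E-* or S-* only O, B-* or S-* may follow


def constrained_viterbi(scores: List[Dict[str, float]]) -> List[str]:
    """Highest-scoring label sequence for one predicate that forms valid spans."""
    labels = list(scores[0])
    neg_inf = float("-inf")
    # A sequence may only start with O, B-* or S-*.
    best = {l: (scores[0][l] if l[0] in "OBS" else neg_inf) for l in labels}
    back: List[Dict[str, str]] = []
    for t in range(1, len(scores)):
        new: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for cur in labels:
            prev_score, prev = max((best[p], p) for p in labels if allowed(p, cur))
            new[cur] = prev_score + scores[t][cur]
            ptr[cur] = prev
        back.append(ptr)
        best = new
    # A sequence may only end with O, E-* or S-*.
    last = max((l for l in labels if l[0] in "OES"), key=lambda l: best[l])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def labels_to_spans(labels: List[str]) -> List[Span]:
    """Recover spans from a valid BIES label sequence."""
    spans: List[Span] = []
    start = 0
    for i, lab in enumerate(labels):
        if lab[0] == "S":
            spans.append((i, i, lab[2:]))
        elif lab[0] == "B":
            start = i
        elif lab[0] == "E":
            spans.append((start, i, lab[2:]))
    return spans


# Toy scores for three words and one role "A" (made-up numbers).
# Picking the best label per word gives B-A, O, O, which leaves a span open.
toy_scores = [
    {"O": 0.0, "B-A": 2.0, "I-A": 0.0, "E-A": 0.0, "S-A": 0.0},
    {"O": 1.0, "B-A": 0.0, "I-A": 0.5, "E-A": 0.9, "S-A": 0.0},
    {"O": 1.0, "B-A": 0.0, "I-A": 0.0, "E-A": 0.0, "S-A": 0.0},
]
decoded = constrained_viterbi(toy_scores)
recovered = labels_to_spans(decoded)
```

In this toy case the per-word argmax would leave a span opened by `B-A` unclosed, while the constrained decoder only considers label sequences that map back to well-formed spans. The label set passed to `constrained_viterbi` is assumed to contain the full B/I/E/S set for every role plus `O`.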

Key words: natural language processing (NLP), graph parsing, span, Tibetan semantic role labeling, predicate indicator
