Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2026, Vol. 62 ›› Issue (2): 266-274. DOI: 10.13209/j.0479-8023.2025.097




  • Funding: Supported by the National Natural Science Foundation of China (616966031, 6186646462)

Automatic Summarization of Tibetan Texts Using DiffuSum with CINO-LoRA and Self-condition Integration

WANG Rong1,2, CAI Zhijie1,2,†   

  1. College of Computer Science and Technology, Qinghai Normal University, Xining 810016; 2. The State Key Laboratory of Tibetan Intelligence, Xining 810008
  • Received: 2025-02-18; Revised: 2025-09-06; Online: 2026-03-20; Published: 2026-03-20



Abstract:

To further improve automatic summarization of Tibetan texts, this paper proposes TiDiffuSum, a Tibetan text summarization model that integrates CINO-LoRA and a self-conditioning (Self-condition) strategy into DiffuSum, addressing three weaknesses of DiffuSum on the Tibetan task: insufficient sentence representation, a parameter scale so large that it limits contextual modeling, and high training cost. TiDiffuSum introduces a CINO-LoRA mechanism into the sentence encoder to enhance Tibetan semantic representation while significantly reducing the number of trainable parameters, and incorporates the Self-condition strategy into the diffusion generation module to strengthen the comprehension and use of contextual semantics. Experiments on the Tibetan summarization dataset TSUM show that TiDiffuSum effectively compresses the parameter count to 0.45% of the baseline model's while improving ROUGE-1, ROUGE-2, and ROUGE-L by 1.07, 0.78, and 1.08 points respectively, significantly outperforming the baseline models.
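The two ingredients the abstract combines, a LoRA adapter added to a frozen pretrained encoder layer and a self-conditioned diffusion step that reuses the model's previous clean-sequence estimate, can be sketched in a few lines of NumPy. This is an illustrative toy under assumed dimensions and names (`lora_forward`, `denoise_step`, `rank`, `alpha` are all hypothetical), not the TiDiffuSum implementation or its hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 768, 768, 8, 16  # assumed sizes, not the paper's

# Frozen pretrained weight (stands in for one CINO encoder projection).
W = rng.standard_normal((d_out, d_in)) * 0.02

# LoRA: only the low-rank factors A, B are trained; B starts at zero,
# so the adapted layer initially computes exactly the frozen layer.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def lora_forward(x, W, A, B, alpha, rank):
    """y = W x + (alpha / rank) * B (A x): frozen weight plus low-rank update."""
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x, W, A, B, alpha, rank), W @ x)  # B == 0 at init

# Trainable-parameter saving: the full matrix vs. the two LoRA factors.
full_params = W.size
lora_params = A.size + B.size
print(f"LoRA trains {lora_params / full_params:.2%} of this layer's parameters")

# Self-condition: the denoiser also receives its own previous estimate of the
# clean data x0, so each step can reuse context recovered by earlier steps.
def denoise_step(x_t, x0_prev, t):
    """Toy denoiser: a learned network would replace the simple blend below."""
    blend = 0.5 * (x_t + x0_prev)   # condition on the previous x0 estimate
    return blend * (1.0 - t)        # shrink toward the clean signal as t -> 1

x_t = rng.standard_normal(d_in)
x0_est = np.zeros_like(x_t)         # first pass: no previous estimate yet
for t in np.linspace(0.0, 1.0, 5):
    x0_est = denoise_step(x_t, x0_est, t)  # feed the estimate back in
```

With these toy sizes the adapter trains about 2% of the layer's weights; the far smaller figure reported in the abstract (0.45% of the whole baseline) comes from freezing every pretrained matrix in the model, not just one layer.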

Key words: Tibetan, automatic text summarization, DiffuSum model, sentence representation