Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2017, Vol. 53 ›› Issue (2): 279-286. DOI: 10.13209/j.0479-8023.2017.038
• Original Article •
Ziyi YANG, Zhengxian GONG, Fang KONG†, Guodong ZHOU
Received: 2016-07-21
Revised: 2016-10-03
Online: 2017-03-20
Published: 2017-03-20
Contact: Fang KONG
Ziyi YANG, Zhengxian GONG, Fang KONG, Guodong ZHOU. Exploit Comparable Corpus to Chinese Zero Pronoun Resolution[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2017, 53(2): 279-286.
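YANG Ziyi, GONG Zhengxian, KONG Fang, ZHOU Guodong. Chinese Zero Anaphora Resolution Based on Chinese-English Comparable Corpora [J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2017, 53(2): 279-286.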
URL: https://xbna.pku.edu.cn/EN/10.13209/j.0479-8023.2017.038
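No. | Feature | Description |
---|---|---|
F1 | First_Gap | T if Z is the first gap in the sentence; otherwise F |
F2 | Pl_Is_NP | NA if Z is the first gap in the sentence; otherwise T if Pl is an NP node, else F |
F3 | Pr_Is_VP | NA if Z is the first gap in the sentence; otherwise T if Pr is a VP node, else F |
F4 | Pl_Is_NP&Pr_Is_VP | NA if Z is the first gap in the sentence; otherwise T if Pl is an NP node and Pr is a VP node, else F |
F5 | P_Is_VP | NA if Z is the first gap in the sentence; otherwise T if P is a VP node, else F |
F6 | IP_VP | T if the path from node Wr to C contains a VP node whose parent is an IP node; otherwise F |
F7 | Has_Ancestor_NP | T if V has an NP ancestor node; otherwise F |
F8 | Has_Ancestor_VP | T if V has a VP ancestor node; otherwise F |
F9 | Has_Ancestor_CP | T if V has a CP ancestor node; otherwise F |
F10 | Left_Comma | NA if Z is the first gap in the sentence; otherwise T if Wl is a comma, else F |
F11 | LastWord | The word immediately to the left of Z |
F12 | NextWord | The word immediately to the right of Z |
F13 | LastWordPos | Part-of-speech tag of the word immediately to the left of Z |
F14 | NextWordPos | Part-of-speech tag of the word immediately to the right of Z |
F15 | LastWordPos_NextWordPos | Part-of-speech tag of the word to the left of Z combined with that of the word to the right of Z |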
Table 1 Chinese zero pronoun detection features
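The lexical features in Table 1 (F1 and F10 to F15) depend only on the words and part-of-speech tags around a gap, so they can be illustrated without a parse tree. The following is a minimal sketch rather than the authors' implementation: the function name `lexical_gap_features`, the gap-indexing convention (gap 0 precedes the first word), the `NA` placeholder for a missing neighbour, and the `_` separator used for F15 are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def lexical_gap_features(tagged: List[Tuple[str, str]], gap: int) -> Dict[str, str]:
    """Compute F1 and F10-F15 for the gap located just before tagged[gap].

    Assumptions (not from the paper): gap 0 is the position before the
    first word, gap == len(tagged) is the position after the last word,
    and a missing neighbour is encoded as "NA".
    """
    if not 0 <= gap <= len(tagged):
        raise ValueError("gap index out of range")

    is_first_gap = gap == 0
    last_word, last_pos = tagged[gap - 1] if gap > 0 else ("NA", "NA")
    next_word, next_pos = tagged[gap] if gap < len(tagged) else ("NA", "NA")

    if is_first_gap:
        left_comma = "NA"
    else:
        # Accept both the ASCII and the full-width Chinese comma.
        left_comma = "T" if last_word in {",", "，"} else "F"

    return {
        "First_Gap": "T" if is_first_gap else "F",
        "Left_Comma": left_comma,
        "LastWord": last_word,
        "NextWord": next_word,
        "LastWordPos": last_pos,
        "NextWordPos": next_pos,
        "LastWordPos_NextWordPos": f"{last_pos}_{next_pos}",
    }


# Hypothetical tagged sentence; the gap at index 3 follows the comma.
sentence = [("他", "PN"), ("说", "VV"), ("，", "PU"), ("明天", "NT"), ("来", "VV")]
features = lexical_gap_features(sentence, 3)
```

In this example the gap sits between the comma and 明天, so `Left_Comma` is `T` and the two context words are the comma and 明天. The tree-based features (F2 to F9) would additionally need a constituency parse.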
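No. | Feature | Description |
---|---|---|
1 | Dist_Sentence | 0 if Z and A are in the same sentence, 1 if they are one sentence apart, and so on |
2 | Dist_Segment | 0 if Z and A are in the same clause, 1 if they are one clause apart, and so on |
3 | Sibling_NP_VP | F if Z and A are in different sentences; otherwise T if both are children of the root node and are siblings, else F |
4 | Closest_NP | T if A is the candidate antecedent closest to Z; otherwise F |
5 | A_Ith_Person | First, Second, Third, Neutral, or Others if A is first person, second person, third person, neutral, or unknown, respectively |
6 | A_Role | Subject, Object, or Others if A is a subject, an object, or something else, respectively |
7 | A_Has_Anc_NP | T if A has an NP ancestor node; otherwise F |
8 | A_Has_Anc_NP_In_IP | T if A has an NP ancestor node that is a descendant of A's lowest IP ancestor node; otherwise F |
9 | A_Has_Anc_VP | T if A has a VP ancestor node; otherwise F |
10 | A_Has_Anc_VP_In_IP | T if A has a VP ancestor node that is a descendant of A's lowest IP ancestor node; otherwise F |
11 | A_Has_Anc_CP | T if A has a CP ancestor node; otherwise F |
12 | A_Grammatical_Role | S, O, or X if the grammatical role of A in the sentence is subject, object, or other, respectively |
13 | A_Clause | M, I, S, or X if A is in a main clause, an independent clause, a subordinate clause, or none of these, respectively |
14 | A_Is_ADV | T if A is an adverbial NP node; otherwise F |
15 | A_Is_TMP | T if A is a temporal NP; otherwise F |
16 | A_Is_Pronoun | T if A is a pronoun; otherwise F |
17 | A_Is_NE | T if A is a named entity; otherwise F |
18 | A_In_Headline | T if A appears in the headline of the text; otherwise F |
19 | Z_Has_Anc_NP | T if V has an NP ancestor node; otherwise F |
20 | Z_Has_Anc_NP_In_IP | T if V has an NP ancestor node that is a descendant of V's lowest IP ancestor node; otherwise F |
21 | Z_Has_Anc_VP | T if V has a VP ancestor node; otherwise F |
22 | Z_Has_Anc_VP_In_IP | T if V has a VP ancestor node that is a descendant of V's lowest IP ancestor node; otherwise F |
23 | Z_Has_Anc_CP | T if V has a CP ancestor node; otherwise F |
24 | Z_Grammatical_Role | S if the grammatical role of the zero pronoun Z is subject; otherwise X |
25 | Z_Clause | M, I, S, or X if V is in a main clause, an independent clause, a subordinate clause, or none of these, respectively |
26 | Z_Is_First_ZP | T if Z is the first zero pronoun candidate in its sentence; otherwise F |
27 | Z_Is_Last_ZP | T if Z is the last zero pronoun candidate in its sentence; otherwise F |
28 | Z_In_Headline | T if Z appears in the headline of the text; otherwise F |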
Table 2 Zero pronoun resolution features
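Several features in Table 2 (for example A_Has_Anc_NP and A_Has_Anc_NP_In_IP) test whether a node has an ancestor with a given label, optionally restricted to ancestors lying below the node's lowest IP ancestor. The sketch below shows one way such checks could be written over a constituency tree with parent pointers. The `Node` class and both helper functions are hypothetical names introduced here for illustration; the source does not describe the authors' data structures.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach child to this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node: Node) -> Iterator[Node]:
    """Yield the ancestors of node from its parent up to the root."""
    current = node.parent
    while current is not None:
        yield current
        current = current.parent


def has_ancestor(node: Node, label: str) -> bool:
    """True if any ancestor carries the label (the *_Has_Anc_XP features)."""
    return any(a.label == label for a in ancestors(node))


def has_ancestor_in_lowest_ip(node: Node, label: str) -> bool:
    """True if an ancestor with the label lies below the lowest IP ancestor.

    Returns False when the node has no IP ancestor at all.
    """
    found = False
    for a in ancestors(node):
        if a.label == "IP":
            return found
        if a.label == label:
            found = True
    return False


# Hypothetical tree: IP -> (NP, VP -> CP -> IP -> VP -> VV)
root = Node("IP")
np_node = root.add(Node("NP"))
vp_outer = root.add(Node("VP"))
cp = vp_outer.add(Node("CP"))
ip_inner = cp.add(Node("IP"))
vp_inner = ip_inner.add(Node("VP"))
vv = vp_inner.add(Node("VV"))
```

For the `vv` node, the inner VP lies below the lowest IP ancestor, so the restricted VP check holds, while the CP sits above that IP and the restricted CP check does not, even though the unrestricted CP check does.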
Domain | Chen et al.: R | Chen et al.: P | Chen et al.: F | Baseline system: R | Baseline system: P | Baseline system: F | Improved system with bilingual information: R | Improved system with bilingual information: P | Improved system with bilingual information: F |
---|---|---|---|---|---|---|---|---|---|
NW | 11.9 | 14.3 | 13.0 | 13.4 | 15.7 | 14.5 | 18.2 | 23.9 | 20.7 |
MZ | 4.9 | 4.7 | 4.8 | 8.9 | 7.8 | 8.3 | 9.7 | 13.4 | 11.3 |
WB | 20.1 | 14.3 | 16.7 | 14.2 | 11.4 | 12.6 | 16.2 | 14.5 | 15.3 |
BN | 18.2 | 22.3 | 20.0 | 18.5 | 24.1 | 20.9 | 19.7 | 21.4 | 20.5 |
BC | 19.4 | 14.6 | 16.7 | 21.6 | 14.3 | 17.2 | 22.9 | 19.2 | 20.9 |
TC | 31.8 | 17.0 | 22.2 | 30.1 | 15.6 | 20.5 | 29.7 | 19.4 | 23.5 |
Overall | 19.6 | 15.5 | 17.3 | 20.3 | 15.8 | 17.8 | 25.4 | 20.7 | 22.8 |
Table 3 Comparison of system performance (%)
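System | Gold parse trees: P | Gold parse trees: R | Gold parse trees: F | Automatic parse trees: P | Automatic parse trees: R | Automatic parse trees: F |
---|---|---|---|---|---|---|
Baseline system | 72.9 | 58.2 | 64.7 | 62.7 | 39.4 | 48.4 |
Improved system | 70.5 | 62.4 | 66.2 | 60.4 | 50.2 | 54.8 |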
Table 4 Performance of Chinese zero pronoun detection (%)
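The F columns in the result tables are consistent with the standard balanced F-measure, the harmonic mean of precision P and recall R. The short sketch below recomputes it; the function name `f_score` is chosen here for illustration, and the check uses values taken from Table 4.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Baseline detection with gold parse trees (Table 4): P = 72.9, R = 58.2
baseline_gold_f = round(f_score(72.9, 58.2), 1)

# Improved detection with automatic parse trees (Table 4): P = 60.4, R = 50.2
improved_auto_f = round(f_score(60.4, 50.2), 1)
```

Rounded to one decimal place, these reproduce the reported values of 64.7 and 54.8. Because the measure is symmetric in P and R, the F values are unaffected by the order in which the two columns are listed.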
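System | Gold zero pronouns: P | Gold zero pronouns: R | Gold zero pronouns: F | Automatically detected zero pronouns: P | Automatically detected zero pronouns: R | Automatically detected zero pronouns: F |
---|---|---|---|---|---|---|
Baseline system | 44.8 | 44.8 | 44.8 | 20.3 | 15.8 | 17.8 |
Improved system | 46.7 | 46.7 | 46.7 | 25.4 | 20.7 | 22.8 |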
Table 5 Performance of Chinese zero pronoun resolution using automatic parse trees (%)
[1] | Weischedel R, Palmer M, Marcus M, et al. OntoNotes release 5.0 LDC2013T19 [EB/OL]. Philadelphia: Linguistic Data Consortium. (2013-10-16) [2015-03-23] |
[2] | Kim Y J. Subject/object drop in the acquisition of Korean: a cross-linguistic comparison. Journal of East Asian Linguistics, 2000, 9(4): 325-351 |
[3] | Zhao S H, Ng T H. Identification and resolution of Chinese zero pronouns: a machine learning approach // Proceedings of EMNLP-2007. Prague: Association for Computational Linguistics, 2007: 541-550 |
[4] | Kong Fang, Zhou Guodong. A tree kernel-based unified framework for Chinese zero anaphora resolution // Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Massachusetts: Association for Computational Linguistics, 2010: 882-891 |
[5] | Chen C, Ng V. Chinese zero pronoun resolution: some recent advances // Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Seattle: Association for Computational Linguistics, 2013: 1360-1365 |
[6] | Chen C, Ng V. Chinese overt pronoun resolution: a bilingual approach // Proceedings of the 28th AAAI Conference on Artificial Intelligence. Québec City, 2014: 1615-1621 |
[7] | Chen C, Ng V. Chinese zero pronoun resolution: a joint unsupervised discourse-aware model rivaling state-of-the-art resolvers // Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Beijing: Association for Computational Linguistics, 2015: 320-326 |
[8] | Chung T, Gildea D. Effects of empty categories on machine translation // Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Massachusetts: Association for Computational Linguistics, 2010: 636-645 |
[9] | Xue Nianwen, Yang Yaqin. Chinese sentence segmentation as comma classification // Proceedings of ACL-2011: Short Papers. Portland, Oregon: Association for Computational Linguistics, 2011: 631-635 |
[10] | Kong Fang, Zhou Guodong. A clause-level hybrid approach to Chinese empty element recovery // Proceedings of the 2013 International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann, 2013: 2113-2119 |
[11] | Pradhan S, Moschitti A, Xue Nianwen, et al. CoNLL-2012 shared task: modeling multilingual unrestricted coreference in OntoNotes // Proceedings of the Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes, EMNLP-CoNLL 2012. Jeju Island, 2012: 1-40 |