Acta Scientiarum Naturalium Universitatis Pekinensis

Hypernym Relation Classification Based on Word Pattern

SUN Jiawei, LI Zhenghua, CHEN Wenliang, ZHANG Min

2019, 55(1): 1-7. DOI: 10.13209/j.0479-8023.2018.055

The authors propose a hypernym relation classification method based on word patterns, which effectively alleviates the sparsity problem of the traditional path-based method. Furthermore, the paper combines the path-based method and the distributional method via word pattern embedding. To demonstrate the effectiveness of the proposed approach, the authors manually annotated a Chinese hypernym dataset containing 12000 word pairs. The experimental results show that the proposed word pattern embedding approach achieves an F1 score of 95.36%.

Research on Chinese Nested Named Entity Relation Extraction

XU Haoliang, LI Yanqun, HE Yunqi, QIAN Longhua

2019, 55(1): 8-14. DOI: 10.13209/j.0479-8023.2018.056

Research on relation extraction between nested named entities lacks benchmark corpora. To address this problem, manual annotation and machine learning are combined to extract semantic relations from an existing Chinese named entity recognition corpus. The authors manually annotate a Chinese nested named entity relation corpus on top of the existing named entity recognition corpus and conduct relation extraction experiments between nested named entities with support vector machine (SVM) and convolutional neural network (CNN) models. The experimental results show that nested entity relation extraction performs very well on the corpus with manually labeled entities, obtaining an F1 score of over 95%, while it falls short of expectations with automatically recognized entities.

Building Chinese Zero Corpus from Discourse Perspective

SHENG Chen, KONG Fang, ZHOU Guodong

2019, 55(1): 15-21. DOI: 10.13209/j.0479-8023.2018.057

To better deal with Chinese zero elements, this paper makes a theoretical analysis from a discourse perspective and completes the construction of the Chinese Discourse Zero Corpus (CDZC). First, the necessity of building the corpus is explored on the basis of existing theoretical research and data sources. Then, a top-down, forward-search annotation strategy combined with human-machine collaboration is used to complete the corpus annotation. Finally, a detailed statistical analysis shows that CDZC reflects the characteristics of the Chinese language and provides a corpus resource for related research.

A Word Representation Method Based on Hownet

CHEN Yang, LUO Zhiyong

2019, 55(1): 22-28. DOI: 10.13209/j.0479-8023.2018.061

Word embedding methods based on pre-training still have defects in stability and in the quality of low-frequency word vectors. The authors propose a new word embedding method based on Hownet. First, based on the sememe independence assumption, all sememes of Hownet are mapped to the standard orthonormal basis of a Euclidean space to initialize the sememe vectors. Second, using the word-sememe relationships defined in Hownet, each word vector is regarded as a projection onto the subspace spanned by its related sememes. Finally, a deep neural network model is put forward to learn the word representations. The experimental results indicate that the proposed Hownet-based word embedding method obtains comparable results on two standard evaluation tasks: word similarity computation and word sense disambiguation.
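
The initialization idea in this abstract can be illustrated with a minimal sketch. It assumes a toy sememe inventory and a simple "normalized sum of sememe vectors" rule; both are hypothetical choices made here for illustration, since the abstract does not give the projection formula or the network architecture, and none of the entries below are taken from Hownet itself.

```python
import math

# Hypothetical toy sememe inventory and word-to-sememe mapping
# (illustrative only; not taken from Hownet).
SEMEMES = ["human", "animal", "young", "female"]
WORD_SEMEMES = {
    "girl": ["human", "young", "female"],
    "puppy": ["animal", "young"],
}


def sememe_vector(sememe):
    """One-hot vector: the sememes form the standard orthonormal basis."""
    return [1.0 if s == sememe else 0.0 for s in SEMEMES]


def word_vector(word):
    """Unit-length sum of the word's sememe vectors.

    The result lies in the subspace spanned by the word's sememes.
    This rule is an assumption; the paper learns the final vectors
    with a deep neural network.
    """
    total = [0.0] * len(SEMEMES)
    for sememe in WORD_SEMEMES[word]:
        total = [a + b for a, b in zip(total, sememe_vector(sememe))]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total]


def cosine(u, v):
    """Dot product of two unit vectors, which equals their cosine."""
    return sum(a * b for a, b in zip(u, v))


similarity = cosine(word_vector("girl"), word_vector("puppy"))
```

Under this rule, words that share sememes get a nonzero similarity even before any training, which is one plausible reading of why such an initialization could help low-frequency words.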

Sarcasm Detection Based on Adversarial Learning

ZHANG Qinglin, DU Jiachen, XU Ruifeng

2019, 55(1): 29-36. DOI: 10.13209/j.0479-8023.2018.064

Existing sarcasm detection approaches suffer from a lack of sufficient training data. To address this problem, the authors propose an adversarial learning framework built on a convolutional neural network (CNN) and an attention mechanism, which is trained on limited amounts of labeled data. Two complementary adversarial learning approaches are investigated. First, by training with generated adversarial examples, the authors attempt to enhance the robustness and generalization ability of the classifier. Then, a domain transfer based adversarial learning approach is proposed to leverage cross-domain sarcasm data for improving the performance of sarcasm detection in the target domain. Experimental results on three sarcasm datasets show that both proposed adversarial learning approaches improve the performance of sarcasm detection, with the domain transfer based approach achieving the higher performance. Combining the two approaches improves the performance further.

Cross-Domain Sentiment Classification Based on Representation Learning and Transfer Learning

LIAO Xiangwen, WU Xiaojing, GUI Lin, HUANG Jinhui, CHEN Guolong

2019, 55(1): 37-46. DOI: 10.13209/j.0479-8023.2018.063

Most existing cross-domain sentiment classification methods are not expressive enough to capture rich representations of texts, and class noise accumulated during the transfer process can lead to negative transfer, which adversely affects performance. To address these issues, the authors propose a method that combines textual representation learning with a transfer learning algorithm for cross-domain sentiment classification. The method first builds a hierarchical attention network to generate document representations with local semantic information. The authors then use a class-noise estimation algorithm to detect negative transfer samples among the transferred samples and remove them. Finally, the sentiment classifier is trained on a dataset expanded with samples from the target domain and transferred samples from the source domain. Two experiments on large-scale product review datasets show that, compared with the baselines, the proposed method reduces the RMSE of cross-domain sentiment classification by 1.5% and 1.0% respectively.

Collaborative Analysis of Uyghur Morphology Based on Character Level

Turghun Osman, YANG Yating, Eziz Tursun, CHENG Li

2019, 55(1): 47-54. DOI: 10.13209/j.0479-8023.2018.067

The Uyghur language has various inflectional affixes, complex structures and phonetic changes. The authors propose a collaborative analysis method for Uyghur morphology at the character level. It includes three procedures: morpheme segmentation, morphological annotation and reduction of phonetic changes. The main characteristic of this method is the use of a composite tag to represent morpheme boundaries, annotations and phonetic changes. In addition, character sequence annotation is used to train the model. Experimental results show that the accuracy of morpheme segmentation, morphological annotation and reduction of phonetic changes reaches 96.39%, 92.78% and 99.79% respectively. The overall accuracy of the system reaches 92.59%.

Semantic Search on Non-Factoid Questions for Domain-Specific Question Answering Systems

QIU Yu, CHENG Li, Daniyal Alghazzawi

2019, 55(1): 55-64. DOI: 10.13209/j.0479-8023.2018.068

A semantic-based retrieval method is proposed to extract answer sentences from tax regulations and cases. First, a domain knowledge base is employed to generate semantic annotations for questions, regulations and cases. Second, a filtering system is developed to remove irrelevant cases from the answer candidates. In addition, a semantic similarity measure is employed for answer extraction. Finally, a ranking model is proposed to optimize the retrieved results. To validate the proposed method, a series of experiments was performed on a real-life dataset. Experimental results show a noticeable improvement in accuracy and performance compared with the baseline methods.

Research on Movie Media Website for Ranking Prediction

YANG Liang, ZHOU Fengqing, LIN Yuan, LIN Hongfei, XU Kan

2019, 55(1): 65-74. DOI: 10.13209/j.0479-8023.2018.062

Using learning-to-rank methods, the authors propose a movie ranking prediction model built by mining and analyzing data from movie media websites. The work includes extracting and expanding features related to ranking prediction as well as dividing and aligning ranking labels. Experimental results show that the proposed model effectively improves performance on the movie ranking prediction task, which can help cinemas schedule an appropriate number of screenings. The model can also provide high-quality movie recommendations for fans.

Hybrid Neural Network for Recognition of the “de” Structure with Semantic Ellipsis

SHI Bingqing, DAI Rubing, QU Weiguang, GU Yanhui, ZHOU Junsheng, LI Bin, XU Ge, SHI Shengwang

2019, 55(1): 75-83. DOI: 10.13209/j.0479-8023.2018.058

To solve the classification of the “de” structure containing the usage of semantic ellipsis, a hybrid neural network is built. First, the network uses a bidirectional LSTM (long short-term memory) neural network to learn more syntactic and semantic information about the “de” structure. Then, the network employs a max-pooling layer or GRU (gated recurrent unit) based multiple attention layers to capture features of ellipsis in the “de” structure, by which the network can recognize the “de” structure containing the usage of semantic ellipsis. Experiments on the CTB8.0 corpus show that the proposed approach achieves accurate results efficiently, with an F1 value of 96.67%.

Similar Legal Case Retrieval Based on Improved Siamese Network

LI Lanjun, ZHOU Junsheng, GU Yanhui, Qü Weiguang

2019, 55(1): 84-90. DOI: 10.13209/j.0479-8023.2018.059

Existing document similarity calculation methods based on siamese networks treat the entire document as the input sequence of the model, which may lead to data sparsity. A hierarchical attention mechanism is used to improve the document representation in the siamese network. Because the siamese model based on hierarchical attention may overlook important sentences in the input document, a two-step document similarity calculation method that introduces compression of the document content is further proposed. The experimental results show that the proposed method is clearly superior to the siamese network model based on long short-term memory (LSTM).

Feature Learning by Distant Supervision for Fine-Grained Implicit Discourse Relation Identification

TANG Yuting, LI Yanbin, LIU Lu, YU Zhonghua, CHEN Li

2019, 55(1): 91-97. DOI: 10.13209/j.0479-8023.2018.060

Aiming at the identification of Chinese fine-grained implicit discourse relations and taking the directionality of relations into account, the authors propose a feature learning algorithm based on distant supervision to label explicit discourse data automatically. The relative position information between conjunctions and words is applied to train the intensive word representation. Then the rhetorical function of words and the directionality of relations are encoded into the representation of intensive words, which is applied to the classification of fine-grained implicit discourse relations. In the experiments, the classification accuracy of the proposed approach reaches 49.79%, which is better than that of approaches neglecting the directionality of discourse relations.

Sentence Style Meta Learning for Twitter Classification

YAN Leiming, YAN Luqi, WANG Chaozhi, HE Jiahui, WU Hongyu

2019, 55(1): 98-104. DOI: 10.13209/j.0479-8023.2018.054

Because of their limited length and freely constructed sentence structures, short texts are difficult to classify, especially in multi-class settings. An efficient meta learning framework is proposed for Twitter classification. The tweets are clustered into many sentence styles, which correspond to new class labels. Thus, the original text classification task becomes a few-shot learning task. When few-shot learning is applied to benchmark datasets, the proposed method Meta-CNN improves accuracy and F1 scores on multi-class Twitter classification and outperforms some traditional machine learning methods and a few deep learning approaches.

Research on Sentiment Analysis Based on Representation Learning

LI Xiaojun, SHI Hanxiao, CHEN Nannan, LIU Hong, ZOU Yi

2019, 55(1): 105-112. DOI: 10.13209/j.0479-8023.2018.066

The authors propose the C&W-SP model, a text sentiment analysis model based on representation learning. First, an improved training model based on the C&W model is proposed, which integrates emotional information and part-of-speech information into the training process of word embedding. The NLP&CC’2013 evaluation datasets are then used to compare the experimental results of different models. The experimental results show that the C&W-SP model, which combines emotional information and part-of-speech information, has the best performance, confirming the effectiveness of the proposed method.

N3LDG: A Lightweight Neural Network Library for Natural Language Processing

WANG Qiansheng, YU Nan, ZHANG Meishan, HAN Zijia, FU Guohong

2019, 55(1): 113-119. DOI: 10.13209/j.0479-8023.2018.065

The authors propose N3LDG, a neural network library for natural language processing. N3LDG supports constructing computation graphs dynamically and organizing executions into batches automatically. Experiments show that N3LDG can efficiently construct and execute computation graphs when training CNN, Bi-LSTM, and Tree-LSTM models. When these models are trained on a CPU, the training speed of N3LDG is better than that of PyTorch. When CNN and Tree-LSTM are trained on a GPU, N3LDG also outperforms PyTorch.

A Scheme for Picking First-Arrival of Air-Gun Records in Three Stages

ZHU Yixin, ZHANG Yunpeng

2019, 55(1): 120-132. DOI: 10.13209/j.0479-8023.2018.030

Routine first-arrival picking performs poorly on air-gun source data, mainly because of the data's low signal-to-noise ratio (SNR). The authors introduce seismic exploration methods for first-arrival picking and propose a three-stage picking scheme that reduces the dependence on the data’s SNR. First, the noise is suppressed. Then, a traditional exploration method is used to calculate the characteristic curve. Finally, the first arrival is determined with the help of an edge-preserving smoothing method. On this basis, and according to the characteristics of practical data, the authors design an automatic picking workflow and use it to pick the first arrivals of practical data. Compared with the traditional method, this method tolerates a lower SNR and makes more data available for subsequent processing and analysis.

Zircon U-Pb Age and Geochemistry of Ningjiawan Pluton in Lüliang Region and Their Geological Significances

PANG Fei, LI Qiugen, LIU Shuwen, WANG Zongqi, LIU Zhengfu, MEI Kechen

2019, 55(1): 133-147. DOI: 10.13209/j.0479-8023.2018.034

Field investigations, petrology, geochemistry, zircon U-Pb geochronology and Hf isotope analysis were performed to investigate the petrogenesis of the Ningjiawan pluton, in an attempt to shed light on its geodynamic significance. LA-ICP-MS zircon U-Pb dating of two samples yielded ages of 2364±6 Ma (MSWD=0.13) and 2360±23 Ma (MSWD=4.0) respectively, indicating that the magma was emplaced and crystallized in the Paleoproterozoic. The pluton has high concentrations of alkalis, K and Si, elevated FeO^T/MgO ratios and high field strength element (HFSE) contents. It is enriched in Rb, Ba, Th and U and depleted in Ca, Mg, P and Ti, and it shows a “seagull-type” chondrite-normalized REE pattern with significantly negative Eu anomalies (δEu=0.13−0.36) and enrichment in the LREE relative to the HREE, exhibiting the traits of highly fractionated I-type granite. Relatively high whole-rock Y/Nb values (1.2−2.8), together with positive ε_Hf(t) values (+1.6 to +6.4) and t_DM1(Hf) and t_DM2(Hf) ages of 2449−2629 Ma and 2474−2711 Ma respectively for the zircon grains, signify that the rocks were derived from magma mixing between crustal and depleted mantle sources. Moreover, distinctive negative Nb, P and Ti anomalies and positive Ce, Nd and Zr anomalies, which are characteristic of a continental margin arc, combined with the regional geological background, indicate that the Ningjiawan pluton very likely formed in an island arc setting.

Physicochemical Estimation of Geochemical Conditions in TSR Reaction

TAN Yu, GUAN Ping, PANG Lei, LIU Peixian, ZHOU Yejun

2019, 55(1): 148-158. DOI: 10.13209/j.0479-8023.2018.037

Based on thermodynamic principles, thermodynamic phase diagrams are calculated and drawn. The possibility, direction and physicochemical conditions of two chemical processes, the TSR reaction and the dissolution of carbonate rocks by H₂S, are determined, and results for the direct reduction of CaSO₄ (or SO₄²⁻) to H₂S at different temperatures are obtained. The results show that when CaCO₃ is at the boundary between precipitation and dissolution in a geological system, a small amount of acidic fluid will dissolve the precipitated CaCO₃, and when the concentrations of Ca²⁺ and CO₃²⁻ increase, a new equilibrium between precipitation and dissolution will be reached. H₂S achieves the best effect when the depth of CaCO₃ dissolution is about 1000 m. Only long-term, repeated TSR reactions can produce sufficient acidic fluid (H₂S), which is the necessary condition for dissolution modification of carbonate reservoirs to achieve obvious results.

Impact of Tibetan Glacier Change on the Asian Climate during the Last Glacial Maximum

WU Yubin, LIU Yonggang, YI Chaolu, LIU Peng

2019, 55(1): 159-170. DOI: 10.13209/j.0479-8023.2018.094

Taking the climate of the Last Glacial Maximum (about 26000 to 19000 years ago) as the background climate, the authors study the climatic impact of glacier expansion on the Tibetan Plateau using the atmospheric general circulation model CAM4 coupled to the land surface model CLM4. The results show that in summer the increased glacier extent over the Qinghai-Tibet Plateau has a significant impact on the climate of the Northern Hemisphere. Besides the significant temperature decrease over the glacier, atmospheric teleconnection can also cause significant warming near the Bering Strait. In addition, the disturbance caused by the glaciers enhances the South Asian summer monsoon and increases precipitation there. Finally, a comparison of the influence of the extent of the Qinghai-Tibetan glaciers on climate under the different climate states of the Last Glacial Maximum (LGM) and Pre-industrial (PI) periods shows that their influence in the PI period is significantly smaller than in the LGM period. This indicates that the impact of Tibetan glaciers on climate depends on the climate state.

Spatio-temporal Change of NDVI and Its Relationship with Climate in the Upper and Middle Reaches of Heihe River Basin from 2000 to 2015

YOU Nanshan, MENG Jijun, SUN Mutian

2019, 55(1): 171-181. DOI: 10.13209/j.0479-8023.2018.075

The upper and middle reaches of the Heihe River Basin (HRB), the second largest inland river basin in the arid area of northern China, were chosen as the study area. Based on the monthly normalized difference vegetation index (NDVI) derived from the MODIS sensor, monthly temperature and precipitation data observed by meteorological stations, a DEM and basic geographical information, the authors analyzed the spatio-temporal change of NDVI and its relationship with climate from 2000 to 2015 using an empirical approach. NDVI in the upper and middle reaches of the Heihe River Basin generally increased; the rate of increase in summer was higher than in autumn and spring; areas with a rapid increase in NDVI were mainly located in the oasis along the Heihe River; and significant decreases in NDVI occurred in the urban areas of Zhangye, Jiuquan and other cities. In summer, NDVI correlated more strongly with precipitation than with temperature, whereas in spring and autumn it correlated more strongly with temperature. NDVI in the grassland, gobi and desert far from the main river was significantly correlated with summer precipitation, but NDVI in the oasis adjacent to the main river showed no significant correlation with precipitation. A memory (lag) effect was also observed in the response of NDVI to precipitation: the time lag of summer NDVI variation relative to precipitation was generally about one month, but it could extend to two months. The results are intended to provide references for regional vegetation restoration and ecosystem management.
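
The lag effect described in this abstract can be probed with a lagged Pearson correlation. The sketch below is only an illustration under stated assumptions: the two monthly series are synthetic values invented here (NDVI is constructed to follow the previous month's precipitation), and the function names are hypothetical rather than the authors' procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def lagged_correlation(ndvi, precip, lag):
    """Correlate NDVI in month t with precipitation in month t - lag."""
    if lag == 0:
        return pearson(ndvi, precip)
    return pearson(ndvi[lag:], precip[:-lag])


def best_lag(ndvi, precip, max_lag=2):
    """Lag (in months) with the highest correlation, from 0 to max_lag."""
    return max(range(max_lag + 1),
               key=lambda lag: lagged_correlation(ndvi, precip, lag))


# Synthetic monthly precipitation (mm); hypothetical values for illustration.
PRECIP = [10, 30, 20, 60, 40, 80, 50, 20, 10, 5, 15, 25]
# Synthetic NDVI built to respond to the previous month's precipitation.
NDVI = [0.15] + [0.1 + 0.005 * p for p in PRECIP[:-1]]

lag_months = best_lag(NDVI, PRECIP)
```

With real station data the correlations would be far noisier, and a significance test would be needed before reading a lag of one or two months into the result.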

Ecological Tension Index Assessment in China Based on RBFN Model

WANG Yuqi, CHENG Shupeng, LU Wentao, FU Zhenghui, GUO Huaicheng

2019, 55(1): 182-188. DOI: 10.13209/j.0479-8023.2018.091

This paper selects indexes related to ecological tension to establish an RBFN (radial basis function network) model, which is trained and tested on research results for ecological tension at different times in different regions. The model is then used to evaluate the ecological tension of China’s 31 provinces, autonomous regions and municipalities in 2008 and 2013, and the evaluation results are visualized with GIS. The results show that about half of the area is in a secure state of ecological pressure, and Beijing has the largest ecological pressure throughout; ecological tension worsened in 22 provincial administrative regions from 2008 to 2013; regionally, ecological pressure is largest in North China and smallest in the northwest.

Urban River Landscape Planning Based on Landscape Evaluation: A Case Study of Panlong River in Kunming

LIU Jiaju, WANG Yuhong, ZHAO Long, GUO Huaicheng

2019, 55(1): 189-196. DOI: 10.13209/j.0479-8023.2018.093

To construct an evaluation index system for urban river landscapes, the landscape evaluation of the river is taken as the target level, and eco-environment, social economy and aesthetics are taken as the criterion level. Water quality, species diversity, water transparency, flood control, waterscape utilization, landscape accessibility, color beauty, form beauty, and regional culture are used as the index level. The Analytic Hierarchy Process (AHP) is used to determine the weight of each indicator, and the distance index method is used to build the evaluation model. The evaluation model is used to evaluate the landscape of the upper and lower reaches of the Panlong River in Kunming. Quantitative and qualitative evaluation are combined to provide guidance for the planning and design of urban river landscape environments and to build a new model of water restoration planning that can serve as a reference for sustainable development.
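
The AHP weighting step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' computation: the pairwise comparison values are hypothetical, and the row geometric-mean method is used as a common approximation to the principal-eigenvector weights. The distance index method is not shown because the abstract does not describe it.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method.

    matrix[i][j] states how much more important criterion i is than
    criterion j; matrix[j][i] is expected to be its reciprocal.
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Criterion names follow the abstract; the judgments are hypothetical.
CRITERIA = ["eco-environment", "social economy", "aesthetics"]
PAIRWISE = [
    [1.0, 2.0, 4.0],   # eco-environment vs. each criterion
    [0.5, 1.0, 2.0],   # social economy vs. each criterion
    [0.25, 0.5, 1.0],  # aesthetics vs. each criterion
]

weights = ahp_weights(PAIRWISE)
```

For this perfectly consistent matrix the weights come out as 4/7, 2/7 and 1/7. Real expert judgments are rarely consistent, so a consistency check would normally accompany this step.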
