Table of Contents

    20 March 2017, Volume 53 Issue 2
    Original Article
    Improving Query-Focused Summarization with CNN-Based Similarity
    Wenhao YING, Xinyan XIAO, Sujian LI, Yajuan LÜ, Zhifang SUI
    2017, 53(2):  197-203.  DOI: 10.13209/j.0479-8023.2017.028

    In search services, users can get information more conveniently by reading succinct answers to their questions. This paper introduces a feature-based method for query-focused summarization that extracts an answer summary for a user query. A convolutional neural network (CNN) is used to learn the semantic representation of a sentence, from which the similarity between a candidate answer sentence and a user query is evaluated. The neural network is trained under the framework of max-margin learning. Experiments on Baidu Knows verify that the proposed method can generate concise answers to user queries.
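
    The max-margin training idea mentioned in this abstract can be illustrated with a small sketch. The abstract does not state the similarity function or the margin, so cosine similarity, the margin of 0.5, and the function names below are assumptions; the plain lists stand in for sentence vectors that a CNN would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_margin_loss(query, positive, negative, margin=0.5):
    """Hinge-style ranking loss (assumed form, not taken from the paper).

    The loss is zero once the good answer sentence scores at least
    `margin` higher than the bad one against the query.
    """
    return max(0.0, margin - cosine(query, positive) + cosine(query, negative))


# Hypothetical 2-dimensional sentence vectors, for illustration only.
query = [1.0, 0.0]
good_answer = [1.0, 0.0]
bad_answer = [0.0, 1.0]
print(max_margin_loss(query, good_answer, bad_answer))
print(max_margin_loss(query, bad_answer, good_answer))
```

    Training would adjust the network so that this loss shrinks over many (query, good answer, bad answer) triples.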

    Word Sense Disambiguation Based on Domain Knowledge and Word Vector Model
    An YANG, Sujian LI, Yun LI
    2017, 53(2):  204-210.  DOI: 10.13209/j.0479-8023.2017.027

    A word sense disambiguation (WSD) method is presented that uses domain keywords and a word vector model built from unlabelled data. Compared with other WSD methods, including Lesk, on an evaluation corpus in the environmental domain, the proposed approach proves effective. By employing knowledge from different fields, the proposed method can be adapted to WSD tasks in other domains.

    Research on Automatic Writing of NBA Sports News
    Yujing CHEN, Xueqiang LÜ, Jianshe ZHOU, Ning LI
    2017, 53(2):  211-218.  DOI: 10.13209/j.0479-8023.2017.034

    Based on the characteristics of NBA sports news and text broadcasts, a method for the automatic writing of NBA sports news is proposed. A score difference function is constructed from the score difference between the two teams. A data slicing algorithm and a data synthesis algorithm based on the characteristics of the score difference function are proposed. The live data slices are classified, and a template library of NBA sports reports is constructed according to the data categories and historical NBA sports news reports. The information in each data slice is filled into a template centered on team and player performance, yielding an automatically generated NBA sports news release. Four indicators are put forward to measure the quality of automatically written NBA sports news. The experiments show that the proposed method is effective and feasible, writes quickly, and can assist news writing for the event.

    A Structure and Style Model for Chinese Character Dynamic Generation
    Qingsheng LI, Qiang XU, Jianguo XIAO, Quan LIU, Jiefang ZHANG
    2017, 53(2):  219-229.  DOI: 10.13209/j.0479-8023.2017.048

    This paper presents a Chinese character generation model based on the structure and style of Chinese characters. The model is described by the stroke element, stroke element vector, path vector, string vector and yoke vector, and covers both the structure and the style of Chinese characters. It can be used to dynamically generate the outlines of personalized TrueType fonts, and methods for storing Chinese characters on the Web and outputting them to the client are implemented. The best way to overcome the problems of Chinese font design and font generation is found. The model is effective for Chinese information cloud storage and cloud services, and also provides an effective strategy and methods for designing deeper cloud character information services.

    Improve Automatic Evaluation of Machine Translation Using Specific-Domain Paraphrase
    Lilin ZHANG, Maoxi LI, Wenyan XIAO, Jianyi WAN, Mingwen WANG
    2017, 53(2):  230-238.  DOI: 10.13209/j.0479-8023.2017.030

    Since paraphrases extracted from the general domain tend to cause paraphrase match deviation in specific-domain automatic evaluation of machine translation, this paper proposes an approach that exploits specific-domain paraphrases related to the test set to enhance automatic evaluation of machine translation. First, the K-means algorithm is used to cluster a general-domain monolingual corpus, and specific-domain training data is obtained via an improved M-L approach. Then, a specific-domain paraphrase table is extracted from the training data with a Markov network model. Finally, the extracted paraphrase table is applied to automatic MT evaluation metrics to improve word matching. Experimental results on the datasets of the WMT’14 and WMT’15 Metrics tasks show that the METEOR and TER metrics using the specific-domain paraphrase table perform better than those using the general-domain paraphrase table.

    A Study of Articulatory Features Based Detection of Pronunciation Erroneous Tendency
    Leyuan QU, Yanlu XIE, Jinsong ZHANG
    2017, 53(2):  239-246.  DOI: 10.13209/j.0479-8023.2017.029

    This paper proposes applying senone log-likelihood ratio based articulatory features (AFs) to improve pronunciation erroneous tendency (PET) detection performance. Feedback information on articulation placement and articulation manner can be derived from the definition of PET. The method involves two main steps. 1) A bank of attribute extractors based on neural networks is trained to estimate the log-likelihood ratio (LLR) for each senone at the frame level. 2) AFs composed of the LLRs output by each attribute extractor are used to detect PETs. Results demonstrate that the proposed system performs better than the baseline system using MFCC. Moreover, substantial improvements are obtained by combining AFs with MFCC, achieving a lower false rejection rate of 5.0%, a lower false acceptance rate of 30.8% and a higher diagnostic accuracy of 89.8%.

    Microblog Sentiment Classification via Combining Rule-based and Machine Learning Methods
    Jie JIANG, Rui XIA
    2017, 53(2):  247-254.  DOI: 10.13209/j.0479-8023.2017.031

    Motivated by the shortcomings of existing sentiment analysis, this paper implements a rule-based sentiment classification method and designs a basic feature set for machine learning methods. A sentiment analysis method combining rule-based and machine learning methods is then proposed. An effective integrated feature set is obtained by adding various rule-based features to the basic feature set after expanding and converting them. The proposed method outperforms each single-method baseline. Finally, an ensemble of three different classifiers is used to further improve the performance of microblog sentiment classification.

    A Sentence Segmentation Method for Ancient Chinese Texts Based on Recurrent Neural Network
    Boli WANG, Xiaodong SHI, Jinsong SU
    2017, 53(2):  255-261.  DOI: 10.13209/j.0479-8023.2017.032

    This paper proposes an automatic sentence segmentation method for ancient Chinese texts based on a recurrent neural network (RNN). A bi-directional RNN structure with gated recurrent units (GRU) is implemented, and state transition probability and a length penalty are employed in decoding to improve accuracy. Experimental results show that the proposed model achieves a higher F1 score than traditional methods.

    An Individual-Group-Merchant Relation Model for Identifying Online Fake Reviews
    Chuanming YU, Bolin FENG, Yuheng ZUO, Baiyun CHEN, Lu AN
    2017, 53(2):  262-272.  DOI: 10.13209/j.0479-8023.2017.033

    A novel individual-group-merchant relation model (IGMRM) is proposed to automatically identify fake reviews on E-commerce platforms; it focuses on the characteristics of fake reviewers’ behaviors rather than on review contents. Three sets of indicators are proposed, i.e. individual indicators, group indicators and merchant indicators. To validate the model, an empirical study of fake review identification on a Chinese E-commerce platform is carried out. A total of 97804 reviews posted from 9558 different IP addresses and related to 93 online stores are selected as test data. Results show that the F1-measure values of the proposed model for identifying fake reviewers, online merchants and groups with credit manipulation are 82.62%, 59.26% and 95.12%, respectively. With logistic regression and K-nearest-neighbor classifiers based on review content as the baseline methods, the F1-measure values are 52.63% and 76.75%, respectively. Thus, the IGMRM outperforms traditional methods in identifying fake reviewers.
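
    For readers unfamiliar with the F1-measure reported in this abstract, the following minimal sketch shows how it is computed from detection counts. The counts in the example are hypothetical and are not taken from the paper; the function name is also an assumption.

```python
def f1_measure(true_pos, false_pos, false_neg):
    """F1-measure: harmonic mean of precision and recall.

    Returns 0.0 when precision and recall are both undefined or zero.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts: 8 fake reviewers correctly flagged,
# 2 honest reviewers wrongly flagged, 2 fake reviewers missed.
print(f1_measure(8, 2, 2))
```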

    Stock Index Prediction Based on Text Information
    Li DONG, Zhongqing WANG, Deyi XIONG
    2017, 53(2):  273-278.  DOI: 10.13209/j.0479-8023.2017.037

    A sentiment analysis strategy is used to predict the stock market index. A support vector machine is applied to build a prediction model based on stock indicators and textual information (i.e., lexical information, sentiment words, and sentiment categories) extracted from social media. Experimental results show that the proposed method obtains the best results compared with many other predictive models.

    Exploit Comparable Corpus to Chinese Zero Pronoun Resolution
    Ziyi YANG, Zhengxian GONG, Fang KONG, Guodong ZHOU
    2017, 53(2):  279-286.  DOI: 10.13209/j.0479-8023.2017.038

    A bilingual approach based on a comparable corpus is proposed to better detect and resolve Chinese zero pronouns. First, the concept of an English equivalent sentence is defined. The equivalent sentence is then used to redefine the distance between sentences and to extract bilingual word alignment features. In this way, both zero pronoun detection and resolution in the baseline system are improved from a bilingual perspective. Experiments conducted on the OntoNotes 5.0 corpus show that the proposed approach significantly outperforms the state-of-the-art system.

    A Comparative Study on English-Chinese Machine Transliteration
    Enting GAO, Xiangyu DUAN
    2017, 53(2):  287-294.  DOI: 10.13209/j.0479-8023.2017.039

    To study the two main approaches to machine transliteration, the traditional statistical method and the currently prevalent deep neural network method, the authors carry out a comparative study with two typical systems per method. The experiments show that the traditional statistical method and the deep neural network method perform comparably on evaluation metrics but differ on individual transliteration results. A system combination method is proposed to balance the strengths of all systems. Experimental results show that system combination significantly improves transliteration quality over any single system.

    A Tree-to-String EBMT Method by Integrating Joint Model of Chinese Segmentation and Dependency Parsing
    Dandan WANG, Jin’an XU, Yufeng CHEN, Yujie ZHANG, Xiaohui YANG
    2017, 53(2):  295-304.  DOI: 10.13209/j.0479-8023.2017.035

    In view of the complexity and high cost of system construction in traditional example-based machine translation (EBMT) methods, the authors propose a Chinese-English tree-to-string EBMT method. Compared with traditional methods, the proposed approach only needs to parse the source language. Word segmentation, POS tagging and dependency parsing are performed jointly to reduce the effects of error propagation and feature extraction failures at different levels. Moreover, the authors extract and generalize bilingual word and phrase alignments from examples and templates by using the dependency structure of the source language. Experimental results show that the proposed method performs significantly better than the baseline systems.

    Integrating Voice Features into Japanese-English Hierarchical Phrase Based Model
    Nan WANG, Jin’an XU, Fang MING, Yufeng CHEN, Yujie ZHANG
    2017, 53(2):  305-313.  DOI: 10.13209/j.0479-8023.2017.036

    Voice is usually expressed through different syntactic structures in different languages, which leads to relatively low quality in machine translation. To resolve this problem, an approach is proposed that integrates voice features into hierarchical phrase based (HPB) models. In the proposed method, the corpus is first classified into three categories on the Japanese side: passive voice, potential voice and others. Second, passive and potential sentences are classified into several groups according to the characteristics of English in order to build maximum entropy models for rules. Finally, bilingual voice features are integrated into the log-linear model to improve translation results and the accuracy of rule selection when translating passive and potential sentences. In a Japanese-to-English translation task, large-scale experiments show that the proposed method not only alleviates the problem of long-distance reordering but also improves translation quality on both the passive and potential voice test sets.

    The Method of Recommended Trust Evaluation Based on Grey Correlation Analysis
    Bin ZHAO, Jingsha HE, Yixuan ZHANG, Peng ZHAI
    2017, 53(2):  314-320.  DOI: 10.13209/j.0479-8023.2016.112

    To objectively evaluate recommendation trust using the weight of third-party recommendations when developing an access control solution for open networks, a method of recommended trust evaluation based on grey correlation analysis is proposed. The method applies grey correlation analysis theory to the degree of difference between the development trends of entities in order to evaluate recommendation weights in an open network. Examples and simulation experiments show that the computed weights of the recommending entities are consistent with the actual results, which helps to verify that the proposed method can ensure the effectiveness and objectivity of evaluation decisions on recommendation trust.
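
    As a rough illustration of grey correlation (grey relational) analysis, the sketch below computes the textbook grey relational grade of several recommenders against a reference sequence and normalizes the grades into weights. The abstract does not give the paper's exact formulation, so the standard formula, the distinguishing coefficient of 0.5, the function names and the sample ratings are all assumptions.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Textbook grey relational grade of each candidate sequence.

    reference  -- list of values to compare against
    candidates -- list of sequences, each as long as `reference`
    rho        -- distinguishing coefficient (0.5 is a common default)
    """
    deltas = [[abs(r - c) for r, c in zip(reference, cand)]
              for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        # Every candidate matches the reference exactly.
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def to_weights(grades):
    """Normalize grades so that they sum to 1."""
    total = sum(grades)
    return [g / total for g in grades]


# Hypothetical trust ratings: the evaluator's own history (reference)
# and the ratings reported by two recommending entities.
own_history = [0.9, 0.8, 0.7]
recommenders = [[0.9, 0.8, 0.7], [0.5, 0.4, 0.3]]
print(to_weights(grey_relational_grades(own_history, recommenders)))
```

    A recommender whose ratings track the reference more closely receives a larger grade and therefore a larger weight.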

    Winter 2010‒2011: A Case Study of the Impact of La Niña on the Arctic Vortex
    Liu SHI, Zuntao FU
    2017, 53(2):  321-328.  DOI: 10.13209/j.0479-8023.2016.100

    The relationship between a strong Arctic vortex and La Niña events is examined through a case study of winter 2010-2011, using the NCEP/NCAR (daily and monthly mean) dataset. During winter 2010-2011, the Arctic vortex was much stronger and more persistent than normal, while La Niña signals were surprisingly strong and long-lasting. The impact of La Niña on the polar vortex is investigated. Results show that the Pacific/North American (PNA) pattern and the corresponding temperature anomalies were forced by this La Niña event, and the atmospheric circulation was greatly modified as well. Owing to the negative heat flux in the Aleutian region, wave activity from the troposphere to the stratosphere was much weaker than its climatology. In addition, composite analyses of the 13 strongest La Niña cases during 1948-2010 also verify the impact of La Niña on the Arctic vortex.

    Study of Recording System and Objective Function for Microseismic Source Location
    Luolan LI, Chuan HE, Yuyang TAN
    2017, 53(2):  329-343.  DOI: 10.13209/j.0479-8023.2016.091

    Through synthetic data tests, the influence of surface and downhole recording systems, and of their combination, on source location results is discussed. The results indicate that the joint use of surface and downhole recording systems can significantly improve location accuracy. With the downhole recording system, the location results obtained with different objective functions in the source location algorithm are compared, and a new objective function is proposed. The effectiveness of the new objective function is tested on synthetic and real data sets. The results demonstrate that this objective function shows better convergence in both the horizontal and vertical directions and produces more reliable location results.

    Toponym Resolution Based on Geo-relevance and D-S Theory
    Xingguang WANG, Ruijie ZHANG, Yi ZHANG
    2017, 53(2):  344-352.  DOI: 10.13209/j.0479-8023.2016.090

    Aiming at the situation that previous toponym resolution research largely lacks a theoretical basis and a general formal approach, a concept of geo-relevance based on Tobler’s First Law is proposed to formalize vicinity among geographic entities. Then a toponym resolution computing model based on Dempster-Shafer (D-S) theory is proposed to represent and combine co-occurring toponym evidence in context. The cognitive process of a human reading and understanding spatiotemporal semantics in text is simulated with D-S theory, and a general and scalable formal framework for toponym resolution is provided. Finally, an experimental evaluation yields a good F1 value (89.60%).
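
    The way D-S theory combines pieces of evidence can be sketched with Dempster's rule of combination. How the paper assigns masses to toponym candidates is not described in the abstract, so the candidate places, the mass values and the function name below are purely illustrative assumptions.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            common = set_a & set_b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                # Mass assigned to incompatible hypotheses.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Hypothetical example: which "Paris" does a text mention?
france = frozenset({"Paris, France"})
texas = frozenset({"Paris, Texas"})
either = france | texas

evidence_1 = {france: 0.6, either: 0.4}              # e.g. "Lyon" co-occurs
evidence_2 = {france: 0.5, texas: 0.2, either: 0.3}  # e.g. a weaker cue
for hypothesis, mass in dempster_combine(evidence_1, evidence_2).items():
    print(sorted(hypothesis), round(mass, 4))
```

    Mass left on the full set expresses remaining ignorance, which is what distinguishes this scheme from a plain probability update.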

    A Precise Survey Method for Objects at a Distance Based on Stereo Panorama
    Xiaodong LI, Min SUN, Hui ZHENG, Cheng JIANG, Xiang REN, Lei LIU
    2017, 53(2):  353-359.  DOI: 10.13209/j.0479-8023.2016.101

    A survey method for objects at a distance is proposed to address the lack of geospatial information in stereo panoramas. Current methods mainly target close-range objects; when medium-range or even long-range objects are measured with a non-metric camera, the accuracy of the result is low if no control points are provided. In the proposed method, GPS/INS sensors are used to acquire the camera’s pose data, and the initial values in the relative orientation procedure are set in a new way. As a consequence, objects can be measured much more precisely on the stereo panorama, with a precision of up to 1% of the distance between the camera and the objects in the experiments performed.

    Study on Spatial-Temporal Collocation of Land Reclamation Based on Dual Self-organizing Model
    Yanmin REN, Yahui XU, Yu LIU, Xiumei TANG, Xuedong WANG
    2017, 53(2):  360-368.  DOI: 10.13209/j.0479-8023.2017.004

    Taking Tunchang County in Hainan Province as a case study, a dual self-organizing model accounting for both geographical space and attribute space was proposed. The geographic space information included the longitude and latitude of the administrative villages. Indices such as potential, urgency and feasibility were combined to construct the attribute space. The results demonstrated that the potential, urgency and feasibility of land reclamation differed considerably among the villages. The model scores for the villages were significantly higher in the southern region than in the northern region, and higher in the eastern region than in the western region. The most desired land reclamation projects would be carried out in Poxin Town, Nankun Town, Xichang Town and Tuncheng Town. The 161 villages were divided into 6 project regions by the dual self-organizing model. Based on the comprehensive score, the 6 project regions were classified into three types: priority remediation area (near-term), key remediation area (medium-term) and moderate remediation area (long-term). The area percentages of the three types were 25.14%, 41.83% and 33.03%, respectively. Development orientations and suggestions for the land reclamation projects were given according to the characteristics of the different influencing factors. The results provide a scientific foundation for planning and implementing land reclamation projects in Tunchang County, and are helpful for improving the level of land consolidation planning and for promoting land reclamation progress and sustainable development.

    Modeling Nutrients Exports by Rivers from Watersheds to River Mouth: Case Study of Beijiang River Basin
    Lili LI, Shengji LUAN
    2017, 53(2):  369-377.  DOI: 10.13209/j.0479-8023.2017.018

    Global NEWS (Global Nutrient Export from WaterSheds) is an international modeling effort with few applications in China. This study applied the model, with modifications, to the Beijiang River Basin, one of the three main sub-basins of the Pearl River, to estimate basin-level export of multiple nutrient elements and elemental forms from land sources within the basin to the river mouth. A reliable environmental database of the Beijiang River Basin was established through literature review and statistics collection, with the help of ArcGIS technology. Model calibration and verification showed that the Nash-Sutcliffe model efficiency was 0.61 for DIN (dissolved inorganic nitrogen) loads (t/a) exported at the basin mouth, indicating that the model performs reasonably well for DIN. Modelling results show the following. 1) In 2010, the dissolved nitrogen export (load) from the Beijiang River Basin was 37.5 thousand t/a, 9.27% higher than in 2000, with DIN accounting for 83.51% and DON (dissolved organic nitrogen) for 16.49%. 2) In 2010, the dissolved phosphorus export (load) from the Beijiang River Basin was 46.3 thousand t/a, 30.05% higher than in 2000, comprising 86.21% DIP (dissolved inorganic phosphorus) and 13.79% DOP (dissolved organic phosphorus). 3) Spatially, nutrient exports (loads) from the Sui River Basin, one of the downstream sub-basins, and the Lian River Basin, one of the midstream sub-basins, were higher than those from other sub-basins, indicating the necessity of controlling nutrient pollution in these two sub-basins. 4) Atmospheric nitrogen deposition was the major source of DIN export load, followed by synthetic fertilizer and biological nitrogen fixation, while animal wastewater discharge was the major source of DIP export load, followed by synthetic fertilizer. The results also show that the NEWS model is applicable to China’s small-to-medium river basins.
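
    The Nash-Sutcliffe model efficiency cited in this abstract compares the model's squared error with the variance of the observations. The sketch below uses the standard definition; the function name and the sample loads are hypothetical and are not data from the paper.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency of simulated values against observations.

    1.0 means a perfect fit; 0.0 means the model is no better than
    always predicting the mean of the observations.
    """
    if len(observed) != len(simulated):
        raise ValueError("series must have the same length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    if variance_sum == 0.0:
        raise ValueError("observations have zero variance")
    return 1.0 - squared_error / variance_sum


# Hypothetical annual loads (arbitrary units), for illustration only.
observed_loads = [10.0, 14.0, 12.0, 18.0]
simulated_loads = [11.0, 13.0, 13.0, 16.0]
print(nash_sutcliffe(observed_loads, simulated_loads))
```

    Under this definition, an efficiency of 0.61 means the model's squared error is 39% of the spread of the observations around their mean.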

    Estimation of Long-Term Trends and Loads with Low-Frequency Water Quality Sampling in the Baoxiang River, One Tributary to Dianchi Lake
    Na LI, Huaicheng GUO
    2017, 53(2):  378-386.  DOI: 10.13209/j.0479-8023.2017.019

    Studies of water quality trends and pollutant loads in the Baoxiang River, a tributary to Dianchi Lake, have been limited by the lack of consistent data. This study evaluated long-term trends and loads using ESTREND and LOADEST, with water quality data collected by low-frequency sampling and continuous daily flow data calculated by the Muskingum method. Significantly increasing trends in nutrient (NH3-N, TN, and TP) concentrations were detected at the 0.05 probability level. TSS concentration showed a significant decreasing trend of 12.34 percent per year. The similar results for unadjusted and flow-adjusted concentrations indicated that these trends were caused by variation in pollutant emissions rather than in river discharge. Regression models within LOADEST performed very well. Most pollutant loads were greater in the wet season than in the dry and normal seasons, owing to increased transport of nonpoint-source pollution. The results indicate that this is an effective way to evaluate trends and loads from low-frequency sampling, and that the methodology can be used in other watersheds.
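
    The Muskingum method mentioned in this abstract routes an upstream flow series to a downstream point. The sketch below implements the standard routing equation; the storage constant K, weighting factor X, time step and inflow values are hypothetical, since the abstract does not report the parameters used for the Baoxiang River.

```python
def muskingum_route(inflow, k, x, dt, initial_outflow=None):
    """Route an inflow series downstream with the Muskingum equation.

    inflow -- upstream flows at equal time steps
    k      -- storage time constant (same time unit as dt)
    x      -- weighting factor, typically between 0 and 0.5
    dt     -- time step
    The first outflow defaults to the first inflow (steady start).
    """
    denom = 2.0 * k * (1.0 - x) + dt
    c0 = (dt - 2.0 * k * x) / denom
    c1 = (dt + 2.0 * k * x) / denom
    c2 = (2.0 * k * (1.0 - x) - dt) / denom  # c0 + c1 + c2 == 1
    outflow = [inflow[0] if initial_outflow is None else initial_outflow]
    for previous_in, current_in in zip(inflow, inflow[1:]):
        outflow.append(c0 * current_in + c1 * previous_in + c2 * outflow[-1])
    return outflow


# Hypothetical flood wave (m^3/s) with K = 12 h, X = 0.2, dt = 6 h.
upstream = [10.0, 20.0, 50.0, 30.0, 15.0, 10.0]
downstream = muskingum_route(upstream, k=12.0, x=0.2, dt=6.0)
print([round(q, 2) for q in downstream])
```

    With these illustrative parameters the routed peak is lower and later than the upstream peak, which is the attenuation the method is meant to capture.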

    Characterization of Soil Fungal Community in Response to Heavy Metal Pollution in Lead-Zinc Mining Area
    Jinshui YANG, Yang YANG, Liangming SUN, Weijie LIU, Yuan ZENG, Chunping DENG, Guanlan XING, Hongli YUAN
    2017, 53(2):  387-396.  DOI: 10.13209/j.0479-8023.2016.122

    Heavy metal contamination is one of the global environmental problems of greatest concern, and soil heavy metal contamination is especially severe in lead-zinc mining areas in China. To study how soil fungal community composition changes in response to different degrees of heavy metal pollution, soil samples from the Mengnuo lead-zinc mine field in Yunnan were studied. Based on cluster analysis of heavy metal contents and the physical and chemical properties of the samples, 5 samples from heavily polluted soil (HP) and 4 from low-pollution soil (LP) were analyzed. Genomic DNA was extracted from the soil samples and the Internal Transcribed Spacer (ITS) genes were sequenced by high-throughput sequencing on the Illumina MiSeq. The fungal communities at different taxonomic levels (Phylum, Class, Order, Family, Genus and Species) were compared. In HP samples, the abundance of unclassified fungi was the highest, followed by Aspergillus, Un--s-Clavulinaceae sp. and Un--s-fungal sp. ARIZ L453, respectively. In LP samples, the abundance of unclassified fungi was also high, but lower than in HP. The relative abundance of fungi from high to low was Geastrum, Aspergillus and Mortierella. The Representational Difference Analysis (RDA) showed that different heavy metals influence fungal community diversity and that the concentration of Pb was significantly correlated with the fungal community.