Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2020, Vol. 56 ›› Issue (1): 155-163. DOI: 10.13209/j.0479-8023.2019.035

Improving One-Class Classification of Remote Sensing Data by Using Active Learning: A Case Study of Positive and Unlabeled Learning

SUN Yi, LI Peijun   

  1. Institute of Remote Sensing and Geographic Information System, Peking University, Beijing 100871
  • Received: 2019-01-24 Revised: 2019-05-06 Online: 2020-01-20 Published: 2020-01-20
  • Contact: LI Peijun, E-mail: pjli(at)pku.edu.cn

  • Funding:
    Supported by the National Natural Science Foundation of China (41371329)

Abstract:

The quality and quantity of training samples directly affect the accuracy of one-class classification (OCC) methods, which are trained only on samples of the target class (positive samples). Taking positive and unlabeled learning (PUL) as an example, this paper investigates the use of active learning to select training samples for improving OCC performance. PUL is first trained with randomly selected training samples until a stable accuracy is reached. The most informative positive and negative samples, collected with an active learning strategy, are then added to the PUL classification. The experimental results show that, even after enough randomly selected positive samples have been used to reach a stable accuracy, adding positive samples selected by active learning still yields higher accuracy than using the randomly selected positive samples alone. Adding both positive and negative samples outperforms adding positive samples only. Furthermore, when redundant samples are removed from the positive samples selected by active learning, PUL achieves accuracy comparable to that obtained with the larger set of directly selected samples, while requiring fewer samples. This study demonstrates that selecting and adding samples by active learning is an effective way to improve the accuracy of OCC.

Key words: one-class classification, active learning, positive and unlabeled learning (PUL)
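
The two sample-handling steps named in the abstract, selecting the most informative samples and removing redundant ones, can be illustrated with a minimal sketch. The abstract does not specify the query strategy or the redundancy criterion, so both are assumptions here: informativeness is approximated by uncertainty sampling (predicted positive-class probability closest to 0.5), and redundancy by a Euclidean distance threshold. The function names, the threshold `min_dist`, and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def select_informative(probs, k):
    """Return indices of the k samples whose predicted positive-class
    probability is closest to 0.5 (uncertainty sampling, an assumed strategy)."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]


def remove_redundant(samples, min_dist):
    """Greedily keep a sample only if it lies at least min_dist away
    from every sample already kept (an assumed redundancy criterion)."""
    kept = []
    for s in samples:
        if all(math.dist(s, t) >= min_dist for t in kept):
            kept.append(s)
    return kept


# Hypothetical classifier outputs for five unlabeled pixels.
probs = [0.95, 0.52, 0.10, 0.45, 0.70]
queried = select_informative(probs, 2)

# Hypothetical two-band feature vectors of candidate positive samples.
candidates = [(0.10, 0.20), (0.11, 0.21), (0.80, 0.90)]
filtered = remove_redundant(candidates, min_dist=0.05)

print(queried)
print(filtered)
```

In a full workflow the queried pixels would be labeled by an analyst and added to the training set before the classifier is retrained; that loop, and the PUL classifier itself, are outside the scope of this sketch.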
