北京大学学报(自然科学版) ›› 2016, Vol. 52 ›› Issue (1): 49-57. DOI: 10.13209/j.0479-8023.2016.025

一种汉字笔画自动提取基准测试库

陈旭东, 连宙辉, 唐英敏, 肖建国   

  1. 北京大学计算机科学技术研究所, 北京 100871
  • 收稿日期:2015-06-19 出版日期:2016-01-20 发布日期:2016-01-20
  • 通讯作者: 连宙辉, E-mail: lianzhouhui(at)pku.edu.cn
  • 基金资助:
    国家自然科学基金(61202230, 61472015)、863 计划(2014AA015102)和北京市自然科学基金(4152022)资助

A Benchmark for Stroke Extraction of Chinese Characters

CHEN Xudong, LIAN Zhouhui, TANG Yingmin, XIAO Jianguo

  1. Institute of Computer Science and Technology, Peking University, Beijing 100871
  • Received: 2015-06-19 Online: 2016-01-20 Published: 2016-01-20
  • Contact: LIAN Zhouhui, E-mail: lianzhouhui(at)pku.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (61202230, 61472015), the 863 Program (2014AA015102), and the Beijing Natural Science Foundation (4152022)

摘要:

构建一个提供评测工具的笔画基准测试库, 其中包含一个人工搭建的笔画数据库, 该数据库拥有4种字体的汉字图像以及对应的人工提取的笔画信息。通过比较算法自动提取的笔画结果和数据库中的标准笔画之间的差异, 测试库可以评测笔画自动提取算法的性能。还提出一种新的基于Delaunay三角剖分的方法, 可以有效地从汉字图像中提取出笔画信息。在测试库中对现有的3 种笔画提取方法进行比较, 实验数据表明, 所提出的笔画基准测试库能够对笔画提取算法给出有效的评测, 并且新的算法在汉字笔画提取的性能中效率较高。

关键词: 基准测试库, 笔画提取, 汉字

Abstract:
This paper presents a benchmark for stroke extraction that consists of a manually constructed database and
evaluation tools. The database contains images of Chinese characters in four commonly used font styles, together
with the corresponding stroke images manually segmented from those character images. The performance of a given
stroke extraction method is evaluated by measuring the dissimilarity between its automatic segmentation results
and the ground truth using two specially designed metrics. The authors also propose a new method based on
Delaunay triangulation for extracting strokes from images of Chinese characters. Experimental results comparing
three algorithms show that the benchmark evaluates stroke extraction approaches effectively and that the
proposed method performs well at stroke extraction for Chinese characters.

Key words: benchmark, stroke extraction, Chinese characters
