Journal of Peking University (Natural Science Edition)

Chinese Textual Image Compression Based on Multi-feature Extraction

HU Kui, TANG Zhi, GAO Liangcai   

Institute of Computer Science and Technology, Peking University, Beijing 100871
Received: 2009-10-23; Online: 2010-11-20; Published: 2010-11-20

Abstract: Based on the characteristics of Chinese textual images, an improved compression algorithm, MC-JBIG2, is developed. First, multilevel feature data are extracted from Chinese characters; these data are then fed into a cascaded clustering algorithm that replaces the pattern matching procedure of JBIG2. Experimental results show that MC-JBIG2 remedies the weakness of conventional JBIG2 on Chinese textual images: it raises the compression ratio while keeping the content lossless, i.e., without introducing substitution errors. Although designed for Chinese textual images, MC-JBIG2 also improves the compression ratio of English textual images to some extent.
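To make the idea of cascaded clustering over multilevel features concrete, here is a minimal Python sketch that groups glyph bitmaps through successively finer features. It is not the MC-JBIG2 algorithm: the abstract does not say which features, thresholds, or matching criteria the authors use, so the three levels below (quantized bounding-box size, quantized black-pixel density, and a 2×2 zoning vector), the tolerance `tol`, and all function names are hypothetical choices for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # one glyph: rows of 0 (white) / 1 (black) pixels


def level1_size(bmp: Bitmap) -> Tuple[int, int]:
    # Hypothetical coarse feature: width and height, quantized to 2-pixel bins.
    return (len(bmp[0]) // 2, len(bmp) // 2)


def level2_density(bmp: Bitmap) -> int:
    # Hypothetical middle feature: black-pixel ratio quantized to bins 0..9.
    total = len(bmp) * len(bmp[0])
    black = sum(map(sum, bmp))
    return min(9, black * 10 // total)


def level3_zones(bmp: Bitmap, n: int = 2) -> List[int]:
    # Hypothetical fine feature: black-pixel counts in an n x n grid of zones.
    h, w = len(bmp), len(bmp[0])
    counts = [0] * (n * n)
    for y, row in enumerate(bmp):
        for x, px in enumerate(row):
            counts[(y * n // h) * n + (x * n // w)] += px
    return counts


def cascaded_cluster(glyphs: List[Bitmap], tol: int = 2) -> List[int]:
    """Assign a cluster id to each glyph, refining level by level."""
    # Levels 1 and 2: bucket glyphs by the two cheap, coarse keys.
    buckets: Dict[Tuple, List[int]] = defaultdict(list)
    for i, g in enumerate(glyphs):
        buckets[(level1_size(g), level2_density(g))].append(i)

    labels = [-1] * len(glyphs)
    next_id = 0
    # Level 3: inside each bucket only, leader clustering on zone vectors
    # with an L1-distance tolerance (a hypothetical matching rule).
    for members in buckets.values():
        leaders: List[Tuple[List[int], int]] = []  # (zone vector, cluster id)
        for i in members:
            z = level3_zones(glyphs[i])
            for lz, cid in leaders:
                if sum(abs(a - b) for a, b in zip(z, lz)) <= tol:
                    labels[i] = cid
                    break
            else:  # no existing leader is close enough: start a new cluster
                leaders.append((z, next_id))
                labels[i] = next_id
                next_id += 1
    return labels


if __name__ == "__main__":
    a = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    b = [row[:] for row in a]                       # identical copy of a
    c = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    d = [[1] * 6 for _ in range(6)]                 # larger, fully black glyph
    print(cascaded_cluster([a, b, c, d]))           # [0, 0, 1, 2]
```

The design point of the cascade is that the cheap coarse keys split the glyphs into small buckets, so the comparatively expensive fine comparison only runs inside a bucket; in a JBIG2-style coder, each resulting cluster could then share one dictionary symbol. This simple version has two caveats: fixed quantization bins can separate near-identical glyphs that straddle a bin boundary, and a loose `tol` could merge different characters, which is exactly the kind of substitution error a content-lossless scheme must avoid.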

Key words: textual image compression, pattern matching, clustering, JBIG2