北京大学学报(自然科学版)

一种新的H.264/AVC标量量化并行VLSI结构

彭春干1,于敦山1,曹喜信2,盛世敏1   

  1. 北京大学信息科学技术学院微电子学系SoC试验室,北京100871;2. 北京大学软件微电子学院,北京102600
  • 收稿日期:2007-06-11 出版日期:2008-07-20 发布日期:2008-07-20

A Novel Parallel VLSI Architecture for H.264/AVC Scalar Quantization

PENG Chungan1, YU Dunshan1, CAO Xixin2, SHENG Shimin1

  1. SoC Lab, Department of Microelectronics, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871; 2. School of Software and Microelectronics, Peking University, Beijing 102600
  • Received:2007-06-11 Online:2008-07-20 Published:2008-07-20

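摘要 (Abstract, translated from the Chinese): In VLSI implementations of 52-level scalar quantization, a key technique of the H.264 video coding standard, the speed and area of conventional architectures cannot effectively meet the real-time requirements of high-speed, highly parallel H.264 encoding. By using unsigned compressed shift-adder trees based on partial CSD coding, reference-level wiring, and regrouped, segmented encoding of the quantization coefficients and step sizes, the architecture replaces the matrix multiplication, table look-up, and division operations of H.264 scalar quantization, all of which are ill-suited to hardware acceleration. The result is a 4×4 block-parallel VLSI architecture that is well suited to pipelined acceleration and whose speed can be tuned through the number of cascaded adder stages. With two stages, the block processing rate reaches 121.6 MHz, which satisfies the real-time requirement of 4096×2304@120Hz video. The architecture also improves considerably on conventional architectures in area and power: with the SMIC 0.13 μm standard-cell library and a synthesis clock frequency of 100 MHz, the equivalent gate count and power are reduced by 38% and 30%, respectively.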
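The idea of replacing constant multiplications with shift-add networks can be illustrated in software. The following is a minimal sketch only, not the authors' circuit: it uses a full canonical signed digit (CSD) recoding rather than the paper's partial CSD scheme, and it assumes the widely used integer formulation of H.264 forward quantization, Z = (|W| * MF + f) >> (15 + QP // 6). The multiplier table `MF`, the rounding offset `f`, and all function names are assumptions for illustration and do not come from the abstract.

```python
# Multiplier factors for QP % 6 = 0..5 (assumed, from the commonly cited
# H.264 integer quantization formulation, not from this paper).
# Columns: positions with both indices even, both odd, all others.
MF = [
    (13107, 5243, 8066),
    (11916, 4660, 7490),
    (10082, 4194, 6554),
    (9362, 3647, 5825),
    (8192, 3355, 5243),
    (7282, 2893, 4559),
]


def csd_terms(n):
    """Return [(shift, sign)] with n == sum(sign << shift), sign in {+1, -1},
    such that no two nonzero digits are adjacent (canonical signed digit)."""
    terms = []
    shift = 0
    while n != 0:
        if n & 1:
            digit = 2 - (n & 3)  # +1 if n % 4 == 1, -1 if n % 4 == 3
            terms.append((shift, digit))
            n -= digit
        n >>= 1
        shift += 1
    return terms


def shift_add_multiply(x, constant):
    """Multiply a non-negative integer x by a constant using shifts and adds only."""
    return sum(sign * (x << shift) for shift, sign in csd_terms(constant))


def mf_for(qp, i, j):
    """Select the multiplier factor for coefficient position (i, j)."""
    row = MF[qp % 6]
    if i % 2 == 0 and j % 2 == 0:
        return row[0]
    if i % 2 == 1 and j % 2 == 1:
        return row[1]
    return row[2]


def quantize_block(w, qp, intra=True):
    """Quantize a 4x4 block of transform coefficients without '*' or '/' on data."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // (3 if intra else 6)  # assumed rounding offset
    out = []
    for i in range(4):
        row = []
        for j in range(4):
            mag = (shift_add_multiply(abs(w[i][j]), mf_for(qp, i, j)) + f) >> qbits
            row.append(mag if w[i][j] >= 0 else -mag)
        out.append(row)
    return out
```

In hardware, each `(shift, sign)` term corresponds to a wired shift feeding an adder or subtractor, so fewer nonzero CSD digits means a smaller adder tree; the 16 positions of the block can then be evaluated in parallel.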

关键词: H.264, VLSI结构, 视频编码

Abstract: 52-level scalar quantization plays an important role in H.264/AVC. A novel parallel VLSI architecture is proposed for its hardware implementation. The 4×4 matrix multiplications are replaced by 16 unsigned compressed shift-adder trees based on a partial CSD coding scheme, look-up operations are replaced by switched reference wiring, and division is avoided entirely, so the quantizer uses no ROM or RAM. The architecture performs the quantization calculations for all H.264 hybrid transforms with 4×4 block parallelism. Its block throughput reaches 121.6 MHz, which meets the real-time requirement of 4096×2304@120Hz video compression (119.43936 M blocks/s). Compared with the conventional architecture, it saves 38% of the cost (equivalent gate count) and 30% of the power. Because it balances speed and cost and is well suited to pipelined acceleration, it is a useful IP block for VLSI implementations of high-resolution H.264 encoders.
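The quoted requirement of 119.43936 M/s can be reproduced with a short calculation. The sketch below assumes 16×16 macroblocks and 27 quantizer passes per macroblock (16 luma 4×4 blocks, 1 luma DC block, 8 chroma 4×4 blocks, and 2 chroma DC blocks in 4:2:0 format). The abstract does not state this breakdown; it is an inference chosen because it matches the published figure exactly, and the function name is hypothetical.

```python
def required_block_rate(width, height, fps, blocks_per_macroblock=27):
    """Blocks per second the quantizer must process for real-time encoding.

    blocks_per_macroblock=27 is an assumption (16 luma + 1 luma DC +
    8 chroma + 2 chroma DC), not a value stated in the abstract.
    """
    macroblocks_per_frame = (width // 16) * (height // 16)
    return macroblocks_per_frame * fps * blocks_per_macroblock


required = required_block_rate(4096, 2304, 120)
achieved = 121_600_000  # 121.6 MHz block throughput reported in the abstract

print(required)             # 119439360
print(achieved > required)  # True
```

Under these assumptions the reported 121.6 MHz block rate leaves a margin of roughly 1.8% over the real-time requirement.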

Key words: H.264, VLSI, video coding
