A Multitransform Architecture for H.264/AVC High-Profile Coders

Abstract

This paper presents a high-throughput, cost-effective implementation of six different integer transforms used in H.264/AVC high-profile coders, i.e., the 4 × 4 forward, 4 × 4 inverse, forward Hadamard, inverse Hadamard, 8 × 8 forward, and 8 × 8 inverse transforms, all integrated into shared hardware. The 4 × 4 transform matrices are regularized by permutation, partitioned into 2 × 2 blocks, and factored for maximal hardware sharing. By exploiting the two types of 4 × 4 transform matrices contained in an 8 × 8 transform matrix, the two different 8 × 8 transforms are each described as three steps and unified with minor modification. To improve throughput, the two independent 4 × 4 transform blocks within the 8 × 8 transform block operate in parallel in the 4 × 4 transform mode, while a two-stage pipelined architecture is used in the 8 × 8 transform mode. Using 0.18-μm CMOS technology, the maximum operating frequency of the proposed multitransform architecture is 200 MHz, which achieves a throughput of 4.1 Gpixels/s at a hardware cost of 63618 gates. Compared with existing designs, the proposed design delivers at least 54% higher throughput with a 38% higher throughput/area ratio in Adaptive Block-size Transform (ABT) mode.
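The hardware sharing between the 4 × 4 forward transform and the Hadamard transform can be illustrated with the well-known 4-point butterfly, in which the two transforms differ only in one scaling constant. The following is a minimal software sketch, not the permutation and 2 × 2 block factorization proposed in the paper: the matrices are the standard H.264/AVC forward core and Hadamard matrices, while the function names and the `k` mode parameter are assumptions introduced here for illustration.

```python
# Standard H.264/AVC 4x4 forward core transform matrix.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

# 4x4 Hadamard matrix (used for DC coefficients).
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def butterfly_1d(x, k):
    """Shared 4-point butterfly (hypothetical helper).

    k = 2 reproduces the rows of CF, k = 1 reproduces the rows of H4,
    so one adder network serves both transforms.
    """
    s0, s1 = x[0] + x[3], x[1] + x[2]   # sums
    d0, d1 = x[0] - x[3], x[1] - x[2]   # differences
    return [s0 + s1, k * d0 + d1, s0 - s1, d0 - k * d1]


def transform_2d(block, k):
    """Compute C * X * C^T with two passes of the 1-D butterfly."""
    # Column pass: cols[c][i] holds element (i, c) of C * X.
    cols = [butterfly_1d([block[r][c] for r in range(4)], k) for c in range(4)]
    tmp = [[cols[c][i] for c in range(4)] for i in range(4)]
    # Row pass: applying the butterfly to each row multiplies by C^T on the right.
    return [butterfly_1d(row, k) for row in tmp]


def reference(c, x):
    """Direct matrix product C * X * C^T, used only to check the butterfly."""
    cx = [[sum(c[i][m] * x[m][j] for m in range(4)) for j in range(4)]
          for i in range(4)]
    return [[sum(cx[i][m] * c[j][m] for m in range(4)) for j in range(4)]
            for i in range(4)]
```

Calling `transform_2d(block, 2)` gives the forward core transform and `transform_2d(block, 1)` the Hadamard transform of the same 4 × 4 block; both can be checked against `reference(CF, block)` and `reference(H4, block)`. In hardware, the multiplication by `k` reduces to an optional one-bit left shift, which is why a single datapath can serve both modes.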
