Title: Integer cosine transform matrix for picture coding
Abstract: An integer transform matrix is used for implementing a Discrete Cosine Transform (DCT). Optimized values for the integer transform matrix are derived that satisfy certain normalization constraints and that also minimize the frequency distortion in the transform matrix.
Patent Number: 6,990,506 Issued on 01/24/2006 to Sun
| Inventors:
|
Sun; Shijun (Vancouver, WA)
|
| Assignee:
|
Sharp Laboratories of America, Inc. (Camas, WA)
|
| Appl. No.:
|
017722 |
| Filed:
|
December 13, 2001 |
| Current U.S. Class: |
708/402; 382/250 |
| Current Intern'l Class: |
G06F 17/14 (20060101) |
| Field of Search: |
708/402
382/250
|
References Cited [Referenced By]
U.S. Patent Documents
| 5703799 | Dec., 1997 | Ohta.
| |
| 5999656 | Dec., 1999 | Zandi et al.
| |
| 5999957 | Dec., 1999 | Ohta.
| |
| 2004/0046754 | Mar., 2004 | Mayer et al.
| |
| 2004/0126021 | Jul., 2004 | Sull et al.
| |
Other References
Ying-Jui Chen, Soontorn Oraintara, and Truong Nguyen, "Video Compression Using
Integer DCT", Proceedings of IEEE International Conference on Image Processing
(ICIP), 2000, pp. 1-4.
Mathias Wien, Claudia Mayer, and Jens-Rainer Ohm, "Integer Transforms for H.26L
Using Adaptive Block Transforms", ITU-T Q15/SG16, Document Q15/K24, Portland, Oregon,
Aug. 2000, pp. 1-5.
Mathias Wien and Shijun Sun, "ICT Comparison for Adaptive Block Transforms",
ITU-T VCEG-L12, Germany, Jan. 2001, pp. 1-6.
P.P. Vaidyanathan, "Multirate Systems and Filter Banks", Prentice Hall, 1993,
pp. 830-833.
|
Primary Examiner: Malzahn; D. H.
Attorney, Agent or Firm: Marger Johnson & McCollom, P.C.
Parent Case Text
This application claims priority from U.S. Provisional Application Ser. No.
60/255,352, filed Dec. 13, 2000.
Claims
What is claimed is:
1. A system for processing data, comprising:
a processor using a transform matrix:
##EQU9##
to transform the data, where:
n
0=17, n
1=22, n
2=24, n
3=28, n
4=23,
n
5=12, n
6=20, n
7=20,
n
8=17, n
9=12, n
10=12, n
11=16, n
12=7,
n
13=8, n
14=6, and n
15=6.
2. A system according to claim 1 wherein the processor conducts a discrete cosine
transform on the data according to the following:
Cn×m=Tm×Bn×m×TnT,
where B
n×m is an image block of data with n columns and m rows,
T
n and T
m are the horizontal and vertical transform matrices
of size n×n and m×m, respectively, and C
n×m denotes the
cosine transformed n×m image block.
3. A system according to claim 1 wherein the processor conducts an inverse discrete
cosine transform on the data according to the following:
Bn×m=TmT×Cn×m×Tn,
where B
n×m denotes the inverse discrete cosine transformed image
block with n columns and m rows, T
n and T
m represent the
horizontal and vertical integer transform matrices of size n×n and m×m,
respectively, and C
n×m denotes a cosine transformed n×m image block.
4. A system according to claim 1 wherein the system is a device that receives,
stores or transmits image data.
5. A system according to claim 1 including a memory that stores the transform matrix.
6. A system according to claim 5 wherein the memory stores different sized transform
matrices, and the processor applies the different sized transform matrices according
to a block size for a portion of the data being transformed.
7. A system according to claim 1 wherein the transform matrix is used for digital
video coding.
8. An article of manufacture comprising computer-readable media containing instructions
that, when executed or interpreted by a digital processor or cooperating processors,
cause that processor or processors to perform a method of processing data, the
method comprising:
using a transform matrix to process the data where the transform matrix is a
2
m×2
m transform matrix that uses the following normalization
constraints:
##EQU10##
where, norm is an integer representing a normalization factor of the transform
matrix; and
selecting the norm that minimizes a DCT distortion function:
##EQU11##
where d
i=t
i·DCT , t
i is a base vector of
the transform matrix, and DCT is a real Discrete Cosine Transform.
9. The article of manufacture of claim 8 wherein m=16 and the values of the transform
matrix comprise the following:
##EQU12##
where,
n
0=17, n
1=22, n
2=24, n
3=28, n
4=23,
n
5=12, n
6=20, n
7=20,
n
8=17, n
9=12, n
10=12, n
11=16, n
12=7,
n
13=8, n
14=6, and n
15=6.
10. The article of manufacture of claim 8 including:
receiving variable sized macroblocks of image data;
selecting transform matrices corresponding to the variable sized macroblocks; and
applying the selected transform matrices to the macroblocks.
11. The article of manufacture of claim 8 including using different 4×4,
8×8, and 16×16 transform matrices for Discrete Cosine Transforming different
blocks of an image in the data.
12. The article of manufacture of claim 8 including basing the constraints used
for deriving the transform matrix on a Hadamard transform.
Description
BACKGROUND
Integer-based transform matrices are used for transform coding of digital
signals, such as for coding image/video signals. Discrete Cosine Transforms (DCTs)
are widely used in block-based transform coding of image/video signals, and have
been adopted in many Joint Photographic Experts Group (JPEG), Motion Picture Experts
Group (MPEG), and network protocol standards, such as MPEG-1, MPEG-2, H.261, and
H.263. Ideally, a DCT is a normalized orthogonal transform that uses real-value
numbers. This ideal DCT is referred to as a real DCT. Conventional DCT implementations
use floating-point arithmetic that require high computational resources. To reduce
the computational burden, DCT algorithms have been developed that use fix-point
or large integer arithmetic to approximate the floating-point DCT. However, none
of these approaches has been able to guarantee coding reversibility. Coding reversibility
refers to the ability of a transform algorithm to transform a signal and then inverse
transform the transformed signal as closely as possible back to the original signal,
without inducing error into the original signal.
Integer transform techniques have been developed to provide coding reversibility.
Some of these techniques are described in the following documents. U.S. Pat. No.
5,999,957, Ohta, (2000) "Lossless Transform Coding System For Digital Signals,"
assigned to NEC Corporation which is a division of U.S. Pat. No. 5,703,799. U.S.
Pat. No. 5,999,656, Zandi and Schwartz, (1999) "Overlapped Reversible Transforms
for Unified Lossless/Lossy Compression," assigned to Ricoh Corporation. U.S. Pat.
No. 5,703,799, Ohta, (1997) "Lossless Transform Coding System For Digital Signals,"
assigned to NEC Corporation. Ying-Jui Chen, Soontorn Oraintara, and Truong Nguyen,
"Video Compression Using Integer DCT," Proceedings of IEEE International Conference
on Image Processing (ICIP), 2000.
Among these techniques, transform matrices are derived based on a Hadamard
transform to approximate the real DCT. Usually, small integer transform coefficients
are chosen to improve coding efficiency. However, the Hadamard transform is a redundant
transform in the frequency (or real DCT) domain. These transform matrices were
developed without considering the distortion in the frequency domain with respect
to the real DCT. The normalization of the transform matrix is also not considered.
SUMMARY OF THE INVENTION
An integer transform matrix is used for implementing a Discrete Cosine Transform
(DCT). Optimized values for the integer transform matrix are derived that satisfy
certain normalization constraints and that also minimize the frequency distortion
in the transform matrix.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a representation of a 4×4 integer matrix.
FIG. 2 shows a representation of a 8×8 integer matrix.
FIG. 3 shows a representation of a 16×16 integer matrix.
FIG. 4 shows a block diagram that shows an image processing system that uses
the integer matrices shown in FIGS. 1-3.
DETAILED DESCRIPTION
Different sized blocks in an image can be coded with a discrete cosine
transform. For example, 16×16, 16×8, 8×16, 8×8, 8×4, 4×8,
and 4×4 marcroblocks can be transformed using horizontal and vertical transform
matrices of size 4×4, 8×8, and 16×16. The macroblocks are transformed
according to the following:
Cn×m=Tm×Bn×m×TnT,
where B
n×m denotes an image block with n columns and m rows.
The terms T
n and T
m represent the horizontal and vertical
transform matrices of size n×n and m×m, respectively. The term C
n×m
denotes the cosine transformed n×m block.
An orthogonal 4×4 integer matrix T
m=T
4 can be written
based on a Hadamard transform as shown in FIG. 1. The values n
0, n
1,
n
2, and n
3 represent integers. Matrix normalization requires
the constraints shown in equation 1, where norm is an integer representing the
normalization factor of the matrix.
##EQU1##
The real DCT of the base vector t
i is represented as d
i as
shown in equation 2.
di=ti·DCT EQUATION (2)
DCT represents the real DCT matrix, which contains the basis functions of the
real DCT in column vectors. The frequency distortion of the transform matrix with
respect to the real DCT is then defined in equation 3.
##EQU2##
For each factor norm, the optimal transform matrix, if any, is the one giving
the minimum distortion as defined in equation (3). Since a large value is expected
for |d
i(i)|, a matrix with any zero value d
i(i) will be ruled
out. This can be taken as an implicit requirement by the definition.
The integer coefficients of several optimal 4×4 transform matrices satisfying
equation (1) and having the minimum frequency distortion as defined in equation
(3) are listed in the following table. The corresponding matrix with norm of 13
is being used in International Telecommunication Union Standard ITU-T H.26L. The
information shown in the following table reveals that a norm of 13 is the most
reasonable choice.
|
| |
|
DCT |
| n0, n1, n2, n3 |
norm |
distortion |
|
| 13, 17, 13, 7 |
13 |
0.11% |
| 17, 23, 17, 7 |
17 |
4.88% |
| 25, 17, 25, 31 |
25 |
5.47% |
|
An 8×8 transform matrix is shown in FIG. 2. The normalization constraints
in this case are shown in equation 4.
##EQU3##
The corresponding distortion function is shown in equation 5.
##EQU4##
The integer coefficients of an optimal 8×8 orthogonal transform matrix satisfying
the normalization constraints in equation 4 and having the minimum frequency distortion
for equation 5 are listed in the following table.
|
| |
|
DCT |
| n0, n1, . . . , n7 |
norm |
distortion |
|
| 17, 24, 23, 20, 17, 12, 7, 6 |
17 |
6.95% |
|
The matrix with norm of 17 is currently being used in the H.26L ABT Core Experiment.
A 16×16 matrix is shown in FIG. 3. The normalization constraints for the 16×16
matrix are shown in equation 6.
##EQU5##
The corresponding distortion function for the 16×16 matrix is shown in equation
7.
##EQU6##
The integer coefficients for an optimal 16×16 orthogonal transform matrix
were derived using the constraints in equation 6 and minimizing the distortion
function in equation 7. The optimal matrix values derived for the integer coefficients
are shown in the following table.
|
| |
|
DCT |
| n0, n1, . . . , n15 |
Norm |
distortion |
|
| 17, 22, 24, 28, 23, 12, 20, 20, 17, 12, 12, 16, 7, 8, 6, 6 |
17 |
32.77% |
|
In summary, to derive an optimal 2
m×2
m transform matrix,
there should be (m+1) normalization constraints.
##EQU7##
And for a specific norm, the optimal orthogonal matrix should minimize the DCT
distortion function shown in equation 9.
##EQU8##
Equation 9 effectively multiplies the real DCT with the integer cosine transform
matrix. The ideal result is a normalized matrix with zero values everywhere except
along the diagonal of the resultant matrix. The diagonal coefficients being 1's.
Equation 9 identifies the total error generated by the non-zero values in the nondiagonal
coefficients in the resultant between the real DCT and the integer cosine transform.
Since the transform matrix derived in this manner has minimum distortion with
respect to a real DCT, it can be used with many existing DCT-based coding techniques.
For example, the transform matrix can be used with frequency-based or HVS-based
video coding, such as scanning, quantization, and filtering. There is flexibility
in defining the distortion function. For example, weighting factors can be assigned
to each frequency component based on an HVS mapping.
FIG. 4 shows one example of a system
10 that encodes and decodes data
using the optimized transform matrices shown above. The system
10 can be
any computer, video device, camera, network processing device, etc. that processes
data. Data in block
12 can be any type of information that needs to be transformed.
In one example, the system
10 processes video data.
A Discrete Cosine Transform (DCT) in block
14 uses one of more of the
optimized
transform matrices shown above to DCT transform the data from block
12.
The size of integer matrices used on the data depends on size of the blocks the
image data is sectioned into. For example, in one application, image data may be
sectioned into 4×4 bit macroblocks. In the same or another application, it
may be determined that another section of the same image, or a different image,
can be more efficiently coded by using 16×16 bit macroblocks. The DCT block
14 includes a memory that contains the different 4×4, 8×8, and
16×16 optimized transform matrices described above. The transform matrix corresponding
to the image block size is applied to the individual macro blocks in the image data.
The transformed data is quantized in a block
16 and then variable length
coded (VLC) in block
18. The encoded data is either stored in a memory or
transmitted over a communication channel in block
20.
The data is decoded by first inverse variable length coding the data in block
26 and inverse quantizing (IQ) the data in block
25. An Inverse Discrete
Cosine Transform (IDCT) in block
24 uses the optimized integer cosine matrices
for inverse cosine transforming the decoded data. The inverse cosine transform
is implemented by applying inverse integer matrices for the matrices shown in FIGS.
1-3. For example, the inverse cosine transform is generated according to the following:
Bn×m=TmT×Cn×m×Tn,
where B
n×m denotes the inverse transformed image block with
n columns and m rows, T
n and T
m represent the horizontal
and vertical integer transform matrices of size n×n and m×m, respectively,
and C
n×m denotes the cosine transformed n×m image block.
The inverse integer matrices are selected to correspond to the block sizes used
to originally transform the data. The inversed transformed data is then output
as inverse transformed data in block
20.
The system described above can use dedicated processor systems, micro controllers,
programmable logic devices, or microprocessors that perform some or all of the
operations. Some of the operations described above may be implemented in software
and other operations may be implemented in hardware.
For the sake of convenience, the operations are described as various interconnected
functional blocks or distinct software modules. This is not necessary, however,
and there may be cases where these functional blocks or modules are equivalently
aggregated into a single logic device, program or operation with unclear boundaries.
In any event, the functional blocks and software modules or described features
can be implemented by themselves, or in combination with other operations in either
hardware or software. Having described and illustrated the principles of the invention
in a preferred embodiment thereof, it should be apparent that the invention may
be modified in arrangement and detail without departing from such principles. Claim
is made to all modifications and variation coming within the spirit and scope of
the following claims.
*