Title: Systems and methods for enhanced error concealment in a video decoder
Abstract: The invention is related to methods and apparatus that conceal errors in images of a corrupted video bitstream. One embodiment conceals errors in a missing or corrupted intra-coded macroblock by linearly interpolating data from other macroblocks that correspond to portions of the image above and below the missing or corrupted macroblock. One embodiment can utilize substitute motion vectors for a missing or corrupted predictive-coded macroblock. Another embodiment doubles the received motion vectors and references the doubled motion vectors to a previous-previous frame. Another embodiment adaptively selects which concealment or reconstruction technique is applied according to projected error estimates. Another embodiment conceals errors by replacing corrupted or missing data by combining concealment data in a weighted sum to reduce an estimated error.
Patent Number: 6,990,151 Issued on 01/24/2006 to Kim, et al.
Inventors: Kim; Chang-Su (Seoul, KR); Kim; Jong Won (GwangJu, KR); Katsavounidis; Ioannis (Pasadena, CA)
Assignee: Intervideo, Inc. (Fremont, CA)
Appl. No.: 092366
Filed: March 5, 2002
Current U.S. Class: 375/240.27
Current Intern'l Class: H04B 1/66 (20060101)
Field of Search: 375/24012-24029; 345/619
References Cited [Referenced By]
U.S. Patent Documents
| 5436664 | Jul., 1995 | Henry.
| 5442400 | Aug., 1995 | Sun et al.
| 5502573 | Mar., 1996 | Fujinami.
| 5568200 | Oct., 1996 | Pearlstein et al.
| 5841477 | Nov., 1998 | Kim.
| 5912707 | Jun., 1999 | Kim.
| 5936674 | Aug., 1999 | Kim.
| 5995171 | Nov., 1999 | Enari et al.
| 6141448 | Oct., 2000 | Khansari et al.
| 6148026 | Nov., 2000 | Puri et al.
| 6704363 | Mar., 2004 | Kim.
| 6768495 | Jul., 2004 | Valente.
| 2002/0181594 | Dec., 2002 | Katsavounidis et al.
| 2003/0012285 | Jan., 2003 | Kim.
| 2003/0012287 | Jan., 2003 | Katsavounidis et al.
Other References
U.S. Appl. No. 10/092,376, filed Mar. 5, 2002, Katsavounidis, et al.
U.S. Appl. No. 10/092,353, filed Mar. 5, 2002, Katsavounidis, et al.
U.S. Appl. No. 10/092,384, filed Mar. 5, 2002, Kim, et al.
U.S. Appl. No. 10/092,340, filed Mar. 5, 2002, Kim, et al.
U.S. Appl. No. 10/092,375, filed Mar. 5, 2002, Kim, et al.
U.S. Appl. No. 10/092,345, filed Mar. 5, 2002, Kim, et al.
U.S. Appl. No. 10/092,392, filed Mar. 5, 2002, Katsavounidis, et al.
U.S. Appl. No. 10/092,383, filed Mar. 5, 2002, Zhao, et al.
U.S. Appl. No. 10/092,373, filed Mar. 5, 2002, Kim, et al.
U.S. Appl. No. 10/092,339, filed Mar. 5, 2002, Kim, et al.
U.S. Appl. No. 10/092,394, filed Mar. 5, 2002, Katsavounidis, et al.
M. Budagavi and J. D. Gibson, "Error propagation in motion compensated video
over wireless channels," in Proc. ICIP'97, vol. 2, Oct. 1997, pp. 89-92.
JinGyeong Kim, JongWon Kim and C.-C. Jay Kuo, "An Integrated AIR/UEP Scheme for
Robust Video Transmission with a Corruption Model" Paper presented at ITCOM 2001
(Aug. 2001).
Chang-Su Kim, Ph.D. Thesis, "On the Techniques for Robust Transmission of Video
Sequence over Noisy Channel" Graduate School of Seoul National University, Department
of Electrical Engineering, Aug. 2000 pp. 1-140.
Seung Hwan Kim, Chang-Su Kim, and Sang-Uk Lee, "Enhanced motion compensation
algorithm based on second-order prediction," Proc. ICIP-2000, vol. 2, pp. 875-878,
Sep. 2000.
Chang-Su Kim, Rin-Chul Kim, and Sang-Uk Lee, "Robust transmission of video sequence
using double-vector motion compensation," IEEE Transactions on Circuits and Systems
for Video Technology, vol. 11, No. 9, pp. 1011-1021, Sep. 2001.
Lifeng Zhao, Jitae Shin, JongWon Kim, and C.-C. Jay Kuo, "FGS MPEG-4 video streaming
with constant quality rate adaptation, prioritized packetization and differentiated
forward," in Proc. SPIE ITCOM 2001: Video Technologies for Multimedia Applications,
Denver, CO, Aug. 2001.
Lifeng Zhao, JongWon Kim, and C.-C. Jay Kuo, "MPEG-4 FGS video streaming with
constant-quality rate control and differentiated forwarding," in Proc. SPIE Visual
Communications and Image Processing 2002, San Jose, CA, Jan. 2002.
Primary Examiner: Vo; Tung
Attorney, Agent or Firm: Rosenberg, Klein & Lee
Parent Case Text
RELATED APPLICATION
This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional
Application No. 60/273,443, filed Mar. 5, 2001; U.S. Provisional Application No.
60/275,859, filed Mar. 14, 2001; and U.S. Provisional Application No. 60/286,280,
filed Apr. 25, 2001, the entireties of which are hereby incorporated by reference.
Claims
What is claimed is:
1. A method of adaptively producing a video image comprising:
receiving video data for a frame; determining whether the video data is intra-coded
or predictive-coded; when the video data is intra-coded: determining whether the
intra-coded video data corresponds to an error; concealing the error when the intra-coded
video data corresponds to the error; setting an error value that is associated
with at least a portion of the video packet to a first predetermined value when
the intra-coded video data corresponds to the error; resetting the error value
when no error for the intra-coded video data is detected; and using the intra-coded
video data when no error for the intra-coded video data is detected; when the video
data is predictive-coded, determining whether the predictive-coded video data corresponds
to an error; when the predictive-coded video data does not correspond to an error: using
the predictive-coded video data when no error for the predictive-coded video data
is detected and the associated error value is reset; projecting a first estimated
error corresponding to use of the predictive-coded video data when no error is
detected for the predictive-coded video data and the associated error value is
not reset; projecting a second estimated error corresponding to use of a first
predictive-coded error concealment technique when no error is detected for the
predictive-coded video data and the associated error value is not reset; selecting
between the use of the predictive-coded video data and the use of the first predictive-coded
error concealment technique based on a comparison between the first projected estimated
error and the second projected estimated error; and updating the error value according
to which of the predictive-coded video data and the first predictive-coded error
concealment technique is selected; and when the predictive-coded video data corresponds
to an error: applying a second predictive-coded error concealment technique; and
updating the error value according to the second predictive-coded error concealment technique.
2. The method as defined in claim 1, wherein the first predictive-coded error
concealment technique and the second predictive-coded error concealment technique
are the same.
3. The method as defined in claim 1, wherein the projecting a second estimated
error further comprises projecting a plurality of estimated errors corresponding
to a plurality of error concealment techniques for predictive coding, and wherein
the selecting between the use of the predictive-coded video data and the use of
the predictive-coded error concealment technique further comprises selecting among
the use of the predictive-coded video data and the use of an error concealment
technique from the plurality of error concealment techniques based on the corresponding
estimated error projections.
4. The method as defined in claim 1, wherein the applying the second predictive-coded
error concealment technique further comprises:
projecting a plurality of estimated errors corresponding to a plurality of error
concealment techniques for predictive coding; using the projected estimated errors
to select among the plurality of error concealment techniques; applying the selected
error concealment technique; and adjusting the error value according to the selected
error concealment technique.
5. The method as defined in claim 1, wherein the video data is a macroblock.
6. The method as defined in claim 1, wherein the video data is a video object
plane (VOP).
7. The method as defined in claim 1, wherein the video data is a frame.
8. The method as defined in claim 1, further comprising normalizing the error
value to a range of 0 to 255.
9. The method as defined in claim 1, further comprising multiplying the error
value with a leaky value that has a value of less than 1 in response to an advancement
in a frame sequence.
10. The method as defined in claim 9, wherein the leaky value is about 0.93.
11. The method as defined in claim 1, further comprising maintaining the error
value in a memory array, wherein an error value in the array is associated with
at least one pixel in the image.
12. The method as defined in claim 1, further comprising maintaining the error
value in a memory array, wherein each pixel in the image is associated with an
error value in the array.
13. A method of producing a video image comprising: receiving data for a video
frame; determining whether the video frame is a predictive-coded frame or is an
intra-coded frame; performing the following when the video frame is the predictive-coded
frame: determining whether a group of video data from the video frame corresponds
to an error; when there is no error in the group of video data: determining whether
the group of video data is intra-coded or predictive-coded; intra-decoding the
group of video data when the group of video data is intra coded; resetting an error
variance associated with at least a portion of the group of video data when the
group of video data is intra coded; using a first weighted sum to reconstruct a
portion of an image corresponding to the group of video data when the video data
is intra coded, where the first weighted sum combines results of at least a first
and a second technique; and updating the error variance according to the first
weighted sum used to reconstruct the portion of the image; and when there is an
error in the group of video data: concealing the error in the portion of the image
corresponding to the group of video data; and updating the error variance according
to the error concealment.
14. The method as defined in claim 13, wherein the group of video data comprises
a macroblock.
15. The method as defined in claim 13, wherein the group of video data comprises
a video object plane (VOP).
16. The method as defined in claim 13, wherein the group of video data comprises
missing data.
17. The method as defined in claim 13, wherein the concealing the error further
comprises using a second weighted sum to conceal the portion of the image corresponding
to the group of video data, where the second weighted sum combines results of at
least two error concealing techniques.
18. The method as defined in claim 13, wherein the first weighted sum weighs
the results of the first and the second technique according to values that are
related to inverses of expected errors of the first and the second techniques.
19. The method as defined in claim 13, wherein the first technique comprises
constructing the portion of the image from a first reference portion of a previous
frame and the second technique comprises constructing the portion of the image
from a second reference portion of a frame that is prior to the previous frame.
20. The method as defined in claim 13, wherein the second weighted sum weighs
the results of the third and the fourth error concealing techniques according to
inverses of expected errors of the third and the fourth error concealing techniques, respectively.
21. The method as defined in claim 13, when the video frame is the predictive-coded
frame, further comprising: receiving a next group of video data; and continuing
execution of the method until the groups of video data are processed.
22. The method as defined in claim 13, further comprising: performing the following
when the video frame is the intra-coded frame: determining whether a group of data
from the video frame corresponds to an error; when there is no error in the group of video
data: intra-decoding the group of video data; and resetting an error variance associated
with at least a portion of the group of video data; and when there is error in
the group of video data: concealing the error in the portion of the image corresponding
to the group of video data; and setting the error variance to a predetermined value.
23. The method as defined in claim 22, when the video frame is the intra-coded
frame, further comprising: receiving a next group of video data; and continuing
execution of portions of the method corresponding to groups of data in an intra-decoded
frame until the groups of video data are processed.
Description
APPENDIX A
Appendix A, which forms a part of this disclosure, is a list of commonly
owned copending U.S. patent applications. Each one of the applications listed in
Appendix A is hereby incorporated herein in its entirety by reference thereto.
COPYRIGHT RIGHTS
A portion of the disclosure of this patent document contains material which is
subject to copyright protection. The copyright owner has no objection to the facsimile
reproduction by any one of the patent document or the patent disclosure, as it
appears in the Patent and Trademark Office patent file or records, but otherwise
reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention is related to video decoding techniques. In particular, the invention
relates to systems and methods of concealing errors in images of a corrupted video bitstream.
2. Description of the Related Art
A variety of digital video compression techniques have arisen to transmit or to
store a video signal with a lower bandwidth or with less storage space. Such video
compression techniques include international standards, such as H.261, H.263, H.263+,
H.263++, H.26L, MPEG-1, MPEG-2, MPEG-4, and MPEG-7. These compression techniques
achieve relatively high compression ratios by discrete cosine transform (DCT) techniques
and motion compensation (MC) techniques, among others. Such video compression techniques
permit video bitstreams to be efficiently carried across a variety of digital networks,
such as wireless cellular telephony networks, computer networks, cable networks,
via satellite, and the like.
Unfortunately for users, the various mediums used to carry or transmit
digital video signals do not always work perfectly, and the transmitted data can
be corrupted or otherwise interrupted. Such corruption can include errors, dropouts,
and delays. Corruption occurs with relative frequency in some transmission mediums,
such as in wireless channels and in asynchronous transfer mode (ATM) networks.
For example, data transmission in a wireless channel can be corrupted by environmental
noise, multipath, and shadowing. In another example, data transmission in an ATM
network can be corrupted by network congestion and buffer overflow.
Corruption in a data stream or bitstream that is carrying video can cause
disruptions to the displayed video. Even the loss of one bit of data can result
in a loss of synchronization with the bitstream, which results in the unavailability
of subsequent bits until a synchronization codeword is received. These errors in
transmission can cause frames to be missed, blocks within a frame to be missed,
and the like. One drawback to a relatively highly compressed data stream is an
increased susceptibility to corruption in the transmission of the data stream carrying
the video signal.
Those in the art have sought to develop techniques to mitigate the
corruption of data in the bitstream. For example, error concealment techniques
can be used in an attempt to hide errors in missing or corrupted blocks. However,
conventional error concealment techniques can be relatively crude and unsophisticated.
In another example, forward error correction (FEC) techniques are used to recover
corrupted bits, and thus reconstruct data in the event of corruption. However,
FEC techniques disadvantageously introduce redundant data, which increases the
bandwidth of the bitstream for the video or decreases the amount of effective bandwidth
remaining for the video. Also, FEC techniques are computationally complex to implement.
In addition, conventional FEC techniques are not compatible with the international
standards, such as H.261, H.263, MPEG-2, and MPEG-4, but instead, have to be implemented
at a higher, "systems" level.
SUMMARY OF THE INVENTION
The invention is related to methods and apparatus that conceal errors in images
of a corrupted video bitstream. One embodiment conceals errors in a missing or
corrupted intra-coded macroblock by linearly interpolating data from other macroblocks
that correspond to portions of the image above and below the missing or corrupted
macroblock. One embodiment can utilize substitute motion vectors for a missing
or corrupted predictive-coded macroblock. Another embodiment doubles the received
motion vectors and references the doubled motion vectors to a previous-previous
frame. Another embodiment adaptively selects which concealment or reconstruction
technique is applied according to projected error estimates. Another embodiment
conceals errors by replacing corrupted or missing data by combining concealment
data in a weighted sum to reduce an estimated error.
One embodiment of the invention includes a video decoder that conceals errors
received in a video bitstream, the video decoder comprising an error detection
circuit adapted to detect errors in the video bitstream; a memory device configured
to provide an indication of an error in a portion of a video bitstream corresponding
to a portion in an image; a control circuit configured to be responsive to an indication
of the error in a first portion of the image, where the control circuit is further
configured to detect if a second portion above the first portion in the image and
if a third portion below the first portion in the image are error-free, where the
control circuit is further configured to interpolate between corresponding data
in the second portion of the image and corresponding data in the third portion
of the image to conceal the error.
Another embodiment according to the invention includes a video decoder that
adaptively conceals errors received in a video bitstream, the video decoder comprising:
a memory module adapted to maintain error values for selected portions of an image;
a plurality of error resilience modules that generate images in response to errors;
a prediction module adapted to generate a plurality of predictions of error values
corresponding to the plurality of error resilience modules; a control module adapted
to receive an indication of an error in the video bitstream and, in response, to select
an error resilience module from the plurality of error resilience modules based on a
comparison of the predictions of error values.
One embodiment of the invention includes a video decoder that conceals errors
received in a video bitstream, the video decoder comprising: a memory module adapted
to maintain error variances for selected portions of an image; a plurality of error
resilience modules that generate images in response to errors; a prediction module
adapted to generate a plurality of weights corresponding to the plurality of error
resilience modules; a control module adapted to receive an indication of an error
in the video bitstream and, in response, to combine outputs of selected error resilience
modules with the weights from the prediction module to conceal the error.
One embodiment of the invention includes an optimizer circuit that selectively
applies an error concealment technique from among a plurality of error concealment
techniques comprising: means for maintaining an estimated error relating to at
least a portion of an image; means for using the estimated error to generate a
plurality of projected error estimates corresponding to application of an error
concealment technique; and means for selecting the error concealment technique
that provides the lowest projected error estimate.
One embodiment of the invention includes a method of concealing errors in a video
decoder comprising: detecting an error in a first portion of a video bitstream
that is intra-coded; determining that a second portion of an image above the first
portion and a third portion of the image below the first portion are not corrupted;
and interpolating pixels in the first portion between a first horizontal row of
pixels in the second portion and a second horizontal row of pixels in the third
portion to conceal errors when the second portion and the third portion are not corrupted.
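For illustration only, the interpolation described above can be sketched in Python as follows. The function name, the default height of 16 rows, the linear weights, and the rounding to integers are assumptions made for this sketch; the text states only that pixels in the missing portion are interpolated between a horizontal row of pixels above and a horizontal row of pixels below.

```python
def conceal_by_vertical_interpolation(above_row, below_row, height=16):
    """Fill a missing block from the rows bordering it above and below.

    above_row: bottom pixel row of the error-free portion above (hypothetical input).
    below_row: top pixel row of the error-free portion below (hypothetical input).
    height:    number of pixel rows to synthesize (16 is an assumed macroblock height).
    """
    if len(above_row) != len(below_row):
        raise ValueError("rows must have the same width")
    block = []
    for r in range(height):
        # The weight moves linearly from the upper row toward the lower row.
        w_below = (r + 1) / (height + 1)
        w_above = 1.0 - w_below
        block.append([
            int(round(w_above * a + w_below * b))
            for a, b in zip(above_row, below_row)
        ])
    return block
```

In this sketch each synthesized pixel depends only on the two pixels directly above and below it in the same column, so the concealment is applied column by column.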
One embodiment of the invention includes a method of concealing errors in a video
decoder comprising: detecting an error in a first portion of a video bitstream
that is predictive-coded; providing a substitute motion vector when the error relates
to a standard motion vector; using a first reference portion of a previous frame
with the substitute motion vector to reconstruct when the first reference portion
is available; and using a second reference portion of a second frame that is prior
to the previous frame when the first reference portion of the previous frame is
not available.
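A minimal Python sketch of this fallback is given below. It assumes, following the doubled-motion-vector embodiment mentioned earlier in this summary, that a vector pointing into the previous frame is doubled when it is redirected to the frame before it. The helper names, the clamping at frame borders, and the `prev_ok` flag are hypothetical and are not taken from the disclosure.

```python
def fetch_block(frame, x, y, size):
    """Copy a size-by-size block with top-left corner (x, y), clamped to the frame."""
    h, w = len(frame), len(frame[0])
    return [
        [frame[min(max(y + j, 0), h - 1)][min(max(x + i, 0), w - 1)]
         for i in range(size)]
        for j in range(size)
    ]


def conceal_predicted_block(prev, prev_prev, x, y, mv, prev_ok, size=16):
    """Reconstruct a block at (x, y) using a substitute motion vector mv = (dx, dy).

    prev_ok is an assumed flag saying whether the reference portion of the
    previous frame is usable.
    """
    dx, dy = mv
    if prev_ok:
        # Normal case: reference the previous frame with the substitute vector.
        return fetch_block(prev, x + dx, y + dy, size)
    # Fallback: reference the previous-previous frame with the vector doubled,
    # which assumes roughly constant motion over the two frame intervals.
    return fetch_block(prev_prev, x + 2 * dx, y + 2 * dy, size)
```

How the substitute motion vector itself is chosen is not covered by this sketch.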
One embodiment of the invention includes a method of adaptively producing a video
image comprising: receiving video data for a frame; determining whether the video
data is intra-coded or predictive-coded; when the video data is intra-coded: determining
whether the intra-coded video data corresponds to an error; concealing the error
when the intra-coded video data corresponds to the error; setting an error value
that is associated with at least a portion of the video packet to a first predetermined
value when the intra-coded video data corresponds to the error; resetting the error
value when no error for the intra-coded video data is detected; and using the intra-coded
video data when no error for the intra-coded video data is detected; when the video
data is predictive-coded, determining whether the predictive-coded video data corresponds
to an error; when the predictive-coded video data does not correspond to an error: using
the predictive-coded video data when no error for the predictive-coded video data
is detected and the associated error value is reset; projecting a first estimated
error corresponding to use of the predictive-coded video data when no error is
detected for the predictive-coded video data and the associated error value is
not reset; projecting a second estimated error corresponding to use of a first
predictive-coded error concealment technique when no error is detected for the
predictive-coded video data and the associated error value is not reset; selecting
between the use of the predictive-coded video data and the use of the first predictive-coded
error concealment technique based on a comparison between the first projected estimated
error and the second projected estimated error; and updating the error value according
to which of the predictive-coded video data and the first predictive-coded error
concealment technique is selected; and when the predictive-coded video data corresponds
to an error: applying a second predictive-coded error concealment technique; and
updating the error value according to the second predictive-coded error concealment technique.
One embodiment of the invention includes a method of producing a video image
comprising: receiving data for a video frame; determining whether the video frame
is a predictive-coded frame or is an intra-coded frame; performing the following
when the video frame is the predictive-coded frame: determining whether a group
of video data from the video frame corresponds to an error; when there is no error
in the group of video data: determining whether the group of video data is intra-coded
or predictive-coded; intra-decoding the group of video data when the group of video
data is intra coded; resetting an error variance associated with at least a portion
of the group of video data when the group of video data is intra coded; using a
first weighted sum to reconstruct a portion of an image corresponding to the group
of video data when the video data is intra coded, where the first weighted sum
combines results of at least a first and a second technique; and updating the error
variance according to the first weighted sum used to reconstruct the portion of
the image; and when there is an error in the group of video data: concealing the
error in the portion of the image corresponding to the group of video data; and
updating the error variance according to the error concealment.
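One way such a weighted sum could be formed is sketched below in Python, with each result weighted by the inverse of its expected error, in line with claim 18. The function names and the handling of zero variances are assumptions of this sketch, not details taken from the disclosure.

```python
def inverse_error_weights(var_a, var_b):
    """Return weights (w_a, w_b) proportional to 1/var_a and 1/var_b.

    Algebraically, (1/var_a) / (1/var_a + 1/var_b) equals var_b / (var_a + var_b);
    the second form is used so that a zero variance does not divide by zero.
    """
    if var_a < 0 or var_b < 0:
        raise ValueError("error variances must be non-negative")
    total = var_a + var_b
    if total == 0:
        # Both results are considered error-free; split evenly (assumed policy).
        return 0.5, 0.5
    w_a = var_b / total
    return w_a, 1.0 - w_a


def blend(pixels_a, pixels_b, var_a, var_b):
    """Combine two candidate reconstructions pixel by pixel."""
    w_a, w_b = inverse_error_weights(var_a, var_b)
    return [w_a * a + w_b * b for a, b in zip(pixels_a, pixels_b)]
```

For example, a candidate with an expected error of 1 receives three times the weight of a candidate with an expected error of 3.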
One embodiment of the invention includes a method of selecting an error concealment
technique from among a plurality of error concealment techniques comprising: maintaining
an estimated error relating to at least a portion of an image; using the estimated
error to generate a plurality of projected error estimates corresponding to application
of an error concealment technique; and selecting the error concealment technique
that provides the lowest projected error estimate.
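The selection step can be illustrated with the short Python sketch below. The technique names and the dictionary of projected errors are hypothetical inputs. The decay helper reflects the "leaky value" of claims 9 and 10 (about 0.93), applied once per frame advance. How the projected errors themselves are computed is outside this sketch.

```python
LEAK = 0.93  # "about 0.93" per claim 10; treated here as a tunable constant


def select_technique(projected_errors):
    """Return the name of the technique with the lowest projected error estimate.

    projected_errors maps a technique name (hypothetical labels) to its
    projected error estimate for the portion of the image being decoded.
    """
    if not projected_errors:
        raise ValueError("at least one projected error estimate is required")
    return min(projected_errors, key=projected_errors.get)


def age_error(error_value, leak=LEAK):
    """Decay a maintained error value when the frame sequence advances."""
    return error_value * leak
```

A call such as `select_technique({"use_received": 12.0, "temporal": 7.5, "spatial": 9.0})` picks the `"temporal"` entry because its projected error is the smallest.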
BRIEF DESCRIPTION OF THE DRAWINGS
These and other features of the invention will now be described with reference
to the drawings summarized below. These drawings and the associated description
are provided to illustrate preferred embodiments of the invention and are not intended
to limit the scope of the invention.
FIG. 1 illustrates a networked system for implementing a video distribution
system in accordance with one embodiment of the invention.
FIG. 2 illustrates a sequence of frames.
FIG. 3 is a flowchart generally illustrating a process of concealing errors
or missing data in a video bitstream.
FIG. 4 illustrates a process of temporal concealment of missing motion vectors.
FIG. 5 is a flowchart generally illustrating a process of adaptively concealing
errors in a video bitstream.
FIG. 6 is a flowchart generally illustrating a process that can use weighted
predictions to compensate for errors in a video bitstream.
FIG. 7A illustrates a sample of a video packet with DC and AC components for
an I-VOP.
FIG. 7B illustrates a video packet for a P-VOP.
FIG. 8 illustrates an example of discarding a corrupted macroblock.
FIG. 9 is a flowchart that generally illustrates a process according to an embodiment
of the invention of partial RVLC decoding of discrete cosine transform (DCT) portions
of corrupted packets.
FIGS. 10-13 illustrate partial RVLC decoding strategies.
FIG. 14 illustrates a partially corrupted video packet with at least one intra-coded macroblock.
FIG. 15 illustrates a sequence of macroblocks with AC prediction.
FIG. 16 illustrates a bit structure for an MPEG-4 data partitioning packet.
FIG. 17 illustrates one example of a tradeoff between block error rate (BER)
correction capability versus overhead.
FIG. 18 illustrates a video bitstream with systematic FEC data.
FIG. 19 is a flowchart generally illustrating a process of decoding systematically
encoded FEC data in a video bitstream.
FIG. 20 is a block diagram generally illustrating one process of using a ring
buffer in error resilient decoding of video data.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Although this invention will be described in terms of certain preferred
embodiments, other embodiments that are apparent to those of ordinary skill in
the art, including embodiments that do not provide all of the benefits and features
set forth herein, are also within the scope of this invention. Accordingly, the
scope of the invention is defined only by reference to the appended claims.
The display of video can consume a relatively large amount of bandwidth, especially
when the video is displayed in real time. Moreover, when the video bitstream is
wirelessly transmitted or is transmitted over a congested network, packets may
be lost or unacceptably delayed. Even when a packet of data in a video bitstream
is received, if the packet is not timely received due to network congestion and
the like, the packet may not be usable for decoding of the video bitstream in real
time. Embodiments of the invention advantageously compensate for and conceal errors
that occur when packets of data in a video bitstream are delayed, dropped, or lost.
Some embodiments reconstruct the original data from other data. Other embodiments
conceal or hide the result of errors so that a corresponding display of the video
bitstream exhibits relatively fewer errors, thereby effectively increasing the
signal-to-noise ratio (SNR) of the system. Further advantageously, embodiments
of the invention can remain downward compatible with video bitstreams that are
compliant with existing video encoding standards.
FIG. 1 illustrates a networked system for implementing a video distribution
system in accordance with one embodiment of the invention. An encoding computer
102 receives a video signal, which is to be encoded to a relatively compact
and robust format. The encoding computer 102 can correspond to a variety
of machine types, including general purpose computers that execute software and
specialized hardware. The encoding computer 102 can receive a video sequence
from a wide variety of sources, such as via a satellite receiver 104, a
video camera 106, and a video conferencing terminal 108. The video
camera 106 can correspond to a variety of camera types, such as video camera
recorders, Web cams, cameras built into wireless devices, and the like. Video sequences
can also be stored in a data store 110. The data store 110 can be
internal to or external to the encoding computer 102. The data store 110
can include devices such as tapes, hard disks, optical disks, and the like. It
will be understood by one of ordinary skill in the art that a data store, such
as the data store 110 illustrated in FIG. 1, can store unencoded video,
encoded video, or both. In one embodiment, the encoding computer 102 retrieves
unencoded video from a data store, such as the data store 110, encodes the
unencoded video, and stores the encoded video to a data store, which can be the
same data store or another data store. It will be understood that a source for
the video can include a source that was originally taken in a film format.
The encoding computer 102 distributes the encoded video to a receiving
device, which decodes the encoded video. The receiving device can correspond to
a wide variety of devices that can display video. For example, the receiving devices
shown in the illustrated networked system include a cell phone 112, a personal
digital assistant (PDA) 114, a laptop computer 116, and a desktop
computer 118. The receiving devices can communicate with the encoding computer
102 through a communication network 120, which can correspond to
a variety of communication networks including a wireless communication network.
It will be understood by one of ordinary skill in the art that a receiving device,
such as the cell phone 112, can also be used to transmit a video signal
to the encoding computer 102.
The encoding computer
102, as well as a receiving device or decoder, can
correspond to a wide variety of computers. For example, the encoding computer
102
can be any microprocessor or processor (hereinafter referred to as processor) controlled
device, including, but not limited to a terminal device, such as a personal computer,
a workstation, a server, a client, a mini computer, a main-frame computer, a laptop
computer, a network of individual computers, a mobile computer, a palm top computer,
a hand held computer, a set top box for a TV, an interactive television, an interactive
kiosk, a personal digital assistant (PDA), an interactive wireless communications
device, a mobile browser, a Web enabled cell phone, or a combination thereof. The
computer may further possess input devices such as a keyboard, a mouse, a trackball,
a touch pad, or a touch screen and output devices such as a computer screen, printer,
speaker, or other input or output devices now in existence or later developed.
The encoding computer 102, as well as the decoder described, can correspond
to a uniprocessor or multiprocessor machine. Additionally, the computers can include
an addressable storage medium or computer accessible medium, such as random access
memory (RAM), an electrically erasable programmable read-only memory (EEPROM),
hard disks, floppy disks, laser disk players, digital video devices, Compact Disc
ROMs, DVD-ROMs, video tapes, audio tapes, magnetic recording tracks, electronic
networks, and other techniques to transmit or store electronic content such as,
by way of example, programs and data. In one embodiment, the computers are equipped
with a network communication device such as a network interface card, a modem,
Infra-Red (IR) port, or other network connection device suitable for connecting
to a network. Furthermore, the computers execute an appropriate operating system,
such as Linux, Unix, Microsoft® Windows® 3.1, Microsoft® Windows®
95, Microsoft® Windows® 98, Microsoft® Windows® NT, Microsoft®
Windows® 2000, Microsoft® Windows®
XP, Apple® MacOS®, IBM® OS/2®, Microsoft® Windows®
CE, or Palm OS®. As is conventional, the appropriate operating system may
advantageously include a communications protocol implementation, which handles
all incoming and outgoing message traffic passed over the network, which can include
a wireless network. In other embodiments, while the operating system may differ
depending on the type of computer, the operating system may continue to provide
the appropriate communications protocols necessary to establish communication links
with the network.
FIG. 2 illustrates a sequence of frames. A video sequence includes multiple
video frames taken at intervals. The rate at which the frames are displayed is
referred to as the frame rate. In addition to techniques used to compress still
video, motion video techniques relate a frame at time k to a frame at time k-1
to further compress the video information into relatively small amounts of data.
However, if the frame at time k-1 is not available due to an error, such as a transmission
error, conventional video techniques may not be able to properly decode the frame
at time k. As will be explained later, embodiments of the invention advantageously
decode the video stream in a robust manner such that the frame at time k can be
decoded even when the frame at time k-1 is not available.
The frames in a sequence of frames can correspond to either interlaced frames
or to non-interlaced frames, i.e., progressive frames. In an interlaced frame,
each frame is made of two separate fields, which are interlaced together to create
the frame. No such interlacing is performed in a non-interlaced or progressive
frame. While illustrated in the context of non-interlaced or progressive video,
the skilled artisan will appreciate that the principles and advantages described
herein are applicable to both interlaced video and non-interlaced video. In addition,
while certain embodiments of the invention may be described only in the context
of MPEG-2 or only in the context of MPEG-4, the principles and advantages described
herein are applicable to a broad variety of video standards, including H.261, H.263,
MPEG-2, and MPEG-4, as well as video standards yet to be developed. In addition,
while certain embodiments of the invention may describe error concealment techniques
in the context of, for example, a macroblock, the skilled practitioner will appreciate
that the techniques described herein can apply to blocks, macroblocks, video object
planes, lines, individual pixels, groups of pixels, and the like.
The MPEG-4 standard is defined in "Coding of Audio-Visual Objects: Systems,"
14496-1, ISO/IEC JTC1/SC29/WG11 N2501, November 1998, and "Coding of Audio-Visual
Objects: Visual," 14496-2, ISO/IEC JTC1/SC29/WG11 N2502, November 1998, and the
MPEG-4 Video Verification Model is defined in ISO/IEC JTC 1/SC 29/WG11, "MPEG-4
Video Verification Model 17.0," ISO/IEC JTC1/SC29/WG11 N3515, Beijing, China, July
2000, the contents of which are incorporated herein in their entirety.
In an MPEG-2 system, a frame is encoded into multiple macroblocks, and each macroblock
is typically encoded into six blocks. The macroblocks include information, such as luminance
and color, for composing a frame. In addition, while a frame may be encoded as
a still frame, i.e., an intra-coded frame, frames in a sequence of frames can be
temporally related to each other, i.e., predictive-coded frames, and the macroblocks
can relate a section of one frame at one time to a section of another frame at
another time.
In an MPEG-4 system, a frame in a sequence of frames is further encoded into a
number of video objects known as video object planes (VOPs). A frame can be encoded
into a single VOP or into multiple VOPs. In one system, such as a wireless system,
each frame includes only one VOP so that a VOP is a frame. The VOPs are transmitted
to a receiver, where they are decoded by a decoder back into video objects for
display. A VOP can correspond to an intra-coded VOP (I-VOP), to a predictive-coded
VOP (P-VOP), to a bidirectionally-predictive coded VOP (B-VOP), or to a sprite VOP
(S-VOP). An I-VOP is not dependent on information from another frame or picture,
i.e., an I-VOP is independently decoded. When a frame consists entirely of I-VOPs,
the frame is called an I-Frame. Such frames are commonly used in situations such
as a scene change. Although the lack of dependence on content from another frame
allows an I-VOP to be robustly transmitted and received, an I-VOP disadvantageously
consumes a relatively large amount of data or data bandwidth as compared to a P-VOP
or B-VOP. To efficiently compress and transmit video, many VOPs in video frames
correspond to P-VOPs.
A P-VOP efficiently encodes a video object by referencing the video object to a
past VOP, i.e., to a video object (encoded by a VOP) earlier in time. This past
VOP is referred to as a reference VOP. For example, where an object in a frame
at time k is related to an object in a frame at time k-1, motion compensation encoded
in a P-VOP can be used to encode the video object with less information than with
an I-VOP. The reference VOP can be either an I-VOP or a P-VOP.
A B-VOP uses both a past VOP and a future VOP as reference VOPs. In a real-time
video bitstream, a B-VOP should not be used. However, the principles and advantages
described herein can also apply to a video bitstream with B-VOPs. An S-VOP is used
to display animated objects.
The encoded VOPs are organized into macroblocks. A macroblock includes sections
for storing luminance (brightness) components and sections for storing chrominance
(color) components. The macroblocks are transmitted and received via the communication
network 120. It will be understood by one of ordinary skill in the art that
the communication of the data can further include other communication layers, such
as modulation to and demodulation from code division multiple access (CDMA). It
will be understood by one of ordinary skill in the art that the video bitstream
can also include corresponding audio information, which is also encoded and decoded.
FIG. 3 is a flowchart 300 generally illustrating a process of concealing
errors or missing data in a video bitstream. The errors can correspond to a variety
of problems or unavailability including a loss of data, a corruption of data, a
header error, a syntax error, a delay in receiving data, and the like. Advantageously,
the process of FIG. 3 is relatively simple to implement and can be executed
by relatively slow decoders.
Upon the detection of an error, the process starts at a first decision block
304. The first decision block 304 determines whether the error relates
to intra-coding or predictive-coding. It will be understood by the skilled practitioner
that the intra-coding or predictive-coding can refer to frames, to macroblocks,
to video object planes (VOPs), and the like. While illustrated in the context of
macroblocks, the skilled artisan will appreciate that the principles and advantages
described in FIG. 3 also apply to video object planes and the like. The process
proceeds from the first decision block 304 to a first state 308 when
the error relates to an intra-coded macroblock. When the error relates to a predictive-coded
macroblock, the process proceeds from the first decision block 304 to a
second decision block 312. It will be understood that the error for a predictive-coded
macroblock can arise from a missing macroblock in a present frame at time t, or
from an error in a reference frame at time t-1 from which motion is referenced.
In the first state 308, the process interpolates or spatially conceals
the error in the intra-coded macroblock, termed a missing macroblock. In one embodiment,
the process conceals the error in the missing macroblock by linearly interpolating
data from an upper macroblock that is intended to be displayed "above" the missing
macroblock in the image, and from a lower macroblock that is intended to be displayed
"below" the missing macroblock in the image. Techniques other than linear interpolation
can also be used.
For example, the process can vertically linearly interpolate using a line denoted
lb copied from the upper macroblock and a line denoted lt copied from the lower
macroblock. In one embodiment, the process uses the lowermost line of the upper
macroblock as lb and the topmost line of the lower macroblock as lt.
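The vertical interpolation described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `interpolate_macroblock`, the 16-row default, and the exact weighting (each row weighted by its distance to lb and lt) are assumptions, since the text states only that the interpolation is linear.

```python
def interpolate_macroblock(lb, lt, rows=16):
    """Fill a missing macroblock by vertical linear interpolation.

    lb: line of pixel values bordering the missing macroblock from above
        (e.g., the lowermost line of the upper macroblock).
    lt: line of pixel values bordering it from below
        (e.g., the topmost line of the lower macroblock).
    The weighting scheme is an assumption made for illustration.
    """
    if len(lb) != len(lt):
        raise ValueError("lb and lt must have the same width")
    block = []
    for r in range(rows):
        # Row r lies (r + 1) steps below lb and (rows - r) steps above lt.
        w_lt = (r + 1) / (rows + 1)
        w_lb = 1.0 - w_lt
        block.append([round(w_lb * a + w_lt * b) for a, b in zip(lb, lt)])
    return block
```

Rows near the top of the concealed macroblock stay close to lb, and rows near the bottom approach lt, so the concealed area blends into both neighbors.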
Depending on the circumstances, the upper macroblock and/or the lower macroblock
may also not be available. For example, the upper macroblock and/or the lower macroblock
may have an error. In addition, the missing macroblock may be located at the upper
boundary of an image or at the lower boundary of the image.
One embodiment of the invention uses the following rules to conceal errors in
the missing macroblock when linear interpolation between the upper macroblock and
the lower macroblock is not applicable.
When the missing macroblock is at the upper boundary of the image, the topmost
line of the lower macroblock is used as lb. If the lower macroblock is also missing,
the topmost line of the next-lower macroblock in the image is used as lb, and so
forth, if further lower macroblocks are missing. If all the lower macroblocks are
missing, a gray line is used as lb.
When the missing macroblock is at the lower boundary of the image or the lower
macroblock is missing, lb, the lowermost line of the upper macroblock, is also
used as lt.
When the missing macroblock is neither at the upper boundary of the image nor
at the lower boundary of the image, and interpolation between the upper macroblock
and the lower macroblock is not applicable, one embodiment of the invention replaces
the missing macroblock with gray pixels (Y=U=V=128 value).
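One possible reading of these fallback rules is sketched below. The function `select_lines` and its arguments are hypothetical. In particular, treating an unavailable upper macroblock away from the upper boundary as the case where interpolation is "not applicable" is an interpretation, not something the text states explicitly.

```python
GRAY = 128  # mid-gray value (Y=U=V=128) named in the text


def select_lines(upper_bottom, lower_tops, at_top, at_bottom, width=16):
    """Return (lb, lt) for interpolation, or None to request a gray fill.

    upper_bottom: lowermost line of the upper macroblock, or None if unavailable.
    lower_tops:   topmost lines of the macroblocks below, nearest first;
                  None marks a missing macroblock (empty at the lower boundary).
    """
    nearest_lower = lower_tops[0] if lower_tops else None
    if at_top:
        # Search downward for the first available macroblock; if every
        # lower macroblock is missing, fall back to a gray line.
        lb = next((ln for ln in lower_tops if ln is not None), [GRAY] * width)
        lt = nearest_lower if nearest_lower is not None else lb
        return lb, lt
    if upper_bottom is None:
        # Assumed meaning of "interpolation is not applicable": gray fill.
        return None
    if at_bottom or nearest_lower is None:
        # Reuse lb as lt when nothing usable lies directly below.
        return upper_bottom, upper_bottom
    return upper_bottom, nearest_lower
```

A caller would pass the returned pair to the interpolation step, or fill the macroblock with the value 128 when the result is `None`.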
According to one decoding standard, MPEG-4, pixels that are associated
with a block with an error are stored as a "0," which corresponds to green pixels
in a display. Gray pixels can be closer than green to the colors associated with
a missing block, and simulation tests have observed a 0.1 dB improvement over the
green pixels with relatively little or no increase in complexity. For example,
the gray pixel color can be implemented by a copy instruction. When the spatial
concealment is complete, the process ends.
When the error relates to a predictive-coded macroblock, the second decision
block 312 determines whether another motion vector is available to be used
for the missing macroblock. For example, the video bitstream may also include another
motion vector, such as a redundant motion vector, which can be used instead of
a standard motion vector in the missing macroblock. In one embodiment, a redundant
motion vector is estimated by doubling the standard motion vector. One embodiment
of the redundant motion vector references motion in the present frame at time t
to a frame at time t-2. When both the frame at time t-2 and the redundant motion
vector are available, the process proceeds from the second decision block 312
to a second state 316, where the process reconstructs the missing macroblock
from the redundant motion vector and the frame at time t-2. Otherwise, the process
proceeds from the second decision block 312 to a third decision block 320.
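As an illustration of reconstruction from a doubled motion vector, the sketch below scales a motion vector and copies the displaced block from an earlier frame. The helper names, the integer-pixel precision, the sign convention of the vector (a displacement into the reference frame), and the clamping at the frame edges are all assumptions made for the example.

```python
def scale_motion_vector(mv, factor=2):
    """Scale a motion vector (dx, dy); a factor of 2 models linear motion
    when the reference moves from the frame at t-1 to the frame at t-2."""
    dx, dy = mv
    return (dx * factor, dy * factor)


def fetch_block(frame, x, y, mv, size=16):
    """Copy a size-by-size block from `frame`, displaced by mv from (x, y).

    `frame` is a list of rows of pixel values. Coordinates that fall
    outside the frame are clamped to the nearest edge (an assumption).
    """
    height, width = len(frame), len(frame[0])
    dx, dy = mv
    block = []
    for r in range(size):
        yy = min(max(y + dy + r, 0), height - 1)
        row = []
        for c in range(size):
            xx = min(max(x + dx + c, 0), width - 1)
            row.append(frame[yy][xx])
        block.append(row)
    return block
```

For a macroblock at (x, y) with standard motion vector mv, the missing data would be taken as `fetch_block(frame_t_minus_2, x, y, scale_motion_vector(mv))`.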
In the third decision block 320, the process determines whether the error
is due to a predictive-coded macroblock missing in the present frame, i.e., missing
motion vectors. When the motion vectors are missing, the process proceeds from
the third decision block 320 to a third state 324. Otherwise, the
process proceeds from the third decision block 320 to a fourth decision
block 328.
In the third state 324, the process substitutes the missing motion vectors
in the missing macroblock to provide temporal concealment of the error. One embodiment
of temporal concealment of missing motion vectors is described in greater detail
later in connection with FIG. 4. The process advances from the third state 324
to the fourth decision block 328.
In the fourth decision block 328, the process determines whether an error
is due to a missing reference frame, e.g., the frame at time t-1. If the reference
frame is available, the process proceeds from the fourth decision block 328
to a fourth state 332, where the process uses the reference frame and the
substitute motion vectors from the third state 324. Otherwise, the process
proceeds to a fifth state 336.
In the fifth state 336, the process uses a frame at time t-k as a reference
frame. Where the frame corresponds to the previous-previous frame, k can equal
2. In one embodiment, the process multiplies the motion vectors that were received
in the macroblock or substituted in the third state 324 by a factor, such
as 2 for linear motion, to conceal the error. The skilled practitioner will appreciate
that other appropriate factors may be used depending on the motion characteristics
of the video images. The process then ends until the next error is detected.
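The decision flow of FIG. 3 can be summarized in a short sketch. The function name, the boolean inputs, and the step labels are hypothetical. Only the branching order follows the description above, with the corresponding state numbers noted in comments.

```python
def concealment_steps(intra_coded, redundant_mv_available, frame_t2_available,
                      motion_vectors_missing, reference_frame_available):
    """Return the ordered list of concealment steps for one erroneous macroblock."""
    if intra_coded:
        return ["spatial_concealment"]                          # state 308
    if redundant_mv_available and frame_t2_available:
        return ["reconstruct_from_redundant_mv_and_t_minus_2"]  # state 316
    steps = []
    if motion_vectors_missing:
        steps.append("substitute_motion_vectors")               # state 324
    if reference_frame_available:
        steps.append("use_reference_frame")                     # state 332
    else:
        steps.append("use_frame_t_minus_k_with_scaled_mvs")     # state 336
    return steps
```

For example, a predictive-coded macroblock with missing motion vectors and a missing reference frame yields the substitution step followed by the use of the frame at time t-k.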
FIG. 4 illustrates an exemplary process of temporal concealment of missing motion
vectors. In one embodiment, a macroblock includes four motion vectors. In the illustrated
temporal concealment technique, the missing motion vectors of a missing macroblock
402 are substituted with motion vectors copied from other macroblocks. In
another embodiment, which will be described later, the missing motion vectors of
the missing macroblock 402 are substituted with motion vectors interpolated
from other macroblocks.
When the missing macroblock 402 is below and above other macroblocks
in the image, the process copies motion vectors from an upper macroblock 404,
which is above the missing macroblock 402, and copies motion vectors from
a lower macroblock 406, which is below the missing macroblock 402.
The missing macroblock 402 corresponds to a first missing motion vector
410, a second missing motion vector 412, a third missing motion vector
414, and a fourth missing motion vector 416. The upper macroblock
404 includes a first upper motion vector 420, a second upper motion
vector 422, a third upper motion vector 424, and a fourth upper motion
vector 426. The lower macroblock 406 includes a first lower motion
vector 430, a second lower motion vector 432, a third lower motion
vector 434, and a fourth lower motion vector 436.
When both the upper macroblock 404 and the lower macroblock 406
are available and include motion vectors, the illustrated process uses the third
upper motion vector 424 as the first missing motion vector 410, the
fourth upper motion vector 426 as the second missing motion vector 412,
the first lower motion vector 430 as the third missing motion vector 414,
and the second lower motion vector 432 as the fourth missing motion vector 416.
When the missing macroblock 402 is at the upper boundary of the image, the
process sets both the first missing motion vector 410 and the second missing
motion vector 412 to the zero vector (no motion). The process uses the first
lower motion vector 430 as the third missing motion vector 414, and
the second lower motion vector 432 as the fourth missing motion vector 416.
When the lower macroblock 406 is corrupted or otherwise unavailable and/or
the missing macroblock 402 is at the lower boundary of the image, the process
sets the third missing motion vector 414 equal to the value used for the
first missing motion vector 410, and the process sets the fourth missing
motion vector 416 equal to the value used for the second missing motion
vector 412.
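The copying rules of FIG. 4 can be sketched as follows. The function `substitute_motion_vectors` is hypothetical, and each macroblock is represented as a sequence of four (dx, dy) vectors ordered first through fourth. Treating an unavailable upper macroblock away from the upper boundary in the same way as the boundary case is an assumption, since the text does not address it.

```python
ZERO = (0, 0)  # the zero vector (no motion)


def substitute_motion_vectors(upper, lower, at_top=False, at_bottom=False):
    """Return the four substitute motion vectors for a missing macroblock.

    upper, lower: sequences of four (dx, dy) vectors, or None if unavailable.
    """
    if at_top or upper is None:
        # Upper boundary: no motion is assumed for the first two vectors.
        mv1, mv2 = ZERO, ZERO
    else:
        # Third and fourth upper vectors border the missing macroblock.
        mv1, mv2 = upper[2], upper[3]
    if at_bottom or lower is None:
        # Lower boundary or unavailable lower macroblock: reuse the values
        # chosen for the first and second vectors.
        mv3, mv4 = mv1, mv2
    else:
        # First and second lower vectors border the missing macroblock.
        mv3, mv4 = lower[0], lower[1]
    return [mv1, mv2, mv3, mv4]
```

With both neighbors available, the result is the third and fourth upper vectors followed by the first and second lower vectors, matching the mapping of vectors 424, 426, 430, and 432 onto vectors 410, 412, 414, and 416.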
In one embodiment, the missing motion vectors of the missing macroblock 402
are substituted with motion vectors interpolated from other macroblocks. A variety
of techniques for interpolation exist. In one example, the first missing motion
vector 410 is substituted with a vector sum of the first upper motion vector
420 and 3 times the third upper motion vector 424, i.e.,
$v1_{410} = v1_{420} + (3)(v3_{424})$.
In another example, the third missing motion vector 414 can be substituted
with a vector sum of the third lower motion vector 434 and 3 times the first
lower motion vector 430, i.e.,
$v3_{414} = (3)(v1_{430}) + v3_{434}$.
FIG. 5 is a flowchart 500 generally illustrating a process of adaptively
concealing errors in a video bitstream. Advantageously, the process of FIG. 5 adaptively
selects a concealment mode such that the error-concealed or reconstructed images
can correspond to a relatively less distorted image. Simulation tests predict improvements
of up to about 1.5 decibels (dB) in peak signal to noise ratio. The process of
FIG. 5 can be used to select an error concealment mode even when data for a present
frame is received without an error.
For example, the process can receive three consecutive frames. A first frame
is cleanly received. A second frame is received with a relatively high degree of
corruption. Data for a third frame is cleanly received, but reconstruction of a
portion of the third frame depends on portions of the second frame, which was received
with a relatively high degree of corruption. Under certain conditions, it can be
advantageous to conceal portions of the third frame because portions of the third
frame depend on portions of a corrupted frame. The process illustrated in FIG.
5 can advantageously identify when error concealment techniques should be invoked
even when such error concealment techniques would not be needed by standard video
decoders to provide a display of the corresponding image.
The process starts in a first state 504, where the process receives data
from the video bitstream for the present frame, i.e., the frame at time t. A portion
of the received data may be missing, due to an error, such as a dropout, corruption,
delay, and the like. The process advances from the first state 504 to a
first decision block 506.
In the first decision block 506, the process determines whether the data
under analysis corresponds to an intra-coded video object plane (I-VOP) or to a
predictive-coded VOP (P-VOP). It will be understood by one of ordinary skill in
the art that the process can operate at different levels, such as on macroblocks
or frames, and that a VOP can be a frame. The process proceeds from the first decision
block 506 to a second decision block 510 when the VOP is an I-VOP.
Otherwise, i.e., when the VOP is a P-VOP, the process proceeds to a third decision block 514.
In the second decision block 510, the process determines whether there
is an error in the received data for the I-VOP. The process proceeds from the second
decision block 510 to a second state 518 when there is an error.
Otherwise, the process proceeds to a third state 522.
In the second state 518, the process conceals the error with spatial concealment
techniques, such as the spatial concealment techniques described earlier in connection
with the first state 308 of FIG. 3. The process advances from the second
state 518 to a fourth state 526.
In the fourth state 526, the process sets an error value to an error predicted
for the concealment technique used in the second state 518. One embodiment
normalizes the error to a range between 0 and 255, where 0 corresponds to no error,
and 255 corresponds to a maximum error. For example, where gray pixels replace
a pixel in an error concealment mode, the error value can correspond to 255. In
one embodiment, the error value is retrieved from a table of pre-calculated error
estimates. In spatial interpolation, the pixels adjacent to error-free pixels are
typically more faithfully concealed than the pixels that are farther away from
the error-free pixels. In one embodiment, an error value is modeled as 97 for pixels
adjacent to error-free pixels, while other pixels are modeled with an error value
of 215. The error values can be maintained in a memory array on a per-pixel basis,
can be maintained for only a selection of pixels, can be maintained for groups
of pixels, and so forth.
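A per-pixel error array for one spatially concealed macroblock might be populated as sketched below. The constants come from the values quoted above. The function name, the 16-by-16 default size, and the assumption that only the top and bottom rows count as "adjacent to error-free pixels" (as would be the case for purely vertical interpolation) are illustrative choices.

```python
ERR_GRAY = 255  # gray-pixel replacement, modeled as maximum error
ERR_NEAR = 97   # pixels adjacent to error-free pixels
ERR_FAR = 215   # all other interpolated pixels


def concealment_error_map(rows=16, cols=16, gray_fill=False,
                          upper_ok=True, lower_ok=True):
    """Build the modeled error values for one concealed macroblock."""
    if gray_fill:
        return [[ERR_GRAY] * cols for _ in range(rows)]
    err = [[ERR_FAR] * cols for _ in range(rows)]
    if upper_ok:
        err[0] = [ERR_NEAR] * cols          # row touching the upper macroblock
    if lower_ok:
        err[rows - 1] = [ERR_NEAR] * cols   # row touching the lower macroblock
    return err
```

A table of pre-calculated estimates, as mentioned above, could replace the three constants without changing the structure of the array.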
In the third state 522, the process has received an error-free I-VOP and
clears (to zero) the error value for the corresponding pixels of the VOP. Of course,
other values can be arbitrarily selected to indicate an error-free state. The process
advances from the third state 522 to a fifth state 530, where the
process constructs the VOP from the received data and ends. The process can be
reactivated to process the next VOP received.
Returning to the third decision block 514, the process determines
whether the P-VOP includes an error. When there is an error, the process proceeds
from the third decision block 514 to a fourth decision block 534.
Otherwise, the process proceeds to an optional sixth state 538.
In the fourth decision block 534, the process determines whether the error
values for the corresponding pixels are zero or not. If the error values are zero
and there is no error in the data of the present P-VOP, then the process proceeds
to the fifth state 530 and constructs the VOP with the received data as
this corresponds to an error-free condition. The process then ends or waits for
the next VOP to be processed. If the error values are non-zero, then the process
proceeds to a seventh state 542.
In the seventh state 542, the process projects the estimated error value,
i.e., a new error value, that would result if the process uses the received data.
For example, if a previous frame contained an error, that error may propagate to
the present frame by decoding and using the P-VOP of the present frame. In one
embodiment, the estimated error value is about 103 plus an error propagation term,
which depends on the previous error value. The error propagation term can also
include a "leaky" value, such as 0.93, to reflect a slight loss in error propagation
per frame. The process advances from the seventh state 542 to an eighth
state 546.
In the eighth state 546, the process projects the estimated error value
that would result if the process used an error resilience technique. The error
resilience technique can correspond to a wide variety of techniques, such as an
error concealment technique described in connection with FIGS. 3 and 4, the use
of additional motion vectors that reference other frames, and the like. Where the
additional motion vector references the previous-previous frame, one embodiment
uses an error value of 46 plus the propagated error. It will be recognized that
a propagated error in a previous frame can be different than a propagated error
in a previous-previous frame. In one embodiment, the process projects the estimated
error values that would result from a plurality of error resilience techniques.
The process advances from the eighth state 546 to a ninth state 550.
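The two projections can be compared as in the sketch below, which picks the option with the lowest projected error. The offsets 103 and 46 and the leaky value 0.93 are taken from the text. The form of the propagation term (the leaky value multiplied by the stored error), the clamping to 255, and all names are assumptions made for illustration.

```python
MAX_ERROR = 255  # upper end of the normalized error range


def propagate(stored_error, leak=0.93):
    """Assumed propagation term: the stored error attenuated by the leak."""
    return leak * stored_error


def project_errors(err_prev, err_prev_prev, leak=0.93):
    """Project the error for using received data versus a redundant motion
    vector that references the previous-previous frame."""
    estimates = {
        "received_data": 103 + propagate(err_prev, leak),
        "redundant_mv_prev_prev": 46 + propagate(err_prev_prev, leak),
    }
    return {mode: min(value, MAX_ERROR) for mode, value in estimates.items()}


def select_mode(err_prev, err_prev_prev, leak=0.93):
    """Return (mode, projected_error) for the lowest projected error."""
    estimates = project_errors(err_prev, err_prev_prev, leak)
    mode = min(estimates, key=estimates.get)
    return mode, estimates[mode]
```

When the previous frame carries a large stored error and the previous-previous frame is clean, the redundant motion vector is selected. The reverse situation favors the received data.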
In the ninth state 550, the process selects between using the received
data and using an error resilience technique. In one embodiment, the process selects
between using the received data and using one of multiple error resilience techniques.
The construction, concealment, or reconstruction technique that provides the lowest
projected estimated error value is used to construct the corresponding portion
of the image. The process advances from the ninth state 550 to a tenth state
554, where the process updates the affected error values according to the
selected received data or error resilience technique used to generate the frame,
and the process ends. It will be unders