Senior Fitness - Exercise and Nutrition for Aging Men and Women
FREE Article Feed for your website.
Bio-Medical Research Article Database
Informative Articles on Life, Love and Happiness
Tutorials on Business to Writing
Famous Quotes from Famous People
Song Lyric Information
New US Patent Information
Comprehensive List of Content by Category
Online Auctions and Shopping Related Articles
Article Search
Most Recent Articles

Methods and apparatuses for compressing digital image data Number:7,522,774 from the United States Patent and Trademark Office (PTO) owispatent

Home    Author Login    Submit Article    Article Search    Add Your Link    Edit Your Link    Contact Us    Advertising    Disclaimer

   

Google
 

Top Breaking News
     Palestinian Hunger Striker Stirs Emotions by Robert Berger
     Al-Qaida Leader Voices Support for Syrian Uprising by VOA News
     Senegal Youth Mobilizes Before Elections by Nick Loomis

Title: Methods and apparatuses for compressing digital image data

Abstract: Methods and apparatuses for compressing digital image data are described herein. In one embodiment, a wavelet transform is performed on each pixel of a frame to generate multiple wavelet coefficients representing each pixel in a frequency domain. The wavelet coefficients of a sub-band of the frame are iteratively encoded into a bit stream based on a target transmission rate, where the sub-band of the frame is obtained from a parent sub-band of a previous iteration. The encoded wavelet coefficients satisfy a predetermined threshold based on a predetermined algorithm while the wavelet coefficients that do not satisfy the predetermined threshold are ignored in the respective iteration. Other methods and apparatuses are also described.

Patent Number: 7,522,774 Issued on 04/21/2009 to Ramasastry,   et al.


Inventors: Ramasastry; Jayaram (Woodinville, CA), Choudhury; Partho (Maharashtra, IN), Prasad; Ramesh (Maharashtra, IN)
Assignee: Sindhara Supermedia, Inc. (Redmond, WA)
Appl. No.: 11/077,106
Filed: March 9, 2005


Related U.S. Patent Documents

Application NumberFiling DatePatent NumberIssue Date
60552153Mar., 2004
60552356Mar., 2004
60552270Mar., 2004

Current U.S. Class: 382/232
Current International Class: G06K 9/36 (20060101)
Field of Search: 382/232,239,240,248 708/317,400-401 375/240,240.01,240.02,240.11,240.18,240.19 348/384.1,398.1,404.1


References Cited [Referenced By]

U.S. Patent Documents
5585852 December 1996 Agarwal
5881176 March 1999 Keith et al.
6967600 November 2005 Kadono et al.
7295608 November 2007 Reynolds et al.
7333814 February 2008 Roberts
Primary Examiner: Couso; Jose L
Attorney, Agent or Firm: Blakely, Sokoloff, Taylor & Zafman LLP

Parent Case Text



This application claims the benefit of U.S. Provisional Application No. 60/552,153, filed Mar. 10, 2004, U.S. Provisional Application No. 60/552,356, filed Mar. 10, 2004, and U.S. Provisional Application No. 60/552,270, filed Mar. 10, 2004. The above-identified applications are hereby incorporated by reference in their entirety.
Claims



What is claimed is:

1. A computer-implemented method, comprising: performing on a computer a wavelet transform on each pixel of a frame to generate a plurality of wavelet coefficients representing each pixel in a frequency domain; iteratively encoding on a computer wavelet coefficients of a sub-band of the frame obtained from a parent sub-band of a previous iteration into a bit stream based on a target transmission rate, wherein the encoded wavelet coefficients satisfy a predetermined threshold based on a predetermined algorithm while the wavelet coefficients that do not satisfy the predetermined threshold are ignored in the respective iteration; and tracking spatial shift of one or more coefficients between frames.

2. The method of claim 1, further comprising transmitting at least a portion of the bit stream to a recipient over a network according to the target transmission rate, wherein the transmitted bit stream when decoded by the recipient, is sufficient to represent an image of the frame.

3. The method of claim 2, wherein iterative encoding is performed according to an order representing significance of the wavelet coefficients.

4. The method of claim 3, wherein the order is a zigzag order across the frame such that significant coefficients are encoded prior to less significant coefficients in the bit stream, wherein when a portion of the bit stream is transmitted due to the target transmission rate, at least a portion of bits representing the significant coefficients are transmitted while at least a portion of bits representing the less significant coefficients are ignored.

5. The method of claim 3, wherein an amount of data in the bit stream is determined based on the target transmission rate, which is determined based on a communications bandwidth associated with a recipient over the network.

6. The method of claim 5, wherein the bit stream includes significance, sign, and bit plane information associated with the encoded coefficient based on a location of encoded coefficient within a respective sub-band.

7. The method of claim 6, further comprising maintaining separate contexts to represent the significance, sign, and bit plane information respectively for different decomposition levels, wherein content of the contexts are compressed into the bit stream for transmission.

8. The method of claim 3, further comprising decrementing a predetermined threshold of a current iteration by a predetermined offset to generate a new threshold for a next iteration.

9. The method of claim 8, wherein the predetermined offset includes up to a half of the predetermined threshold of the current iteration.

10. The method of claim 8, wherein an encoding area for the next iteration is larger than an encoding area of the current iteration by a factor determined based on the predetermined offset.

11. A machine-readable medium having executable code to cause a machine to perform a method, the method comprising: performing on a computer a wavelet transform on each pixel of a frame to generate a plurality of wavelet coefficients representing each pixel in a frequency domain; iteratively encoding on a computer wavelet coefficients of a sub-band of the frame obtained from a parent sub-band of a previous iteration into a bit stream based on a target transmission rate, wherein the encoded wavelet coefficients satisfy a predetermined threshold based on a predetermined algorithm while the wavelet coefficients that do not satisfy the predetermined threshold are ignored in the respective iteration; and tracking spatial shift of one or more coefficients between frames.

12. The machine-readable medium of claim 11, wherein the method further comprises transmitting at least a portion of the bit stream to a recipient over a network according to the target transmission rate, wherein the transmitted bit stream when decoded by the recipient, is sufficient to represent an image of the frame.

13. The machine-readable medium of claim 12, wherein iteratively encoding is performed according to an order representing significance of the wavelet coefficients.

14. The machine-readable medium of claim 13, wherein the order is a zigzag order across the frame such that significant coefficients are encoded prior to less significant coefficients in the bit stream, wherein when a portion of the bit stream is transmitted due to the target transmission rate, at least a portion of bits representing the significant coefficients are transmitted while at least a portion of bits representing the less significant coefficients are ignored.

15. The machine-readable medium of claim 13, wherein an amount of data in the bit stream is determined based on the target transmission rate, which is determined based on a communications bandwidth associated with a recipient over the network.

16. The machine-readable medium of claim 15, wherein the bit stream includes significance, sign, and bit plane information associated with the encoded coefficient based on a location of encoded coefficient within a respective sub-band.

17. The machine-readable medium of claim 13, wherein the method further comprises decrementing a predetermined threshold of a current iteration by a predetermined offset to generate a new threshold for a next iteration.

18. The machine-readable medium of claim 17, wherein the predetermined offset includes up to a half of the predetermined threshold of the current iteration.

19. The machine-readable medium of claim 17, wherein an encoding area for the next iteration is larger than an encoding area of the current iteration by a factor determined based on the predetermined offset.

20. A data processing system, comprising: a capturing device to capture one or more frames of an image sequence; and an encoder coupled to the capturing device, for each frame, the encoder configured to perform a wavelet transform on each pixel of a frame to generate a plurality of wavelet coefficients representing each pixel in a frequency domain, and iteratively encode wavelet coefficients of a sub-band of the frame obtained from a parent sub-band of a previous iteration into a bit stream based on a target transmission rate, wherein the encoded wavelet coefficients satisfy a predetermined threshold based on a predetermined algorithm while the wavelet coefficients that do not satisfy the predetermined threshold are ignored in the respective iteration; and an element tracking spatial shift of one or more coefficients between frames.

21. A computer implemented method, comprising: receiving at a computerized mobile device at least a portion of a bit stream having at least one frame, wherein the mobile device includes one of Pocket PC based PDAs and smart phones, Palm based PDAs and smart phones, Symbian based phones, PDAs, and phones supporting at least one of J2ME and BREW; iteratively decoding at the device the at least a portion of a bit stream to reconstruct an image of the at least one frame and; tracking spatial shift of one or more coefficients between frames.

22. The method of claim 21, wherein iteratively decoding comprises generating significance, sign, and bit plane information associated with an encoded coefficient based on a location of encoded coefficient within a respective sub-band.

23. The method of claim 22, further comprising maintaining separate contexts to represent the significance, sign, and bit plane information respectively for different decomposition levels, wherein content of the contexts are updated from the received bit stream.

24. The method of claim 22, wherein iterative decoding is performed according to an order representing significance of the wavelet coefficients.

25. The method of claim 22, further comprising decrementing a predetermined threshold of a current iteration by a predetermined offset to generate a new threshold for a next iteration.

26. The method of claim 25, wherein the predetermined offset includes up to a half of the predetermined threshold of the current iteration.

27. method of claim 25, wherein an decoding area for the next iteration is larger than the decoding area of the current iteration by a factor determined based on the predetermined offset.

28. The method of claim 22, where in the amount of data to be decoded from the bit stream is determined based on the required quality of the reconstructed frame.

29. The method of claim 24, wherein the order is a zigzag order across the frame such that significant coefficients are decoded prior to less significant coefficients in the bit stream, wherein when a portion of the bit stream is received, at least a first portion of bits representing the significant coefficients are decoded while at least a second portion of bits representing the less significant coefficients are ignored, depending on the required quality of the reconstructed frame.

30. The method of claim 21, wherein an inverse wavelet transform is performed on each reconstructed coefficient to generate a plurality of pixels representing an image of the frame.
Description



FIELD OF THE INVENTION

The present invention relates generally to multimedia applications. More particularly, this invention relates to compressing digital image data.

BACKGROUND OF THE INVENTION

A variety of systems have been developed for the encoding and decoding of audio/video data for transmission over wireline and/or wireless communication systems over the past decade. Most systems in this category employ standard compression/transmission techniques, such as, for example, the ITU-T Rec. H.264 (also referred to as H.264) and ISO/IEC Rec. 14496-10 AVC (also referred to as MPEG-4) standards. However, due to their inherent generality, they lack the specific qualities needed for seamless implementation on low power, low complexity systems (such as hand held devices including, but not restricted to, personal digital assistants and smart phones) over noisy, low bit rate wireless channels.

Due to the likely business models rapidly emerging in the wireless market, in which cost incurred by the consumer is directly proportional to the actual volume of transmitted data, and also due to the limited bandwidth, processing capability, storage capacity and battery power, efficiency and speed in compression of audio/video data to be transmitted is a major factor in the eventual success of any such multimedia content delivery system. Most systems in use today are retrofitted versions of identical systems used on higher end desktop workstations. Unlike desktop systems, where error control is not a critical issue due to the inherent reliability of cable LAN/WAN data transmission, and bandwidth may be assumed to be almost unlimited, transmission over limited capacity wireless networks require integration of such systems that may leverage suitable processing and error-control technologies to achieve the level of fidelity expected of a commercially viable multimedia compression and transmission system.

Conventional video compression engines, or codecs, can be broadly classified into two broad categories. One class of coding strategies, known as a download-and-play (D&P) profile, not only requires the entire file to be downloaded onto the local memory before playback, leading to a large latency time (depending on the available bandwidth and the actual file size), but also makes stringent demands on the amount of buffer memory to be made available for the downloaded payload. Even with the more sophisticated streaming profile, the current physical limitations on current generation transmission equipment at the physical layer force service providers to incorporate a pseudo-streaming capability, which requires an initial period of latency (at the beginning of transmission), and continuous buffering henceforth, which imposes a strain on the limited processing capabilities of the hand-held processor. Most commercial compression solutions in the market today do not possess a progressive transmission capability, which means that transmission is possible only until the last integral frame, packet or bit before bandwidth drops below the minimum threshold. In case of video codecs, if the connection breaks before the transmission of the current frame, this frame is lost forever.

Another drawback in conventional video compression codes is the introduction of blocking artifacts due to the block-based coding schemes used in most codecs. Apart from the degradation in subjective visual quality, such systems suffer from poor performance due to bottlenecks introduced by the additional de-blocking filters. Yet another drawback is that, due to the limitations in the word size of the computing platform, the coded coefficients are truncated to an approximate value. This is especially prominent along object boundaries, where Gibbs' phenomenon leads to the generation of a visual phenomenon known as mosquito noise. Due to this, the blurring along the object boundaries becomes more prominent, leading to degradation in overall frame quality.

Additionally, the local nature of motion prediction in some codes introduces motion-induced artifacts, which cannot be easily smoothened by a simple filtering operation. Such problems arise especially in cases of fast motion clips and systems where the frame rate is below that of natural video (e.g., 25 or 30 fps non-interlaced video). In either case, the temporal redundancy between two consecutive frames is extremely low (since much of the motion is lost in between the frames itself), leading to poorer tracking of the motion across frames. This effect is cumulative in nature, especially for a longer group of frames (GoF).

Furthermore, mobile end-user devices are constrained by low processing power and storage capacity. Due to the limitations on the silicon footprint, most mobile and hand-held systems in the market have to time-share the resources of the central processing unit (microcontroller or RISC/CISC processor) to perform all its DSP, control and communication tasks, with little or no provisions for a dedicated processor to take the video/audio processing load off the central processor. Moreover, most general-purpose central processors lack the unique architecture needed for optimal DSP performance. Therefore, a mobile video-codec design must have minimal client-end complexity while maintaining consistency on the efficiency and robustness front.

SUMMARY OF THE INVENTION

Methods and apparatuses for compressing digital image data are described herein. In one embodiment, a wavelet transform is performed on each pixel of a frame to generate multiple wavelet coefficients representing each pixel in a frequency domain. The wavelet coefficients of a sub-band of the frame are iteratively encoded into a bit stream based on a target transmission rate, where the sub-band of the frame is obtained from a parent sub-band of a previous iteration. The encoded wavelet coefficients satisfy a predetermined threshold based on a predetermined algorithm while the wavelet coefficients that do not satisfy the predetermined threshold are ignored in the respective iteration.

Other features of the present invention will be apparent from the accompanying drawings and from the detailed description which follows.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements.

FIG. 1 is a block diagram illustrating an exemplary multimedia streaming system according to one embodiment.

FIG. 2 is a block diagram illustrating an exemplary multimedia streaming system according to one embodiment.

FIG. 3 is a block diagram illustrating an exemplary network stack according to one embodiment.

FIGS. 4A and 4B are block diagram illustrating exemplary encoding and decoding systems according to certain embodiments.

FIG. 5 is a flow diagram illustrating an exemplary encoding process according to one embodiment.

FIGS. 6 and 7 are block diagrams illustrating exemplary pixel maps according to certain embodiments.

FIG. 8 is a flow diagram illustrating an exemplary encoding process according to an alternative embodiment.

FIGS. 9A-9B and 10A-10B are block diagrams illustrating exemplary encoding and decoding systems with motion prediction according certain embodiments.

FIG. 11 is a flow diagram illustrating an exemplary encoding process with motion prediction according to one embodiment.

FIGS. 12-15 are block diagrams illustrating exemplary pixel maps according to certain embodiments.

DETAILED DESCRIPTION

Methods and apparatuses for compressing digital image data are described herein. Concerns addressed by embodiments in this application are the speed of processing data using a processor having limited processing power, storage memory capacity and/or battery life, to achieve transmission data rates which would reproduce high-fidelity multimedia data (e.g., audio/video), the optimal compression of the payload data by exploiting any form of redundancy (e.g., spatial, temporal or run-length) present therein to achieve a target transmission rate as specified by the channel capacity, and the packetization of the data for optimal progressive transmission over the channel.

Embodiments of the system are suited for wireless streaming solutions, due to the seamless progressive transmission capability (e.g., various bandwidths), which help in graceful degradation of video quality in the event of a sudden shortfall in channel bandwidth. Moreover, it also allows for comprehensive intra- as well as inter-frame rate control, thereby allowing for the optimal allocation of bits to each frame, and an optimal distribution of the frame bit budget between the luma and chroma maps. As a result, this helps in improving the perspective quality of frames that have relatively high levels of detail or motion, while maintaining a minimal threshold on the picture quality of uniform texture and/or slow motion sequences within a video clip.

In the following description, numerous details are set forth to provide a more thorough explanation of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.

Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification do not necessarily all refer to the same embodiment.

Overview

Embodiments set forth herein include a stable system to compress and decompress digital audio/video data that is implemented on software and/or hardware platforms. Some advantages of the various embodiments of the invention include, but are not limited to, low battery power consumption, low complexity and low processing load, leading to a more efficient implementation of a commercial audio/video compression/decompression and transmission system.

According to certain embodiments, some other advantages include, but are not restricted to, a robust error detection and correction routine that exploits the redundancies in the unique data structure used in the source/arithmetic encoder/decoder of the system, and a smaller search space for predicting motion between two consecutive frames, for a more efficient and faster motion prediction routine.

FIG. 1 is a block diagram of one embodiment of an exemplary multimedia streaming system. Referring to FIG. 1, exemplary system 100 includes server component 101 (also referred to herein as a server suite) communicatively coupled to client components 103-104 (also referred to herein as client suites) over a network 102, which may be a wired network, a wireless network, or a combination of both. A server suite is an amalgamation of several services that provide download-and-playback (D&P), streaming broadcast, and/or peer-to-peer communication services. This server suite is designed to communicate with any third party network protocol stack (as shown in FIG. 3). In one embodiment, these components of the system may be implemented in the central server, though a lightweight version of the encoder may be incorporated into a handheld platform for peer-to-peer video conferencing applications. The decoder may be implemented in a client-side memory. The server component 101 may be implemented as a plug-in application within a server, such as a Web server. Similarly, each of the client components 103-104 may be implemented as a plug-in within a client, such as a wireless station (e.g., cellular phone, a personal digital assistant or PDA).

In one embodiment, server component 101 includes a data acquisition module 105, an encoder 106, and a decoder 107. The data acquisition module 105 includes a video/audio repository, an imaging device to capture video in real-time, and/or a repository of video/audio clips. In one embodiment, an encoder 106 reads the data and entropy/arithmetic encodes it into a byte stream. The encoder 106 may be implemental within a server suite.

In one embodiment, video/audio services are provided to a client engine (e.g., clients 103-104), which is a product suite encapsulating a network stack implementation (as shown in FIG. 3) and a proprietary decoder (e.g., 108-109). This suite can accept a digital payload at various data rates and footprint formats, segregate the audio and video streams, decode each byte stream independently, and display the data in a coherent and real-life manner.

In one embodiment, encoder module 106 reads raw data in a variety of data formats, (which includes, but is not limited to, RGB x:y:z, YUV x':y':z', YCrcb x'':y'':z'', where the letter symbols denote sub-sampling ratios etc.), and converts them into one single standard format for purposes of standardization and simplicity. According to one embodiment, irrespective of the mode of data acquisition, the digital information is read frame wise in a non-interleaved raster format. In case of pre-recorded video compression, the encoder unit 106 segregates the audio and video streams prior to actual processing. This is useful since the encoding and decoding mechanisms used for audio and video may be different.

In one embodiment, the frame data is then fed into a temporary buffer, and transformed into the spatial-frequency domain using a unique set of wavelet filtering operations. The ingenuity in this wavelet transformation lies in its preclusion of extra buffering, and the conversion of computationally complex filtering operations into simple addition/subtraction operations. This makes the wavelet module in this codec memory more efficient.

In one embodiment, the source encoder/decoder performs compression of the data by reading the wavelet coefficients of every sub-band of the frame obtained from the previous operation in a unique zigzag fashion, known as a Morton scan (similar to the one shown in FIG. 7). This allows the system to arrange the data in an order based on the significance of the wavelet coefficients, and code it in that order. The coding alphabet can be classified into significance, sign and refinement classes in a manner well-known in the art (e.g., JPEG 2000, etc.)

Based on the location of the pixel coefficient being coded in the sub-band map, the significance, sign and bit plane information of the pixel is coded and transmitted into the byte-stream. The first set of coefficients to be coded thus is the coarsest sub-band in the top-left corner of the sub-band map. Once the coarsest sub-band has been exhausted in this fashion, the coefficients in the finer sub-bands are coded in a similar fashion, based on a unique tree-structure relationship between coefficients in spatially homologous sub-bands.

To further exploit the redundancy of the bit stream, it is partitioned into independent logical groups of bits, based on their source in the sub-band tree map and the type of information it represents (e.g., significance, sign or refinement), and is arithmetic coded for further compression. This process that achieves results is similar to, but superior to, the context-based adaptive binary arithmetic coding (CABAC) technique specified in the H.264 and MPEG4 standards.

The temporal redundancy between consecutive frames in a video stream is exploited to reduce the bit count even further, by employing a motion prediction scheme. In one embodiment, motion is predicted over four coarsest sub-bands, and by employing a type of affined transformation, is predicted in the remaining finer sub-bands using a lower-entropy refinement search. The effective search area for predicting motion in the finer sub-bands is lesser than in the coarser sub-bands, leading to a speed-up in the overall performance of the system, along with a lower bit-rate as compared to similar video compression systems in current use.

In one embodiment, the video decoder (e.g., decoders 108-109) works in a manner similar to the encoder, with the exception that it does not have the motion prediction feedback loop. To compensate for the spatial and temporal redundancies in the data, the decoder performs the relatively opposite operations as in the encoder. In one embodiment, the byte stream is read on a bit-by-bit basis, and the coefficients are updated using a non-linear quantization scheme, based on the context decision. Similar logic applies to the wavelet transformation and arithmetic coding blocks.

Once the bit budget is exhausted, according to one embodiment, the updated coefficient map is inverse wavelet transformed using a set of arithmetic lifting operations, which may be reverse of the operations undertaken in the forward wavelet transform block in the encoder, to create the reconstructed frame. The reconstructed frame is then rendered by a set of native API (application programming interface) calls in the decoder client.

Techniques described herein are compatible with peer-peer video/audio, text/multimedia messaging, download-and-play and streaming broadcast solutions available from any third party content or service provider. For this purpose, according to one embodiment, the codec suite is made compatible with several popular third party multimedia network protocol suites.

In one embodiment, the exemplary system can be deployed on a variety of operating systems and environments, both on the hand-held as well as PC domain. These include, but are not restricted to, Microsoft.RTM. Windows.RTM. 9x/Me/XP/NT 4.x/2000, Microsoft Windows.RTM. CE, PocketLinux.TM. (and its various third party flavors), SymbianOS.TM. and PalmOS.TM.. It is available on a range of third-party development platforms. These include, but are not limited to, Microsoft.RTM. PocketPC.TM. 200X, Sun Microsystems.RTM. J2ME.TM. MIDP.RTM. X.0/CLDC.RTM. X.0, Texas Instruments.RTM. OMAP.TM. and Qualcomm.RTM. BREW.TM.. On the hardware front, embodiments of the invention can be provided as a solution on a wide range of platforms including, but not limited to, Field Programmable Gate Arrays (FPGA), Application Specific Integrated Circuits (ASIC) and System-on-Chip (SoC) implementations.

Exemplary Systems

A technique to enhance the delivery of audio and video multimedia data over low-bandwidth wireless networks is described herein. The following paragraphs explain the structure of an integrated system that provides services to a user from a single consolidated platform. It addresses transcoding and scalability issues, as well as integration between different third-party network protocol implementations..

FIG. 2 is a block diagram illustrating an exemplary multimedia streaming system according to one embodiment. Referring to FIG. 2, exemplary system 200 includes a server or servers 201 communicatively coupled to one or more clients 202-203 over a various types of networks, such as wireless network 204 and/or wired networks 205-206, which may be the same network. For example, server 201 may be implemented as server 101 of FIG. 1. Clients 202-203 may be implemented as clients 103-104 of FIG. 1. In one embodiment, the server platform 201 includes, but is not limited to, three units labeled A, B and C. However, it is not so limited. These units may be implemented as a single unit or module. These units can communicate with one another, as well with external units, to provide all relevant communication and video/audio processing capabilities.

Unit C may be an application server, which provides download services for client-side components such as decoder/encoder APIs to facilitate third party support, browser plug-ins, drivers and plug-and-play components. Unit B may be a web services platform. This addresses component reusability and scalability issues by providing COM.TM., COM+.TM., EJB.TM., CORBA.TM., XML and other related web and/or MMS related services. These components are discrete and encapsulate the data. They minimize system dependencies and reduce interaction to a set of inputs and desired outputs. To use a component, a developer may call its interface. The functionality once developed can be used in various applications, hence making the component reusable. The components are decoupled from each other, hence different parts can be scaled without need to change the entire application. Because of these features, the applications could be customized to provide differentiating services and could be scaled to handle more customers as the demand grows. Unit A may be an actual network services platform. Unit A provides network services required to transmit encoded data over the wireless network, either in a D&P (Download and Play) or a streaming profile. Unit A also provides support for peer-to-peer (P2P) communications in mobile video conferencing applications, as well as communicates with the wireless service provider to expedite billing and other ancillary issues.

According to one embodiment, for the purposes of illustration, three main types of scenarios are described herein. However, other types of scenarios may be applied. In one instance, a user 203 with unrestricted mobility (such as a person driving a car downtown) is able to access his or her wireless multimedia services using the nearest wireless base station (BS) 209 of the service provider to which he or she subscribes. The connection could be established using a wide range of technologies including, but not limited to, WCDMA [UMTS], IS-95 A/B-CDMA 1.X/EVDO/EVDV, IS-2000-CDMA2000, GSM-GPRS-EDGE, AMPS, iDEN/WiDEN, and Wi-MAX. The BS 209 communicates with the switching office (MTSO) 210 of the service provider over a TCP/IP or UDP/IP connection on the wireless WAN 204. The MTSO 210 handles hand-off, call dropping, roaming and other user profile issues. The payload and profile data is sent to the wireless ISP server for processing.

In another type of scenario, the user 202 has limited mobility, for example, within a home or office building (e.g., a LAN controlled by access point/gateway 211). Such a user sends in a request for a particular service over a short-range wireless connection, which includes, but is not restricted to, a Bluetooth.TM., Wi-Fi.TM. (IEEE.TM. 802.11x), HomeRF, HiperLAN/1 or HiperLan/2 connection, via an access point (AP) and the corporate gateway 211, to the web gateway of his or her service provider. The ISP communicates with the MTSO 210, to forward the request to the server suite 201. All communications are over a TCP/IP or UDP/IP connection 206. Once the required service has been processed by the server 201, the payload is transmitted over substantially the same route in the reverse direction back to the user.

In a third type of scenario, peer-to-peer (P2P) communication is enabled by bypassing the server 201 altogether. In this case, all communications, payload transfer and audio/video processing are routed or delegated through the wireless ISP server (e.g., server 207) without any significant load on the server, other than performing the functions of control, assignment, and monitoring.

The system capabilities may be classified based on the nature of the services and modes of payload transfer. In a D&P service, the user waits for the entire payload (e.g., video/audio clip) to be downloaded onto his or her wireless mobile unit or handset before playing it. Such a service has a large latency period, but can be transported over secure and reliable TCP/IP connections.

In a streaming service, the payload routing is the same as before, with the exception that it is now transported over a streaming protocol stack (e.g., RTSP/RTP, RTCP, SDP) over a UDP/IP network (e.g., networks 205-206). This ensures that the payload packets are transmitted quickly, though there is a chance of data corruption (e.g., packet loss) due to the unreliable nature of the UDP connection. In P2P services, the payload is routed through a UDP/IP connection, to ensure live video/audio quality needed for video conferencing applications.

The decoder as well as the encoder may be available on hardware, software, or a combination of both. For download-and-play and streaming services, the encoder may be stored in the remote server, which provides the required service over an appropriate connection, while a lightweight software decoder may be stored in the memory of the wireless handheld terminal. The decoder APIs can be downloaded from an application server (e.g., unit A) over an HTTP/FTP-over-TCP/IP connection. For services that require more interaction, such as MMS and P2P video conferencing services, both the encoder (e.g., on a stand-alone hardware platform riding piggy-back on the handset chipset) as well as decoder (e.g., an application layer software) are installed on the handheld terminal, for example, integrated within a network protocol stack as shown in FIG. 3.

Exemplary Data Compression Processes

FIGS. 4A and 4B are data flow diagrams illustrating exemplary encoding and decoding processes through an encoding system and a decoding system respectively, according to certain embodiments of the invention. FIG. 5 is a flow diagram illustrating an exemplary process for encoding digital image data according to one embodiment. The exemplary process 500 may be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system, a server, or a dedicated machine), or a combination of both. For example, exemplary process 500 may be performed by a server component (e.g., server suite), such as, for example, server 101 of FIG. 1 or sever 201 of FIG. 2.

Referring to FIGS. 4A and 5, the codec works on a raw file format 401 having raw YUV color frame data specified by any of the several standard file and frame formats including, but not limited to, high definition television (HDTV), standard definition television (SDTV), extended video graphics array (XVGA), standard video graphics array (SVGA), video graphics array (VGA), common interchange format (CIF), quarter common interchange format (QCIF) and sub-quarter interchange format (S-QCIF). In one embodiment, the pixel data is stored in a byte format, which is read in serial fashion and stored in a 1-dimensional array.

In one embodiment, each image, henceforth to be called the `frame`, includes three maps. Each of these maps may be designated to either store one primary color component, or in a more asymmetric scheme, one map stores the luminance information (also referred to as a luma map or Y map), while the other two maps store the chrominance information (also referred to as chroma maps or Cb/Cr maps). The Y map stores the luma information of the frame, while the chroma information is stored in two quadrature components. The system is designed to work on a wide variety of chrominance sub-sampling formats (which includes, but is not restricted to, the 4:1:1 color format). In this case, the dimensions of the chroma maps are an integral fraction of the dimensions of the luma map, along both the cardinal directions. The pixel data is stored in a byte format in the raw payload file, which is read in serial fashion and stored in a set of 3 one-dimensional arrays, one for each map.

Since the dimensions of the frame are previously known, the 2-dimensional co-ordinate of each pixel in the image map is mapped onto the indexing system of the 1-dimensional array representing the current color map. The actual 1-dimensional index is divided by the width of the frame, to obtain the "row number". A modulo operation on the 1-dimensional index gives a remainder that is the "column number" of the corresponding pixel.

In one embodiment, as one of the more significant pre-processing operations, the pixel coefficient values are scaled up by shifting the absolute value of each pixel coefficient by a predetermined factor (e.g., factor of 64). This increases the dynamic range of the pixel coefficient values, thereby allowing for a finer approximation of the reconstructed frame during decoding.

The next operation in the encoding process is to transform the payload from the spatial domain into the multi-resolution domain. In one embodiment, a set of forward and backward wavelet filter 402 coefficients with an integral number of taps is used for the low pass and high pass filtering operations (e.g., operation 501). In one embodiment, the filter coefficients may be modified in such a way that all operations can be done in-place, without a need for buffering the pixel values in a separate area in memory. This saves valuable volatile-memory space and processing time.

Exemplary Wavelet Filtering Operations

In one embodiment, the wavelet filtering operations on each image pixel are performed in-place, and the resultant coefficients maintain their relative position in the sub-band map. In one embodiment, the entire wavelet decomposition process is split into its horizontal and vertical components, with no particular preference to the order in which the cardinal orientation of filtering may be chosen. Due to the unique lifting nature of the filtering process, the complex mathematical computations involved in the filtering process is reduced to a set of fast, low-complexity addition and/or subtraction operations.

In every pass, according to one embodiment, a row or column (also referred to as a row vector or a column vector) is chosen, depending on the direction of the current filtering process. In a particular embodiment, a low pass filtering operation is performed on every pixel that has an even index relative to the first pixel in the current vector, and a high pass filtering operation is performed on every pixel that has an odd index relative to the first pixel of the same vector. For each filtering operation, the pixel whose wavelet coefficient is to be determined, along with a set of pixels symmetrically arranged around it in its neighborhood along the current orientation of filtering, in the current vector is chosen.

Wavelet filters with a vanishing moment of four is applied on the pixels. In one embodiment, four tap high pass and low pass filters are used for the transformation. The high pass filter combines the four neighboring even pixels weighted and normalized, as shown below, for filtering an odd pixel [9*(X.sub.k-1+X.sub.k+1)-(X.sub.k-3+X.sub.k+3)+16]/32 The low pass filter combines the four neighboring odd pixels weighted and normalized, as shown below, for filtering an even pixel [9*(X.sub.k-1+X.sub.k+1)-(X.sub.k-3+X.sub.k+3)+8]/16 where X.sub.k is the pixel at position k.

Different normalized weights are applied to each pixel (with a weight of 1.0 being applied to the (central) pixel whose transformed value is to be determined), and a series of addition, subtraction and multiplication operations generate the wavelet coefficient in the same pixel position relative to the beginning of the current vector. It can be deduced that all pixel positions with even indices within a particular vector (row or column) (0, 2, 4, . . . ) represent low pass coefficients, while all pixel positions with odd indices within a particular vector (row or column) (1, 3, 5 . . . ) represent high pass coefficients. Each such an iteration or pass, creates four sets of coefficients, each constituting a sub-band or sub-image.

These four sub-bands are intimately inter-meshed into one another. In the first pass, for instance, coefficients belonging to the same sub-image are at an offset of 1 pixel position from one another. In the next pass, all the low pass coefficients (which constitute the LL.sub.k sub-image) are selected, and the entire process is repeated. This operation shall, likewise, result in four sub-images with each sub-image having pixels, which are at an offset of 2 pixel positions from each other. Due to the iterative decimation, every pass involves computations on exactly one-quarter the number of pixels as compared to the previous pass, with the offset between pixels in the same sub-image at the same level, doubling with every iteration.

Alternative Wavelet Filtering Operations

In another embodiment, the wavelet filtering operation is viewed as a dyadic hierarchical filtering process, meaning that the end-result of a single iteration of the filtering process on the image is to decimate it into four sub-bands, or sub-images, each with half the dimensions in both directions as the original image. The four sub-bands, or sub-images are labeled as HH.sub.k, HL.sub.k, LH.sub.k and LL.sub.k (where k is the level of decomposition beginning with one for the finest level), depending on their spatial orientation relative to the original image. In the next iteration, the entire filtering process is repeated on only the LL.sub.k sub-image obtained in the previous pass, to obtain four sub-images called HH.sub.k-1, LH.sub.k-1, HL.sub.k-1 and LL.sub.k-1, which have half the dimensions of LL.sub.k, as explained above. This process is repeated for as many levels of decomposition as is desired, or until the LL sub-band has been reduced to a block which is one pixel across, in which case, further decimation is no longer possible.

In one embodiment, the filtering is split into horizontal and vertical filtering operations. For the vertical filtering mode, each column (e.g., vertical vector) in the three maps is processed one at a time. Initially, all the pixels in the column are copied into a temporary vector, which has as many locations reserved as there are pixels in the vector. For the actual filtering operation, the temporary vector is split into two halves. Pixels located in the even numbered memory locations (such as 0, 2, 4, . . . ) of the temporary vector are low pass filtered using the low pass filter (LPF) coefficients, while the pixels in the odd numbered memory locations (such as 1, 3, 5, . . . ) of the temporary vector are high pass filtered using the high pass filter (HPF) coefficients.

The result of each filtering operation (high-pass or low-pass) is stored in the current vector, such that all the results of the low-pass filtering operations are stored in the upper half of the vector (e.g., the top half of a vertical vector, or the left half of a horizontal vector, depending on the current orientation of filtering), while the results from the high-pass filtering operations are stored in the lower half of the column (e.g., the bottom half of a vertical vector, or the right half of a horizontal vector). In this way, the pixel data is decimated in a single iteration. The entire process is repeated for all the columns and rows in the current map and frame. The entire process is repeated for all three maps for the current frame, to obtain the wavelet transformed image.

Referring back to FIG. 4A, the bootstrapped source entropy and arithmetic coding process 403 of the wavelet map is also referred to as channel coding (e.g., operation 502). The arithmetic coding exploits the intimate relationships between spatially homologous blocks within the sub-band tree structure generated in the wavelet transformation 402 described above. The data in the wavelet map is encoded by representing the significance (e.g., with respect to a variable-size quantization threshold), sign and bit plane information of the pixels using a single bit alphabet. The bit stream is encoded in an embedded form, meaning that all the relevant information of a single pixel at a particular quantization threshold is transmitted as a continuous stream of bits. The quantization threshold depends on the number of bits used to represent the Wavelet coefficients. In this embodiment, sixteen bits are used for representing the coefficients. Hence for the first pass the quantization threshold is set, for example, at 0.times.8000. After a single pass, the threshold is lowered, and the pixels are encoded in the same or similar order as before until substantially all the pixels have been processed. This ensures that all pixels are progressively coded and transmitted in the bit stream.

According to one embodiment, the entropy coded bit stream is further compressed by passing the outputted bit through a context based adaptive arithmetic encoder 404 (also referred to as a channel encoder), as shown as operation 503. This context based adaptive binary arithmetic coder (CABAC) encodes the bit information depending on the probability of occurrence of a predetermined set of bits immediately preceding the current bit. The context in which the current bit is encoded depends on the nature of the information represented by the bit (significance, sign or bit plane information) and the location of the coefficient being coded in the hierarchical tree structure. The concept of a CABAC is similar in principle to the one specified in the ITU-T SG16 WP3 Q.6 (VCEG) Rec. H.264 and ISO/IEC JTC 1/SC 29/WG 11 (MPEG) Rec. 14496-10 (MPEG4 part 10). The difference lies in the context modeling, estimation and adaptation of probabilities. Since the transform and source coding technique of the embodiment is different from ITU-T SG16 WP3 Q.6 (VCEG) Rec. H.264 and ISO/IEC JTC 1/SC 29/WG 11 (MPEG) Rec. 14496-10 (MPEG4 part 10), the coefficients of the embodiment has different statistical characteristics. The CABAC-type entropy coder, as specified in the embodiment, is designed to exploit these characteristics to the maximum.

In one embodiment, the context is an n-bit data structure with a dynamic range of 0 to 2.sup.n. With every new bit coded, the context variable assigned to the current bit is updated, based on a probability estimation table (PET). Once n bits have been coded, the contents of the context variable is packed into a compact data structure as compressed file 405 and transmitted onward, and the variable is refreshed for the next batch of n bits (e.g., operation 504). In one embodiment, the system uses (9.times.m) context variables for each frame--for three bit classes over three spatial orientation trees, and all sub-bands over m levels of decomposition.

According to one embodiment, the decoder, which may reside in the client, may be implemented similar to the exemplary encoder 400 of FIG. 4A, but in a reversed order as shown in FIG. 4B.

Exemplary Encoding Hierarchy

The hierarchical tree structure relating spatially homologous pixels and their descendants can be explained using a parent-child hierarchy. FIG. 6 is a diagram illustrating an exemplary pixel map for encoding processing according to one embodiment. Referring to FIG. 6, in one embodiment, the root of the tree structure may be made up of the set of all the pixels in the coarsest sub-band, LL.sub.k, and the set be labeled as H. In one embodiment, the pixels in set H are grouped in sets of 2.times.2, or quads. Referring to FIG. 6, each quad in set H (e.g., block 601) has four pixels, with all but the top-left member 602 of every quad having four descendants (e.g., blocks 603-605) in the spatially homologous next finer level of decomposition. Thus, for instance, the top-right pixel in a quad has four descendant pixels 604 (in a 2.times.2 format) in the next finer sub-band with the same spatial orientation (HL.sub.k-1 in this case). The relative location of the descendants, too, is related to the spatial orientation of the tree root. The first generation descendants of a coefficient (henceforth labeled as offspring) of the top-right pixel in the top-left quad of set H are the top-left 2.times.2 quad in HL.sub.k-1 (e.g., block 604). Similarly, the offspring of the bottom right pixel in any quad of set H lie in spatially homologous positions in the HH.sub.k-1 sub-band, while the descendants of the bottom left pixel in any quad of set H lie in spatially homologous positions in the LH.sub.k-1 sub-band (e.g., block 603). Descendants beyond the first generation of pixels, and sets (including quads) thereof, are generally labeled as grandchildren coefficients, for example, blocks 606-611 as shown in FIG. 6.

For the encoding process, in one embodiment, a unique data structure records the order in which the coefficients are encoded. Three dynamically linked data structures, or queues, are maintained for this purpose, labeled as insignificant pixel queue (IPQ), insignificant set queue (ISQ) and significant pixel queue (SPQ). In one embodiment, each queue is implemented as a dynamic data structure, which includes, but is not restricted to, a doubly linked list or a stack array structure, where each node stores information about the pixel such as coordinates, bit plane number when the pixel becomes significant and type of ISQ list.

In one embodiment, three types of sets of transform coefficients are defined to partition the pixels and their descendant trees. However, more or less sets may be implemented. The set D(T) is the set of all descendants of a pixel, or an arbitrary set, T, thereof. This includes direct descendants (e.g., offspring such as blocks 603-605) as well as grandchildren coefficients (e.g., blocks 606-608). The set O(T) is defined as the set of all first generation, or direct, descendants of a pixel, or an arbitrary set, T, thereof (e.g., blocks 603-605). Finally, the set L(T) is defined as the set of descendants other than the offspring of a pixel, or an arbitrary set, T, thereof, i.e., L(T)=D(T)-O(T) (e.g., blocks 609-611). In one embodiment, two types of ISQ entries may be defined. ISQ entries of type a represent the set D(T). ISQ entries of type .beta. represent the set L(T).

In one embodiment, a binary metric used extensively in the encoding process is the significance function, S.sub.n(T). In one embodiment, the significance function gives an output of one if the largest wavelet coefficient in the set T is larger than the current quantization threshold level (e.g., the quantization threshold in the current iteration), or else give an output of zero. In one embodiment, the significance function may be defined as follows:

.function..tau..times..gtoreq..times..times..di-elect cons..tau. ##EQU00001## where, S.sub.n(T) is the set of pixels, T whose significance is to be measured against the current threshold, n.sub.1.

FIG. 8 is a flow diagram illustrating an exemplary encoding process according to one embodiment. The exemplary process 500 may be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system, a server, or a dedicated machine), or a combination of both. For example, exemplary process 500 may be performed by a server component (e.g., server suite), such as, for example, server 101 of FIG. 1 or sever 201 of FIG. 2.

Referring to FIG. 8, the first phase in the encoding process of the encoder (also referred to as the initialization pass) is the determination and transmission (as a sequence of 8 bits, in binary format) of the number of passes the encoder has to iterate through (block 801). The number of iterations, according to one embodiment, is less than or equal to the number of bit-planes of the largest wavelet coefficient in the current map. The number of iterations to code all the bits in a single map is determined by the number of quantization levels. In one embodiment, this is determined using a formula that may be defined as follows: n.sub.1=ceil(log.sub.2|w.sub.max|) where w.sub.max is the largest wavelet coefficient in the current map. This number is transmitted (without context) into the byte stream. The coding process is then iterated n.sub.1 times over the entire map.

Initially, at block 802, in one embodiment all pixels are marked as insignificant against the current threshold level, which may be defined as T=2.sup.n1. Hence, IPQ is populated with all the pixels in set H, while ISQ is populated with all the pixels in set H that have descendants (i.e., in set H, all the pixels in every quad except the top-left one). The SPQ is kept empty and is filled gradually as pixels become significant against the current quantization threshold.

As a next phase, at block 803, also referred to as a sorting pass, in one embodiment, all the pixels in the IPQ are sorted to determine which ones have become significant with respect to a current quantization threshold. For every entry in the IPQ, the value of the significance function for the current pixel (or entry in the IPQ) is determined, and the value is sent out as the output in the form of a single bit. In one embodiment, as a next operation, the sign bit of the pixel entry, if the value of the significance function in the previous operation were one, is sent as the output in the form of a single bit. The output of the sign is 1 if the entry is positive and 0 if the entry is negative. Once all the significant pixels in the set H have been segregated from the insignificant ones, a more complex version of the same sorting pass is performed on the entries of the ISQ.

In one embodiment, at block 804, if the current entry of the ISQ is of type .alpha. (e.g., the class of entries that represents all the descendants of the pixel across all generations), the significance of the set D(T) is transmitted as a single bit. If this output bit is one (e.g., the entry has one or more significant descendants), a similar test is performed for the direct (e.g., first generation) descendants, or offspring of the entry. For all four offspring of the entry (defined by set O(T)), according to one embodiment, two operations are performed. First, the significance of the offspring pixel is determined and transmitted. As a second operation, if the offspring pixel is significant, the sign of the ISQ entry is transmitted. That is, a value of one is transmitted if the entry is positive, or a value of zero is transmitted if the entry is negative. The entry is then deleted from the ISQ and appended to the SPQ. If, however, the offspring pixel is insignificant, the offspring pixel is removed from the ISQ, and appended to the IPQ.

Once all the offspring pixels have been tested, the current ISQ entry is retained depending on the depth of the descendant hierarchy. If the entry has no descendants beyond the immediate offspring (L (T).noteq.(O), the entry is purged from the ISQ. If, however, descendants for the current set exist beyond the first generation, the entry is removed from its current position in the ISQ, and appended to the end of the ISQ as an entry of type .beta. (block 805).

In one embodiment, if the current entry in the ISQ were of type .beta., the significance test is performed on the set L (T). For every entry in the ISQ of type .beta., the significance of the set L(T) is tested (e.g., using a significant function) and transmitted as a single bit. If there exists one or more significant pixels in the set L(T), all four offspring of the current ISQ entry are appended to the ISQ as type .alpha. entries at block 806, to be processed in future passes. The current entry in the ISQ is then purged from the queue at block 807.

In one embodiment, the final phase in the coding process is referred to as the refinement pass. At the end of the sorting pass, all the pixels (or sets thereof) that have become significant against the current quantization threshold level up to the current iteration are removed from the IPQ and appended to the SPQ. For every such entry, the iteration number "n", when the entry was appended to the queue (and the corresponding coefficient became significant against the current quantization threshold level), is recorded along with the co-ordinate information. For every entry in the SPQ that has been appended to it in previous iterations (n.sub.1, n.sub.1-1. . . n.sub.1-n), the n.sup.th most significant bit is transmitted. As a final pass, in one embodiment, the value of n is decremented by one, so that the quantization threshold, T is reduced by half, and the entire process is repeated for all the entries currently listed in the three queues.

To achieve additional compression, in one embodiment, according to one embodiment, the output of the entropy coder may be passed through a CABAC-type processor. The embedded output stream of the entropy coder has been designed in a way, such that the compression is optimized by segregating the bit stream based on the context in which the particular bit has been coded. The bit stream includes the bits representing the binary decisions made during the coding. The bits corresponding to the same decisions are segregated and coded separately. Since the Wavelet transformed coefficients are arranged such that the coefficients with identical characteristics are grouped together, the decisions made on the coefficients in a group are expected to be similar or identical. Hence the bit-stream generated as a result would have longer runs of identical bits, making it more suitable for compression, and achieving more optimal level of compression.

Note that the wavelet coefficients "w" have a unique spatial correlation with one another, depending on which sub-band and tree it may belong to. Particu


Free Web Sudoku Puzzles.
Solve with your browser.
5       7 4      
    1         9  
6     3       4  
3         1   7  
1 5 2       4 3 9
  7   5         1
  2       5     3
  9         7    
      7 9       2
What is it?



Add Your Site · Terms Of Service · Privacy Policy


DISCLAIMER
Linkgrinder is a free service that searches the Internet and indexes all files found so that you may search quickly and easily for shared files. These files are created and made available individually by users whose identity we are not aware of and who we have no control over. In essence we function like a search engine tool; these files ARE NOT STORED OR SERVED BY OUR NETWORK. We are not responsible for any materials obtained by using our service. We do not monitor any of the contents of these files. These files may contain viruses, illegal materials, materials inappropriate for minors, offensive files and the like. BY USING OUR SERVICE, YOU ASSUME FULL RESPONSIBILITY FOR DOWNLOADING THESE MATERIALS AND WILL INDEMNIFY US FOR ANY DAMAGES THAT MAY BE INCURRED.

For More Specific Information VIEW OUR TERMS OF SERVICE.

Thank you and Enjoy!