Title: Multi-resolution image data management system and method based on tiled wavelet-like transform and sparse data coding
Abstract: An image process system tiles an image data array, processing the tiles in a predefined order. Each tile of image data is processed by applying a predefined family of transform layers to the tile of image data so as to generate successive sets of transform coefficients. Each set of transform coefficients include edge coefficients positioned at outside boundaries of the set of transform coefficients and non-edge coefficients positioned at interior locations of the set of transform coefficients. The sets of transform coefficients include a last set of transform coefficients, produced by the last transform layer, and one or more earlier sets of transform coefficients. The transform filters used include a short edge transform filter that is applied to image data at boundaries of the tile and to coefficients positioned at and near boundaries of each of the earlier sets of transform coefficients so as to generate the edge coefficients, and a long interior filter that is applied to image data at interior locations of the tile and to coefficients at interior locations of the earlier sets of transform coefficients so as to generate the non-edge coefficients. The edge transform filter has a shorter filter support than the interior transform filter, and both the edge transform filter and the interior transform filter are applied only to image data within the tile and only to transform coefficients within the earlier sets of transform.
Patent Number: 6,978,049 Issued on 12/20/2005 to Chui,   et al.
| Inventors:
|
Chui; Charles K. (Menlo Park, CA);
Gao; Hong-Ye (Milpitas, CA);
Zhong; Lefan (Santa Clara, CA)
|
| Assignee:
|
Zoran Corporation (Sunnyvale, CA)
|
| Appl. No.:
|
969495 |
| Filed:
|
October 19, 2004 |
| Current U.S. Class: |
382/240 |
| Intern'l Class: |
G06K 009/36 |
| Field of Search: |
382/232,236,238,239,240,248,250,251
348/384.1,394.1,395.1,398.1,4001-4031,407.1,409.1,413.1,416.1,430.1,431.1,424.1,425.1,437.1,438.1
375/240.02,240.03,240.11,240.12,24015-24016,24018-2402
341/51,63,65,67,79,107
708/203,300,307-308,313,316-317,400-405
|
References Cited [Referenced By]
U.S. Patent Documents
| 5025309 | Jun., 1991 | Isnardi.
| |
| 5128757 | Jul., 1992 | Citta et al.
| |
| 5162923 | Nov., 1992 | Yoshida et al.
| |
| 5309232 | May., 1994 | Hartung et al.
| |
| 5481308 | Jan., 1996 | Hartung et al.
| |
| 5684714 | Nov., 1997 | Yogeshwar et al.
| |
| 5748786 | May., 1998 | Zandi et al.
| |
| 5892847 | Apr., 1999 | Johnson.
| |
| 5926791 | Jul., 1999 | Ogata et al.
| |
| 5977996 | Nov., 1999 | Kondo.
| |
| 6125143 | Sep., 2000 | Suzuki et al.
| |
| 6141446 | Oct., 2000 | Boliek et al.
| |
| 6236629 | Dec., 2000 | Cheung et al.
| |
| 6195465 | Feb., 2001 | Zandi et al.
| |
| 6236757 | May., 2001 | Zeng et al.
| |
| 6252989 | Jun., 2001 | Geizler et al.
| |
| 6282322 | Aug., 2001 | Rackett.
| |
| 6459816 | Oct., 2002 | Matsuura et al.
| |
| Foreign Patent Documents |
| PCT/US01/31870 | Feb., 2002 | WO.
| |
Primary Examiner: Couso; Jose L.
Attorney, Agent or Firm: Morgan, Lewis & Bockius LLP
Parent Case Text
This is a divisional application of U.S. Application Ser. No. 10/347,422, filed
Jan. 17, 2003, which issued as Patent No. 6,807,308 on Oct. 19, 2004, which is
a continuation of U.S. application Ser. No. 09/687,467, filed Oct. 12, 2000, which
issued as Patent No. 6,549,674 on Apr. 15, 2003.
Claims
1. A method of processing an array of image data representing an initial image, comprising:
applying a predefined family of transform filters to the array of image data
so as to generate successive subbands of transform coefficients;
encoding the transform coefficients and storing the encoded transform coefficients
in a first file data structure in a predefined order to facilitate decoding a plurality
N of overlapping subsets of the first file data structure for reconstructing the
initial image at a corresponding plurality N of resolution levels 0 to N-1, each
resolution level differing in resolution from a neighboring resolution level by
a first predefined non-zero resolution factor;
down-sampling the image to generate a second array of image data representing
a smaller image having a lower resolution than the initial image;
applying the predefined family of transform filters to the second array of image
data so as to generate successive subbands of transform coefficients; and
encoding the transform coefficients and storing the encoded transform coefficients
in a second file data structure in the predefined order to facilitate decoding
a plurality M of overlapping subsets of the second file data structure for reconstructing
the smaller image at a corresponding plurality M of resolution levels 0 to M-1,
each resolution level differing in resolution from a neighboring resolution level
by a second predefined non-zero resolution factor;
wherein the first and second file data structures together store encoded transform
coefficients for reconstructing the initial image at a plurality of P of distinct
resolution levels, where P is greater than N and greater than M.
2. The method of claim 1, further including storing in the second file data structure
a link to the first file data structure, wherein the link is suitable for accessing
the first file data structure while viewing the second file data structure on a
client computer.
3. The method of claim 2, further including embedding in the second file data
structure a computer program for decoding the second file data structure so as
to reconstruct the second image.
4. The method of claim 2, further including embedding in the second file data
structure a link to a computer program for decoding the second file data structure
so as to reconstruct the second image.
5. The method of claim 1, wherein the encoding includes dividing the encoded
values for at least one of the subbands of transform coefficients into two or more
portions in accordance with one or more bit plane thresholds, each portion containing
a respective distinct range of bit planes of the encoded values for the at least
one subband of transform coefficients; and
for the at least one subband of transform coefficients, storing a most significant
portion of the encoded values in a first contiguous region of an image file, and
storing a least significant portion of the encoded values in a second contiguous
region of the image file.
6. A method of processing an array of image data representing an initial image, comprising:
applying a predefined family of transform filters to the array of image data
so as to generate successive subbands of transform coefficients;
encoding the transform coefficients and storing the encoded transform coefficients
in a first file data structure in a predefined reverse resolution order to facilitate
decoding a plurality N of overlapping subsets of the file for reconstructing the
initial image at a corresponding plurality N of resolution levels 0 to N-1, each
resolution level i differing in resolution from a neighboring resolution level
by a factor of 4;
down-sampling the image to generate a second array of image data representing
a smaller image having a resolution 2
2N times lower than the image;
applying the predefined family of transform filters to the second array of image
data so as to generate successive subbands of transform coefficients; and
encoding the transform coefficients and storing the encoded transform coefficients
in a second file data structure in the predefined reverse resolution order to facilitate
decoding a plurality M of overlapping subsets of the second file data structure
for reconstructing the smaller image at a corresponding plurality M of resolution
levels 0 to M-1, each resolution level differing in resolution from a neighboring
resolution level by a factor of 4;
wherein the first and second file data structures together store encoded transform
coefficients for reconstructing the initial image at a series of N+M resolution
levels, each resolution level differing in resolution from a neighboring resolution
level by a factor of 4.
7. The method of claim 6, further including storing in the second file data structure
a link to the first file data structure, wherein the link is suitable for accessing
the first file data structure while viewing the second file data structure on a
client computer.
8. The method of claim 7, further including embedding in the second file data
structure a computer program for decoding the second file data structure so as
to reconstruct the second image.
9. The method of claim 7, further including embedding in the second file data
structure a link to a computer program for decoding the second file data structure
so as to reconstruct the second image.
10. A computer program product for use in conjunction with a computer system,
the computer program product comprising a computer readable storage medium and
a computer program mechanism embedded therein, the computer program mechanism comprising:
an image processing module that applies a predefined family of transform filters
to the array of image data so as to generate successive subbands of transform coefficients;
the image processing module including:
instructions for encoding the transform coefficients and storing the encoded
transform coefficients in a first file data structure in a predefined order to
facilitate decoding a plurality N of overlapping subsets of the first file data
structure for reconstructing the initial image at a corresponding plurality N of
resolution levels 0 to N-1, each resolution level differing in resolution from
a neighboring resolution level by a first predefined non-zero resolution factor;
instructions for down-sampling the image to generate a second array of image
data representing a smaller image having a lower resolution than the initial image;
instructions for applying the predefined family of transform filters to the second
array of image data so as to generate successive subbands of transform coefficients;
and
instructions for encoding the transform coefficients and storing the encoded
transform coefficients in a second file data structure in the predefined order
to facilitate decoding a plurality M of overlapping subsets of the second file
data structure for reconstructing the smaller image at a corresponding plurality
M of resolution levels 0 to M-1, each resolution level differing in resolution
from a neighboring resolution level by a second predefined non-zero resolution
factor;
wherein the first and second file data structures together store encoded transform
coefficients for reconstructing the initial image at a plurality of P of distinct
resolution levels, where P is greater than N and greater than M.
11. The computer program product of claim 10, wherein the image processing module
further includes instructions for storing in the second file data structure a link
to the first file data structure, wherein the link is suitable for accessing the
first file data structure while viewing the second file data structure on a client computer.
12. The computer program product of claim 11, wherein the image processing module
further includes instructions for embedding in the second file data structure a
computer program for decoding the second file data structure so as to reconstruct
the second image.
13. The computer program product of claim 11, wherein the image processing module
further includes instructions for embedding in the second file data structure a
link to a computer program for decoding the second file data structure so as to
reconstruct the second image.
14. The computer program product of claim 10, wherein the instructions for encoding
include instructions for dividing the encoded values for at least one of the subbands
of transform coefficients into two or more portions in accordance with one or more
bit plane thresholds, each portion containing a respective distinct range of bit
planes of the encoded values for the at least one subband of transform coefficients; and
instructions for storing, for the at least one subband of transform coefficients,
a most significant portion of the encoded values in a first contiguous region of
an image file, and a least significant portion of the encoded values in a second
contiguous region of the image file.
15. A computer program product for use in conjunction with a computer system,
the computer program product comprising a computer readable storage medium and
a computer program mechanism embedded therein, the computer program mechanism comprising:
an image processing module that applies a predefined family of transform filters
to the array of image data so as to generate successive subbands of transform coefficients;
the image processing module including:
instructions for encoding the transform coefficients and storing the encoded
transform coefficients in a first file data structure in a predefined reverse resolution
order to facilitate decoding a plurality N of overlapping subsets of the file for
reconstructing the initial image at a corresponding plurality N of resolution levels
0 to N-1, each resolution level i differing in resolution from a neighboring resolution
level by a factor of 4;
instructions for down-sampling the image to generate a second array of image
data representing a smaller image having a resolution 2
2N times lower
than the image;
instructions for applying the predefined family of transform filters to the second
array of image data so as to generate successive subbands of transform coefficients;
and
instructions for encoding the transform coefficients and storing the encoded
transform coefficients in a second file data structure in the predefined reverse
resolution order to facilitate decoding a plurality M of overlapping subsets of
the second file data structure for reconstructing the smaller image at a corresponding
plurality M of resolution levels 0 to M-1, each resolution level differing in resolution
from a neighboring resolution level by a factor of 4;
wherein the first and second file data structures together store encoded transform
coefficients for reconstructing the initial image at a series of N+M resolution
levels, each resolution level differing in resolution from a neighboring resolution
level by a factor of 4.
16. The computer program product of claim 15, wherein the image processing module
further includes instructions for storing in the second file data structure a link
to the first file data structure, wherein the link is suitable for accessing the
first file data structure while viewing the second file data structure on a client computer.
17. The computer program product of claim 16, wherein the image processing module
further includes instructions for embedding in the second file data structure a
computer program for decoding the second file data structure so as to reconstruct
the second image.
18. The computer program product of claim 16, wherein the image processing module
further includes instructions for embedding in the second file data structure a
link to a computer program for decoding the second file data structure so as to
reconstruct the second image.
Description
The present invention relates generally to the processing, compression, communication
and storage of images in computer systems, personal digital assistants, digital
cameras and other devices, and particularly to an image management system and method
in which digitally encoded images can be viewed in any specified window size and
at a number of resolutions, and can be printed, cropped, and otherwise manipulated.
BACKGROUND OF THE INVENTION
An image may be stored at a number of resolution levels. The encoded image data
for a lower resolution level is smaller, and thus takes less bandwidth to communicate
and less memory to store than the data for a higher resolution level. When an image
is stored for multi-resolution use, it would be desirable for the image data to
be segregated into an ordered group of sets or subfiles, where each additional
subfile provides the additional data needed to increase the resolution of the image
from one level to the next. Further, it would be desirable for the quantity of
image data in each subfile, for increasing the image resolution by a particular
factor (such as 4), to be approximately proportional to the associated increase
in resolution. For instance, if each resolution level differs from its neighboring
resolution levels by a factor of 4 (e.g., level 0: 32×32, level 1: 64×64,
level 2: 128×128, and so on), then the quantity of encoded image data for
each resolution level should be approximately 25% as much as the quantity of encoded
image data for the next higher resolution level. From another viewpoint, the quantity
of data in the subfile(s) used to increase the image resolution from a first level
to the next should, ideally, be approximately three times as much as the quantity
of data in the subfile(s) for the first level.
It is well known that wavelet compression of images automatically generates several
resolution levels. In particular, if N "layers" of wavelet transforms are applied
to an image, then N+1 resolution levels of data are generated, with the last LL
subband of data comprising the lowest resolution level and all the subbands of
data together forming the highest resolution level. For convenience, the "layers"
of wavelet transforms will sometimes be called "levels". Each of these resolution
levels differs from its neighbors by a factor of two in each spatial dimension.
We may label these resolution levels as Level 0 for the lowest, thumbnail level
to Level N for the highest resolution level, which is the resolution of the final
or base image.
A first aspect of the present invention is based on two observations. The first
such observation is that, when using conventional as well as most proprietary data
compression and encoding methods, the quantity of data in the N levels generated
by wavelet compression tends to decrease in a geometric progression. For instance,
the quantity of data for resolution Level 0 is typically about 80% of the quantity
of data for resolution Level 1, whereas ideally it should about 25% of the quantity
of data for resolution Level 1. As a result, the data for Level 0 contains significantly
more data than is needed to display the Level 0 image. Alternately stated, the
data for Level 0 gives unnecessarily high quality for the low resolution display
at Level 0, and therefore gives less compression than could potentially be obtained
by providing only the information needed for displaying the image at the Level
0 resolution level.
The second observation is that the low resolution image data coefficients are
quantized for full resolution display, not for low resolution display, because
these data coefficients are used not only for generating a low resolution representation
of the image, but are also used when generating the higher resolution representations
of the image.
In accordance with this first aspect of the present invention, as already indicated
above, it would be desirable for the quantity of image data in the subarray or
subfile for each resolution level to be approximately proportional to the increase
in resolution associated with that resolution level.
A second aspect of the present invention is based on the observation that wavelet
transforms are conventionally applied across tile or block boundaries of an image
to avoid tile or block boundary artifacts in the regenerated image. A wavelet transform
may be implemented as a FIR (finite impulse response) filter having an associated
length. The "length" indicates the number of data samples that are used to generate
each coefficient. Wavelet transforms are generally symmetric about their center,
and when the filter that implements the wavelet transform is at the edge of a tile
or block, typically half or almost half of the filter will extend into a neighboring
block or tile. As a result it is usually necessary to keep not only part of the
neighboring tiles in memory while wavelet transforming a tile of an image, it also
necessary to keep in memory the edge coefficients of the neighboring tiles for
each level of the wavelet transform. Thus, avoiding tiling effects (also called
tile border effects or artifacts or edge artifacts) typically increases the memory
requirements of the computer or device performing the wavelet transforms on an
image, and may also increase the complexity of the transform procedure because
of the need to keep track of the memory locations of edge data and coefficients
from the neighboring tiles or blocks. In accordance with the second aspect of the
present invention, it would be highly desirable to have a wavelet or wavelet-like
transform that can be applied to just the data for the image block being processed,
without having to also apply the transform to data from neighboring blocks, and
without creating noticeable edge artifacts. Having such a transform would decrease
memory requirements and might simplify the wavelet compression of images.
It is well known in the prior art that digital images can be processed a portion
at a time, instead of all at once, thereby reducing memory requirements. For instance,
the DCT transform used for JPEG compression and encoding of images is traditionally
used on tiles of 8×8 pixels. However, a well known problem with tiling an
image for processing is that the tiling produces undesirable tile border effects.
The border effects of DCT tiling in JPEG images are considered to be acceptable
because the very small size of the tiles makes the tiling effect relatively unnoticeable
to the human eye.
However, using very small tiles such as 8×8 pixels is not practical
when using wavelet or wavelet-like transforms in place of the DCT transform. Wavelet-like
transforms have been shown to provide significantly better data compression than
the DCT transform, and therefore using wavelet-like transforms would be desirable
if the tiling effect can be avoided while using a moderate amount of working memory.
It would therefore be desirable to provide an image processing system and method
that process images using a moderate amount of working memory, such as 8 to 20
KB, by transforming the image data using a wavelet-like transform with moderately
sized tiles, such as tiles of 64×64, or 32×32, or 64×32 pixels,
while at the same time avoiding the generation of undesirable tiling (tile border) effects.
A third aspect of the present invention is based on the observation that the
optimal
quantization level to be applied to wavelet coefficients not only varies from one
transform subband to another, but also varies from one region of an image to another.
In particular, regions of an image that contain many "features" (typically characterized
by horizontal or vertical lines or edges) are harder to compress than regions with
fewer features. That is, such densely featured image regions cannot be compressed
as much as less densely featured regions without causing degradation in the quality
of the image regions regenerated from the compressed data. It would therefore be
desirable to provide an image compression and encoding system with a quantization
procedure that uses smaller quantization divisors to quantize the wavelet coefficients
of heavily featured regions than the quantization divisors used to quantize the
wavelet coefficients of regions having fewer features.
SUMMARY OF THE INVENTION
In summary, the present invention is an image processing system and method for
applying a family of predefined transforms, such as wavelet-like transforms, to
the image data for an image so as to generate transform image data and for applying
a data compression method to the transform image data so as to generate an image
file. The image processing system and method tiles a captured image, processing
the tiles in a predefined order. The tiles are nonoverlapping portions of the image
data. Each tile of image data is processed by applying a predefined family of transform
layers to the tile of image data so as to generate successive sets of transform
coefficients. In a preferred embodiment, the transform layers are successive applications
of a family of wavelet-like decomposition transforms, including edge filters applied
to data at the boundaries of the data arrays being processed and interior filters
applied to data in the interior regions of the data arrays.
The set of transform coefficients processed by each transform layer include edge
coefficients positioned at outside boundaries of the set of transform coefficients
and non-edge coefficients positioned at interior locations of the set of transform
coefficients. The sets of transform coefficients include a last set of transform
coefficients, produced by the last transform layer, and one or more earlier sets
of transform coefficients.
The transform filters used include one or more edge transform filters applied
to image data at boundaries of the tile and to coefficients positioned at and near
boundaries of each of the earlier sets of transform coefficients so as to generate
the edge coefficients, and one or more interior filters applied to image data at
interior locations of the tile and to coefficients at interior locations of the
earlier sets of transform coefficients so as to generate the non-edge coefficients.
The edge transform filters have shorter filter supports than the interior transform
filters, and both the edge transform filters and the longer interior transform
filters are applied only to image data within the tile and only to transform coefficients
within the earlier sets of transform.
The edge filters include a short, low spatial frequency filter that weights the
image datum closest to the boundary of the tile and the transform coefficient closest
to the boundary of each earlier set of transform coefficients so as to as enable
regeneration of the image from the transform coefficients without tile boundary artifacts.
At least some of the transform filters are preferably asymmetric boundary filters,
extending to a first extent toward each tile's boundary, and extending to a second,
longer extent in a direction away from the tile's boundary, but not extending over
the tile's boundary.
In a preferred embodiment, the interior transform filters include a center filter,
for generating two to four high pass and two to four low pass coefficients at or
near the center of the data array being processed. The center filter acts as a
filter switch. Two distinct forms of the interior filter are used on alternate
sides of the center filter. For instance, the interior filter may be centered on
even data positions on one side of the center filter and centered on odd data positions
on the other side of the center filter.
The image processing system and method may also include image reconstruction
circuitry or procedures for successively applying a data decompression method and
an inverse transform to the image file so as to generate a reconstructed image
suitable for display on an image viewer.
In a second aspect of the present invention, the sets of transform coefficients
correspond to spatial frequency subbands of the image. The subbands are grouped
in accordance with the transform layer that generated them. For one or more respective
groups of subbands, for each tile of the image, one or more parameters are generated
whose value is indicative of the density of image features in the tile. Each tile
of the image is classified into one of a predefined set of categories in accordance
with the values of the one or more parameters. Based on the classification of the
tile, a set of quantization factors for the tile are selected, and then the transform
coefficients of the tile are scaled by the selected set of quantization factors
to as to generate a set of quantized transform coefficients for the tile.
In a third aspect of the present invention the quantized transform coefficients
are encoded. While the coefficients for each group of spatial frequency subbands
are being encoded, a most significant set of bit planes of those coefficients are
stored in a first bitstream and the remaining least significant set of bit planes
of the coefficients are stored in a second bitstream. From another viewpoint, the
portions of the encoded coefficients (for a group of subbands) whose value exceeds
a predefined threshold are stored in a first bitstream while the remaining portion
of the encoded coefficients are stored in a second bitstream. When reconstructing
an image from the image file at a specified resolution level, only the bitstreams
corresponding to the specified resolution level are decoded and used to reconstruct
the image. For some resolution levels, one or more of the bitstreams not used will
contain the least significant portions (i.e., bit planes) of subbands whose more
significant portions are contained in the bitstreams used to reconstruct the image
at that resolution level.
BRIEF DESCRIPTION OF THE DRAWINGS
Additional objects and features of the invention will be more readily
apparent from the following detailed description and appended claims when taken
in conjunction with the drawings, in which:
FIG. 1 is a block diagram of a distributed computer system, including a web
server and a number of client computers, for distributing multi-resolution images
to the client computers.
FIG. 2 is a block diagram of a computer system in accordance with an embodiment
of the present invention.
FIG. 3A schematically depicts the process of transforming a raw image into a
transform image array and compressing the transform image array into a compressed
image file. FIG. 3B depicts a mapping of spatial frequency subbands to NQS subbands
used for encoding transform coefficients.
FIG. 4 is a conceptual representation of the encoded data that represents an
image, organized to facilitate multi-resolution regeneration of the image (i.e.,
at multiple resolution levels).
FIGS. 5A, 5B, 5C, 5D and 5E depict image storage
data structures.
FIG. 6 is a high level flow chart of an image processing process to which the
present invention can be applied.
FIGS. 7A, 7B and 7C graphically depict a forward and inverse
wavelet-like data transformation procedure.
FIG. 8 depicts the spatial frequency subbands of wavelet coefficients generated
by applying multiple layers of a decomposition wavelet or wavelet-like transform
to an array of image data.
FIG. 9 depicts a flow chart of a block classification method for selecting a
set of quantization divisors for a block of an image.
FIGS. 10A and 10B depict a flow chart of a procedure for encoding the transform
coefficients for a block of an image.
FIGS. 11A, 11B and 11C depict a method of encoding values, called
MaxbitDepth values in a preferred embodiment, which represent the number of bits
required to encode the transform coefficients in each block and subblock of an
encoded image.
FIG. 12 is a high level flow chart of a compressed image reconstruction process
to which the present invention can be applied.
FIGS. 13A and 13B depict a flow chart of a procedure for decoding the transform
coefficients for an image and for reconstructing an image from the coefficients.
FIG. 14 is a block diagram of a digital camera in which one or more aspects
of the present invention are implemented.
FIG. 15 is a conceptual flow chart of a client computer downloading a thumbnail
image, then zooming in on the image, and then panning to a new part of the image.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
In this document, the terms "wavelet" and "wavelet-like" are used interchangeably.
Wavelet-like transforms generally have spatial frequence characteristics similar
to those of conventional wavelet transforms, and are losslessly reversible, but
have shorter filters that are more computationally efficient.
The present invention may be implemented in a variety of devices that process
images, including a variety of computer systems, ranging from high end workstations
and servers to low end client computers, as well as in application specific dedicated
devices, such as digital cameras.
System for Encoding and Distributing Multi-Resolution Images
FIG. 1 shows a distributed computer system, including a web server 140
and a number of client computers 120, for distributing multi-resolution
images 190 to the client computers via a global communications network 110,
such as the Internet, or any other appropriate communications network, such as
a local area network or Intranet. An imaging encoding workstation 150 prepares
multi-resolution image files for distribution by the web server. In some embodiments,
the web server 140 may also perform the image encoding tasks of the image
encoding workstation 150.
A typical client device 120 will be a personal digital assistant, personal
computer, workstation, or a computer controlled device dedicated to a particular
task. The client device 120 will preferably include a central processing
unit 122, memory 124 (including high speed random access memory,
and non-volatile memory such as disk storage), and a network interface or other
communications interface 128 for connecting the client device to the web
server via the communications network 110. The memory 124 will typically
store an operating system 132, a browser application or other image viewing
application 134, an image decoder module 180, and multi-resolution
image files 190 encoded in accordance with the present invention. In a preferred
embodiment, the browser application 134 includes or is coupled to a Java™
(trademark of Sun Microsystems, Inc.) virtual machine for executing Java language
programs, and the image decoder module is implemented as a Java™ applet
that is dynamically downloaded to the client device along with the image files
190, thereby enabling the browser to decode the image files for viewing.
The web server 140 will preferably include a central processing unit 142,
memory 144 (including high speed random access memory, and non-volatile
memory such as disk storage), and a network interface or other communications interface
148 for connecting the web server to client devices and to the image encoding
workstation 150 via the communications network 110. The memory 144
will typically store an http server module 146 for responding to http requests,
including request for multi-resolution image files 190. The web server 140
may optionally include an image processing module 168 with encoding procedures
172 for encoding images as multi-resolution images.
Computer System
Referring to FIG. 2, the image processing workstation 150 may be
implemented using a programmed general-purpose computer system. This figure may
also represent the web server, when the web server performs image processing tasks.
The computer system 150 may include:
- one or more data processing units (CPU's) 152;
- memory 154, which will typically include both high speed random access
memory as well as non-volatile memory;
- a user interface 156, including a display device 157 such as a CRT or
LCD type display;
- a network or other communication interface 158 for communicating with
other computers as well as other devices;
- a data port 160, such as for sending and receiving images to and from
a digital camera (although such image transfers might also be accomplished via
the network interface 158); and
- one or more communication busses 161 for interconnecting the CPU(s)
152, memory 154, user interface 156, network interface 158 and data port 160.
The computer system's memory 154 stores procedures and data, typically including:
- an operating system 162 for providing basic system services;
- a file system 164, which may be part of the operating system;
- application programs 166, such as user level programs for viewing and
manipulating images,
- an image processing module 168, for performing various image processing
functions including those that are the subject of the present document;
- image files 190 representing various images; and
- temporary image data arrays 192 for intermediate results generated during
image processing and image regeneration.
The computer 150 may also include a http server module 146 (FIG.
1) when this computer 150 is used both for image processing and distribution
of multi-resolution images.
The image processing module 168 may include an image encoder module 170,
and an image decoder module 180. The image encoder module 170 produces
multi-resolution image files 190, the details of which will be discussed
below. The image encoder module 170 may include:
- an encoder control program 172, which controls the process of compressing
and encoding an image (starting with a raw image array 189, which in turn may be
derived from the decoding of an image in another image file format);
- a set of wavelet-like transform procedures 174 for applying wavelet-like
filters to image data representing an image;
- a block classifier procedure 176 for determining the quantization divisors
to be applied to each block (or band) of transform coefficients for an image;
- a quantizer procedure 178 for quantizing the transform coefficients
for an image; and
- a sparse data encoding procedure 179, also known as an entropy encoding
procedure, for encoding the quantized transform coefficients generated by the quantizer 178.
The procedures in the image processing module 168 store partially transformed
images and other temporary data in a set of temporary data arrays 192.
The image decoder module 180 may include:
- a decoder control program 182 for controlling the process of decoding
an image file (or portions of the image file) and regenerating the image represented
by the data in the image file;
- a sparse data decoding procedure 184 for decoding the encoded, quantized
transform coefficients stored in an image file into a corresponding array of quantized.
transform coefficients;
- a de-quantizer procedure 186 for dequantizing a set of transform coefficients
representing a tile of an image; and
- a set of wavelet-like inverse transform procedures 188 for applying
wavelet-like inverse filters to a set of dequantized transform coefficients, representing
a tile of an image, so as to regenerate that tile of the image.
Overview of Image Capture and Processing
Referring to FIG. 3, raw image data 200, obtained from a digital
camera's image capture mechanism (FIG. 14) or from an image scanner or other device,
is processed by "tiling the image data." More specifically, the raw image is treated
as an array of tiles 202, each tile having a predefined size such as 64×64
(i.e., 64 rows by 64 columns). In other embodiments, other tile sizes, such as
32×32 or 16×32 or 128×128 or 64×128 may be used. The tiles
are nonoverlapping portions of the image data. A sufficient number of tiles are
used to cover the entire raw image that is to be processed, even if some of the
tiles overhang the edges of the raw image. The overhanging portions of the tiles
are filled with copies of boundary data values during the wavelet transform process,
or alternately are filled with null data. Tile positions are specified with respect
to an origin at the upper left corner of the image, with the first coordinate indicating
the Y position of the tile (or a pixel or coefficient within the tile) and the
second coordinate indicating the X position of the tile (or a pixel or coefficient
within the tile). Thus a tile at position 0,128 is located at the top of the image,
and has its origin at the 128
th pixel of the top row of pixels.
A wavelet or wavelet-like decomposition transform is successively applied to
each
tile of the image to convert the raw image data in the tile into a set of transform
coefficients. When the wavelet-like decomposition transform is a one dimensional
transform that is being applied to a two dimensional array of image data, the transform
is applied to the image data first in one direction (e.g., the horizontal direction)
to produce an intermediate set of coefficients, and then the transform is applied
in the other direction (e.g., the vertical direction) to the intermediate set of
coefficients so as to produce a final set of coefficients. The final set of coefficients
are the result of applying the wavelet-like decomposition transform to the image
data in both the horizontal and vertical dimensions.
The tiles are processed in a predetermined raster scan order. For example, the
tiles in a top row are processed going from one end (e.g., the left end) to the
opposite end (e.g., the right end), before processing the next row of tiles immediately
below it, and continuing until the bottom row of tiles of the raw image data has
been processed.
The transform coefficients for each tile are generated by successive applications
of a wavelet-like decomposition transform. A first application of the wavelet decomposition
transform to an initial two dimensional array of raw image data generates four
sets of coefficients, labeled LL, HL1, LH1 and HH1. Each succeeding
application of the wavelet decomposition transform is applied only to the LL set
of coefficients generated by the previous wavelet transformation step and generates
four new sets of coefficients, labeled LL, HLx, LHx and HHx, where x represents
the wavelet transform "layer" or iteration. After the last wavelet decomposition
transform iteration only one LL set remains. The total number of coefficients generated
is equal to the number of data samples in the original data array. The different
sets of coefficients generated by each transform iteration are sometimes called
layers. The number of wavelet transform layers generated for an image is typically
a function of the resolution of the initial image. For tiles of size 64×64,
or 32×32, performing five wavelet transformation layers is typical, producing
16 spatial frequency subbands of data: LL
5, HL
5, LH
5,
HH
5, HL
4, LH
4, HH
4, HL
3,
LH
3, HH
3, HL
2, LH
2, HH
2,
HL
1, LH
1, HH
1. The number of transform layers
may vary from one implementation to another, depending on both the size of the
tiles used and the amount of computational resources available. For larger tiles,
additional transform layers would likely be used, thereby creating additional subbands
of data. Performing more transform layers will often produce better data compression,
at the cost of additional computation time, but may also produce additional tile
edge artifacts.
The spatial frequency subbands are grouped as follows. Subband group 0
corresponds to the LL
N subband, where N is the number of transform layers
applied to the image (or image tile). Each other subband group i contains three
subbands, LH
i, HL
i and HH
i. As will be described
in detail below, when the transform coefficients for a tile are encoded, the coefficients
from each group of subbands are encoded separately from the coefficients of the
other groups of subbands. In a preferred embodiment, a pair of bitstreams is generated
to represent the coefficients in each group of subbands. One of the bitstreams
represents the most significant bit planes of the coefficients in the group of
subbands while the second bitstream represents the remaining, least significant
bit planes of the coefficients for the group of subbands.
The wavelet coefficients produced by application of the wavelet-like transform
are preferably quantized (by quantizer 178) by dividing the coefficients
in each subband of the transformed tile by a respective quantization value (also
called the quantization divisor). In the preferred embodiment, a separate quantization
divisor is assigned to each subband. More particularly, as will be discussed in
more detail below, a block classifier 176 generates one or more values representative
of the density of features in each tile of the image, and based on those one or
more values, a table of quantization divisors is selected for quantizing the coefficients
in the various subbands of the tile.
The quantized coefficients produced by the quantizer 178 are encoded by
a sparse data encoder 179 to produce a set of encoded subimage subfiles
210 for each tile of the image. Details of the wavelet-like transforms used
in a preferred embodiment are described in detail below. Circuitry for performing
the wavelet-like transform of the preferred embodiment is very similar to the wavelet
transform and data quantization methods described in U.S. Pat. No. 5,909,518, "System
and Method for Performing Wavelet and Inverse Wavelet Like Transformations of Digital
Data Using Only Add and Bit Shift Arithmetic Operations," which is hereby incorporated
by reference as background information.
The sparse data encoding method of the preferred embodiment is called Nested
Quadratic Splitting (NQS), and is described in detail below. This sparse data encoding
method is an improved version of the NQS sparse data encoding method described
in U.S. Pat. No. 5,949,911, entitled "System and Method for Scalable Coding of
Sparse Data Sets," which is hereby incorporated by reference as background information.
FIG. 3B depicts a mapping of spatial frequency subbands to NQS subbands used
for encoding transform coefficients. In particular, in one preferred embodiment
seven spatial frequency subbands (LL
5, HL
5, LH
5,
HH
5, HL
4, LH
4, and HH
4) are mapped
to a single NQS subband (subband 0) for purposes of encoding the coefficients
in these subbands. In other words, the coefficients in these seven spatial frequency
subbands are treated as a single top level block for purposes of NQS encoding.
In one preferred embodiment, NQS subbands 0, 1, 2 and 3
are encoded as four top level NQS blocks, the most significant bit planes of which
are stored in a bitstream representing a lowest resolution level of the image in question.
Image Resolution Levels and Subimages
Referring to FIG. 4, an image is stored at a number of resolution levels
0 to N, typically with each resolution level differing from its neighbors
by a resolution factor of four. In other words, if the highest resolution representation
(at resolution level N) of the image contains X amount of information, the second
highest resolution level representation N-1 contains X/4 amount of information,
the third highest resolution level representation contains X/16 amount of information,
and so on. The number of resolution levels stored in an image file will depend
on the size of the highest resolution representation of the image and the minimum
acceptable resolution for the thumbnail image at the lowest resolution level. For
instance, if the full or highest resolution image is a high definition picture
having about 16 million pixels (e.g., a 4096×4096 pixel image), it might be
appropriate to have seven resolution levels: 4096×4096, 2048×2048, 1024×1024,
512×512, 256×256, 128×128, and 64×64.
However, as shown in FIG. 4, one feature or aspect of the present invention
is that when a multi-resolution image has more than, say, three or four resolution
levels, the image is encoded and stored in multiple "base image" files, each of
which contains the data for two to four of the resolution levels. Alternately,
all the base images may be stored in a single file, with each base image being
stored in a distinct base image subfile or subfile data structure within the image file.
Each base image file (or subfile) contains the data for reconstructing a "base
image" and one to three subimages (lower resolution levels). For instance, in the
example shown in FIG. 4, the image is stored in three files, with a first file
storing the image at three resolution levels, including the highest definition
level and two lower levels, a second file stores the image at three more resolution
levels (the fourth, fifth and sixth highest resolution levels) and a third file
stores the image at the two lowest resolution levels, for a total of eight resolution
levels. Generally, each successive file will be smaller than the next larger file
by a factor of about 2
2X where X is the number of resolution levels
in the larger file. For instance, if the first file has three resolution levels,
the next file will typically be smaller by a factor of 64 (2
6).
As a result, an image file representing a group of lower resolution levels will
be much smaller, and thus much faster to transmit to a client computer, than the
image file containing the full resolution image data. For instance, a user of a
client computer might initially review a set of thumbnail images, at a lowest resolution
level (e.g., 32×32 or 64×64), requiring the client computer to review
only the smallest of the three image files, which will typically contain about
0.024% as much data as the highest resolution image file. When the user requests
to see the image at a higher resolution, the client computer may receive the second,
somewhat larger image file, containing about 64 times as much data as the lowest
resolution image file. This second file may contain three resolution levels (e.g.,
512×512, 256×256, and 128×128), which may be sufficient for the
user's needs. In the event the user needs even high resolution levels, the highest
resolution file will be sent. Depending on the context in which the system is used,
the vendor of the images may charge additional fees for downloading each successively
higher resolution image file.
It should be noted that many image files are not square, but rather are rectangular,
and that the square image sizes used in the above examples are not intended to
in any way limit the scope of the invention. While the basic unit of information
that is processed by the image processing modules is a tile, which is typically
a 64×64 or 32×32 array of pixels, any particular image may include an
arbitrarily sized array of such tiles. Furthermore, the image need not be an even
multiple of the tile size, since the edge tiles can be truncated wherever appropriate.
The designation of a particular resolution level of an image as the "thumbnail"
image may depend on the client device to which the image is being sent. For instance,
the thumbnail sent to a personal digital assistant or mobile telephone, which have
very small displays, may be much smaller than (for example, one sixteenth the size
of) the thumbnail that is sent to a personal computer; and the thumbnail sent to
a device having a large, high definition screen may be much larger than the thumbnail
sent to a personal computer having a display of ordinary size and definition. When
an image is to be potentially used with a variety of client devices, additional
base images are generated for the image so that each type of device can initially
receive an appropriately sized thumbnail image.
When an image is first requested by a client device, the client device may specify
its window size in its request for a thumbnail image, or the server may determine
the size of the client device's viewing window by querying the client device prior
to downloading the thumbnail image data to the client device. As a result, each
client device receives an minimum resolution thumbnail that is appropriately sized
for that device.
Image File Data Structures
Referring to FIGS. 5A through 5E, when all the tiles of an image have been
transformed, compressed and encoded, the resulting encoded image data is stored
as an image file 190. The image file 190 includes header data 194
and a sequence of base image data structures, sometimes called base image subfiles
196. Each base image subfile 196 typically includes the data for
displaying the image at two or more resolution levels. Furthermore, each base image
supports a distinct range of resolution levels. The multiple base images and their
respective subimages together provide a full range of resolution levels for the
image, as conceptually represented in FIG. 4. While the resolution levels
supported by the base image levels are non-overlapping in the preferred embodiment,
in an alternate embodiment the resolution levels supported by one base image may
overlap with the resolution levels supported by another base image (for the same
initial full resolution image).
In the preferred embodiment, each image file 190 is an html file or similarly
formatted web page that contains a link 198, such as an object tag or applet
tag, to an applet 199 (e.g., a Java™ applet) that is automatically
invoked when the file is downloaded to a client computer. The header 194
and a selected one of the base images 196 are used as data input to the
embedded applet 199, which decodes and renders the image on the display
of user's personal digital assistant or computer. The operation of the applet is
transparent to the user, who simply sees the image rendered on his/her computer
display. Alternately, the applet may present the user with a menu of options including
the resolution levels available with the base image subfile or subfiles included
in the image file, additional base image subfiles that may be available from the
server, as well as other options such as image cropping options.
In an alternate embodiment, the client workstations include an application, such
as a browser plug-in application, for decoding and rendering images in the file
format of the present invention. Further, each image file 210 has an associated
data type that corresponds to the plug-in application. The image file 210
is downloaded along with an html or similarly formatted web page that includes
an embed tag or object tag that points to the image file. As a result, when the
web page is downloaded to a client workstation, the plug-in application is automatically
invoked and executed by the client computer's. As a result, the image file is decoded
and rendered and the operation of the plug-in application is transparent to the user.
The image file 190-A shown in FIG. 5A represents one possible way of storing
a multi-resolution image, and is particularly suitable for storing a multi-resolution
image in a server. In a client computer, the image file 190-B as shown in
FIG. 5B may contain only one base image 196. In addition, the client version
of the image file 190 may contain a link 201 to the image file 190-A
in the server. The link 201 is used to enable a user of the client computer
to download other base images (at other resolution levels) of the same image. Alternately,
the link 201 is a Java™ (trademark of Sun Microsystems) script for
requesting an image file containing any of the higher resolution base images from
the web server. If there is a charge for obtaining the higher resolution image
file, the script will invoke the execution of the server procedure for obtaining
payment from the requesting user.
In yet another alternate embodiment, a multi-resolution image may be stored in
the server as a set of separate base image files 190-B, each having the
format shown in FIG. 5B. This has the advantage of providing image files
190-B that are ready for downloading to client computers without modification.
Referring to FIG. 5A again, the header 194 of the image file includes
the information needed to access the various base image subfiles 196. In
particular, in a preferred embodiment the header 194 stores:
- an identifier or the URL of the image