Title: Dynamic threshold system for multiple raster content (MRC) representation of documents
Abstract: A method and a system for dynamically thresholding an image signal. The system comprises a computing block. The computing block receives the image signal and a minimum and a maximum within each of a set of windows centered on the current pixel in the image signal, and computes, for each of the windows, based on the current pixel and the respective minimum and maximum, a respective indicator representing the distance and direction of the current pixel relative to a respective threshold plane, and outputs a control signal based on the indicators.
Patent Number: 6,859,204 Issued on 02/22/2005 to Curry, et al.
Inventors: Curry; Donald J. (Menlo Park, CA); Kletter; Doron (San Mateo, CA); Nafarieh; Asghar (Menlo Park, CA)
Assignee: Xerox Corporation (Stamford, CT)
Appl. No.: 188277
Filed: July 1, 2002
Current U.S. Class: 345/426; 382/253
Intern'l Class: G06T 005/00
Field of Search: 395/426,589,596,617; 382/100,165,170,176,253,260,300; 358/3.13,1.2
References Cited
U.S. Patent Documents
5966471 | Oct., 1999 | Fisher et al. | 382/253.
6075926 | Jun., 2000 | Atkins et al. | 358/1.
6252608 | Jun., 2001 | Snyder et al. | 345/473.
6714320 | Mar., 2004 | Nakahara et al. | 358/3.
6748111 | Jun., 2004 | Stolin et al. | 382/176.
6768808 | Jul., 2004 | Rhoads | 382/100.
Primary Examiner: Jankus; Almis
Attorney, Agent or Firm: Oliff & Berridge, PLC
Parent Case Text
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is related to the following co-pending
applications: Ser. No. 10/187,499 entitled "Digital De-Screening of
Documents", Ser. No. 10/188,026 entitled "Control System for Digital
De-Screening of Documents", Ser. No. 10/188,249 entitled "Segmentation
Method and System for Multiple Raster Content (MRC) Representation of
Documents", Ser. No. 10/188,157 entitled "Separation System for Multiple
Raster Content (MRC) Representation of Documents", all filed Jul. 1, 2002,
on the same date as the present application and commonly assigned to the
present assignee, the contents of which are herein incorporated by
reference.
Claims
What is claimed is:
1. A method for dynamically thresholding an image signal, the method
comprising the operations of:
(a) receiving, at a computing block, the image signal and a minimum and a
maximum within each of a set of windows centered on the current pixel in
the image signal;
(b) computing, for each of the windows, based on the current pixel and the
respective minimum and maximum, a respective indicator representing the
distance and direction of the current pixel relative to a respective
threshold plane; and
(c) outputting a control signal based on the indicators.
2. The method of claim 1 wherein operation (b) comprises the operations of:
(i) computing, for each of the windows, a respective contrast vector; and
(ii) computing, for each of the windows, a bias vector and a dot product of
the respective contrast vector and a respective thresholded pixel vector
representing the current pixel thresholded by the bias vector, the dot
product representing the respective indicator.
3. The method of claim 2 wherein, for one of the windows, operation (ii)
comprises the operation of computing the average between the respective
maximum and the respective minimum to form the bias vector.
4. The method of claim 2 wherein, for one of the windows, operation (ii)
comprises the operation of computing the average between a vector
representing a lowpass filtered neighborhood of the current pixel and the
average between the respective maximum and the respective minimum to form
the bias vector.
5. The method of claim 1 wherein operation (b) comprises:
(1) computing a first indicator for a first window, via a first logic
block, the first indicator representing the distance and direction of the
current pixel relative to a first threshold plane;
(2) computing a second indicator for a second window, via a second logic
block, the second indicator representing the distance and direction of the
current pixel relative to a second threshold plane;
(3) thresholding the current pixel in the image signal via a third logic
block and outputting a third logic block signal; and
wherein operation (c) comprises generating the control signals based on the
first and second indicators and the third logic block signal.
6. The method of claim 5 wherein operation (b) further comprises computing
a first activity measure indicating activity in the first window, using
the first logic block, and computing a second activity measure indicating
activity in the second window using the second logic block.
7. The method of claim 5 wherein operation (c) comprises:
comparing the first activity measure with the second activity measure and
outputting a select signal, using a comparator block; and
receiving the first indicator, the second indicator and the third logic
block signal, selecting and outputting one of the first indicator, the
second indicator and third logic block signal in accordance with the
select signal, using a multiplexer.
8. A system for dynamically thresholding an image signal, the system
comprising:
a computing block receiving the image signal and a minimum and a maximum
within each of a set of windows centered on the current pixel in the image
signal, and computing, for each of the windows, based on the current pixel
and the respective minimum and maximum, a respective indicator
representing the distance and direction of the current pixel relative to a
respective threshold plane, and outputting a control signal based on the
indicators.
9. The system of claim 8 wherein the computing block comprises:
a first module computing, for each of the windows, a respective contrast
vector; and
a second module computing, for each of the windows, a bias vector and a dot
product of the respective contrast vector and a respective thresholded
pixel vector representing the current pixel thresholded by the bias
vector, the dot product representing the respective indicator.
10. The system of claim 9 wherein, for one of the windows, the second
module computes the bias vector by computing the average between the
respective maximum and the respective minimum.
11. The system of claim 9 wherein, for one of the windows, the second
module computes the bias vector by computing the average between a vector
representing a lowpass filtered neighborhood of the current pixel and the
average between the respective maximum and the respective minimum.
12. The system of claim 8 wherein the computing block comprises:
(1) a first logic block computing a first indicator for a first window, the
first indicator representing the distance and direction of the current
pixel relative to a first threshold plane;
(2) a second logic block computing a second indicator for a second window,
the second indicator representing the distance and direction of the
current pixel relative to a second threshold plane;
(3) a third logic block thresholding the current pixel in the image signal
and outputting a third logic block signal; and
(4) a decision module generating the control signals in communication with
the first, second, and third logic blocks, the control signals being based
on the first and second indicators and the third logic block signal.
13. The system of claim 12 wherein the first logic block computes a first
activity measure indicating activity in the first window and wherein the
second logic block computes a second activity measure indicating activity
in the second window.
14. The system of claim 12 wherein the decision module comprises:
a comparator block comparing the first activity measure with the second
activity measure and outputting a select signal; and
a multiplexer receiving the first indicator, the second indicator and the
third logic block signal, selecting and outputting one of the first
indicator, the second indicator and third logic block signal in accordance
with the select signal.
15. An article of manufacture comprising:
a machine usable medium having program code embedded therein, the program
code being used for dynamically thresholding an image signal, the program
code comprising:
(a) machine readable code to receive the image signal and a minimum and a
maximum within each of a set of windows centered on the current pixel in
the image signal;
(b) machine readable code to compute, for each of the windows, based on the
current pixel and the respective minimum and maximum, a respective
indicator representing the distance and direction of the current pixel
relative to a respective threshold plane; and
(c) machine readable code to output a control signal based on the
indicators.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to methods and systems for
segmenting digitally scanned documents into two or more planes, and more
particularly to methods and systems for segmenting digitally scanned
documents into planes suitable for a Multiple Raster Content (MRC)
representation of documents.
2. Description of Related Art
The MRC representation of documents is versatile. It provides the ability
to represent color images and either color or monochrome text. The MRC
representation enables the use of multiple "planes" for the purpose of
representing the content of documents. The MRC representation is becoming
increasingly important in the marketplace. It has already been established
as the main color-fax standard.
In an MRC representation, an image is represented by more than one image
plane. The main advantage of the MRC representation of documents is to
provide an efficient way to store, transmit, and manipulate large digital
color documents. The method exploits the properties of the human vision
system, where the ability to distinguish small color variations is greatly
reduced in the presence of high-contrast edges. The edge information is
normally separated from the smoothly varying color information, and
encoded (possibly at higher resolution than 1 bit per pixel) in one of the
planes, called the Selector plane. Following a careful separation, the
various planes could be independently compressed using standard
compression schemes (such as JPEG and G4) with good compression and high
quality at the same time.
There is a need for a method and a system for efficiently separating an
image into a set of planes, such that the advantages of the MRC
representation can be fully exploited.
SUMMARY OF THE INVENTION
A method and a system for dynamically thresholding an image signal are
disclosed. The system comprises a computing block. The computing block
receives the image signal and a minimum and a maximum within each of a set
of windows centered on the current pixel in the image signal, and
computes, for each of the windows, based on the current pixel and the
respective minimum and maximum, a respective indicator representing the
distance and direction of the current pixel relative to a respective
threshold plane, and outputs a control signal based on the indicators.
BRIEF DESCRIPTION OF THE DRAWINGS
The features and advantages of the present invention will become apparent
from the following detailed description of the present invention in which:
FIG. 1 illustrates the MRC structure for documents.
FIG. 2 shows the block diagram of the system of the present invention.
FIG. 3 shows the block diagram of an embodiment of the system of the
present invention.
FIG. 4 illustrates the function of the Dependent Min-Max block E1 used in
one embodiment of the system of the present invention.
FIG. 5 illustrates the function of the Dependent Min-Max Sub-Sample block
E2 used in one embodiment of the system of the present invention.
FIG. 6 illustrates the functions of the Dependent Max block E3 and
Dependent Min block E4 used in one embodiment of the system of the present
invention.
FIG. 7 illustrates the two window contexts employed by one embodiment of
the Dynamic Threshold module.
FIG. 8 shows the block diagram of one embodiment of the Dynamic Threshold
module.
FIG. 9 shows an implementation of the comparator logic block included in
one embodiment of the Dynamic Threshold module.
FIG. 10 shows the truth table of the comparator logic block of FIG. 9.
FIG. 11 shows an implementation of the selector logic module included in
one embodiment of the Dynamic Threshold module.
FIG. 12 illustrates the function of the Edge Processing block included in
the Separation module.
FIG. 13 illustrates the decision range used by the Separation module for
separating the image signal into the Background and Foreground planes.
FIG. 14 shows a block diagram of one implementation of the FG/BG Cleanup
block included in one embodiment of the Separation module.
FIG. 15 illustrates the dilate operation used in one implementation of the
FG/BG Cleanup block included in one embodiment of the Separation module.
FIG. 16 is a graphical illustration of equations (1) through (4).
FIG. 17 is a graphical illustration of equations (6) through (9).
FIG. 18 shows an exemplary structure of the halftone estimate module.
FIG. 19 shows a min-max detection scheme used by the min-max detection
modules included in the halftone estimate module of FIG. 18.
FIG. 20 illustrates the equations that implement the halftone weight module
included in the halftone estimate module.
DETAILED DESCRIPTION OF THE INVENTION
The present invention provides a method and a system for separating an
image signal into a set of image planes. The image signal represents a
digitally scanned document. The image planes are suitable for a Mixed
Raster Content (MRC) representation of the digitally scanned document.
FIG. 1 shows the general MRC representation. The representation comprises
up to four independent planes: Foreground, Background, Selector, and
Rendering Hints. In the most general case, there could be multiple
Foreground and Selector pairs at higher levels. However, in most
applications, the representation is limited to three or four planes. The
Background plane is typically used for storing continuous-tone information
such as pictures and/or smoothly varying background colors. The Selector
plane normally holds the image of text (binary) as well as other edge
information (e.g., line art drawings). The Foreground plane usually holds
the color of the corresponding text and/or line art. However, the MRC
representation only specifies the planes and their associated compression
methods. It does not otherwise restrict nor enforce the content of each of
the planes. The content of each of the planes may be defined appropriately
by an implementation of the MRC representation.
The MRC structure also allows for a fourth plane, the Rendering Hints
plane, which is used for communicating additional information about the
content of the document. For example, the Rendering Hints plane may carry
the ICC (International Color Consortium) color hints that identify the
best color matching strategy for the various objects on the page.
The Foreground and Background planes are defined to be two full-color (L,
a, b) planes. The Selector plane is defined as a binary (1-bit deep)
plane. The Rendering Hints plane is typically restricted to an 8-bit
plane. One exemplary MRC representation specifies that the Foreground and
Background are to be JPEG compressed, and that the Selector plane is to be
ITU-G4 compressed (standard Group 4 facsimile compression). The Rendering
Hints plane is considered to be optional, but if one is used, a
compression scheme similar to the Lempel-Ziv-Welch scheme may be used for
its compression. In general, the Foreground, Background, Selector and
Rendering Hints planes can all be at different resolutions, and they are
not required to maintain the original source input resolution.
A "segmented" MRC image is reassembled from its components (i.e., planes)
by "pouring" the Foreground colors through the Selector
plane "mask" on top of the Background plane, thus overwriting the previous
content of the Background plane at these locations. In other words, the
assembly is achieved by multiplexing between the Foreground and Background
information on a pixel by pixel basis, based on the binary control signal
of the Selector plane. For example, if the Selector value is 1, the
content of Foreground is used; otherwise (i.e., for Selector value=0) the
content of Background is used. The multiplexing operation is repeated on a
pixel by pixel basis until all of the output pixels have been defined.
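By way of illustration only, the multiplexing operation described above may
be sketched as follows. The function name assemble_mrc is hypothetical, and
the sketch assumes that all three planes have already been brought to the
same resolution and are held as nested lists, which the MRC representation
does not require.

```python
def assemble_mrc(foreground, background, selector):
    """Rebuild an image by multiplexing Foreground and Background pixels.

    Assumption: the three planes share one resolution. foreground and
    background hold (L, a, b) tuples; selector holds 0 or 1 per pixel.
    """
    output = []
    for y, selector_row in enumerate(selector):
        row = []
        for x, selector_value in enumerate(selector_row):
            # Selector value 1 "pours" the Foreground color over the
            # Background; Selector value 0 keeps the Background pixel.
            if selector_value == 1:
                row.append(foreground[y][x])
            else:
                row.append(background[y][x])
        output.append(row)
    return output
```

In a real decoder the planes would first be decompressed and resampled to a
common output resolution before this step.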
The main advantage of the MRC representation of documents is to provide an
efficient way to store, transmit, and manipulate large digital color
documents. The method exploits the properties of the human vision system,
where the ability to distinguish small color variations is greatly reduced
in the presence of high-contrast edges. The edge information is normally
separated from the smoothly varying color information, and encoded
(possibly at higher resolution than 1 Selector sample per source pixel) in
the Selector plane. Following a careful separation, the various planes
could be independently compressed using standard compression schemes (such
as JPEG and G4) with good compression and high quality at the same time.
The Segmentation system of the present invention is used for splitting an
incoming image into three or more planes suitable for an MRC
representation of the image.
FIG. 2 shows a block diagram of the Segmentation system of the present
invention. Segmentation system 200 comprises a Min-Max module 202, a
Dynamic Threshold module 204 and a Separation module 206. The Min-Max
module 202 receives the image signal DSC and searches for minima and maxima
within a set of windows centered on a pixel in the image signal. The
Dynamic Threshold module 204 computes, for each of the windows, based on
the minima and maxima received from the min-max module and the current
pixel, a respective indicator representing the distance and direction of
the current pixel relative to a respective threshold plane, and outputs a
control signal based on the indicators. The Separation module 206
separates the image signal into the set of image planes in accordance with
the control signal by including a representation of the current pixel in
at least one of the image planes.
FIG. 3 shows a block diagram of one embodiment 300 of the Segmentation
system 200.
For best performance of the Segmentation system 300, the input signal DSC
should be free of most of the original mid-frequency halftone patterns of
the original scanned image. These halftone frequencies are typically
eliminated by passing the input image through a de-screen system first.
However, in some situations, such as for clean PDL (Page Description
Language) printing, the input signal may be known to be free of
problematic halftone frequencies. In such situations, the de-screen
operation is not needed and the clean input signal can be directly fed
into the Segmentation system.
For ease of explanation, in the description of the Segmentation system 300
herein, the source input image DSC, as well as the Foreground FG and
Background BG outputs, are all assumed to be full-color (L, a, b) planes,
while the Selector plane SEL output is binary (1-bit). It is understood
that these assumptions are not to be construed as limitations of the
applications of the present invention.
In general, the Foreground, Background, and Selector planes could all be at
different resolutions relative to the input image DSC. For example, the
Foreground and Background planes are typically down-sampled (for better
compression) while the Selector plane is typically up-sampled (for better
edge quality) from the original input resolution. The amount of up or down
sampling may be fully programmable under software control.
The Segmentation system 300 may also receive and use the optional estimated
frequency Halftone Weight HTW and full color Super Blur BLR_A signals when
they are available. These optional signals may be generated by a de-screen
or filtering system such as the one described in a co-pending patent
application. The optional full color Super Blur BLR_A signal may be
generated by lowpass filtering the image source signal with a filter that
has a very large filter span (i.e., very low cut-off frequency). The
optional estimated frequency Halftone Weight HTW will be described in
detail later in connection with FIG. 18, FIG. 19 and FIG. 20.
The Segmentation system 300 comprises a Min-Max module 310, a Dynamic
Threshold module 320, and a Separation module 330.
The Min-Max module 310 comprises a Dependent Min-Max block E1, a Dependent
Min-Max Sub-Sample block E2, a Dependent Max block E3 and a Dependent Min block E4. The
Min-Max module 310 receives the input image signal DSC (3-dimensional),
computes and outputs two sets of maximum and minimum vectors (Mx, Mn),
(MX, MN), each set corresponding to a different window.
The Dynamic Threshold module 320 receives the input image signal DSC, and
the vectors (Mx, Mn), (MX, MN) from the Min-Max module 310 and computes,
for each of the windows, based on the respective minimum and maximum
received from the min-max module and the current pixel, a respective
indicator representing the distance and direction of the current pixel
relative to a respective threshold plane, and outputs a control signal GRS,
based on the indicators, to the Separation module 330. The optional
control signals SEG, ENH may also be output. The Dynamic Threshold
module 320 also receives the optional estimated frequency Halftone Weight
HTW and full color Super Blur BLR_A signals when they are available.
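By way of illustration only, one way to compute the indicator for a single
window, following the dot-product formulation recited in claims 2 and 3, may
be sketched as follows. The function name threshold_indicator is
hypothetical. Taking the contrast vector as the difference between the
maximum and the minimum is an assumption of this sketch; the text above
states only that a contrast vector is computed for each window.

```python
def threshold_indicator(pixel, mn, mx):
    """Signed indicator of where an (L, a, b) pixel lies relative to the
    threshold plane of one window.

    Assumption: contrast vector = mx - mn. The bias vector is the average
    of mx and mn, as recited in claim 3.
    """
    contrast = [hi - lo for hi, lo in zip(mx, mn)]
    bias = [(hi + lo) / 2 for hi, lo in zip(mx, mn)]
    # The pixel "thresholded by the bias vector" is the pixel minus the bias.
    thresholded = [p - b for p, b in zip(pixel, bias)]
    # The dot product is positive on the maximum side of the plane,
    # negative on the minimum side, and zero on the plane itself.
    return sum(c * t for c, t in zip(contrast, thresholded))
```

Under these assumptions the sign of the result gives the direction of the
pixel relative to the threshold plane, and the magnitude grows with both the
distance from the plane and the contrast in the window.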
The Separation module 330 comprises a Selector Logic block E6, an Edge
Processing block E7, a FG/BG Separation block E8, and a FG/BG Cleanup
block E9. The Separation module 330 receives the image signal DSC, the
vectors Mx, Mn from the Min-Max module 310, the control signal GRS and the
optional control signals SEG, ENH from the Dynamic Threshold module 320,
and outputs the three signals BG, FG, SEL, which correspond to the
Background, Foreground, and Selector planes of an MRC representation of
the image DSC, respectively.
The Dependent Min-Max block E1 receives the input image signal DSC, and
searches in a 5.times.5 window centered on the current pixel of interest
for the minimum value (vector) Mn and maximum value (vector) Mx. The
vectors Mn and Mx represent the minimum and maximum in the window context
of 5.times.5 pixels. The meaning of these vectors will be described in
detail later.
The Dependent Min-Max Sub-Sample block E2 receives the input image signal
DSC, and searches for the minimum and maximum luminance value in each of
the non-overlapping 8.times.8 windows, and also provides the corresponding
chroma values at these locations. By using non-overlapping 8.times.8
windows, the Dependent Min-Max Sub-Sample block E2 effectively sub-samples
the minimum and maximum values by a factor of 8 in each direction, thus
reducing the overall bandwidth by a factor of 64. The sub-sampled outputs
are then fed to the two Dependent Min-Max blocks E3 and E4, which search
for the minimum and maximum vectors MN and MX over a 9.times.9 window
centered on the original (before sub-sampling) 8.times.8 window that
contains the current pixel of interest. Thus, the MN and MX vectors
correspond to the minimum of all the minima and the maximum of all the
maxima from the non-overlapping 8.times.8 windows, respectively. Due to
the sub-sampling (by 8) effect, the 9.times.9 window actually corresponds
to a window context of 72.times.72 pixels. It is noted that capital
letters are used for vectors MN and MX to distinguish them from the
vectors Mn and Mx (outputs of block E1) and to indicate that they
represent the minimum and maximum in the larger window context of
72.times.72 pixels overall.
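By way of illustration only, the search performed over the sub-sampled
outputs may be sketched as follows. The function name context_min_max and
its parameters are hypothetical. The sketch assumes that the comparison is
made on the luminance component only and that positions outside the
sub-sampled grid are clamped to the nearest valid entry; neither detail is
specified in the text above.

```python
def context_min_max(sub_min, sub_max, y, x, block=8, radius=4):
    """Return (MN, MX) for the pixel at (y, x).

    sub_min and sub_max are grids holding one (L, a, b) triplet per
    non-overlapping block. A radius of 4 gives a 9 x 9 search over the
    grid, i.e. a context of 9 * 8 = 72 source pixels per side.
    """
    rows, cols = len(sub_min), len(sub_min[0])
    by, bx = y // block, x // block  # grid cell containing the pixel
    mn = mx = None
    for gy in range(by - radius, by + radius + 1):
        for gx in range(bx - radius, bx + radius + 1):
            # Assumption: clamp at the grid borders.
            cy = min(max(gy, 0), rows - 1)
            cx = min(max(gx, 0), cols - 1)
            lo, hi = sub_min[cy][cx], sub_max[cy][cx]
            if mn is None or lo[0] < mn[0]:
                mn = lo  # minimum of all the minima
            if mx is None or hi[0] > mx[0]:
                mx = hi  # maximum of all the maxima
    return mn, mx
```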
The two sets of minimum and maximum vectors (Mn, Mx) and (MN, MX) are fed
to the Dynamic Threshold Module 320. The Dynamic Threshold Module 320
outputs the monochrome 8-bit signal GRS whose biased zero crossings
represent the locations of edges in the Selector plane. In addition, the
Dynamic Threshold Module may also generate the optional binary control
signal SEG and the optional 8-bit segmentation enhancement control ENH.
The optional binary control signal SEG provides an external means (similar
to that of an override switch) to control the segmentation operation of
the FG/BG Separation block E8 of Separation module 330 (see equations (14)
through (20)). The optional 8-bit segmentation enhancement control ENH
provides to the FG/BG Separation block E8 the amount of enhancement to
apply.
The Selector Logic block E6 receives the 8-bit Gray Selector signal GRS
from the Dynamic Threshold Module 320, up-samples it by doubling the
resolution, and then thresholds it at the zero crossings to produce the
binary Selector plane output SEL. For high-quality text and line-art
reproduction, the Selector plane is typically kept at twice the input
resolution (1200 dpi for a 600 dpi input), although it could be programmed
for even higher ratio (in one implementation, up to 8 times the input
resolution) under software control. However, in applications that do not
require very high quality, the Selector plane could be at the same
resolution as the input signal DSC.
The Edge Processing block E7 receives the high resolution Selector output
SEL and counts the number of ON and OFF pixels in a 5.times.5
(high-resolution) window centered on the current (low-resolution) pixel of
interest. The Edge Processing block E7 outputs the two-bit signal SEE. The
SEE signal is set to 0 if all of the input pixels inside the 5.times.5
window are OFF (corresponding to a 3.times.3 constant Background area).
Similarly, the SEE signal is set to 3 if all of the input pixels inside the
window are ON (corresponding to a 3.times.3 constant Foreground area). The
SEE output is set to 1 or 2 if the 5.times.5 window is mostly Background
(white) or mostly Foreground (black), respectively.
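By way of illustration only, the mapping from the ON and OFF counts to the
two-bit SEE signal may be sketched as follows. The function name edge_signal
is hypothetical, and interpreting "mostly" as a simple majority of the
pixels in the window is an assumption of this sketch.

```python
def edge_signal(window):
    """Map a window of binary Selector values (lists of 0/1) to SEE.

    Returns 0 (all OFF), 3 (all ON), 1 (mostly OFF) or 2 (mostly ON).
    Assumption: "mostly" means a simple majority. A 5 x 5 window holds
    25 pixels, so no tie can occur.
    """
    on_count = sum(sum(row) for row in window)
    total = sum(len(row) for row in window)
    if on_count == 0:
        return 0  # constant Background area
    if on_count == total:
        return 3  # constant Foreground area
    return 2 if on_count > total - on_count else 1
```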
The FG/BG Separation block E8 receives the full color source signal DSC to
be segmented, the full color minimum and maximum vectors Mn, Mx from the
Dependent Min-Max block E1, the SEE signal from the Edge Processing block
E7, the optional segmentation signal SEG, and the enhancement control
signal ENH from the Dynamic Threshold Module 320. The FG/BG Separation
block E8 performs the MRC segmentation to generate the Foreground and
Background information, and produces two full-color outputs Fgr and Bgr as
the rough estimates of the Foreground and Background planes, respectively.
The FG/BG Cleanup block E9 applies additional processing on the rough
Foreground and Background estimates Fgr and Bgr to generate the final
Foreground and Background outputs FG and BG. This additional processing is
to slightly extend the Foreground and Background values beyond the edges
and to fill in the undefined pixels in the Foreground and Background
planes with appropriate values. The purpose of this processing is to
prevent artifacts that may result from a subsequent sampling and JPEG
compression and to fill in the yet-undefined pixels with values that will
result in good JPEG compression ratio.
Additional logic inside the FG/BG Cleanup block E9 (see Tile Tag block
F7 of FIG. 14) also monitors the Foreground and Background output values
to detect and flag tiles that are almost all-black or all-white. Rather
than encode the output from such tiles into the output file, a special
tile marker is used and referenced whenever such a tile is detected. This
increases the overall compression ratio by eliminating the need to
repeatedly encode the common all-white or all-black tiles.
The blocks included in the Min-Max module 310 will be discussed in detail
in the following.
The Dependent Min-Max block E1 looks for the maximum and minimum values of
the luminance component L in a 5.times.5 window centered on the current
pixel of interest, and outputs the full-color (luminance and chrominance)
values at these locations. It is called a Dependent Min-Max to indicate
that it only searches for the minimum and maximum over a single component,
which is the luminance L, and not over all three components of the image
signal DSC. Once the locations of the minimum and maximum luminance are
found, the chroma components (a, b) at these locations are also outputted.
The Dependent Min-Max block E1 outputs two vectors of full-color (L, a, b)
signals Mn=(L.sub.Mn, a.sub.Mn, b.sub.Mn), and Mx=(L.sub.Mx, a.sub.Mx,
b.sub.Mx), corresponding to the minimum and maximum values in the
5.times.5 window, respectively. The outputs Mn and Mx are at the same
pixel rate as the input signal DSC.
FIG. 4 illustrates the operation of the Dependent Min-Max block E1. The
content of the DSC luminance data is first searched in a 5.times.5
luminance window centered on the current pixel of interest to find the
locations of the smallest and largest L values. If the minimum or maximum
L values are not unique (that is, if there is more than one location
having the same minimum or maximum value), the location of the one first
encountered is used. The output of this search process is a unique pair
(L.sub.Mn, L.sub.Mx) of the minimum and maximum L values as well as their
relative location within the 5.times.5 window.
The Dependent Min-Max block E1 then uses the relative location information
to index the corresponding chroma (a, b) components in the two
corresponding 5.times.5 chroma windows and retrieve the chroma values at
these locations. Thus, the relative location of the maximum L value
L.sub.Mx is used to address the 5.times.5 chroma windows and retrieve the
chroma pair (a.sub.Mx, b.sub.Mx) at this location. Together, the triplet
(L.sub.Mx, a.sub.Mx, b.sub.Mx) forms the output Mx from the Dependent
Min-Max block E1. Similarly, the relative location of the minimum L value
L.sub.Mn is used to address the 5.times.5 chroma windows and retrieve the
chroma pair (a.sub.Mn, b.sub.Mn) at this location. The triplet (L.sub.Mn,
a.sub.Mn, b.sub.Mn) forms the output Mn from the Dependent Min-Max block
E1.
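By way of illustration only, the dependent search described above may be
sketched as follows. The function name dependent_min_max is hypothetical.
The sketch assumes that the window is scanned in raster order and that
positions outside the image are clamped to the nearest valid pixel; the
treatment of image borders is not specified in the text above.

```python
def dependent_min_max(image, y, x, radius=2):
    """Return (Mn, Mx) for the window centered on (y, x).

    image is a 2-D list of (L, a, b) tuples. Only L is compared; the
    chroma values are carried along from the winning locations. A radius
    of 2 gives the 5 x 5 window.
    """
    height, width = len(image), len(image[0])
    mn = mx = None
    for wy in range(y - radius, y + radius + 1):
        for wx in range(x - radius, x + radius + 1):
            # Assumption: clamp coordinates at the image borders.
            cy = min(max(wy, 0), height - 1)
            cx = min(max(wx, 0), width - 1)
            pixel = image[cy][cx]
            # Strict comparisons keep the first location encountered
            # when the extreme luminance value is not unique.
            if mn is None or pixel[0] < mn[0]:
                mn = pixel
            if mx is None or pixel[0] > mx[0]:
                mx = pixel
    return mn, mx
```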
The implementation of the Dependent Min-Max block E1 can be greatly
accelerated by taking advantage of the sequential nature of the operation
and the type of operations (min-max) that is being performed. For example,
as the operation is advanced to the subsequent pixel, the extreme values
(i.e., maximum and minimum) and corresponding locations for the previous
pixel are already known. Since the current 5.times.5 window greatly
overlaps the previous window, by keeping track of the previous window
content, the Dependent Min-Max block E1 has to sort out only the newest
and oldest 5.times.1 columns of L values on either side of the previous
window. The center 4.times.5 area is common to both the previous window and
current window, and the new address locations of the previous minimum and
maximum values in the previous window are at an offset of 1 in the fast
scan direction relative to their previous locations. The previous minimum
and maximum are compared to values in the newest column of L values to
yield the new maximum and minimum L values.
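By way of illustration only, the incremental update may be sketched as
follows for the luminance values alone. The names full_scan and
sliding_extremes are hypothetical. Falling back to a full search of the
window whenever a remembered extreme lay in the oldest column, and has
therefore left the window, is an assumption of this sketch.

```python
def full_scan(lum, y, x0, size=5):
    """Search the whole window whose top-left corner is (y, x0).

    Returns ((min L, column offset), (max L, column offset)).
    """
    cells = [(lum[y + dy][x0 + dx], dx)
             for dy in range(size) for dx in range(size)]
    return min(cells, key=lambda c: c[0]), max(cells, key=lambda c: c[0])


def sliding_extremes(lum, y, size=5):
    """Yield (min L, max L) for each window position along one band."""
    width = len(lum[0])
    lo, hi = full_scan(lum, y, 0, size)
    yield lo[0], hi[0]
    for x0 in range(1, width - size + 1):
        # The remembered locations shift by one column in the new window.
        lo = (lo[0], lo[1] - 1)
        hi = (hi[0], hi[1] - 1)
        if lo[1] < 0 or hi[1] < 0:
            # Assumption: an extreme left with the oldest column, so the
            # window is searched again in full.
            lo, hi = full_scan(lum, y, x0, size)
        else:
            # Otherwise only the newest column needs to be examined.
            newest = [lum[y + dy][x0 + size - 1] for dy in range(size)]
            if min(newest) < lo[0]:
                lo = (min(newest), size - 1)
            if max(newest) > hi[0]:
                hi = (max(newest), size - 1)
        yield lo[0], hi[0]
```

The sketch tracks column offsets only, which is sufficient to decide when a
remembered extreme has left the window; a full implementation would also
keep the row so that the chroma values can be retrieved.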
FIG. 5 illustrates the Dependent Min-Max Sub-Sample block E2. Block E2
receives the full-color (L, a, b) input signal DSC and produces two
full-color sub-sampled maximum and minimum outputs 502 and 504. Block E2
searches for the minimum and maximum luminance values over non-overlapping
8.times.8 windows. The locations of the minimum and maximum luminance
values are then used to index the chroma windows and retrieve the
corresponding chroma values at these locations.
By using non-overlapping 8.times.8 windows, the operation of the Dependent
Min-Max Sub-Sample block E2 is effectively sub-sampling the min and max
outputs (that would have been produced had a sliding window been used
instead of non-overlapping windows) by a factor of 8 in each direction,
thereby reducing the overall output data rate by a factor of 64.
The minimum output 504 corresponds to the triplet (L.sub.MIN, a.sub.MIN,
b.sub.MIN) formed by the minimum luminance value L.sub.MIN of the input
signal DSC within the 8.times.8 window containing the current pixel of
interest, and the corresponding chroma (a, b) values (a.sub.MIN,
b.sub.MIN) at this minimum luminance location. Similarly, the maximum
output 502 corresponds to the triplet (L.sub.MAX, a.sub.MAX, b.sub.MAX)
formed by the maximum luminance value L.sub.MAX of the input signal DSC
within the 8.times.8 window containing the current pixel of interest, and
the corresponding chroma (a, b) values (a.sub.MAX, b.sub.MAX) at this
maximum luminance location. If the minimum or maximum luminance values are
not unique (i.e., if there is more than one location with the same maximum
or minimum values), the one first encountered is used.
The sub-sampling operation is achieved by advancing the current pixel
position by 8 in the fast scan direction (and also, upon reaching the end
of a line, by 8 lines in the slow scan direction) to maintain the
non-overlapping windows condition.
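The operation of block E2 can be sketched as below. This is an illustrative sketch only: the function name `dependent_min_max_subsample`, the separate L, a, b planes stored as lists of rows, the raster scan order within each window, and the dropping of partial windows at the image edges are all assumptions.

```python
def dependent_min_max_subsample(L, a, b, block=8):
    """Return (mins, maxs): grids of (L, a, b) triplets, one per block x block window."""
    rows, cols = len(L), len(L[0])
    mins, maxs = [], []
    for r0 in range(0, rows - block + 1, block):        # advance by `block` lines
        min_row, max_row = [], []
        for c0 in range(0, cols - block + 1, block):    # advance by `block` pixels
            lo = hi = (r0, c0)
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    # Strict comparisons keep the first location encountered on ties.
                    if L[r][c] < L[lo[0]][lo[1]]:
                        lo = (r, c)
                    if L[r][c] > L[hi[0]][hi[1]]:
                        hi = (r, c)
            # The luminance location indexes the chroma planes (dependent lookup).
            min_row.append((L[lo[0]][lo[1]], a[lo[0]][lo[1]], b[lo[0]][lo[1]]))
            max_row.append((L[hi[0]][hi[1]], a[hi[0]][hi[1]], b[hi[0]][hi[1]]))
        mins.append(min_row)
        maxs.append(max_row)
    return mins, maxs
```

Each 8.times.8 window of 64 input pixels yields one minimum triplet and one maximum triplet, which is the factor-of-64 data rate reduction noted above.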
The 8-times (abbreviated as 8.times.) reduction factor (in each dimension)
of the Dependent Min-Max Sub-Sample E2 block is designed in accordance
with the amount of sub-sampling desired for the Foreground and Background
planes (normally a sub-sampling factor of 2). For higher output image
quality (as is the case with a clean PDL input image, for example), it may
be desirable to not sub-sample the Foreground and Background outputs at
all. In such a case, a smaller sub-sampling factor (e.g., only
4.times.) is applied instead of the 8.times. factor above. If a
sub-sampling factor of 4 (for each direction) is applied, 4.times.4
non-overlapping windows are used.
The Dependent Min-Max Sub-Sample block E2 is used in conjunction with the
two Dependent Min and Max Units E3 and E4 to produce a Min-Max analysis
similar to that of the Dependent Min-Max block E1, but covering a much
larger area context (72.times.72 pixels as compared to 5.times.5 pixels)
and at a coarser resolution to reduce the overall bandwidth.
FIG. 6 illustrates the functions of the Dependent Max block E3 and
Dependent Min block E4 as used in one embodiment of the system of the
present invention.
The Dependent Max block E3 receives the full-color dependent maximum output
502 from the Dependent Min-Max Sub-Sample block E2 and searches the content
of the luminance data in the signal 502 in a 9.times.9 luminance window
centered on the current pixel of interest to find the location of the
maximum L value. If the maximum L value is not unique (that is, if there
is more than one location having the same maximum value), the location of
the one first encountered is used. The output of this search process is
the maximum value L.sub.MX as well as its relative location within the
9.times.9 window.
The Dependent Max block E3 then uses the relative location information of
L.sub.MX to index the corresponding chroma (a, b) components in the two
corresponding 9.times.9 chroma windows and retrieve the chroma values at
this location. Thus, the relative location of the maximum L value L.sub.MX
is used to address the 9.times.9 chroma windows and retrieve the chroma
pair (a.sub.MX, b.sub.MX) at this location (as illustrated in FIG. 6). The
triplet (L.sub.MX, a.sub.MX, b.sub.MX) forms the output MX of the
Dependent Max block E3.
The Dependent Min block E4 receives the full-color dependent minimum output
504 from the Dependent Min-Max Sub-Sample block E2 and searches the content
of the luminance data in the signal 504 in a 9.times.9 luminance window
centered on the current pixel of interest to find the location of the
minimum L value. If the minimum L value is not unique (that is, if there
is more than one location having the same minimum value), the location of
the first one encountered is used. The output of this search process is
the minimum value L.sub.MN as well as its relative location within the
9.times.9 window.
The Dependent Min block E4 then uses the relative location information of
L.sub.MN to index the corresponding chroma (a, b) components in the two
corresponding 9.times.9 chroma windows and retrieve the chroma values at
this location. Thus, the relative location of the minimum L value L.sub.MN
is used to address the 9.times.9 chroma windows and retrieve the chroma
pair (a.sub.MN, b.sub.MN) at this location (as illustrated in FIG. 6). The
triplet (L.sub.MN, a.sub.MN, b.sub.MN) forms the output MN of the
Dependent Min block E4.
By applying the Dependent Min block E4 on the dependent minimum output 504
of the Dependent Min-Max Sub-Sample block E2, the dependent minimum
operation is effectively extended over a larger area to provide a
dependent minimum analysis (the MN is minimum of minima received from
block E2). Similarly, the Dependent Max block effectively provides a
dependent maximum analysis over the extended area (the MX is maximum of
maxima received from block E2). Since both inputs 502 and 504 are already
sub-sampled by a factor of 8 in each direction (as compared to the
original pixel resolution of input image DSC), the equivalent window area
for each of the dependent minimum MN and maximum MX is 72.times.72 pixels
at the original pixel resolution.
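The search performed by blocks E3 and E4 on the sub-sampled grids can be sketched as follows. The function names and the handling of window positions that fall outside the grid (they are simply skipped) are assumptions made for illustration; the source does not specify the border behavior.

```python
def dependent_extreme(grid, row, col, pick_max, half=4):
    """Return the (L, a, b) triplet with the extreme L in the window centered on (row, col)."""
    best = None
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            # Assumption: positions outside the grid are skipped.
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                t = grid[r][c]
                # Strict comparisons keep the first triplet encountered on ties.
                if best is None or (t[0] > best[0] if pick_max else t[0] < best[0]):
                    best = t
    return best


def equivalent_window(coarse=9, subsample=8):
    """Side length, in original pixels, covered by the coarse window."""
    return coarse * subsample
```

Calling `dependent_extreme` with `pick_max=True` on the grid of maxima corresponds to MX, and with `pick_max=False` on the grid of minima corresponds to MN; `equivalent_window()` reproduces the 9 x 8 = 72 pixel extent.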
The Dynamic Threshold module 320 applies adaptive thresholding to the
incoming source signal DSC to generate a monochrome 8-bit gray signal GRS
output, whose zero crossings represent the edges in the Selector plane.
The Dynamic Threshold module 320 utilizes the two sets of min/max values
(Mn, Mx) and (MN, MX) from the 5.times.5 fine-resolution and 9.times.9
coarse-resolution windows, and may also receive the Halftone Weight estimate HTW
and the Super Blur BLR_A signals, when they are available. The Dynamic
Threshold module 320 produces the gray selector signal GRS, the binary
segmentation signal SEG and the 8-bit signal ENH, which is used to
communicate the amount of segmentation enhancement to apply in the FG/BG
Separation block E8.
FIG. 7 illustrates the three available choices of context area: the Single
Pixel area which is the area of the current pixel of interest, the
5.times.5 High-Resolution window W1, and the 9.times.9 Coarse Resolution
window W2. Recall that the 9.times.9 window context W2 corresponds to a
window of 72.times.72 pixels sub-sampled by 8 in each direction. Each
square (pixel) in the 9.times.9 coarse resolution window W2 represents an
extremum in a window of 8.times.8 original pixels (i.e., pixels at the
original pixel resolution). The Dynamic Threshold module 320 uses these
three predefined context areas in the process of determining the gray
selector signal GRS.
The Single Pixel (current pixel) area is used when no contrast activity
(described below) exists in either the 5.times.5 window W1 or the 9.times.9
window W2, in which case the luminance of the incoming signal DSC is
merely thresholded and the chroma (a, b) components are not used.
Otherwise, the 5.times.5 High-Resolution and 9.times.9 Coarse Resolution
areas are used in combination to track and segment the incoming signal DSC
based on the level of activity in the windows. Activity in the 5.times.5
window indicates the presence of an image edge in that window. Activity in
the 9.times.9 window indicates that an edge is either approaching the
small window or leaving the small window. Thus, the large 9.times.9 window
serves as a look ahead feature. It also provides the history of where an
edge has been. This allows proper setting of the SEE signal (to be
described later). The large 9.times.9 window could be replaced by another
embodiment that serves the same purposes. The operation of tracking and
segmenting the incoming signal DSC based on the level of activity in the
windows will be described below.
FIG. 8 shows a block diagram of an embodiment 800 of the Dynamic Threshold
Module 320. The embodiment 800 comprises three logic blocks 810, 820, 830,
and a decision module 840.
The three logic blocks 810, 820, 830 correspond to the three possible
context windows shown in FIG. 7, i.e., the Single Pixel area, the
9.times.9 Coarse Resolution window W2, and the 5.times.5 High-Resolution
window W1, respectively.
The multiplexer 848 (MUX) can select and pass one of the logic block outputs as the final
GRS output signal. The selection can be switched on a pixel-by-pixel basis
based on the 2-bit signal SEL. The actual selection code for each of the
inputs is shown in FIG. 8 to the right of the input arrows.
For the case of a Single Pixel context, the luminance component of the
incoming input signal DSC is merely biased by subtracting from it a
pre-determined 8-bit constant THR, using the adder 815. The value of THR
is stored in a programmable register so that it could be adjusted to
accommodate the sensor calibration. For an ideal balanced incoming signal
DSC that spans the full 8-bit luminance range, THR would be normally set
to THR=128 in order to bias the luminance of DSC such that the output
signal GRS will have zero mean and the incoming signal will be thresholded
halfway across. However, the visual threshold may well be skewed away from
the center due to the logarithmic response of the human visual system.
In addition, the scanner response may vary across the dynamic range, or may
not even span the full 8-bit range. For example, the peak luminance value
is determined by the brightest media reflectance, and the dark current of
the sensor determines the output at low light levels. The value of the
threshold register THR can be appropriately adjusted to account for the
above considerations and better match the desired GRS response. In any
case, only the luminance component of the incoming signal DSC is used for
this biasing.
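For the Single Pixel context, the biasing reduces to one subtraction. The sketch below is illustrative only; the function name `single_pixel_output` is an assumption, and the default of 128 is the value the text gives for an ideal balanced 8-bit signal.

```python
def single_pixel_output(luminance, thr=128):
    """Return the biased luminance; its sign is the threshold decision."""
    # Only the luminance component is used; chroma is ignored in this context.
    return luminance - thr
```

A pixel at the threshold maps to zero, so the zero crossing of the output marks the transition between the light and dark sides.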
The logic block 820 is used to address the 9.times.9 coarse resolution
window context W2 shown in FIG. 7. The inputs to the logic block 820 are
the full-color coarse minimum value MN and maximum value MX from the
Dependent Min and Max blocks E4 and E3, respectively. Recall that these
values were generated by sub-sampling the minimum and maximum by a factor
of 8 in both directions in the Dependent Min-Max Sub-Sample block E2 and
then searching for the minimum and maximum (i.e., minimum of minima and
maximum of maxima) over a 9.times.9 window. The operation of the logic
block 820 is equivalent to performing the scaled dot product of the
following two vectors X and Y:
$$\text{output 828} = \langle X, Y \rangle \tag{1}$$
where <X, Y> is the scaled dot product of the two vectors X and Y:
$$\langle X, Y \rangle = (X_L, X_a, X_b)(Y_L, Y_a, Y_b)^t = X_L Y_L + X_a Y_a + X_b Y_b \tag{2}$$
where
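$$X = \begin{bmatrix} X_L \\ X_a \\ X_b \end{bmatrix} = \begin{bmatrix} L_{MX} - L_{MN} \\ a_{MX} - a_{MN} \\ b_{MX} - b_{MN} \end{bmatrix} \tag{3}$$
$$Y = \begin{bmatrix} Y_L \\ Y_a \\ Y_b \end{bmatrix} = \begin{bmatrix} L - (L_{MX} + L_{MN})/2 \\ a - (a_{MX} + a_{MN})/2 \\ b - (b_{MX} + b_{MN})/2 \end{bmatrix} \tag{4}$$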
The (L, a, b) values in equation (4) are the corresponding color components
of the incoming signal DSC. The X vector in equation (3) is the vector
difference between the maximum value MX and the minimum value MN. The Y
vector in equation (4) is the incoming signal DSC minus the average of the
minimum MN and maximum MX values, the average being the 3D midpoint
between MN and MX. By taking the scaled dot product of these two vectors,
the output is proportional to the relative distance from the plane that is
perpendicular to the X vector and crosses it halfway along. Since the
sought-after information is the location of the zero-crossing, the precise
magnitude of the dot product is not required. Therefore, the result is
divided by an arbitrary factor of 256 (shift right by 8) to scale it back
to fit the 8-bit range.
However, since the logic block 820 output (to multiplexer 848) may still
occasionally overflow the 8-bit range (by a factor of roughly 3, or 1.5
bits), additional logic may be used to limit the logic block 820 output to
255 if it gets larger than 255.
A scalar measure for the overall contrast magnitude X.sub.9 within the coarse
resolution 9.times.9 window is generated by adding together the absolute
values of the three components of the vector X within the summation block
829:
$$X_9 = X_L + |X_a| + |X_b| = L_{MX} - L_{MN} + |a_{MX} - a_{MN}| + |b_{MX} - b_{MN}| \tag{5}$$
Referring to equation (5), there is no need to take the absolute value of
the luminance component, since L is confined to the range [0 . . . 255] and
the maximum L.sub.MX is never smaller than the minimum L.sub.MN. The
implementation of equations (1) through (5) for the logic block 820 is
straightforward. Referring to logic block 820 in FIG. 8, the
first two adders 821, 823 perform the vector sum and difference of the
3.times.1 input signals MX, MN, on a component by component basis. The
adder 821 that handles the sum also divides the result by 2 (by shifting
it right by 1 position) to obtain the average as indicated by the symbol
.SIGMA./2. Adder 823 outputs the vector difference X (defined in equation
(3)) to block 829. Block 829 computes the sum of absolute values of the
three components of the vector X and generates the contrast magnitude
X.sub.9. Adder 825 calculates the vector Y in equation (4) by performing
the vector difference between the input signal DSC and the output from
adder 821. The X and Y vector components are then multiplied together,
element by element, and summed to form the dot product in the dot product
block 827. The output 828 of block 827 is described by equations (1) and
(2).
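The data path of logic block 820 can be sketched in a few lines. This is an illustration under stated assumptions, not the disclosed circuit: the function name `coarse_block` and the tuple representation of the (L, a, b) triplets are hypothetical, and the lower clamp at -255 mirrors the upper limit of 255 given in the text and is an assumption.

```python
def coarse_block(dsc, mx, mn):
    """Return (output, contrast) for one pixel; dsc, mx, mn are integer (L, a, b) triplets."""
    x = [p - q for p, q in zip(mx, mn)]            # adder 823: X = MX - MN
    mid = [(p + q) >> 1 for p, q in zip(mx, mn)]   # adder 821: sum, then shift right by 1
    y = [d - m for d, m in zip(dsc, mid)]          # adder 825: Y = DSC - midpoint
    dot = sum(xi * yi for xi, yi in zip(x, y))     # block 827: multiply and add
    out = dot >> 8                                 # divide by 256 (arithmetic shift, floors)
    out = max(-255, min(255, out))                 # limit; the lower bound is an assumption
    contrast = x[0] + abs(x[1]) + abs(x[2])        # block 829: luminance term needs no abs
    return out, contrast
```

With three components each bounded by about 255 x 128 before scaling, the scaled sum can reach roughly three times the 8-bit limit, which is why the limiting step is needed.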
FIG. 16 is a graphical illustration of equations (1) through (4). In FIG.
16, the origin of the three-dimensional space is assumed to be on the left
hand side, as shown. The vectors MX, MN represent the three-dimensional
points MX and MN, respectively. The current image pixel is represented by
the vector DSC. As illustrated, the vector X=(MX-MN) and the vector Y
result from vector operations in accordance with equations (3) and (4).
The value d.sub.1 represents the result of taking the dot product of X and
Y. This value is proportional to the projection of Y onto X. This value also indicates the
distance and "direction" of the point represented by the vector Y with
respect to the plane P.sub.1. The plane P.sub.1 is orthogonal to the
vector X=MX-MN at the midpoint of X. By "direction" of the point
represented by vector Y, it is meant whether this point is above or below
the plane P.sub.1. The plane P.sub.1 represents the threshold plane. The
indicator value d.sub.1 indicates whether, after thresholding, the current
image pixel DSC is above or below the threshold plane, that is, whether it
is closer to MX or to MN, and by how much. This indicator value d.sub.1
allows a decision to be made regarding segmentation of the current pixel.
For example, if the thresholded pixel is very close to MX (respectively,
MN), a decision can be made that the current pixel be included in the
Foreground plane (respectively, Background plane). If the thresholded
pixel is too close to the threshold plane, a decision can be made that the
current pixel be included in both the Foreground and Background planes.
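The behavior of the indicator d.sub.1 at three reference points can be checked directly from the definitions of X and Y. The short derivation below is a sketch that ignores the 1/256 scaling and integer rounding and assumes MX and MN are distinct; the symbol M for the midpoint is introduced here only for illustration.

```latex
\begin{aligned}
M &= \tfrac{1}{2}(MX + MN), \qquad X = MX - MN, \qquad Y = DSC - M \\
DSC = MX &:\quad Y = \tfrac{1}{2}X, \quad \langle X, Y\rangle = \tfrac{1}{2}\lVert X\rVert^{2} > 0 \\
DSC = MN &:\quad Y = -\tfrac{1}{2}X, \quad \langle X, Y\rangle = -\tfrac{1}{2}\lVert X\rVert^{2} < 0 \\
DSC = M &:\quad Y = 0, \quad \langle X, Y\rangle = 0 \\
\text{signed distance from } P_1 &= \frac{\langle X, Y\rangle}{\lVert X\rVert}
\end{aligned}
```

The sign of the dot product therefore tells on which side of the plane P.sub.1 the pixel lies, and its magnitude grows linearly with the distance from the plane.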
Referring to FIG. 8, the logic block 830 is used to address the 5.times.5
high-resolution window context W1 shown in FIG. 7. The inputs to the logic
block 830 are the full-color minimum and maximum values Mn, Mx from the
Dependent Min-Max module E1. The operation of the logic block 830, in
forming a scaled dot product, is similar to the logic block 820 described
above.
The operation of the logic block 830 is equivalent to performing the scaled
dot product of the following two vectors:
$$\text{output 838} = \langle X', Y' \rangle \tag{6}$$
where <X', Y'> is the scaled dot product between the two vectors X'
and Y':
$$\langle X', Y' \rangle = (X'_L, X'_a, X'_b)(Y'_L, Y'_a, Y'_b)^t = X'_L Y'_L + X'_a Y'_a + X'_b Y'_b \tag{7}$$
where
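$$X' = \begin{bmatrix} X'_L \\ X'_a \\ X'_b \end{bmatrix} = \begin{bmatrix} L_{Mx} - L_{Mn} \\ a_{Mx} - a_{Mn} \\ b_{Mx} - b_{Mn} \end{bmatrix} \tag{8}$$
$$Y' = \begin{bmatrix} Y'_L \\ Y'_a \\ Y'_b \end{bmatrix} = \begin{bmatrix} L - (L_{Mx} + L_{Mn})/2 \\ a - (a_{Mx} + a_{Mn})/2 \\ b - (b_{Mx} + b_{Mn})/2 \end{bmatrix} \quad \text{or, when BLR\_A is available,} \quad Y' = \begin{bmatrix} L - \left[(L_{Mx} + L_{Mn})/2 + L_A\right]/2 \\ a - \left[(a_{Mx} + a_{Mn})/2 + a_A\right]/2 \\ b - \left[(b_{Mx} + b_{Mn})/2 + b_A\right]/2 \end{bmatrix} \tag{9}$$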
The (L, a, b) values in equation (9) are the corresponding color components
of the incoming signal DSC. The X' vector in equation (8) is the vector
difference between the maximum vector Mx and the minimum vector Mn. The Y'
vector in equation (9) is the incoming signal DSC minus the average of the
minimum Mn and maximum Mx values, the average being the 3D midpoint
between Mn and Mx. By taking the scaled dot product of these two vectors,
the output is proportional to the relative distance from the plane that is
perpendicular to the X' vector and crosses it halfway along. Since the
sought-after information is the location of the zero-crossing, the precise
magnitude of the dot product is not required. Therefore, the result is
divided by an arbitrary factor of 256 (shift right by 8) to scale it back
to fit the 8-bit range.
However, since the logic block 830 output (to multiplexer 848) may still
occasionally overflow the 8-bit range (by a factor of roughly 3, or 1.5
bits), additional logic may be used to limit the logic block 830 output to
255 if it gets larger than 255.
A scalar measure for the overall contrast magnitude X.sub.5 within the fine
resolution 5.times.5 window W1 (FIG. 7) is generated by adding together
the absolute values of the three components of the vector X' within the
summation block 839:
$$X_5 = X'_L + |X'_a| + |X'_b| = L_{Mx} - L_{Mn} + |a_{Mx} - a_{Mn}| + |b_{Mx} - b_{Mn}| \tag{10}$$
Referring to equation (10), there is no need to take the absolute value of
the luminance component, since L is confined to the range [0 . . . 255] and
the maximum L.sub.Mx is never smaller than the minimum L.sub.Mn. The
implementation of equations (6) through (10) for the logic block 830 is
straightforward. Referring to logic block 830 in FIG. 8, the
first two adders 831, 833 perform the vector sum and difference of the
3.times.1 input signals Mx, Mn, on a component by component basis. The
adder 831 that handles the sum also divides the result by 2 (by shifting
it right by 1 position) to obtain the average as indicated by the symbol
.SIGMA./2. Adder 833 outputs the vector difference X' (defined in equation
(8)) to block 839. Block 839 computes the sum of absolute values of the
three components of the vector X' and generates the contrast magnitude
X.sub.5. Adder 834 adds the vector signal BLR_A to the vector output of
adder 831 and divides the result by 2. Adder 835 calculates the vector Y'
in equation (9) by performing the vector difference between the input
signal DSC and the output from adder 834. The X' and Y' vector components
are then multiplied together, element by element, and summed to form the dot
product in the dot product block 837. The output of block 837 is described
by equations (6) and (7).
It is important to note that the architecture of logic block 830 differs
from that of logic block 820 by having the added threshold-biasing feature
that enhances dark or light thin lines by "nudging" the threshold towards
the Super-Blur reference signal BLR_A=(L.sub.A, a.sub.A, b.sub.A) when
BLR_A is available. This is accomplished by averaging the Super Blur
signal BLR_A with the averaged Mx and Mn values, to form the alternative
Y' vector, as shown in equation (9).
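The effect of the threshold-biasing feature can be illustrated with a small sketch of logic block 830. The function name `fine_block`, the tuple representation of triplets, the use of `None` to mean that BLR_A is unavailable, and the symmetric clamp at -255 are assumptions made for illustration; the numeric values in the usage note are invented sample data.

```python
def fine_block(dsc, mx, mn, blr_a=None):
    """Return (output, contrast) for one pixel of the 5x5 context."""
    x = [p - q for p, q in zip(mx, mn)]               # adder 833: X' = Mx - Mn
    ref = [(p + q) >> 1 for p, q in zip(mx, mn)]      # adder 831: average of Mx and Mn
    if blr_a is not None:
        # adder 834: nudge the reference towards the Super Blur signal
        ref = [(m + s) >> 1 for m, s in zip(ref, blr_a)]
    y = [d - t for d, t in zip(dsc, ref)]             # adder 835: Y' = DSC - reference
    dot = sum(xi * yi for xi, yi in zip(x, y))        # block 837: multiply and add
    out = max(-255, min(255, dot >> 8))               # scale by 1/256, then limit
    contrast = x[0] + abs(x[1]) + abs(x[2])           # block 839
    return out, contrast
```

As an example, take Mx = (220, 128, 128), Mn = (100, 128, 128) and a pixel with L = 170 on a thin dark line. Without BLR_A the reference luminance is 160 and the output is positive, so the pixel falls on the light side. With a light-dominated BLR_A = (210, 128, 128) the reference moves to 185 and the output becomes negative, so the same pixel falls on the dark side and the thin line is retained.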
FIG. 17 is a graphical illustration of equations (6) through (9). In FIG.
17, the origin of the three-dimensional space is assumed to be on the left
hand side, as shown. The vectors Mx, Mn represent the three-dimensional
points Mx and Mn, respectively. The current image pixel is represented by
the vector DSC. The Super-Blur reference signal BLR_A is represented by
the vector BLR_A. As illustrated, the vector X'=(Mx-Mn) and the vector Y'
result from vector operations in accordance with equations (8) and (9).
The value d.sub.2 represents the result of taking the dot product of X'
and Y'. This value is proportional to the projection of Y' onto X'. This value also
indicates the distance and "direction" of the point represented by the
vector Y' with respect to the plane P.sub.2. The plane P.sub.2 is
orthogonal to the vector X'=Mx-Mn at a point away from the midpoint by a
small amount. This amount represents the added threshold-biasing feature
discussed in the preceding paragraph. By "direction" of the point
represented by vector Y', it is meant whether this point is above or below
the plane P.sub.2. The plane P.sub.2 represents the threshold plane. The
indicator value d.sub.2 indicates whether, after thresholding, the current
image pixel DSC is above or below the threshold plane, that is, whether it
is closer to Mx or to Mn, and by how much. This indicator value d.sub.2
allows a decision to be made regarding the segmentation of the current
pixel. For example, if the thresholded pixel is very close to Mx
(respectively, Mn), a decision can be made that the current pixel be
included in the Foreground plane (respectively, Background plane). If the
thresholded pixel is too close to the threshold plane, a decision can be
made that the current pixel be included in both the Foreground and
Background planes.
Referring to FIG. 8, the decision module 840 receives the output 818 from
logic block 810, the output 828 and contrast magnitude output X.sub.9 from
logic block 820, and the output 838 and contrast magnitude output X.sub.5
from logic block 830.
The decision module 840 comprises a comparator logic 846, a multiplexer
848, an enhancement coding block 850, and a comparator 852. The decision
module 840 also includes two parameterized piecewise linear function
blocks 842 and 844 to process the halftone weight signal HTW when it is
available from a de-screener system.
The comparator logic 846 receives the contrast magnitude outputs X.sub.5
and X.sub.9, outputs the select signal SEL to control the output GRS of
multiplexer 848, and outputs the enable signal ENA to control the
enhancement signal ENH of the enhancement coding block 850. The comparator logic
846 may also use the 8-bit Halftone Weight frequency estimate HTW, when
available, from a de-screener system, after the estimate HTW has passed
through a parameterized piecewise linear function block 842.
Note that, by definition of the min-max operations described previously,
the contrast magnitude of the larger 9.times.9 (sub-sampled) window W2
must be equal to or larger than the contrast magnitude of the smaller
5.times.5 high-resolution window W1. In other words:
$$X_9 \geq X_5 \tag{11}$$
This is due to the fact that, for a larger window that includes a smaller
one, the maximum can only be larger and the minimum smaller than those of
the smaller window. Furthermore, as the segmentation process proceeds from
one pixel to the next (in the fast scan direction), the X.sub.9 contrast
value remains the same for 8 consecutive pixels until the next pixel
crosses the 8.times.8 window boundaries into the next non-overlapping
window. The X.sub.5 contrast value, on the other hand, may change on a
pixel by pixel basis. This behavior is due to the 8.times. sub-sampling
performed by the Dependent Min-Max Sub-Sample block E2.
FIG. 9 shows the block diagram of an embodiment of the comparator logic
846. The two contrast magnitude measures X.sub.5 and X.sub.9 are compared
to the signal STH, via comparators 904, 902, respectively, to generate the
selection bits SEL0 and SEL1, respectively. The bits SEL0 and SEL1 form
the 2-bit select signal SELECT. If the halftone weight HTW is available,
HTW is passed through the piecewise linear function block 842 to produce
STH. Otherwise, STH is set to a predetermined value. The two bits SEL0 and
SEL1 are then combined together by AND gate 906 to generate the 1-bit
enhancement enable signal ENA.
FIG. 10 shows the equivalent Truth Table for the comparator logic. If the
contrast measure X.sub.9 of the larger 9.times.9 (sub-sampled) window W2
is smaller than STH, then, regardless of the contrast measure X.sub.5 of
the smaller 5.times.5 window W1, the SEL1 bit is cleared and the SELECT
signal is either 0 or 1. This causes the multiplexer 848 to select the
Single Pixel context output 818 (FIG. 8). If, however, there is some
activity in the larger 9.times.9 window W2 but not within the smaller
5.times.5 window W1, the SELECT signal is set to equal 2 (binary "10").
This causes the multiplexer 848 to select the logic block 820 output
828. If both windows show significant contrast magnitude, the SELECT
signal is set to 3, resulting in the output 838 of logic block 830
(corresponding to the 5.times.5 high-resolution window) being selected by
the multiplexer 848. In addition, when the SELECT signal is 3, the binary
enable signal ENA is turned on. The signal ENA is used to enable the
enhancement block 850 to output the segmentation enhancement signal ENH.
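The truth table behavior described above can be sketched as follows. The function names are hypothetical, and the comparison direction (a bit is set when the contrast is greater than or equal to STH) is an assumption inferred from the statement that SEL1 is cleared when X.sub.9 is smaller than STH.

```python
def comparator_logic(x5, x9, sth):
    """Return (select, ena) from the two contrast magnitudes and the threshold STH."""
    sel0 = 1 if x5 >= sth else 0       # comparator 904 on the 5x5 contrast
    sel1 = 1 if x9 >= sth else 0       # comparator 902 on the 9x9 contrast
    select = (sel1 << 1) | sel0        # 2-bit select signal, SEL1 is the high bit
    ena = sel0 & sel1                  # AND gate 906: enhancement enable
    return select, ena


def multiplexer(select, out_818, out_828, out_838):
    """Pass the Single Pixel, 9x9, or 5x5 output according to the select code."""
    if select in (0, 1):
        return out_818                 # no activity in the large window
    return out_828 if select == 2 else out_838
```

Because X.sub.9 is never smaller than X.sub.5, the select code 1 is not expected in practice, but it is mapped to the Single Pixel output in the same way as code 0.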
Referring to FIG. 8, the enhancement coding block 850 also uses a piecewise
linear function of the Halftone Weight frequency estimate HTW to produce the
signal ENH which controls the amount of segmentation enhancement to be
applied in the FG/BG Separation block E8 (FIG. 3). The HTW signal is fed
to the parameterized piecewise linear function block 844 which applies a
piecewise linear function EEN to the signal HTW, and outputs the resulting
signal to the enhancement coding block 850. The binary enhancement enable
signal ENA from the comparator logic 846 is used for gating (i.e.,
enabling) the enhancement signal ENH as follows. If ENA=1, then the block
844 output signal is passed through to the output ENH; otherwise, all of
the ENH bits are forced to zero (disabled). The 8-bit ENH output signal
communicates the amount of segmentation enhancement