Object detection: U.S. Patent No. 7,522,772 (United States Patent and Trademark Office)


Title: Object detection

Abstract: Image object detection apparatus in which test regions of a test image are compared with an image property model, a mask defining a subset of pixel positions within a test region, comprises means for comparing pixel properties in the test image defined by the test regions with the image property model to detect a property difference between the image property model and a test region; so that pixel property differences within the mask and pixel property differences outside the mask are combined with opposite respective polarities to form a difference value in respect of that test region, an object being detected in the test image at a test region corresponding to a lowest difference value between the image property model and pixels defined by the test region.

Patent Number: 7,522,772, issued on 04/21/2009 to Porter et al.


Inventors: Porter; Robert Mark Stefan (Winchester, GB), Living; Jonathan (Nr. Stourbridge, GB), Haynes; Simon Dominic (Basingstoke, GB)
Assignee: Sony United Kingdom Limited (Weybridge, GB)
Appl. No.: 11/007,110
Filed: December 8, 2004


Foreign Application Priority Data

Dec 11, 2003 [GB] 0328741.4

Current U.S. Class: 382/218 ; 382/103; 382/283
Current International Class: G06K 9/68 (20060101)
Field of Search: 382/103,118,165,170,209,218,282,288,283 340/5.53 358/538 365/201,189.011 549/392 348/821,739,805,818 166/250.08,321,334.2,332.1 220/2.1A


References Cited

U.S. Patent Documents
5710842 January 1998 Lee
6497997 December 2002 Simons
6521384 February 2003 Szajewski
6534226 March 2003 Owczarczyk et al.
7336830 February 2008 Porter et al.
2003/0128298 July 2003 Moon et al.
Foreign Patent Documents
0 682 325 Mar., 1997 EP
0 877 274 Nov., 1999 EP
0 961 225 Dec., 1999 EP
1 353 516 Oct., 2003 EP

Other References

Schneiderman H et al., "A histogram-based method for detection of faces and cars", Proc. IEEE Conf. on Image Processing, ICIP 2000, vol. 3, Sep. 10, 2000, pp. 504-507, XP010529514.
Schneiderman H et al., "A statistical method for 3D object detection applied to faces and cars", Proceedings IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2000 (Cat. No. PR00662), IEEE Comput. Soc., Los Alamitos, CA, USA, vol. 1, Sep. 2000, pp. 746-751, XP002313459, ISBN 0-7695-0662-3.
Sobottka K et al., "Segmentation and tracking of faces in color images", Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, Killington, VT, USA, Oct. 14-16, 1996, IEEE Comput. Soc., Los Alamitos, CA, USA, pp. 236-241, XP010200426, ISBN 0-8186-7713-9.
Zhong Y et al., "Object Tracking Using Deformable Templates", IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Inc., New York, US, vol. 22, No. 5, May 2000, pp. 544-549, XP000936705, ISSN 0162-8828.
Storring M et al., "Constraining a statistical skin colour model to adapt to illumination changes", 7th German Workshop on Colour Image Processing, Erlangen, Germany, Oct. 2001, pp. 47-58, XP002315562.
Henry Schneiderman et al., "A Statistical Method for 3D Object Detection Applied to Faces and Cars", IEEE Conference on Computer Vision and Pattern Recognition, 2000.
Henry Schneiderman et al., "Probabilistic Modeling of Local Appearance and Spatial Relationships for Object Recognition", IEEE Conference on Computer Vision and Pattern Recognition, 1998.
Henry Schneiderman, "A Statistical Approach to 3D Object Detection Applied to Faces and Cars", PhD thesis, Robotics Institute, Carnegie Mellon University, pp. 1-100, May 10, 2000.
Erik Hjelmas et al., "Face Detection: A Survey", Computer Vision and Image Understanding, vol. 83, pp. 236-274, 2001.
Ming-Hsuan Yang et al., "Detecting Faces in Images: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, No. 1, pp. 34-58, Jan. 2002.

Primary Examiner: Chawan; Sheela C
Attorney, Agent or Firm: Oblon, Spivak, McClelland, Maier & Neustadt, P.C.

Claims



We claim:

1. Image object detection apparatus in which test regions of a test image are compared with an image property model, a mask defining a subset of pixel positions within a test region; said apparatus comprising: a comparator to compare pixel properties in said test image defined by said test regions with said image property model to detect a property difference between said image property model and a test region; so that pixel property differences within said mask and pixel property differences outside said mask are combined with opposite respective polarities to form a difference value in respect of that test region, an object being detected in said test image at a test region corresponding to a lowest difference value between said image property model and pixels defined by that test region.

2. Apparatus according to claim 1, in which said test image is from a video sequence.

3. Apparatus according to claim 2, said apparatus comprising logic to derive said image property model in dependence upon image properties of a region detected to contain an object in at least one previous image in said video sequence.

4. Apparatus according to claim 2, comprising logic to derive said image property model in dependence upon image properties of all regions detected to contain an object in at least one previous image in said video sequence.

5. Apparatus according to claim 2, in which said image property model is a predetermined Gaussian model.

6. Apparatus according to claim 2, comprising an object position predictor for predicting an object position in a next image in a test order of said video sequence on the basis of a detected object position in one or more previous images in said test order of said video sequence; in which: if said apparatus detects an object within a threshold image distance of said predicted object position and such that the difference between said test region properties and said image property model at that position is less than a threshold difference, said object position predictor uses said detected position to produce a next position prediction.

7. Apparatus according to claim 6, comprising logic to derive said image property model in dependence upon an object at an image position not predicted by said object position predictor.

8. Apparatus according to claim 2, comprising a generator to generate said mask in dependence upon at least a proportion of pixels in a region detected to contain an object in the preceding image which most closely match said image property model derived in respect of that region.

9. Apparatus according to claim 8, in which said generator is operable to generate a mask as a weighted combination of a previous mask and pixels in a region detected to contain an object in said preceding image.

10. Apparatus according to claim 1, in which said image property model is a colour model.

11. Apparatus according to claim 10, in which said colour model represents a colour distribution in at least a part of at least one image of said video sequence.

12. Apparatus according to claim 1, comprising a quantiser to quantise pixel property differences within said mask and pixel differences outside said mask.

13. Apparatus according to claim 12, in which said pixel property differences are quantised to a single-valued positive difference, a zero difference and a single-valued negative difference.

14. Apparatus according to claim 1, in which said objects are faces.

15. Video conferencing apparatus comprising apparatus according to claim 1.

16. Surveillance apparatus comprising apparatus according to claim 1.

17. A camera arrangement comprising apparatus according to claim 1.

18. An object detection method in which test regions of a test image are compared with an image property model, a mask defining a subset of pixel positions within a test region, said method comprising: using a processor to perform the step of comparing pixel properties in said test image defined by said test regions with said image property model to detect a property difference between said image property model and a test region so that pixel property differences within said mask and pixel property differences outside said mask are combined with opposite respective polarities to form a difference value in respect of that test region; and detecting an object in said test image at a test region corresponding to a lowest difference value between said image property model and pixels defined by said test region.

19. A computer readable storage medium encoded with instructions, which when executed by a processor, causes the computer to perform a method according to claim 18.
Description



BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to object detection.

2. Description of the Prior Art

The following description relates to a problem present in the detection of various types of objects, but will be discussed with respect to face detection for clarity of the description.

Many human-face detection algorithms have been proposed in the literature, including the use of so-called eigenfaces, face template matching, deformable template matching or neural network classification. None of these is perfect, and each generally has associated advantages and disadvantages. None gives an absolutely reliable indication that an image contains a face; on the contrary, they are all based upon a probabilistic assessment, based on a mathematical analysis of the image, of whether the image has at least a certain likelihood of containing a face. Depending on their application, the algorithms generally have the threshold likelihood value set quite high, to try to avoid false detections of faces.

In any sort of block-based analysis of a possible face, or an analysis involving a comparison between the possible face and some pre-derived data indicative of the presence of a face, there is a possibility that the algorithm will be confused by an image region which, while possibly looking nothing like a face, possesses image attributes that allow it to pass the comparison test. Such a region may then be assigned a high probability of containing a face, leading to a false-positive face detection.

It is a constant aim in this technical field to improve the reliability of object detection, including reducing the occurrence of false-positive detections.

SUMMARY OF THE INVENTION

This invention provides an image object detection apparatus in which test regions of a test image are compared with an image property model, a mask defining a subset of pixel positions within a test region; the apparatus comprising:

means for comparing pixel properties in the test image defined by the test regions with the image property model to detect a property difference between the image property model and a test region;

so that pixel property differences within the mask and pixel property differences outside the mask are combined with opposite respective polarities to form a difference value in respect of that test region, an object being detected in the test image at a test region corresponding to a lowest magnitude difference value between the image property model and pixels defined by the test region.

It will be appreciated that the terms "previous image", "preceding image" and the like refer to an order of testing of the images, not necessarily to a forward temporal order of a video sequence.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objects, features and advantages of the invention will be apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings, in which:

FIG. 1 is a schematic diagram of a general purpose computer system for use as a face detection system and/or a non-linear editing system;

FIG. 2 is a schematic diagram of a video camera-recorder (camcorder) using face detection;

FIG. 3 is a schematic diagram illustrating a training process;

FIG. 4 is a schematic diagram illustrating a detection process;

FIG. 5 schematically illustrates a feature histogram;

FIG. 6 schematically illustrates a sampling process to generate eigenblocks;

FIGS. 7 and 8 schematically illustrate sets of eigenblocks;

FIG. 9 schematically illustrates a process to build a histogram representing a block position;

FIG. 10 schematically illustrates the generation of a histogram bin number;

FIG. 11 schematically illustrates the calculation of a face probability;

FIGS. 12a to 12f are schematic examples of histograms generated using the above methods;

FIGS. 13a and 13b schematically illustrate the data structure of the histograms;

FIG. 14 schematically illustrates a so-called bin map with a face window overlaid;

FIGS. 15a to 15g schematically illustrate so-called multiscale face detection;

FIG. 16 is a schematic flowchart illustrating a technique for detecting face positions in a multiscale arrangement;

FIG. 17 schematically illustrates a motion detector;

FIGS. 18a to 18e schematically illustrate a technique for detecting an area of change in an image;

FIGS. 19a to 19c schematically illustrate an improvement on the technique of FIGS. 18a to 18e;

FIGS. 20a to 20c schematically illustrate a spatial decimation technique;

FIGS. 21a to 21d schematically illustrate another spatial decimation technique;

FIG. 22 schematically illustrates a face tracking algorithm;

FIGS. 23a and 23b schematically illustrate the derivation of a search area used for skin colour detection;

FIG. 24 schematically illustrates a mask applied to skin colour detection;

FIGS. 25a to 25c schematically illustrate the use of the mask of FIG. 24;

FIG. 26 is a schematic distance map;

FIG. 27 schematically illustrates a colour mask process;

FIG. 28 schematically illustrates a colour map update process; and

FIGS. 29a to 29c schematically illustrate a gradient (variance) pre-processing technique.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The embodiments will be described with respect to face detection, but are equally applicable to the detection of other objects such as cars, by training with training images representing the required objects.

FIG. 1 is a schematic diagram of a general purpose computer system for use as a face detection system and/or a non-linear editing system. The computer system comprises a processing unit 10 having (amongst other conventional components) a central processing unit (CPU) 20, memory such as a random access memory (RAM) 30 and non-volatile storage such as a disc drive 40. The computer system may be connected to a network 50 such as a local area network or the Internet (or both). A keyboard 60, mouse or other user input device 70 and display screen 80 are also provided. The skilled man will appreciate that a general purpose computer system may include many other conventional parts which need not be described here.

FIG. 2 is a schematic diagram of a video camera-recorder (camcorder) using face detection. The camcorder 100 comprises a lens 110 which focuses an image onto a charge coupled device (CCD) image capture device 120. The resulting image in electronic form is processed by image processing logic 130 for recording on a recording medium such as a tape cassette 140. The images captured by the device 120 are also displayed on a user display 150 which may be viewed through an eyepiece 160.

To capture sounds associated with the images, one or more microphones are used. These may be external microphones, in the sense that they are connected to the camcorder by a flexible cable, or may be mounted on the camcorder body itself. Analogue audio signals from the microphone(s) are processed by an audio processing arrangement 170 to produce appropriate audio signals for recording on the storage medium 140.

It is noted that the video and audio signals may be recorded on the storage medium 140 in either digital form or analogue form, or even in both forms. Thus, the image processing arrangement 130 and the audio processing arrangement 170 may include a stage of analogue to digital conversion.

The camcorder user is able to control aspects of the lens 110's performance by user controls 180 which influence a lens control arrangement 190 to send electrical control signals 200 to the lens 110. Typically, attributes such as focus and zoom are controlled in this way, but the lens aperture or other attributes may also be controlled by the user.

Two further user controls are schematically illustrated. A push button 210 is provided to initiate and stop recording onto the recording medium 140. For example, one push of the control 210 may start recording and another push may stop recording, or the control may need to be held in a pushed state for recording to take place, or one push may start recording for a certain timed period, for example five seconds. In any of these arrangements, it is technologically very straightforward to establish from the camcorder's record operation where the beginning and end of each "shot" (continuous period of recording) occur.

The other user control shown schematically in FIG. 2 is a "good shot marker" (GSM) 220, which may be operated by the user to cause "metadata" (associated data) to be stored in connection with the video and audio material on the recording medium 140, indicating that this particular shot was subjectively considered by the operator to be "good" in some respect (for example, the actors performed particularly well; the news reporter pronounced each word correctly; and so on).

The metadata may be recorded in some spare capacity (e.g. "user data") on the recording medium 140, depending on the particular format and standard in use. Alternatively, the metadata can be stored on a separate storage medium such as a removable MemoryStick® memory (not shown), or the metadata could be stored on an external database (not shown), for example by being communicated to such a database over a wireless link (not shown). The metadata can include not only the GSM information but also shot boundaries, lens attributes, alphanumeric information input by a user (e.g. on a keyboard, not shown), geographical position information from a global positioning system receiver (not shown) and so on.

So far, the description has covered a metadata-enabled camcorder. Now, the way in which face detection may be applied to such a camcorder will be described. It will of course be appreciated that the techniques are applicable to, for example, a networked camera such as an internet protocol (IP) camera, a video conferencing camera and the like.

The camcorder includes a face detector arrangement 230. Appropriate arrangements will be described in much greater detail below, but for this part of the description it is sufficient to say that the face detector arrangement 230 receives images from the image processing arrangement 130 and detects, or attempts to detect, whether such images contain one or more faces. The face detector may output face detection data which could be in the form of a "yes/no" flag, or may be more detailed in that the data could include the image co-ordinates of the faces, such as the co-ordinates of eye positions within each detected face. This information may be treated as another type of metadata and stored in any of the formats described above.

As described below, face detection may be assisted by using other types of metadata within the detection process. For example, the face detector 230 receives a control signal from the lens control arrangement 190 to indicate the current focus and zoom settings of the lens 110. These can assist the face detector by giving an initial indication of the expected image size of any faces that may be present in the foreground of the image. In this regard, it is noted that the focus and zoom settings between them define the expected separation between the camcorder 100 and a person being filmed, and also the magnification of the lens 110. From these two attributes, based upon an average face size, it is possible to calculate the expected size (in pixels) of a face in the resulting image data.
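As a rough illustration of this calculation, the following Python sketch applies a pinhole-camera model, in which the width of an object's image on the sensor equals the focal length times the object width divided by the object's distance. It assumes the zoom setting has already been converted to a focal length and the focus setting to a subject distance; the helper name, the 0.15 m average face width and the sensor pixel pitch are illustrative assumptions, not values taken from this description.

```python
def expected_face_width_px(focal_length_mm: float,
                           subject_distance_m: float,
                           pixel_pitch_um: float,
                           face_width_m: float = 0.15) -> float:
    """Expected face width in pixels under a simple pinhole-camera model.

    Hypothetical helper: focal_length_mm stands in for the zoom setting,
    subject_distance_m for the focus setting, and face_width_m is an
    assumed average face width.
    """
    if subject_distance_m <= 0:
        raise ValueError("subject distance must be positive")
    # Width of the face's image on the sensor, in millimetres
    # (face width and distance are both in metres, so their ratio is unitless).
    image_width_mm = focal_length_mm * face_width_m / subject_distance_m
    # Convert millimetres on the sensor to pixels (pitch is in micrometres).
    return image_width_mm * 1000.0 / pixel_pitch_um


# A 10 mm focal length, a subject 2 m away and a 5 um pixel pitch:
print(expected_face_width_px(10.0, 2.0, 5.0))  # 150.0
```

Doubling the subject distance halves the expected face size, which is why the focus and zoom settings together narrow the range of face scales worth searching.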

A conventional (known) speech detector 240 receives audio information from the audio processing arrangement 170 and detects the presence of speech in such audio information. The presence of speech may indicate that a face is more likely to be present in the corresponding images than if no speech is detected. In some embodiments, to be discussed below, the speech detector may be modified so as to provide an indication of the location of a speaker, by detecting the most active microphone from a set of microphones, or by triangulation or a similar technique between multiple microphones.
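One simple way to pick the "most active" microphone is to compare the root-mean-square (RMS) level of each microphone's current audio frame. The sketch below is a hypothetical illustration: the RMS criterion and the function names are assumptions, since the description does not say how microphone activity is measured.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one microphone's audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def most_active_microphone(frames: Sequence[Sequence[float]]) -> int:
    """Index of the microphone whose current frame has the highest RMS level."""
    if not frames:
        raise ValueError("need at least one microphone frame")
    levels = [rms(frame) for frame in frames]
    # Ties go to the lowest index, because max() keeps the first maximum it meets.
    return max(range(len(levels)), key=levels.__getitem__)


print(most_active_microphone([[0.1, -0.1], [0.5, -0.4], [0.0, 0.0]]))  # 1
```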

Finally, the GSM information 220 and shot information (from the control 210) are supplied to the face detector 230, to indicate shot boundaries and those shots considered to be most useful by the user.

Of course, if the camcorder is based upon an analogue recording technique, further analogue to digital converters (ADCs) may be required to handle the image and audio information.

The present embodiment uses a face detection technique arranged as two phases. FIG. 3 is a schematic diagram illustrating a training phase, and FIG. 4 is a schematic diagram illustrating a detection phase.

Unlike some previously proposed face detection methods (see References 4 and 5 below), the present method is based on modelling the face in parts instead of as a whole. The parts can either be blocks centred over the assumed positions of the facial features (so-called "selective sampling") or blocks sampled at regular intervals over the face (so-called "regular sampling"). The present description will cover primarily regular sampling, as this was found in empirical tests to give better results.

In the training phase, an analysis process is applied to a set of images known to contain faces, and (optionally) another set of images ("nonface images") known not to contain faces. The analysis process builds a mathematical model of facial and nonfacial features, against which a test image can later be compared (in the detection phase).

So, to build the mathematical model (the training process 310 of FIG. 3), the basic steps are as follows:

1. From a set 300 of face images normalised to have the same eye positions, each face is sampled regularly into small blocks.

2. Attributes are calculated for each block; these attributes are explained further below.

3. The attributes are quantised to a manageable number of different values.

4. The quantised attributes are then combined to generate a single quantised value in respect of that block position.

5. The single quantised value is then recorded as an entry in a histogram, such as the schematic histogram of FIG. 5. The collective histogram information 320 in respect of all of the block positions in all of the training images forms the foundation of the mathematical model of the facial features.

One such histogram is prepared for each possible block position, by repeating the above steps in respect of a large number of test face images. The test data are described further in Appendix A below. So, in a system which uses an array of 8×8 blocks, 64 histograms are prepared. In a later part of the processing, a test quantised attribute is compared with the histogram data; because a whole histogram is used to model the data, no assumptions have to be made about whether it follows a parameterised distribution, e.g. Gaussian or otherwise. To save data storage space (if needed), histograms which are similar can be merged so that the same histogram can be reused for different block positions.

In the detection phase, to apply the face detector to a test image 350, successive windows in the test image are processed 340 as follows:

6. The window is sampled regularly as a series of blocks, and attributes in respect of each block are calculated and quantised as in stages 1-4 above.

7. Corresponding "probabilities" for the quantised attribute values for each block position are looked up from the corresponding histograms. That is to say, for each block position, a respective quantised attribute is generated and is compared with a histogram previously generated in respect of that block position. The way in which the histograms give rise to "probability" data will be described below.

8. All the probabilities obtained above are multiplied together to form a final probability which is compared against a threshold in order to classify the window as "face" or "nonface".

It will be appreciated that the detection result of "face" or "nonface" is a probability-based measure rather than an absolute detection. Sometimes, an image not containing a face may be wrongly detected as "face", a so-called false positive. At other times, an image containing a face may be wrongly detected as "nonface", a so-called false negative. It is an aim of any face detection system to reduce the proportion of false positives and the proportion of false negatives, but it is of course understood that to reduce these proportions to zero is difficult, if not impossible, with current technology.

As mentioned above, in the training phase, a set of "nonface" images can be used to generate a corresponding set of "nonface" histograms. Then, to achieve detection of a face, the "probability" produced from the nonface histograms may be compared with a separate threshold, so that the probability has to be under the threshold for the test window to contain a face. Alternatively, the ratio of the face probability to the nonface probability could be compared with a threshold.
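The three decision rules described above (the face probability alone against a threshold, face and nonface probabilities against separate thresholds, and the ratio of the two) can be sketched as follows. The sketch works with sums of logarithms instead of a direct product, because multiplying many small per-block probabilities can underflow; in the log domain the ratio becomes a difference. Function names, thresholds and the floor value for empty bins are illustrative assumptions.

```python
import math
from typing import Sequence

# Assumed floor for empty histogram bins, so that log(0) is never taken.
EPS = 1e-6


def log_score(block_probs: Sequence[float]) -> float:
    """Log of the product of per-block probabilities, computed as a sum of logs."""
    return sum(math.log(max(p, EPS)) for p in block_probs)


def face_only(face_probs: Sequence[float], t_face: float) -> bool:
    """Step 8: the face score alone must exceed a threshold."""
    return log_score(face_probs) > t_face


def face_and_nonface(face_probs: Sequence[float], nonface_probs: Sequence[float],
                     t_face: float, t_nonface: float) -> bool:
    """Face score above its threshold and nonface score below a separate threshold."""
    return log_score(face_probs) > t_face and log_score(nonface_probs) < t_nonface


def likelihood_ratio(face_probs: Sequence[float], nonface_probs: Sequence[float],
                     t_ratio: float) -> bool:
    """Face/nonface probability ratio against a threshold, as a difference of logs."""
    return log_score(face_probs) - log_score(nonface_probs) > t_ratio
```

Because the logarithm is monotonic, comparing a log score with the log of a threshold gives the same decision as comparing the raw product with the threshold itself.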

Extra training data may be generated by applying "synthetic variations" 330 to the original training set, such as variations in position, orientation, size, aspect ratio, background scenery, lighting intensity and frequency content.
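Several of the synthetic variations listed above can be produced with simple array operations. The following sketch assumes grayscale face images stored as NumPy arrays with values in [0, 1] and generates small shifts (position), brightness gains (lighting intensity) and a 3×3 box blur (frequency content). The shift sizes, gains and filter are assumptions, and orientation, size, aspect-ratio and background variations are omitted for brevity.

```python
import numpy as np


def shift(img: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Translate an image down by dy and right by dx pixels, replicating edge pixels."""
    h, w = img.shape
    pad = max(abs(dy), abs(dx))
    padded = np.pad(img, pad, mode="edge")
    return padded[pad - dy: pad - dy + h, pad - dx: pad - dx + w]


def scale_intensity(img: np.ndarray, gain: float) -> np.ndarray:
    """Simulate a lighting change by scaling intensities, clipped to [0, 1]."""
    return np.clip(img * gain, 0.0, 1.0)


def box_blur(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter: a crude way to remove high-frequency content."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def synthetic_variations(face: np.ndarray,
                         shifts=((0, 1), (1, 0), (0, -1), (-1, 0)),
                         gains=(0.8, 1.2)) -> list:
    """Return shifted, re-lit and blurred copies of one normalised face image."""
    variants = [shift(face, dy, dx) for dy, dx in shifts]
    variants += [scale_intensity(face, g) for g in gains]
    variants.append(box_blur(face))
    return variants
```

Applied to every original face, this multiplies the training set size by the number of variants, which is one way of reaching the very large number of training images M discussed below.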

The derivation of attributes and their quantisation will now be described. In the present technique, attributes are measured with respect to so-called eigenblocks, which are core blocks (or eigenvectors) representing different types of block which may be present in the windowed image. The generation of eigenblocks will first be described with reference to FIG. 6.

Eigenblock Creation

The attributes in the present embodiment are based on so-called eigenblocks. The eigenblocks were designed to have good representational ability of the blocks in the training set. Therefore, they were created by performing principal component analysis on a large set of blocks from the training set. This process is shown schematically in FIG. 6 and described in more detail in Appendix B.
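Appendix B (not reproduced in this section) gives the exact procedure. As a rough sketch, eigenblocks can be obtained from a stack of training blocks with a singular value decomposition, which yields the principal components in order of decreasing variance. Subtracting the mean block first is the standard PCA convention and is an assumption here, as is the function name.

```python
import numpy as np


def eigenblocks(blocks: np.ndarray, num: int) -> np.ndarray:
    """Return the first `num` principal components of a set of image blocks.

    blocks: array of shape (N, bh, bw), e.g. (400, 16, 16) for eigenblock set I.
    Returns an array of shape (num, bh, bw); each component has unit norm.
    """
    n, bh, bw = blocks.shape
    data = blocks.reshape(n, bh * bw).astype(float)   # one flattened block per row
    data -= data.mean(axis=0)                         # centre the data (assumed PCA step)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    return vt[:num].reshape(num, bh, bw)
```

For eigenblock set I this would be called with the 400 training blocks as an array of shape (400, 16, 16) and num=10.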

Training the System

Experiments were performed with two different sets of training blocks.

Eigenblock Set I

Initially, a set of blocks taken from 25 face images in the training set was used. The 16×16 blocks were sampled every 16 pixels and so were non-overlapping. This sampling is shown in FIG. 6. As can be seen, 16 blocks are generated from each 64×64 training image, giving a total of 400 training blocks.

The first 10 eigenblocks generated from these training blocks are shown in FIG. 7.

Eigenblock Set II

A second set of eigenblocks was generated from a much larger set of training blocks. These blocks were taken from 500 face images in the training set. In this case, the 16×16 blocks were sampled every 8 pixels and so overlapped by 8 pixels. This generated 49 blocks from each 64×64 training image and led to a total of 24,500 training blocks.

The first 12 eigenblocks generated from these training blocks are shown in FIG. 8.

Empirical results show that eigenblock set II gives slightly better results than set I. This is because it is calculated from a larger set of training blocks taken from face images, and so is perceived to be better at representing the variations in faces. However, the improvement in performance is not large.
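The block counts quoted for the two eigenblock sets follow directly from the block size and sampling step: along each axis of a 64×64 image there are (64 - 16)/step + 1 block positions. The illustrative helper below checks that arithmetic; the same count gives the number of histograms needed for a given block spacing.

```python
def blocks_per_image(image_size: int, block_size: int, step: int) -> int:
    """Number of block positions when a square image is sampled every `step` pixels."""
    per_axis = (image_size - block_size) // step + 1
    return per_axis * per_axis


# Eigenblock set I: non-overlapping 16x16 blocks from 25 images of 64x64 pixels.
set_i = blocks_per_image(64, 16, 16) * 25
# Eigenblock set II: 16x16 blocks sampled every 8 pixels from 500 images.
set_ii = blocks_per_image(64, 16, 8) * 500
print(set_i, set_ii)  # 400 24500
```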

Building the Histograms

A histogram was built for each sampled block position within the 64×64 face image. The number of histograms depends on the block spacing. For example, for a block spacing of 16 pixels, there are 16 possible block positions and thus 16 histograms are used.

The process used to build a histogram representing a single block position is shown in FIG. 9. The histograms are created using a large training set 400 of M face images. For each face image, the process comprises: extracting 410 the relevant block from a position (i, j) in the face image; calculating the eigenblock-based attributes for the block, and determining the relevant bin number 420 from these attributes; and incrementing the relevant bin number in the histogram 430.

This process is repeated for each of M images in the training set, to create a histogram that gives a good representation of the distribution of frequency of occurrence of the attributes. Ideally, M is very large, e.g. several thousand. This can more easily be achieved by using a training set made up of a set of original faces and several hundred synthetic variations of each original face.
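The histogram-building loop of FIG. 9 might be sketched as below, handling every block position in one pass over the training images. The bin_number argument stands for the bin-number calculation of FIG. 10 and is passed in as a function so that the sketch stays self-contained. Storing each histogram as a sparse Counter (only occupied bins are kept, which matters because there are L^A possible bins) is a design choice of this sketch, not something the description specifies.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

import numpy as np

Position = Tuple[int, int]


def build_histograms(faces: Iterable[np.ndarray],
                     bin_number: Callable[[np.ndarray], int],
                     block: int = 16,
                     step: int = 16) -> Dict[Position, Counter]:
    """One histogram per block position (i, j), counting how often each bin occurs.

    faces: normalised face images, e.g. 64x64 arrays.
    bin_number: maps a block to its histogram bin number (hypothetical callback).
    """
    hists: Dict[Position, Counter] = {}
    for face in faces:
        h, w = face.shape
        for i in range(0, h - block + 1, step):
            for j in range(0, w - block + 1, step):
                blk = face[i:i + block, j:j + block]                 # extract block at (i, j)
                hists.setdefault((i, j), Counter())[bin_number(blk)] += 1  # increment its bin
    return hists
```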

Generating the Histogram Bin Number

A histogram bin number is generated from a given block using the following process, as shown in FIG. 10. The 16×16 block 440 is extracted from the 64×64 window or face image. The block is projected onto the set 450 of A eigenblocks to generate a set of "eigenblock weights". These eigenblock weights are the "attributes" used in this implementation. They have a range of -1 to +1. This process is described in more detail in Appendix B. Each weight is quantised into a fixed number of levels, L, to produce a set of quantised attributes 470, $w_i$, $i = 1, \dots, A$. The quantised weights are combined into a single value as follows:

$$h = w_1 L^{A-1} + w_2 L^{A-2} + w_3 L^{A-3} + \dots + w_{A-1} L^{1} + w_A L^{0}$$

where the value generated, h, is the histogram bin number 480. Note that the total number of bins in the histogram is given by $L^A$.
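The bin-number calculation can be sketched in Python. Several details are assumptions because the projection is only specified in Appendix B, which is not reproduced here: eigenblocks are taken to be unit-norm flattened vectors, the flattened block is divided by its own L2 norm so that each weight lies in [-1, +1], and quantisation is uniform with levels numbered 0 to L-1. The combining step evaluates the formula for h with Horner's rule, which gives the same value as the explicit sum of powers of L.

```python
import math


def project(block, eigenblocks):
    """Eigenblock weights of a flattened block (assumed unit-norm eigenblocks)."""
    norm = math.sqrt(sum(v * v for v in block)) or 1.0  # avoid dividing by zero
    return [sum(b * e for b, e in zip(block, eb)) / norm for eb in eigenblocks]


def quantise(weight, levels):
    """Map a weight in [-1, +1] uniformly onto an integer level 0 .. levels-1."""
    w = min(max(weight, -1.0), 1.0)
    return min(int((w + 1.0) / 2.0 * levels), levels - 1)


def bin_number(quantised, levels):
    """h = w_1 L^(A-1) + ... + w_A L^0, evaluated with Horner's rule."""
    h = 0
    for q in quantised:
        h = h * levels + q
    return h
```

With L = 2 levels and A = 9 eigenblocks, `bin_number([1] * 9, 2)` is 511, so there are 512 possible bins.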

The bin "contents", i.e. the frequency of occurrence of the set of attributes giving rise to that bin number, may be considered to be a probability value if it is divided by the number of training images M. However, because the probabilities are compared with a threshold, there is in fact no need to divide through by M as this value would cancel out in the calculations. So, in the following discussions, the bin "contents" will be referred to as "probability values", and treated as though they are probability values, even though in a strict sense they are in fact frequencies of occurrence.

The above process is used both in the training phase and in the detection phase.

Face Detection Phase

The face detection process involves sampling the test image with a moving 64×64 window and calculating a face probability at each window position.

The calculation of the face probability is shown in FIG. 11. For each block position in the window, the block's bin number 490 is calculated as described in the previous section. Using the appropriate histogram 500 for the position of the block, each bin number is looked up and the probability 510 of that bin number is determined. The sum 520 of the logs of these probabilities is then calculated across all the blocks to generate a face probability value, $P_{face}$ (otherwise referred to as a log likelihood value).
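A minimal sketch of the face probability for one window position, assuming the bin number of each block has already been computed and that each histogram is a mapping from bin number to count. As noted above, the counts are used directly without dividing by M. The `floor` value that stops an unseen bin from producing log(0) is our own assumption; the text does not say how empty bins are handled.

```python
import math


def face_log_likelihood(window_bins, histograms, floor=1e-6):
    """P_face: sum over block positions of log(count of that block's bin).

    window_bins maps block position -> bin number for one window;
    histograms maps block position -> {bin number: count}.
    """
    total = 0.0
    for pos, h in window_bins.items():
        count = histograms[pos].get(h, 0)
        total += math.log(max(count, floor))  # floor avoids log(0) for unseen bins
    return total
```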

This process generates a probability "map" for the entire test image. In other words, a probability value is derived in respect of each possible window centre position across the image. Arranging all of these probability values into a rectangular (or other suitably shaped) array gives the probability "map" corresponding to that image.

This map is then inverted, so that the process of finding a face involves finding minima in the inverted map (of course this is equivalent to not inverting the map and finding maxima; either could be done). A so-called distance-based technique is used. This technique can be summarised as follows: The map (pixel) position with the smallest value in the inverted probability map is chosen. If this value is larger than a threshold (TD), no more faces are chosen. This is the termination criterion. Otherwise a face-sized block corresponding to the chosen centre pixel position is blanked out (i.e. omitted from the following calculations) and the candidate face position finding procedure is repeated on the rest of the image until the termination criterion is reached.
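The distance-based search can be sketched as below. The map is a nested list of inverted probability values (lower means more face-like), `face_size` is in map pixels, and the blanked region is a square of side 2 × (face_size // 2) + 1 centred on the chosen pixel; that exact shape is an assumption, as the text only says "face-sized".

```python
def find_faces(inverted_map, face_size, threshold):
    """Greedy minimum search on an inverted probability map.

    Repeatedly take the smallest remaining value; stop once it exceeds the
    threshold TD; otherwise record it and blank out a square centred on it.
    """
    rows, cols = len(inverted_map), len(inverted_map[0])
    blanked = [[False] * cols for _ in range(rows)]
    half = face_size // 2
    faces = []
    while True:
        candidates = [(inverted_map[r][c], r, c)
                      for r in range(rows) for c in range(cols) if not blanked[r][c]]
        if not candidates:
            break
        value, r, c = min(candidates)
        if value > threshold:  # termination criterion
            break
        faces.append((r, c))
        # Omit the face-sized block from all following searches.
        for rr in range(max(0, r - half), min(rows, r + half + 1)):
            for cc in range(max(0, c - half), min(cols, c + half + 1)):
                blanked[rr][cc] = True
    return faces
```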

Nonface Method

The nonface model comprises an additional set of histograms which represent the probability distribution of attributes in nonface images. The histograms are created in exactly the same way as for the face model, except that the training images contain examples of nonfaces instead of faces.

During detection, two log probability values are computed, one using the face model and one using the nonface model. These are then combined by simply subtracting the nonface probability from the face probability:

$$P_{combined} = P_{face} - P_{nonface}$$

$P_{combined}$ is then used instead of $P_{face}$ to produce the probability map (before inversion).

Note that $P_{nonface}$ is subtracted from $P_{face}$ because these are log probability values: the difference of the two logs is the log of the ratio of the face probability to the nonface probability.

Note also that the face and nonface histograms may optionally be combined at the end of the training process (prior to face detection) by combining the log histograms bin by bin, with the same sign convention as $P_{combined}$:

$$\text{combined histogram} = \log(\text{histogram}_{face}) - \log(\text{histogram}_{nonface})$$

This is why only one histogram is required for each block position/pose/eye spacing combination in the description below.
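Precomputing the combined table is straightforward. The sketch below stores, for each bin, the log of the face count minus the log of the nonface count, matching the subtraction in $P_{combined} = P_{face} - P_{nonface}$, so that summing table entries over the blocks of a window yields $P_{combined}$ directly. The function name and the `floor` used for empty bins are assumptions for illustration.

```python
import math


def combine_histograms(face_hist, nonface_hist, floor=1e-6):
    """Per-bin log(face count) - log(nonface count) for one block position."""
    bins = set(face_hist) | set(nonface_hist)
    return {b: math.log(max(face_hist.get(b, 0), floor))
               - math.log(max(nonface_hist.get(b, 0), floor))
            for b in bins}
```

Because log differences add linearly, looking up the combined table gives the same result as computing $P_{face}$ and $P_{nonface}$ separately and subtracting, with one lookup per block instead of two.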

Histogram Examples

FIGS. 12a to 12f show some examples of histograms generated by the training process described above.

FIGS. 12a, 12b and 12c are derived from a training set of face images, and FIGS. 12d, 12e and 12f are derived from a training set of nonface images. In particular:

| View | Face histograms | Nonface histograms |
|---|---|---|
| Whole histogram | FIG. 12a | FIG. 12d |
| Zoomed onto the main peaks at about h = 1500 | FIG. 12b | FIG. 12e |
| A further zoom onto the region about h = 1570 | FIG. 12c | FIG. 12f |

It can clearly be seen that the peaks lie in different places in the face histograms and the nonface histograms.

Histogram Storage

As described above, the histograms store statistical information about the likelihood of a face at a given scale and location in an image. However, the ordering of the histograms is unexpectedly significant to system performance. A simple ordering can result in the access being non-localised (i.e. consecutive accesses are usually far apart in memory). This can give poor cache performance when implemented using microprocessors or bespoke processors. To address this problem the histograms are reordered so that accesses to the data are more localised.

In the present embodiment there are 6 histograms in total:

| Histogram | Description |
|---|---|
| $F^{38}$ | Frontal face with an eye spacing of 38 pixels (that is, a "zoomed in" histogram) |
| $L^{38}$ | Face facing to the left by 25 degrees, with an eye spacing of 38 pixels |
| $R^{38}$ | Face facing to the right by 25 degrees, with an eye spacing of 38 pixels |
| $F^{22}$ | Frontal face with an eye spacing of 22 pixels (that is, a "full face" histogram) |
| $L^{22}$ | Face facing to the left by 25 degrees, with an eye spacing of 22 pixels |
| $R^{22}$ | Face facing to the right by 25 degrees, with an eye spacing of 22 pixels |

In the following discussion:

c is the value from the binmap (a map giving the histogram entry for each location in the image) for a given location in the image at a given scale; in the present case this is a 9-bit binary number. The binmap is precalculated by convolving the image with 9 eigenblocks, quantising the resulting 9 eigenblock weights and combining them into a single value (so each weight is quantised to one bit, giving $2^9 = 512$ possible values of c);

x is the x location within the face window (between 0 and 6); and

y is the y location within the face window (between 0 and 6).

This means that the histograms for each pose (e.g. $F^{38}$) are 512 × 7 × 7 = 25,088 bytes in size.

$F^{38}_{c,x,y}$ is the value of the histogram for a given c, x and y.

For example, $F^{38}_{15,4,5}$ is the value given by the frontal histogram with an eye spacing of 38 pixels at location (4,5) in the face window, for a binmap value of 15.

A straightforward ordering of the histograms in memory is by c, then x, then y, then pose, and then eye spacing (with c varying fastest). A schematic example of this ordering is shown in FIG. 13a. An improved ordering is by pose, then x, then y, then c, and then eye spacing (with pose varying fastest). A schematic example of this type of ordering is shown in FIG. 13b.

There are two reasons for the improvements in cache performance when the histograms are ordered in the new way: (i) the way the poses are accessed; and (ii) the way that the face window moves during a face search.

The three different poses (left, right and frontal) are always accessed with the same bin number and location, i.e. if $F^{38}_{329,2,1}$ is accessed, then $L^{38}_{329,2,1}$ and $R^{38}_{329,2,1}$ are also accessed. These are adjacent in the new method, so excellent cache performance is achieved.

The new method of organising the histograms also takes advantage of the way that the face window moves during a search for faces in the image. Because of the way that the face window moves, the same c value will be looked up in many (x, y) locations.

FIG. 14 shows which values are used from the binmap to look for a face in a certain location. For example, $F^{38}_{329,2,1}$ is the value from the frontal histogram for eye spacing 38 for the (2,1) location in the face window.

It can be seen that when the face detection window moves 2 spaces to the right, the highlighted squares shift one place to the left, i.e. the same value will be looked up in a different location. In the example in FIG. 14, $F^{38}_{329,2,1}$ will become $F^{38}_{329,1,1}$ when the face window has shifted right by two.

As the algorithm searches for faces by shifting the face window through the image, it will look up the same binmap value in several locations. This means that if these values are stored close together in memory, cache performance will be improved.
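The effect of the two orderings on memory addresses can be checked by computing linear offsets. In the sketch below the dimensions come from the text (512 binmap values, 7 × 7 window locations, 3 poses, 2 eye spacings, one byte per entry); the function names and the pose and eye-spacing indices (0 = frontal, 1 = left, 2 = right; 0 = 38 pixels, 1 = 22 pixels) are illustrative assumptions.

```python
C, X, Y, POSES, SPACINGS = 512, 7, 7, 3, 2


def offset_simple(c, x, y, pose, spacing):
    """Straightforward layout: c varies fastest, then x, y, pose, eye spacing."""
    return c + C * (x + X * (y + Y * (pose + POSES * spacing)))


def offset_improved(c, x, y, pose, spacing):
    """Improved layout: pose varies fastest, then x, y, c, eye spacing."""
    return pose + POSES * (x + X * (y + Y * (c + C * spacing)))


# Reason (i): the three poses for one lookup.
print(offset_simple(329, 2, 1, 1, 0) - offset_simple(329, 2, 1, 0, 0))      # 25088
print(offset_improved(329, 2, 1, 1, 0) - offset_improved(329, 2, 1, 0, 0))  # 1
# Reason (ii): the same c moving from (2,1) to (1,1) after a window shift.
print(offset_simple(329, 1, 1, 0, 0) - offset_simple(329, 2, 1, 0, 0))      # -512
print(offset_improved(329, 1, 1, 0, 0) - offset_improved(329, 2, 1, 0, 0))  # -3
```

In the improved layout, the three poses for a lookup are adjacent bytes, and a window shift that moves a binmap value from (2,1) to (1,1) changes the address by only 3 bytes, so both access patterns tend to stay within one cache line.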

Another improvement which can be made to the histogram structure, either together with or independently from the improvement described above, is that side poses use fewer bits than frontal poses. The values stored in each histogram bin are quantised to a different number of bits depending on which pose they represent.

The number of bits used for each of the six histograms is summarised below:

| Histogram | Description | Bits per value |
|---|---|---|
| $F^{38}$ | Frontal face with an eye spacing of 38 pixels | 8 |
| $L^{38}$ | Face facing to the left by 25 degrees, with an eye spacing of 38 pixels | 4 |
| $R^{38}$ | Face facing to the right by 25 degrees, with an eye spacing of 38 pixels | 4 |
| $F^{22}$ | Frontal face with an eye spacing of 22 pixels | 8 |
| $L^{22}$ | Face facing to the left by 25 degrees, with an eye spacing of 22 pixels | 4 |
| $R^{22}$ | Face facing to the right by 25 degrees, with an eye spacing of 22 pixels | 4 |

The advantage of this is that each set of 3 histogram values can be stored in 2 bytes instead of 3.
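One way to fit an 8-bit frontal value and two 4-bit side-pose values into 2 bytes is sketched below. The byte layout (frontal value in the first byte, left pose in the high nibble and right pose in the low nibble of the second) is an assumption for illustration; the text only states that three values occupy 2 bytes.

```python
def pack(frontal, left, right):
    """Pack an 8-bit frontal value and two 4-bit side-pose values into 2 bytes."""
    if not (0 <= frontal < 256 and 0 <= left < 16 and 0 <= right < 16):
        raise ValueError("frontal must fit in 8 bits, side poses in 4 bits")
    return bytes([frontal, (left << 4) | right])


def unpack(data):
    """Inverse of pack: returns (frontal, left, right)."""
    return data[0], data[1] >> 4, data[1] & 0x0F


packed = pack(200, 3, 15)
print(len(packed))     # 2
print(unpack(packed))  # (200, 3, 15)
```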

It was found that this is possible because the side poses have less importance than the frontal pose on the overall performance of the algorithm, and so these can be represented with reduced resolution without significantly affecting accuracy.

Multiscale Face Detection

In order to detect faces of different sizes in the test image, the test image is scaled by a range of factors and a distance (i.e. probability) map is produced for each scale. In FIGS. 15a to 15c the images and their corresponding distance maps are shown at three different scales. The method gives the best response (highest probability, or minimum distance) for the large (central) subject at the smallest scale (FIG. 15a) and better responses for the smaller subject (to the left of the main figure) at the larger scales. (A darker colour on the map represents a lower value in the inverted map, or in other words a higher probability of there being a face). Candidate face positions are extracted across different scales by first finding the position which gives the best response over all scales. That is to say, the highest probability (lowest distance) is established amongst all of the probability maps at all of the scales. This candidate position is the first to be labelled as a face. The window centred over that face position is then blanked out from the probability map at each scale. The size of the window blanked out is proportional to the scale of the probability map.

Examples of this scaled blanking-out process are shown in FIGS. 15a to 15c. In particular, the highest probability across all the maps is found at the left hand side of the largest scale map (FIG. 15c). An area 530 corresponding to the presumed size of a face is blanked off in FIG. 15c. Corresponding, but scaled, areas 532, 534 are blanked off in the smaller maps.

Areas larger than the test window may be blanked off in the maps, to avoid overlapping detections. In particular, an area equal to the size of the test window surrounded by a border half as wide/long as the test window is appropriate to avoid such overlapping detections.

Additional faces are detected by searching for the next best response and blanking out the corresponding windows successively.
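The scaled blanking procedure can be sketched as follows. The sketch assumes that `maps[k]` holds face probabilities (higher is better) for the image scaled by `scales[k]`, that map coordinates are original-image coordinates multiplied by that scale, and that `window` is the test-window size in map pixels. Following the suggestion above, the blanked square is the window plus a half-window border on every side, i.e. twice the window's side, rescaled for each map. The function name is hypothetical.

```python
def detect_multiscale(maps, scales, window, threshold):
    """Return (row, col, scale) of each face, positions in original-image units."""
    blanked = [[[False] * len(m[0]) for _ in m] for m in maps]
    faces = []
    while True:
        best = None  # (probability, map index, row, col) of best unblanked response
        for k, m in enumerate(maps):
            for r, row in enumerate(m):
                for c, p in enumerate(row):
                    if not blanked[k][r][c] and (best is None or p > best[0]):
                        best = (p, k, r, c)
        if best is None or best[0] <= threshold:
            break
        p, k, r, c = best
        faces.append((r / scales[k], c / scales[k], scales[k]))
        # Blank a square of side 2 * window (window plus half-window border),
        # centred on the face and scaled in proportion to each map's scale.
        for j, m in enumerate(maps):
            ratio = scales[j] / scales[k]
            cr, cc, half = r * ratio, c * ratio, window * ratio
            for rr in range(len(m)):
                for col in range(len(m[0])):
                    if abs(rr - cr) <= half and abs(col - cc) <= half:
                        blanked[j][rr][col] = True
    return faces
```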

The intervals allowed between the scales processed are influenced by the sensitivity of the method to variations in size. It was found in this preliminary study of scale invariance that the method is not excessively sensitive to variations in size as faces which gave a good response at a certain scale often gave a good response at adjacent scales as well.

The above description refers to detecting a face even though the size of the face in the image is not known at the start of the detection process. Another aspect of multiple scale face detection is the use of two or more parallel detections at different scales to validate the detection process. This can have advantages if, for example, the face to be detected is partially obscured, or the person is wearing a hat etc.

FIGS. 15d to 15g schematically illustrate this process. During the training phase, the system is trained on windows (divided into respective blocks as described above) which surround the whole of the test face (FIG. 15d) to generate "full face" histogram data and also on windows at an expanded scale so that only a central area of the test face is included (FIG. 15e) to generate "zoomed in" histogram data. This generates two sets of histogram data. One set relates to the "full face" windows of FIG. 15d, and the other relates to the "central face area" windows of FIG. 15e.

During the detection phase, for any given test window 536, the window is applied to two different scalings of the test image so that in one (FIG. 15f) the test window surrounds the whole of the expected size of a face, and in the other (FIG. 15g) the test window encompasses the central area of a face at that expected size. These are each processed as described above, being compared with the respective sets of histogram data appropriate to the type of window. The log probabilities from each parallel process are added before the comparison with a threshold is applied.

Putting both of these aspects of multiple scale face detection together leads to a particularly elegant saving in the amount of data that needs to be stored.

In particular, in these embodiments the multiple scales for the arrangements of FIGS. 15a to 15c are arranged in a geometric sequence. In the present example, each scale in the sequence differs from the adjacent scale by a factor of $\sqrt[4]{2}$. Then, for the parallel detection described with reference to FIGS. 15d to 15g, the larger scale, central area, detection is carried out at a scale 3 steps higher in the sequence, that is, $2^{3/4}$ times larger than the "full face" scale, using attribute data relating to the scale 3 steps higher in the sequence. So, apart from at the extremes of the range of multiple scales, the geometric progression means that the parallel detection of FIGS. 15d to 15g can always be carried out using attribute data generated in respect of another multiple scale three steps higher in the sequence.

The two processes (multiple scale detection and parallel scale detection) can be combined in various ways. For example, the multiple scale detection process of FIGS. 15a to 15c can be applied first, and then the parallel scale detection process of FIGS. 15d to 15g can be applied at areas (and scales) identified during the multiple scale detection process. However, a convenient and efficient use of the attribute data may be achieved by:

(a) deriving attributes in respect of the test window at each scale (as in FIGS. 15a to 15c);
(b) comparing those attributes with the "full face" histogram data to generate a "full face" set of distance maps;
(c) comparing the attributes with the "zoomed in" histogram data to generate a "zoomed in" set of distance maps;
(d) for each scale n, combining the "full face" distance map for scale n with the "zoomed in" distance map for scale n+3;
(e) deriving face positions from the combined distance maps as described above with reference to FIGS. 15a to 15c.
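The index bookkeeping for this combined scheme can be sketched as follows, assuming scales are indexed from smallest to largest with a ratio of 2^(1/4) between neighbours (1.189207115, the scale factor quoted for the present embodiment). Only the pairing of full-face scale n with zoomed-in scale n + 3 and the addition of the two log probabilities are shown; aligning the two distance maps, which have different resolutions, is left out, and the function names are hypothetical.

```python
STEP = 2 ** 0.25  # assumed ratio between adjacent scales (fourth root of 2)
ZOOM_OFFSET = 3   # "zoomed in" data is taken from 3 steps higher


def scale_sequence(first, count):
    """Geometric sequence of scales, smallest first."""
    return [first * STEP ** k for k in range(count)]


def parallel_pairs(count):
    """(full-face index, zoomed-in index) pairs; the top 3 scales have no partner."""
    return [(n, n + ZOOM_OFFSET) for n in range(count - ZOOM_OFFSET)]


def combined_log_probability(full_face_log_p, zoomed_in_log_p):
    """The two parallel log probabilities are added before thresholding."""
    return full_face_log_p + zoomed_in_log_p
```

Each pair is exactly $2^{3/4} \approx 1.68$ apart in scale, so the zoomed-in attributes can be reused from a scale that is already being processed.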

Further parallel testing can be performed to detect different poses, such as looking straight ahead, looking partly up, down, left, right etc. Here a respective set of histogram data is required and the results are preferably combined using a "max" function, that is, the pose giving the highest probability is carried forward to thresholding, the others being discarded.
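Combining pose-specific results with a "max" function can be sketched as an element-wise maximum over maps of the same shape. The pose names, the dictionary representation and the choice to also report the winning pose are illustrative assumptions.

```python
def max_over_poses(pose_maps):
    """pose_maps: dict of pose name -> 2-D list of log probabilities (same shape).

    Returns the element-wise maximum map and, per position, the winning pose.
    """
    names = list(pose_maps)
    first = pose_maps[names[0]]
    best, winner = [], []
    for r in range(len(first)):
        best_row, winner_row = [], []
        for c in range(len(first[0])):
            name = max(names, key=lambda n: pose_maps[n][r][c])
            best_row.append(pose_maps[name][r][c])
            winner_row.append(name)
        best.append(best_row)
        winner.append(winner_row)
    return best, winner
```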

Improved Use of Multiple Scales

The face detection algorithm provides many probability maps at many scales; the requirement is to find all the places in the image where the probability exceeds a given threshold, whilst ensuring that there are no overlapping faces.

A disadvantage of the method described above is that it requires the storage of a complete set of probability maps at all scales, which is a large memory requirement. The following technique does not require the storage of all of the probability maps simultaneously.

In summary, a temporary list of candidate face locations is maintained. As the probability map for each scale is calculated, the probability maxima are found and compared against the list of candidate face locations to ensure that no overlapping faces exist.

In detail, this method uses a face list to maintain a list of current locations where there might be a face. Each face in the face list has a face location and a face size. The threshold is the probability threshold above which an object is deemed to be a face. The scale factor is the size factor between successive scales (1.189207115, i.e. $\sqrt[4]{2}$, in the present embodiment).

A 16×16 face_size is considered in the example description below.

The process is schematically illustrated in the flowchart of FIG. 16.

Referring to FIG. 16, the process starts at a step 1400 in respect of one of the scales (in the example shown, the smallest scale). The first time that the step 1400 takes place, the face list will be empty, but in general, for all faces in the face list, the face size for each face is modified at the step 1400 by multiplying the respective face size by the scale factor. This makes sure that faces detected in respect of the previous scale are correctly sized for a valid comparison with any maxima in the current scale.

At a step 1410, the maximum probability value, mp, is detected in the current map.

At a step 1420, the maximum probability value mp is compared with the threshold. If mp is greater than the threshold then control passes to a step 1430. On the other hand, if mp is not greater than the threshold then processing of the next map (corresponding to the next scale factor to be dealt with) is initiated at a step 1440.

Returning to the step 1430, if the location within the current scale's probability map of the maximum value mp overlaps (coincides) with a face in the face list (considering the modified sizes derived at the step 1400), then control passes to a step 1450. If not, control passes to a step 1460.

At the step 1450, the value mp is compared with a stored probability value in respect of the existing face. If mp is greater than that probability then the existing face is deleted at a step 1470 and a new entry created in the face list corresponding to the current value and position of mp. In particular, the value mp is stored in respect of the new entry in the face list and a 16×16 pixel area centred on the image position of the current maximum probability is set to the threshold at a step 1480. At a step 1490 the current location of the maximum probability value is added to the face list with a face size of 16. Control then returns to the step 1410.

Returning to the step 1460, if the maximum probability location was detected not to overlap with any faces in the face list (at the step 1430) then a new entry is created in the face list. As above, at the step 1460 the value mp is stored and a 16×16 area surrounding the current maximum value is set to the threshold. At a step 1465 the current maximum position is added to the face list with a face size of 16 and control returns to the step 1410.

If at the step 1450 the maximum probability value mp is detected not to be greater than the probability of the existing (overlapping) face then control passes to a step 1455 at which the area of the existing face is set to the threshold value and control returns to the step 1410.

At each of these stages, when control returns to the step 1410, a maximum probability value mp is again detected, but this will be in the light of modifications to the probability values surrounding detected faces in the steps 1460, 1455 and 1480. So, the modified values created at those steps will not in fact pass the test of the step 1420, in that a value set to equal the threshold value will be found not to exceed it. Accordingly, the step 1420 will establish whether another position exists in the current map where the threshold value is exceeded.
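The FIG. 16 procedure can be sketched in Python as below, with the step numbers in comments. Some choices are assumptions: maps are processed in the order given; a maximum "overlaps" a face if it lies inside that face's square; a face of size s occupies the square of side s starting s/2 above and to the left of its centre; and, because the text only mentions rescaling face sizes at step 1400, face locations are compared directly between scales. A real implementation might also have to rescale face locations. The function names are hypothetical.

```python
def in_face(r, c, face):
    """True if map position (r, c) lies inside the square occupied by face."""
    fr, fc, size, _ = face
    half = size / 2
    return fr - half <= r < fr + half and fc - half <= c < fc + half


def set_area(pmap, face, value):
    """Set every map value inside the face's square to value."""
    for r, row in enumerate(pmap):
        for c in range(len(row)):
            if in_face(r, c, face):
                row[c] = value


def detect_across_scales(maps, threshold, scale_factor=2 ** 0.25, face_size=16):
    faces = []  # entries are (row, col, size, probability)
    for original in maps:
        pmap = [row[:] for row in original]  # leave the caller's map untouched
        # Step 1400: resize faces found at the previous scale.
        faces = [(r, c, s * scale_factor, p) for r, c, s, p in faces]
        while True:
            # Step 1410: maximum probability mp in the current map.
            mp, r, c = max((v, rr, cc)
                           for rr, row in enumerate(pmap) for cc, v in enumerate(row))
            if mp <= threshold:  # step 1420: move on to the next scale
                break
            # Step 1430: does the maximum fall inside an existing face?
            hit = next((f for f in faces if in_face(r, c, f)), None)
            if hit is not None and mp <= hit[3]:
                set_area(pmap, hit, threshold)  # step 1455
                continue
            if hit is not None:
                faces.remove(hit)  # step 1470
            new_face = (r, c, face_size, mp)
            set_area(pmap, new_face, threshold)  # steps 1480 / 1460
            faces.append(new_face)  # steps 1490 / 1465
    return faces
```

Only `faces` is carried from one map to the next, which is the memory saving described below.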

An advantage of this method is that it allows each scale of probability map to be considered separately. Only the face list needs to be stored in between processing each scale. This has the following advantages:

(a) Lower memory requirement: a complete set of probability maps does not need to be stored. Only the face list needs to be stored, which requires much less memory.
(b) Allows temporal decimation: the algorithm can use methods such as temporal decimation, where processing for one frame is divided between several timeslots and only a subset of scales is processed during each timeslot. This method can now be used while only needing to maintain a face list between each call, instead of the entire set of probability maps calculated so far.
(c) Allows faster searching: only one scale is considered at a time. Therefore, there is no need to blank out areas across all scales in a set of probability maps each time a maximum is found.

Change Detection

In situations where face detection has to be carried out in real time, it can be difficult to complete all of the face detection processing in the time allowed (e.g. one frame period of a video signal).

A change detection process is used to detect which areas of the image have changed since the previous frame, or at least to remove from the face detection process certain areas detected not to have changed since the previous frame.

Areas of the image that have not changed since the previous frame do not need to have face detection performed on them again, as the result is likely to be the same as the previous frame. However, areas of the images that have changed need to have face detection performed on them afresh. These areas of the image are labelled as "areas of interest" during change detection.

In the present embodiment, change detection is performed only at a single fixed scale, e.g. the original image scale or the largest scale that is used in face detection. The process is illustrated in FIG. 17, which schematically illustrates a motion detector.

The current and previous frames are first processed by low pass filters 1100, 1110. The two frames are then supplied to a differencer 1120 to produce a frame difference image, for example a representation of the absolute pixel (or block) differences between frames with one difference value per pixel (or block) position. The absolute values of the difference image are then thresholded 1130 by comparison with a threshold value $Thr_{diff}$ to create a binary difference image, i.e. an array of one-bit values with one value per pixel (or block) position: very small differences are set to zero (no change) and larger differences are set to one (change detected). Finally, a morphological opening operation is performed 1140 on the binary difference image to create more contiguous areas of detected change/motion.

In practice, the low-pass filtering operation may be omitted.

Morphological opening is a known image processing technique and in this example is performed on a 3×3 area (i.e. a 3×3 block is used as the morphological structuring element) and comprises a morphological erosion operation followed by a morphological dilation operation. In order to carry this out in what is basically a raster-based system, the morphological processing is carried out after processing every third line.
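The thresholding and opening stages of FIG. 17 can be sketched on nested lists of pixel values. This version omits the optional low-pass filters, processes a whole frame at once rather than every third line, and considers only in-image neighbours at the borders, which is an assumption about border handling. The function names are hypothetical.

```python
def binary_difference(current, previous, thr_diff):
    """1 where the absolute pixel difference exceeds Thr_diff, else 0."""
    return [[1 if abs(a - b) > thr_diff else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(current, previous)]


def _window(img, r, c):
    """Values of the in-image pixels in the 3x3 neighbourhood of (r, c)."""
    rows, cols = len(img), len(img[0])
    return [img[rr][cc]
            for rr in range(max(0, r - 1), min(rows, r + 2))
            for cc in range(max(0, c - 1), min(cols, c + 2))]


def erode(img):
    return [[min(_window(img, r, c)) for c in range(len(img[0]))] for r in range(len(img))]


def dilate(img):
    return [[max(_window(img, r, c)) for c in range(len(img[0]))] for r in range(len(img))]


def opening(img):
    """3x3 morphological opening: erosion followed by dilation."""
    return dilate(erode(img))
```

An isolated changed pixel does not survive the erosion, whereas a 3×3 patch of change is restored by the dilation, which is how the opening yields more contiguous areas of detected change.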

Change detection can be applied to the whole image, as described above, to create a map of areas of the image where changes have been detected. Face detection is applied to those areas.

Alternatively, change detection can be used to eliminate certain areas of the image from face detection, though without necessarily detecting all areas of motion or "no motion". This technique has the advantage of reducing the processing requirements of the change detection process while still potentially providing a useful saving in processing for the face detection itself. A schematic example of this process is illustrated in FIGS. 18a to 18e.

In FIG. 18a, change detection is applied in a raster scanning arrangement in which a scan 1150 proceeds along horizontal lines (of pixels or blocks) from the top left towards the bottom right of an image. The basic process shown in FIG. 17 (without morphological processing and preferably without low pass filtering) is used, and the image is compared with the preceding image. At each scan point, the detected absolute difference is compared with the threshold value $Thr_{diff}$.

The scan 1150 progresses until the detected absolute difference in respect of one scan position 1160 exceeds the threshold $Thr_{diff}$. At this point the scan 1150 terminates.

Three similar scans 1170, 1180, 1190 are carried out. The scan 1170 is a horizontal scan starting at the bottom of the image and terminates when a scan position 1200 gives rise to an absolute difference value exceeding the threshold $Thr_{diff}$. The scan 1180 is a downwards vertical scan starting at the left hand side of the image and terminates when a scan position 1210 gives rise to an absolute difference value exceeding the threshold $Thr_{diff}$. And the scan 1190 is a downwards vertical scan starting at the right hand side of the image and terminates when a scan position 1220 gives rise to an absolute difference value exceeding the threshold $Thr_{diff}$.

The four points 1160, 1200, 1210, 1220 define a bounding box 1230 in FIG. 18e. In particular, if the image co-ordinates of a point nnnn are $(x_{nnnn}, y_{nnnn})$, then the four vertices of the bounding box 1230 are given by:

| Vertex | Co-ordinates |
|---|---|
| top left | $(x_{1210}, y_{1160})$ |
| top right | $(x_{1220}, y_{1160})$ |
| bottom left | $(x_{1210}, y_{1200})$ |
| bottom right | $(x_{1220}, y_{1200})$ |

The bounding box therefore does not define all areas of the image in which changes have been detected, but instead it defines an area (outside the bounding box) which can be excluded from face processing because change has not been detected there. As regards the area inside the bounding box, potentially all of the area may have changed, but a more usual situation would be that some parts of that area may have changed and some not.

Of course, there are several permutations of this technique:

(a) the order in which the 4 searches are performed;
(b) the direction in which each search is performed (the arrows could be reversed in each diagram without changing the effect of the algorithm);
(c) whether the scans are carried out sequentially (one scan after another) or in parallel (two or more scans at the same time).

In a variation shown schematically in FIGS. 19a to 19c, the two vertical scans 1180', 1190' are carried out only in respect of those rows 1240 which have not already been eliminated by the two horizontal scans 1150, 1170. This variation can reduce the processing requirements.
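The four scans of FIGS. 18a to 18e, together with the row-restricted variation of FIGS. 19a to 19c, can be sketched as below. `diff` is the absolute difference between the current and previous frames; the function names are hypothetical and the bottom scan is assumed to run left to right along each row. Because every changed position lies between the rows found by the two horizontal scans, restricting the vertical scans to those rows gives the same bounding box with less work.

```python
def first_exceeding(diff, thr_diff, positions):
    """Return the first (row, col) in scan order whose difference exceeds thr_diff."""
    for r, c in positions:
        if diff[r][c] > thr_diff:
            return r, c
    return None


def change_bounding_box(diff, thr_diff, restrict_rows=True):
    """Return (top, bottom, left, right) of the area that may contain change, or None."""
    rows, cols = len(diff), len(diff[0])
    # Scan 1150: horizontal lines, top left towards bottom right.
    top = first_exceeding(diff, thr_diff, ((r, c) for r in range(rows) for c in range(cols)))
    if top is None:
        return None  # no change detected anywhere in the image
    # Scan 1170: horizontal lines starting from the bottom of the image.
    bottom = first_exceeding(
        diff, thr_diff, ((r, c) for r in reversed(range(rows)) for c in range(cols)))
    # Vertical scans cover only the rows not already eliminated, if requested.
    r0, r1 = (top[0], bottom[0]) if restrict_rows else (0, rows - 1)
    # Scan 1180: downwards vertical scans starting at the left hand side.
    left = first_exceeding(
        diff, thr_diff, ((r, c) for c in range(cols) for r in range(r0, r1 + 1)))
    # Scan 1190: downwards vertical scans starting at the right hand side.
    right = first_exceeding(
        diff, thr_diff, ((r, c) for c in reversed(range(cols)) for r in range(r0, r1 + 1)))
    # Box corners: x from the vertical scans, y from the horizontal scans.
    return top[0], bottom[0], left[1], right[1]
```

Everything outside the returned box can be skipped by the face detector for this frame.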

The change detection techniques described above work well with the face detection techniques as follows. Change detection is carried out, starting from four extremes (edges) of the image, and stops in each case when a change is detected. So, apart from potentially the final pixel (or block) or part row/column of each of the change detection processes, change detection is carried out only in respect of those image areas which are not going to be subject to face detection. Similarly, apart from that final pixel, block or part row/column, face detection is carried out only in respect of areas which have not been subject to the change detection process. Bearing in mind that change detection is less processor-intensive than face detection, this relatively tiny overlap between the two processes means that in almost all situations the use of change detection will reduce the overall processing requirements of an image.

A different method of change detection applies to motion-encoded signals such as MPEG-encoded signals, or those which have been previously encoded in this form and decoded for face detection. Motion vectors or the like associated with the signals can indicate where an inter-image change has taken place. A block (e.g. an MPEG macroblock) at the destination (in a current image) of each motion vector can be flagged as an area of change. This can be done instead of or in addition to the change detection techniques described above.

Another method of reducing the processing requirements is as follows. The face detection algorithm is divided into a number of stages that are repeated over many scales. The algorithm is only completed after n calls. The algorithm is automatically partitioned so that each call takes approximately an equal amount of time. The key features of this method are:

(a) The method uses an automatic method to partition the algorithm into pieces that take an equal amount of processing time.
(b) Estimates are used for the processing time taken by each stage, so that the algorithm can return before executing a given stage if it will take too much time.
(c) The algorithm can only return at the end of each stage; it cannot return part way through a stage. This limits the amount of local storage required, and simplifies the

