Method of and apparatus for detecting a human face and observer tracking display, Patent Number 6,633,655, from the United States Patent and Trademark Office (PTO)


Title: Method of and apparatus for detecting a human face and observer tracking display

Abstract: A method is provided for detecting a human face in an image, such as a sequence of images supplied by a video camera. The method comprises locating in each image a candidate face region and analyzing the candidate face region for a first characteristic indicative of a facial feature. The locating step may comprise detecting uniformly saturated regions of predetermined shape in a reduced resolution version of the image. The analyzing step may comprise selecting a single color component, forming a vertical integral projection profile and detecting an omega shape in the profile characteristic of an eye region of a face.

Patent Number: 6,633,655 Issued on 10/14/2003 to Hong et al.


Inventors: Hong; Qi He (Oxfordshire, GB), Holliman; Nicolas Steven (Oxfordshire, GB), Ezra; David (Oxfordshire, GB)
Assignee: Sharp Kabushiki Kaisha (Osaka, JP)
Appl. No.: 09/386,527
Filed: August 30, 1999


Foreign Application Priority Data

Sep 05, 1998 [GB] 9819323

Current U.S. Class: 382/118 ; 348/E13.022; 348/E13.027; 348/E13.045; 348/E13.059; 382/164; 382/165; 382/174; 382/206
Current International Class: G02B 27/22 (20060101); G06T 7/00 (20060101); G06K 9/00 (20060101); H04N 13/00 (20060101); G06K 009/00 ()
Field of Search: 382/118,174,164,165,206


References Cited

U.S. Patent Documents
5629752 May 1997 Kinjo et al.
5680481 October 1997 Prasad et al.
5715325 February 1998 Bang et al.
5835616 November 1998 Lobo et al.
5870138 February 1999 Smith et al.
6088137 July 2000 Tomizawa
6148092 November 2000 Qian
6389155 May 2002 Funayama et al.
6404900 June 2002 Qian et al.
Foreign Patent Documents
751473 Feb., 1997 EP
0844582 May., 1998 EP
9629674 Sep., 1996 WO
9825229 Jun., 1998 WO

Other References

Wu et al., "Face and Facial Feature Extraction from Color Image," Proc. of the Second Int. Conf. on Automatic Face and Gesture Recognition, Oct. 1996, pp. 345-350.
Reisfeld et al., "Robust Detection of Facial Features by Generalized Symmetry," Proc. 11th IAPR Int. Conf. on Computer Vision and Applications, Conference A, vol. 1, Aug. 1992, pp. 117-120.
Brunelli et al., "Face Recognition: Features versus Templates," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 15, No. 10, Oct. 1993, pp. 1042-1052.
Sobottka et al., "Segmentation and Tracking of Faces in Color Images," Proc. of the Second Int. Conf. on Automatic Face and Gesture Recognition, Oct. 1996, pp. 236-241.
Peng et al., "Locating Facial Features Using Threshold Images," 3rd Int. Conf. on Signal Processing, Oct. 1996, pp. 1162-1166.
Crowley et al., "Multi-Modal Tracking of Faces for Video Communications," Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, Jun. 1997, pp. 640-645.
Herodotou et al., "A Color Segmentation Scheme for Object-Based Video Coding," IEEE Symp. on Advances in Digital Filtering and Signal Processing, Jun. 1998, pp. 25-29.
Search Report for British Patent Application No. 9819323.8 dated Jan. 15, 1999.
G. Yang et al., "Human Face Detection in Complex Backgrounds," Pattern Recognition, vol. 27, No. 1, 1994, pp. 53-63.
M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, vol. 3, No. 1, pp. 70-86.
Sako et al., "Real Time Facial Feature Tracking Based on Matching Techniques and its Applications," Proc. of the 12th IAPR Int. Conf. on Pattern Recognition, Jerusalem, Oct. 6-13, 1994, vol. II, pp. 320-324.
Chen et al., "Face Detection by Fuzzy Pattern Matching," IEEE (0-8186-7042-8), 1995, pp. 591-596.
European Search Report for EP Application 99306962.4-2201 mailed Jan. 28, 2000.

Primary Examiner: Chang; Jon
Attorney, Agent or Firm: Renner, Otto, Boisselle & Sklar

Claims



What is claimed is:

1. A method of detecting a human face in an image, comprising locating in the image a candidate face region and analyzing the candidate face region for a first characteristic indicative of a facial feature, wherein the first characteristic comprises a substantially symmetrical horizontal brightness profile comprising a maximum disposed between first and second minima and in that the analyzing step comprises forming a vertical integral projection of a portion of the candidate face region and determining whether the vertical integral projection has first and second minima disposed substantially symmetrically about a maximum.
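Claim 1 can be illustrated with a short sketch, assuming a grey-level portion of the candidate face region stored as a list of rows (for instance a single color component, as in claim 3). It sums each column to form the vertical integral projection V(x) and then looks for a maximum flanked by two minima at roughly equal distances. The middle-third peak search and the tolerance `tol` are hypothetical choices for illustration, not values taken from the patent.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]  # image[y][x], brightness of one color component


def vertical_integral_projection(region: Image) -> List[float]:
    """V(x): sum of each column of the portion over all its rows."""
    width = len(region[0])
    return [sum(row[x] for row in region) for x in range(width)]


def find_omega(v: List[float], tol: float = 0.2) -> Optional[Tuple[int, int, int]]:
    """Return (left_min, peak, right_min) x-positions, or None.

    Hypothetical rule: the maximum is searched in the middle third of V,
    one minimum on each side of it, and the two peak-to-minimum distances
    may differ by at most `tol` times their mean. Requires len(v) >= 3.
    """
    n = len(v)
    lo, hi = n // 3, n - n // 3
    peak = max(range(lo, hi), key=lambda x: v[x])
    left = min(range(0, peak), key=lambda x: v[x])
    right = min(range(peak + 1, n), key=lambda x: v[x])
    # Both minima must lie strictly below the central maximum.
    if v[left] >= v[peak] or v[right] >= v[peak]:
        return None
    dl, dr = peak - left, right - peak
    # Minima must be disposed substantially symmetrically about the maximum.
    if abs(dl - dr) > tol * (dl + dr) / 2:
        return None
    return left, peak, right
```

A dark-bright-dark profile across the eyes and nose bridge gives the "omega" shape mentioned in the abstract; a flat profile is rejected.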

2. A method as claimed in claim 1, wherein the locating and analyzing steps are repeated for each image of a sequence of images.

3. A method as claimed in claim 1, wherein the image is a color image and the analyzing step is performed on a color component of the color image.

4. A method as claimed in claim 1, wherein the image is a color image and the analyzing step is performed on a contrast image derived from the color image.

5. A method as claimed in claim 1, wherein the analyzing step determines whether the vertical integral projection has first and second minima whose horizontal separation is within a predetermined range.

6. A method as claimed in claim 1, wherein the analyzing step determines whether the vertical integral projection has a maximum and first and second minima such that the ratio of the difference between the maximum and the smaller of the first and second minima to the maximum is greater than a first threshold.
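The tests of claims 5 to 7 reduce to a little arithmetic on V(x) once the two minima and the maximum have been located. In this sketch `sep_range` and `ratio_threshold` are hypothetical defaults, since the text gives no numerical values for the predetermined range or the first threshold.

```python
from typing import Sequence, Tuple


def omega_ratio(v: Sequence[float], left: int, peak: int, right: int) -> float:
    """Claim 6 quantity: (V(peak) - min(V(left), V(right))) / V(peak)."""
    if v[peak] <= 0:
        return 0.0  # guard against an all-dark portion (assumption)
    return (v[peak] - min(v[left], v[right])) / v[peak]


def is_eye_profile(v: Sequence[float], left: int, peak: int, right: int,
                   sep_range: Tuple[int, int] = (20, 60),
                   ratio_threshold: float = 0.1) -> bool:
    """Claim 5 (minima separation within a range) and claim 6 (ratio test).

    sep_range and ratio_threshold are hypothetical, not patent values.
    """
    lo, hi = sep_range
    if not lo <= right - left <= hi:
        return False
    return omega_ratio(v, left, peak, right) > ratio_threshold


def best_portion(candidates) -> int:
    """Claim 7: index of the (v, left, peak, right) portion with the highest ratio."""
    return max(range(len(candidates)), key=lambda k: omega_ratio(*candidates[k]))
```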

7. A method as claimed in claim 6, wherein vertical integral projections are formed for a plurality of portions of the candidate face region and the portion having the highest ratio is selected as a potential target image.

8. A method as claimed in claim 1, wherein the analyzing step comprises forming a measure of the symmetry of the portion.

9. A method as claimed in claim 8, wherein the symmetry measure is formed as: ##EQU22##

where V(x) is the value of the vertical integral projection at horizontal position x and x_0 is the horizontal position of the middle of the vertical integral projection.

10. A method as claimed in claim 8, wherein the vertical integral projection is formed for a plurality of portions of the face candidate and the portion having the highest symmetry measure is selected as a potential target image.

11. A method as claimed in claim 1, wherein the analyzing step comprises dividing a portion of the candidate face region into left and right halves, forming a horizontal integral projection of each of the halves, and comparing a measure of horizontal symmetry of the left and right horizontal integral projections with a second threshold.
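One way to realise the left/right comparison of claim 11 is sketched below. The claim does not say how the "measure of horizontal symmetry" is computed, so the normalised absolute-difference score used here and the `threshold` value are assumptions.

```python
from typing import List

Image = List[List[float]]  # image[y][x]


def horizontal_projection(region: Image) -> List[float]:
    """H(y): sum of each row."""
    return [sum(row) for row in region]


def left_right_symmetry(region: Image) -> float:
    """Hypothetical score in [0, 1]; 1 means identical left/right row profiles.

    For odd widths the middle column is left out of both halves.
    """
    width = len(region[0])
    half = width // 2
    hl = horizontal_projection([row[:half] for row in region])
    hr = horizontal_projection([row[width - half:] for row in region])
    total = sum(hl) + sum(hr)
    if total == 0:
        return 1.0  # an all-zero portion is trivially symmetric
    diff = sum(abs(a - b) for a, b in zip(hl, hr))
    return 1.0 - diff / total


def is_symmetric(region: Image, threshold: float = 0.9) -> bool:
    """Compare the symmetry score with a (hypothetical) second threshold."""
    return left_right_symmetry(region) > threshold
```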

12. A method as claimed in claim 1, wherein the analyzing step determines whether the candidate face region has first and second brightness minima disposed at substantially the same height with a horizontal separation within a predetermined range.

13. A method as claimed in claim 12, wherein the analyzing step determines whether the candidate face region has a vertically extending region of higher brightness than and disposed between first and second brightness minima.

14. A method as claimed in claim 13, wherein the analyzing step determines whether the candidate face region has a horizontally extending region disposed below and of lower brightness than the vertically extending region.

15. A method as claimed in claim 1, wherein the analyzing step comprises locating, in the candidate face region, candidate eye pupil regions where a green image component is greater than a red image component or where a blue image component is greater than a green image component.

16. A method as claimed in claim 15, wherein locating the candidate eye pupil regions is restricted to candidate eye regions of the candidate face region.

17. A method as claimed in claim 16, wherein the analyzing step forms a function E(x,y) for picture elements (x,y) in the candidate eye regions such that: ##EQU23##

where R, G and B are red, green and blue image components, C_1 and C_2 are constants, E(x,y) = 1 represents a picture element inside the candidate eye pupil regions and E(x,y) = 0 represents a picture element outside the candidate eye pupil regions.

18. A method as claimed in claim 17, wherein the analyzing step detects the centers of the eye pupils as the centroids of the candidate eye pupil regions.
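The pupil test of claim 15 and the centroid step of claim 18 can be sketched as follows, assuming each candidate eye region is given as rows of (R, G, B) tuples. The function E(x,y) of claim 17 also involves two constants whose formula is not reproduced in this text, so only the plain color comparison of claim 15 is implemented here.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def pupil_mask(region: List[List[Pixel]]) -> List[List[int]]:
    """1 where G > R or B > G (the claim 15 condition), else 0."""
    return [[1 if (g > r or b > g) else 0 for (r, g, b) in row] for row in region]


def centroid(mask: List[List[int]]) -> Optional[Tuple[float, float]]:
    """(x, y) centroid of the set pixels, or None if the mask is empty."""
    sx = sy = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sx += x
                sy += y
                count += 1
    if count == 0:
        return None
    return sx / count, sy / count
```

Applied separately to the left and right candidate eye regions (claim 16), the two centroids give estimates of the pupil centers. Typical skin tones have R > G > B and therefore fall outside the mask.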

19. A method as claimed in claim 15, wherein the analyzing step comprises locating a candidate mouth region in a sub-region of the candidate face region which is horizontally between the candidate eye pupil regions and vertically below the level of the candidate eye pupil regions by between substantially half and substantially one and a half times the distance between the candidate eye pupil regions.
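The geometry of claim 19 translates directly into a search rectangle. This sketch assumes image coordinates with y increasing downwards and measures the pupil spacing as a Euclidean distance; both are illustrative assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y), y grows downwards as in image rows


def mouth_search_box(left_pupil: Point, right_pupil: Point,
                     near: float = 0.5, far: float = 1.5):
    """Return (x0, y0, x1, y1) of the mouth search sub-region.

    Horizontally between the pupils; vertically between `near` and `far`
    times the pupil spacing below the mean pupil level.
    """
    (xl, yl), (xr, yr) = left_pupil, right_pupil
    d = math.hypot(xr - xl, yr - yl)   # inter-pupil distance
    eye_level = (yl + yr) / 2          # mean pupil height
    x0, x1 = min(xl, xr), max(xl, xr)
    return x0, eye_level + near * d, x1, eye_level + far * d
```

For pupils at (40, 50) and (80, 50) the spacing is 40 pixels, so the mouth is searched between x = 40 and x = 80 and between y = 70 and y = 110.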

20. A method as claimed in claim 19, wherein the analyzing step forms a function M(x,y) for picture elements (x,y) within the sub-region such that: ##EQU24##

where R, G and B are red, green and blue image components, η is a constant, M(x,y) = 1 represents a picture element inside the candidate mouth region and M(x,y) = 0 represents a picture element outside the candidate mouth region.

21. A method as claimed in claim 20, wherein vertical and horizontal projection profiles of the function M(x,y) are formed and a candidate lip region is defined in a rectangular sub-region where the vertical and horizontal projection profiles exceed first and second predetermined thresholds, respectively.

22. A method as claimed in claim 21, wherein the first and second predetermined thresholds are proportional to maxima of the vertical and horizontal projection profiles, respectively.

23. A method as claimed in claim 21, wherein the analyzing step checks whether the aspect ratio of the candidate lip region is between first and second predefined thresholds.

24. A method as claimed in claim 21, wherein the analyzing step checks whether the ratio of the vertical distance from the candidate eye pupil regions to the top of the candidate lip region to the spacing between the candidate eye pupil regions is between first and second preset thresholds.
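Claims 21 to 24 can be sketched as below, assuming the binary mouth map M(x,y) of claim 20 has already been computed (its defining formula is not reproduced in this text). The proportionality factors `kv` and `kh` and the aspect and distance ranges are hypothetical values chosen for illustration; the box and the eye level must be expressed in the same coordinate frame.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def lip_box(m: List[List[int]], kv: float = 0.5, kh: float = 0.5) -> Optional[Box]:
    """Claims 21-22: rectangle where both projection profiles exceed
    thresholds proportional to their maxima (0 <= kv, kh < 1)."""
    height, width = len(m), len(m[0])
    v = [sum(m[y][x] for y in range(height)) for x in range(width)]  # per column
    h = [sum(row) for row in m]                                      # per row
    if max(v) == 0:
        return None  # no mouth pixels at all
    tv, th = kv * max(v), kh * max(h)
    xs = [x for x, s in enumerate(v) if s > tv]
    ys = [y for y, s in enumerate(h) if s > th]
    return xs[0], ys[0], xs[-1], ys[-1]


def plausible_lips(box: Box, eye_y: float, eye_spacing: float,
                   aspect: Tuple[float, float] = (1.5, 6.0),
                   drop: Tuple[float, float] = (0.6, 1.6)) -> bool:
    """Claim 23 (aspect ratio) and claim 24 (eye-to-lip distance / eye spacing)."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0 + 1, y1 - y0 + 1
    if not aspect[0] <= w / h <= aspect[1]:
        return False
    return drop[0] <= (y0 - eye_y) / eye_spacing <= drop[1]
```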

25. A method as claimed in claim 1, wherein the analyzing step comprises dividing a portion of the candidate face region into left and right halves and comparing the angles of the brightness gradients of horizontally symmetrically disposed pairs of points for symmetry.

26. A method as claimed in claim 2, wherein the locating and analyzing steps are stopped when the first characteristic is found r times in R consecutive images of the sequence.
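The stopping rule of claim 26 amounts to a sliding window over the last R detection results. A minimal sketch, in which the class name `ConsensusStopper` is hypothetical:

```python
from collections import deque


class ConsensusStopper:
    """Signal stop once the face characteristic was found r times in the
    last R consecutive images (claim 26)."""

    def __init__(self, r: int, R: int):
        self.r = r
        self.window = deque(maxlen=R)  # oldest result drops out automatically

    def update(self, found: bool) -> bool:
        """Record the result for one image; True means stop searching."""
        self.window.append(found)
        return sum(self.window) >= self.r
```

Requiring several hits within a short window, rather than a single hit, trades a few extra frames of latency for fewer false template captures.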

27. A method as claimed in claim 1, wherein the locating step comprises searching the image for a candidate face region having a second characteristic indicative of a human face.

28. A method as claimed in claim 27, wherein the second characteristic is uniform saturation.

29. A method as claimed in claim 28, wherein the searching step comprises reducing the resolution of the image by averaging the saturation to form a reduced resolution image and searching for a region of the reduced resolution image having, in a predetermined shape, a substantially uniform saturation which is substantially different from the saturation of the portion of the reduced resolution image surrounding the predetermined shape.

30. A method as claimed in claim 29, wherein the image comprises a plurality of picture elements and the resolution is reduced such that the predetermined shape is from two to three reduced resolution picture elements across.

31. A method as claimed in claim 30, wherein the image comprises a rectangular array of M by N picture elements, the reduced resolution image comprises (M/m) by (N/n) picture elements, each of which corresponds to m by n picture elements of the image, and the saturation P of each picture element of the reduced resolution image is given by: ##EQU25##

where f(i,j) is the saturation of the picture element of the ith column and the jth row of the m by n picture elements.
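Claims 29 to 31 describe reducing the resolution by averaging the saturation over blocks of m by n picture elements, and claim 39 repeats the process with the reduced grid shifted. The sketch below assumes the saturation image is a list of rows, with m counting columns and n counting rows as in the definition of f(i,j); the shift (`dx`, `dy`) is a hypothetical parameter, and picture elements that do not fill a complete block are simply discarded.

```python
from typing import List


def reduce_saturation(sat: List[List[float]], m: int, n: int,
                      dx: int = 0, dy: int = 0) -> List[List[float]]:
    """Average saturation over m (columns) by n (rows) blocks.

    (dx, dy) offsets the block grid, as in the repeated search of claim 39.
    """
    rows, cols = len(sat), len(sat[0])
    out = []
    for top in range(dy, rows - n + 1, n):
        out_row = []
        for left in range(dx, cols - m + 1, m):
            block = [sat[top + j][left + i] for j in range(n) for i in range(m)]
            out_row.append(sum(block) / (m * n))
        out.append(out_row)
    return out
```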

32. A method as claimed in claim 31, further comprising storing the saturations in a store.

33. A method as claimed in claim 31, wherein a uniformity value is ascribed to each of the reduced resolution picture elements by comparing the saturation of each of the reduced resolution picture elements with the saturation of at least one adjacent reduced resolution picture element.

34. A method as claimed in claim 33, wherein each uniformity value is ascribed a first value if

where max(P) and min(P) are the maximum and minimum values, respectively, of the saturations of the reduced resolution picture element and the or each adjacent picture element and T is a threshold, and a second value different from the first value otherwise.

35. A method as claimed in claim 34, wherein T is substantially equal to 0.15.

36. A method as claimed in claim 33, further comprising storing the saturations in a store, wherein the or each adjacent reduced resolution picture element has not been ascribed a uniformity value and each uniformity value is stored in the store in place of the corresponding saturation.

37. A method as claimed in claim 34, wherein the resolution is reduced such that the predetermined shape is two or three reduced resolution picture elements across and wherein the method further comprises indicating detection of a candidate face region when a uniformity value of the first value is ascribed to any one of one reduced resolution picture element, two vertically or horizontally adjacent reduced resolution picture elements and a rectangular two-by-two array of picture elements and when a uniformity value of the second value is ascribed to each surrounding reduced resolution picture element.

38. A method as claimed in claim 37, further comprising storing the saturations in a store, wherein detection is indicated by storing a third value different from the first and second values in the store in place of the corresponding uniformity value.
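The pattern test of claims 37 and 38 can be sketched on a map of uniformity values. The numeric codes 1, 0 and 2 for the first, second and third values, and the use of the full eight-neighbour ring as the "surrounding" picture elements, are assumptions; the claims do not fix them.

```python
from typing import List, Tuple

# Hypothetical codes for the claims' "first", "second" and "third" values.
FIRST, SECOND, THIRD = 1, 0, 2
# Candidate shapes (height, width) in reduced resolution picture elements.
SHAPES = [(1, 1), (1, 2), (2, 1), (2, 2)]


def mark_candidates(u: List[List[int]]):
    """Find shapes of FIRST values whose surrounding ring is all SECOND.

    Returns a copy of u with detected shapes overwritten by THIRD (claim 38)
    and a list of (top, left, height, width) hits. The input is not modified.
    """
    rows, cols = len(u), len(u[0])
    out = [row[:] for row in u]
    hits: List[Tuple[int, int, int, int]] = []
    for h, w in SHAPES:
        # Keep a one-element border so the surrounding ring stays inside the map.
        for top in range(1, rows - h):
            for left in range(1, cols - w):
                inside = all(u[top + j][left + i] == FIRST
                             for j in range(h) for i in range(w))
                if not inside:
                    continue
                ring = [u[y][x]
                        for y in range(top - 1, top + h + 1)
                        for x in range(left - 1, left + w + 1)
                        if not (top <= y < top + h and left <= x < left + w)]
                if all(value == SECOND for value in ring):
                    hits.append((top, left, h, w))
                    for j in range(h):
                        for i in range(w):
                            out[top + j][left + i] = THIRD
    return out, hits
```

Because every check reads the original map `u`, marking one detection cannot hide or create another.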

39. A method as claimed in claim 30, further comprising repeating the resolution reduction and searching at least once with the reduced resolution picture elements shifted with respect to the image picture elements.

40. A method as claimed in claim 29, wherein the saturation is derived from red, green and blue components as

where max(R,G,B) and min(R,G,B) are the maximum and minimum values, respectively, of the red, green and blue components.

41. A method as claimed in claim 1, wherein a first image is captured while illuminating an expected range of positions of a face, a second image is captured using ambient light, and the second image is subtracted from the first image to form the image.
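The ambient-light subtraction of claim 41 is a pixel-wise difference. Clamping negative results to zero, as done in this sketch, is an assumption rather than something the claim specifies.

```python
from typing import List


def ambient_subtracted(lit: List[List[int]], ambient: List[List[int]]) -> List[List[int]]:
    """Pixel-wise lit - ambient, clamped at zero (the clamp is an assumption)."""
    return [[max(a - b, 0) for a, b in zip(row_lit, row_amb)]
            for row_lit, row_amb in zip(lit, ambient)]
```

Objects that reflect the illumination strongly, such as a nearby face, survive the subtraction, while a distant background lit mainly by ambient light is largely cancelled.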
Description



BACKGROUND OF THE INVENTION

The present invention relates to a method of and an apparatus for detecting a human face. Such a method may, for example, be used for capturing a target image in an initialisation stage of an image tracking system. The present invention also relates to an observer tracking display, for instance of the autostereoscopic type, using an image tracking system including such an apparatus.

Other applications of such methods and apparatuses include security surveillance, video and image compression, video conferencing, multimedia database searching, computer games, driver monitoring, graphical user interfaces, face recognition and personal identification.

Autostereoscopic displays enable a viewer to see two separate images forming a stereoscopic pair by viewing such displays with the eyes in two viewing windows. Examples of such displays are disclosed in EP 0 602 934, EP 0 656 555, EP 0 708 351, EP 0 726 482 and EP 0 829 743. An example of a known type of observer tracking autostereoscopic display is illustrated in FIG. 1 of the accompanying drawings.

The display comprises a display system 1 co-operating with a tracking system 2. The tracking system 2 comprises a tracking sensor 3 which supplies a sensor signal to a tracking processor 4. The tracking processor 4 derives from the sensor signal an observer position data signal which is supplied to a display control processor 5 of the display system 1. The processor 5 converts the position data signal into a window steering signal and supplies this to a steering mechanism 6 of a tracked 3D display 7. The viewing windows for the eyes of the observer are thus steered so as to follow movement of the head of the observer and, within the working range, to maintain the eyes of the observer in the appropriate viewing windows.

GB 2 324 428 and EP 0 877 274 disclose an observer video tracking system which has a short latency time, a high update frequency and adequate measurement accuracy for observer tracking autostereoscopic displays. FIG. 2 of the accompanying drawings illustrates an example of the system, which differs from that shown in FIG. 1 in that the tracking sensor 3 comprises a Sony XC999 NTSC video camera operating at a 60 Hz field rate, and the tracking processor 4 is provided with a mouse 8 and comprises a Silicon Graphics entry-level machine of the Indy series equipped with an R4400 processor operating at 150 MHz and a video digitiser and frame store having a resolution of 640×240 picture elements (pixels) for each field captured by the camera 3. The camera 3 is disposed on top of the display 7 and points towards the observer, who sits in front of the display. The normal distance between the observer and the camera 3 is about 0.85 metres, at which distance the observer has a freedom of movement in the lateral or X direction of about 450 mm. The distance between two adjacent pixels in the image formed by the camera corresponds to about 0.67 mm in the X direction and 1.21 mm in the Y direction. The Y resolution is halved because each interlaced field is used individually.

FIG. 3 of the accompanying drawings illustrates in general terms the tracking method performed by the processor 4. The method comprises an initialisation stage 9 followed by a tracking stage 10. During the initialisation stage 9, a target image or "template" is captured by storing a portion of an image from the camera 3. The target image generally contains the observer eye region as illustrated at 11 in FIG. 4 of the accompanying drawings. Once the target image or template 11 has been successfully captured, observer tracking is performed in the tracking stage 10.

A global target or template search is performed at 12 so as to detect the position of the target image within the whole image produced by the camera 3. Once the target image has been located, motion detection is performed at 13, after which a local target or template search is performed at 14. The template matching steps 12 and 14 are performed by cross-correlating the target image in the template with each sub-section overlaid by the template. In step 15, the best correlation value is compared with a predetermined threshold to check whether tracking has been lost. If so, control returns to the global template matching step 12. Otherwise, control returns to step 13. The motion detection 13 and the local template matching 14 form a tracking loop which is performed for as long as tracking is maintained. The motion detection step supplies position data by a differential method which determines the movement of the target image between consecutive fields and adds this to the position found by local template matching in the preceding step for the earlier field.
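The global and local template searches of steps 12 and 14, and the lost-tracking test of step 15, can be sketched as below. The text only says the template is cross-correlated with each sub-section; this sketch substitutes normalised cross-correlation so that scores fall in [-1, 1] and a fixed threshold is meaningful. The `radius` and `threshold` values are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Image = List[List[float]]  # image[y][x]


def ncc(image: Image, template: Image, top: int, left: int) -> float:
    """Normalised cross-correlation of the template with the window at (top, left)."""
    th, tw = len(template), len(template[0])
    a = [image[top + j][left + i] for j in range(th) for i in range(tw)]
    b = [value for row in template for value in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0  # flat window: treat as no match


def template_search(image: Image, template: Image,
                    centre: Optional[Tuple[int, int]] = None,
                    radius: int = 8) -> Tuple[float, Tuple[int, int]]:
    """Global search if centre is None, else a local search within radius."""
    th, tw = len(template), len(template[0])
    max_top, max_left = len(image) - th, len(image[0]) - tw
    if centre is None:
        tops, lefts = range(max_top + 1), range(max_left + 1)
    else:
        ct, cl = centre
        tops = range(max(0, ct - radius), min(max_top, ct + radius) + 1)
        lefts = range(max(0, cl - radius), min(max_left, cl + radius) + 1)
    best_score, best_pos = float("-inf"), (0, 0)
    for top in tops:
        for left in lefts:
            score = ncc(image, template, top, left)
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_score, best_pos


def tracking_lost(best_score: float, threshold: float = 0.7) -> bool:
    """Step 15: tracking is considered lost below the threshold."""
    return best_score < threshold
```

Passing the previous position as `centre` restricts the search to a small neighbourhood, which is what keeps the tracking loop cheap compared with the global search.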

The initialisation stage 9 obtains a target image or a template of the observer before tracking starts. The initialisation stage disclosed in GB 2 324 428 and EP 0 877 274 uses an interactive method in which the display 7 displays the incoming video images and an image generator, for example embodied in the processor 4, generates a border image or graphical guide 16 on the display as illustrated in FIG. 5 of the accompanying drawings. A user-operable control, for instance forming part of the mouse 8, allows manual actuation of capturing of the image region within the border image.

The observer views his own image on the display 7 together with the border image which is of the required template size. The observer aligns the midpoint between his eyes with the middle line of the graphical guide 16 and then activates the system to capture the template, for instance by pressing a mouse button or a keyboard key. Alternatively, this alignment may be achieved by dragging the graphical guide 16 to the desired place using the mouse 8.

An advantage of such an interactive template capturing technique is that the observer is able to select the template with acceptable alignment accuracy. This involves recognising the human face and selecting the image regions of interest, such as the eye regions. Whereas human vision renders this process trivial, such template capture would be difficult for a computer, given all possible types of people of different ages, sexes, eye shapes and skin colours under various lighting conditions.

However, such an interactive template capturing method is not convenient for regular users because template capture has to be performed for each use of the system. For non-regular users, such as a visitor, there is another problem in that they have to learn how to cooperate with the system. For example, new users may need to know how to align their faces with the graphical guide. This alignment is seemingly intuitive but has been found awkward for many new users. It is therefore desirable to provide an improved arrangement which increases the ease of use and market acceptability of tracking systems.

In order to avoid repeated template capture for each user, it is possible to store each captured template of the users in a database. When a user uses the system for the first time, the interactive method may be used to capture the template, which is then stored in the database. Subsequent uses by the same user may not require a new template as the database can be searched to find his or her template. Each user may need to provide more than one template to accommodate, for example, changes of lighting and changes of facial features. Thus, although this technique has the advantage of avoiding the need to capture a template for each use of the display, it is only practical if the number of users is very small. Otherwise, the need to build a large database and the associated long searching time would become prohibitive for any commercial implementation. For example, point-of-sale systems with many one-time users would not easily be able to store a database with every user.

It is possible to capture templates automatically using image processing and computer vision techniques. This is essentially a face and/or eye detection problem, which forms part of a more general problem of face recognition. A complete face recognition system should be able to detect faces automatically and identify a person from each face. The task of automatic face detection is different from that of identification, although many methods which are used for identification may also be used for detection and vice versa.

Much of the computer vision research in the field of face recognition has so far focused on the identification task, and examples of this are disclosed in R Brunelli and T Poggio, "Face recognition through geometrical features," Proceedings of the 2nd European Conference on Computer Vision, pp. 792-800, Genoa, 1992; U.S. Pat. No. 5,164,992A; M Turk and A Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience Vol 3, No 1, pp. 70-86; and A L Yuille, D S Cohen and P W Hallinan, "Feature extraction from faces using deformable templates," International Journal of Computer Vision, 8(2), pp. 99-111, 1992. Many of these examples show a clear need for automatic face detection, but the problem and its solution tend to be ignored or are not well described. These known techniques either assume that a face has already been detected and that its position in the image is known, or limit the applications to situations where the face and the background can be easily separated. Few known techniques for face detection achieve a reliable detection rate without restrictive constraints and long computing time.

DE 19634768 discloses a method of detecting a face in a video picture. The method compares an input image with a pre-stored background image to produce a binary mask which can be used to locate the head region, which is further analysed with regard to the possibility of the presence of a face. This method requires a controlled background which does not change. However, it is not unusual for people to move around in the background while one user is watching an autostereoscopic display.

G Yang and T S Huang, "Human face detection in complex backgrounds", Pattern Recognition, Vol. 27, No. 1, pp. 53-63, 1994 disclose a method of locating human faces in an uncontrolled background using a hierarchical knowledge-based technique. The method comprises three levels. The higher two levels are based on mosaic images at different resolutions; at the lowest level, an edge detection method is proposed. The system can locate unknown human faces spanning a fairly wide range of sizes in a black-and-white picture. Experimental results have been reported using a set of 40 pictures as the training set and a set of 60 pictures as the test set. Each picture has 512×512 pixels and allows for face sizes ranging from 48×60 to 200×250 pixels. The system achieved a detection rate of 83%, i.e. 50 out of 60. In addition to the correctly located faces, false faces were detected in 28 pictures of the test set. While this detection rate is relatively low, a bigger problem is the computing time of 60 to 120 seconds for processing each image.

U.S. Pat. No. 5,012,522 discloses a system which is capable of locating human faces in video scenes with random content within two minutes and of recognising the faces which it locates. When an optional motion detection feature is included, the location and recognition events occur in less than 1 minute. The system is based on an earlier autonomous face recognition machine (AFRM) disclosed in E J Smith, "Development of autonomous face recognition machine", Master's thesis, Doc.# AD-A178852, Air Force Institute of Technology, December 1986, with improved speed and detection score. The AFRM was developed from an earlier face recognition machine by including an automatic "face finder", which was developed using Cortical Thought Theory (CTT). CTT involves the use of an algorithm which calculates the "gestalt" of a given pattern. According to the theory, the gestalt represents the essence or "single characterisation" uniquely assigned by the human brain to an entity such as a two-dimensional image. The face finder works by searching an image for certain facial characteristics or "signatures". The facial signatures are present in most facial images and are rarely present when no face is present.

The most important facial signature in the AFRM is the eye signature, which is generated by extracting columns from an image and by plotting the results of the gestalt calculated for each column. First, an 8 pixel (vertical) by 192 pixel (horizontal) window is extracted from a 128 by 192 pixel image area. The 8 by 192 pixel window is then placed at the top of a new 64 by 192 pixel image. The remaining rows of the 64 by 192 pixel image are filled in with a background grey level intensity, for instance 12 out of the total of 16 grey levels where 0 represents black. The resulting image is then transformed into the eye signature by calculating the gestalt point for each of the 192 vertical columns in the image. This results in a 192-element vector of gestalt points. If an eye region exists, this vector shows a pattern that is characterised by two central peaks corresponding to the eye centres and a central minimum between the two peaks, together with two outer minima on either side. If such a signature is found, an eye region may exist. A similar technique is then applied to produce a nose/mouth signature to verify the existence of the face. The AFRM achieved a 94% success rate for the face finder algorithm using a small image database containing 139 images (about 4 to 5 different pictures per subject). A disadvantage of such a system is that there are too many objects in an image which can display a similar pattern. It is not, therefore, a very reliable face locator. Further, the calculation of the gestalts is so computationally intensive that real time implementation is difficult to achieve.

EP 0 751 473 discloses a technique for locating candidate face regions by filtering, convolution and thresholding. A subsequent analysis checks whether candidate face features, particularly the eyes and the mouth, have certain characteristics.

U.S. Pat. No. 5,715,325 discloses a technique involving reduced resolution images. A location step compares an image with a background image to define candidate face regions. Subsequent analysis is based on a three level brightness image and is performed by comparing each candidate region with a stored template.

U.S. Pat. No. 5,629,752 discloses a technique in which analysis is based on locating body contours in an image and checking for symmetry and other characteristics of such contours. This technique also checks for horizontally symmetrical eye regions by detecting horizontally symmetrical dark ellipses whose major axes are oriented symmetrically.

Sako et al, Proceedings of 12 IAPR International Conference on Pattern Recognition, Jerusalem 6-13 October 1994, Vol. II, pp. 320-324, "Real Time Facial Feature Tracking Based on Matching Techniques and its Applications" discloses various analysis techniques including detection of eye regions by comparison with a stored template.

Chen et al, IEEE (0-8186-7042-8) pp. 591-596, 1995, "Face Detection by Fuzzy Pattern Matching" performs candidate face location by fuzzy matching to a "face model". Candidates are analysed by checking whether eye/eyebrow and nose/mouth regions are present on the basis of an undefined "model".

BRIEF SUMMARY OF THE INVENTION

According to a first aspect of the invention, there is provided a method of detecting a human face in an image, comprising locating in the image a candidate face region and analysing the candidate face region for a first characteristic indicative of a facial feature, characterised in that the first characteristic comprises a substantially symmetrical horizontal brightness profile comprising a maximum disposed between first and second minima and in that the analysing step comprises forming a vertical integral projection of a portion of the candidate face region and determining whether the vertical integral projection has first and second minima disposed substantially symmetrically about a maximum.

The locating and analysing steps may be repeated for each image of a sequence of images, such as consecutive fields or frames from a video camera.

The or each image may be a colour image and the analysing step may be performed on a colour component of the colour image.

The analysing step may determine whether the vertical integral projection has first and second minima whose horizontal separation is within a predetermined range.

The analysing step may determine whether the vertical integral projection has a maximum and first and second minima such that the ratio of the difference between the maximum and the smaller of the first and second minima to the maximum is greater than a first threshold.
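The sketch below illustrates the vertical integral projection and the two tests just described, in Python and for illustration only. It makes several assumptions that the text does not state. The candidate portion is a list of rows holding one colour component. The central maximum is searched for only in the middle third of the profile. The minima are the lowest values on either side of that maximum. The names `vertical_integral_projection` and `eye_signature` are hypothetical and do not come from the patent.

```python
def vertical_integral_projection(region):
    """Sum each column of a 2-D array (list of rows) to give V(x)."""
    return [sum(column) for column in zip(*region)]


def eye_signature(profile, min_sep, max_sep, ratio_threshold):
    """Test a projection profile for two minima straddling a central maximum.

    Returns (passed, ratio). The central maximum is searched for in the middle
    third of the profile (an assumption); the minima are the lowest values to
    its left and to its right.
    """
    n = len(profile)
    if n < 3:
        return False, 0.0
    peak = max(range(n // 3, n - n // 3), key=lambda x: profile[x])
    left = min(range(0, peak), key=lambda x: profile[x])
    right = min(range(peak + 1, n), key=lambda x: profile[x])
    v_max = profile[peak]
    if v_max <= 0:
        return False, 0.0
    # Ratio of (maximum - smaller of the two minima) to the maximum, as in the text.
    ratio = (v_max - min(profile[left], profile[right])) / v_max
    separation_ok = min_sep <= right - left <= max_sep
    return separation_ok and ratio > ratio_threshold, ratio
```

To select among several candidate portions, run `eye_signature` on each one and keep the portion with the largest returned ratio.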

The vertical integral projection may be formed for a plurality of portions of the face candidate and the portion having the highest ratio may be selected as a potential target image.

The analysing step may comprise forming a measure of the symmetry of the portion.

The symmetry measure may be formed as: ##EQU1##

where $V(x)$ is the value of the vertical integral projection at horizontal position $x$ and $x_0$ is the horizontal position of the middle of the vertical integral projection.

The vertical integral projection may be formed for a plurality of portions of the face candidate and the portion having the highest symmetry measure may be selected as a potential target image.

The analysing step may comprise dividing a portion of the candidate face region into left and right halves, forming a horizontal integral projection of each of the halves, and comparing a measure of horizontal symmetry of the left and right horizontal integral projections with a second threshold.

The analysing step may determine whether the candidate face region has first and second brightness minima disposed at substantially the same height with a horizontal separation within a predetermined range.

The analysing step may determine whether the candidate face region has a vertically extending region of higher brightness than and disposed between the first and second brightness minima.

The analysing step may determine whether the candidate face region has a horizontally extending region disposed below and of lower brightness than the vertically extending region.

The analysing step may comprise locating, in the candidate face region, candidate eye pupil regions where a green image component is greater than a red image component or where a blue image component is greater than a green image component. Locating the candidate eye pupil regions may be restricted to candidate eye regions of the candidate face region. The analysing step may form a function E(x,y) for picture elements (x,y) in the candidate eye regions such that: ##EQU2##

where R, G and B are red, green and blue image components, $C_1$ and $C_2$ are constants, $E(x,y)=1$ represents a picture element inside the candidate eye pupil regions and $E(x,y)=0$ represents a picture element outside the candidate eye pupil regions. The analysing step may detect the centres of the eye pupils as the centroids of the candidate eye pupil regions.
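The sketch below illustrates the pupil test and the centroid step. The placeholder ##EQU2## does not show how $C_1$ and $C_2$ enter the inequalities. The sketch therefore assumes that E(x,y) = 1 when G - R > C1 or when B - G > C2. It also assumes the function is called separately on the left and right candidate eye regions, so that each call yields one pupil centre. The names `pupil_mask` and `centroid` are hypothetical.

```python
def pupil_mask(red, green, blue, c1, c2):
    """Return E(x, y) as a list of rows of 0/1 values.

    Assumed rule: a pixel is a pupil candidate when green exceeds red by more
    than c1, or blue exceeds green by more than c2.
    """
    return [[1 if (g - r > c1) or (b - g > c2) else 0
             for r, g, b in zip(r_row, g_row, b_row)]
            for r_row, g_row, b_row in zip(red, green, blue)]


def centroid(mask):
    """Centroid (x, y) of the pixels where mask == 1, or None if there are none."""
    count = sum_x = sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                count += 1
                sum_x += x
                sum_y += y
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)
```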

The analysing step may comprise locating a candidate mouth region in a sub-region of the candidate face region. This sub-region lies horizontally between the candidate eye pupil regions and vertically below the level of the candidate eye pupil regions. The vertical offset is between substantially one half and substantially one and a half times the distance between the candidate eye pupil regions. The analysing step may form a function M(x,y) for picture elements (x,y) within the sub-region such that: ##EQU3##

where R, G and B are red, green and blue image components, $T_1$ is a constant, $M(x,y)=1$ represents a picture element inside the candidate mouth region and $M(x,y)=0$ represents a picture element outside the candidate mouth region. Vertical and horizontal projection profiles of the function M(x,y) may be formed. A candidate lip region may then be defined as a rectangular sub-region where the vertical and horizontal projection profiles exceed first and second predetermined thresholds, respectively. The first and second predetermined thresholds may be proportional to the maxima of the vertical and horizontal projection profiles, respectively.
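The sketch below shows the projection-and-threshold step. It takes a binary mouth mask M(x,y) as input; the inequality that defines the mask (##EQU3##) is not reproduced here. The proportionality constants `k_v` and `k_h` are placeholder parameters, not values from the patent. The sketch also assumes that the lip rectangle spans from the first to the last position whose profile exceeds its threshold. The function name `lip_box` is hypothetical.

```python
def lip_box(mask, k_v, k_h):
    """Bounding rectangle (x0, y0, x1, y1) of the candidate lip region, or None.

    mask is a list of rows holding M(x, y) in {0, 1}. The vertical projection
    (one sum per column) bounds x; the horizontal projection (one sum per row)
    bounds y. Each threshold is a fraction of its own profile's maximum.
    """
    vertical = [sum(column) for column in zip(*mask)]
    horizontal = [sum(row) for row in mask]
    if not vertical or max(vertical) == 0:
        return None
    t_v = k_v * max(vertical)
    t_h = k_h * max(horizontal)
    xs = [x for x, v in enumerate(vertical) if v > t_v]
    ys = [y for y, h in enumerate(horizontal) if h > t_h]
    if not xs or not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```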

The analysing step may check whether the aspect ratio of the candidate lip region is between first and second predefined thresholds.

The analysing step may check whether the ratio of the vertical distance from the candidate eye pupil regions to the top of the candidate lip region to the spacing between the candidate eye pupil regions is between first and second preset thresholds.
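The two geometric checks can be sketched as follows. The sketch assumes the following conventions, none of which the text specifies:

- The lip box is given as inclusive pixel coordinates, with y increasing downwards.
- The aspect ratio is width over height.
- The eye spacing is the Euclidean distance between the pupil centroids.

The threshold pairs are left to the caller because the text does not give values. `plausible_mouth` is a hypothetical name.

```python
import math


def plausible_mouth(lip_box, left_pupil, right_pupil, aspect_range, distance_range):
    """Apply the aspect-ratio and eye-to-lip distance checks.

    lip_box is (x0, y0, x1, y1); pupils are (x, y) centroids; each range is a
    (low, high) pair of thresholds supplied by the caller.
    """
    x0, y0, x1, y1 = lip_box
    aspect = (x1 - x0 + 1) / (y1 - y0 + 1)            # width over height
    eye_spacing = math.dist(left_pupil, right_pupil)
    if eye_spacing == 0:
        return False
    eye_level = (left_pupil[1] + right_pupil[1]) / 2
    drop_ratio = (y0 - eye_level) / eye_spacing       # eyes to top of lip region
    return (aspect_range[0] <= aspect <= aspect_range[1]
            and distance_range[0] <= drop_ratio <= distance_range[1])
```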

The analysing step may comprise dividing a portion of the candidate face region into left and right halves and comparing the angles of the brightness gradients of horizontally symmetrically disposed pairs of points for symmetry.

The locating and analysing steps may be stopped when the first characteristic is found r times in R consecutive images of the sequence.
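One way to implement this stopping rule is a sliding window over the last R per-image results. The class name `DetectionTerminator` is hypothetical. The sketch signals termination as soon as r detections fall within the most recent R images, even before R images have been seen.

```python
from collections import deque


class DetectionTerminator:
    """Signal termination once r of the last R images contained a detection."""

    def __init__(self, r, R):
        if not 1 <= r <= R:
            raise ValueError("need 1 <= r <= R")
        self.r = r
        self.history = deque(maxlen=R)   # oldest results drop out automatically

    def update(self, detected):
        """Record one image's result; return True when detection can stop."""
        self.history.append(bool(detected))
        return sum(self.history) >= self.r
```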

The locating step may comprise searching the image for a candidate face region having a second characteristic indicative of a human face.

The second characteristic may be uniform saturation.

The searching step may comprise reducing the resolution of the image by averaging the saturation to form a reduced resolution image and searching for a region of the reduced resolution image having, in a predetermined shape, a substantially uniform saturation which is substantially different from the saturation of the portion of the reduced resolution image surrounding the predetermined shape.

The image may comprise a plurality of picture elements and the resolution may be reduced so that the predetermined shape is from two to three reduced resolution picture elements across.

The image may comprise a rectangular array of M by N picture elements, and the reduced resolution image may comprise (M/m) by (N/n) picture elements. Each reduced resolution picture element corresponds to m by n picture elements of the image, and its saturation P may be given by:

$$P = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} f(i,j)$$

where $f(i,j)$ is the saturation of the picture element in the ith column and the jth row of the m by n picture elements.

The method may comprise storing the saturations in a store.

A uniformity value may be ascribed to each of the reduced resolution picture elements by comparing the saturation of each of the reduced resolution picture elements with the saturation of at least one adjacent reduced resolution picture element.

Each uniformity value may be ascribed a first value if

where max(P) and min(P) are the maximum and minimum values, respectively, of the saturations of the reduced resolution picture element and the or each adjacent picture element and T is a threshold, and a second value different from the first value otherwise.

T may be substantially equal to 0.15.

The or each adjacent reduced resolution picture element may not have been ascribed a uniformity value and each uniformity value may be stored in the store in place of the corresponding saturation.

The resolution may be reduced such that the predetermined shape is two or three reduced resolution picture elements across. The method may then further comprise indicating detection of a candidate face region when both of the following hold. First, a uniformity value of the first value is ascribed to any one of: a single reduced resolution picture element; two vertically or horizontally adjacent reduced resolution picture elements; or a rectangular two-by-two array of picture elements. Second, a uniformity value of the second value is ascribed to each surrounding reduced resolution picture element.

Detection may be indicated by storing a third value different from the first and second values in the store in place of the corresponding uniformity value.

The method may comprise repeating the resolution reduction and searching at least once with the reduced resolution picture elements shifted with respect to the image picture elements.

The saturation S may be derived from the red, green and blue components as

$$S = \frac{\max(R,G,B) - \min(R,G,B)}{\max(R,G,B)}$$

where max(R,G,B) and min(R,G,B) are the maximum and minimum values, respectively, of the red, green and blue components.

A first image may be captured while illuminating an expected range of positions of a face, a second image may be captured using ambient light, and the second image may be subtracted from the first image to form the image.
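The following sketch shows the subtraction step. It assumes both frames are equally sized lists of rows of intensities. Clipping negative differences (for example from noise or from movement between the two captures) to zero is an assumption of the sketch, not something stated in the text. `subtract_ambient` is a hypothetical name.

```python
def subtract_ambient(lit, ambient):
    """Form the working image as (illuminated frame - ambient frame).

    Both frames are lists of rows of intensities with the same shape.
    Negative differences are clipped to zero.
    """
    return [[max(a - b, 0) for a, b in zip(lit_row, amb_row)]
            for lit_row, amb_row in zip(lit, ambient)]
```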

According to a second aspect of the invention, there is provided an apparatus for detecting a human face in an image, comprising means for locating in the image a candidate face region and means for analysing the candidate face region for a first characteristic indicative of a facial feature.

According to a third aspect of the invention, there is provided an observer tracking display including an apparatus according to the second aspect of the invention.

It is thus possible to provide a method of and an apparatus for automatically detecting a human face in, for example, an incoming video image stream or sequence. This may be used, for example, to replace the interactive method of capturing a template as described hereinbefore and as disclosed in GB 2 324 428 and EP 0 877 274, for instance in an initialisation stage of an observer video tracking system associated with a tracked autostereoscopic display. The use of such techniques for automatic target image capture increases the ease of use of a video tracking system and an associated autostereoscopic display and consequently increases the commercial prospects for such systems.

By using a two-stage approach in the form of a face locator and a face analyser, the face locator enables the more computing intensive face analysis to be limited to a number of face candidates. Such an arrangement is capable of detecting a face in a sequence of video images in real time, for instance at a speed of between 5 and 30 Hz, depending on the complexity of the image content. When used in an observer tracking autostereoscopic display, the face detection may be terminated automatically after a face is detected consistently over a number of consecutive images. The whole process may take no more than a couple of seconds and the initialisation need only be performed once at the beginning of each use of the system.

The face locator increases the reliability of the face analysis because the analysis need only be performed on the or each candidate face region located in the or each image. Although a non-face candidate region may contain image data similar to that which might be indicative of facial features, the face locator limits the analysis based on such characteristics to the potential face candidates. Further, the analysis helps to remove false face candidates found by the locator and is capable of giving more precise position data of a face and facial features thereof, such as the mid point between the eyes of an observer so that a target image of the eye region may be obtained.

By separating the functions of location and analysis, each function or step may use simpler and more efficient methods which can be implemented commercially without excessively demanding computing power and cost. For instance, locating potential face candidates using skin colour can accommodate reasonable lighting changes. This technique is capable of accommodating a relatively wide range of lighting conditions and is able to cope with people of different age, sex and skin colour. It may even be capable of coping with the wearing of glasses of light colours.

These techniques may use any of a number of modules in terms of computer implementation. Each of these modules may be replaced or modified to suit various requirements. This increases the flexibility of the system, which may therefore have a relatively wide range of applications, such as security surveillance, video and image compression, video conferencing, computer games, driver monitoring, graphical user interfaces, face recognition and personal identification.

The invention will be further described, by way of example, with reference to the accompanying drawings, in which:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block schematic diagram of a known type of observer tracking autostereoscopic display;

FIG. 2 is a block schematic diagram of an observer tracking display to which the present invention may be applied;

FIG. 3 is a flow diagram illustrating observer tracking in the display of FIG. 2;

FIG. 4 illustrates a typical target image or template which is captured by the method illustrated in FIG. 3;

FIG. 5 illustrates the appearance of a display during template capture by the display of FIG. 2;

FIG. 6 is a flow diagram illustrating a method of detecting a human face constituting an embodiment of the invention;

FIG. 7 is a flow diagram illustrating a face location part of the method illustrated in FIG. 6;

FIG. 8 is a diagram illustrating a hue-saturation-value (HSV) colour scheme;

FIG. 9 is a diagram illustrating image resolution reduction by averaging in the method illustrated in FIG. 7;

FIG. 10 is a diagram illustrating calculation of uniformity values in the method illustrated in FIG. 7;

FIG. 11 is a diagram illustrating patterns used in a face-candidate selection in the method illustrated in FIG. 7;

FIG. 12 is a diagram illustrating the effect of different positions of a face on the method illustrated in FIG. 7;

FIG. 13 is a diagram illustrating a modification to the method illustrated in FIG. 7 for accommodating different face positions;

FIG. 14 is a flow diagram illustrating in more detail the face analysis stage of the method illustrated in FIG. 6;

FIG. 15 is a flow diagram illustrating in more detail a facial feature extraction step of the method illustrated in FIG. 14;

FIG. 16 illustrates an image portion of an eye region and a corresponding vertical integral projection;

FIG. 17 illustrates a technique for searching for an eye signature;

FIG. 18 is a flow diagram illustrating a further facial characteristic extraction technique forming part of the method illustrated in FIG. 14;

FIG. 19 illustrates vertical integral projections obtained with too coarse a step size;

FIG. 20 illustrates the use of horizontal integral projection profiles for eliminating false face candidates;

FIG. 21 illustrates detection of a pair of eyes represented as a pair of brightness minima;

FIG. 22 illustrates a nose detection technique;

FIG. 23 is a flow diagram illustrating in more detail a modified facial feature extraction step of the method illustrated in FIG. 14;

FIG. 24 illustrates eye pupil and mouth regions with vertical and horizontal integral projections of the mouth region;

FIG. 25 illustrates a technique based on analysing facial symmetry;

FIG. 26 is a flow diagram illustrating a technique for terminating the method illustrated in FIG. 14;

FIG. 27 is a block schematic diagram of an observer tracking display to which the present invention is applied; and

FIG. 28 is a system block diagram of a video tracking system of the display of FIG. 27 for performing the method of the invention.

Like reference numerals refer to like parts throughout the drawings.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 6 illustrates in flow diagram form a method of automatically detecting and locating a human face in a pixelated colour image from a video image sequence. The video image sequence may be supplied in real time, for instance by a video camera of the type described hereinbefore with reference to FIG. 2. The method is capable of operating in real time as the initialisation stage 9 shown in FIG. 3 and supplies a target image or template to the tracking stage 10 shown in FIG. 3.

In a step S1, the latest digital image in the red, green and blue (RGB) format is obtained. For instance, this step may comprise storing the latest field of video data from the video camera in a field store. In a step S2, the image is searched in order to locate regions constituting face candidates. A step S3 determines whether any face candidates have been found. If not, the step S1 is performed and the steps S2 and S3 are repeated until at least one face candidate is found in the latest image. The steps S2 and S3 therefore constitute a face locator 17 which will be described in more detail hereinafter.

The or each face candidate is then supplied to a face analyser 18 which analyses the face candidates to determine the presence of one or more characteristics indicative of facial features. A step S4 receives the portions of the image, one-by-one, corresponding to the face candidates located by the face locator 17. The step S4 analyses each face candidate and, if it determines that the candidate contains characteristics indicative of a facial feature, extracts a target image or template in the form of the eye region illustrated at 11 in FIG. 4 from the latest image supplied from the step S1. A step S5 determines whether all of the face candidates have been tested and the step S4 is repeated until all the candidates have been tested. A step S6 determines whether any templates have been obtained. If not, control passes to the step S1 and the procedure is repeated for the next colour image. If any template has been obtained, the or each such template is output at a step S7.
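The control flow of steps S1 to S7 can be summarised in a short sketch. Here `locate` stands for the face locator 17 and `analyse` for the per-candidate test of the face analyser 18. Both are hypothetical callables supplied by the caller. Returning the templates from the first image that yields any is one reading of step S7, not a confirmed detail of the method.

```python
from typing import Any, Callable, Iterable, List, Optional


def detect_templates(frames: Iterable[Any],
                     locate: Callable[[Any], List[Any]],
                     analyse: Callable[[Any, Any], Optional[Any]]) -> List[Any]:
    """Run the locate-then-analyse loop of FIG. 6 until some template is found."""
    for frame in frames:                      # S1: take the latest image
        candidates = locate(frame)            # S2: face locator 17
        if not candidates:                    # S3: no candidate, next image
            continue
        templates = []
        for candidate in candidates:          # S4/S5: test every candidate
            template = analyse(frame, candidate)
            if template is not None:
                templates.append(template)
        if templates:                         # S6: any template obtained?
            return templates                  # S7: output the template(s)
    return []                                 # frames exhausted without success
```

Because the locator is a separate callable, it can be swapped (for example, for the saturation-based locator described next) without touching the loop.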

The face locator 17 may be of any suitable type and a manual technique for face location is described hereinafter. However, a suitable automatic technique is disclosed in GB 2 333 590 and EP 0 932 114 and this is described in detail with reference to FIGS. 7 to 13.

In a step S21, the video image is converted from the RGB format to the HSV (hue-saturation-value) format so as to obtain the saturation of each pixel. In practice, it is sufficient to obtain the S component only in the step S21.

The RGB format is a hardware-oriented colour scheme resulting from the way in which camera sensors and display phosphors work. The HSV format is closely related to the concepts of tint, shade and tone. In the HSV format, hue represents colour (for instance, the distinction between red and yellow), saturation represents the amount of colour that is present (for instance, the distinction between red and pink), and value represents the amount of light (for instance, the distinction between dark red and light red or between dark grey and light grey). The "space" in which these values may be plotted can be shown as a circular or hexagonal cone or double cone, for instance as illustrated in FIG. 8, in which the axis of the cone is the grey scale progression from black to white, distance from the axis represents saturation and the direction or angle about the axis represents the hue.

The colour of human skin is created by a combination of blood (red) and melanin (yellow, brown). Skin colours lie between these two extreme hues and are somewhat saturated but are not extremely saturated. The saturation component of the human face is relatively uniform.

Several techniques exist for converting video image data from the RGB format to the HSV format. Any technique which extracts the saturation component may be used. For instance, the conversion may be performed in accordance with the following expression for the saturation component S:

$$S = \frac{\max(R,G,B) - \min(R,G,B)}{\max(R,G,B)}$$
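For illustration, the saturation-only conversion of step S21 can be written per pixel as below. The sketch assumes the standard HSV definition S = (max - min) / max. For black pixels, where that ratio is undefined, it sets S = 0, which matches the convention of Python's `colorsys.rgb_to_hsv`. The function names are hypothetical.

```python
def saturation(r, g, b):
    """HSV saturation of one pixel; r, g, b are non-negative intensities."""
    largest = max(r, g, b)
    if largest == 0:
        return 0.0            # black: saturation undefined, taken as 0
    return (largest - min(r, g, b)) / largest


def saturation_image(red, green, blue):
    """Apply saturation() to every pixel of three equally sized channel arrays."""
    return [[saturation(r, g, b) for r, g, b in zip(r_row, g_row, b_row)]
            for r_row, g_row, b_row in zip(red, green, blue)]
```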

Following the conversion step S21, the spatial image resolution of the saturation component is reduced by averaging in a step S22. As described hereinbefore with reference to FIG. 2, the approximate distance of the face of an observer from the display is known so that the approximate size of a face in each video image is known. The resolution is reduced such that the face of an adult observer occupies about two to three pixels in each dimension as indicated in FIG. 7. A technique for achieving this will be described in more detail hereinafter.

A step S23 detects, in the reduced resolution image from the step S22, regions or "blobs" of uniform saturation of predetermined size and shape surrounded by a region of reduced resolution pixels having a different saturation. A technique for achieving this is also described in more detail hereinafter. A step S24 detects whether a face candidate or face-like region has been found. If not, the steps S1 to S24 are repeated. When the step S24 confirms that at least one candidate has been found, the position of the or each uniform blob detected in the step S23 is output at a step S25.

FIG. 9 illustrates the image resolution reduction step S22 in more detail. Reference numeral 30 indicates the pixel structure of an image supplied to the step S1. The spatial resolution is illustrated as a regular rectangular array of $M \times N$ square or rectangular pixels. The spatial resolution is reduced by averaging to give an array of $(M/m) \times (N/n)$ pixels, as illustrated at 31. The array of pixels 30 is effectively divided up into "windows" or rectangular blocks of pixels 32, each comprising $m \times n$ pixels of the structure 30. The S values of the pixels are indicated in FIG. 9 as $f(i,j)$, for $0 \le i < m$ and $0 \le j < n$. The average saturation value P of the window is calculated as:

$$P = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} f(i,j)$$
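The window averaging can be sketched as follows. The sketch takes m as the window width (columns) and n as its height (rows), consistent with f(i,j) indexing the ith column and jth row. The optional offsets `sx` and `sy` allow the shifted grids of FIG. 13 to be produced with the same function. Dropping partial windows at the right and bottom edges is an assumption of this sketch, and `reduce_resolution` is a hypothetical name.

```python
def reduce_resolution(sat, m, n, sx=0, sy=0):
    """Average non-overlapping m-wide, n-high windows of a saturation array.

    sat is a list of rows; windows start at column sx and row sy. Windows that
    would run past the right or bottom edge are dropped.
    """
    rows, cols = len(sat), len(sat[0])
    reduced = []
    for top in range(sy, rows - n + 1, n):
        reduced_row = []
        for left in range(sx, cols - m + 1, m):
            total = sum(sat[top + j][left + i] for j in range(n) for i in range(m))
            reduced_row.append(total / (m * n))
        reduced.append(reduced_row)
    return reduced
```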

In the embodiment illustrated in the drawings, the reduction in spatial resolution is such that an adult observer face occupies about two to three of the reduced resolution pixels in each dimension.

The step S23 comprises assigning a uniformity status or value U to each reduced resolution pixel and then detecting patterns of uniformity values representing face-like regions. The uniformity value is 1 or 0, depending on the saturations of the pixel and its neighbours. FIG. 10 illustrates at 35 a pixel having an averaged saturation value $P_0$, together with the averaged saturation values $P_1$, $P_2$ and $P_3$ of its three neighbouring pixels. The assignment of uniformity values begins at the top left pixel 37 and proceeds from left to right until the penultimate pixel 38 of the top row has been assigned its uniformity value. This process is then repeated for each row in turn from top to bottom, ending at the penultimate row. The pixels are "scanned" in this way, and only the neighbouring pixels to the right of and below the current pixel are used. As a result, the average saturation values P can be replaced by the uniformity values U by overwriting. Memory capacity is therefore used efficiently, and no further memory capacity needs to be provided for the uniformity values.

The uniformity value U is calculated as: ##EQU7##

where T is a predetermined threshold, for instance having a typical value of 0.15, $f_{max}$ is the maximum of $P_0$, $P_1$, $P_2$ and $P_3$, and $f_{min}$ is the minimum of $P_0$, $P_1$, $P_2$ and $P_3$.
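The scan-and-overwrite scheme can be sketched as below. The exact inequality in ##EQU7## is not reproduced in this text, so the sketch rests on three assumptions:

- The test is the relative one, (fmax - fmin) / fmax <= T.
- The three neighbours are the pixels to the right, below, and diagonally below-right.
- The unprocessed last row and last column are set to 0.

Overwriting in place is safe because each block reads only the current pixel and pixels that the row-major scan has not yet overwritten. `assign_uniformity` is a hypothetical name.

```python
def assign_uniformity(p, t=0.15):
    """Overwrite averaged saturations p (list of rows) with uniformity values U.

    Scans row by row; each pixel is compared with its right, lower and
    lower-right neighbours (assumed neighbourhood). U = 1 when the assumed
    relative spread (fmax - fmin) / fmax is at most t, else 0.
    """
    rows, cols = len(p), len(p[0])
    for y in range(rows - 1):             # stop at the penultimate row
        for x in range(cols - 1):         # stop at the penultimate column
            block = (p[y][x], p[y][x + 1], p[y + 1][x], p[y + 1][x + 1])
            f_max, f_min = max(block), min(block)
            uniform = f_max == 0 or (f_max - f_min) / f_max <= t
            p[y][x] = 1 if uniform else 0  # safe: later blocks never read this cell
    for y in range(rows):                 # last column was never assigned
        p[y][cols - 1] = 0
    for x in range(cols):                 # nor was the last row
        p[rows - 1][x] = 0
    return p
```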

When the ascribing of the uniformity values has been completed, the array 36 contains a pattern of 0s and 1s representing the uniformity of saturation of the reduced resolution pixels. The step S23 then looks for specific patterns of 0s and 1s in order to detect face-like regions. FIG. 11 illustrates an example of four patterns of uniformity values, together with the corresponding pixel saturation patterns, which resemble face candidates in the video images. FIG. 11 shows at 40 a uniform blob in which dark regions represent averaged saturation values of sufficient uniformity to indicate a face-like region. The surrounding light regions or squares represent a region that surrounds the uniform saturation pixels and has substantially different saturations. The corresponding pattern of uniformity values is illustrated at 41 and comprises a pixel location with the uniformity value 1 completely surrounded by pixel locations with the uniformity value 0.

Similarly, FIG. 11 shows at 42 another face-like region and at 43 the corresponding pattern of uniformity values. In this case, two horizontally adjacent pixel locations have the uniformity value 1 and are completely surrounded by pixel locations having the uniformity value 0. FIG. 11 illustrates at 44 a third pattern whose uniformity values are as shown at 45 and are such that two vertically adjacent pixel locations have the uniformity value 1 and are surrounded by pixel locations with the uniformity value 0.

The fourth pattern, shown at 46 in FIG. 11, has a square block of four (two-by-two) pixel locations having the uniformity value 1 completely surrounded by pixel locations having the uniformity value 0. Thus, whenever any of the uniformity value patterns illustrated at 41, 43, 45 and 47 in FIG. 11 occurs, the step S23 indicates that a face-like region or candidate has been found. Searching for these patterns can be performed efficiently. For instance, the uniformity values of the pixel locations may be checked in turn, left to right in each row and top to bottom of the field. Whenever a uniformity value of 1 is detected, the neighbouring pixel locations to the right of and below the current pixel location are inspected. If at least one of these uniformity values is also 1 and the region is surrounded by uniformity values of 0, then a pattern corresponding to a potential face candidate has been found. The corresponding pixel locations may then be marked, for instance by replacing their uniformity values with a value other than 1 or 0, for example a value of 2. If at least one potential face candidate has been found, the positions of the candidates are output.
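The sketch below shows the pattern search over a finished uniformity array. At each pixel with value 1, it tests whether a 1×1, 2×1, 1×2 or 2×2 block of 1s starting there is completely ringed by 0s. Accepted blocks are marked with the value 2, as the text suggests. Treating a ring that would extend beyond the image as a failure is an assumption of the sketch, and `find_face_candidates` is a hypothetical name.

```python
# (width, height) of the four accepted blobs: 1x1, 2x1, 1x2 and 2x2.
SHAPES = [(1, 1), (2, 1), (1, 2), (2, 2)]


def find_face_candidates(u):
    """Return (x, y, width, height) for each blob of 1s ringed by 0s; marks blobs with 2."""
    rows, cols = len(u), len(u[0])

    def value(y, x):
        # Positions outside the image return None, so a ring there never passes.
        return u[y][x] if 0 <= y < rows and 0 <= x < cols else None

    found = []
    for y in range(rows):
        for x in range(cols):
            if u[y][x] != 1:
                continue
            for w, h in SHAPES:
                inner = [(y + dy, x + dx) for dy in range(h) for dx in range(w)]
                ring = [(y + dy, x + dx)
                        for dy in range(-1, h + 1) for dx in range(-1, w + 1)
                        if not (0 <= dy < h and 0 <= dx < w)]
                if all(value(*c) == 1 for c in inner) and all(value(*c) == 0 for c in ring):
                    for cy, cx in inner:
                        u[cy][cx] = 2
                    found.append((x, y, w, h))
                    break
    return found
```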

The appearance of the patterns 40, 42, 44 and 46 may be affected by the actual position of the face-like region in relation to the structure of the reduced resolution pixels 36. FIG. 12 illustrates an example of this for a face-like region having a size of two-by-two reduced resolution pixels as shown at 49. If the face-like region indicated by a circle 50 is approximately centred at a two-by-two block, the pattern 47 of uniformity values will be obtained and detection will be correct. However, if the face were shifted by the extent of half a pixel in both the horizontal and vertical direction as illustrated at 51, the centre part of the face-like region may have a uniformity value which is different from the surrounding region. This may result in failure to detect a genuine candidate.

To avoid this possible problem, the steps S21 to S24 may be repeated for the same video field or for one or more succeeding video fields of image data. However, each time the steps S21 to S24 are repeated, the position of the array 31 of reduced resolution pixels is changed with respect to the array 30 of the colour image pixels. This is illustrated in FIG. 13, where the whole image is illustrated at 52 and the region used for spatial resolution reduction by image averaging is indicated at 53. The averaging is performed in the same way as illustrated in FIG. 9, but the starting position is changed. In particular, in FIG. 9 the starting position for the first pixel is at the top left corner 54 of the whole image 52. In contrast, FIG. 13 illustrates a subsequent averaging where the starting position is shifted from the top left corner by an amount Sx to the right in the horizontal direction and Sy downwardly in the vertical direction, where:

Each image may be repeatedly processed such that all combinations of the values of Sx and Sy are used, so that $m \times n$ processes must be performed. However, in practice, it is not necessary to use all of the starting positions, particularly in applications where the detection of face-like regions does not have to be very accurate. In the present example, the face-like region detection forms the first step of a two-step process. The values of Sx and Sy may therefore be selected from a more sparse set of combinations such as:

where i, j, p and q are integers satisfying the following relationships: $0 \le i < p$, $0 \le j < q$, $1 \le p < m$, $1 \le q < n$.

This results in a total of $p \times q$ combinations.
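The grid offsets can be enumerated as below. The full set is assumed to use every shift 0 ≤ Sx < m, 0 ≤ Sy < n, which gives the m × n passes mentioned above. The expression for the sparser set is not reproduced in this text. `sparse_offsets` therefore assumes evenly spaced shifts Sx = ⌊i·m/p⌋ and Sy = ⌊j·n/q⌋. This is a hypothetical choice that merely satisfies the stated constraints and the p × q count. Both function names are hypothetical.

```python
from itertools import product


def full_offsets(m, n):
    """Every (Sx, Sy) shift within one m-by-n window: m * n combinations."""
    return list(product(range(m), range(n)))


def sparse_offsets(m, n, p, q):
    """Hypothetical evenly spaced subset of p * q shifts, with 1 <= p < m, 1 <= q < n."""
    if not (1 <= p < m and 1 <= q < n):
        raise ValueError("need 1 <= p < m and 1 <= q < n")
    return [(i * m // p, j * n // q) for i in range(p) for j in range(q)]
```

Because p < m and q < n, the spacing m/p and n/q exceeds one pixel, so the p × q shifts are distinct.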

As mentioned hereinbefore, the steps S21 to S24 may be repeated with the different starting pos

