Streaming video bookmarks, Patent Number 7,143,353, from the United States Patent and Trademark Office (PTO)

Title: Streaming video bookmarks

Abstract: A method, apparatus and systems for bookmarking an area of interest of stored video content is provided. As a viewer is watching a video and finds an area of interest, they can bookmark the particular segment of the video and then return to that segment with relative simplicity. This can be accomplished by pressing a button, clicking with a mouse or otherwise sending a signal to a device for marking a particular location of the video that is of interest. Frame identifiers can also be used to select a desired video from an index and to then retrieve the video from a medium containing multiple videos.

Patent Number: 7,143,353. Issued on 11/28/2006 to McGee et al.


Inventors: McGee; Thomas (Garrison, NY), Agnihotri; Lalitha (Fishkill, NY), Dimitrova; Nevenka (Yorktown Heights, NY), Jasinschi; Radu (Ossining, NY)
Assignee: Koninklijke Philips Electronics, N.V. (Eindhoven, NL)
Appl. No.: 09/822,447
Filed: March 30, 2001


Current U.S. Class: 715/723 ; 715/512; 715/719; 715/720; 715/721; 725/41
Current International Class: G06F 3/00 (20060101); H04N 5/91 (20060101)
Field of Search: 345/716,719-721,723,733,738,764,864 715/500.1,512,716,719-721,723,733,738,764,864 725/37,41,46


References Cited [Referenced By]

U.S. Patent Documents
5414644 May 1995 Seaman et al.
5414808 May 1995 Williams
5467288 November 1995 Fasciano et al.
5521841 May 1996 Arman et al.
5524193 June 1996 Covington et al.
5535063 July 1996 Lamming
5708845 January 1998 Wistendahl et al.
5973723 October 1999 DeLuca
6119154 September 2000 Weaver et al.
6125229 September 2000 Dimitrova et al.
6137544 October 2000 Dimitrova et al.
6173317 January 2001 Chaddha et al.
6212328 April 2001 Nakamura
6219837 April 2001 Yeo et al.
6285361 September 2001 Brewer et al.
6289346 September 2001 Milewski et al.
6295055 September 2001 Miller et al.
6321024 November 2001 Fujita et al.
6332144 December 2001 deVries et al.
6332147 December 2001 Moran et al.
6360234 March 2002 Jain et al.
6400378 June 2002 Snook
6400890 June 2002 Nagasaka et al.
6404978 June 2002 Abe
6429879 August 2002 Sturgeon et al.
6449653 September 2002 Klemets et al.
6452615 September 2002 Chiu et al.
6529920 March 2003 Arons et al.
6551357 April 2003 Madduri
6580437 June 2003 Liou et al.
6661430 December 2003 Brewer et al.
6681398 January 2004 Verna
6721361 April 2004 Covell et al.
6754389 June 2004 Dimitrova et al.
6882793 April 2005 Fu et al.
2001/0021268 September 2001 Jun et al.
2001/0044808 November 2001 Milewski et al.
2002/0007485 January 2002 Rodriguez et al.
2002/0028026 March 2002 Chen et al.
2002/0069218 June 2002 Sull et al.
2002/0078446 June 2002 Dakss et al.
2002/0092019 July 2002 Marcus
2003/0067554 April 2003 Klarfeld et al.
2003/0177503 September 2003 Sull et al.
Foreign Patent Documents
0820180 Jan., 1998 EP
1220479 Jul., 2002 EP
9962008 Dec., 1999 WO
0058967 Oct., 2000 WO

Other References

Jennifer J. Salopek, "Coolness is a state of mind", Nov. 1998, Training and Development, v. 52, No. 11, pp. 22-6+. cited by examiner .
"Exciting Events, Broad-based Seminars, Top Exhibitors and New Strategic Alliances Make DVD'98 Fall Conference & Exhibition a Must for the Multimedia Communications Industry", Sep. 1998, PRNewswire. cited by examiner .
Cynthia Morgan, "Digital lava erupts", Jul. 1999, ComputerWorld v.33 n.28 pp. 64. cited by examiner .
Jennifer J. Salopek, "Crank Up Your Coolness Quotient", (part 1 and 2), Nov. 1998, Training & Development, v. 52, n. 11, pp. 21-34. cited by examiner .
"I-Recall . . . Releases Souvenir And Souvenir Publisher, Streaming Media Bookmarking Software with Web Publishing And Email Capabilities", May 2000, PR Newswire. cited by examiner .
"I-Recall introduces Souvenir and Souvenir Publisher", www.i-recall.com, before Mar. 30, 2001. cited by examiner .
"TiVo", 1999, USByte Resources, pp. 1-3. cited by examiner .
Chuck Fuller, "Deploying Video on the Web", 1999, NewArchitect.com, pp. 1-8. cited by examiner .
Larry Bouthillier, "Delivering Streaming Media", 1999, NewArchitect.com, pp. 1-12. cited by examiner .
Li-Qun Xu et al, "Video summarization and semantics editing tools", Jan. 24, 2001,SPIE, Proceedings of SPIE-The International Society for Optical Engineering, v.4315, pp. 242-252. cited by examiner .
Steven Sweet, "Multimedia Files Organizing the Easy Way", Mar./Apr. 1994, Computer Pictures, ps-15(3). cited by examiner .
Will Tait, "Asymetrix Multimedia Toolbook 3.0", Jun. 1994, AV Video, v16, n6, ps9(1). cited by examiner .
John M. Gauch, et al, "Real time video scene detection and classification", May 1999, Information Processing and Management, vol. 35, No. 3, pp. 381-400. cited by examiner .
Tanaka Satoshi et al, "Development of Video Cataloging and Browsing Technology featuring Scene Structure Analysis", 199 Institute of Electronics, Information and Communication Engineers, vol. 99, No. 179, pp. 53-58. cited by examiner .
Junghwan Oh, "Efficient techniques for management and delivery of video data", Fall 2000, vol. 61/10-B of Dissertation Abstracts International, p. 5412 (117 pages). cited by examiner .
"Probabilistic Reasoning In Intelligent Systems: Networks of Plausible Inference", by Judea Pearl, pp. 116-131. cited by other .
"Parsing TV Programs for Identification and Removal of Non-Story Segments", by T. McGee et al., pp. 243-251. cited by other .
"PNRS--Personal News Retrieval System", by N. Dimitrova et al., SPIE vol. 3846, Sep. 1999. cited by other .
"Text, Speech, and Vision for Video Segmentation: The Informedia TM Project", by A. Hauptmann et al. cited by other .
"Video Keyframe Extraction and Filtering: A Keyframe is not a Keyframe to Everyone", by Dimitrova et al, pp. 113-120. cited by other .
"Face Detection for Image Annotation" by G. Wei et al. cited by other .
"Text Detection for Video Analysis", by L. Agnihotri et al, pp. 109-113. cited by other .
"Motion Segmentation and Qualitative Dynamic Scene Analysis from an Image Sequence", by P. Bouthemy et al. cited by other .
"Uniqueness and Estimation of Three-Dimensional Motion Parameters of Rigid Objects with Curved Surfaces", by Roger Tsai et al. cited by other .
"Content-Based Classification, Search, and Retrieval of Audio", by E. Wold et al. cited by other .
"Automatic Transcription of English Broadcast News", by P. Beyerlein et al. cited by other .
"Audio Databases with Content-Based Retrieval", by T. Blum et al. cited by other .
"Video Classification using Speaker Identification" by N. Patel et al. cited by other .
"Towards Music Understanding without Separation: Segmenting Music with Correlogram Comodulation", by E. Scheirer. cited by other.

Primary Examiner: Bautista; Lucila X.

Claims



What is claimed is:

1. A method of making segments of a video, comprising the steps of: providing video content that includes a series of video frames and reviewing the video; sending a bookmark signal during review of the video to select a selected frame being displayed when the bookmark signal is sent; identifying a characteristic feature of the selected frame and recording information corresponding to said selected frame on a storage medium; and in response to said sending the bookmark signal, processing the video to define a segment of the video containing the selected frame by identifying segment starting point and segment endpoint boundaries, and storing at least a portion of the segment, wherein the starting point and endpoint of the segment are based on detecting changes of information in the audio, visual or transcript portion of the video with respect to the frame recorded information, in frames prior to and subsequent to the selected frame, respectively.

2. The method of claim 1, wherein the characteristic feature of the frame is obtained from a frame signature created using the visual, audio or text properties of the frame.

3. The method of claim 1, wherein the characteristic feature of the frame is its frame number.

4. The method of claim 1, wherein the characteristic feature of the frame is its time stamp in the video.

5. The method of claim 1, wherein the information analyzed to detect changes is selected from the group consisting of DCT coefficients, color, edge, shape, silence, speech, music, closed captioning, and audio content of the video, and combinations thereof.

6. The method of claim 1, wherein the segment is stored on the same medium as the video.

7. The method of claim 1, further comprising the step of: storing a predefined length of the video from the segment start point.

8. The method of claim 1, wherein the segment start point is determined as a predefined length of the video before the selected frame.

9. The method of claim 1, wherein the stored segment contains at least one of EPG data, a frame of the segment, or transcript information from the segment and combinations thereof.

10. A system for identifying segments of a video for later retrieval, comprising: a bookmarker component constructed to operate in connection with a video player that can display the content of a video that includes a series of video frames; the bookmarker being responsive to a signal to identify a selected frame of the video being displayed when the bookmark signal is received; the bookmarker also being constructed to: identify a characteristic feature of the selected frame and record information corresponding to said selected frame on a storage medium, process the video to define a segment of the video containing the selected frame by identifying segment starting point and segment endpoint boundaries, and to send at least a portion of the segment containing the selected frame to a storage medium, wherein the bookmarker includes an analysis engine constructed to detect changes of information in the audio, visual or transcript portion of the video with respect to the frame recorded information, in frames prior to and subsequent to the selected frame, respectively.

11. The system of claim 10, wherein the characteristic feature of the selected frame is obtained from a frame signature created using the visual, audio or text properties of the frame.

12. The system of claim 10, wherein the characteristic feature of the frame is its frame number and the bookmarker is constructed to determine the frame number.

13. The system of claim 10, wherein the characteristic feature of the frame is its time stamp in the video and the bookmarker is constructed to determine the time stamp in the video.

14. The system of claim 10, wherein the analysis engine analyzes information selected from the group consisting of DCT coefficients, color, edge, shape, silence, speech, music, closed captioning and audio content of the video and combinations thereof.

15. The system of claim 10, wherein the bookmarker is constructed to record at least a portion of the segment on the same medium as the video.

16. The system of claim 10, wherein the bookmarker is constructed to record a predefined length of the video from the segment start point.

17. A method of processing electronically stored content, comprising: electronically storing and displaying items of content, selected from the group consisting of videos, audio, images, text and combinations thereof and while displaying the content, in response to receipt of a bookmark signal, bookmarking a portion of the content item and storing information associated with the bookmarked content item, processing the content item to define a segment of the content item containing the portion by identifying segment starting point and segment endpoint boundaries and storing a selected segment of the content item having sufficient identifying information to identify the content item, then retaining on a storage device only the segment identifying the item and not the entire item, then at a later time, comparing the information identifying the item to identify that same item, wherein the starting point and endpoint of the segment are based on detecting changes of information in the audio, visual or transcript portion of the video with respect to the stored information associated with the selected content item, prior to and subsequent to the selected content item, respectively.

18. The method of claim 17, wherein the bookmarks are backed up into a remote device.

19. The method of claim 17, wherein the bookmarks are categorized on the basis of at least one of EPG data, transcript information, captioning information or video feature information.

20. The method of claim 17, wherein the bookmarks are accessed by at least one of web pages, mobile communications devices via a wireless connection, PDAs, and computerized watches.

21. The method of claim 17, wherein the bookmarks are stored at multiple levels and different individuals have access to the different levels.

22. The method of claim 17, wherein the bookmark is created by storing it with a first storage device onto a storage medium and is then transferred to a storage medium associated with another device.

23. The method of claim 17, wherein the item of content is deleted from the medium on which it had been stored after the bookmark is created, while retaining the bookmark.

24. A method of identifying items of content, selected from the group consisting of videos, audio, images, text and combinations thereof, comprising the steps of: receiving a bookmark signal to bookmark a portion of the content item; in response to the bookmark signal, storing information associated with the content and processing the content item to define a segment of the content item containing the portion by identifying segment starting point and segment endpoint boundaries wherein the starting point and endpoint of the segment are based on detecting changes of information in the audio, visual or transcript portion with respect to the recorded content information before and after the bookmarked portion of the content item; creating a bookmark comprising the segment of the content item and retaining the segment identifying the item on a storage medium; and downloading the bookmarks to a user at a remote location at the request of the user, the user then using the bookmarks to identify the original item of the content from which the bookmark was created.
Description



BACKGROUND OF THE INVENTION

The invention relates generally to accessing stored video content and more particularly to a method and apparatus for bookmarking video content for identifying meaningful segments of a video signal for convenient retrieval at a later time.

Users often obtain videos stored in VHS format, DVD, disks, files or otherwise for immediate viewing or for viewing at a later time. Frequently, the videos can be of great length and might have varied content. For example, a viewer might record several hours of content, including various television programs or personal activities on a single video cassette, hard drive or other storage medium. It is often difficult for viewers to return to particularly significant portions of a video. It is often inconvenient to record frame counter numbers or recording time information, particularly while viewing a video.

Users frequently use frustrating hit-or-miss methods for returning to segments of particular interest. For example, a viewer might record or obtain a video that includes performances of a large number of comedians or figure skaters, but only be interested in the performances of a relatively small number of these individuals. Also, a viewer might be recording the broadcast while watching the Super Bowl or World Series, and wish to return to five or six memorable plays of the game.

Current methods for locating particular segments of interest have been inconvenient to use and accordingly, it is desirable to provide an improved apparatus and method for bookmarking a meaningful segment of a video.

SUMMARY OF THE INVENTION

Generally speaking, in accordance with the invention, a method, apparatus and systems for bookmarking an area of interest of stored video content is provided. As a viewer is watching a video and finds an area of interest, they can bookmark the particular segment of the video and then return to that segment with relative simplicity. This can be accomplished by pressing a button, clicking with a mouse or otherwise sending a signal to a device for marking a particular location of the video that is of interest. The boundaries of the entire segment can then be automatically identified using various superhistograms, frame signatures, cut detection methods, closed caption information, audio information, and so on, by analyzing the visual, audio and transcript portions of the video signal. The visual information can be analyzed for changes in color, edge and shape to determine change of individuals by face changes, key frames, video texts and the like. Various audio features such as silence, noise, speech, music, and combinations thereof can be analyzed to determine the beginning and ending of a segment. Closed captioning information can also be analyzed for words, categories and the like. By processing this information to determine the boundaries of a meaningful segment of the video, the bookmark will not merely correspond to a specific point of the video, but to an entire automatically created segment of the content.

Thus, bookmark methods, systems and devices in accordance with the invention not only enable a user to conveniently return to a segment of interest in a video, but can also bring the user to the beginning of the segment, and optionally let the user view only the particular segment of interest, or scroll through or view only segments of interest in sequence.

For example, if a bookmark signal is sent while a particular speaker is speaking in a video of a situation comedy, identifying the current speaker when the bookmark signal is delivered can identify segment boundaries by determining when that speaker begins and stops speaking. This information can be useful for certain types of content, such as identifying a segment of a movie, but not for others. Histogram information such as change of color-palette signals can also help identify segment changes. Closed captions and natural language processing techniques can provide further information for delineating one topic from the next and will also help in identifying boundaries based on topics, dialogues and so forth. By selecting or combining evidence from the above segment identification techniques, the boundaries of the segment can be determined and established. The above can also be combined with analysis of the structure of the program as a whole to further identify the segments.

In one embodiment of the invention, the bookmark signal identifies a frame and the segment is based on time, such as 30 seconds or 1 minute, or video length, such as a selected number of frames, for example, before and after the selected frame. Alternatively, the segment can be set to a predefined length, such as 30 seconds or 1 minute from the segment beginning. Thus, if a bookmark signal is sent towards the end of a long segment, only the first part of the segment and possibly just the portion with the bookmark signal will be stored. Each segment can include EPG data, a frame or transcript information or combinations thereof. Indices of segments can be reviewed from remote locations, such as via the internet or world wide web and videos can be selected by searching through such an index.
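By way of illustration only, the time-based and predefined-length embodiments described above might be sketched as follows. The function names, the default rate of 30 frames per second, and the clamping of the window to the bounds of the video are assumptions made for this example and do not come from the specification.

```python
def fixed_window_segment(selected_frame, total_frames, fps=30.0,
                         seconds_before=30.0, seconds_after=30.0):
    """Return (start, end) frame indices of a time window around the
    bookmarked frame, clamped to the first and last frame of the video."""
    before = int(round(seconds_before * fps))
    after = int(round(seconds_after * fps))
    start = max(0, selected_frame - before)
    end = min(total_frames - 1, selected_frame + after)
    return start, end


def truncate_from_start(segment_start, segment_end, fps=30.0, max_seconds=60.0):
    """Keep at most max_seconds of a segment, measured from its beginning."""
    max_frames = int(round(max_seconds * fps))
    return segment_start, min(segment_end, segment_start + max_frames - 1)
```

For instance, `fixed_window_segment(1000, 100000)` yields a window of 900 frames on either side of frame 1000, while `truncate_from_start` corresponds to the alternative in which only a predefined length from the segment beginning is kept.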

In one embodiment of the invention, new scenes are detected on a running basis as a video is being watched. When a bookmark signal is activated, the system then looks for the end of the scene and records/indexes the bookmarked scene or stores the scene separately.

In one embodiment of the invention, when a user watching video activates the bookmark feature, the unique characteristics of the individual frame are recorded. Then, if a user has a large volume of video content in a storage medium and wants to return to a bookmarked scene or segment, but cannot remember the identity of the movie, television program or sporting event, the characteristics of the frame, as a unique or relatively unique identifier are searched and the scene (or entire work) can be retrieved. Thus, a viewer could scroll through a series of video bookmarks until the desired scene is located and go directly to the scene or to the beginning of the work. Users can even keep personal lists of favorite bookmarked segments of not only video, but music, audio and other stored content and can access content from various internet or web accessible content sources by transmitting the frame identifier or segment identifier to the content source.
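As a non-limiting sketch of the retrieval described above, stored frame signatures might be searched for the entry closest to a query signature. The index layout (a dictionary mapping a title to a list of frame position and signature pairs), the use of Hamming distance, and the names `hamming` and `find_bookmark` are all assumptions chosen for illustration; the specification does not prescribe a particular matching measure.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length signatures."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def find_bookmark(index, query, max_distance=0):
    """Return (title, frame_position) of the closest stored signature whose
    distance to the query is at most max_distance, or None if there is none.

    index: hypothetical mapping of title -> list of (frame_position, signature).
    """
    best = None
    for title, entries in index.items():
        for position, signature in entries:
            distance = hamming(signature, query)
            if distance <= max_distance and (best is None or distance < best[0]):
                best = (distance, title, position)
    return None if best is None else (best[1], best[2])
```

Allowing a small nonzero `max_distance` reflects the description of the signature as a "unique or relatively unique" identifier, so that a near match can still locate the scene.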

Bookmarks in accordance with the invention can be backed up to a remote device, such as a PDA or other computerized storage device. Such a device can categorize the bookmarks, such as by analyzing EPG data, frame information, transcript information, such as by doing a key word search, or other video features. In fact, the systems and methods in accordance with the invention can also be used to bookmark and categorize various types of electronic content, such as segments from audio books, music, radio programs, text documents, multimedia presentations, photographs or other images, and so on. It can also be advantageous to store bookmarks at different levels, so that certain privacy and/or parental guidance issues can be addressed. In certain embodiments of the invention, the bookmarks can be accessed through web pages, mobile communication devices, PDAs, watches and other electronic devices.

Thus, an individual can store EPG data, textual data or some other information as well as the bookmarks to give a richer perspective of the video. This textual information could be part or all of the transcript, the EPG data related to a synopsis or actor, a keyframe and so on. This information could be further used to characterize the segment and bookmark.

Accordingly, it is an object of the invention to provide an improved method, system and device for bookmarking and retrieving video and other content which overcomes drawbacks of existing methods, systems and devices.

BRIEF DESCRIPTION OF THE DRAWINGS

For a fuller description of the invention, reference is had to the following description, taken in connection with the accompanying drawings, in which:

FIG. 1 illustrates a video analysis process for segmenting video content in accordance with embodiments of the invention;

FIGS. 2A and 2B are block diagrams of devices used in creating a visual index of segments in accordance with embodiments of the invention;

FIG. 3 is a schematic diagram showing the selection of frame information from a video image in accordance with embodiments of the invention;

FIG. 4 is a chart showing three levels of a segmentation analysis in accordance with embodiments of the invention; and

FIG. 5 shows the process flow for the incoming video.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Often a viewer would like to bookmark a segment of the video they are watching for future retrieval. Bookmarking video can make it much easier to return to particular segments of interest. As a user watches a live video or video stored on a tape, disk, DVD, VHS tape or otherwise, they can press a button or otherwise cause a signal to be sent to a device electronically coupled to the video to enter a marking point. This marking point (or the signature of the frame) can be recorded in free areas of the tape (such as control areas) or medium on which the video is recorded or the time or frame count for the particular point of the tape can be recorded on a separate storage medium.

FIG. 5 shows the process flow. The incoming video can be divided (formatted) into frames in step 501. Next, for each of the frames, a signature is developed and stored in step 502. If the user has selected the frame for bookmarking, then the frame is identified and the signature, together with its frame position and video information, is stored as a bookmark in step 503. The boundaries around the bookmark are then identified and their information can be stored as well in step 504. The segment identification, such as the segment boundaries or the video, can be stored depending on the user in step 505.
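Steps 501 through 505 might be sketched, purely as an example, in the following form. The coarse intensity histogram used as a signature, the sum-of-absolute-differences distance, the threshold test for boundaries, and the names `frame_signature`, `signature_distance`, `Bookmark` and `bookmark_video` are stand-ins assumed for this illustration; they are not the signature or boundary detection methods of the specification.

```python
from dataclasses import dataclass


def frame_signature(frame, bins=4, max_value=256):
    """Stand-in signature: a coarse histogram of the frame's pixel values."""
    hist = [0] * bins
    for value in frame:
        hist[min(bins - 1, value * bins // max_value)] += 1
    return tuple(hist)


def signature_distance(a, b):
    """Sum of absolute differences between two histogram signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))


@dataclass
class Bookmark:
    frame_index: int
    signature: tuple
    start: int
    end: int


def bookmark_video(frames, selected_index, threshold):
    # Steps 501-502: the video is already split into frames here;
    # compute and keep a signature for every frame.
    signatures = [frame_signature(f) for f in frames]
    # Step 503: record the signature and position of the selected frame.
    reference = signatures[selected_index]
    # Step 504: walk outward until a frame departs from the reference.
    start = selected_index
    while start > 0 and signature_distance(signatures[start - 1], reference) <= threshold:
        start -= 1
    end = selected_index
    while end < len(frames) - 1 and signature_distance(signatures[end + 1], reference) <= threshold:
        end += 1
    # Step 505: return the segment identification for storage.
    return Bookmark(selected_index, reference, start, end)
```

In this sketch each frame is a flat list of pixel intensities; a practical system would substitute the visual, audio or transcript analysis described elsewhere in the specification for the simple histogram comparison.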

In one embodiment of the invention, a user might store the bookmarks on a PDA, server or other storage device. This can act as a look up table. A user can also verify if they have viewed or obtained a specific video by comparing a bookmark or frame information to frame information of the video, stored, for example on an external server. A viewer might download video and then after viewing, delete the video, keeping only the bookmark(s) and then retrieve the video from an external source when additional viewing is desired. Thus, storage resources can be maximized and the efficiency of centralized content storage sources can be utilized.

In one embodiment of the invention, when a viewer clicks on a video, the frame being displayed at that time is extracted out for analysis. A signature, histogram, closed captioning or some other low-level feature or combination of features, could represent this frame. Examples will be provided below.

Although systems in accordance with the invention can be set up to return to the exact point where the bookmark signal is activated, in enhanced systems or applications a meaningful segment of the video can be bookmarked and users can have the option of returning to either the exact point or to the beginning of a meaningful segment, rather than to the middle of a segment or to the end of a segment, as a user might not decide to bookmark a segment until after it has been viewed and found to be of interest.

Identifying the segment to which a bookmark corresponds can be accomplished in various manners. For example, in a preferred embodiment of the invention, the entire video or large portions thereof can be analyzed in accordance with the invention and broken down into segments. Then, when a bookmark signal is activated, the segment which is occurring when the signal is activated (or the prior segment, or both) can be bookmarked. In another embodiment of the invention, the analysis to determine the boundaries of a segment is not conducted until after the bookmark signal is activated. This information (video signature, start and end time of the tape, frame count and so forth) can be stored in the same location identified above.
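Where the video has been broken down into segments in advance, looking up the segment occurring when the signal is activated (and, optionally, the prior segment) might be sketched as follows. The representation of the segmentation as a sorted list of segment start frames and the name `segments_to_bookmark` are assumptions made for this example.

```python
from bisect import bisect_right


def segments_to_bookmark(boundaries, total_frames, frame_index, include_prior=False):
    """Return the (start, end) span of the segment containing frame_index,
    optionally preceded by the span of the prior segment.

    boundaries: sorted list of segment start frames, beginning with 0
    (an assumed representation of the precomputed segmentation).
    """
    if not 0 <= frame_index < total_frames:
        raise ValueError("frame_index is outside the video")

    def span(k):
        start = boundaries[k]
        if k + 1 < len(boundaries):
            end = boundaries[k + 1] - 1
        else:
            end = total_frames - 1
        return start, end

    # Index of the last segment whose start is at or before the frame.
    current = bisect_right(boundaries, frame_index) - 1
    spans = [span(current)]
    if include_prior and current > 0:
        spans.insert(0, span(current - 1))
    return spans
```

Including the prior segment addresses the case noted above in which a user does not decide to bookmark a segment until after it has been viewed.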

In still another embodiment of the invention, a method of identifying items of content such as videos, audio, images, text and combinations thereof, and the like can be performed by creating a bookmark comprising a selected segment of the content item having sufficient identifying information to identify the content item and retaining the segment identifying the item on a storage medium, such as a storage medium at a service provider. Users could then download the bookmarks at a remote location at their election. Users could then use the bookmarks to identify the original item of content from which the bookmark was created. These downloads of bookmarks can be created in accordance with personal profiles.

DCT Frame Signatures

When the viewer selects a frame, one type of frame signature can be derived from the composition of the DCT coefficients. A frame signature representation is derived for each grouping of similarly valued DCT blocks in a frame, i.e., a frame signature is derived from region signatures within the frame. Each region signature is derived from block signatures as explained in the section below. Qualitatively, the frame signatures contain information about the prominent regions in the video frames representing identifiable objects. The signature of this frame can then be used to retrieve this portion of the video.

Referring to FIG. 3, extracting block, region and frame signatures can be performed as follows. Based on the DC and highest values of the AC coefficients, a signature is derived for each block 301 in a video frame 302. Then, blocks 301 with similar signatures are compared and size and location of groups of blocks 301 are determined in order to derive region signatures.

The block signature 310 can be eight bits long, out of which three bits 320 are devoted to the DC signature and five bits 330 are devoted to the AC values. The DC part 320 of the signature 310 is derived by determining where the DC value falls within a specified range of values (e.g. -2400 to 2400). The range can be divided into a preselected number of intervals. In this case, eight intervals are used (eight values are represented by three bits). Depending on the type of application, the size of the whole signature can be changed to accommodate a larger number of intervals and therefore finer granularity representation. Each interval is assigned a predefined mapping from the range of DC values to the DC part 320 of the signature. Five bits 330 are used to represent the content of the AC values. Each AC value is compared to a threshold, e.g. 200 and if the value is greater than the threshold, the corresponding bit in the AC signature is set to one. An example is shown in FIG. 3, where only value 370 is greater than the threshold of 200.

As shown in FIG. 3, five bits are used to represent the content of the AC values. Each AC value is compared to a threshold; if the value is greater than the threshold, the corresponding bit in the AC signature is set to one.
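The eight-bit block signature described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `block_signature`, the equal-width DC intervals, the placement of the DC part in the three high-order bits, and the requirement that the caller supply exactly five AC values are all assumptions made for the example.

```python
def block_signature(dc, ac_values, dc_range=(-2400, 2400), ac_threshold=200):
    """Pack a 3-bit DC interval index and five AC threshold bits into one byte.

    Assumptions (not from the source): equal-width DC intervals, DC bits
    stored in the high-order positions, and exactly five AC values supplied.
    """
    if len(ac_values) != 5:
        raise ValueError("expected exactly five AC values")
    low, high = dc_range
    intervals = 8  # eight intervals are representable in three bits
    # Clamp the DC value into the range, then map it to an interval index 0..7.
    clamped = min(max(dc, low), high)
    index = int((clamped - low) * intervals / (high - low))
    dc_part = min(index, intervals - 1)
    # One bit per AC value: set to one if the value exceeds the threshold.
    ac_part = 0
    for value in ac_values:
        ac_part = (ac_part << 1) | int(value > ac_threshold)
    return (dc_part << 5) | ac_part
```

With the default range and threshold, a block whose only AC value above 200 is 370 sets a single AC bit, mirroring the FIG. 3 example.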

After deriving block signatures for each frame, regions of similarly valued block signatures are determined. Regions consist of two or more blocks that share similar block signatures. In this process, a region growing method can be used for isolating regions in the image. Traditionally, region growing methods use pixel color and neighborhood concepts to detect regions. In one embodiment of the invention, block signature is used as a basis for growing regions. Each region can then be assigned a region signature, e.g.: regionSignature(mblockSignature, regionSize, Rx, Ry) where Rx and Ry are the coordinates of the center of the region. Each region corresponds roughly to an object in the image.
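A minimal sketch of region growing over block signatures is shown below. It assumes that "similar" means exactly equal signatures and that neighbors are the four edge-adjacent blocks; both are simplifications chosen for the example. The name `grow_regions` and the tuple layout (signature, size, Rx, Ry) are illustrative.

```python
from collections import deque


def grow_regions(grid):
    """Group 4-connected blocks with equal signatures into regions.

    `grid` is a list of rows of block signatures. Returns a list of
    (signature, region_size, rx, ry) tuples, where (rx, ry) is the centroid
    in block coordinates. Regions of fewer than two blocks are discarded.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            signature = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            members = []
            # Breadth-first flood fill from the seed block.
            while queue:
                y, x = queue.popleft()
                members.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == signature):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(members) >= 2:
                size = len(members)
                rx = sum(x for _, x in members) / size
                ry = sum(y for y, _ in members) / size
                regions.append((signature, size, rx, ry))
    return regions
```

A tolerance-based similarity test (for example, signatures differing in at most one bit) could replace the equality check without changing the overall structure.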

A selected frame can be represented by the most prominent groupings (regions) of DCT blocks. An n-word long signature is derived for a frame where n determines the number of important regions (defined by the application) and a word consists of a predetermined number of bytes. Each frame can be represented by a number of prominent regions. In one embodiment of the invention, the number of regions in the image is limited and only the largest regions are kept. Because one frame is represented by a number of regions, the similarity between frames can be regulated by choosing the number of regions that are similar, based on their block signature, size and location. The regions can be sorted by region size and then the top n region signatures can be selected as a representative of the frame: frame(regionSignature_1, ..., regionSignature_n). It should be noted that this representation of keyframes is based on the visual appearance of the images, and does not attempt to describe any semantics of the images.
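Selecting the top n regions can be sketched in a few lines. The function name `frame_signature`, the default n of 4, and the region tuple layout (signature, size, Rx, Ry) are assumptions for illustration only.

```python
def frame_signature(regions, n=4):
    """Return the n largest regions as a tuple, largest first.

    Each region is assumed to be a (signature, size, rx, ry) tuple.
    Python's sort is stable, so regions of equal size keep their input order.
    """
    ranked = sorted(regions, key=lambda region: region[1], reverse=True)
    return tuple(ranked[:n])
```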

Frame Searching

To find the position in the video, a frame comparison procedure compares a bookmarked frame F'' with all frames F' in a list of frames. Their respective region signatures are compared according to their size:

$$\text{frame\_difference} = \sum_{i=1}^{N} \left| \text{region\_size}_i' - \text{region\_size}_i'' \right|$$

The frame difference can be calculated for the regions in the frame signature with the same centroids. In this case, the position of the objects as well as the signature value is taken into account. On the other hand, there are cases when the position is irrelevant and we need to compare just the region sizes and disregard the position of the regions.

If the frame difference is zero then we can use the position information from the matching frame to retrieve that section of the video.
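The position-independent variant of this search can be sketched as below, assuming the frame difference is the sum of absolute differences between corresponding region sizes, with regions paired in order of decreasing size. The function names, the (signature, size, Rx, Ry) tuple layout, and the zero-padding of the shorter signature are assumptions for the example.

```python
from itertools import zip_longest


def frame_difference(sig_a, sig_b):
    """Sum of absolute region-size differences, ignoring region position."""
    sizes_a = sorted((region[1] for region in sig_a), reverse=True)
    sizes_b = sorted((region[1] for region in sig_b), reverse=True)
    # Pad the shorter signature with zero-sized regions so every region counts.
    return sum(abs(a - b) for a, b in zip_longest(sizes_a, sizes_b, fillvalue=0))


def find_bookmarked_frame(bookmark_sig, stored_frames):
    """Return the position of the first stored frame with zero difference.

    `stored_frames` is an iterable of (position, frame_signature) pairs.
    Returns None when no stored frame matches.
    """
    for position, signature in stored_frames:
        if frame_difference(bookmark_sig, signature) == 0:
            return position
    return None
```

A position-sensitive comparison would additionally require matching centroids before two regions are paired.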

Other Frame Signature Types

Signatures can be created by using a combination of features from the frames, such as the maximum absolute difference (MAD) between the frame and the preceding and/or following frame; the intensity of the frame; the bitrate used for the frame; whether the frame is interlaced or progressive; whether the frame is from a 16:9 or 4:3 format; and so forth. This type of information could be used in any combination to identify the frame, and a retrieval process similar to that described above could then be developed and used.

Color Histograms

Instead of using the signatures described above, one could calculate a color histogram for the frame and use this for retrieval. The color histogram could consist of any number of bins.
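A color histogram of this kind might be computed as in the following sketch. It assumes 8-bit RGB pixels given as (r, g, b) tuples, a joint histogram with the same number of bins per channel, and an L1 distance for comparison; none of these choices is specified in the text.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized joint RGB histogram for a list of (r, g, b) pixels."""
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]


def histogram_distance(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Normalizing by the pixel count keeps histograms comparable across frames of different resolutions.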

Closed Captioning

Closed captioning data could also be used to bookmark the segment by extracting out the key words that represent the section.

Combinations

Any combination of the above could also be used to bookmark the frame or section.

Defining the segments

The segments could be manually bookmarked by the viewer by having the viewer click on the start and end points of the video. Alternatively, the bookmarking could happen automatically using a technique such as a superhistogram. Automatic techniques for determining the boundaries of a segment are discussed below. For example, a scene will often maintain a certain color palette. A change in scene usually entails a break in this color palette. While the video is playing, automatic video analysis can be performed to extract the histograms. When the viewer clicks on the video, the color histogram for that frame is compared to those of the previously captured frames to identify the start of the scene; the same comparison can then be done to find the end of the scene. Using this information, it is now possible to store only the segment of interest for the viewer. This information can also be used for more meaningful retrieval of the full video. For instance, instead of going directly to the position at which the viewer clicked, one could actually go to the start of the scene that contains that frame.
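The scene-boundary search described above can be sketched as follows, assuming one normalized histogram per analyzed frame, an L1 distance, and a fixed threshold that marks a break in the color palette. The function name `scene_bounds` and the default threshold of 0.5 are hypothetical choices for the example.

```python
def l1_distance(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def scene_bounds(histograms, clicked, threshold=0.5):
    """Return (start, end) frame indices of the scene containing `clicked`.

    Walks backward and forward from the clicked frame until a frame's
    histogram differs from the clicked frame's by more than `threshold`.
    """
    reference = histograms[clicked]
    start = clicked
    while start > 0 and l1_distance(histograms[start - 1], reference) <= threshold:
        start -= 1
    end = clicked
    while (end < len(histograms) - 1
           and l1_distance(histograms[end + 1], reference) <= threshold):
        end += 1
    return start, end
```

Comparing against the clicked frame rather than against each neighboring frame is one possible reading of the text; comparing consecutive frames would tolerate gradual palette drift within a scene.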

EXAMPLE

The viewer is watching a video of the Wizard of Oz movie. The current view contains frames where Dorothy, the Tin Man, the Cowardly Lion and the Scarecrow go into the Emerald City from the poppy field. The viewer clicks on the video, e.g., when the Horse of a Different Color passes. In one embodiment of the invention, the frame/scene analysis has been continuous. The system can then extract the selected frame and generate both the DCT frame signature and the color histogram, for example. The analysis program searches through the previously stored frames until it finds one that does not belong to the same color palette. This denotes the start of the scene. The program continues analyzing the video until it locates the end of the scene by virtue of another significant change in color palette. If the user had already decided to record the whole video, the start and end points are marked. In another embodiment of the invention, only the segment is stored. Meanwhile, the program has been analyzing and storing the DCT frame information for the individual frames. Sometime later, if the viewer views the bookmarked frame and decides to retrieve the portion of the video, the DCT frame information is compared with the stored information until a match is found. Then the marked points around this frame are used to retrieve that portion of the video.

Segmenting the video can be performed using analysis techniques such as those discussed in U.S. Pat. Nos. 6,137,544 and 6,125,229, the contents of which are incorporated herein by reference.

Segmenting a video signal can also be accomplished with use of a layered probabilistic system which can be referred to as a Bayesian Engine or BE. Such a system is described in J. Pearl, "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference," Morgan Kaufmann Publishers, Inc. San Mateo, Calif. (1988). Such a system can be understood with reference to FIG. 9.

FIG. 4 shows a probabilistic framework in three layers: low level 410, mid-level 420 and high level 430. Low level layer 410 describes signal processing parameters for a video signal 401. These can include visual features, such as color, edge, and shape; audio parameters, such as average energy, bandwidth, pitch, mel-frequency cepstral coefficients, linear prediction coding coefficients, and zero-crossings; and the transcript, which can be pulled from the ASCII characters of the closed captions. If closed caption information is not available, voice recognition methods can be used to convert the audio to transcript characters.

The arrows indicate the combinations of low-level 410 features that create mid-level 420 features. Mid-level 420 features are associated with whole frames or collections of frames, while low level 410 features are associated with pixels or short time intervals. Keyframes (first frame of a shot), faces, and video text are mid-level visual features. Silence, noise, speech, music and combinations thereof are mid-level 420 features. Keywords and the closed caption/transcript categories also are part of mid-level 420.

High level features can describe semantic video content obtained through the integration of mid-level 420 features across the different modalities.

This approach is highly suitable because probabilistic frameworks are designed to deal with uncertain information, and they are appropriate for representing the integration of information. The BE's probabilistic integration employs either intra or inter-modalities. Intra-modality integration refers to integration of features within a single domain. For example: integration of color, edge, and shape information for videotext represents intra-modality integration because it all takes place in the visual domain. Integration of mid-level audio categories with the visual categories face and videotext offers an example of inter-modalities.

Bayesian networks are directed acyclic graphs (DAGs) in which the nodes correspond to (stochastic) variables. The arcs describe a direct causal relationship between the linked variables. The strength of these links is given by conditional probability distributions (cpds). More formally, let the set $\Omega = \{x_1, \ldots, x_N\}$ of $N$ variables define a DAG. For each variable $x_i$, $i = 1, \ldots, N$, there exists a subset of variables of $\Omega$, $\Pi_{x_i}$, the parent set of $x_i$, i.e., the predecessors of $x_i$ in the DAG, such that $P(x_i \mid \Pi_{x_i}) = P(x_i \mid x_1, \ldots, x_{i-1})$, where $P(\cdot \mid \cdot)$ is a cpd, strictly positive. Now, given the joint probability density function (pdf) $P(x_1, \ldots, x_N)$, using the chain rule: $P(x_1, \ldots, x_N) = P(x_N \mid x_{N-1}, \ldots, x_1) \times \cdots \times P(x_2 \mid x_1) \times P(x_1)$. According to this equation, the parent set $\Pi_{x_i}$ has the property that $x_i$ and $\{x_1, \ldots, x_N\} \setminus \Pi_{x_i}$ are conditionally independent given $\Pi_{x_i}$.
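A small worked example may make the factorization concrete. The sketch below builds a hypothetical three-variable chain x1 -> x2 -> x3 with made-up, strictly positive cpds (all numbers are illustrative), forms the joint distribution as the product of the cpds, and checks that x3 is conditionally independent of x1 given its parent x2.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical cpds for binary variables; every entry is strictly positive.
p_x1 = {0: F(3, 5), 1: F(2, 5)}
p_x2_given_x1 = {0: {0: F(4, 5), 1: F(1, 5)},      # indexed [x1][x2]
                 1: {0: F(3, 10), 1: F(7, 10)}}
p_x3_given_x2 = {0: {0: F(9, 10), 1: F(1, 10)},    # indexed [x2][x3]
                 1: {0: F(1, 2), 1: F(1, 2)}}


def joint(x1, x2, x3):
    """Joint probability as the product of each node's cpd given its parents."""
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]


def conditional_x3(x3, x1, x2):
    """P(x3 | x1, x2) recovered from the joint distribution."""
    denominator = sum(joint(x1, x2, value) for value in (0, 1))
    return joint(x1, x2, x3) / denominator


# The joint distribution sums to one over all eight assignments.
total = sum(joint(*values) for values in product((0, 1), repeat=3))
```

Because the parent set of x3 is {x2}, `conditional_x3(x3, x1, x2)` returns the same value for either setting of x1.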

In FIG. 4, the flow diagram of the BE has the structure of a DAG made up of three layers. In each layer, each element corresponds to a node in the DAG. The directed arcs join one node in a given layer with one or more nodes of the preceding layer. Two sets of arcs join the elements of the three layers. For a given layer and for a given element we compute a joint pdf as previously described. More precisely, for an element (node) $i^{(l)}$ associated with the $l$-th layer, the joint pdf is:

$$P^{(l)}\big(x^{(l)}_{i^{(l)}}, \Pi^{(l-1)}, \ldots, \Pi^{(2)}\big) = P\big(x^{(l)}_{i^{(l)}} \mid \Pi^{(l)}\big) \times \big\{ P\big(x^{(l-1)}_{1} \mid \Pi^{(l-1)}_{1}\big) \cdots P\big(x^{(l-1)}_{N^{(l-1)}} \mid \Pi^{(l-1)}_{N^{(l-1)}}\big) \big\} \times \cdots \times \big\{ P\big(x^{(2)}_{1} \mid \Pi^{(2)}_{1}\big) \cdots P\big(x^{(2)}_{N^{(2)}} \mid \Pi^{(2)}_{N^{(2)}}\big) \big\}, \qquad (1)$$

where for each element $x^{(l)}_i$ there exists a parent set $\Pi^{(l)}_i$, and $\Pi^{(l)}$ is the union of the parent sets for a given level $l$, i.e., $\Pi^{(l)} \equiv \bigcup_{i=1}^{N^{(l)}} \Pi^{(l)}_i$. There can exist an overlap between the different parent sets for each level.

Topic segmentation (and classification) performed by BE is shown in the third layer (high-level) of FIG. 4. The complex nature of multimedia content requires integration across multiple domains. It is preferred to use the comprehensive set of data from the audio, visual, and transcript domains.

In the BE structure, FIG. 4, for each of the three layers, each node and arrow is associated with a cpd. In the low-level layer the cpds are assigned by the AE as described above. For the mid-level layer, twenty closed captions categories (for example) are generated: weather, international, crime, sports, movie, fashion, tech stock, music, automobile, war, economy, energy, stock, violence, financial, national (affairs), biotech, disaster, art, and politics. It is advantageous to use a knowledge tree for each category made up of an association table of keywords and categories. After statistical processing, the system performs categorization using category vote histograms. If a word in the closed captions file matches a knowledge base keyword, then the corresponding category gets a vote. The probability, for each category, is given by the ratio between the total number of votes per keyword and the total number of votes for a closed captions paragraph.
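The category vote histogram can be sketched as below. The sketch reads the probability of a category as its share of all votes cast in the paragraph, which is one interpretation of the ratio described above. The function name, the keyword table passed in as `knowledge_base`, and the simple whitespace tokenization are assumptions for illustration.

```python
from collections import Counter


def category_probabilities(paragraph, knowledge_base):
    """Return {category: probability} for one closed-caption paragraph.

    `knowledge_base` maps a lowercase keyword to its category. Each keyword
    match casts one vote; probabilities are vote shares within the paragraph.
    """
    votes = Counter()
    for token in paragraph.lower().split():
        word = token.strip(".,;:!?\"'")
        category = knowledge_base.get(word)
        if category is not None:
            votes[category] += 1
    total = sum(votes.values())
    if total == 0:
        return {}  # no keyword matched, so no category can be scored
    return {category: count / total for category, count in votes.items()}
```

For example, a paragraph with two weather keywords and one financial keyword would score weather at two thirds and financial at one third.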

Systems in accordance with the invention can perform segmentation by dividing the TV program into commercial and non-commercial parts and then classifying the non-commercial parts into segments based on two high-level categories, financial news and talk shows, for example (performed by the BE).

Initial segmentation can be done using closed caption data to divide the video into program and commercial segments. Next the closed captions of the program segments are analyzed for single, double, and triple arrows. Double arrows indicate a speaker change. The system marks text between successive double arrows with a start and end time in order to use it as an atomic closed captions unit. Systems in accordance with the invention can use these units as the segmenting building blocks. In order to determine a segment's high-level indexing (whether it is financial news or a talk show, for example), Scout computes two joint probabilities. These are defined as:

p-FIN-TOPIC = p-VTEXT * p-KWORDS * p-FACE * p-AUDIO-FIN * p-CC-FIN * p-FACETEXT-FIN (2)

p-TALK-TOPIC = p-VTEXT * p-KWORDS * p-FACE * p-AUDIO-TALK * p-CC-TALK * p-FACETEXT-TALK (3)

The audio probabilities p-AUDIO-FIN for financial news and p-AUDIO-TALK for talk shows are created by the combination of different individual audio category probabilities. The closed captions probabilities p-CC-FIN for financial news and p-CC-TALK for talk shows are chosen as the largest probability out of the list of twenty probabilities. The face and videotext probabilities p-FACETEXT-FIN and p-FACETEXT-TALK are obtained by comparing the face and videotext probabilities p-FACE and p-TEXT, which determine, for each individual closed caption unit, the probability of face and text occurrence. One heuristic builds on the fact that talk shows are dominated by faces while financial news has both faces and text. The high-level segmenting is done on each closed captions unit by computing a new pair of probabilities: p-FIN-TOPIC and p-TALK-TOPIC. The highest value dictates the classification of the segment as either financial news or talk show.
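The classification step for one closed captions unit can be sketched as a pair of products followed by a comparison. The dictionary keys below mirror the probability names in the text without the "p-" prefix; the function name `classify_unit` and the tie-breaking rule (ties go to talk show) are assumptions for the example.

```python
def classify_unit(p):
    """Classify one closed-caption unit as financial news or talk show.

    `p` maps probability names (e.g. "VTEXT", "AUDIO-FIN") to values in [0, 1].
    Returns (label, p_fin_topic, p_talk_topic).
    """
    shared = p["VTEXT"] * p["KWORDS"] * p["FACE"]
    p_fin = shared * p["AUDIO-FIN"] * p["CC-FIN"] * p["FACETEXT-FIN"]
    p_talk = shared * p["AUDIO-TALK"] * p["CC-TALK"] * p["FACETEXT-TALK"]
    # The larger joint probability decides the label; a tie falls to talk show.
    label = "financial news" if p_fin > p_talk else "talk show"
    return label, p_fin, p_talk
```

Because the first three factors are shared by both products, only the audio, closed caption, and face/text terms actually influence the decision.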

A video signal can be pre-processed to extract other useful information which can be used to identify the boundaries of a segment. For example, the signal can be divided into audio, visual and transcript domains in order to generate semantically indexed information from video segments. The content of the domains can be analyzed and the analysis combined to more accurately identify the boundaries of a segment.

Video pre-processing can be accomplished with an Analysis Engine (AE) which can also combine video pre-processing with analysis. The AE can take MPEG-2 input, for example, and extract closed caption (cc) information. It can also perform audio and visual analysis, as discussed below for additional feature extraction. This information is then combined to identify segment boundaries.

One preferred AE is a Philips brand TriMedia™ Triodec card. The card has a TM 1000 processor and has 8 Mb of memory. It can be run using the hosted mode in a 600 MHz PIII computer with a WinNT operating system. "C" code can be downloaded to the TriMedia where analysis is performed.

The AE can perform shot detection first by extracting a new keyframe when it detects a significant difference between sequential I-frames (intra-coded frames) of a group of pictures (GOP). It can employ two DCT-based implementations for the frame differencing: histogram and macroblock. Unicolor keyframes or frames that appear similar to previously extracted keyframes can be filtered out using a one-byte frame signature. This keyframe extraction produces an uncompressed image and a list giving the keyframe number and the probability of a cut. The AE can base this probability on the relative amount above the threshold using the differences between the sequential I-frames. The system can then pass keyframes on for videotext detection. The AE can look for videotext using an edge-based method on the uncompressed image and can tag each keyframe for the presence or absence of text. These keyframes can be analyzed for the presence of faces.
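The keyframe list and cut probability can be illustrated with the following sketch. The text does not give a formula for the probability; the mapping used here, the excess over the threshold divided by the threshold and capped at 1.0, is purely an assumption, as is the function name `detect_cuts`.

```python
def detect_cuts(differences, threshold):
    """Return [(frame_index, cut_probability)] for differences above threshold.

    `differences[k]` is assumed to be the difference between I-frame k and
    I-frame k + 1, so a cut is reported at index k + 1. `threshold` must be
    positive. The probability formula is a hypothetical choice.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    cuts = []
    for index, difference in enumerate(differences, start=1):
        if difference > threshold:
            probability = min(1.0, (difference - threshold) / threshold)
            cuts.append((index, probability))
    return cuts
```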

The extraction process can produce a complete time-stamped program transcript. The time stamps can be used in order to align the transcript data with the related video.

Multiple (e.g., 20) low-level audio parameters can be extracted from the audio portion of the video signal and used for analysis in the segmenting process. This can be accomplished using, for example, .wav files on a PC. The outputs of the video pre-processing can then be used in the segmenting process.

Additional methods and systems for segmentation of the video signal are set forth below and in "Parsing TV Programs For Identification and Removal of Non-Story Segments", by T. McGee and N. Dimitrova, Proc. of SPIE Conf. on Storage and Retrieval for Image and Video Databases, pp. 243-251, San Jose, Calif., January, 1999; "PNRS-Personal News Retrieval System", by N. Dimitrova, H. Elenbaas and T. McGee, SPIE Conference on Multimedia Storage and Archiving Systems IV, pp. 2-10, September 1999, Boston; and "Text, Speech, and Vision For Video Segmentation: The Infomedia Project" by A. Hauptmann and M. Smith, AAAI Fall 1995 Symposium on Computational Models for Integrating Language and Vision 1995, the entire disclosures of which are incorporated herein by reference.

The following techniques can also be used to obtain information useful for identifying segment boundaries:

Cut detection: wherein two consecutive video frames are compared to identify abrupt scene changes (hard cuts) or soft transitions (dissolve, fade-in and fade-out). An explanation of cut detection is provided in the publication by N. Dimitrova, T. McGee, H. Elenbaas, entitled "Video Keyframe Extraction and Filtering: A Keyframe is Not a Keyframe to Everyone", Proc. ACM Conf. on Knowledge and Information Management, pp. 113-120, 1997, the entire disclosure of which is incorporated herein by reference.

Face detection: wherein regions of the video frames are identified which contain skin-tone and which correspond to oval-like shapes. An explanation of face detection is provided in the publication by Gang Wei and Ishwar K. Sethi, entitled "Face Detection for Image Annotation", Pattern Recognition Letters, Vol. 20, No. 11, November 1999, the entire disclosure of which is incorporated herein by reference.

Text detection: wherein text which appears in the video frame such as overlayed or superimposed text is identified. An explanation of text detection is provided in the article entitled "Text Detection in Video Segments" by L. Agnihotri and N. Dimitrova, Proceedings of IEEE Workshop on CBAIVL, Fort Collins, Colo., June 1999, held in conjunction with IEEE Conference on Computer Vision and Pattern Recognition 1999, the entire disclosure of which is incorporated herein by reference. In one preferred embodiment of the invention, once the text is detected, optical character recognition (OCR) which is known in the art is employed.

Motion Estimation/Segmentation/Detection: wherein moving objects are determined in video sequences and the trajectory of the moving object is analyzed. In order to determine the movement of objects in video sequences, known operations such as optical flow estimation, motion compensation and motion segmentation are preferably employed. An explanation of motion estimation/segmentation/detection is provided in the publication by Patrick Bouthemy and Francois Edouard, entitled "Motion Segmentation and Qualitative Dynamic Scene Analysis from an Image Sequence", International Journal of Computer Vision, Vol. 10, No. 2, pp. 157-182, April 1993, the entire disclosure of which is incorporated herein by reference.

Camera Motion: wherein a set of five (5) global camera parameters is employed, preferably two (2) translational and three (3) rotational. The 3-D camera motion is then classified as pure tracking (horizontal motion), booming (vertical motion), dollying (motion in depth), panning (rotation about the vertical global axis), tilting (rotation about the horizontal axis), and rolling (rotation about the z-axis) or combinations of these motions. This information can be used to classify the video shots into, for example, "static", "zoom" and/or "span", and to further determine the director's intention for producing the shot. An explanation of camera motion detection is provided in the publication by R. Y. Tsai and T. S. Huang entitled "Uniqueness and Estimation of Three-Dimensional Motion Parameters of Rigid Objects with Curved Surfaces", IEEE Transaction on PAMI, Vol. 6, number 1, pp. 13-27, 1994, the entire disclosure of which

