Ontology-based parser for natural language processing, U.S. Patent No. 7,027,974 (United States Patent and Trademark Office)

Title: Ontology-based parser for natural language processing

Abstract: An ontology-based parser incorporates both a system and method for converting natural-language text into predicate-argument format that can be easily used by a variety of applications, including search engines, summarization applications, categorization applications, and word processors. The ontology-based parser contains functional components for receiving documents in a plurality of formats, tokenizing them into instances of concepts from an ontology, and assembling the resulting concepts into predicates. The ontological parser has two major functional elements, a sentence lexer and a parser. The sentence lexer takes a sentence and converts it into a sequence of ontological entities that are tagged with part-of-speech information. The parser converts the sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the sentence, and then applies rules to it that bind arguments into predicates.
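The two functional elements named in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy ontology, its concept numbers, the `OntologicalEntity` class, and the helper names are hypothetical. The `bind_predicates` step is a naive nearest-noun heuristic standing in for the patent's grammar-based, two-stage parser. The sketch only shows the data flow: sentences are split off, words are looked up in an ontology and tagged with a part of speech, a stop-word filter prunes words, and verbs are bound to arguments to form predicates.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologicalEntity:
    """A word resolved against the ontology (hypothetical structure)."""
    word: str
    pos: str         # part-of-speech tag
    concept_id: int  # location of the concept in the ontology; 0 = unknown


# Hypothetical toy ontology: word -> (part of speech, concept number).
TOY_ONTOLOGY = {
    "the": ("det", 0x0301),
    "a": ("det", 0x0302),
    "dog": ("noun", 0x0101),
    "cat": ("noun", 0x0102),
    "chases": ("verb", 0x0201),
}
STOP_WORDS = {"the", "a", "an"}


def iterate_sentences(text):
    """Document iterator: split raw text into sentences at '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def lex(sentence, ontology=TOY_ONTOLOGY):
    """Lexer plus ontology lookup. Unknown words get a default
    assumption: they are tagged as nouns with concept number 0."""
    entities = []
    for word in re.findall(r"[a-z']+", sentence.lower()):
        pos, concept_id = ontology.get(word, ("noun", 0))
        entities.append(OntologicalEntity(word, pos, concept_id))
    return entities


def stop_word_filter(entities):
    """Lexer filter: prune stop words before parsing."""
    return [e for e in entities if e.word not in STOP_WORDS]


def bind_predicates(entities):
    """Toy stand-in for the parser: bind each verb to the nearest noun on
    its left (subject) and the nearest noun on its right (object)."""
    predicates = []
    for i, e in enumerate(entities):
        if e.pos != "verb":
            continue
        left = [x.word for x in entities[:i] if x.pos == "noun"]
        right = [x.word for x in entities[i + 1:] if x.pos == "noun"]
        args = left[-1:] + right[:1]  # at most one subject and one object
        predicates.append((e.word, tuple(args)))
    return predicates


if __name__ == "__main__":
    for sentence in iterate_sentences("The dog chases the cat. A cat chases a dog!"):
        print(bind_predicates(stop_word_filter(lex(sentence))))
```

For example, `bind_predicates(stop_word_filter(lex("The dog chases the cat")))` returns `[('chases', ('dog', 'cat'))]`. Words missing from the toy ontology fall back to a default noun tag with concept number 0, loosely mirroring the "default assumptions about an ontological status" mentioned in claim 2.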

Patent Number: 7,027,974; issued on 04/11/2006 to Busch et al.


Inventors: Busch; Justin Eliot (Irvine, CA); Lin; Albert Deirchow (San Diego, CA); Graydon; Patrick John (Windsor, CA); Caudill; Maureen (San Diego, CA)
Assignee: Science Applications International Corporation (San Diego, CA)
Appl. No.: 697676
Filed: October 27, 2000


Current U.S. Class: 704/4; 704/9; 707/4
Current International Class: G10L 13/08 (20060101)
Field of Search: 704/4,9 707/4


References Cited

U.S. Patent Documents
4270182 May 1981 Asija
4864502 September 1989 Kucera et al.
4887212 December 1989 Zamora et al.
4914590 April 1990 Loatman et al.
4984178 January 1991 Hemphill et al.
5056021 October 1991 Ausborn
5101349 March 1992 Tokuume et al.
5146406 September 1992 Jensen
5237502 August 1993 White et al.
5297039 March 1994 Kanaegami et al.
5309359 May 1994 Katz et al.
5317507 May 1994 Gallant
5321833 June 1994 Chang et al.
5331556 July 1994 Black, Jr. et al.
5386556 January 1995 Hedin et al.
5404295 April 1995 Katz et al.
5418948 May 1995 Turtle
5446891 August 1995 Kaplan et al.
5475588 December 1995 Schabes et al.
5535382 July 1996 Ogawa
5576954 November 1996 Driscoll
5619709 April 1997 Caid et al.
5675710 October 1997 Lewis
5680627 October 1997 Anglea et al.
5687384 November 1997 Nagase
5694523 December 1997 Wical
5694592 December 1997 Driscoll
5706497 January 1998 Takahashi et al.
5721902 February 1998 Schultz
5721938 February 1998 Stuckey
5761389 June 1998 Maeda et al.
5790754 August 1998 Mozer et al.
5794050 August 1998 Dahlgren et al.
5802515 September 1998 Adar et al.
5835087 November 1998 Herz et al.
5864855 January 1999 Ruocco et al.
5870701 February 1999 Wachtel
5870740 February 1999 Rose et al.
5873056 February 1999 Liddy et al.
5893092 April 1999 Driscoll
5915249 June 1999 Spencer
5920854 July 1999 Kirsch et al.
5930746 July 1999 Ting
5933822 August 1999 Braden-Harder et al.
5940821 August 1999 Wical
5953718 September 1999 Wical
5960384 September 1999 Brash
5963940 October 1999 Liddy et al.
5974412 October 1999 Hazlehurst et al.
5974455 October 1999 Monier
6006221 December 1999 Liddy et al.
6012053 January 2000 Pant et al.
6021387 February 2000 Mozer et al.
6021409 February 2000 Burrows
6026388 February 2000 Liddy et al.
6038560 March 2000 Wical
6047277 April 2000 Parry et al.
6049799 April 2000 Mangat et al.
6055531 April 2000 Bennett et al.
6076051 June 2000 Messerly et al.
6233575 May 2001 Agrawal et al.
6778979 August 2004 Grefenstette et al.
2002/0143755 October 2002 Wynblatt et al.
Foreign Patent Documents
0413132 Feb., 1991 EP
WO 0049517 Aug., 2000 WO

Other References

Koller, D., and Sahami, M. Hierarchically classifying documents using very few words. ICML-97: Proceedings of the Fourteenth International Conference on Machine Learning, 1997.
Susan T. Dumais, John Platt, David Heckerman, Mehran Sahami. Inductive Learning Algorithms and Representations for Text Categorization. CIKM, pp. 148-155, 1998.
Andrew McCallum, Kamal Nigam, Jason Rennie and Kristie Seymore. Building Domain-Specific Search Engines with Machine Learning Techniques. AAAI-99 Spring Symposium, 1999.
M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A Bayesian approach to filtering junk e-mail. AAAI Workshop on Learning for Text Categorization, Jul. 1998, Madison, Wisconsin.
Larkey, Leah S. 1999. A Patent Search and Classification System. In Digital Libraries 99: The Fourth ACM Conference on Digital Libraries (Berkeley, CA, Aug. 11-14, 1999). ACM Press, pp. 79-87.
Teuvo Kohonen. Self-Organization and Associative Memory, 3rd Edition, Table of Contents. Springer-Verlag, New York, NY, 1989.
Dunja Mladenic. Turning Yahoo into an Automatic Web Page Classifier. ECAI 98: 13th European Conference on Artificial Intelligence, Brighton, UK, Aug. 23-28, 1998, pp. 473-474. John Wiley & Sons, Ltd.
Tom M. Mitchell. "Machine Learning". WCB/McGraw-Hill, 1997.
Choon Yang Quek. "Classification of World Wide Web Documents". Senior Honors Thesis, CMU.
Bresnan, Joan (ed.). 1982. The Mental Representation of Grammatical Relations. Cambridge, MA: MIT Press.
Charniak, Eugene. 1993. Statistical Language Learning. Cambridge, MA: MIT Press.
Domingos, Pedro and Michael Pazzani. 1997. "On the Optimality of the Simple Bayesian Classifier Under Zero-One Loss." Machine Learning 29:103-130.
Duda, Richard and Peter Hart. 1973. Pattern Classification and Scene Analysis. New York, NY: Wiley.
Gold, Mark. 1967. "Language Identification in the Limit." Information and Control 10:447-474.
Horning, James. 1969. A Study of Grammatical Inference. Ph.D. thesis, Stanford.
Magerman, David and Mitchell Marcus. 1991. "Pearl: A Probabilistic Chart Parser." Proceedings of the 2nd International Workshop for Parsing Technologies.
Manning, Christopher and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
McCallum, A., K. Nigam, S. Thrun, and T. Mitchell. 1998. "Learning to Classify Text from Labeled and Unlabeled Documents." Proceedings of the 1998 National Conference on Artificial Intelligence, Jul. 1998.
McCallum, A., R. Rosenfeld, T. Mitchell and A. Ng. 1998. "Improving Text Classification by Shrinkage in a Hierarchy of Classes." Proceedings of the 1998 International Conference on Machine Learning.
Pollard, Carl and Ivan Sag. 1994. Head-Driven Phrase Structure Grammar. Chicago, IL: University of Chicago Press.
Dirk van Eylen. "Ranking of search results using AltaVista". http://ping4.ping.be/~ping0658/avrank.html.
Avrim Blum, Tom Mitchell. "Combining Labeled and Unlabeled Data with Co-Training". Proceedings of the 1998 Conference on Computational Learning Theory.

Primary Examiner: Abebe; Daniel
Attorney, Agent or Firm: Banner & Witcoff, Ltd.

Parent Case Text



Applicants hereby incorporate by reference co-pending application Ser. No. 09/627,295 filed in the U.S. Patent and Trademark Office on Jul. 27, 2000, entitled "Concept-Based Search and Retrieval System."
Claims



What is claimed is:

1. A system for ontological parsing that converts natural-language text into predicate-argument format comprising: a sentence lexer for converting a natural language sentence into a sequence of ontological entities that are tagged with part-of-speech information; and a parser for converting the sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the natural language sentence and binds arguments into predicates.

2. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 1, wherein said sentence lexer comprises: a document iterator that receives text input and outputs individual sentences; a lexer that receives said individual sentences from said document iterator and outputs individual words; and an ontology that receives said individual words from said lexer and returns ontological entities or words tagged with default assumptions about an ontological status of said individual words to said lexer.

3. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 2, wherein said ontology is a parameterized ontology that assigns numbers to concepts.

4. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 3, wherein said numbers can be subtracted to determine if features are in agreement, wherein a non-negative number indicates agreement.

5. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 3, wherein said numbers can be subtracted to determine if features are in agreement, wherein a negative number indicates feature incompatibility.

6. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 3, wherein in said parameterized ontology, each data structure includes at least one integer value, where groups of digits of said integer value correspond to specific branches taken at corresponding levels in a parse tree.

7. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 6, wherein said parameterized ontology is encoded in two ways: a base of said integer value bounds a number of branches extending from a root node of said ontology, while a number of digits in the integer value bounds a potential depth of said parse tree.

8. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 6, wherein a first digit difference between two nodes provides a measure of the degree of ontological proximity of two concepts.

9. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 6, wherein said predicates and arguments are represented by encodings comprising at least one digit separated into multiple groups to provide multiple ontological levels and a branching factor at each node.

10. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 2, further comprising lexer filters for modifying said individual sentences based on word meanings.

11. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 10, wherein said lexer filters comprise at least one of: a noun filter, an adjective filter, an adverb filter, a modal verb filter, a stop word filter, a pseudo-predicate filter, and a pseudo-concept filter.

12. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said stop word filter removes stop words from said individual sentences.

13. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said adjective filter removes lexemes representing adjectives from said individual sentences.

14. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said noun filter groups proper nouns into single lexical nouns.

15. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said modal verb filter removes modal verbs from objects of said individual sentences.

16. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said adverb filter removes lexemes containing adverb concepts from said individual sentences.

17. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said pseudo-predicate filter removes verbs from queries.

18. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said pseudo-concept filter removes concepts from queries.

19. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 1, wherein said parser comprises: a sentence receiver that receives sentences including ontological entities from said sentence lexer; a parser component that parses said sentences, received by said sentence receiver, into parse trees representing concepts in a respective sentence received by said sentence receiver; and a parse tree converter that receives the output of said parser component and converts said parse trees into predicates.

20. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 19, wherein said parser component further comprises: parser filters operating on said predicates to remove erroneous predicates.

21. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 19, wherein said parser component looks ahead at least one word, scans input from left-to-right, and constructs said parse tree.

22. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 20, wherein said parser filters remove parse trees that violate one of statistical and ontological criteria for well-formedness.

23. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 22, wherein said parser filters include a selectional restriction filter and a parse probability filter.

24. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 23, wherein said selectional restriction filter vetoes parse trees having conflicts between selectional features of concepts serving as arguments to a second concept and restrictions of said second concept.

25. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 23, wherein said parse probability filter vetoes parse trees that fall below a minimum probability for semantic interpretation.

26. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 1, wherein said system is modular to permit the use of any part-of-speech-tagged ontology.

27. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 6, wherein said parse trees are represented by modified hexadecimal digits that have an octet of hexadecimal pairs to provide eight ontological levels and a branching factor at each node of 256.

28. A method of ontological parsing that converts natural-language text into predicate-argument format comprising the steps of: converting a natural language sentence into a sequence of ontological entities that are tagged with part-of-speech information; and converting said sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the natural language sentence and binds arguments into predicates.

29. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, further comprising the step of modifying said natural language sentence based on word meanings.

30. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, further comprising the steps of: receiving sentences including ontological entities; parsing said sentences including ontological entities into parse trees representing concepts in the corresponding sentence including ontological entities; and converting said parse trees into predicates.

31. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, wherein the step of parsing comprises the step of looking ahead one word, scanning input from left-to-right, and constructing said parse tree.

32. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, further comprising the step of removing parse trees that violate one of statistical and ontological criteria for well-formedness.

33. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, further comprising the step of vetoing parse trees having conflicts between selectional features of concepts serving as arguments to a second concept and restrictions of said second concept.

34. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 30, further comprising the step of assigning numbers to said concepts.

35. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 34, further comprising the step of subtracting said numbers to determine if features are in agreement, wherein a negative number indicates feature incompatibility.

36. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 34, further comprising the step of subtracting said numbers to determine if features are in agreement, wherein a non-negative number indicates agreement.

37. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, further comprising the step of vetoing parse trees that fall below a minimum probability for semantic interpretation.

38. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, wherein the step of converting the natural language sentence into the sequence of ontological entities includes data structures, wherein each data structure includes an integer value, where each digit of said integer value corresponds to a specific branch taken at a corresponding level in a parse tree.

39. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 38, further comprising the step of encoding in two ways: a number of grouped digits and their numerical base bound a number of branches extending from a root node of an ontology, while at least one of said groups of digits in said integer value bounds a potential depth of said parse tree.

40. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 39, wherein a most significant digit difference between two nodes provides a measure of the degree of ontological proximity of two concepts.

41. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 38, further comprising the step of representing said parse trees by modified hexadecimal numbers that have an octet of hexadecimal pairs to provide eight ontological levels and a branching factor at each node of 256.

42. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, further comprising the step of representing said predicates and arguments by encodings comprising at least one digit separated into groups to provide multiple ontological levels and a branching factor at each node.
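Claims 3 to 5 and 24 describe checking feature agreement by subtracting numbers assigned to concepts, and a selectional restriction filter that vetoes parses with feature conflicts, without fixing a particular numbering scheme. The Python sketch below assumes one: a hypothetical ordinal animacy scale in which higher numbers are more specific, so subtracting a predicate's required value from an argument's supplied value gives a non-negative number when the argument qualifies and a negative number when it does not. `Concept`, `PredicateFrame`, `ANIMACY`, and the scale values are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass

# Hypothetical ordinal scale: each value is more specific than the ones
# below it, so "at least as specific as required" is a numeric comparison.
ANIMACY = {"abstract": 0, "physical": 1, "animate": 2, "human": 3}


@dataclass(frozen=True)
class Concept:
    name: str
    animacy: int  # feature value from the hypothetical scale


@dataclass(frozen=True)
class PredicateFrame:
    name: str
    required_animacy: tuple  # minimum animacy for each argument slot


def agreement(supplied, required):
    """Subtract the required value from the supplied one:
    non-negative -> features agree, negative -> incompatible."""
    return supplied - required


def selectional_restriction_filter(candidates):
    """Veto (frame, args) candidates whose arguments conflict with the
    frame's restrictions or whose argument count does not match."""
    kept = []
    for frame, args in candidates:
        if len(args) != len(frame.required_animacy):
            continue
        if all(agreement(arg.animacy, req) >= 0
               for arg, req in zip(args, frame.required_animacy)):
            kept.append((frame, args))
    return kept


if __name__ == "__main__":
    sleep = PredicateFrame("sleep", (ANIMACY["animate"],))
    dog = Concept("dog", ANIMACY["animate"])
    rock = Concept("rock", ANIMACY["physical"])
    survivors = selectional_restriction_filter([(sleep, (dog,)), (sleep, (rock,))])
    print([args[0].name for _, args in survivors])  # ['dog']
```

Here "the dog sleeps" survives (2 minus 2 is 0) while "the rock sleeps" is vetoed (1 minus 2 is negative). An ordinal scale is only one way to make subtraction meaningful; a real ontology would likely need several feature dimensions.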
Description



BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an ontological parser for natural language processing. More particularly, the present invention relates to a system and method for ontological parsing of natural language that provides a simple knowledge-base-style representation format for the manipulation of natural-language documents. The system utilizes unstructured text as input and produces a set of data structures representing the conceptual content of the document as output. The data is transformed using a syntactic parser and an ontology. The ontology is used as a lexical resource. The output that results is also an ontological entity with a structure that matches the organization of concepts in natural language. The resulting ontological entities are predicate-argument structures designed in accordance with the best practices of artificial intelligence and knowledge-base research.

The ontology-based parser is designed around the idea that predicate structures represent a convenient approach to searching through text. Predicate structures constitute the most compact possible representation for the relations between grammatical entities. Most of the information required to construct predicates does not need to be stored, and once the predicates have been derived from a document, the predicates may be stored as literal text strings, to be used in the same way. The system and method of ontology-based parsing of the present invention is directed towards techniques for deriving predicate structures with minimal computational effort.
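The point that derived predicates can be stored as literal text strings can be sketched in Python as follows. The `head(arg1,arg2)` string syntax, the `Predicate` class, and `build_index` are assumptions chosen for illustration; the patent text here does not specify a serialization format. Once predicates are strings, a search index reduces to an ordinary dictionary keyed on those strings.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    head: str    # the predicate, typically derived from a verb
    args: tuple  # bound arguments, in order

    def to_text(self):
        """Serialize as a literal string such as 'chase(dog,cat)'."""
        return f"{self.head}({','.join(self.args)})"

    @staticmethod
    def from_text(text):
        """Parse the string form back into a Predicate."""
        match = re.fullmatch(r"(\w+)\(([^()]*)\)", text.strip())
        if match is None:
            raise ValueError(f"not a predicate string: {text!r}")
        args = tuple(a.strip() for a in match.group(2).split(",") if a.strip())
        return Predicate(match.group(1), args)


def build_index(documents):
    """Map each stored predicate string to the set of documents containing it."""
    index = {}
    for doc_id, predicates in documents.items():
        for predicate in predicates:
            index.setdefault(predicate.to_text(), set()).add(doc_id)
    return index


if __name__ == "__main__":
    chase = Predicate("chase", ("dog", "cat"))
    index = build_index({"doc1": [chase], "doc2": [Predicate("sleep", ("cat",))]})
    query = Predicate.from_text("chase(dog, cat)")
    print(index.get(query.to_text(), set()))  # {'doc1'}
```

Normalizing the query through `from_text` and `to_text` makes "chase(dog, cat)" and "chase(dog,cat)" hit the same key, so lookup is a plain string comparison.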

In addition, the ontology-based parser is designed to permit the use of arithmetic operations instead of string operations in text-processing programs that employ the ontology-based parser. The output predicate structures contain numeric tags that represent the location of each concept within the ontology. The tags are defined in terms of an absolute coordinate system that allows conceptual similarity to be calculated from the distance between concepts within a tree structure. Any application that relies on the output of the ontology-based parser being an ontological entity may realize large speed benefits from the parameterized ontology that the parser uses, because comparisons between concepts reduce to arithmetic on these numeric tags rather than string matching.
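Claims 27 and 41 describe tags made of an octet of hexadecimal pairs, giving eight ontological levels with a branching factor of 256, and claim 40 uses the most significant digit difference between two nodes as a measure of ontological proximity. A minimal Python sketch of that idea packs a root-to-node path into a 64-bit integer, one byte per level, and computes tree distance with bitwise operations instead of string comparison. Two details are assumptions, not taken from the patent: a zero byte marks an unused level (so real branches are numbered 1 to 255 in this sketch, one fewer than 256), and distance is counted as edges through the deepest shared ancestor.

```python
LEVELS = 8          # eight ontological levels, one byte (hex pair) each
BITS_PER_LEVEL = 8  # one byte per level


def encode(path):
    """Pack a root-to-node path of branch numbers into a 64-bit integer.
    Level 1 occupies the most significant byte; unused levels stay 0."""
    if len(path) > LEVELS:
        raise ValueError("path deeper than eight levels")
    if any(not 1 <= branch <= 0xFF for branch in path):
        raise ValueError("branch numbers must be in 1..255 (0 marks an unused level)")
    code = 0
    for branch in list(path) + [0] * (LEVELS - len(path)):
        code = (code << BITS_PER_LEVEL) | branch
    return code


def level_digit(code, level):
    """Return the byte for a 0-based level (0 = most significant)."""
    return (code >> ((LEVELS - 1 - level) * BITS_PER_LEVEL)) & 0xFF


def depth(code):
    """Number of used levels, i.e. the node's depth below the root."""
    return sum(1 for level in range(LEVELS) if level_digit(code, level) != 0)


def shared_levels(a, b):
    """Levels two codes share before their most significant differing byte."""
    diff = a ^ b
    if diff == 0:
        return depth(a)
    leading_zero_bits = LEVELS * BITS_PER_LEVEL - diff.bit_length()
    return leading_zero_bits // BITS_PER_LEVEL


def tree_distance(a, b):
    """Edges between two nodes through their deepest shared ancestor."""
    common = shared_levels(a, b)
    return (depth(a) - common) + (depth(b) - common)


if __name__ == "__main__":
    dog, cat = encode([1, 2, 3]), encode([1, 2, 4])
    print(f"{dog:016x}", shared_levels(dog, cat), tree_distance(dog, cat))
    # prints: 0102030000000000 2 2
```

Two siblings such as `encode([1, 2, 3])` and `encode([1, 2, 4])` share two levels and are two edges apart; a node and its parent are one edge apart. The position of the first differing byte falls out of a single XOR and `int.bit_length`, which is the kind of arithmetic shortcut the paragraph above describes.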

2. Background of the Invention

Numerous techniques have been developed to process natural language input. These techniques tend to be complicated and cumbersome. They often require multiple passes through the input sentence(s) to parse the input fully, which increases parsing time. Their feature-checking capabilities are often weak; in particular, they do not check for both syntactic and semantic compatibility. They also often spend significant time trying to parse words that could have been pruned or filtered out according to the information those words carry.

The previous techniques of natural language processing are often limited to the performance of a particular purpose and cannot be used for other purposes. Conventional parsing techniques may be designed to function as part of a grammar checking system, but cannot function as part of a search engine, summarization application, or categorization application.

Furthermore, conventional parsing techniques do not take full advantage of an ontology as a lexical resource. This limits the versatility of the techniques.

U.S. Pat. No. 4,864,502 to Kucera et al. discloses a device that tags and parses natural-language sentences, and provides interactive facilities for grammar correction by an end user. The system taught by Kucera et al. has a complicated analysis, and cannot afford semantic status to each word relative to all the other words within the dictionary. The Kucera et al. system uses three parsing stages, each of which needs more than one pass through the sentence to complete its analysis.

U.S. Pat. No. 4,887,212 to Zamora et al. discloses a parser for syntactic analysis of text using a fast and compact technique. After part-of-speech tagging and disambiguation, syntactic analysis occurs in four steps. The grammar of Zamora et al. operates by making multiple passes to guess at noun phrases and verb phrases and then attempts to reconcile the results. Furthermore, the grammar violation checking technique of the Zamora et al. system checks only for syntactic correctness.

U.S. Pat. No. 4,914,590 to Loatman et al. discloses a natural language understanding system. The goal of the Loatman et al. system is to provide a formal representation of the context of a sentence, not merely the sentence itself. Case frames used in Loatman et al. require substantial hard-coded information to be programmed about each word, and a large number of case frames must be provided to obtain reasonable coverage.

Tokuume et al., U.S. Pat. No. 5,101,349, discloses a natural language processing system that makes provisions for validating grammar from the standpoint of syntactic well-formedness, but does not provide facilities for validating the semantic well-formedness of feature structures.

U.S. Pat. No. 5,146,406 to Jensen discloses a technique for identifying predicate-argument relationships in natural language text. The Jensen system must create intermediate feature structures to store semantic roles, which are then used to fill in predicates whose deep structures have missing arguments. Post-parsing analysis is needed, and the parsing time is impacted by the maintenance of these variables. Additionally, semantic feature compatibility checking is not possible with Jensen's system.

U.S. Pat. No. 5,721,938 to Stuckey discloses a parsing technique that organizes natural language into symbolic complexes, treating all words as either nouns or verbs. The Stuckey system is oriented towards grammar-checker-style applications; its parser is suitable only for grammar checking and does not produce output usable by a wide range of natural-language processing applications.

U.S. Pat. No. 5,960,384 to Brash discloses a parsing method and apparatus for symbolic expressions of thought such as English-language sentences. The parser of the Brash system assumes a strict compositional semantics, where a sentence's interpretation is the sum of the lexical meanings of nearby constituents. The Brash system cannot accommodate predicates with different numbers of arguments, and makes an arbitrary assumption that all relationships are transitive. The Brash system makes no provisions for the possibility that immediate relationships are not in fact the correct expression of sentence-level concepts, because it assumes that syntactic constituency is always defined by immediate relationships. The Brash system does not incorporate ontologies as the basis for its lexical resource, and therefore does not permit the output of the parser to be easily modified by other applications. Furthermore, the Brash system requires target languages to have a natural word order that already largely corresponds to the style of its syntactic analysis. Languages such as Japanese or Russian, which permit free ordering of words, but mark intended usage by morphological changes, would be difficult to parse using the Brash system.

The patent to Hemphill et al. (U.S. Pat. No. 4,984,178) discloses a chart parser designed to implement a probabilistic version of a unification-based grammar. The decision-making process occurs at intermediate parsing stages, and parse probabilities are considered before all parse paths have been pursued. Intermediate parse probability calculations have to be stored, and the system has to check for intermediate feature clashes.

U.S. Pat. No. 5,386,406 to Hedin et al. discloses a system for converting natural-language expressions into a language-independent conceptual schema. The output of the Hedin et al. system is not suitable for use in a wide variety of applications (e.g. machine translation, document summarization, categorization). The Hedin et al. system depends on the application in which it is used.

SUMMARY OF THE INVENTION

The foregoing and other deficiencies are addressed by the present invention, which is directed to an ontology-based parser for natural language processing. More particularly, the present invention relates to a system that provides a simple knowledge-base-style representation format for the manipulation of natural-language documents. The system utilizes unstructured text as input and produces a set of data structures representing the conceptual content of the document as output. The data is transformed using a syntactic parser and an ontology. The ontology is used as a lexical resource. The resulting output is also an ontological entity, with a structure that matches the organization of concepts in natural language. The resulting ontological entities are predicate-argument structures designed in accordance with the best practices of artificial intelligence and knowledge-base research.

The design of the ontology-based parser is based on the premise that predicate structures represent a convenient approach to searching through text. Predicate structures constitute the most compact possible representation for the relations between grammatical entities. Most of the information required to construct predicates does not need to be stored, and once the predicates have been derived from a document, the predicates may be stored as literal text strings, to be used in the same way. The ontology-based parser of the present invention is directed towards techniques for deriving predicate structures with minimal computational effort.

In addition, the ontology-based parser is designed to permit the use of arithmetic operations instead of string operations in text-processing programs that employ the ontology-based parser. The output predicate structures contain numeric tags that represent the location of each concept within the ontology. The tags are defined in terms of an absolute coordinate system that allows calculation of conceptual similarity according to the distance within a tree structure. Any application that exploits the fact that the output of the ontology-based parser is an ontological entity may realize enormous speed benefits from the parameterized ontology that the parser utilizes.
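To illustrate how numeric tags can replace string comparison, the following minimal sketch assumes a hypothetical tag format: each concept's tag is the path of child positions from the ontology root. Under that assumption, conceptual distance is the number of tree edges between two concepts via their lowest common ancestor, computed with integer comparisons only. The `similarity` scoring function is one illustrative choice, not the scheme defined by the invention.

```python
from typing import Tuple

# Hypothetical tag format: the path of child positions from the ontology root.
# (1, 2, 1, 3) is reached by taking child 1, then child 2, then child 1, then child 3.
Tag = Tuple[int, ...]


def common_prefix_len(a: Tag, b: Tag) -> int:
    """Depth of the lowest common ancestor of two tagged concepts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def tree_distance(a: Tag, b: Tag) -> int:
    """Number of edges between two concepts, passing through their lowest common ancestor."""
    k = common_prefix_len(a, b)
    return (len(a) - k) + (len(b) - k)


def similarity(a: Tag, b: Tag) -> float:
    """One illustrative way to turn tree distance into a score in (0, 1]."""
    return 1.0 / (1.0 + tree_distance(a, b))


dog, cat, car = (1, 2, 1, 3), (1, 2, 1, 5), (2, 4, 1)
print(tree_distance(dog, cat))  # 2
print(tree_distance(dog, car))  # 7
```

Because the tags are absolute, two concepts can be compared without consulting the ontology at all; only the tags themselves are needed.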

The present system imposes a logical structure on text, and a semantic representation is the form used for storage. The present system further provides logical representations for all content in documents. The advantages of the present system are the provision of a semantic representation of comparable utility with significantly reduced processing requirements, and no need to train the system to produce semantic representations of text content.

The system and method for ontological parsing of natural language according to the present invention has a far simpler analysis process than conventional parsing techniques, and utilizes a dictionary containing tags with syntactic information. The preferred implementation of the present system and method affords semantic status to each word relative to all the other words within the dictionary, and uses a single-pass context-free grammar to provide complete predicate structures containing subject and object relationships. The system and method of the present invention also provides a robust feature-checking system that accounts for semantic compatibility as well as syntactic compatibility.

The ontology of the present invention converts all inflected words to their canonical forms. Additionally, the system and method can filter lexical items according to their information content. For example, in an information retrieval application, it is capable of pulling out stopwords and unintended query words (as in the pseudo-concept and pseudo-predicate filters). In one embodiment, the grammar of the system and method of the present invention operates in a single pass to produce predicate structure analyses, and groups noun phrases and verb phrases as they occur, not by making multiple passes to guess at them and then attempting to reconcile the results. In the embodiment discussed above, the grammar violation checking of the system and method of the present invention filters both by the probability of a syntactically successful parse and the compatibility of the lexical semantics of words in the ontology. The compatibility referred to here is the self-consistent compatibility of words within the ontology; no particular requirement is imposed to force the ontology to be consistent with anything outside the present system.

In the predicate representation scheme of the present invention, there are only a few distinct frames for predicate structures, as many as needed to cover the different numbers of arguments taken by different verbs. Predicates may be enhanced with selectional restriction information, which can be coded automatically for entire semantic classes of words, rather than on an individual basis, because of the ontological scheme.
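A minimal sketch of how a selectional restriction can be stated once for a whole semantic class. It assumes the same hypothetical path-from-root tags as above, so "belongs to class C" reduces to "the tag starts with C's tag". The `PredicateFrame` type, the `ANIMATE` class node, and the example tags are all illustrative assumptions; the frame's arity is simply the number of argument slots.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Tag = Tuple[int, ...]  # hypothetical path-from-root ontology tag


def is_under(tag: Tag, cls: Tag) -> bool:
    """True if the concept tagged `tag` lies in the subtree rooted at class `cls`."""
    return tag[:len(cls)] == cls


@dataclass(frozen=True)
class PredicateFrame:
    predicate: str
    # One entry per argument slot: the ontology class the argument must fall under,
    # or None when the slot is unrestricted. The tuple length is the frame's arity.
    restrictions: Tuple[Optional[Tag], ...]

    def accepts(self, args: Sequence[Tag]) -> bool:
        """Check arity, then check every restricted slot against its class."""
        if len(args) != len(self.restrictions):
            return False
        return all(r is None or is_under(a, r) for a, r in zip(args, self.restrictions))


ANIMATE = (1, 2)  # hypothetical class node; every concept below it inherits the restriction
eat = PredicateFrame("eat", (ANIMATE, None))
dog, apple, rock = (1, 2, 1, 3), (3, 1, 4), (4, 2)
print(eat.accepts([dog, apple]))   # True
print(eat.accepts([rock, apple]))  # False
```

No individual word needs its own restriction entry: adding a new animal under `ANIMATE` makes it an acceptable subject of `eat` automatically.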

The manner in which the present invention constructs parse trees, from which predicate structures and their arguments can be read directly, uses context-free grammars, which result in faster execution. The system of the present invention maintains arguments as variables during the parsing process, and automatically fills in long-distance dependencies as part of the parsing process. No post-parsing analysis is needed to obtain this benefit, and the parsing time is not impacted by the maintenance of these variables, thus resulting in faster parsing execution. Additionally, the ontologies used permit semantic feature compatibility checking.

The system and method of the present invention isolates predicate-argument relationships into a consistent format regardless of text types. The predicate-argument relationships can be used in search, grammar-checking, summarization, and categorization applications, among others.

The system and method of the present invention can accommodate predicates with different numbers of arguments, and does not make arbitrary assumptions about predicate transitivity or intransitivity. Instead, the system and method of the present invention incorporates a sophisticated syntactic analysis component, which allows facts about parts of speech to determine the correct syntactic analysis. Additionally, by incorporating ontologies as the basis for the lexical resource, the present invention permits the output of the parser to be easily modified by other applications. For example, a search engine incorporating our parser can easily substitute words corresponding to different levels of abstraction into the arguments of a predicate, thus broadening the search. As long as grammatical roles can be identified, the present system and method can be easily adapted to any language. For example, certain case-marked languages, such as Japanese or German, can be parsed through a grammar which simply records the grammatical relationships encoded by particular markers, and the resulting output is still compatible with the parsing results achieved for other languages.
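The search-broadening example above can be sketched as follows. This assumes a hypothetical toy ontology stored as a child-to-parent map and a predicate structure written as a tuple `(predicate, arg1, arg2, ...)`; `broaden` generates variants in which one argument at a time is replaced by a more abstract concept, up to `levels` steps up the hierarchy.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ontology: child concept -> parent concept (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "entity": None, "animal": "entity", "dog": "animal", "cat": "animal",
    "poodle": "dog", "action": None, "chase": "action",
}


def ancestors(concept: str) -> List[str]:
    """Concepts above `concept` in the hierarchy, nearest first."""
    out = []
    parent = PARENT.get(concept)
    while parent is not None:
        out.append(parent)
        parent = PARENT.get(parent)
    return out


def broaden(structure: Tuple[str, ...], levels: int = 1) -> List[Tuple[str, ...]]:
    """Variants of a predicate structure with one argument generalized by up to `levels` steps."""
    predicate, args = structure[0], list(structure[1:])
    variants = []
    for i, arg in enumerate(args):
        for hyper in ancestors(arg)[:levels]:
            new_args = args[:i] + [hyper] + args[i + 1:]
            variants.append((predicate, *new_args))
    return variants


print(broaden(("chase", "poodle", "cat")))
# [('chase', 'dog', 'cat'), ('chase', 'poodle', 'animal')]
```

Because the substitution happens on the parser's output rather than on raw text, the search engine never has to re-parse the query to broaden it.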

From the foregoing, it is an object of the present invention to provide a system and method for parsing natural language input that provides a simple knowledge-base-style representation format for the manipulation of natural-language documents.

Another object of the present invention is to provide a system and method for parsing natural language input that utilizes unstructured text as an input and produces a set of data structures representing the conceptual content of the document as output, where the output is an ontological entity with a structure that matches the organization of concepts in natural language.

Still another object of the present invention is to provide a system and method for parsing natural language input that transforms data using a syntactic parser and ontology, where the ontology is used as a lexical resource.

Yet another object of the present invention is to provide a system and method for parsing natural language input that provides ontological entities as output that are predicate-argument structures.

Another object of the present invention is to provide a system and method for parsing natural language input that derives predicate structures with minimal computational effort.

Still another object of the present invention is to provide a system and method for parsing natural language input that permits the use of arithmetic operations in text-processing programs, where the output predicate structures contain numeric tags that represent the location of each concept within the ontology, and the tags are defined in terms of an absolute coordinate system that allows calculation of conceptual similarity according to the distance within a tree structure.

Another object of the present invention is to provide a system and method for parsing natural language input that realizes enormous speed benefits from the parameterized ontology that the parser utilizes.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other attributes of the present invention will be described with respect to the following drawings in which:

FIG. 1 is a block diagram of the sentence lexer according to the present invention;

FIG. 2 is a block diagram of the parser according to the present invention;

FIG. 3 is a diagram showing two complete parse trees produced according to the present invention;

FIG. 4 is an example parse tree according to the present invention;

FIG. 5 is another example parse tree according to the present invention;

FIG. 6 is another example parse tree according to the present invention; and

FIG. 7 is another example parse tree incorporating real words according to the present invention.

DETAILED DESCRIPTION OF THE INVENTION

In the following detailed discussion of the present invention, numerous terms specific to the subject matter of a system and method for concept-based searching are used. In order to provide a complete understanding of the present invention, the meanings of these terms are set forth below:

The term concept as used herein means an abstract formal representation of meaning, which corresponds to multiple generic or specific words in multiple languages. Concepts may represent the meanings of individual words or phrases, or the meanings of entire sentences. The term predicate means a concept that defines an n-ary relationship between other concepts. A predicate structure is a data type that includes a predicate and multiple additional concepts; as a grouping of concepts, it is itself a concept. An ontology is a hierarchically organized complex data structure that provides a context for the lexical meaning of concepts. An ontology may contain both individual concepts and predicates.
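The definitions above map naturally onto a few data types. In this minimal sketch, which assumes hypothetical field names (`parent`, `is_predicate`) and a toy ontology, the key property is that a `PredicateStructure` may appear as an argument of another predicate structure, since a grouping of concepts is itself a concept.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Concept:
    name: str
    parent: Optional[str] = None  # hypothetical: name of the parent node in the ontology
    is_predicate: bool = False    # predicates are concepts that relate other concepts


@dataclass(frozen=True)
class PredicateStructure:
    predicate: Concept
    arguments: Tuple[Union[Concept, PredicateStructure], ...]

    @property
    def arity(self) -> int:
        return len(self.arguments)


# A toy ontology: a hierarchy holding both individual concepts and predicates.
Ontology = Dict[str, Concept]
onto: Ontology = {
    "entity": Concept("entity"),
    "person": Concept("person", parent="entity"),
    "john": Concept("john", parent="person"),
    "mary": Concept("mary", parent="person"),
    "believe": Concept("believe", is_predicate=True),
    "leave": Concept("leave", is_predicate=True),
}

# "John believes Mary left": one predicate structure nested as an argument of another.
inner = PredicateStructure(onto["leave"], (onto["mary"],))
outer = PredicateStructure(onto["believe"], (onto["john"], inner))
print(outer.arity, inner.arity)  # 2 1
```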

The ontology-based parser incorporates both a system and method for converting natural-language text into predicate-argument format that can be easily used by a variety of applications, including search engines, summarization applications, categorization applications, and word processors. The ontology-based parser contains functional components for receiving documents in a plurality of formats, tokenizing them into instances of concepts from an ontology, and assembling the resulting concepts into predicate structures.

The ontological parser is designed to be modular, so that improvements and language-specific changes can be made to individual components without reengineering the other components. The components are discussed in detail below.

The ontological parser has two major functional elements, a sentence lexer and a parser. The sentence lexer takes a sentence and converts it into a sequence of ontological entities that are tagged with part-of-speech information. The parser converts the sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the sentence, and then applies rules to it that bind arguments into predicates.
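The two-stage shape of the parser (analyze grammatical structure, then bind arguments into predicates) can be illustrated with a deliberately tiny sketch. It is not the invention's context-free grammar: it assumes a hypothetical tag set (`N`, `V`, `P`), groups adjacent nouns in one left-to-right pass, and then lets each verb or preposition take the nearest noun phrase on each side as its arguments. Real long-distance dependencies are outside what this toy handles.

```python
from typing import List, Tuple

Token = Tuple[str, str]           # (word, part of speech); hypothetical tags "N", "V", "P"
Phrase = Tuple[str, List[str]]    # (part of speech, words in the phrase)


def chunk(tokens: List[Token]) -> List[Phrase]:
    """Stage 1 (toy): one left-to-right pass that groups adjacent nouns as they occur."""
    phrases: List[Phrase] = []
    for word, pos in tokens:
        if pos == "N" and phrases and phrases[-1][0] == "N":
            phrases[-1][1].append(word)  # extend the current noun phrase
        else:
            phrases.append((pos, [word]))
    return phrases


def bind(phrases: List[Phrase]) -> List[Tuple[str, ...]]:
    """Stage 2 (toy): each verb or preposition takes the nearest noun phrase on each side."""
    structures = []
    for i, (pos, words) in enumerate(phrases):
        if pos not in ("V", "P"):
            continue
        left = next((" ".join(w) for p, w in reversed(phrases[:i]) if p == "N"), None)
        right = next((" ".join(w) for p, w in phrases[i + 1:] if p == "N"), None)
        args = tuple(a for a in (left, right) if a is not None)
        structures.append((" ".join(words), *args))
    return structures


tokens = [("dog", "N"), ("chased", "V"), ("cat", "N"), ("in", "P"), ("garden", "N")]
print(bind(chunk(tokens)))
# [('chased', 'dog', 'cat'), ('in', 'cat', 'garden')]
```

Note that the number of arguments falls out of the input: an intransitive sentence such as "dog slept" yields a one-argument structure with no special casing.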

Ontological parsing is a grammatical analysis technique built on the proposition that the most useful information that can be extracted from a sentence is the set of concepts within it, as well as their formal relations to each other. Ontological parsing derives its power from the use of ontologies to situate words within the context of their meaning, and from the fact that it does not need to find the correct purely syntactic analysis of the structure of a sentence in order to produce the correct analysis of the meaning of a sentence.

An ontological parser is a tool that transforms natural-language sentences into predicate structures. Predicate structures are representations of logical relationships between the words in a sentence. Every predicate structure contains a predicate, which is either a verb or a preposition, and a set of arguments, which may be any part of speech. Predicates are words which not only have intrinsic meaning of their own, but which also provide logical relations between other concepts in a sentence. Those other concepts are the arguments of the predicate, and are generally nouns, because predicate relationships are usually between entities.

As stated previously, the ontological parser has two major components, a sentence lexer 100 and a parser 200. The sentence lexer 100 is a tool for transforming text strings into ontological entities. The parser is a tool for analyzing syntactic relationships between entities.

Referring to FIG. 1, the sentence lexer 100 is shown. Document iterator 120 receives documents or text input 110, and outputs individual sentences to the lexer 130. As the lexer 130 receives each sentence, it passes each individual word to the ontology 140. If the word exists within the ontology 140, it is returned as an ontological entity; if not, it is returned as a word tagged with default assumptions about its ontological status. In one embodiment, unknown words are automatically assumed to be nouns; however, other default parts of speech may be used.
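The flow of FIG. 1 can be sketched as follows, assuming a hypothetical toy ontology that maps a word to its part of speech. The document iterator here naively splits on periods, and any word missing from the ontology is returned with the default part of speech (a noun, per the embodiment described above) and flagged as unknown. Conversion of inflected forms to canonical forms is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Lexeme:
    text: str
    pos: str
    in_ontology: bool  # False when the part of speech is only a default assumption


# Hypothetical toy ontology: word -> part of speech.
ONTOLOGY: Dict[str, str] = {"the": "DET", "dog": "N", "chased": "V", "cat": "N"}


def sentences(document: str) -> Iterator[str]:
    """Document iterator (toy): split on periods and drop empty fragments."""
    for s in document.split("."):
        if s.strip():
            yield s.strip()


def lex(sentence: str, ontology: Dict[str, str], default_pos: str = "N") -> List[Lexeme]:
    """Look each word up in the ontology; unknown words get the default part of speech."""
    out = []
    for word in sentence.split():
        key = word.lower()
        if key in ontology:
            out.append(Lexeme(word, ontology[key], True))
        else:
            out.append(Lexeme(word, default_pos, False))
    return out


for s in sentences("The dog chased Rex."):
    print([(lx.text, lx.pos, lx.in_ontology) for lx in lex(s, ONTOLOGY)])
```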

After the lexer 130 has checked the last word in a sentence against the contents of the ontology 140, the unparsed sentence is passed to a series of lexer filters 150. Lexer filters 150 are modular plug-ins, which modify sentences based on knowledge about word meanings. The preferred embodiment contains several filters 150, although more may be developed, and existing filters may be removed from future versions, without altering the scope of the invention. For example, in an information retrieval application, an ontological parser may employ the following filters: proper noun filter, adjective filter, adverb filter, modal verb filter, and stop word filter. Similarly, for information retrieval purposes, an embodiment of the ontological parser optimized for queries may make use of all these filters, but add a pseudo-predicate filter and a pseudo-concept filter.
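Because the filters are modular plug-ins applied in sequence, one natural structure is a list of functions, each returning either a modified sentence or `None` to veto it. This sketch assumes lexemes are simplified to `(text, part_of_speech)` tuples; `drop_pos` and `veto_if_empty` are hypothetical example filters. A query-oriented configuration would simply append further filters to the list.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Lexeme = Tuple[str, str]  # (text, part of speech), simplified for this sketch
Filter = Callable[[List[Lexeme]], Optional[List[Lexeme]]]


def run_filters(sentence: List[Lexeme], filters: Sequence[Filter]) -> Optional[List[Lexeme]]:
    """Apply filters in order; stop as soon as one vetoes the sentence by returning None."""
    for f in filters:
        result = f(sentence)
        if result is None:
            return None
        sentence = result
    return sentence


def drop_pos(tag: str) -> Filter:
    """Build a filter that removes every lexeme with part of speech `tag`."""
    return lambda s: [lx for lx in s if lx[1] != tag]


def veto_if_empty(sentence: List[Lexeme]) -> Optional[List[Lexeme]]:
    """Veto sentences that earlier filters have emptied."""
    return sentence if sentence else None


print(run_filters([("dog", "N"), ("quickly", "ADV"), ("ran", "V")], [drop_pos("ADV")]))
# [('dog', 'N'), ('ran', 'V')]
```

Adding, removing, or reordering filters changes only the list passed to `run_filters`, which is what lets individual filters evolve without reengineering the lexer.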

The stop word filter removes stop words from sentences. Stop words are words that serve only as placeholders in English-language sentences. The stop word filter will contain a set of words accepted as stop words; any lexeme whose text is in that set is considered to be a stop word.
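A minimal stop word filter following the set-membership rule above. The three-word stop list is a hypothetical placeholder, and the case-insensitive match is an assumption so that a sentence-initial "The" is caught.

```python
from typing import FrozenSet, List, Tuple

Lexeme = Tuple[str, str]  # (text, part of speech), simplified for this sketch

STOP_WORDS: FrozenSet[str] = frozenset({"a", "an", "the"})  # hypothetical placeholder list


def stop_word_filter(sentence: List[Lexeme], stop_words: FrozenSet[str] = STOP_WORDS) -> List[Lexeme]:
    """Drop every lexeme whose text (compared case-insensitively) is in the stop word set."""
    return [lx for lx in sentence if lx[0].lower() not in stop_words]


print(stop_word_filter([("The", "DET"), ("dog", "N"), ("barked", "V")]))
# [('dog', 'N'), ('barked', 'V')]
```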

An adjective filter serves to remove lexemes representing adjective concepts from sentences. The adjective filter checks each adjective for a noun following the adjective. The noun must either follow immediately after the adjective, or have only adjective and conjunction words appearing between the noun and the adjective. If no such noun is found, the adjective filter will veto the sentence. The noun must also meet the selectional restrictions required by the adjective; if not, the adjective filter will veto the sentence. If a noun is found and it satisfies the restrictions of the adjective, the adjective filter will apply the selectional features of the adjective to the noun by adding all of the adjective's selectional features to the noun's set of selectional features.
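The adjective filter's rules can be sketched as follows. The `requires` field, holding the features an adjective demands of its noun, is a hypothetical representation of selectional restrictions, as is treating "meets the restrictions" as a subset test. The filter works on copies so that a vetoed sentence leaves the caller's lexemes unchanged.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional, Set


@dataclass
class Lexeme:
    text: str
    pos: str                                          # e.g. "ADJ", "N", "CONJ"
    features: Set[str] = field(default_factory=set)   # selectional features
    requires: Set[str] = field(default_factory=set)   # hypothetical: features an ADJ demands of its noun


def adjective_filter(sentence: List[Lexeme]) -> Optional[List[Lexeme]]:
    """Return the sentence with adjectives removed and their features moved onto nouns, or None to veto."""
    out = [replace(lx, features=set(lx.features)) for lx in sentence]  # copies; input is not mutated
    for i, lx in enumerate(out):
        if lx.pos != "ADJ":
            continue
        j = i + 1
        while j < len(out) and out[j].pos in ("ADJ", "CONJ"):
            j += 1  # only adjectives and conjunctions may intervene
        if j == len(out) or out[j].pos != "N":
            return None  # no noun for this adjective: veto
        noun = out[j]
        if not lx.requires <= noun.features:
            return None  # noun fails the adjective's selectional restrictions: veto
        noun.features |= lx.features  # add the adjective's features to the noun's set
    return [lx for lx in out if lx.pos != "ADJ"]


red = Lexeme("red", "ADJ", {"colored"}, {"physical"})
print(adjective_filter([red, Lexeme("ball", "N", {"physical"})]))
```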

The proper noun filter groups the words of each proper noun in a sentence into a single lexical noun, rather than allowing them to pass as a multiple-word sequence, which may be unparsable. A proper noun is any word or phrase representing a non-generic noun concept. A number of proper nouns are already present in the lexicon, and these are treated as regular lexical items. Since proper nouns behave syntactically as regular nouns, there is no need to distinguish them from nouns already in the lexicon. The purpose of the proper noun filter is to ensure that sequences not already in the lexicon are treated as single words where appropriate.
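A minimal sketch of the grouping step. The text does not say how proper nouns are detected, so this assumes a simple heuristic: a run of capitalized words that are absent from the (lowercased) lexicon is merged into one token. Capitalized words already in the lexicon, such as a sentence-initial "The" or a known place name, pass through unchanged.

```python
from typing import List, Set


def proper_noun_filter(words: List[str], lexicon: Set[str]) -> List[str]:
    """Merge runs of capitalized, out-of-lexicon words into single tokens (heuristic assumption)."""
    out: List[str] = []
    run: List[str] = []
    for w in words:
        if w[:1].isupper() and w.lower() not in lexicon:
            run.append(w)  # extend the current proper-noun sequence
            continue
        if run:
            out.append(" ".join(run))
            run = []
        out.append(w)
    if run:
        out.append(" ".join(run))
    return out


print(proper_noun_filter(["Alice", "met", "Bob", "Smith", "in", "Paris"], {"met", "in", "paris"}))
# ['Alice', 'met', 'Bob Smith', 'in', 'Paris']
```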

The modal verb filter removes modal verbs from sentence objects. Modal verbs are verbs such as "should", "could", and "would". Such verbs alter the conditions under which a sentence is true, but do not affect the basic meaning of the sentence. Since truth conditions do not need to be addressed by the ontological parser, such words can be eliminated to reduce parsing complexity. The modal verb filter contains a set of modal verbs, similar to the stop word list contained in the stop word filter. Any lexeme whose text is in that set and whose concept is a verb is identified as a modal verb and is removed.

The adverb filter removes lexemes containing adverb concepts from sentences. Adverbs detail the meaning of the verbs they accompany, but do not change it. Since the basic meaning of the sentence remains the same, adverbs can be removed to simplify parsing.
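The modal verb and adverb filters reduce to simple membership tests. In this sketch lexemes are simplified to `(text, part_of_speech)` tuples, and the modal set contains only the three examples named above; a complete list is an assumption left open. A word in the modal set is removed only when its concept is a verb, matching the rule described for the modal verb filter.

```python
from typing import FrozenSet, List, Tuple

Lexeme = Tuple[str, str]  # (text, part of speech), simplified for this sketch

MODALS: FrozenSet[str] = frozenset({"should", "could", "would"})  # examples from the text only


def modal_verb_filter(sentence: List[Lexeme]) -> List[Lexeme]:
    """Remove lexemes whose text is a listed modal AND whose concept is a verb."""
    return [lx for lx in sentence if not (lx[0].lower() in MODALS and lx[1] == "V")]


def adverb_filter(sentence: List[Lexeme]) -> List[Lexeme]:
    """Remove every lexeme carrying an adverb concept."""
    return [lx for lx in sentence if lx[1] != "ADV"]


s = [("You", "N"), ("should", "V"), ("quickly", "ADV"), ("leave", "V")]
print(adverb_filter(modal_verb_filter(s)))
# [('You', 'N'), ('leave', 'V')]
```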

In one embodiment, the pseudo-predicate filter operates as part of a query ontological parser.

