Ontology-based parser for natural language processing. Number: 7,027,974, from the United States Patent and Trademark Office (PTO)

Title: Ontology-based parser for natural language processing

Abstract: An ontology-based parser incorporates both a system and method for converting natural-language text into predicate-argument format that can be easily used by a variety of applications, including search engines, summarization applications, categorization applications, and word processors. The ontology-based parser contains functional components for receiving documents in a plurality of formats, tokenizing them into instances of concepts from an ontology, and assembling the resulting concepts into predicates. The ontological parser has two major functional elements, a sentence lexer and a parser. The sentence lexer takes a sentence and converts it into a sequence of ontological entities that are tagged with part-of-speech information. The parser converts the sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the sentence, and then applies rules to it that bind arguments into predicates.

Patent Number: 7,027,974 Issued on 04/11/2006 to Busch et al.


Inventors: Busch; Justin Eliot (Irvine, CA); Lin; Albert Deirchow (San Diego, CA); Graydon; Patrick John (Windsor, CA); Caudill; Maureen (San Diego, CA)
Assignee: Science Applications International Corporation (San Diego, CA)
Appl. No.: 697676
Filed: October 27, 2000


Current U.S. Class: 704/4; 704/9; 707/4
Current International Class: G10L 13/08 (20060101)
Field of Search: 704/4,9 707/4


References Cited

U.S. Patent Documents
4270182 May 1981 Asija
4864502 September 1989 Kucera et al.
4887212 December 1989 Zamora et al.
4914590 April 1990 Loatman et al.
4984178 January 1991 Hemphill et al.
5056021 October 1991 Ausborn
5101349 March 1992 Tokuume et al.
5146406 September 1992 Jensen
5237502 August 1993 White et al.
5297039 March 1994 Kanaegami et al.
5309359 May 1994 Katz et al.
5317507 May 1994 Gallant
5321833 June 1994 Chang et al.
5331556 July 1994 Black, Jr. et al.
5386556 January 1995 Hedin et al.
5404295 April 1995 Katz et al.
5418948 May 1995 Turtle
5446891 August 1995 Kaplan et al.
5475588 December 1995 Schabes et al.
5535382 July 1996 Ogawa
5576954 November 1996 Driscoll
5619709 April 1997 Caid et al.
5675710 October 1997 Lewis
5680627 October 1997 Anglea et al.
5687384 November 1997 Nagase
5694523 December 1997 Wical
5694592 December 1997 Driscoll
5706497 January 1998 Takahashi et al.
5721902 February 1998 Schultz
5721938 February 1998 Stuckey
5761389 June 1998 Maeda et al.
5790754 August 1998 Mozer et al.
5794050 August 1998 Dahlgren et al.
5802515 September 1998 Adar et al.
5835087 November 1998 Herz et al.
5864855 January 1999 Ruocco et al.
5870701 February 1999 Wachtel
5870740 February 1999 Rose et al.
5873056 February 1999 Liddy et al.
5893092 April 1999 Driscoll
5915249 June 1999 Spencer
5920854 July 1999 Kirsch et al.
5930746 July 1999 Ting
5933822 August 1999 Braden-Harder et al.
5940821 August 1999 Wical
5953718 September 1999 Wical
5960384 September 1999 Brash
5963940 October 1999 Liddy et al.
5974412 October 1999 Hazlehurst et al.
5974455 October 1999 Monier
6006221 December 1999 Liddy et al.
6012053 January 2000 Pant et al.
6021387 February 2000 Mozer et al.
6021409 February 2000 Burrows
6026388 February 2000 Liddy et al.
6038560 March 2000 Wical
6047277 April 2000 Parry et al.
6049799 April 2000 Mangat et al.
6055531 April 2000 Bennett et al.
6076051 June 2000 Messerly et al.
6233575 May 2001 Agrawal et al.
6778979 August 2004 Grefenstette et al.
2002/0143755 October 2002 Wynblatt et al.
Foreign Patent Documents
0413132 Feb., 1991 EP
WO 0049517 Aug., 2000 WO

Other References

Koller, D., and Sahami, M. Hierarchically classifying documents using very few words. ICML-97; Proceedings of the Fourteenth International Conference on Machine Learning, 1997. cited by other .
Susan T. Dumais, John Platt, David Heckerman, Mehran Sahami: Inductive Learning Algorithms and Representations for Text Categorization. CIKM pp. 148-155, 1998. cited by other .
Andrew McCallum, Kamal Nigam, Jason Rennie and Kristie Seymore. Building Domain-Specific Search Engines with Machine Learning Techniques. AAAI-99 Spring Symposium, 1999. cited by other .
M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A Bayesian approach to filtering junk e-mail. AAAI Workshop on Learning for Text Categorization, Jul. 1998, Madison, Wisconsin. cited by other .
Larkey, Leah S. (1999) A Patent Search and Classification System In Digital Libraries 99--The Fourth ACM Conference on Digital Libraries (Berkeley, CA, Aug. 11-14 1999) ACM Press, pp. 79-87. cited by other .
Teuvo Kohonen, Self Organization and Associative memory, 3rd Edition, Table of Contents, Springer-Verlag, New York, NY 1989. cited by other .
Dunja Mladenic, Turning Yahoo into an Automatic Web Page Classifier, ECAI 98: 13th European Conference on Artificial Intelligence, Brighton, UK, Aug. 23 to Aug. 28, 1998, pp. 473-474, John Wiley & Sons, Ltd. cited by other .
Tom M. Mitchell, "Machine Learning", WCB/McGraw-Hill 1997. cited by other .
Choon Yang Quek, "Classification of World Wide Web Documents", Senior Honors Thesis, CMU. cited by other .
Bresnan, Joan (ed.). 1982. The Mental Representation of Grammatical Relations. Cambridge, MA: MIT Press. cited by other .
Charniak, Eugene. 1993. Statistical Language Learning. Cambridge, MA: MIT Press. cited by other .
Domingos, Pedro and Michael Pazzani. 1997. "On the Optimality of the Simple Bayesian Classifier Under Zero-One Loss." Machine Learning 29:103-130. cited by other .
Duda, Richard and Peter Hart. 1973. Pattern Classification and Scene Analysis. New York, NY: Wiley. cited by other .
Gold, Mark. 1967. "Language Identification in the Limit." Information and Control 10:447-474. cited by other .
Horning, James. 1969. A Study of Grammatical Inference. Ph.D. thesis, Stanford. cited by other .
Magerman, David and Mitchell Marcus. 1991. "Pearl: A Probabilistic Chart Parser." Proceedings of the 2nd International Workshop for Parsing Technologies. cited by other .
Manning, Christopher and Hinrich Schutze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. cited by other .
McCallum, A., K. Nigam, S. Thrun, and T. Mitchell. 1998. "Learning to Classify Text from Labeled and Unlabeled Documents." Proceedings of the 1998 National Conference on Artificial Intelligence, Jul. 1998. cited by other .
McCallum, A., R. Rosenfeld., T. Mitchell and A. Ng. 1998. "Improving Text Classification by Shrinkage in a Hierarchy of Classes," Proceedings of the 1998 International Conference on Machine Learning. cited by other .
Pollard, Carl and Ivan Sag. 1994. Head-Driven Phrase Structure Grammar. Chicago, IL: University of Chicago Press. cited by other .
Dirk van Eylen, "Ranking of search results using AltaVista", http://ping4.ping.be/~ping0658/avrank.html. cited by other .
Avrim Blum, Tom Mitchell, "Combining Labeled and Unlabeled Data with Co-Training", Proceedings of the 1998 conference on Computational Learning Theory. cited by other.

Primary Examiner: Abebe; Daniel
Attorney, Agent or Firm: Banner & Witcoff, Ltd.

Parent Case Text



Applicants hereby incorporate by reference co-pending application Ser. No. 09/627,295 filed in the U.S. Patent and Trademark Office on Jul. 27, 2000, entitled "Concept-Based Search and Retrieval System."
Claims



What is claimed is:

1. A system for ontological parsing that converts natural-language text into predicate-argument format comprising: a sentence lexer for converting a natural language sentence into a sequence of ontological entities that are tagged with part-of-speech information; and a parser for converting the sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the natural language sentence and binds arguments into predicates.

2. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 1, wherein said sentence lexer comprises: a document iterator that receives text input and outputs individual sentences; a lexer that receives said individual sentences from said document iterator and outputs individual words; and an ontology that receives said individual words from said lexer and returns ontological entities or words tagged with default assumptions about an ontological status of said individual words to said lexer.

3. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 2, wherein said ontology is a parameterized ontology that assigns numbers to concepts.

4. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 3, wherein said numbers can be subtracted to determine if features are in agreement, wherein a non-negative number indicates agreement.

5. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 3, wherein said numbers can be subtracted to determine if features are in agreement, wherein a negative number indicates feature incompatibility.

6. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 3, wherein in said parameterized ontology, each data structure includes at least one integer value, where groups of digits of said integer value correspond to specific branches taken at corresponding levels in a parse tree.

7. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 6, wherein said parameterized ontology is encoded in two ways: a base of said integer value bounds a number of branches extending from a root node of said ontology, while a number of digits in the integer value bounds a potential depths of said parse tree.

8. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 6, wherein a first digit difference between two nodes provides a measure of the degree of ontological proximity of two concepts.

9. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 6, wherein said predicates and arguments are represented by encodings comprising at least one digit separated into multiple groups to provide multiple ontological levels and a branching factor at each node.

10. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 2, further comprising lexer filters for modifying said individual sentences based on word meanings.

11. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 10, wherein said lexer filters comprise at least one of: a noun filter, an adjective filter, an adverb filter, a modal verb filter, a stop word filter, a pseudo-predicate filter, and a pseudo-concept filter.

12. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said stop word filter removes stop words from said individual sentences.

13. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said adjective filter removes lexemes representing adjectives from said individual sentences.

14. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said noun filter groups proper nouns into single lexical nouns.

15. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said modal verb filter removes modal verbs from objects of said individual sentences.

16. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said adverb filter removes lexemes containing adverb concepts from said individual sentences.

17. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said pseudo-predicate filter removes verbs from queries.

18. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 11, wherein said pseudo-concept filter removes concepts from queries.

19. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 1, wherein said parser comprises: a sentence receiver that receives sentences including ontological entities from said sentence lexer; a parser component that parses said sentences, received by said sentence receiver, into parse trees representing concepts in a respective sentence received by said sentence receiver; and a parse tree converter that receives the output of said parser component and converts said parse trees into predicates.

20. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 19, wherein said parser component further comprises: parser filters operating on said predicates to remove erroneous predicates.

21. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 19, wherein said parser component looks ahead at least one word, scans input from left-to-right, and constructs said parse tree.

22. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 20, wherein said parser filters remove parse trees that violate one of statistical and ontological criteria for well-formedness.

23. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 22, wherein said parser filters include a selectional restriction filter and a parse probability filter.

24. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 23, wherein said selectional restriction filter vetoes parse trees having conflicts between selectional features of concepts serving as arguments to a second concept and restrictions of said second concept.

25. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 23, wherein said parse probability filter vetoes parse trees that fall below a minimum probability for semantic interpretation.

26. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 1, wherein said system is modular to permit the use of any part-of-speech-tagged ontology.

27. A system for ontological parsing that converts natural-language text into predicate-argument format as recited in claim 6, wherein said parse trees is represented by modified hexadecimal digits that have an octet of hexadecimal pairs to provide eight ontological levels and a branching factor at each node of 256.

28. A method of ontological parsing that converts natural-language text into predicate-argument format comprising the steps of: converting a natural language sentence into a sequence of ontological entities that are tagged with part-of-speech information; and converting said sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the natural language sentence and binds arguments into predicates.

29. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, further comprising the step of modifying said natural language sentence based on word meanings.

30. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, further comprising the steps of: receiving sentences including ontological entities; parsing said sentences including ontological entities into parse trees representing concepts in the corresponding sentence including ontological entities; and converting said parse trees into predicates.

31. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, wherein the step of parsing comprises the step of looking ahead one word, scanning input from left-to-right, and constructing said parse tree.

32. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, further comprising the step of removing parse trees that violate one of statistical and ontological criteria for well-formedness.

33. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, further comprising the step of vetoing parse trees having conflicts between selectional features of concepts serving as arguments to a second concept and restrictions of said second concept.

34. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 30, further comprising the step of assigning numbers to said concepts.

35. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 34, further comprising the step of subtracting said numbers to determine if features are in agreement, wherein a negative number indicates feature incompatibility.

36. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 34, further comprising the step of subtracting said numbers to determine if features are in agreement, wherein a non-negative number indicates agreement.

37. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, further comprising the step of vetoing parse trees that fall below a minimum probability for semantic interpretation.

38. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, wherein the step of converting the natural language sentence into the sequence of ontological entities includes data structures, wherein each data structure includes an integer value, where each digit of said integer value corresponds to a specific branch taken at a corresponding level in a parse tree.

39. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 38, further comprising the step of encoding in two ways: a number of grouped digits and their numerical base bounds a number of branches extending from a root node of an ontology, while at least one of said groups of digits in said integer value bounds a potential depth of said parse tree.

40. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 39, wherein a most significant digit difference between two nodes provides a measure of the degree of ontological proximity of two concepts.

41. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 38, further comprising the step of representing said parse trees by modified hexadecimal numbers that have an octet of hexadecimal pairs to provide eight ontological levels and a branching factor at each node of 256.

42. A method of ontological parsing that converts natural-language text into predicate-argument format as recited in claim 28, further comprising the step of representing said predicates and arguments by encodings comprising at least one digit separated into groups to provide multiple ontological levels and a branching factor at each node.
Description



BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an ontological parser for natural language processing. More particularly, the present invention relates to a system and method for ontological parsing of natural language that provides a simple knowledge-base-style representation format for the manipulation of natural-language documents. The system utilizes unstructured text as input and produces a set of data structures representing the conceptual content of the document as output. The data is transformed using a syntactic parser and ontology. The ontology is used as a lexical resource. The output that results is also an ontological entity with a structure that matches the organization of concepts in natural language. The resulting ontological entities are predicate-argument structures designed in accordance with the best practices of artificial intelligence and knowledge-base research.

The ontology-based parser is designed around the idea that predicate structures represent a convenient approach to searching through text. Predicate structures constitute the most compact possible representation for the relations between grammatical entities. Most of the information required to construct predicates does not need to be stored, and once the predicates have been derived from a document, the predicates may be stored as literal text strings, to be used in the same way. The system and method of ontology-based parsing of the present invention is directed towards techniques for deriving predicate structures with minimal computational effort.

In addition, the ontology-based parser is designed to permit the use of arithmetic operations instead of string operations in text-processing programs that employ the ontology-based parser. The output predicate structures contain numeric tags that represent the location of each concept within the ontology. The tags are defined in terms of an absolute coordinate system that allows calculation of conceptual similarity according to the distance within a tree structure. Any application that makes use of the fact that the output of the ontology-based parser is an ontological entity may realize enormous speed benefits from the parameterized ontology that the parser utilizes.
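As a rough illustration of such numeric tags, the following Python sketch packs a root-to-concept path into one integer and compares two concepts with integer arithmetic alone. It borrows the layout recited in the claims (eight levels, 256 branches per node, one hexadecimal pair per level). The function names, the reserved value 0 for "no further level", and the sample paths are assumptions made for this sketch, not details taken from the patent.

```python
LEVELS = 8          # eight ontological levels, one hexadecimal pair each
BITS_PER_LEVEL = 8  # 256 possible branches at every node


def encode(path):
    """Pack a root-to-concept branch path into a single integer tag.

    Assumption of this sketch: branch numbers run from 1 to 255 and
    0 is reserved to mean "no further level".
    """
    if len(path) > LEVELS:
        raise ValueError("path is deeper than the encoding allows")
    tag = 0
    for i in range(LEVELS):
        branch = path[i] if i < len(path) else 0
        if not 0 <= branch < 256:
            raise ValueError("branch number must fit in one hexadecimal pair")
        tag = (tag << BITS_PER_LEVEL) | branch
    return tag


def shared_depth(tag_a, tag_b):
    """Count the leading levels on which two tags agree (8 means identical)."""
    diff = tag_a ^ tag_b
    if diff == 0:
        return LEVELS
    # The highest differing bit identifies the first level that differs.
    return (LEVELS * BITS_PER_LEVEL - diff.bit_length()) // BITS_PER_LEVEL


# Hypothetical concept paths, invented for illustration only.
dog = encode([1, 2, 5])
cat = encode([1, 2, 6])
rock = encode([3, 1])

print(format(dog, "016x"))      # 0102050000000000
print(shared_depth(dog, cat))   # 2
print(shared_depth(dog, rock))  # 0
```

A larger shared depth means the two concepts sit closer together in the tree, and no string comparison is needed to find that out.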

2. Background of the Invention

Numerous techniques have been developed to process natural language input. These techniques tend to be complicated and cumbersome. Often numerous passes through the input sentence(s) are required to fully parse the input, which adds to the parsing time. The previous techniques often lack robust feature-checking capabilities. In particular, the techniques do not check for both syntactic and semantic compatibility. These techniques also often expend significant time trying to parse words that could be pruned or filtered according to their information content.

The previous techniques of natural language processing are often limited to the performance of a particular purpose and cannot be used for other purposes. Conventional parsing techniques may be designed to function as part of a grammar checking system, but cannot function as part of a search engine, summarization application, or categorization application.

Furthermore, conventional parsing techniques do not take full advantage of an ontology as a lexical resource. This limits the versatility of the techniques.

U.S. Pat. No. 4,864,502 to Kucera et al. discloses a device that tags and parses natural-language sentences, and provides interactive facilities for grammar correction by an end user. The system taught by Kucera et al. has a complicated analysis, and cannot afford semantic status to each word relative to all the other words within the dictionary. The Kucera et al. system uses three parsing stages, each of which needs more than one pass through the sentence to complete its analysis.

U.S. Pat. No. 4,887,212 to Zamora et al. discloses a parser for syntactic analysis of text using a fast and compact technique. After part-of-speech tagging and disambiguation, syntactic analysis occurs in four steps. The grammar of Zamora et al. operates by making multiple passes to guess at noun phrases and verb phrases and then attempts to reconcile the results. Furthermore, the grammar violation checking technique of the Zamora et al. system checks only for syntactic correctness.

U.S. Pat. No. 4,914,590 to Loatman et al. discloses a natural language understanding system. The goal of the Loatman et al. system is to provide a formal representation of the context of a sentence, not merely the sentence itself. Case frames used in Loatman et al. require substantial hard-coded information to be programmed about each word, and a large number of case frames must be provided to obtain reasonable coverage.

Tokuume et al., U.S. Pat. No. 5,101,349, discloses a natural language processing system that makes provisions for validating grammar from the standpoint of syntactic well-formedness, but does not provide facilities for validating the semantic well-formedness of feature structures.

U.S. Pat. No. 5,146,406 to Jensen discloses a technique for identifying predicate-argument relationships in natural language text. The Jensen system must create intermediate feature structures to store semantic roles, which are then used to fill in predicates whose deep structures have missing arguments. Post-parsing analysis is needed and the parsing time is impacted by the maintenance of these variables. Additionally, semantic feature compatibility checking is not possible with Jensen's system.

U.S. Pat. No. 5,721,938 to Stuckey discloses a parsing technique that organizes natural language into symbolic complexes, which treat all words as either nouns or verbs. The Stuckey system is oriented towards grammar-checker-style applications, and does not produce output suitable for a wide range of natural-language processing applications. The parser of the Stuckey system is only suitable for grammar-checking applications.

U.S. Pat. No. 5,960,384 to Brash discloses a parsing method and apparatus for symbolic expressions of thought such as English-language sentences. The parser of the Brash system assumes a strict compositional semantics, where a sentence's interpretation is the sum of the lexical meanings of nearby constituents. The Brash system cannot accommodate predicates with different numbers of arguments, and makes an arbitrary assumption that all relationships are transitive. The Brash system makes no provisions for the possibility that immediate relationships are not in fact the correct expression of sentence-level concepts, because it assumes that syntactic constituency is always defined by immediate relationships. The Brash system does not incorporate ontologies as the basis for its lexical resource, and therefore does not permit the output of the parser to be easily modified by other applications. Furthermore, the Brash system requires target languages to have a natural word order that already largely corresponds to the style of its syntactic analysis. Languages such as Japanese or Russian, which permit free ordering of words, but mark intended usage by morphological changes, would be difficult to parse using the Brash system.

The patent to Hemphill et al. (U.S. Pat. No. 4,984,178) discloses a chart parser designed to implement a probabilistic version of a unification-based grammar. The decision-making process occurs at intermediate parsing stages, and parse probabilities are considered before all parse paths have been pursued. Intermediate parse probability calculations have to be stored, and the system has to check for intermediate feature clashes.

U.S. Pat. No. 5,386,556 to Hedin et al. discloses a system for converting natural-language expressions into a language-independent conceptual schema. The output of the Hedin et al. system is not suitable for use in a wide variety of applications (e.g. machine translation, document summarization, categorization). The Hedin et al. system depends on the application in which it is used.

SUMMARY OF THE INVENTION

The foregoing and other deficiencies are addressed by the present invention, which is directed to an ontology-based parser for natural language processing. More particularly, the present invention relates to a system that provides a simple knowledge-base-style representation format for the manipulation of natural-language documents. The system utilizes unstructured text as input and produces a set of data structures representing the conceptual content of the document as output. The data is transformed using a syntactic parser and ontology. The ontology is used as a lexical resource. The output that results is also an ontological entity with a structure that matches the organization of concepts in natural language. The resulting ontological entities are predicate-argument structures designed in accordance with the best practices of artificial intelligence and knowledge-base research.

The design of the ontology-based parser is based on the premise that predicate structures represent a convenient approach to searching through text. Predicate structures constitute the most compact possible representation for the relations between grammatical entities. Most of the information required to construct predicates does not need to be stored, and once the predicates have been derived from a document, the predicates may be stored as literal text strings, to be used in the same way. The ontology-based parser of the present invention is directed towards techniques for deriving predicate structures with minimal computational effort.

In addition, the ontology-based parser is designed to permit the use of arithmetic operations instead of string operations in text-processing programs that employ the ontology-based parser. The output predicate structures contain numeric tags that represent the location of each concept within the ontology. The tags are defined in terms of an absolute coordinate system that allows calculation of conceptual similarity according to the distance within a tree structure. All applications making use of the fact that the output of the ontology-based parser is an ontological entity may realize enormous speed benefits from the parameterized ontology that the parser utilizes.
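The idea of computing conceptual similarity arithmetically can be illustrated with a minimal sketch. It assumes, purely for illustration, that a concept's numeric tag is the sequence of child indices on the path from the root of the ontology to that concept; the actual tag encoding is not specified in this passage, and the names `Tag`, `tree_distance`, `DOG`, `CAT`, and `CAR` are hypothetical.

```python
from typing import Tuple

# Hypothetical tag encoding: child indices along the path from the root.
Tag = Tuple[int, ...]


def tree_distance(a: Tag, b: Tag) -> int:
    """Count the edges between two concepts via their deepest shared ancestor."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    # Steps up from a to the shared ancestor, plus steps down to b.
    return (len(a) - shared) + (len(b) - shared)


# Toy ontology (an assumption): root -> animal(0) -> dog(0), cat(1)
#                               root -> vehicle(1) -> car(0)
DOG: Tag = (0, 0)
CAT: Tag = (0, 1)
CAR: Tag = (1, 0)
```

Under this encoding, comparing two concepts needs only integer comparisons and subtraction; no string matching is involved. A smaller distance is read as greater conceptual similarity.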

The present system imposes a logical structure on text, and a semantic representation is the form used for storage. The present system further provides logical representations for all content in documents. The advantages of the present system are the provision of a semantic representation of comparable utility with significantly reduced processing requirements, and no need to train the system to produce semantic representations of text content.

The system and method for ontological parsing of natural language according to the present invention has a far simpler analysis process than conventional parsing techniques, and utilizes a dictionary containing tags with syntactic information. The preferred implementation of the present system and method affords semantic status to each word relative to all the other words within the dictionary, and uses a single-pass context-free grammar to provide complete predicate structures containing subject and object relationships. The system and method of the present invention also provides a robust feature-checking system that accounts for semantic compatibility as well as syntactic compatibility.

The ontology of the present invention converts all inflected words to their canonical forms. Additionally, the system and method can filter lexical items according to their information content. For example, in an information retrieval application, it is capable of pulling out stopwords and unintended query words (as in the pseudo-concept and pseudo-predicate filters). In one embodiment, the grammar of the system and method of the present invention operates in a single pass to produce predicate structure analyses, and groups noun phrases and verb phrases as they occur, not by making multiple passes to guess at them and then attempting to reconcile the results. In the embodiment discussed above, the grammar violation checking of the system and method of the present invention filters both by the probability of a syntactically successful parse and the compatibility of the lexical semantics of words in the ontology. The compatibility referred to here is the self-consistent compatibility of words within the ontology; no particular requirement is imposed to force the ontology to be consistent with anything outside the present system.

In the predicate representation scheme of the present invention, there are only a few distinct frames for predicate structures, as many as needed to cover the different numbers of arguments taken by different verbs. Predicates may be enhanced with selectional restriction information, which can be coded automatically for entire semantic classes of words, rather than on an individual basis, because of the ontological scheme.

The manner in which the present invention constructs parse trees, from which predicate structures and their arguments can be read directly, uses context-free grammars, which result in faster execution. The system of the present invention maintains arguments as variables during the parsing process, and automatically fills in long-distance dependencies as part of the parsing process. No post-parsing analysis is needed to obtain this benefit, and the parsing time is not impacted by the maintenance of these variables, thus resulting in faster parsing execution. Additionally, the ontologies used permit semantic feature compatibility checking.

The system and method of the present invention isolates predicate-argument relationships into a consistent format regardless of text types. The predicate-argument relationships can be used in search, grammar-checking, summarization, and categorization applications, among others.

The system and method of the present invention can accommodate predicates with different numbers of arguments, and does not make arbitrary assumptions about predicate transitivity or intransitivity. Instead, the system and method of the present invention incorporates a sophisticated syntactic analysis component, which allows facts about parts-of-speech to determine the correct syntactic analysis. Additionally, by incorporating ontologies as the basis for the lexical resource, the present invention permits the output of the parser to be easily modified by other applications. For example, a search engine incorporating the parser can easily substitute words corresponding to different levels of abstraction into the arguments of a predicate, thus broadening the search. As long as grammatical roles can be identified, the present system and method can be easily adapted to any language. For example, certain case-marked languages, such as Japanese or German, can be parsed through a grammar which simply records the grammatical relationships encoded by particular markers, and the resulting output is still compatible with the parsing results achieved for other languages.
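The search-broadening example can be sketched as follows. This is an illustration only: the child-to-parent table `PARENT`, the tuple layout of `Predicate`, and the function `broaden` are hypothetical and are not taken from the specification.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (predicate word, argument concepts).
Predicate = Tuple[str, Tuple[str, ...]]

# Hypothetical child -> parent links from a toy ontology.
PARENT: Dict[str, str] = {"poodle": "dog", "dog": "animal", "bone": "food"}


def broaden(pred: Predicate, levels: int = 1) -> List[Predicate]:
    """Return the predicate plus variants with one argument replaced by an ancestor."""
    name, args = pred
    variants = [pred]
    for i, arg in enumerate(args):
        current = arg
        for _ in range(levels):
            if current not in PARENT:
                break  # reached the top of the toy ontology for this argument
            current = PARENT[current]
            variants.append((name, args[:i] + (current,) + args[i + 1:]))
    return variants
```

A search application could match a query against every variant in the returned list, so that a query about a poodle also retrieves text about dogs or animals in the same predicate role.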

From the foregoing, it is an object of the present invention to provide a system and method for parsing natural language input that provides a simple knowledge-base-style representation format for the manipulation of natural-language documents.

Another object of the present invention is to provide a system and method for parsing natural language input that utilizes unstructured text as an input and produces a set of data structures representing the conceptual content of the document as output, where the output is an ontological entity with a structure that matches the organization of concepts in natural language.

Still another object of the present invention is to provide a system and method for parsing natural language input that transforms data using a syntactic parser and ontology, where the ontology is used as a lexical resource.

Yet another object of the present invention is to provide a system and method for parsing natural language input that provides ontological entities as output that are predicate-argument structures.

Another object of the present invention is to provide a system and method for parsing natural language input that derives predicate structures with minimal computational effort.

Still another object of the present invention is to provide a system and method for parsing natural language input that permits the use of arithmetic operations in text-processing programs, where the output predicate structures contain numeric tags that represent the location of each concept within the ontology, and the tags are defined in terms of an absolute coordinate system that allows calculation of conceptual similarity according to the distance within a tree structure.

Another object of the present invention is to provide a system and method for parsing natural language input that realizes enormous speed benefits from the parameterized ontology that the parser utilizes.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other attributes of the present invention will be described with respect to the following drawings in which:

FIG. 1 is a block diagram of the sentence lexer according to the present invention;

FIG. 2 is a block diagram of the parser according to the present invention;

FIG. 3 is a diagram showing two complete parse trees produced according to the present invention;

FIG. 4 is an example parse tree according to the present invention;

FIG. 5 is another example parse tree according to the present invention;

FIG. 6 is another example parse tree according to the present invention; and

FIG. 7 is another example parse tree incorporating real words according to the present invention.

DETAILED DESCRIPTION OF THE INVENTION

In the following detailed discussion of the present invention, numerous terms, specific to the subject matter of a system and method for concept-based searching, are used. In order to provide complete understanding of the present invention, the meaning of these terms is set forth below as follows:

The term concept as used herein means an abstract formal representation of meaning, which corresponds to multiple generic or specific words in multiple languages. Concepts may represent the meanings of individual words or phrases, or the meanings of entire sentences. The term predicate means a concept that defines an n-ary relationship between other concepts. A predicate structure is a data type that includes a predicate and multiple additional concepts; as a grouping of concepts, it is itself a concept. An ontology is a hierarchically organized complex data structure that provides a context for the lexical meaning of concepts. An ontology may contain both individual concepts and predicates.
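The relationships among these terms can be sketched as data types. The class and field names below are assumptions chosen for illustration; the specification defines the terms but not a concrete layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Concept:
    """An abstract representation of meaning (here reduced to a label)."""
    name: str


@dataclass(frozen=True)
class Predicate(Concept):
    """A concept that defines an n-ary relationship between other concepts."""
    arity: int = 2


@dataclass(frozen=True)
class PredicateStructure(Concept):
    """A predicate plus its argument concepts; itself a concept."""
    predicate: Predicate
    arguments: Tuple[Concept, ...] = ()


@dataclass
class Ontology:
    """Hierarchy giving each concept a place relative to the others."""
    parents: Dict[Concept, Optional[Concept]] = field(default_factory=dict)

    def add(self, concept: Concept, parent: Optional[Concept] = None) -> None:
        self.parents[concept] = parent

    def ancestors(self, concept: Concept) -> List[Concept]:
        chain: List[Concept] = []
        node = self.parents.get(concept)
        while node is not None:
            chain.append(node)
            node = self.parents.get(node)
        return chain
```

Making `PredicateStructure` a subclass of `Concept` mirrors the statement that a grouping of concepts is itself a concept, so a predicate structure can serve as an argument of another predicate.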

The ontology-based parser incorporates both a system and method for converting natural-language text into predicate-argument format that can be easily used by a variety of applications, including search engines, summarization applications, categorization applications, and word processors. The ontology-based parser contains functional components for receiving documents in a plurality of formats, tokenizing them into instances of concepts from an ontology, and assembling the resulting concepts into predicate structures.

The ontological parser is designed to be modular, so that improvements and language-specific changes can be made to individual components without reengineering the other components. The components are discussed in detail below.

The ontological parser has two major functional elements, a sentence lexer and a parser. The sentence lexer takes a sentence and converts it into a sequence of ontological entities that are tagged with part-of-speech information. The parser converts the sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the sentence, and then applies rules to it that bind arguments into predicates.

Ontological parsing is a grammatical analysis technique built on the proposition that the most useful information that can be extracted from a sentence is the set of concepts within it, as well as their formal relations to each other. Ontological parsing derives its power from the use of ontologies to situate words within the context of their meaning, and from the fact that it does not need to find the correct purely syntactic analysis of the structure of a sentence in order to produce the correct analysis of the meaning of a sentence.

An ontological parser is a tool that transforms natural-language sentences into predicate structures. Predicate structures are representations of logical relationships between the words in a sentence. Every predicate structure contains a predicate, which is either a verb or a preposition, and a set of arguments, which may be any part of speech. Predicates are words which not only have intrinsic meaning of their own, but which also provide logical relations between other concepts in a sentence. Those other concepts are the arguments of the predicate, and are generally nouns, because predicate relationships are usually between entities.

As stated previously, the ontological parser has two major components, a sentence lexer 100 and a parser 200. The sentence lexer 100 is a tool for transforming text strings into ontological entities. The parser is a tool for analyzing syntactic relationships between entities.

Referring to FIG. 1, the sentence lexer 100 is shown. Document iterator 120 receives documents or text input 110, and outputs individual sentences to the lexer 130. As the lexer 130 receives each sentence, it passes each individual word to the ontology 140. If the word exists within the ontology 140, it is returned as an ontological entity; if not, it is returned as a word tagged with default assumptions about its ontological status. In one embodiment, words are automatically assumed to be nouns; however, the words may be other parts of speech.
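The lookup step performed by the lexer 130 can be sketched as below. The dictionary `ONTOLOGY`, the `Lexeme` record, and the function `lex_sentence` are hypothetical stand-ins; only the behavior (return the ontological entity when the word is known, otherwise tag the word with a default such as noun) follows the description above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Lexeme:
    text: str
    pos: str           # part-of-speech tag
    in_ontology: bool  # False when the default assumption was applied


# Hypothetical toy ontology: word -> part of speech.
ONTOLOGY: Dict[str, str] = {"dog": "noun", "chase": "verb", "the": "determiner"}


def lex_sentence(sentence: str, default_pos: str = "noun") -> List[Lexeme]:
    """Look up each word; unknown words receive the default part of speech."""
    lexemes: List[Lexeme] = []
    for raw in sentence.lower().split():
        word = raw.strip(".,;:!?")
        if not word:
            continue
        pos = ONTOLOGY.get(word)
        if pos is None:
            lexemes.append(Lexeme(word, default_pos, False))
        else:
            lexemes.append(Lexeme(word, pos, True))
    return lexemes
```

The `default_pos` parameter reflects the remark that the default need not be noun in every embodiment.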

After the lexer 130 has checked the last word in a sentence against the contents of the ontology 140, the unparsed sentence is passed to a series of lexer filters 150. Lexer filters 150 are modular plug-ins, which modify sentences based on knowledge about word meanings. The preferred embodiment contains several filters 150, although more may be developed, and existing filters may be removed from future versions, without altering the scope of the invention. For example, in an information retrieval application, an ontological parser may employ the following filters: proper noun filter, adjective filter, adverb filter, modal verb filter, and stop word filter. Similarly, for information retrieval purposes, an embodiment of the ontological parser optimized for queries may make use of all these filters, but add a pseudo-predicate filter and a pseudo-concept filter.
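One way to picture the modular plug-in arrangement is a chain of functions, each of which receives a sentence and returns either a modified sentence or a veto. The sketch below is an assumption about how such a chain might be wired; the names `run_filters` and `drop_pos` and the use of `None` as the veto signal are illustrative choices.

```python
from typing import Callable, List, Optional, Tuple

Lexeme = Tuple[str, str]                                 # (text, part of speech)
Sentence = List[Lexeme]
LexerFilter = Callable[[Sentence], Optional[Sentence]]  # None signals a veto


def run_filters(sentence: Sentence, filters: List[LexerFilter]) -> Optional[Sentence]:
    """Apply each filter in order; stop and return None if any filter vetoes."""
    for lexer_filter in filters:
        result = lexer_filter(sentence)
        if result is None:
            return None
        sentence = result
    return sentence


def drop_pos(pos: str) -> LexerFilter:
    """Build a simple filter that removes every lexeme with the given tag."""
    def _filter(sentence: Sentence) -> Optional[Sentence]:
        return [lx for lx in sentence if lx[1] != pos]
    return _filter
```

Because each filter has the same signature, filters can be added, removed, or reordered per application (for example, a query-oriented configuration could append extra filters) without changing `run_filters`.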

The stop word filter removes stop words from sentences. Stop words are words that serve only as placeholders in English-language sentences. The stop word filter will contain a set of words accepted as stop words; any lexeme whose text is in that set is considered to be a stop word.
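A minimal sketch of the set-membership test described above follows. The contents of `STOP_WORDS` are a hypothetical sample, and lexemes are modeled as `(text, part of speech)` pairs for illustration.

```python
from typing import FrozenSet, List, Tuple

Lexeme = Tuple[str, str]  # (text, part of speech)

# Hypothetical sample; a real list would be chosen per application.
STOP_WORDS: FrozenSet[str] = frozenset({"the", "a", "an", "of"})


def stop_word_filter(sentence: List[Lexeme],
                     stop_words: FrozenSet[str] = STOP_WORDS) -> List[Lexeme]:
    """Remove every lexeme whose text is in the stop word set."""
    return [lx for lx in sentence if lx[0].lower() not in stop_words]
```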

An adjective filter serves to remove lexemes representing adjective concepts from sentences. The adjective filter checks each adjective for a noun that follows it. The noun must follow either immediately after the adjective, or have only adjective and conjunction words appearing between the noun and the adjective. If no such noun is found, the adjective filter will veto the sentence. The noun must also meet the selectional restrictions required by the adjective; if not, the adjective filter will veto the sentence. If a noun is found and it satisfies the restrictions of the adjective, the adjective filter will apply the selectional features of the adjective to the noun by adding all of the adjective's selectional features to the noun's set of selectional features.
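The adjective filter's checks can be sketched as below. Several details are assumptions made for illustration: selectional restrictions are modeled as a set of features (`requires`) that must all be present on the noun, a veto is signaled by returning `None`, and the record type `Lexeme` and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Lexeme:
    text: str
    pos: str
    features: Set[str] = field(default_factory=set)  # selectional features carried
    requires: Set[str] = field(default_factory=set)  # features an adjective demands of its noun


def adjective_filter(sentence: List[Lexeme]) -> Optional[List[Lexeme]]:
    """Return the sentence without adjectives, or None to veto it."""
    for i, lx in enumerate(sentence):
        if lx.pos != "adjective":
            continue
        # Skip over any adjectives and conjunctions between this adjective and its noun.
        j = i + 1
        while j < len(sentence) and sentence[j].pos in ("adjective", "conjunction"):
            j += 1
        if j == len(sentence) or sentence[j].pos != "noun":
            return None  # no noun follows: veto
        noun = sentence[j]
        if not lx.requires <= noun.features:
            return None  # selectional restrictions not met: veto
        # Add the adjective's features to the noun (modifies the noun in place).
        noun.features |= lx.features
    return [lx for lx in sentence if lx.pos != "adjective"]
```

In this sketch, conjunctions that joined two adjectives are left in the sentence, since the description above only calls for removing the adjectives; a later filter could remove them.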

The proper noun filter groups proper nouns in a sentence into single lexical nouns, rather than allowing them to pass as multiple-word sequences, which may be unparsable. A proper noun is any word or phrase representing a non-generic noun concept. A number of proper nouns are already present in the lexicon, and these are properly treated as regular lexical items. Since proper nouns behave syntactically as regular nouns, there is no need to distinguish proper nouns from nouns already in the lexicon. The purpose of the proper noun filter is to ensure that sequences not already in the lexicon are treated as single words where appropriate.
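The grouping behavior can be sketched with a simple heuristic. The heuristic itself (merge consecutive capitalized words that are absent from the lexicon) is an assumption made for illustration; the passage above does not state how proper noun sequences are recognized. The function name `group_proper_nouns` is hypothetical.

```python
from typing import List, Set


def group_proper_nouns(words: List[str], lexicon: Set[str]) -> List[str]:
    """Merge runs of capitalized, out-of-lexicon words into single tokens."""
    out: List[str] = []
    run: List[str] = []
    for word in words:
        if word[:1].isupper() and word.lower() not in lexicon:
            run.append(word)  # candidate part of a proper noun sequence
            continue
        if run:
            out.append(" ".join(run))
            run = []
        out.append(word)
    if run:
        out.append(" ".join(run))
    return out
```

Words already in the lexicon pass through unchanged, which matches the statement that such items need no special treatment.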

The modal verb filter removes modal verbs from sentence objects. Modal verbs are verbs such as "should", "could", and "would". Such verbs alter the conditions under which a sentence is true, but do not affect the basic meaning of the sentence. Since truth conditions do not need to be addressed by the ontological parser 120 or 140, such words can be eliminated to reduce parsing complexity. The modal verb filter will contain a set of modal verbs similar to the stop word list contained in the stop word filter. Any lexeme whose text is in that set and whose concept is a verb is identified as a modal verb, and will be removed.
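The two-part test (text in the modal set and concept is a verb) can be sketched as follows. The set `MODAL_VERBS` contains only the three examples named above, and the `(text, part of speech)` pair used for a lexeme is an illustrative assumption.

```python
from typing import FrozenSet, List, Tuple

Lexeme = Tuple[str, str]  # (text, part of speech)

# Only the examples named in the text; a fuller list is left to the implementer.
MODAL_VERBS: FrozenSet[str] = frozenset({"should", "could", "would"})


def modal_verb_filter(sentence: List[Lexeme]) -> List[Lexeme]:
    """Drop lexemes that are in the modal set and are tagged as verbs."""
    return [
        lx for lx in sentence
        if not (lx[0].lower() in MODAL_VERBS and lx[1] == "verb")
    ]
```

Checking the part of speech as well as the text keeps a word that merely shares its spelling with a modal verb but is tagged as something else.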

The adverb filter removes lexemes containing adverb concepts from sentences. Adverbs detail the meaning of the verbs they accompany, but do not change them. Since the meaning of the sentence remains the same, adverbs can be removed to simplify parsing.

The pseudo-predicate filter operates in one embodiment, as a query ontological parser.

