Ranking parser for a natural language processing system, U.S. Patent No. 7,143,036, from the United States Patent and Trademark Office (PTO)


Title: Ranking parser for a natural language processing system

Abstract: A natural language parse ranker of a natural language processing (NLP) system employs a goodness function to rank the possible grammatically valid parses of an utterance. The goodness function generates a statistical goodness measure (SGM) for each valid parse. The parse ranker orders the parses based upon their SGM values. It presents the parse with the greatest SGM value as the one that most likely represents the intended meaning of the speaker. The goodness function of this parse ranker is highly accurate in representing the intended meaning of a speaker. It also has reasonable training data requirements. With this parse ranker, the SGM of a particular parse is the combination of all of the probabilities of each node within the parse tree of such parse. The probability at a given node is the probability of taking a transition ("grammar rule") at that point. The probability at a node is conditioned on highly predictive linguistic phenomena, such as "phrase levels," "null transitions," and "syntactic history".

Patent Number: 7,143,036 Issued on 11/28/2006 to Weise


Inventors: Weise; David N. (Kirkland, WA)
Assignee: Microsoft Corporation (Redmond, WA)
Appl. No.: 10/929,167
Filed: August 30, 2004


Current U.S. Class: 704/245 ; 704/257
Current International Class: G10L 15/00 (20060101)
Field of Search: 704/9,10,257,245


References Cited [Referenced By]

U.S. Patent Documents
4868750 September 1989 Kucera et al.
4931928 June 1990 Greenfeld
5148406 September 1992 Church
5317647 May 1994 Pagallo
5418717 May 1995 Su et al.
5966686 October 1999 Heidorn et al.
6278987 August 2001 Akers et al.
Primary Examiner: Abebe; Daniel
Attorney, Agent or Firm: Lee & Hayes, PLLC

Parent Case Text



RELATED APPLICATIONS

This application is a continuation of and claims priority to U.S. patent application Ser. No. 09/620,745, filed Jul. 20, 2000, the disclosure of which is incorporated by reference herein.
Claims



The invention claimed is:

1. A method of determining language usage probabilities of a natural language based upon a training corpus, the method comprising: examining a training corpus, wherein such corpus includes phrases parsed in accordance with a set of grammar rules; computing probabilities of usage of combinations of linguistic features based upon empirical tracking of appearances of instances of such combinations in phrases within the training corpus; wherein the combinations of linguistic features consist of: (transition, headword, phrase level, syntactic history, segtype); (headword, phrase level, syntactic history, segtype); (modifying headword, transition, headword); or (transition, headword).

2. A method as recited in claim 1, wherein the computing comprises counting appearances of instances of combinations of linguistic features within the training corpus.

3. A computer-readable storage medium having computer-executable instructions that, when executed by a computer, performs the method as recited in claim 1.

4. A method for determining a probability at a node in a parse tree, the method comprising: receiving language-usage probabilities based upon appearances of instances of combinations of linguistic features within a training corpus; calculating the probability at the node based upon linguistic features of the node and the language-usage probabilities; wherein the combinations of linguistic features consist of: (transition, headword, phrase level, syntactic history, segtype); (headword, phrase level, syntactic history, segtype); (modifying headword, transition, headword); or (transition, headword).

5. A method as recited in claim 4, wherein the calculating comprises using PredParamRule Probability formula to calculate the probability at the node.

6. A method as recited in claim 4, wherein the calculating comprises using both PredParamRule Probability and SynBigram Probability formulas to calculate the probability at the node.

7. A computer-readable storage medium having computer-executable instructions that, when executed by a computer, performs the method as recited in claim 4.

8. A method for determining a statistical goodness measure (SGM) of a parse tree representing a parse of a phrase, the parse tree comprising one or more nodes, the method comprising calculating a statistical product of probabilities of each node in the parse tree, wherein the calculating comprises: receiving language-usage probabilities based upon appearances of instances of combinations of linguistic features within a training corpus; calculating the probability at the node based upon linguistic features of the node and the language-usage probabilities.

9. A computer-readable storage medium having computer-executable instructions that, when executed by a computer, performs the method as recited in claim 8.

10. A method for determining a statistical goodness measure (SGM) of a parse tree representing a parse of a phrase, the parse tree comprising one or more nodes, the method comprising: combining probabilities of each node in the parse tree, wherein the probabilities of each node are determined by the steps comprising: receiving language-usage probabilities based upon appearances of instances of combinations of linguistic features within a training corpus; calculating the probabilities of each node based upon linguistic features of each node and the language-usage probabilities; wherein the combinations of linguistic features comprises: (transition, headword, phrase level, syntactic history, segtype); (headword, phrase level, syntactic history, segtype); (modifying headword, transition, headword); and (transition, headword).

11. A method as recited in claim 10, wherein the calculating comprises using PredParamRule Probability formula to calculate the probability at the node.

12. A method as recited in claim 10, wherein the calculating comprises using both PredParamRule Probability and SynBigram Probability formulas to calculate the probability at the node.

13. A method for ranking multiple parse trees, each tree representing a syntactically valid parse of a phrase, the method comprising: determining statistical goodness measures (SGMs) of each parse tree by the method as recited in claim 10 to get an SGM value associated with each tree; organizing the trees in order of each tree's associated SGM value.

14. A computer-readable storage medium having computer-executable instructions that, when executed by a computer, performs the method as recited in claim 10.

15. A method for determining a statistical goodness measure (SGM) of a parse tree representing a parse of a phrase, the parse tree comprising one or more nodes, the method comprising: combining probabilities of each node in the parse tree, wherein the probabilities of each node are determined by the steps comprising: receiving language-usage probabilities based upon appearances of instances of combinations of linguistic features within a training corpus; calculating the probabilities of each node based upon linguistic features of each node and the language-usage probabilities; wherein during the combining, the probabilities of each node in the parse tree are combined in a top-down, generative approach.

16. A method for determining statistical goodness measures (SGMs) of multiple parse trees, each tree representing a syntactically valid parse of a phrase, the method comprising determining a SGM of each parse tree representing a parse of a phrase, the parse tree comprising one or more nodes, the determining comprising: combining probabilities of each node in the parse tree, wherein the probabilities of each node are determined by the steps comprising: receiving language-usage probabilities based upon appearances of instances of combinations of linguistic features within a training corpus; calculating the probabilities of each node based upon linguistic features of each node and the language-usage probabilities.

17. A computer-readable storage medium having computer-executable instructions that, when executed by a computer, perform a method to determine a statistical goodness measure (SGM) of a parse tree representing a parse of a phrase, the parse tree comprising one or more nodes, the method comprising: combining probabilities of each node in the parse tree, wherein the probabilities of each node are determined by the steps comprising: receiving language-usage probabilities based upon appearances of instances of combinations of linguistic features within a training corpus; calculating the probabilities of each node based upon linguistic features of each node and the language-usage probabilities.

18. An apparatus comprising: a processor; a natural-language-usage probability determiner executable on the processor to: examine a training corpus, wherein such corpus includes phrases parsed in accordance with a set of grammar rules; compute probabilities of usage of combinations of linguistic features based upon empirical tracking of appearances of instances of such combinations in phrases within the training corpus; wherein the combinations of linguistic features consist of: (transition, headword, phrase level, syntactic history, segtype); (headword, phrase level, syntactic history, segtype); (modifying headword, transition, headword); or (transition, headword).

19. An apparatus comprising: a processor; a natural-language-usage probability determiner executable on the processor to: receive language-usage probabilities based upon appearances of instances of combinations of linguistic features within a training corpus; calculate a probability at a node in a parse tree based upon linguistic features of the node and the language-usage probabilities, wherein the combinations of linguistic features consist of (transition, headword, phrase level, syntactic history, segtype); (headword, phrase level, syntactic history, segtype); (modifying headword, transition, headword); or (transition, headword).

20. An apparatus as recited in claim 19, wherein the determiner calculates the probability at the node by using PredParamRule Probability formula.

21. An apparatus as recited in claim 19, wherein the determiner calculates the probability at the node by using both PredParamRule Probability and SynBigram Probability formulas.

22. A data structure for use with a computer having a processor and a memory, said structure comprising: a corpus comprising one or more phrases in a natural language; parse trees having hierarchical nodes, each tree representing at least one syntactically valid parse of each phrase in a subset of the corpus; wherein each node has an associated probability, wherein the associated probability of a node is based upon linguistic features of such node and language-usage probabilities derived from appearances of instances of combinations of linguistic features within a training corpus; wherein PredParamRule Probability formula is used to calculate a probability associated with a node.

23. A data structure as recited in claim 22, wherein the combinations of linguistic features comprises: (transition, headword, phrase level, syntactic history, segtype); (headword, phrase level, syntactic history, segtype); (modifying headword, transition, headword); and (transition, headword).

24. A data structure as recited in claim 22, wherein both PredParamRule Probability and SynBigram Probability formulas are used to calculate a probability associated with a node.

25. The structure as recited in claim 22, wherein the subset of the corpus includes all phrases in the corpus.
Description



TECHNICAL FIELD

This invention relates to ranking parses produced by a parser for a natural language processing system.

BACKGROUND

In general, a computer is a digital machine that uses precise languages with absolute values, such as "on", "off", "1", "0", "3+4", "AND", and "XOR". In contrast, a human is an analog, biological machine that inherently uses imprecise languages with few or no absolute values. Since computers are tools for human use, input devices and input processing systems are needed for humans to use the computer tools.

Since it is generally easier to train humans to conform to the digital requirements of computers than vice versa, humans have used precise input interfaces such as a keyboard and a mouse. In addition, the computer is often only required to receive the input and not to process it for syntax and semantics.

In the past, this has been the situation because of limited processing capabilities of typical computers and because of the inherent difficulties of modeling imprecise human language within a digital computer. However, as typical computing power increases, natural language processing systems are being used by computers to "understand" imprecise human language.

Natural Language Processing

A natural language processing (NLP) system is typically a computer-implemented software system, which intelligently derives meaning and context from an input string of natural language text. "Natural languages" are the imprecise languages that are spoken by humans (e.g., English, French, Japanese). Without specialized assistance, computers cannot distinguish linguistic characteristics of natural language text. For instance, a sentence in a natural language text might read as follows: Betty saw a bird.

A student of English understands that, within the context of this sentence, the word "Betty" is a noun, the word "saw" is a verb, the word "a" is an adjective, and the word "bird" is a noun. However, in the context of other sentences, the same words might assume different parts of speech. Consider the following sentence: Use a saw.

The English student recognizes that the word "use" is a verb, the word "a" is an adjective, and the word "saw" is a noun. Notice that the word "saw" is used in the two sentences as different parts of speech--a verb and a noun--which an English speaking person realizes. To a computer, however, the word "saw" is represented by the same bit stream and hence can be identical for both sentences. The computer is equally likely to consider the word "saw" as a noun as it is a verb, in either sentence.

A NLP system assists the computer in distinguishing how words are used in different contexts and in applying rules to construct syntactical and meaning representations. A NLP system has many different applications where a computer derives meaning and information from the natural language of a human. Such applications include speech recognition, handwriting recognition, grammar checking, spell checking, formulating database searches, and language translation.

The core of a NLP system is its parser. Generally, a parser breaks an utterance (such as a phrase or sentence) down into its component parts with an explanation of the form, function, and syntactical relationship of each part.

NLP Parser

The NLP parser takes a phrase and builds for the computer a representation of the syntax of the phrase that the computer can understand. A parser may produce multiple different representations for a given phrase. The representation makes explicit the role each word plays and the relationships between the words, much in the same way as grade school children diagram sentences. In addition to "diagramming" a sentence, the parser ranks the multiple diagrams in order of most likely meaning to least likely.

Herein, an utterance is equivalent to a phrase. A phrase is a sequence of words intended to have meaning. In addition, a sentence is understood to be one or more phrases. Likewise, references herein to a human speaker include a writer, and references to speech include writing.

FIG. 1 shows a NLP parser 20 of a typical NLP system. The parser 20 has four key components: Tokenizer 28; Grammar Rules Interpreter 26; Searcher 30; and Parse Ranker 34.

The parser 20 receives a textual string 22. Typically, this is a sentence or a phrase. The parser also receives grammar rules 24. These rules attempt to codify and interpret the actual grammar rules of a particular natural language, such as English. Alternatively, these rules may be stored in memory within the parser.

The grammar rules interpreter 26 interprets the codified grammar rules. The tokenizer 28 identifies the words in the textual string 22, looks them up in a dictionary, makes records for the parts of speech (POS) of a word, and passes these to the searcher.

The searcher 30 in cooperation with the grammar rules interpreter generates multiple grammatically correct parses of the textual string. The searcher sends its results to the parse ranker 34.

The parse ranker 34 mathematically measures the "goodness" of each parse and ranks them. "Goodness" is a measure of the likelihood that such a parse represents the intended meaning of the human speaker (or writer). The ranked output of the parse ranker is the output of the parser. This output is one or more parses 38 ranked from most to least goodness.
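The ranking step can be illustrated with a minimal Python sketch. It assumes, following the abstract, that the goodness of a parse is the product of the probabilities of the nodes in its parse tree. The `Node` class, the labels, and the probability values are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List


@dataclass
class Node:
    """One node of a parse tree (hypothetical structure for illustration)."""
    label: str
    probability: float  # probability of the transition taken at this node
    children: List["Node"] = field(default_factory=list)


def goodness(node: Node) -> float:
    """Product of the probabilities of every node in the tree rooted at node."""
    return node.probability * prod(goodness(child) for child in node.children)


def rank_parses(parses: List[Node]) -> List[Node]:
    """Order candidate parses from greatest to least goodness."""
    return sorted(parses, key=goodness, reverse=True)


# Two made-up candidate parses of the same phrase.
parse_a = Node("S", 0.9, [Node("NP", 0.8), Node("VP", 0.5)])
parse_b = Node("S", 0.9, [Node("NP", 0.8), Node("VP", 0.1)])

ranked = rank_parses([parse_b, parse_a])
best = ranked[0]  # parse_a, because 0.9 * 0.8 * 0.5 > 0.9 * 0.8 * 0.1
```

Sorting on a single scalar score keeps the ranker independent of how the candidate parses were generated by the searcher.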

Foundational Concepts

Three concepts form the foundation for understanding the invention described herein: statistics, linguistics, and computational linguistics.

Statistics is the branch of mathematics that deals with the relationships among and between groups of measurements, and with the relevance of similarities and differences in those relationships.

Linguistics is the analytic study of human natural language.

Computational linguistics is the analytic study of human natural language within computer science to mathematically represent language rules such as grammar, syntax, and semantics.

Statistics

Probability. The expression "Prob(x)" is the probability of event x occurring. The result of Prob(x) is a number between zero (0) and one (1), where zero means that the event never occurs and one means that it always occurs. For example, using a six-sided fair die with the sides labeled 1 to 6, the probability of rolling a three is 1/6. Similarly, using a randomly shuffled deck of cards:

Prob(top card is an Ace) = 1/13
Prob(top card is a club) = 1/4
Prob(top card is 3 of diamonds) = 1/52

All examples using a deck of cards are based upon a standard American deck of cards having four suits (spades, hearts, diamonds, clubs) and thirteen cards per suit.

Estimating Probabilities using Training Data. The probability of events using a randomly shuffled deck or fair die can be mathematically derived. However, in many cases, there is no mathematical formula for a probability of a given event. For example, assume that one wished to determine the probability of rolling a three given a weighted die. The probability may be 1/6 (as it would be with a fair die), but it is likely to be more or less than that depending upon how the die is weighted.

How would one estimate the probability? The answer is to run an experiment. The die is thrown many times and the number of rolls where "3" is rolled is counted. This data is called the "training data". It is sometimes called the "training corpus." To determine the probability of rolling a three in the future, it is assumed that the behavior of the die in the future will be the same as it was during our experiment and thus:

$$\mathrm{Prob}(3) = \frac{\mathrm{Count}(3)}{\mathrm{Count}(\text{throws})} = \frac{\text{number of rolls in the training data where "3" was rolled}}{\text{total number of rolls in the training data}}$$

In general, the accuracy of the estimate increases as the amount of training data increases.
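The counting estimate can be sketched in a few lines of Python. The function name `estimate_probability` and the list of throws are hypothetical; the throws stand in for a recorded experiment with a weighted die.

```python
from collections import Counter
from fractions import Fraction


def estimate_probability(training_data, event):
    """Estimate Prob(event) as Count(event) / Count(all observations)."""
    if not training_data:
        raise ValueError("training data must not be empty")
    counts = Counter(training_data)
    return Fraction(counts[event], len(training_data))


# Hypothetical record of twelve throws of a weighted die (the training data).
throws = [3, 3, 1, 6, 3, 2, 3, 5, 3, 4, 3, 6]

prob_three = estimate_probability(throws, 3)  # Fraction(1, 2): six of twelve throws
prob_six = estimate_probability(throws, 6)    # Fraction(1, 6): two of twelve throws
```

With only twelve throws the estimate is coarse; recording more throws would be expected to bring it closer to the true behavior of the die.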

Conditional Probability. Conditional probabilities are used when there is additional information known about an event that affects the likelihood of the outcome. The notation used is Prob(x|y) meaning, "What is the probability of an unknown event x occurring given that known event y occurred."

Conditional probability is defined to be:

$$\mathrm{Prob}(x \mid y) = \frac{\mathrm{Prob}(x \,\&\, y)}{\mathrm{Prob}(y)} \approx \frac{\mathrm{Count}(x \,\&\, y)}{\mathrm{Count}(y)}$$

where the counts are taken over the training data.

When the known event is predictive of the outcome, knowing the conditional probabilities is better than knowing just the unconditional probability. For example, assume that a man is in a casino playing the following game. The man can bet $1 or pass. The House then rolls a pair of dice. If the man bets and the dice sum to 12, the man gets $35; otherwise, the man loses his bet. Since the probability of rolling two dice that sum to 12 is 1/36, the man should expect to lose money playing this game. On average, the man will make only $35 for every $36 that he bets.

Now suppose that the man had a fairy godmother who could whisper in his ear and tell him whether the first of the two dice rolled was going to be a six. Knowing this, the probabilities of rolling a twelve are:

Prob(two dice summing to 12 | first die is a 6) = 1/6
Prob(two dice summing to 12 | first die is not a 6) = 0

With the fairy godmother's help, the man can make money on the game. The strategy is to bet only when the fairy godmother says that the first die is a six. On average, the man should expect to make $35 for every $6 that he bets.
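The casino example can be checked by enumerating the 36 equally likely outcomes of two fair dice. The sketch below assumes that the known event refers to one designated die (here, the first); the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event, given=lambda roll: True):
    """Prob(event | given), computed by counting outcomes that satisfy given."""
    conditioned = [roll for roll in outcomes if given(roll)]
    favorable = [roll for roll in conditioned if event(roll)]
    return Fraction(len(favorable), len(conditioned))


def sums_to_12(roll):
    return sum(roll) == 12


def first_is_six(roll):
    return roll[0] == 6


def first_is_not_six(roll):
    return roll[0] != 6


def expected_return_per_dollar(p_win, payout=35):
    """Average dollars received for each dollar bet, given the win probability."""
    return payout * p_win


p_unconditional = prob(sums_to_12)                          # Fraction(1, 36)
p_given_six = prob(sums_to_12, given=first_is_six)          # Fraction(1, 6)
p_given_not_six = prob(sums_to_12, given=first_is_not_six)  # Fraction(0, 1)
```

Betting blindly returns `expected_return_per_dollar(p_unconditional)`, which is 35/36 of a dollar per dollar bet, while betting only when the first die is known to be a six returns 35/6 dollars per dollar bet.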

As another example, consider the problem of predicting what the next word in a stream of text will be. For example, is the next word after "home" more likely to be "table" or "run"? Here, word_i represents the i-th word in the lexicon and Prob(word_i) is the probability that word_i will be the next word in the stream. The standard approach (using unconditional probability) for computing Prob(word_i) is to take a training corpus and count up the number of times the word appears. This formula represents this approach:

$$\mathrm{Prob}(\mathrm{word}_i) = \frac{\mathrm{Count}(\mathrm{word}_i)}{\mathrm{Count}(\text{total number of words in the training corpus})}$$

Better results are achieved by using conditional probabilities. In English, words don't appear in a random order. For example, "Table red the is" is highly unlikely but "The table is red" is common. Put another way, the last word in a stream is predictive of the next word that will follow. For example, if the last word in a stream is "the" or "a", then the next word is usually either an adjective or noun, but is rarely a verb. If the last word in a stream is "clever", then the next word is likely to be an animate noun like "boy" or "dog" and not likely to be an inanimate noun like "stone."

By making use of this information, the next word that will appear in a stream of text may be better predicted. If every pair of words in the lexicon is considered and a record of how often each pair appears is kept, the probability of a word given the word before it is:

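$$\mathrm{Prob}(\mathit{word}_{i+1} \mid \mathit{word}_i) = \frac{\text{number of times } \mathit{word}_i \text{ is followed by } \mathit{word}_{i+1} \text{ in the training corpus}}{\text{number of times } \mathit{word}_i \text{ appears in the training corpus}}$$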

This conditional probability is much more accurate than Prob(word_i). This technique is commonly used by conventional speech recognizers to predict what words are likely to follow a given speech fragment.
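The two counting estimates described above can be sketched in a few lines of Python. This is an illustrative sketch only: the toy corpus and the names prob_word and prob_next are assumptions made for the example, and a real training corpus would be far larger.

```python
from collections import Counter

# Hypothetical toy corpus, already split into words.
corpus = "the table is red . the boy hit the ball . the dog is clever .".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))  # adjacent word pairs


def prob_word(word):
    """Unconditional estimate: Count(word) / total number of words."""
    return unigram_counts[word] / len(corpus)


def prob_next(word, prev):
    """Conditional estimate: Count(prev followed by word) / Count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]


print("Prob(table) =", prob_word("table"))
print("Prob(table | the) =", prob_next("table", "the"))
print("Prob(is | the) =", prob_next("is", "the"))
```

In this toy corpus "table" is a rare word overall, but it becomes considerably more likely once the previous word is known to be "the", while a verb such as "is" never follows "the" at all.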

Sparse Data Problem. The sparse data problem occurs when there is not enough data in a training corpus to distinguish between events that never occur versus events that are possible but just didn't happen to occur in the training corpus. For example, Prob(word_{i+1} | word_i) is being computed by counting how often pairs of words occur in the corpus. If in the training corpus the pair of words "gigantic" and "car" never appears, it would be wrong to conclude that it is impossible in the English language to have the words "gigantic" and "car" together.

In general, natural languages have a nearly infinite number of possible word, phrase, and sentence combinations. The training corpus used to determine conditional probabilities must necessarily be a subset of this set of combinations. Thus, the sparse data problem results in poor probability estimates for any combination that the training corpus did not include.

Chain Rule. Prob(a, b|c)=Prob(a|c) Prob(b|a, c)

Linguistics

Linguistics is the scientific study of language. It endeavors to answer the question: what is language, and how is it represented in the mind? Linguistics focuses on describing and explaining language.

Linguistics focuses on languages' syntax (sentence and phrase structures), morphology (word formation), and semantics (meaning). Before a computer representation model of a natural language can be generated and effectively used, the natural language must be analyzed. This is the role of linguistics.

Part of Speech. Linguists group words of a language into classes, which show similar syntactic behavior, and often a typical semantic type. These word classes are otherwise called "syntactic categories" or "grammatical categories," but are more commonly known by the traditional name "parts of speech" (POS). For example, common POS categories for English include noun, verb, adjective, preposition, and adverb.

Word Order and Phrases. Words do not occur in just any order. Languages have constraints on the word order. Generally, words are organized into phrases, which are groupings of words that are clumped as a unit. Syntax is the study of the regularities and constraints of word order and phrase structure. Among the major phrase types are noun phrases, verb phrases, prepositional phrases, and adjective phrases.

Headword. The headword is the key word in a phrase. This is because it determines the syntactic character of a phrase. In a noun phrase, the headword is the noun. In a verb phrase, it is the main verb. For example, in the noun phrase "red book", the headword is "book." Similarly, for the verb phrase "going to the big store", the headword is "going."

Modifying Headword. A modifying headword is the headword of a sub-phrase within a phrase where the sub-phrase modifies the main headword of the main phrase. Assume a phrase (P) has a headword (hwP) and a modifying sub-phrase (M) within P that modifies hwP. The modifying headword (hwM) is the headword of this modifying sub-phrase (M).

For example, if the phrase is "The red bear growled at me", the headword is "growled," the modifying phrase is "the red bear," and the modifying headword is "bear." If the phrase is "running to the store", then the headword is "running", the modifying phrase is "to the store", and the modifying headword is "to."

Lemma of Headwords. The syntactic and semantic behavior of a headword is often independent of its inflectional morphology. For example, the verbs "walks", "walking", and "walked" are derived from the verb "walk". The transitivity of a verb is independent of such inflection.

Syntactic Features. Syntactic features are distinctive properties of a word relating to how the word is used syntactically. For example, the syntactic features of a noun include whether it is singular (e.g., cat) or plural (e.g., cats) and whether it is countable (e.g., five forks) or uncountable (e.g., air). The syntactic features of a verb include whether or not it takes an object:
Intransitive verbs do not take an object. For example, "John laughed" and "Bill walked."
Mono-transitive verbs take a single direct object. For example, "I hit the ball."
Di-transitive verbs take a direct and an indirect object. For example, "I gave Bill the ball" and "I promised Bill the money."

Computational Linguistics

Transitions (i.e., Rewrite Rules). The regularities of a natural language's word order and grammar are often captured by a set of rules called "transitions" or "rewrite rules." The rewrite rules are a computer representation of rules of grammar. These transitions are used to parse a phrase.

A rewrite rule has the notation form: "symbolA → symbolB symbolC . . . ". This indicates that the symbol (symbolA) on the left side of the rule may be rewritten as one or more symbols (symbolB, symbolC, etc.) on the right side of the rule.

For example, symbolA may be "s" to indicate the "start" of the sentence analysis. SymbolB may be "np" for noun phrase and symbolC may be "vp" for verb phrase. The "np" and "vp" symbols may be further broken down until the actual words in the sentence are represented by symbolB, symbolC, etc.

For convenience, transitions can be named so that the entire rule need not be recited each time a particular transition is referenced. In Table 1 below, the names of the transitions are provided under the "Name" heading. The actual transitions are provided under the "Transition" heading. Table 1 provides an example of transitions being used to parse a sentence like "Swat flies like ants":

TABLE 1
Name           Transition (i.e., rewrite rule)
s_npvp         s → np vp
s_vp           s → vp
np_noun        np → noun
np_nounpp      np → noun pp
np_nounnp      np → noun np
vp_verb        vp → verb
vp_verbnp      vp → verb np
vp_verbpp      vp → verb pp
vp_verbnppp    vp → verb np pp
pp_prepnp      pp → prep np
prep_like      prep → like
verb_swat      verb → swat
verb_flies     verb → flies
verb_like      verb → like
noun_swat      noun → swat
noun_flies     noun → flies
noun_ants      noun → ants

In Table 1 above, the transition names (in the left-hand column) represent and identify the transition rules (in the right-hand column). For example, "np_nounpp" is the name for the "np → noun pp" rule, which means that "noun phrase" may be rewritten as "noun" and "prepositional phrase."

Context Free Grammar (CFG). The nature of the rewrite rules is that a certain syntactic category (e.g., noun, np, vp, pp) can be rewritten as one or more other syntactic categories or words. The possibilities for rewriting depend solely on the category, and not on any surrounding context, so such phrase structure grammars are commonly referred to as context-free grammars (CFG).

FIG. 2 illustrates a CFG parse tree 50 of a phrase (or sentence). In this tree-like representation, the sentence "flies like ants" is deconstructed using a CFG set of rewrite rules (i.e., transitions). The tree 50 has nodes (such as 52a-52c and 54a-54g).

The tree 50 includes a set of terminal nodes 52a-52c. These nodes are at the end of each branch of the tree and cannot be further expanded. For example, "like" 52b cannot be expanded any further because it is the word itself.

The tree 50 also includes a set of non-terminal nodes 54a-54g. These nodes are internal and may be further expanded. Each non-terminal node has immediate children, which form a branch (i.e., "local tree"). Each branch corresponds to the application of a transition. For example, "np" 54b can be further expanded into a "noun" by application of the "np_noun" transition.

Each non-terminal node in the parse tree is created via the application of some rewrite rule. For example, in FIG. 2, the root node 54a was created by the "s → np vp" rule. The "vp" node 54d was created by the "vp → verb np" rule.

The tree 50 has a non-terminal node 54a designated as the starting node and it is labeled "s."

In general, the order of the children in each branch generates the word order of the sentence, and the tree has a single root node (in FIG. 2 it is node 54a), which is the start of the parse tree.

Segtypes. A non-terminal node has a type that is called its "segtype." In FIG. 2, each non-terminal node 54a-54g is labeled with its segtype. A node's segtype identifies the rule that was used to create the node (working up from the terminal nodes). In Table 1 above, the segtypes are shown under the "Transition" heading and to the left of the arrow. For example, the segtype of node 54b in FIG. 2 is "np" because the rule "np → noun" was used to create the node. In a given grammar, a segtype can take many different values including, for example: NOUN, NP (noun phrase), VERB, VP (verb phrase), ADJ (adjective), ADJP (adjective phrase), ADV (adverb), PREP (preposition), PP (prepositional phrase), INFCL (infinitive clauses), PRPRT (present participial clause), PTPRT (past participial clause), RELCL (relative clauses), and AVPVP (a verb phrase that has a verb phrase as its head).

Node-Associated Functional Notation. In this document, a functional notation is used to refer to the information associated with a node. For example, if a variable "n" represents a node in the tree, then "hw(n)" is the headword of node "n."

The following functions are used throughout this document:
hw(n) is the headword of node n
segtype(n) is the segtype of node n
trans(n) is the transition (rewrite rule) associated with node n (e.g., the rules under the heading "Transition" in Table 1)
trn(n) is the name of the transition (e.g., the names under the heading "Name" in Table 1)
modhw(n) is the modifying headword of node n
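The functional notation maps naturally onto accessor functions over an annotated node record. The Python sketch below is one possible rendering and not the patent's implementation: the Node class, its field names, and the hand-annotated verb phrase "like ants" are assumptions made for illustration, while the transition names and rules are taken from Table 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Transition names and rewrite rules taken from Table 1 (subset).
TRANSITIONS: Dict[str, str] = {
    "vp_verbnp": "vp -> verb np",
    "np_noun": "np -> noun",
    "verb_like": "verb -> like",
    "noun_ants": "noun -> ants",
}


@dataclass
class Node:
    """A non-terminal parse-tree node with its annotations (illustrative)."""
    segtype: str
    trn: str                       # name of the transition out of this node
    headword: str
    modhw: Optional[str] = None    # modifying headword, if any
    children: List["Node"] = field(default_factory=list)


def hw(n: Node) -> str:
    return n.headword


def segtype(n: Node) -> str:
    return n.segtype


def trn(n: Node) -> str:
    return n.trn


def trans(n: Node) -> str:
    return TRANSITIONS[n.trn]


def modhw(n: Node) -> Optional[str]:
    return n.modhw


# The verb phrase "like ants", annotated by hand for this example.
noun = Node("noun", "noun_ants", "ants")
np_node = Node("np", "np_noun", "ants", children=[noun])
verb = Node("verb", "verb_like", "like")
vp = Node("vp", "vp_verbnp", "like", modhw="ants", children=[verb, np_node])

print(hw(vp), segtype(vp), trn(vp), trans(vp), modhw(vp))
```

Keeping the accessors as free functions rather than methods mirrors the hw(n), trn(n) notation used in the text.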

Annotated Parse Tree. A parse tree can be annotated with information computed during the parsing process. A common form of this is the lexicalized parse tree where each node is annotated with its headword. One can annotate a parse tree with additional linguistic information (e.g. syntactic features).

FIG. 3 shows an example of such a lexicalized parse tree 60. (For the purposes of this example, directional path 66 with circled reference points is ignored.) FIG. 3 is a parse tree of one of many parses of the sentence, "swat flies like ants." Terminal nodes 62a-62d, which are the words of the sentence, are not annotated. Non-terminal nodes 64a-64i are annotated. For example, node 64h has a segtype of "noun" and is annotated with "hw=ants". This means that its headword is "ants."

The parse tree 60 in FIG. 3 is also annotated with the names of the transitions between nodes. For example, the transition name "vp_verbvp" is listed between node 64f and node 64h.

Probabilistic Context Free Grammar (PCFG). A PCFG is a context free grammar where every transition is assigned a probability from zero to one. PCFGs have commonly been used to define a parser's "goodness" function. "Goodness" is a calculated measurement of the likelihood that a parse represents the intended meaning of the human speaker. In a PCFG, trees containing transitions that are more probable are preferred over trees that contain less probable transitions.

Since the probability of a transition occurring cannot be mathematically derived, the standard approach is to estimate the probabilities based upon a training corpus. A training corpus is a body of sentences and phrases that are intended to represent "typical" human speech in a natural language. The speech may be intended to be "typical" for general applications, specific applications, and/or customized applications. This "training corpus" may also be called "training data."

Thus, the probabilities are empirically derived from analyzing a training corpus. Various approaches exist for doing this. One of the simplest approaches is to use an unconditional probability formula like this:

$$\mathrm{Prob}(\mathit{trans}_i) = \frac{\mathrm{Count}(\mathit{trans}_i)}{\sum_j \mathrm{Count}(\mathit{trans}_j)} = \frac{\text{number of times } \mathit{trans}_i \text{ occurs in the training corpus}}{\text{total number of transitions in the training corpus}}$$

However, this approach, by itself, produces inaccurate results because the likelihood that a transition will apply is highly dependent upon the current linguistic context, but this approach does not consider the current linguistic context. This approach simply considers occurrences of specific transitions (trans_i).

Depth First Tree Walk. In order to analyze each node of a parse tree to rank parse trees, a parser must have a method of "visiting" each node. In other words, the nodes are examined in a particular order.

A "depth first tree walk" is a typical method of visiting all the nodes in a parse tree. In such a walk, all of a node's children are visited before any of the node's siblings. The visitation is typically from the top of the tree (i.e., the start node) to the bottom of the tree (i.e., terminal nodes). Such visitation is typically done from left-to-right to correspond to the order of reading/writing in English, but may also be done from right-to-left.

The directional path 66 of FIG. 3 shows a depth first tree walk of the parse tree. The sequence of the walk is shown by the directional path 66 with circled reference points. The order of the stops along the path is numbered from 1 to 14 by the circled reference points.
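A depth first tree walk is short to express recursively. The sketch below is a minimal Python illustration under stated assumptions: the Node class, the leaf helper, and the use of the "flies like ants" tree are choices made for this example, and the numbering of stops in FIG. 3 is not reproduced.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def depth_first(node: Node, left_to_right: bool = True) -> Iterator[Node]:
    """Yield a node, then the whole subtree of each child in turn,
    so every descendant is visited before the node's next sibling."""
    yield node
    kids = node.children if left_to_right else list(reversed(node.children))
    for child in kids:
        yield from depth_first(child, left_to_right)


def leaf(segtype: str, word: str) -> Node:
    """A part-of-speech node with a single terminal (word) child."""
    return Node(segtype, [Node(word)])


# Parse tree for "flies like ants" built from the Table 1 rules.
tree = Node("s", [
    Node("np", [leaf("noun", "flies")]),
    Node("vp", [leaf("verb", "like"), Node("np", [leaf("noun", "ants")])]),
])

print([n.label for n in depth_first(tree)])
print([n.label for n in depth_first(tree, left_to_right=False)])
```

The left_to_right flag covers the right-to-left variant mentioned above; only the order in which siblings are taken changes.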

Generative model of syntax. Each sentence-tree pair in a language has an associated top-down derivation consisting of a sequence of rule applications (transitions) of a grammar.

Augmented Phrase Structure Grammar (APSG). An APSG is a CFG that gives multiple names to each rule, thereby limiting the application of each "named" rule. Thus, for each given rewrite rule there is more than one name, and each name limits the rule's use to specific and narrower situations. For example, the structure "VP → NP VP" may have these limiting labels: "SubjPQuant" and "VPwNPl."

SubjPQuant specifies subject post-quantifiers on a verb phrase. For example, in "we all found useful the guidelines", the structure of "all found useful the guidelines" is [NP all] [VP found useful the guidelines], where "all" is a subject post-quantifier. VPwNPl specifies a subject to a verb phrase. For example, "John hit the ball" has the structure [NP John] [VP hit the ball], where "John" is the subject.

An APSG is similar to a CFG in that there are rules that look like CFG rules. But an APSG rule can examine the pieces to decide if it is going to put them together, and it determines and labels the syntactic relation between the children. Because of this, one can have multiple rules that represent VP → NP VP. These multiple rules can build different syntactic relationships.

This is an example of how this model may work. Start at the top node. There is a rule that produces the constituents below it; record this rule. Pick the leftmost of the children. If this node does not have children, then visit its sibling to the right. If there is no sibling, go up to the parent and visit the parent's right sibling. Keep going up and to the right until an unvisited node is found. If all nodes have been visited, then the process is done. For any node not yet visited, record the rule and recurse. (This is much easier to describe using code.) This is what is meant by top-down and left-to-right. This produces a unique representation of the tree.
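The procedure just described can be written as a short recursive function that records the rule at each node before descending into its children from left to right. The Python sketch below is an illustration under stated assumptions: trees are encoded as nested tuples of the form (label, child, ...), with bare strings for words, and the two parses of "swat flies like ants" are hand-built from the Table 1 rules rather than taken from the figures.

```python
def derivation(tree):
    """Return the rules of a parse tree in top-down, left-to-right order."""
    if isinstance(tree, str):
        return []                      # a word: no rule to record
    label, *children = tree
    rhs = [c if isinstance(c, str) else c[0] for c in children]
    rules = [f"{label} -> {' '.join(rhs)}"]   # record this node's rule
    for child in children:                    # then recurse, leftmost first
        rules += derivation(child)
    return rules


# "swat flies like ants" read as a command: swat [flies] [like ants].
parse_a = ("s",
           ("vp",
            ("verb", "swat"),
            ("np", ("noun", "flies")),
            ("pp", ("prep", "like"), ("np", ("noun", "ants")))))

# The same words read as a statement: [swat flies] [like ants].
parse_b = ("s",
           ("np", ("noun", "swat"), ("np", ("noun", "flies"))),
           ("vp", ("verb", "like"), ("np", ("noun", "ants"))))

for rule in derivation(parse_a):
    print(rule)
```

Because each tree yields exactly one such rule sequence, two different parses of the same sentence can be told apart by comparing their derivations.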

The Problem

Given the ambiguity that exists in natural languages, many sentences have multiple syntactic interpretations. The different syntactic interpretations generally have different semantic interpretations. In other words, a sentence has more than one grammatically valid structure ("syntactic interpretation") and as a result, may have more than one reasonable meaning ("semantic interpretation"). A classic example of this is the sentence, "time flies like an arrow." There are generally considered to be seven valid syntactic parse trees.

FIGS. 4a and 4b show examples of two of the seven valid parses of this sentence. For the parse tree 70 of FIG. 4a, the object "time" 74 moves in a way that is similar to an arrow. For the parse tree 80 of FIG. 4b, the insects called "time flies" 84 enjoy the arrow object; just as one would say "Fruit flies like a meal."

Either parse could be what the speaker intended. In addition, five other syntactically valid parses may represent the meaning that the speaker intended.

How does an NLP system determine which parse is the "correct" one or, better said, the most "correct" one? How does an NLP parser judge among the multiple grammatically valid parses and select the most "correct" parse?

Previous Approaches

Generally. A parser needs a way to accurately and efficiently rank these parse trees. In other words, the parser needs to compute which parse tree is the most likely interpretation for a sentence, such as "time flies like an arrow."

Since human language is inherently imprecise, rarely is one parse one hundred percent (100%) correct and the others never correct. Therefore, a parser typically ranks the parses from most likely to be correct to least likely to be correct. Correctness in this situation is a measure of what a human most likely means by a particular utterance.

A conventional approach is to use a "goodness" function to calculate a "goodness measure" of each valid parse. Existing parsers differ in the extent to which they rely on a goodness function, but most parsers utilize one.

A simple parser may generate all possible trees without regard to any linguistic knowledge and then allow the goodness function to do all the work in selecting the desired parse. Alternatively, a parser generates reasonable trees based on linguistic knowledge and then uses the goodness function to choose between the reasonable trees. In either case, the problem is to implement an efficient goodness function that accurately reflects and measures the most likely meaning of an utterance.

Straw Man Approach. The most straightforward approach is the "straw man approach." The goodness function of this approach computes the probability of a given parse tree based upon how often identical trees appeared in a training corpus. This approach is theoretical and is rarely (if ever) used in practice. This is because it is inaccurate without an impractically huge training corpus that accurately represents nearly all possible syntactic and semantic constructions within a given language.

Using the straw man approach, the probability of a parse tree is defined to be:

$$\mathrm{Prob}(\mathit{parse}) = \frac{\text{number of times the parse tree appears in the training corpus}}{\text{number of times the parsed sentence appears in the training corpus}}$$

For example, assume that the sentence "time flies like an arrow" appears ten times in the training corpus. The parse represented by the parse tree 70 of FIG. 4a appears nine of those times, and the parse represented by the parse tree 80 of FIG. 4b appears only once. Thus, the probability of the parse tree 70 of FIG. 4a would be ninety percent.

If the parse of parse tree 70 is the correct parse, then this example provides good results. Note that the exact sentence had to occur multiple times within the corpus to provide such good results.
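The ninety percent figure in this example follows from simple counting. The Python sketch below reproduces it under stated assumptions: the corpus is a hypothetical list of (sentence, parse tree identifier) pairs, the identifiers tree_70 and tree_80 stand in for the parses of FIGS. 4a and 4b, and the count is normalized per sentence so that it matches the worked example.

```python
from collections import Counter

SENTENCE = "time flies like an arrow"

# Hypothetical tagged corpus: each entry pairs a sentence with the
# identifier of the parse tree that an annotator chose for it.
corpus = [(SENTENCE, "tree_70")] * 9 + [(SENTENCE, "tree_80")]

tree_counts = Counter(corpus)
sentence_counts = Counter(sentence for sentence, _ in corpus)


def straw_man_prob(sentence, tree):
    """Share of a sentence's occurrences that carry this exact parse tree."""
    if sentence_counts[sentence] == 0:
        return 0.0   # sentence never seen: nothing to base an estimate on
    return tree_counts[(sentence, tree)] / sentence_counts[sentence]


print(straw_man_prob(SENTENCE, "tree_70"))
print(straw_man_prob(SENTENCE, "tree_80"))
print(straw_man_prob("fruit flies like a meal", "tree_80"))
```

The last call shows the weakness of the approach: a sentence that is absent from the corpus receives no usable estimate at all.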

Theoretically, given enough training data, the straw man approach can be highly accurate. However, the amount of training data required is astronomical. First, it requires that the tagged training corpus contain all the sentences that the parser is likely to ever encounter. Second, the sentences must appear in the correct ratios corresponding to their appearance within the normal usage of the natural language. In other words, common sentences must occur more often than uncommon sentences, and in the right proportion.

Creating such a huge training corpus is infeasible. However, working from a smaller corpus creates sparse data problems.

Statistical Hodgepodge Approach. Using this approach, the goodness of a parse may be determined by a collection of mostly unrelated statistical calculations based upon parts of speech, syntactic features, word probabilities, and selected heuristic rules.

A goodness function using such an approach is utilized by the grammar checker in "Office 97" by the Microsoft Corporation. Parses were assigned a score based upon statistical information and heuristic rules. These scores were often called "POD" scores.

Since this hodgepodge approach employs heuristics and does not use a unifying methodology for calculating the goodness measure of parses, there are unpredictable and unanticipated results that incorrectly rank the parses.

Syntactic Bigrams Approach. This approach uses collocations to compute a goodness function. A collocation is two or more words in some adjacent ordering or syntactic relationship. Examples of such include: "strong tea", "weapons of mass destruction", "make up", "the rich and powerful", "stiff breeze", and "broad daylight."

Specifically, syntactic bigrams are two-word collocations. The basic idea is to find the probability of two words being in a syntactic relationship to each other, regardless of where those words appear in the sentence. The words may be adjacent (e.g., "I drink coffee."), but need not be (e.g., "I love to drink hot black coffee."). For example, the object of the verb "drink" is more likely to be "coffee" or "beer" than "table". This can be used to create a goodness function based on "syntactic bigrams."

If the following four sentences appeared in the training corpus, all four would provide evidence that "coffee" is often the object of the verb "to drink":
I drink coffee.
I drink black coffee.
I love to drink hot black coffee.
I drink, on most days of the week, coffee in the morning.

However, because of the huge potential number of word combinations, this approach requires a hefty training corpus.

Transition Probability Approach (TPA). A goodness function may be calculated using a generative grammar approach. Each sentence has a top-down derivation consisting of a sequence of rule applications (transitions). The probability of the parse tree is defined to be the product of the probabilities of the transitions.

There are a number of different ways to assign probabilities to the transitions. For this example, the transition probabilities are conditioned on segtype:

$$\mathrm{Prob}(\mathit{parse}) = \prod_i \mathrm{Prob}\big(\mathit{trans}(n_i) \mid \mathit{segtype}(n_i)\big) = \prod_i \frac{\mathrm{Count}\big(\mathit{trans}(n_i)\ \&\ \mathit{segtype}(n_i)\big)}{\mathrm{Count}\big(\mathit{segtype}(n_i)\big)}$$

Where
n_i: is the i-th node
trans(n_i): is the transition out of n_i of the form X → Y Z
segtype(n_i): is the segtype of n_i
Π_i: is the notation to combine (e.g., multiply) over all nodes i in the parse tree

For example, suppose that probabilities are assigned to each transition shown in Table 1 above and those probabilities are based upon some training corpus. The training corpus would contain parsed sentences such that the system can count the number of times each transition occurred. In other words, the system counts the number of times each particular grammar rule was used to generate the parse. The result might be:

TABLE 2
Transition          Count   Prob(trans|segtype)
s → np vp           80      .8      Sum = 1.0
s → vp              20      .2
np → noun           80      .4      Sum = 1.0
np → noun pp        100     .5
np → noun np        20      .1
vp → verb           40      .4      Sum = 1.0
vp → verb np        20      .2
vp → verb pp        20      .2
vp → verb np pp     20      .2
pp → prep np        10      1       Sum = 1.0
prep → like         10      1       Sum = 1.0
verb → swat         10      .1      Sum = 1.0
verb → flies        50      .5
verb → like         40      .4
noun → swat         100     .5      Sum = 1.0
noun → flies        50      .25
noun → ants         50      .25

Using the PCFG represented by Table 2 above, the probability of a parse tree can be computed as follows:

$$\mathrm{Prob}(\mathit{parse}) = \prod_i \mathrm{Prob}\big(\mathit{trans}(n_i) \mid \mathit{segtype}(n_i)\big)$$

where each factor is the Prob(trans|segtype) entry of Table 2 for the transition applied at node n_i.
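As a worked illustration of this product, the Python sketch below derives the conditional probabilities from the counts in Table 2 and multiplies them along two parses of "swat flies like ants". The counts come from Table 2; the choice of these two particular parses and the names COUNTS, prob_trans, and prob_parse are assumptions made for the example.

```python
from collections import defaultdict
from math import prod

# Transition counts copied from Table 2.
COUNTS = {
    "s -> np vp": 80, "s -> vp": 20,
    "np -> noun": 80, "np -> noun pp": 100, "np -> noun np": 20,
    "vp -> verb": 40, "vp -> verb np": 20, "vp -> verb pp": 20,
    "vp -> verb np pp": 20,
    "pp -> prep np": 10,
    "prep -> like": 10,
    "verb -> swat": 10, "verb -> flies": 50, "verb -> like": 40,
    "noun -> swat": 100, "noun -> flies": 50, "noun -> ants": 50,
}

# Total count per segtype (the left-hand side of each rule).
segtype_totals = defaultdict(int)
for rule, count in COUNTS.items():
    segtype_totals[rule.split(" -> ")[0]] += count


def prob_trans(rule):
    """Prob(trans | segtype) = Count(trans) / Count(segtype)."""
    return COUNTS[rule] / segtype_totals[rule.split(" -> ")[0]]


def prob_parse(transitions):
    """Multiply the transition probabilities over all nodes of a parse."""
    return prod(prob_trans(rule) for rule in transitions)


# Assumed parse A: a command, swat [flies] [like ants].
parse_a = ["s -> vp", "vp -> verb np pp", "verb -> swat", "np -> noun",
           "noun -> flies", "pp -> prep np", "prep -> like", "np -> noun",
           "noun -> ants"]

# Assumed parse B: a statement, [swat flies] [like ants].
parse_b = ["s -> np vp", "np -> noun np", "noun -> swat", "np -> noun",
           "noun -> flies", "vp -> verb np", "verb -> like", "np -> noun",
           "noun -> ants"]

print(prob_parse(parse_a), prob_parse(parse_b))
```

With these counts, parse A scores about 0.00004 and parse B about 0.000032, so this PCFG would rank the command reading slightly higher, based only on how frequent each rule is.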

However, this approach does not define a very accurate goodness function. Alone, a PCFG is generally poor at ranking parses correctly. A PCFG prefers common constructions in a language over less common ones.

Ancestor Dependency-Based Generative Approach (ADBGA). This approach assumes a top-down, generative grammar approach. It defines a formalism for computing the probability of a transition given an arbitrary set of linguistic features. Features might include headword, segtype, and grammatical number, though the formalism is independent of the actual features used. This approach does not attempt to define a particular set of features.

A transition is assumed to have the form:
(a_1, a_2, . . . a_g) → (b_1, b_2, . . . b_g)(c_1, c_2, . . . c_g)
where
a_1, a_2, . . . a_g are the features of a parent node
b_1, b_2, . . . b_g are the features of a left child
c_1, c_2, . . . c_g are the features of a right child

The probability of a transition is Prob(b_1, b_2, . . . b_g, c_1, c_2, . . . c_g | a_1, a_2, . . . a_g)

Using the chain rule, this approach then conditions each feature on the parent feature and all features earlier in the sequence:

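$$\begin{aligned}
\mathrm{Prob}(b_1, \ldots, b_g, c_1, \ldots, c_g \mid a_1, \ldots, a_g) = {} & \mathrm{Prob}(b_1 \mid a_1, \ldots, a_g) \\
& \times \mathrm{Prob}(b_2 \mid a_1, \ldots, a_g, b_1) \times \cdots \\
& \times \mathrm{Prob}(b_g \mid a_1, \ldots, a_g, b_1, \ldots, b_{g-1}) \\
& \times \mathrm{Prob}(c_1 \mid a_1, \ldots, a_g, b_1, \ldots, b_g) \\
& \times \mathrm{Prob}(c_2 \mid a_1, \ldots, a_g, b_1, \ldots, b_g, c_1) \times \cdots \\
& \times \mathrm{Prob}(c_g \mid a_1, \ldots, a_g, b_1, \ldots, b_g, c_1, \ldots, c_{g-1})
\end{aligned}$$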

BACKGROUND SUMMARY

It is desirable for an NLP parser to be able to computationally choose the most probable parse from the potentially large number of possible parses. For example, the sentence "Feeding deer prohibited" may be logically interpreted to mean either the act of feeding is prohibited or that a type of deer is prohibited.

A parser typically uses a goodness function to generate a "goodness measure" that ranks the parse trees. Conventional implementations use heuristic ("rule of thumb") rules and/or statistics based on the part of speech of the words in the sentence and immediate syntactic context.

The goodness function is a key component of an NLP parser. By improving the goodness function, the parser improves its accuracy. In particular, the goodness function enables the parser to choose the best parse for an utterance. Each parse may be viewed as a tree with branches that eventually branch to each word in a sentence.

Existing NLP parsers rank each parse tree using conventional goodness measures. To determine the parse with the highest probability of being correct (i.e., the highest goodness measure), each branch of each parse tree is given a probability. These probabilities are generated based upon a large database of correctly parsed sentences (i.e., "training corpus"). The goodness measure of each parse tree is then calculated by combining assigned probabilities of each branch in a parse tree. This conventional statistical goodness approach is typically done with little or no consideration for contextual words and phrases.

SUMMARY

A natural language parse ranker of a natural language processing (NLP) system employs a goodness function to rank the possible grammatically valid parses of an utterance. The goodness function generates a statistical goodness measure (SGM) for each valid parse. The parse ranker orders the parses based upon their SGM values. It presents the parse with the greatest SGM value as the one that most likely represents the intended meaning of the speaker. The goodness function of this parse ranker is highly accurate in representing the intended meaning of a speaker. It also has reasonable training data requirements.

With this parse ranker, the SGM of a particular parse is the combination of all of the probabilities of each node within the parse tree of such parse. The probability at a given node is the probability of taking a transition ("grammar rule") at that point. The probability at a node is conditioned on highly predictive linguistic phenomena. Such phenomena include headwords, "phrase level," "syntactic history," and "modifying headwords."

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic illustration of an exemplary natural language processing system.

FIG. 2 is an illustration of a typical parse tree representing a syntactically valid parse of the sample phrase, "flies like ants."

FIG. 3 is another illustration of a typical parse tree representing a syntactically valid parse of the sample phrase, "swat flies like ants." This parse tree is annotated to indicate transitions and headwords.

FIGS. 4a and 4b illustrate two exemplary parse trees of two of seven syntactically valid parses of the sample phrase, "time flies like an arrow."

FIGS. 5a and 5b show fragments of two pairs of typical parse trees. The parse tree of FIG. 5a does not use headword annotation, but the parse tree of FIG. 5b does.

FIGS. 6a and 6b show fragments of two pairs of typical parse trees. The parse tree of FIG. 6a shows a parse done in accordance with an exemplary grammar. The parse tree of FIG. 6b shows a parse tree that includes a null transition.

FIG. 7 shows fragments of a pair of typical parse trees and illustrates the use of syntactic history.

FIG. 8 shows a typical parse tree of a sample sentence, "Graceland, I like to visit." This figure illustrates the "topicalization" syntactic phenomenon.

FIG. 9 shows a fragment of a genericized parse tree. This figure illustrates what is known and not known at a node.

FIG. 10 is a flowchart illustrating the methodology of an implementation of the training phase of the exemplary parser.

FIG. 11 is a flowchart illustrating the methodology of an implementation of the run-time phase of the exemplary parser.

FIG. 12 is an example of a computing operating environment capable of implementing the exemplary ranking parser for NLP.

DETAILED DESCRIPTION

The following description sets forth a specific embodiment of the ranking parser for natural language processing (NLP) that incorporates elements recited in the appended claims. This embodiment is described with specificity in order to meet statutory written description, enablement, and best-mode requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventor has contemplated that the claimed ranking parser might also be embodied in other ways, in conjunction with other present or future technologies.

The exemplary ranking parser described herein may be implemented by a program submodule of a natural language processing (NLP) program module. It may also be implemented by a device within an NLP device. For example, a parse ranker 34 in FIG. 1 may be a program module implementing the exemplary parser within an NLP program system 20. Alternatively, the parse ranker 34 in FIG. 1 may be a device implementing the exemplary parser within an NLP system 20. Alternatively still, instructions to implement the exemplary parser may be on a computer readable medium.

Introduction

The exemplary parser of an NLP system employs a goodness function to rank the possible grammatically correct parses of an utterance. The goodness function of the exemplary parser is highly accurate in representing the intended meaning of a speaker. It also has reasonable training data requirements.

With this exemplary parser, the goodness measure of a particular parse is the combined probability of taking each transition ("transition probability") within the parse tree of that parse. Each transition probability within the tree is conditioned on highly predictive linguistic phenomena. Such phenomena include headwords, "phrase levels," "syntactic bigrams," and "syntactic history."

Herein, the term "linguistic features" is used to generically describe transitions, headwords, phrase levels, and syntactic history.

Statistical Goodness Measure

The statistical goodness measure (SGM) of the exemplary parser uses a generative grammar approach. In a generative grammar approach, each sentence has a top-down derivation consisting of a sequence of rule applications (i.e., transitions). The probability of the parse tree is the product of the probabilities of all the nodes. The probability for a given node is the probability that from the node one would take a specific transition, given the linguistic features.

The SGM of the exemplary parser may be calculated using either of the following equivalent formulas:

Formula A:

$$\mathrm{Prob}(\mathrm{parse}) = \prod_{X} \mathrm{Prob}\big(\mathrm{trn}(n_X),\ \mathrm{hw}(n_Y),\ \mathrm{pl}(n_Y),\ \mathrm{sh}(n_Y),\ \mathrm{hw}(n_Z),\ \mathrm{pl}(n_Z),\ \mathrm{sh}(n_Z) \mid \mathrm{hw}(n_X),\ \mathrm{pl}(n_X),\ \mathrm{sh}(n_X),\ \mathrm{segtype}(n_X)\big)$$

Or, Formula B:

$$\mathrm{Prob}(\mathrm{parse}) = \prod_{X} \mathrm{Prob}\big(\mathrm{trn}(n_X) \mid \mathrm{hw}(n_X),\ \mathrm{pl}(n_X),\ \mathrm{sh}(n_X),\ \mathrm{segtype}(n_X)\big)\ \mathrm{Prob}\big(\mathrm{modhw}(n_X) \mid \mathrm{trn}(n_X),\ \mathrm{hw}(n_X)\big)$$

where

$n_X$: is the $X^{th}$ node in a parse tree
$n_Y$ and $n_Z$: are the $Y^{th}$ and $Z^{th}$ nodes and children of the $X^{th}$ node
$\mathrm{trn}(n_X)$: is the name of the transition out of $n_X$ of the form $X \rightarrow Y\ Z$
$\mathrm{hw}(n_X)$: is the headword of $n_X$
$\mathrm{pl}(n_X)$: is the phrase level of $n_X$
$\mathrm{sh}(n_X)$: is the syntactic history of $n_X$
$\mathrm{segtype}(n_X)$: is the segtype of $n_X$
$\mathrm{modhw}(n_X)$: is the modifying headword of $n_X$
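As an illustration only, the product in Formula B might be evaluated as in the following sketch. The `Node` class, the two lookup tables, and the `floor` value used for unseen events are hypothetical names and choices introduced here, not part of the described parser; log-probabilities are summed rather than multiplying raw probabilities, which is a common way to avoid numerical underflow.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical parse-tree node carrying the features named in Formula B."""
    trn: Optional[str]            # transition name out of this node; None for a leaf
    hw: str                       # headword
    pl: int                       # phrase level
    sh: str                       # syntactic history
    segtype: str                  # segment type, e.g. "VP"
    modhw: Optional[str] = None   # modifying headword, if any
    children: List["Node"] = field(default_factory=list)


# Assumed table layouts:
#   trn_table[(trn, hw, pl, sh, segtype)] = Prob(trn | hw, pl, sh, segtype)
#   mod_table[(modhw, trn, hw)]           = Prob(modhw | trn, hw)
TrnTable = Dict[Tuple[str, str, int, str, str], float]
ModTable = Dict[Tuple[str, str, str], float]


def log_prob_parse(root: Node, trn_table: TrnTable, mod_table: ModTable,
                   floor: float = 1e-9) -> float:
    """Return log Prob(parse): the sum over nodes of the log of both factors.

    `floor` is a placeholder probability for unseen events (an assumption;
    a real system would use a proper smoothing technique).
    """
    total = 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        if node.trn is not None:  # leaves take no transition
            p_trn = trn_table.get(
                (node.trn, node.hw, node.pl, node.sh, node.segtype), floor)
            if node.modhw is not None:
                p_mod = mod_table.get((node.modhw, node.trn, node.hw), floor)
            else:
                p_mod = 1.0
            total += math.log(p_trn) + math.log(p_mod)
        stack.extend(node.children)
    return total
```

Competing parses of the same utterance could then be ranked by comparing their returned log-probabilities, with the highest value ranked first.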

Those of ordinary skill in the art understand how to generalize these formulas to one child or to three or more children. In addition, there are well-known techniques for converting a ternary (or higher) rule into a set of binary rules.
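One such well-known conversion is right-branching binarization, sketched below. The function name and the scheme for naming the intermediate symbols are assumptions made for illustration; the description does not specify which conversion technique is used.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]


def binarize(lhs: str, rhs: List[str]) -> List[Rule]:
    """Split a rule lhs -> rhs[0] rhs[1] ... into binary rules.

    Each intermediate symbol is named after the original left-hand side and
    the children it still has to produce (a hypothetical naming scheme).
    Rules with one or two children are returned unchanged.
    """
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    rules: List[Rule] = []
    current = lhs
    remaining = list(rhs)
    while len(remaining) > 2:
        first = remaining.pop(0)
        new_symbol = f"{lhs}|{'_'.join(remaining)}"
        rules.append((current, (first, new_symbol)))
        current = new_symbol
    rules.append((current, tuple(remaining)))
    return rules
```

For example, `binarize("VP", ["V", "NP", "PP"])` yields the two binary rules `VP -> V VP|NP_PP` and `VP|NP_PP -> NP PP`.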

The exemplary parser defines phrase levels and labels them. Previous conventional approaches clustered transitions by segtype. For example, transitions focused on noun phrases, transitions focused on verb phrases, etc. However, within each such grouping, the rules can be further subdivided into multiple levels. These levels are called "phrase levels" herein. These phrase levels are highly predictive of whether a transition will occur.

A null transition is utilized for each phrase level to account for no modification from one level to the next. The null transition enables a node to move to the next level without being altered. The null transition is assigned probabilities just like other transitions.

The exemplary parser defines each node's syntactic history. Previous conventional approaches conditioned on linguistic phenomena associated with a node, its parent, and/or its children. However, such approaches are overly limiting. Using the exemplary parser, phenomena that are predictive but appear elsewhere in the tree (other than simply a node's immediate descendants or ancestors) are included in the probability calculation.

The probabilities of the exemplary parser are conditioned on transition name, headword, phrase level, and syntactic history.

Since the probabilities are conditioned on the transition name in the exemplary parser instead of just the structure of the rule (e.g., VP → NP VP), the parser may give the same structure different probabilities. In other words, there may be two transitions with the same structure that have different probabilities because their transition names are different.

The probabilities of the SGM of the exemplary parser are computed top down. This allows for an efficient and elegant method for computing the goodness function. In the exemplary parser, the headwords can also be replaced by their lemmas.

A training corpus of approximately 30,000 sentences is used to initially calculate the conditioned probabilities of factors such as transition name, headword, syntactic bigrams, phrase level, and syntactic history. The sentences in this training corpus have been annotated with ideal parse trees and the annotations contain all the linguistic phenomena on which the parser conditions. Of course, more or fewer sentences could be used. The more sentences used, the higher the accuracy of the probabilities. In addition, there are many known techniques for dealing with insufficient training data.

The probability computation method has two phases: training and run-time. During the training phase, the system examines the training corpus and pre-computes the probabilities (which may be represented as "counts") required at run-time. At run-time, the goodness function is quickly computed using these pre-computed probabilities (or counts).
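The two phases might be organized as in the following sketch, which stores raw counts during training and converts them to relative-frequency estimates on demand at run-time. The class name, method names, and the representation of the conditioning context as an arbitrary hashable tuple are assumptions for illustration; no smoothing for sparse data is attempted here.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


class TransitionModel:
    """Hypothetical count store for conditional transition probabilities."""

    def __init__(self) -> None:
        self.joint = Counter()    # counts of (transition name, context) pairs
        self.context = Counter()  # counts of each context alone

    def train(self, events: Iterable[Tuple[str, Hashable]]) -> None:
        """Training phase: tally one (transition name, context) event per
        node of each annotated parse tree in the corpus."""
        for trn, ctx in events:
            self.joint[(trn, ctx)] += 1
            self.context[ctx] += 1

    def prob(self, trn: str, ctx: Hashable) -> float:
        """Run-time phase: relative-frequency estimate of Prob(trn | ctx).

        Returns 0.0 for a context never seen in training.
        """
        denom = self.context[ctx]
        return self.joint[(trn, ctx)] / denom if denom else 0.0
```

Here a context could be a tuple such as (headword, phrase level, syntactic history, segtype), matching the conditioning variables of the first factor in Formula B.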

Conditioning on Headwords

Consider parse trees 90 and 92 shown in FIG. 5a. Assume the two parse trees are identical except for the transition that created the top-most VP (verb phrase).

In Tree 90 of FIG. 5a, the verb phrase was created using the rule: VPwNPrl: VP → VP NP

VPwNPrl is used to add an object to a verb. For example, "John hit the ball" or "They elected the pope."

In Tree 92 of FIG. 5a, the verb phrase was created using the rule: VPwAVPr: VP → VP AVP

VPwAVPr is used when an adverbial phrase modifies a verb. For example, "He jumped high" or "I ran slowly."


