Speech processing system: U.S. Patent No. 7,010,483, United States Patent and Trademark Office (PTO)

Title: Speech processing system

Abstract: A speech processing system is provided which is operable to receive sets of signal values representative of a speech signal generated by a speech source. The system is operable to determine a measure of the quality of the speech signal by performing a statistical analysis of the received sets of signal values. The system stores data defining a predetermined function derived from a signal model which models the speech source and which defines a probability density function which gives, for a given set of model parameters, the probability that the signal model has those model parameters given that the signal model is assumed to have generated the received set of signal values. The system applies a current set of received signal values to the stored probability density function and then draws samples from it using a Gibbs sampler. The system then analyses the samples to determine a measure of the variance of some of the samples and then outputs a signal indicative of the quality of the received speech signal values in dependence upon the determined variance.
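The sampling loop described in the abstract can be illustrated with a deliberately simplified model. The sketch below is not the patented system: it assumes a clean autoregressive (AR) source with Gaussian excitation, no transmission channel and no additive noise, a flat prior on the AR coefficients and a Jeffreys prior on the excitation variance σ². Under those assumptions both full conditionals are standard (Gaussian for the coefficients, inverse-gamma for σ²), so a two-block Gibbs sampler can alternate between them. The function names and defaults (`gibbs_ar`, `n_iter`, `burn_in`) are hypothetical.

```python
import math
import random
import statistics


def cholesky(m):
    """Lower-triangular L with L L^T = m, for a small symmetric positive-definite matrix."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def backward_t(L, z):
    """Solve L^T x = z for lower-triangular L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gibbs_ar(y, order, n_iter=600, burn_in=100, seed=0):
    """Gibbs sampler for y[n] = sum_i a_i y[n-i] + e[n], with e[n] ~ N(0, sigma2).

    Assumes a flat prior on a and a Jeffreys prior p(sigma2) ~ 1/sigma2.
    Returns (posterior-mean AR coefficients, list of post-burn-in sigma2 samples).
    """
    rng = random.Random(seed)
    # Regression form: each row holds the `order` previous samples.
    rows = [[y[n - i] for i in range(1, order + 1)] for n in range(order, len(y))]
    t = y[order:]
    N = len(t)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(order)] for i in range(order)]
    xty = [sum(r[i] * v for r, v in zip(rows, t)) for i in range(order)]
    yty = sum(v * v for v in t)
    L = cholesky(xtx)
    mu = backward_t(L, forward(L, xty))  # least-squares estimate (X^T X)^-1 X^T y
    sigma2 = statistics.pvariance(t)     # crude initial value
    a_sum, s2_samples = [0.0] * order, []
    for it in range(n_iter):
        # a | sigma2, y ~ N(mu, sigma2 (X^T X)^-1); L^-T z has covariance (X^T X)^-1.
        w = backward_t(L, [rng.gauss(0.0, 1.0) for _ in range(order)])
        a = [m + math.sqrt(sigma2) * wi for m, wi in zip(mu, w)]
        # Residual sum of squares via the quadratic form, O(order^2) per draw.
        xtxa = [sum(xtx[i][j] * a[j] for j in range(order)) for i in range(order)]
        rss = yty - 2 * sum(ai * b for ai, b in zip(a, xty)) + sum(ai * b for ai, b in zip(a, xtxa))
        # sigma2 | a, y ~ Inv-Gamma(N/2, rss/2); gammavariate takes (shape, scale).
        sigma2 = 1.0 / rng.gammavariate(N / 2.0, 2.0 / rss)
        if it >= burn_in:
            a_sum = [s + ai for s, ai in zip(a_sum, a)]
            s2_samples.append(sigma2)
    kept = n_iter - burn_in
    return [s / kept for s in a_sum], s2_samples
```

Calling `gibbs_ar(y, order=2)` on a simulated AR(2) signal returns coefficient estimates and a list of σ² draws; the spread of those draws (for example `statistics.variance(s2)`) is the kind of sample-variance statistic the abstract refers to, although how the patent maps it onto a quality score is not reproduced here.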

Patent Number: 7,010,483 Issued on 03/07/2006 to Rajan


Inventors: Rajan; Jebu Jacob (Bracknell, GB)
Assignee: Canon Kabushiki Kaisha (Tokyo, JP)
Appl. No.: 866854
Filed: May 30, 2001

Foreign Application Priority Data

Jun 02, 2000 [GB] 0013541
Aug 17, 2000 [GB] 0020314

Current U.S. Class: 704/228; 704/233; 704/240
Current Intern'l Class: G10L 15/20    (20060101); G10L 21/02    (20060101)
Field of Search: 704/226,227,228,233,234,240 714/746


References Cited

U.S. Patent Documents
4386237   May 1983    Virupaksha et al.
4811399   Mar. 1989   Landell et al.
4860360   Aug. 1989   Boggs
4905286   Feb. 1990   Sedgwick et al.
5012518   Apr. 1991   Liu et al.
5315538   May 1994    Borrell et al.
5325397   Jun. 1994   Scholz et al.
5432859   Jul. 1995   Yang et al.
5432884   Jul. 1995   Kapanen et al.
5507037   Apr. 1996   Bartkowiak et al.
5611019   Mar. 1997   Nakatoh et al.
5742694   Apr. 1998   Eatwell
5784297   Jul. 1998   O'Brien, Jr. et al.
5799276   Aug. 1998   Komissarchik et al.
5873076   Feb. 1999   Barr et al.
5884255   Mar. 1999   Cox
5884269   Mar. 1999   Cellier et al.
5963901   Oct. 1999   Vähätalo
6018317   Jan. 2000   Dogan et al.
6044336   Mar. 2000   Marmarelis et al.
6134518   Oct. 2000   Cohen et al.
6157909   Dec. 2000   Mauuary et al.
6215831   Apr. 2001   Nowack et al.
6226613   May 2001    Turin
6266633   Jul. 2001   Higgins et al.
6324502   Nov. 2001   Handel et al.
6374221   Apr. 2002   Haimi-Cohen
6377919   Apr. 2002   Burnett et al.
6397181   May 2002    Li et al.
6438513   Aug. 2002   Pastor et al.
6516090   Feb. 2003   Lennon et al.
6546515   Apr. 2003   Vary et al.
6549854   Apr. 2003   Malinverno et al.
6708146   Mar. 2004   Sewall et al.
6760699   Jul. 2004   Weerackody et al.
6879952   Apr. 2005   Acero et al.
Foreign Patent Documents
0 554 083    Aug. 1993   EP
0 631 402    Dec. 1994   EP
0 674 306    Sep. 1995   EP
0 952 589    Oct. 1999   EP
0 996 112    Apr. 2000   EP
1 022 583    Jul. 2000   EP
1 160 768    Dec. 2001   EP
1 034 441    Apr. 2003   EP
2 137 052    Sep. 1984   GB
2 332 054    Jun. 1999   GB
2 332 055    Jun. 1999   GB
2 345 967    Jul. 2000   GB
2 349 717    Nov. 2000   GB
2 356 106    May 2001    GB
2 356 107    May 2001    GB
2 356 313    May 2001    GB
2 356 314    May 2001    GB
2 360 670    Sep. 2001   GB
2 361 339    Oct. 2001   GB
2 363 557    Dec. 2001   GB
2001-44926   Feb. 2001   JP
WO 92/22891  Dec. 1992   WO
WO 98/38631  Sep. 1998   WO
WO 99/28760  Jun. 1999   WO
WO 99/28761  Jun. 1999   WO
WO 99/64887  Dec. 1999   WO
WO 00/11650  Mar. 2000   WO
WO 00/38179  Jun. 2000   WO
WO 00/45375  Aug. 2000   WO
WO 00/54168  Sep. 2000   WO


Other References

Quatieri et al., "Magnitude-only estimation of handset nonlinearity with application to speaker recognition," Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, May 12-15, 1998, vol. 2, pp. 745-748.
W. Press et al., Numerical Recipes in C, Chapter 7, Cambridge University Press, 1992.
Peter Green, "Reversible jump Markov chain Monte Carlo computation and Bayesian model determination," Biometrika, vol. 82, pp. 711-732, 1995.
"The Simulation Smoother for Time Series Models," Biometrika, vol. 82, no. 2, pp. 339-350, 1995.
R. Neal, "Probabilistic inference using Markov chain Monte Carlo methods," Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto, 1993.
Rabiner et al., "Fundamentals of Speech Recognition," Prentice Hall, Englewood Cliffs, New Jersey, 1993, pp. 115-116.
Godsill et al., "Bayesian Separation and Recovery of Convolutively Mixed Autoregressive Sources," ICASSP, Mar. 1999.
Balan et al., "Statistical Properties of STFT Ratios for Two Channel Systems and Application to Blind Source Separation," Siemens Corporate Research, Princeton, NJ, pp. 429-434.
Rajan et al., "Bayesian Approach to Parameter Estimation and Interpolation of Time-Varying Autoregressive Processes Using the Gibbs Sampler," IEE Proc.-Vis. Image Signal Process., vol. 144, no. 4, Aug. 1997, pp. 249-256.
Welch et al., "An Introduction to the Kalman Filter," Dept. of Computer Science, University of North Carolina at Chapel Hill, NC, Sep. 1997.
Srinivasan et al., "Query Expansion for Imperfect Speech: Applications in Distributed Learning," Proc. IEEE Workshop on Content-based Access of Image and Video Libraries, 2000, pp. 50-54.
Couvreur et al., "Wavelet-based Non-Parametric HMM's: Theory and Applications," Proc. International Conference on Acoustics, Speech and Signal Processing, Istanbul, vol. 1, Jun. 5-9, 2000, pp. 604-607.
Hopgood et al., "Bayesian Single Channel Blind Deconvolution Using Parametric Signal and Channel Models," Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, Oct. 17-20, 1999, pp. 151-154.
Andrieu et al., "Bayesian Blind Marginal Separation of Convolutively Mixed Discrete Sources," IEEE Proc., 1998, pp. 43-52.

Primary Examiner: Lerner; Martin
Attorney, Agent or Firm: Fitzpatrick Cella Harper & Scinto

Claims



The invention claimed is:

1. An apparatus for determining a quality measure indicative of the quality of a speech signal, the apparatus comprising:

a receiver operable to receive a set of speech signal values representative of a speech signal generated by a speech source as distorted by a transmission channel between the speech source and the receiver;

a memory operable to store a predetermined function which includes a first part having first parameters which models said source and a second part having second parameters which models said channel and which gives, for a given set of speech signal values, a probability density for parameters of a predetermined speech model which is assumed to have generated the set of speech signal values, the probability density defining, for a given set of model parameter values, the probability that the predetermined speech model has those parameter values, given that the model is assumed to have generated the set of speech signal values;

an applicator operable to apply the set of received speech signal values to said stored function to give the probability density for said model parameters for the set of received speech signal values;

a processor operable to process said function with said set of received speech signal values applied, to derive samples of at least said first parameters from said probability density;

an analyser operable to analyse at least some of said derived samples of said at least first parameters to determine a quality measure indicative of the quality of the received speech signal values; and

an output operable to output values of said first parameters that are representative of said speech signal generated by said speech source before it was distorted by said transmission channel.

2. An apparatus according to claim 1, wherein said analyser is operable to determine a measure of the variance of said at least some of said derived samples of said at least first parameters to determine said quality measure.

3. An apparatus according to claim 2, wherein said probability density function is in terms of said variance measure and wherein said processor is operable to draw samples of said variance measure from said probability density function.

4. An apparatus according to claim 3, wherein said processor comprises a Gibbs sampler.

5. An apparatus according to claim 3, wherein said analyser is operable to determine a histogram of said drawn samples and wherein said quality measure is determined using said histogram.

6. An apparatus according to claim 5, wherein said analyser is operable to determine said quality measure using a weighted sum of said drawn samples, and wherein the weighting for each sample is determined from said histogram.

7. An apparatus according to claim 1, wherein said processor is operable to draw samples iteratively from said probability density function.

8. An apparatus according to claim 1, wherein said receiver is operable to receive a sequence of sets of speech signal values representative of an input speech signal and wherein said applicator, processor and analyser are operable to perform their respective functions with respect to each set of received speech signal values to determine a quality measure for each set of received signal values.

9. An apparatus according to claim 8, wherein said processor is operable to use the values of parameters obtained during the processing of a preceding set of signal values as initial estimates for the values of the corresponding parameters for a current set of signal values being processed.

10. An apparatus according to claim 8, wherein said sets of signal values in said sequence are non-overlapping.

11. An apparatus according to claim 1, wherein said speech model comprises an auto-regressive process model and wherein said parameters include auto-regressive model coefficients.

12. An apparatus according to claim 1, wherein said speech signal model includes a noise model having a noise parameter and wherein said quality measure is determined using said noise parameter.

13. An apparatus according to claim 1, wherein said processor is operable to determine a histogram of said derived samples and wherein said values of said first parameters are determined from said histogram.

14. An apparatus according to claim 13, wherein said processor is operable to determine said values of said first parameters using a weighted sum of said derived samples, and wherein the weighting for each sample is determined from said histogram.

15. An apparatus according to claim 1, wherein said processor is operable to derive samples of said second parameters and wherein said analyser is operable to determine said quality measure using the derived samples of said second parameters.

16. An apparatus according to claim 1, wherein said function is in terms of a set of raw speech signal values representative of speech generated by said source before being distorted by said transmission channel, wherein the apparatus further comprises a second processor operable to process the received set of signal values with initial estimates of said first and second parameters, to generate an estimate of the raw speech signal values corresponding to the received set of signal values and wherein said applicator is operable to apply said estimated set of raw speech signal values to said function in addition to said set of received signal values.

17. An apparatus according to claim 16, wherein said second processor comprises a simulation smoother.

18. An apparatus according to claim 16, wherein said second processor comprises a Kalman filter.

19. An apparatus according to claim 1, wherein said second part is a moving average model and said second parameters comprise moving average model coefficients.

20. An apparatus according to claim 1, further comprising a comparator responsive to said quality measure and operable to compare signals representative of the received speech signal with prestored models, to generate a comparison result.

21. An apparatus according to claim 20, wherein said signals representative of the speech signal are derived from said stored function.

22. An apparatus according to claim 1, further comprising an encoder operable to encode signals representative of the speech signal in dependence upon the output quality measure.

23. An apparatus for generating annotation data for use in annotating a data file, the apparatus comprising:

a receiver operable to receive a speech annotation;

an apparatus according to claim 1 for generating a quality measure indicative of the quality of the received speech annotation; and

a generator operable to generate annotation data using data representative of the received speech annotation and said quality measure.

24. An apparatus according to claim 23, further comprising a speech recogniser operable to process the speech annotation to identify words and/or phonemes within the speech annotation, wherein said annotation data comprises data identifying said words and/or phonemes.

25. An apparatus according to claim 1.

26. An apparatus according to claim 25, wherein said annotation data defines a phoneme and word lattice.

27. An apparatus for searching a database comprising a plurality of information entries to identify information to be retrieved therefrom, each of said plurality of information entries having an associated annotation and a quality measure indicative of the quality of the annotation, the apparatus comprising:

a receiver operable to receive an input speech query;

an apparatus according to claim 1 for processing said input speech query to generate a quality measure therefor; and

a comparator operable to compare data representative of the input speech query with said annotations in dependence upon the quality measure of said input speech query and the corresponding quality measures of said annotations.

28. An apparatus for searching a database comprising a plurality of annotations which include annotation data and a quality measure indicative of the quality of an annotation used to generate the annotation data, the apparatus comprising:

means for receiving an input audio query;

means for determining a quality measure for the input audio query; and

means for comparing data representative of said input query with the annotation data of one or more of said annotations in dependence upon the quality measure for said input query and the corresponding quality measure for the annotation.

29. An apparatus according to claim 28, wherein said data representative of said input query and said annotation data comprise word and/or phoneme data.

30. An apparatus according to claim 28, wherein said comparing means is operable to compare said query data with said annotation data using a first comparison technique if both said quality measures exceed a predetermined threshold and is operable to compare said query data with said annotation data using a second comparison technique if either or both of said quality measures are below said predetermined threshold.

31. A method of determining a quality measure indicative of the quality of a speech signal, the method comprising the steps of:

receiving, at a receiver, a set of speech signal values representative of a speech signal generated by a speech source as distorted by a transmission channel between the speech source and the receiver;

storing a predetermined function which includes a first part having first parameters which models said source and a second part having second parameters which models said channel and which gives, for a given set of speech signal values, a probability density for parameters of a predetermined speech model which is assumed to have generated the set of speech signal values, the probability density defining, for a given set of model parameter values, the probability that the predetermined speech model has those parameter values, given that the model is assumed to have generated the set of speech signal values;

applying the set of received speech signal values to said stored function to give the probability density for said model parameters for the set of received speech signal values;

processing said function with said set of received speech signal values applied, to derive samples of at least said first parameters from said probability density;

analysing at least some of said derived samples of said at least first parameters to determine a quality measure indicative of the quality of the received speech signal values; and

outputting values of said first parameters that are representative of said speech signal generated by said speech source before it was distorted by said transmission channel.

32. A method according to claim 31, wherein said analysing step determines a measure of the variance of said at least some of said derived samples of said at least first parameters in determining said quality measure.

33. A method according to claim 32, wherein said probability density function is in terms of said variance measure and wherein said processing step draws samples of said variance measure from said probability density function.

34. A method according to claim 33, wherein said processing step uses a Gibbs sampler.

35. A method according to claim 33, wherein said analysing step determines a histogram of said drawn samples and wherein said quality measure is determined using said histogram.

36. A method according to claim 35, wherein said analysing step determines said quality measure using a weighted sum of said drawn samples, and wherein the weighting for each sample is determined from said histogram.

37. A method according to claim 31, wherein said processing step draws samples iteratively from said probability density function.

38. A method according to claim 31, wherein said receiving step receives a sequence of sets of speech signal values representative of an input speech signal and wherein said applying step, processing step, and analysing step are performed with respect to each set of received speech signal values to determine a quality measure for each set of received signal values.

39. A method according to claim 38, wherein said processing step uses the values of parameters obtained during the processing of a preceding set of signal values as initial estimates for the values of the corresponding parameters for a current set of signal values being processed.

40. A method according to claim 38, wherein said sets of signal values in said sequence are non-overlapping.

41. A method according to claim 31, wherein said speech model comprises an auto-regressive process model and wherein said parameters include auto-regressive model coefficients.

42. A method according to claim 31, wherein said speech signal model includes a noise model having a noise parameter and wherein said quality measure is determined using said noise parameter.

43. A method according to claim 31, wherein said processing step determines a histogram of said derived samples and wherein said values of said first parameters are determined from said histogram.

44. A method according to claim 43, wherein said processing step determines said values of said first parameters using a weighted sum of said derived samples, and wherein the weighting for each sample is determined from said histogram.

45. A method according to claim 31, wherein said processing step derives samples of said second parameters and wherein said analysing step determines said quality measure using the derived samples of said second parameters.

46. A method according to claim 31, wherein said function is in terms of a set of raw speech signal values representative of speech generated by said source before being distorted by said transmission channel, wherein the method further comprises a second processing step of processing the received set of signal values with initial estimates of said first and second parameters, to generate an estimate of the raw speech signal values corresponding to the received set of signal values and wherein said applying step applies said estimated set of raw speech signal values to said function in addition to said set of received signal values.

47. A method according to claim 46, wherein said second processing step uses a simulation smoother.

48. A method according to claim 46, wherein said second processing step uses a Kalman filter.

49. A method according to claim 31, wherein said second part is a moving average model and said second parameters comprise moving average model coefficients.

50. A method according to claim 31, further comprising a step of comparing signals representative of the received speech signal with prestored models to generate a comparison result and wherein said comparing step is responsive to said quality measure.

51. A method according to claim 50, wherein said signals representative of the speech signal are derived from said stored function.

52. A method according to claim 31, further comprising a step of encoding signals representative of the speech signal in dependence upon the output quality measure.

53. A method of generating annotation data for use in annotating a data file, the method comprising the steps of:

receiving a speech annotation;

performing the method according to claim 31 to generate a quality measure indicative of the quality of the received speech annotation; and

generating annotation data using data representative of the received speech annotation and said quality measure.

54. A method according to claim 53, further comprising a step of using a speech recognition unit to process the speech annotation to identify words and/or phonemes within the speech annotation, wherein said annotation data comprises said words and/or phonemes.

55. A method according to claim 31.

56. A method according to claim 55, wherein said annotation data defines a phoneme and word lattice.

57. A method of searching a database comprising a plurality of information entries to identify information to be retrieved therefrom, each of said plurality of information entries having an associated annotation and a quality measure indicative of the quality of the annotation, the method comprising the steps of:

receiving an input speech query;

using the method according to claim 31 to process said input speech query to generate a quality measure therefor; and

comparing data representative of the input speech query with said annotations in dependence upon the quality measure of said input speech query and the corresponding quality measures of said annotations.

58. A computer readable medium storing computer executable process steps to cause a programmable computer apparatus to perform the method according to claim 31.

59. Processor implementable process steps for causing a programmable computing device to perform the method according to claim 31.

60. A method of searching a database comprising a plurality of annotations which include annotation data and a quality measure indicative of the quality of an annotation used to generate the annotation data, the method comprising the steps of:

receiving an input audio query;

determining a quality measure for the input audio query; and

comparing data representative of said input query with the annotation data of one or more of said annotations in dependence upon the quality measure for said input query and the corresponding quality measure for the annotation.

61. A method according to claim 60, wherein said data representative of said input query and said annotation data comprise word and/or phoneme data.

62. A method according to claim 60, wherein said comparing step compares said query data with said annotation data using a first comparison technique if both said quality measures exceed a predetermined threshold and compares said query data with said annotation data using a second comparison technique if either or both of said quality measures are below said predetermined threshold.

63. An apparatus for determining a quality measure indicative of the quality of a speech signal, the apparatus comprising:

means for receiving a set of speech signal values representative of a speech signal generated by a speech source as distorted by a transmission channel between the speech source and the receiving means;

a memory for storing a predetermined function which includes a first part having first parameters which models said source and a second part having second parameters which models said channel and which gives, for a given set of speech signal values, a probability density for parameters of a predetermined speech model which is assumed to have generated the set of speech signal values, the probability density defining, for a given set of model parameter values, the probability that the predetermined speech model has those parameter values, given that the model is assumed to have generated the set of speech signal values;

means for applying the set of received speech signal values to said stored function to give the probability density for said model parameters for the set of received speech signal values;

means for processing said function with said set of received speech signal values applied, to derive samples of at least said first parameters from said probability density;

means for analysing at least some of said derived samples of said at least first parameters to determine a quality measure indicative of the quality of the received speech signal values; and

means for outputting values of said first parameters that are representative of said speech signal generated by said speech source before it was distorted by said transmission channel.

64. An apparatus for generating annotation data for use in annotating a data file, the apparatus comprising:

means for receiving a speech annotation;

an apparatus according to claim 63 for generating a quality measure indicative of the quality of the received speech annotation; and

means for generating annotation data using data representative of the received speech annotation and said quality measure.

65. An apparatus for searching a database comprising a plurality of information entries to identify information to be retrieved therefrom, each of said plurality of information entries having an associated annotation and a quality measure indicative of the quality of the annotation, the apparatus comprising:

means for receiving an input speech query;

an apparatus according to claim 63 for processing said input speech query to generate a quality measure therefor; and

means for comparing data representative of the input speech query with said annotations in dependence upon the quality measure of said input speech query and the corresponding quality measures of said annotations.
Description



The present invention relates to an apparatus for and method of determining a quality measure indicative of the quality of an audio signal. The invention particularly relates to a statistical processing of an input speech signal to derive this quality measure.

Being able to provide a measure of the quality of an input speech signal is beneficial in a number of systems. For example, it can be used to control the way in which data files may be retrieved from a database or the way in which the speech signal may be encoded for onward transmission. The speech quality measure may also be used to control the recognition processing operation in, for example, a speech recognition system.

The prior art techniques for determining a quality measure of a speech signal rely on comparing the speech signal with a "clean" reference signal. These techniques are also performed off-line and are not suited to real-time speech quality determination.

One aim of the present invention is to provide an alternative technique for determining a measure of the quality of an input speech signal. In one embodiment, the determined quality measure is indicative of the signal to noise ratio for the input speech signal.

According to one aspect, the present invention provides an apparatus for determining a quality measure indicative of the quality of an audio signal, the apparatus comprising: a memory for storing a predetermined function which gives a probability density for parameters of a predetermined audio model which is assumed to have generated a set of received audio signal values; means for receiving a set of audio signal values representative of an input audio signal; means for applying a set of received audio signal values to the stored function to give the probability density for the model parameters; means for processing the function with said set of received audio signal values applied to derive samples of parameter values from said probability density; and means for analysing at least some of said derived samples of parameter values to determine a signal indicative of the quality of the received audio signal values.
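The receive, apply, sample and analyse chain described above can be illustrated with a deliberately simplified model. The sketch below assumes a first-order AR model x[n] = a·x[n-1] + e[n] with Gaussian excitation of known variance and a flat prior on a; under those assumptions the probability density for a is Gaussian and can be sampled directly. The patent's own model (an AR speech part plus an MA channel part) is considerably richer, and all function names here are hypothetical.

```python
import random
import statistics


def ar1_posterior(signal, noise_var):
    """Posterior mean and variance of a in x[n] = a*x[n-1] + e[n],
    with e[n] ~ N(0, noise_var), a flat prior on a, and x[0] treated as given."""
    num = sum(signal[n] * signal[n - 1] for n in range(1, len(signal)))
    den = sum(signal[n - 1] ** 2 for n in range(1, len(signal)))
    return num / den, noise_var / den


def sample_coefficient(signal, noise_var, num_samples, rng):
    """Apply the received values to the density, then draw samples of a from it."""
    mean, var = ar1_posterior(signal, noise_var)
    return [rng.gauss(mean, var ** 0.5) for _ in range(num_samples)]


def quality_measure(samples):
    """Analyse the samples: their variance is small when the data pin a down tightly."""
    return statistics.pvariance(samples)
```

In this toy setting the spread of the drawn samples plays the role of the quality measure; with real speech the samples would come from the full AR/MA model rather than a closed-form Gaussian.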

In one embodiment the audio model comprises an auto-regressive (AR) part which models speech and a moving average (MA) part which models the channel between the speech source and the receiver; and wherein the speech quality measure is derived from parameters of at least one of those parts. For example, the speech quality measure may be derived from the AR parameter values or from the MA parameter values. Alternatively, it may be determined from the variance of some of these parameter values.
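A minimal sketch of the source/channel split, assuming the AR part is an all-pole filter driven by an excitation sequence and the MA part is a finite impulse response applied to the AR output. Any additive noise term is omitted, and the function names are hypothetical.

```python
def ar_source(ar_coeffs, excitation):
    """AR (all-pole) speech model: s[n] = sum_k a_k * s[n-k] + e[n]."""
    s = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(ar_coeffs, start=1):
            if n - k >= 0:
                acc += a * s[n - k]
        s.append(acc)
    return s


def ma_channel(ma_coeffs, source):
    """MA (FIR) channel model: y[n] = sum_j h_j * s[n-j]."""
    return [
        sum(h * source[n - j] for j, h in enumerate(ma_coeffs) if n - j >= 0)
        for n in range(len(source))
    ]
```

For example, an impulse passed through ar_source([0.5], ...) decays geometrically, and ma_channel([1.0, 0.5], ...) then smears each value into the next sample; the received signal is ma_channel(h, ar_source(a, e)).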

Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings in which:

FIG. 1 is a schematic view of a computer which may be programmed to operate in accordance with an embodiment of the present invention;

FIG. 2 is a block diagram illustrating the principal components of a data file annotation system;

FIG. 3 is a schematic diagram of a word and phoneme lattice for an example audio string input by a user;

FIG. 4 is a block diagram illustrating the principal components of a data file retrieval system;

FIG. 5a is a flow diagram illustrating part of the flow control during a retrieval operation using the system shown in FIG. 4;

FIG. 5b is a flow diagram illustrating the remaining part of the flow control of the retrieval system shown in FIG. 4;

FIG. 6 is a block diagram representing a model employed by a statistical analysis unit which forms part of the data file annotation system shown in FIG. 2 and the data file retrieval system shown in FIG. 4;

FIG. 7 is a flow chart illustrating the processing steps performed by a model order selection unit forming part of the statistical analysis unit shown in FIGS. 2 and 4;

FIG. 8 is a flow chart illustrating the main processing steps employed by a Simulation Smoother which forms part of the statistical analysis unit shown in FIGS. 2 and 4;

FIG. 9 is a block diagram illustrating the main processing components of the statistical analysis unit shown in FIGS. 2 and 4;

FIG. 10 is a memory map illustrating the data that is stored in a memory which forms part of the statistical analysis unit shown in FIGS. 2 and 4;

FIG. 11 is a flow chart illustrating the main processing steps performed by the statistical analysis unit shown in FIG. 9;

FIG. 12a is a histogram for a model order of an auto regressive filter model which forms part of the model shown in FIG. 6;

FIG. 12b is a histogram for the variance of process noise modelled by the model shown in FIG. 6;

FIG. 12c is a histogram for a third coefficient of the AR filter model;

FIG. 13 is a block diagram illustrating the main components of an alternative data annotation system; and

FIG. 14 is a schematic block diagram illustrating the form of a user terminal which is operable to retrieve a data file from a database located within a remote server in response to an input voice query.

Embodiments of the present invention can be implemented on computer hardware, but the embodiment to be described is implemented in software which is run in conjunction with processing hardware such as a personal computer, workstation, photocopier, facsimile machine or the like.

FIG. 1 shows a personal computer (PC) 1 which may be programmed to operate an embodiment of the present invention. A keyboard 3, a pointing device 5, a microphone 7 and a telephone line 9 are connected to the PC 1 via an interface 11. The keyboard 3 and pointing device 5 allow the system to be controlled by a user. The microphone 7 converts the acoustic speech signal of the user into an equivalent electrical signal and supplies this to the PC 1 for processing. An internal modem and speech receiving circuit (not shown) may be connected to the telephone line 9 so that the PC 1 can communicate with, for example, a remote computer or with a remote user.

The program instructions which make the PC 1 operate in accordance with the present invention may be supplied for use with an existing PC 1 on, for example, a storage device such as a magnetic disc 13, or by downloading the software from the Internet (not shown) via the internal modem and telephone line 9.

Data File Annotation

The operation of a data file annotation system embodying the present invention will now be described with reference to FIG. 2. The system shown in FIG. 2 allows a user to add a voice annotation to a data file 91 for use in subsequent voice retrieval operations. In use, the user selects a data file to be annotated (which can be any kind of data file such as a video file, an audio file, a multi-media file or the like). The user then speaks the voice annotation towards microphone 7. Corresponding electrical signals output from the microphone 7 are then filtered by a filter 15 which removes unwanted frequencies (in this embodiment frequencies above 8 kHz) from the input signal. The filtered signal is then sampled (at a rate of 16 kHz) and digitised by an analogue to digital converter 17. The digitised speech samples are then stored in a buffer 19. Sequential blocks (or frames) of speech samples are then passed from the buffer 19 to a statistical analysis unit 21 which performs a statistical analysis of each frame of speech samples in sequence to determine a set of auto regressive (AR) coefficients representative of the speech within the frame and a measure of the quality of the input speech. In this embodiment, the quality measure is the variance of the AR coefficients.
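The front end can be sketched as follows. The 8 kHz cut-off is half the 16 kHz sampling rate (the Nyquist frequency), so filter 15 acts as an anti-aliasing filter. The text does not give the frame length or say whether frames overlap; the sketch assumes hypothetical non-overlapping 320-sample (20 ms) frames.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate stated in the text
FRAME_LEN = 320          # hypothetical frame length (20 ms at 16 kHz); not given in the text


def frames_from_buffer(buffer, frame_len=FRAME_LEN):
    """Split buffered samples into successive non-overlapping frames.
    A trailing partial frame is left out (it would wait for more samples)."""
    return [buffer[start:start + frame_len]
            for start in range(0, len(buffer) - frame_len + 1, frame_len)]


def frame_duration_ms(frame_len=FRAME_LEN, sample_rate=SAMPLE_RATE_HZ):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_len / sample_rate
```

One second of buffered speech then yields 50 frames, each of which would be passed in turn to the statistical analysis unit 21.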

The quality measure is output to a speech quality assessor 93 and the AR coefficients are output to a speech recognition unit 97. The speech recognition unit 97 compares the AR coefficients for successive frames of speech with a set of stored speech models (not shown), which may be template based or Hidden Markov model based, to generate a recognition result. In this embodiment, the speech recognition unit 97 outputs words and phonemes corresponding to the spoken annotation input by the user. As shown in FIG. 2, the output words and phonemes are input to a data file annotation unit 99 which also receives an assessment of the speech quality output by the speech quality assessor 93. In this embodiment, the speech quality assessor 93 determines whether or not the input speech is of a high quality (i.e. not disturbed by high levels of background noise) based on the variance data received from the statistical analysis unit 21. In particular, the variance of the AR coefficients should be smaller when the speech input is of a high quality than when there are high levels of noise. The data file annotation unit 99 then generates an annotation for the data file 91 from the words and phonemes output by the speech recognition unit 97 and the speech quality assessment output by the speech quality assessor 93. The data file 91 is then stored in the data file database 101 and the corresponding annotation data is stored in the annotation database 103.
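A sketch of the decision made by the speech quality assessor 93, assuming it receives several sampled AR coefficient vectors for a frame, averages the per-coefficient variances and compares the result with a threshold. Both the averaging and the threshold value are assumptions for illustration; the text only states that a smaller variance indicates higher quality.

```python
import statistics

VARIANCE_THRESHOLD = 0.01  # hypothetical value; would have to be tuned on real data


def coefficient_variance(samples):
    """Average, over AR coefficients, of the variance of each coefficient's samples.
    `samples` is a list of coefficient vectors, one vector per draw."""
    return statistics.fmean(statistics.pvariance(column) for column in zip(*samples))


def is_high_quality(samples, threshold=VARIANCE_THRESHOLD):
    """Smaller variance of the AR coefficients indicates higher-quality speech."""
    return coefficient_variance(samples) < threshold
```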

As those skilled in the art will appreciate, the speech quality assessment which is stored with the annotation data is useful for subsequent retrieval operations. In particular, when the user wishes to retrieve a data file 91 from the database 101 (using a voice query), it is useful to know the quality of the speech that was used to annotate the data file and/or the quality of the voice query used to retrieve the data file, since this will affect the retrieval performance. More specifically, if the voice annotation is of a high quality and the user's voice query is also of a high quality, then a stringent search of the annotation database 103 should be performed, in order to reduce the number of false identifications. In contrast, if the original voice annotation is of a low quality or if the user's voice query is of a low quality, then a less stringent search of the annotation database 103 should be performed so that there is a greater chance of retrieving the correct data file 91. The way in which this search is carried out will be described in more detail below.
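The rule above can be written as a small decision function. This is a sketch with hypothetical names; the same rule appears in claim 62 in terms of a first and a second comparison technique.

```python
from enum import Enum


class SearchMode(Enum):
    STRINGENT = "stringent"  # reject candidates on significant differences
    RELAXED = "relaxed"      # keep candidates unless they are very different


def choose_search_mode(annotation_high_quality: bool, query_high_quality: bool) -> SearchMode:
    """Stringent search only when both the annotation and the query are of high quality."""
    if annotation_high_quality and query_high_quality:
        return SearchMode.STRINGENT
    return SearchMode.RELAXED
```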

In this embodiment, the phoneme and word annotation data for a data file is stored in the annotation database 103 as a phoneme and word lattice. FIG. 3 schematically illustrates the form of the word and phoneme lattice generated for the spoken annotation "picture of the Taj Mahal". As shown, the word and phoneme lattice identifies a number of different phoneme and word strings which correspond to this spoken utterance. The phoneme and word lattice is a directed acyclic graph with a single entry point and a single exit point. It represents different parses of the spoken annotation. It is not simply a sequence of words with alternatives, since each word does not have to be replaced by a single alternative: one word can be substituted for two or more words or phonemes, and the whole structure can form a substitution for one or more words or phonemes. As those skilled in the art of speech recognition will realise, the use of phoneme data in addition to word data is more robust, because phonemes are dictionary independent and allow the system to cope with out-of-vocabulary words, such as names, places and foreign words. The use of phoneme data also makes the system future-proof, since it allows data files which are placed into the database to be retrieved even when the words are not understood by the original automatic speech recognition system.
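The sketch below represents a lattice as a mapping from each node to its outgoing (next node, label) links and lists every label sequence between the single entry and exit nodes, i.e. every parse. The example lattice used in the test is hypothetical, not the one in FIG. 3. Because the number of paths can grow exponentially with lattice size, a real search would walk the graph rather than enumerate it.

```python
def enumerate_parses(links, entry, exit_node):
    """Return every label sequence on a path from entry to exit_node.
    `links` maps a node to a list of (next_node, label) pairs; the graph must be acyclic."""
    parses = []

    def walk(node, labels):
        if node == exit_node:
            parses.append(labels)
            return
        for next_node, label in links.get(node, []):
            walk(next_node, labels + [label])

    walk(entry, [])
    return parses
```

For instance, a word link "of" spanning the same two nodes as the phoneme links "/ax/" then "/v/" yields two parses, one word-level and one phoneme-level.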

In this embodiment, the annotation data stored in the annotation database 103 has the following general form:
    • Header
      • time of start
      • flag if word if phoneme if mixed
      • time index associating the location of blocks of annotation data within memory to a given time point
      • word set used (i.e. the dictionary)
      • phoneme set used
      • the language to which the vocabulary pertains
      • speech quality assessment
    • block(i) i=0, 1, 2, . . .
      • node Nj j=0, 1, 2, . . .
        • time offset of node from start of block
        • phoneme links(k) k=0, 1, 2, . . .
          • offset to node Nj=Nk-Nj (Nk is node to which link k extends) or if Nk is in block(i+1) offset to node Nj=Nk+Nb-Nj (where Nb is the number of nodes in block(i))
          • phoneme associated with link(k)
        • word links(l) l=0, 1, 2 . . .
          • offset to node Nj=Nk-Nj (Nk is node to which link l extends) or if Nk is in block(i+1) offset to node Nj=Nk+Nb-Nj (where Nb is the number of nodes in block(i))
          • word associated with link(l)


The time of start data in the header can identify the time and date of transmission of the data. For example, the time of start may include the exact time of the spoken annotation and the date on which it was spoken.

The flag identifying whether the annotation data is word annotation data, phoneme annotation data or mixed is provided because not all of the annotation data in the annotation database 103 will include the combined phoneme and word lattice annotation data discussed above; in that case, a different search strategy may be used to search the annotation data.

In this embodiment, the annotation data is divided into blocks in order to allow the search to jump into the middle of the annotation for a given audio data stream. The header therefore includes a time index which associates the location of the blocks of annotation data within the memory with a given time offset between the time of start and the time corresponding to the beginning of the block.
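Jumping into the middle of an annotation amounts to finding the block whose start offset is the latest one not after the requested time. A sketch using binary search, assuming the time index is held as a sorted list of block start offsets measured from the time of start; this representation is hypothetical, and the block's memory location would then be looked up from the returned index.

```python
import bisect


def block_for_time(block_start_offsets, t):
    """Index of the annotation block that covers time offset t.
    `block_start_offsets` holds each block's start offset from the time of start,
    sorted in ascending order."""
    i = bisect.bisect_right(block_start_offsets, t) - 1
    if i < 0:
        raise ValueError("requested time precedes the first block")
    return i
```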

The header also includes data defining the word set used (i.e. the dictionary), the phoneme set used and the language to which the vocabulary pertains. The header may also include details of the automatic speech recognition system used to generate the annotation data and the appropriate settings thereof which are used during the generation of the annotation. Finally, as discussed above, the header also includes the speech quality assessment which identifies whether or not the spoken annotation is of a high quality.

The blocks of annotation data then follow the header and identify, for each node in the block, the time offset of the node from the start of the block, the phoneme links which connect that node to other nodes by phonemes and word links which connect that node to other nodes by words. Each phoneme link and word link identifies the phoneme or word which is associated with the link and the offset to the current node. For example, if node N50 is linked to node N55 by a phoneme link, then the offset to node N50 for that link is 5. As those skilled in the art will appreciate, using an offset indication like this allows the division of the continuous annotation data into separate blocks.
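A sketch of the node and link records and of the offset rule given in the data format above, with hypothetical class and function names. Node indices are taken to restart at zero in each block, which is what the Nk+Nb-Nj case implies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    offset: int  # number of nodes from the current node to the node the link reaches
    label: str   # phoneme or word associated with the link


@dataclass
class Node:
    time_offset: float  # time of the node from the start of its block
    phoneme_links: List[Link] = field(default_factory=list)
    word_links: List[Link] = field(default_factory=list)


def link_offset(src, dst, nodes_in_block, dst_in_next_block=False):
    """Offset stored for a link from node src (Nj) of block i to node dst (Nk).
    Indices restart in each block, so a target in block i+1 is shifted by the
    Nb nodes of block i: offset = Nk + Nb - Nj; otherwise offset = Nk - Nj."""
    if dst_in_next_block:
        return dst + nodes_in_block - src
    return dst - src
```

With this rule, the N50 to N55 example gives an offset of 5, and a link from node 98 of a 100-node block to node 2 of the next block gives an offset of 4.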

Data File Retrieval

FIG. 4 is a block diagram illustrating the form of a data file retrieval system which can be used to retrieve the annotated data files from the database 101. This system may be, for example, a personal computer, a hand-held device or the like. As shown, in this embodiment, the retrieval system is similar to the speech annotation system shown in FIG. 2, except that the data file annotation unit 99 is replaced with a data file retrieval unit 102 and a display 105 is provided for displaying the search results. In operation, an input voice query is processed in the same way as the spoken annotation described above. The phoneme and word data corresponding to the user's input query are output from the speech recognition unit 97 to the data file retrieval unit 102. The data file retrieval unit 102 then searches the annotation database 103 using the generated phoneme and word data and the speech quality assessment output by the speech quality assessor 93 for the input query. The results of the search are then output to the user on the display 105.

FIGS. 5a and 5b are flow charts illustrating the flow control of the retrieval system shown in FIG. 4. As shown, initially in step s101, the system awaits an input query from the user. Upon receipt of the query, in step s103 the system generates phoneme and word data and a quality assessment for the input query. Processing then proceeds to step s105, where the data file retrieval unit 102 performs a word search in the annotation database 103 using the words in the query. The processing then proceeds to step s107, where the data file retrieval unit 102 determines whether or not a match has been found. If it has, then the data file retrieval unit 102 displays the results to the user on the display 105.
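The text does not say how the word search of step s105 ranks annotations. As an illustration only, the sketch below treats each annotation as the set of words in its lattice and ranks annotations by how many distinct query words they share, discarding those with no overlap; the function name and data layout are hypothetical.

```python
def word_search(query_words, annotations):
    """Rank annotation ids by how many distinct query words their word sets contain.
    `annotations` maps an annotation id to the set of words in its lattice;
    annotations sharing no word with the query are dropped."""
    query = set(query_words)
    hits = [(len(query & words), ann_id) for ann_id, words in annotations.items()]
    hits = [(score, ann_id) for score, ann_id in hits if score > 0]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))  # best first, ties broken by id
    return [ann_id for _, ann_id in hits]
```

The ranked ids would then serve as the selected annotations that the later phoneme searches examine.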

In this embodiment, the system then allows the user to consider the search results and awaits the user's confirmation as to whether or not the results correspond to the data file the user wishes to retrieve. If they do, then the processing proceeds from step s111 to the end of the processing and the system returns to its idle state and awaits the next input query. If, however, the user indicates (for example, by inputting an appropriate voice command) that the search results do not correspond to the desired data file, then the processing proceeds from step s111 to step s112, where the data file retrieval unit 102 determines whether or not the user's input query is of a high quality. If it is not, then the processing proceeds to step s113, where the data file retrieval unit 102 uses the results of the word search to select a number of annotations and then performs a "relaxed" phoneme search of the selected annotations. The phoneme search is "relaxed" in the sense that the data file retrieval unit 102 does not discard annotations unless the phonemes of the annotation are very different from the phonemes of the input query.

If, on the other hand, the system determines at step s112 that the input query is of a high quality, then the processing proceeds to step s114, where the data file retrieval unit 102 again uses the results of the word search to select annotations and then performs a relaxed phoneme search for the selected annotations having a low quality assessment and a "stringent" phoneme search for those having a high quality assessment. The phoneme search is "stringent" in the sense that the data file retrieval unit 102 discards annotations quickly in the searching operation if there are significant differences between the annotation phonemes and the query phonemes.
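One way to realise relaxed and stringent phoneme searches is to compare phoneme sequences with an edit distance and discard an annotation when the normalised distance exceeds a mode-dependent limit. This substitutes plain Levenshtein distance for whatever matching the embodiment actually uses, and the two limits are hypothetical; the mode itself would come from the query and annotation quality assessments as in steps s112 to s114.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


# Hypothetical discard limits on the length-normalised distance.
MAX_DISTANCE = {"stringent": 0.2, "relaxed": 0.6}


def keep_annotation(query_phonemes, annotation_phonemes, mode):
    """True if the annotation survives the phoneme search in the given mode."""
    dist = edit_distance(query_phonemes, annotation_phonemes)
    norm = dist / max(len(query_phonemes), len(annotation_phonemes), 1)
    return norm <= MAX_DISTANCE[mode]
```

With these limits, two mismatched phonemes out of five are tolerated by the relaxed search but cause the stringent search to discard the annotation.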

After the phoneme searches have been performed, the processing proceeds to step s115, where the data file retrieval unit 102 determines whether or not a match has been found. If a match has been found, then the processing proceeds to step s117, where the results are displayed to the user on the display 105. If the search results are correct, then processing proceeds from step s119 to the end of the processing and the system returns to its idle state and awaits the next input query. If, on the other hand, the user indicates that the search results still do not correspond to the desired data file, then the processing passes to step s121, where the data file retrieval unit 102 asks the user, via the display 105, whether or not a phoneme search should be performed of the whole annotation database 103. If, in response to this query, the user indicates that such a search should be performed, then the processing proceeds to step s<

