Universal IP-based and scalable architectures across conversational applications using web services for speech and audio processing resources. Number: 6,801,604, from the United States Patent and Trademark Office (PTO)

Title: Universal IP-based and scalable architectures across conversational applications using web services for speech and audio processing resources

Abstract: Systems and methods for conversational computing and, in particular, to systems and methods for building distributed conversational applications using a Web services-based model wherein speech engines (e.g., speech recognition) and audio I/O systems are programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to thereby provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways (e.g., PSTN (public switched telephone network), Wireless, Internet, and VoIP (voice over IP)). Systems and methods are further provided for dynamically allocating, assigning, configuring and controlling speech resources such as speech engines, speech pre/post processing systems, audio subsystems, and exchanges between speech engines using SERCP in a web service-based framework.

Patent Number: 6,801,604 Issued on 10/05/2004 to Maes, et al.


Inventors: Maes; Stephane H. (Danbury, CT), Lubensky; David M. (Brookfield, CT), Sakrajda; Andrzej (White Plains, NY)
Assignee: International Business Machines Corporation (Armonk, NY)
Appl. No.: 10/183,125
Filed: June 25, 2002


Current U.S. Class: 379/88.17; 379/88.16; 704/270.1; 709/203; 709/231
Field of Search: 379/88.01-88.04,88.16,88.17,88.23-88.25 704/270-275 717/114,116 709/228-231,201-203,249,250


References Cited

U.S. Patent Documents
2002/0184373 December 2002 Maes
2002/0194388 December 2002 Boloker et al.
2003/0005174 January 2003 Coffman et al.
2003/0088421 May 2003 Maes et al.
Primary Examiner: Foster; Roland
Attorney, Agent or Firm: F. Chau & Associates, LLC

Parent Case Text



CROSS REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Provisional Application Ser. No. 60/300,755, filed on Jun. 25, 2001, which is incorporated herein by reference.
Claims



What is claimed is:

1. A distributed speech processing system, comprising: a conversational application and a task manager that abstracts from the conversational application, the discovery and remote control of audio I/O and speech engine services; an audio I/O processing service, which is programmable by control messages generated by the task manager on behalf of the conversational application to provide audio I/O services for the conversational application; and a speech engine service, which is programmable by control messages generated by the task manager on behalf of the conversational application to provide speech processing services for the conversational application.

2. The system of claim 1, wherein the audio I/O processing service and speech engine service comprise Web services.

3. The system of claim 1, wherein the control messages are encoded using XML (eXtensible Markup Language) and wherein the control messages are exchanged using SOAP (Simple Object Access Protocol).

4. The system of claim 1, wherein each service comprises interfaces that are described using WSDL (Web Services Description Language).

5. The system of claim 4, wherein WSFL (web services flow language) or an extension thereof is used to dynamically configure the processing flow of the system.

6. The system of claim 1, wherein the speech engine service provides one of automatic speech processing (ASR) services, text-to-speech (TTS) synthesis services, natural language understanding (NLU) services, and a combination thereof.

7. The system of claim 1, wherein the audio I/O processing service provides speech encoding/decoding services, audio recording services, audio playback services, and a combination thereof.

8. The system of claim 1, further comprising a load manager that dynamically allocates and assigns the services for the conversational application, based on control messages generated by the task manager on behalf of the conversational application.

9. The system of claim 1, wherein the services are programmed to negotiate uplink and downlink audio codecs for generating RTP-based audio streams.

10. The system of claim 1, wherein the speech engine services are dynamically allocated to the conversational application on one of a call, session, utterance and persistent basis.

11. The system of claim 1, wherein the services are discoverable using UDDI (Universal Description, Discovery and Integration) or an extension thereof.

12. The system of claim 1, wherein services provided by the speech engine service and audio I/O processing service are defined as a collection of ports.

13. The system of claim 12, wherein types of ports comprise audio in, audio out, control in, and control out.

14. The system of claim 1, wherein the audio I/O service comprises a gateway that connects audio streams from a network to the speech processing services.

15. The system of claim 14, wherein the network comprises a PSTN (public switched telephone network).

16. The system of claim 14, wherein the network comprises a VoIP (voice over IP) network.

17. The system of claim 14, wherein the network comprises a wireless network.

18. The system of claim 1, wherein the distributed speech processing system comprises an interactive voice response (IVR) system, and wherein the system further comprises a telephony gateway, wherein the telephony gateway is abstracted from the conversational application and wherein the telephony gateway receives and processes an incoming call to assign the call to a conversational application.

19. A speech processing web service, comprising: a listener for receiving and parsing control messages that are used for programming the speech processing web service, wherein the control messages are encoded using XML (eXtensible Markup Language) and exchanged using SOAP (Simple Object Access Protocol); a business interface layer for exposing speech processing services offered by the web service, wherein the services are described and accessed using WSDL (web services description language); and a business logic layer for providing speech processing services, the speech processing services comprising one of automatic speech recognition, speech synthesis, natural language understanding, acoustic feature extraction, audio encoding/decoding, audio recording, audio playback, and any combination thereof.

20. The speech processing web service of claim 19, wherein a service of the speech processing web service is dynamically allocated and assigned to a conversational application and programmed by the conversational application.

21. The speech processing web service of claim 19, wherein the web service is advertised via UDDI.

22. A method for providing distributed speech processing, comprising the steps of: receiving an incoming call by a client application; assigning the call to an application having a task manager that is abstracted from the application for discovering and controlling speech processing services including audio I/O and speech engine services; the task manager generating a control message to a router/load manager for requesting a speech processing service on behalf of the application to service the incoming call; the router/load manager dynamically allocating a speech processing service to the application and providing an address of the allocated speech processing service to the task manager; the task manager generating a control message for dynamically programming the allocated speech service based on requirements of the application; and the application processing the incoming call using the programmed speech service.
Description



TECHNICAL FIELD

The present invention relates generally to systems and methods for conversational computing and, in particular, to systems and methods for building distributed conversational applications using a Web services-based model wherein speech engines (e.g., speech recognition) and audio I/O systems are implemented as programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to thereby provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways (e.g., PSTN (public switched telephone network), Wireless, Internet, and VoIP (voice over IP)). The invention is further directed to systems and methods for dynamically allocating, assigning, configuring and controlling speech resources such as speech engines, speech pre/post processing systems, audio subsystems, and exchanges between speech engines using SERCP in a web service-based framework.

BACKGROUND

Telephony generally refers to any telecommunications system involving the transmission of speech information in either wired or wireless environments. Telephony applications include, for example, IP telephony and Interactive Voice Response (IVR), and other voice processing platforms. IP telephony allows voice, data and video collaboration through existing IP telephony-based networks such as LANs, WANs and the Internet as well as IMS (IP multimedia services) over wireless networks. Previously, separate networks were required to handle traditional voice, data and video traffic, which limited their usefulness. Voice and data connections were typically not available simultaneously. Each required separate transport protocols/mechanisms and infrastructures, which made them costly to install, maintain and reconfigure and unable to interoperate. Currently, various applications and APIs are commercially available that enable convergence of PSTN telephony and telephony over Internet Protocol networks and 2.5G/3G wireless networks. There is a convergence among fixed, mobile and nomadic wireless networks as well as with the Internet and voice networks, as exemplified by 2.5G, 3G and 4G.

IVR is a technology that allows a telephone-based user to input or receive information remotely to or from a database. Currently, there is widespread use of IVR services for telephony access to information and transactions. An IVR system typically (but not exclusively) uses spoken directed dialog and generally operates as follows. A user will dial into an IVR system and then listen to audio prompts that provide choices for accessing certain menus and particular information. Each choice is either assigned to one number on the phone keypad or associated with a word to be uttered by the user (in voice enabled IVRs) and the user will make a desired selection by pushing the appropriate button or uttering the proper word.

By way of example, a typical banking ATM transaction allows a customer to perform money transfers between savings, checking and credit card accounts, and to check account balances, using IVR over the telephone, wherein information is presented via audio menus. With the IVR application, a menu can be played to the user over the telephone, whereby the menu messages are followed by the number or button the user should press to select the desired option:
a. "for instant account information, press one;"
b. "for transfer and money payment, press two;"
c. "for fund information, press three;"
d. "for check information, press four;"
e. "for stock quotes, press five;"
f. "for help, press seven;" etc.
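As an illustration only, the directed-dialog menu described above can be sketched as a lookup from a received DTMF digit to a menu option. This is a minimal sketch, not part of the patent: the names `MENU`, `build_prompt` and `select_option` are hypothetical, and the digit assignments simply follow the example menu in the text.

```python
from typing import Dict, Optional

# Hypothetical menu table: keypad digit -> option label, taken from the example menu.
MENU: Dict[str, str] = {
    "1": "instant account information",
    "2": "transfer and money payment",
    "3": "fund information",
    "4": "check information",
    "5": "stock quotes",
    "7": "help",
}


def build_prompt(menu: Dict[str, str]) -> str:
    """Render the prompt text that would be played back (or synthesized) to the caller."""
    return " ".join(f"For {label}, press {digit}." for digit, label in menu.items())


def select_option(menu: Dict[str, str], dtmf_digit: str) -> Optional[str]:
    """Map a received DTMF digit to a menu option; None means the caller should be re-prompted."""
    return menu.get(dtmf_digit)
```

A voice-enabled IVR would perform the same lookup keyed on a recognized word rather than a keypad digit.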

To continue, the user may be prompted to provide identification information. Over the telephone, the IVR system may play back an audio prompt requesting the user to enter his/her account number (via DTMF or speech), and the information is received from the user by processing the DTMF signaling or recognizing the speech. The user may then be prompted to input his/her SSN and the reply is processed in a similar way. When the processing is complete, the information is sent to a server, wherein the account information is accessed, formatted for audio replay, and then played back to the user over the telephone.

An IVR system may implement speech recognition in lieu of, or in addition to, DTMF keys. Conventional IVR applications use specialized telephony hardware and IVR applications use different software layers for accessing legacy database servers. These layers must be specifically designed for each application. Typically, IVR application developers offer their own proprietary speech engines and APIs (application program interfaces). The dialog development requires complex scripting and expert programmers and these proprietary applications are typically not portable from vendor to vendor (i.e., each application is painstakingly crafted and designed for specific business logic). Conventional IVR applications are typically written in specialized script languages that are offered by manufacturers in various incarnations and for different hardware platforms. The development and maintenance of such IVR applications requires qualified staff. Thus, current telephony systems typically do not provide interoperability, i.e., the ability of software and hardware on multiple machines from multiple vendors to communicate meaningfully.

VoiceXML is a markup language that has been designed to facilitate the creation of speech applications such as IVR applications. Compared to conventional IVR programming frameworks that employ proprietary scripts and programming languages over proprietary/closed platforms, the VoiceXML standard provides a declarative programming framework based on XML (eXtensible Markup Language) and ECMAScript (see, e.g., the W3C XML specifications (www.w3.org/XML) and VoiceXML forum (www.voicexml.org)). VoiceXML is designed to run on web-like infrastructures of web servers and web application servers (i.e. the Voice browser). VoiceXML allows information to be accessed by voice through a regular phone or a mobile phone whenever it is difficult or not optimal to interact through a wireless GUI micro-browser.

More importantly, VoiceXML is a key component to building multi-modal systems such as multi-modal and conversational user interfaces or mobile multi-modal browsers. Multi-modal solutions exploit the fact that different interaction modes are more efficient for different user interactions. For example, depending on the interaction, talking may be easier than typing, whereas reading may be faster than listening. Multi-modal interfaces combine the use of multiple interaction modes, such as voice, keypad and display to improve the user interface to e-business. Advantageously, multi-modal browsers can rely on VoiceXML browsers and authoring to describe and render the voice interface.

There are still key inhibitors to the deployment of compelling multi-modal applications. Most arise out of the current infrastructure and device platforms. Indeed, the current networking infrastructure is not configured for providing seamless, multi-modal access to information. Although a plethora of information can be accessed from servers over a communications network using an access device (e.g., personal information and corporate information available on private networks and public information accessible via a global computer network such as the Internet), the availability of such information may be limited by the modality of the client/access device or the platform-specific software applications with which the user is interacting to obtain such information. For instance, current wireless network infrastructure and handsets do not provide simultaneous voice and data access. Middleware, interfaces and protocols are needed to synchronize and manage the different channels. In light of the ubiquity of IP-based networks such as the Internet, and the availability of a plethora of services and resources on the Internet, the advantages of open and interoperable telephony systems are particularly compelling for voice processing applications such as IP telephony systems and IVR.

Another hurdle is that development of multi-modal/conversational applications using current technologies requires not only knowledge of the goal of the application and how the interaction with the users should be defined, but also knowledge of a wide variety of other interfaces and modules external to the application at hand, such as (i) connection to input and output devices (telephone interfaces, microphones, web browsers, palm pilot display); (ii) connection to a variety of engines (speech recognition, natural language understanding, speech synthesis and possibly language generation); (iii) resource and network management; and (iv) synchronization between various modalities for multi-modal or conversational applications.

Accordingly, there is a strong desire for development of distributed conversational systems having scalable and flexible architectures, which enable implementation of such systems over a wide range of application environments and voice processing platforms.

SUMMARY OF THE INVENTION

The present invention relates generally to systems and methods for conversational computing and, in particular, to systems and methods for building distributed conversational applications using a Web services-based model wherein speech engines (e.g., speech recognition) and audio I/O systems are implemented as programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to thereby provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways (e.g., PSTN (public switched telephone network), Wireless, Internet, and VoIP (voice over IP)).

The invention is further directed to systems and methods for dynamically allocating, assigning, configuring and controlling speech resources such as speech engines, speech pre/post processing systems, audio subsystems, and exchanges between speech engines using SERCP in a web service-based framework.

In one preferred embodiment, a SERCP framework, which is used for speech engine remote control and network and system load management, is implemented using an XML-based web service framework wherein speech engines and resources comprise programmable services, wherein (i) XML is used to represent data (and XML Schemas to describe data types); (ii) an extensible messaging format is based on SOAP; (iii) an extensible service description language is based on WSDL, or an extension thereof, as a mechanism to describe the commands/interface supported by a given service; (iv) UDDI (Universal Description, Discovery, and Integration) is used to advertise and locate the service; and wherein (v) WSFL (Web Service Flow Language) is used to provide a generic mechanism for combining speech processing services through flow composition.
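To make the idea of an XML control message carried in a SOAP envelope concrete, the following is a minimal sketch using only the Python standard library. The SOAP 1.1 envelope namespace is the standard one; everything else is an assumption for illustration: the `urn:example:sercp` namespace, the `SetGrammar` command and its parameters are hypothetical and are not defined by the patent or by any SERCP specification.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERCP_NS = "urn:example:sercp"  # hypothetical namespace for control commands


def build_control_message(command: str, params: Dict[str, str]) -> str:
    """Wrap one control command and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    cmd = ET.SubElement(body, f"{{{SERCP_NS}}}{command}")
    for name, value in params.items():
        ET.SubElement(cmd, f"{{{SERCP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_control_message(xml_text: str) -> Tuple[str, Dict[str, str]]:
    """Listener side: recover the command name and parameters from the envelope."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    cmd = body[0]  # first (and only) child of Body is the command element
    command = cmd.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in cmd}
    return command, params
```

In a fuller system the command vocabulary would be described in WSDL and the service located through UDDI; neither is modeled here.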

A conversational system according to an embodiment of the present invention assumes an application environment in which a conversational application comprises a collection of audio processing engines (e.g., audio I/O system, speech processing engines, etc.) that are dynamically associated with an application, wherein the exchange of audio between the audio processing engines is decoupled from control and application level exchanges and wherein the application generates control messages that configure and control the audio processing engines in a manner that renders the exchange of control messages independent of the application model and location of the engines. The speech processing engines can be dynamically allocated to the application on either a call, session, utterance or persistent basis.
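The four allocation bases named above (call, session, utterance, persistent) can be pictured as a scope attached to each engine assignment, which determines when the engine is returned to the pool. The sketch below is an assumption-laden illustration: the event names and the rule that each scope releases only on its own matching event are hypothetical simplifications, not behavior specified in the text.

```python
from enum import Enum
from typing import Dict, Optional


class AllocationScope(Enum):
    CALL = "call"
    SESSION = "session"
    UTTERANCE = "utterance"
    PERSISTENT = "persistent"


# Hypothetical mapping: the lifecycle event that ends an allocation of each scope.
# A persistent allocation has no releasing event in this sketch.
RELEASE_EVENT: Dict[AllocationScope, Optional[str]] = {
    AllocationScope.UTTERANCE: "utterance_end",
    AllocationScope.CALL: "call_end",
    AllocationScope.SESSION: "session_end",
    AllocationScope.PERSISTENT: None,
}


def should_release(scope: AllocationScope, event: str) -> bool:
    """Return True if an engine allocated with this scope is freed by the given event."""
    return RELEASE_EVENT[scope] == event
```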

Preferably, the audio processing engines comprise web services that are described and accessed using WSDL (Web Services Description Language), or an extension thereof.

In yet another aspect, a conversational system comprises a task manager, which abstracts from the application the discovery and remote control of the audio processing engines.
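The role of the task manager, together with the router/load manager recited in the method claim, can be sketched as follows: the application asks the task manager for a service, the task manager requests one from the load manager, receives an address, and then programs the service at that address. This is only a sketch under stated assumptions. The class names, the in-memory pool, the example addresses and the `sent` list (which stands in for actually transmitting control messages) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LoadManager:
    """Hypothetical router/load manager: a pool of free service addresses per service type."""
    pool: Dict[str, List[str]]

    def allocate(self, service_type: str) -> str:
        free = self.pool.get(service_type, [])
        if not free:
            raise LookupError(f"no free {service_type} service")
        return free.pop(0)  # hand out the first free address

    def release(self, service_type: str, address: str) -> None:
        self.pool.setdefault(service_type, []).append(address)


@dataclass
class TaskManager:
    """Hides discovery and control of services from the application."""
    load_manager: LoadManager
    sent: List[Tuple[str, dict]] = field(default_factory=list)

    def acquire(self, service_type: str, settings: dict) -> str:
        address = self.load_manager.allocate(service_type)  # request on behalf of the application
        self.sent.append((address, settings))  # stands in for sending a programming control message
        return address
```

The application only ever calls `acquire`; where the engine runs and how it was found stay inside the task manager and load manager.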

The systems and methods described herein may be used in various frameworks. One framework comprises a terminal-based application (located on the client or local to the audio subsystem) that remotely controls speech engine resources. One example of a terminal-based application includes a wireless handset-based application that uses remote speech engines, e.g., a multimodal application in "fat client configuration" with a voice browser embedded on the client that uses remote speech engines. Another example of a terminal-based application comprises a voice application that operates on a client having local embedded engines that are used for some speech processing tasks, and wherein the voice application uses remote speech engines when (i) the task is too complex for the local engine, (ii) the task requires a specialized engine, (iii) it would not be possible to download speech data files (grammars, etc.) without introducing significant delays, or (iv) when for IP, security or privacy reasons, it would not be appropriate to download such data files on the client or to perform the processing on the client or to send results from the client.
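The four conditions (i) through (iv) under which a client with local embedded engines would fall back to remote engines amount to a simple disjunction. A minimal sketch follows; the `SpeechTask` fields are hypothetical flags introduced for illustration, and how each flag would be determined in practice is outside what the text describes.

```python
from dataclasses import dataclass


@dataclass
class SpeechTask:
    too_complex_for_local: bool = False     # condition (i)
    needs_specialized_engine: bool = False  # condition (ii)
    data_download_too_slow: bool = False    # condition (iii): grammars etc. cannot be fetched without delay
    data_restricted: bool = False           # condition (iv): IP, security or privacy constraints


def use_remote_engine(task: SpeechTask) -> bool:
    """Use a remote engine if any one of the four conditions holds; otherwise stay local."""
    return any((
        task.too_complex_for_local,
        task.needs_specialized_engine,
        task.data_download_too_slow,
        task.data_restricted,
    ))
```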

Another usage framework for the invention is to enable an application located in a network to remotely control different speech engines located in the network. For example, the invention may be used to (i) distribute the processing and perform load balancing, (ii) allow the use of engines optimized for specific tasks, and/or to (iii) enable access and control of third party services specialized in providing speech engine capabilities.

These and other aspects, features, and advantages of the present invention will become apparent from the following detailed description of the preferred embodiments, which is to be read in connection with the accompanying drawings.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a speech processing system according to an embodiment of the present invention.

FIG. 2 is a block diagram of a speech processing system according to an embodiment of the invention.

FIGS. 3a-3d are diagrams illustrating application frameworks that can be implemented in a speech processing system according to the invention.

FIG. 4 is a block diagram of a speech processing system according to an embodiment of the invention, which uses a conversational browser.

FIG. 5 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 6 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 7 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 8 is a flow diagram of a method for processing a call according to one aspect of the invention.

FIG. 9 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 10 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 11 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 12 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 13 is a block diagram illustrating a DSR system that may be implemented in a speech processing system according to an embodiment of the invention.

FIG. 14 is a block diagram of a web service system according to an embodiment of the invention.

FIG. 15 is a diagram illustrating client/server communication using a DSR protocol stack according to an embodiment of the present invention.

FIG. 16 is a diagram illustrating client/server communication of SERCP (speech engine remote control protocol) data exchanges according to an embodiment of the present invention.

FIG. 17 is a block diagram of a web service system according to another embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention is directed to systems and methods for implementing universal IP-based and scalable conversational applications and platforms that are interoperable across a plurality of conversational applications, programming or execution models and systems. The terms "conversational" and "conversational computing" as used herein refer to seamless, multi-modal (or voice only) dialog (information exchanges) between user and machine and between devices or platforms of varying modalities (I/O capabilities), regardless of the I/O capabilities of the access device/channel, preferably, using open, interoperable communication protocols and standards, as well as a conversational programming model (e.g., conversational gesture-based markup language) that separates the application data content (tier 3) and business logic (tier 2) from the user interaction and data model that the user manipulates. Conversational computing enables humans and machines to carry on a dialog as natural as human-to-human dialog.

Further, the term "conversational application" refers to an application that supports multi-modal, free flow interactions (e.g., mixed initiative dialogs) within the application and across independently developed applications, preferably using short term and long term context (including previous input and output) to disambiguate and understand the user's intention. Preferably, conversational applications utilize NLU (natural language understanding). Multi-modal interactive dialog comprises modalities such as speech (e.g., authored in VoiceXML), visual (GUI) (e.g., HTML (hypertext markup language)), constrained GUI (e.g., WML (wireless markup language), CHTML (compact HTML), HDML (handheld device markup language)), and a combination of such modalities (e.g., speech and GUI). Further, the invention supports voice only (mono-modal) machine driven dialogs and any level of dialog capability in between voice only and free flow multimodal capabilities. As explained below, the invention provides a universal architecture that can handle all these types of capabilities and deployments.

Conversational applications and platforms according to the present invention preferably comprise a scalable and flexible framework that enables deployment of various types of applications and application development environments to provide voice access using various voice processing platforms such as telephony cards and IVR systems over networks/mechanisms such as PSTN, wireless, Internet, and VoIP networks/gateways, etc. A conversational system according to the invention is preferably implemented in a distributed, multi-tier client/server environment, which decouples the conversational applications from distributed speech engines and the telephony/audio I/O components. A conversational platform according to the invention is preferably interoperable with the existing Web infrastructure to enable delivery of voice applications over telephony, for example, taking advantage of the ubiquity of applications and resources available over the Internet. For example, preferred telephony applications and systems according to the invention enable business enterprises and service providers to give callers access to their business applications and data, anytime, anyplace, using any telephone or voice access device.

Referring now to FIG. 1, a block diagram illustrates a conversational system 10 according to an embodiment of the invention. The system 10 comprises a client voice response system 11 that executes on a host machine, which is based, for example, on an AIX, UNIX, or DOS/Windows operating system platform. The client application 11 provides the connectivity to the telephone line (analog or digital), other voice networks (such as IMS, VoIP, etc., wherein the application 11 may be considered as a gateway to the network (or a media processing entity)), and other voice processing services (as explained below). Incoming calls/connections are answered by an appropriate client application running on the host machine. More specifically, the host machine can be connected to a PSTN, VoIP network, wireless network, etc., and accessible by a user over an analog telephone line or an ISDN (Integrated Services Digital Network) line, for example. In addition, the host client machine 11 can be connected to a PBX (private branch exchange) system, central office or automatic call distribution center, a VoIP gateway, a wireless support node gateway, etc. The host comprises the appropriate software and APIs that allow the client application 11 to interface to various telephone systems and video phone systems, such as PSTN, digital ISDN and PBX access, and VoIP gateways, and to the voice services on the servers. The system 10 is preferably operable in various connectivity environments including, for example, T1, E1, ISDN, CAS, SS7, VoIP, wireless, etc.

The voice response system 11 (or gateway) comprises client enabling code that operates with one or more application servers and conversational engine servers over an IP (Internet Protocol)-based network 13. The IP network 13 may comprise a LAN, WAN, or a global communication network such as the Internet or a wireless network (IMS). In one exemplary embodiment, the host machine 11 comprises an IBM RS/6000 computer that comprises DirectTalk® (DT/6000), a commercially available platform for voice processing applications. DirectTalk® is a versatile voice processing platform that provides expanded functionality to IVR applications. DirectTalk enables the development and operation of automated customer service solutions for various enterprises and service providers. Clients, customers, employees and other users can interact directly with business applications using telephones connected via public or private networks. DirectTalk supports scalable solutions from various telephony channels operating in customer premises or within telecommunication networks. It is to be understood, however, that the voice response system 11 may comprise any application that is accessible via telephone to provide telephone access to one or more applications and databases, provide interactive dialog with voice response, and accept data input via DTMF (dual tone multi-frequency). It is to be appreciated that other gateways and media processing entities can be considered.

The system 10 further comprises one or more application servers (or web servers) and speech servers that are distributed over the network 13. The system 10 comprises one or more conversational applications 14 and associated task managers 15. The conversational applications 14 comprise applications such as sales order processing, debt collection, customer service, telephone banking, telephone airline reservations, and insurance. The conversational application 14 may comprise a unified messaging application (combined voice and e-mail). The application 14 may reside on an application server (or Web server). The application 14 can be programmed using industry standards such as Java and VoiceXML (a standards-based programming model for writing interactive voice applications). In other embodiments, the application development environment may comprise conventional IVR programming (state tables or beans, as supported by DirectTalk), imperative programming (e.g., C/C++, Java), scripting (Tcl, Perl, . . . ), declarative programming (XML, VoiceXML), scripts and a combination of imperative and declarative programming. The applications 14 are preferably programmed as multi-channel, multi-modal, multi-device and/or conversational free flow applications. Various programming paradigms (non-exhaustive) that may be used in the system 10 are described in further detail below with reference to FIGS. 3a-3d, for example.

The task manager 15 can be either common across a plurality of applications or associated with one application (as in the illustrative embodiment of FIG. 1). The task manager 15 is responsible for task execution and for acquiring and partitioning resources (e.g., speech engines, data files). Control messages (e.g., XML messages, SOAP messages, WSDL/SOAP messages, i.e., based on a web service framework as discussed in detail below) are exchanged between the application 14 and task manager 15 to control, for example, the audio source (e.g., barge-in), "Audio In" events (e.g., ASR), "Audio Out" events (e.g., TTS), configuration (static/dynamic) and registration with a router and load manager 21 (including configuration and exchanges (events and results) between the engines and audio subsystem). Preferably, the application and task manager comprise an XML parser. In one embodiment, the control messages can be passed via sockets communication. In one preferred embodiment, control messages between the application 14 and task manager 15 are based on a web services framework that implements SOAP/WSDL (as discussed below) or are passed through other APIs or communication protocols. Preferred control messages will be discussed in further detail below.

The system further comprises a plurality of speech servers (or speech engines), which may reside on the same host or different hosts in the network 13. The speech servers comprise, for example, an ASR (automatic speech recognition) server 16, a NL (natural language) parser server 17, a TTS (text-to-speech) server 18, and a SPID (speaker identification/verification) server 19, and any other processors that are used for pre/post processing the audio (e.g., uplink/downlink codecs for encoding/decoding audio data transmitted between the audio subsystem and engines), and the results of other engines.

The ASR server 16 comprises a speech recognition engine for enabling the system to understand spoken utterances of a user for the purpose of processing commands and data input received from the user. The NL parser 17, which is employed for natural language understanding applications, receives data streams from the ASR server 16 and processes the data.

The TTS server 18 comprises a TTS engine to convert text to speech in the telephony environment. The TTS server 18 enables the application 14 to play synthesized prompts at runtime from a text string selected by the application 14, which is useful in circumstances for which a prerecorded prompt is not available. The TTS 18 reduces the need for a large number of prerecorded prompts, which can be effectively replaced by dynamically changing text strings. The TTS 18 preferably supports barge-in so that synthesized speech can be interrupted by a caller in the same manner as a prerecorded prompt.

The SPID server 19 (speaker identification and/or verification engine) is used for speaker recognition when the system supports biometric identification/verification.

It is to be understood that the system 10 may comprise other speech servers such as NLG (natural language generation) engines and pre/post processing engines/codecs, etc.

The system 10 further comprises an audio I/O system 20 (or "TEL") which, in the exemplary embodiment of FIG. 1, comprises a telephony card on a gateway. The audio I/O system 20 comprises an audio I/O system and voice access port and platform. Essentially, in the embodiment of FIG. 1, the TEL 20 is a driver. An advantage of the invention, in particular the architecture of FIG. 1 with the audio subsystem (TEL) 20 as a web service, is that the invention enables abstraction of the gateway 11. This allows easy integration of complex engine configurations and platforms/execution environments within gateways that are proprietary and that otherwise require significant effort to integrate with, or are often impossible to integrate with, without close collaboration with the gateway manufacturer. In the present invention, the interface behaves like a driver or web service and any platform/runtime environment can interface to the driver or web service.

In other embodiments, such as when the system 10 comprises direct IVR programming, the audio I/O platform 20 is part of a custom server for the application (as explained below with reference to FIGS. 9 and 10).

The TEL 20 comprises a gateway (e.g., telephony platform) that connects voice audio streams from a network to the various speech engines (possibly with protocol and media conversion, e.g., from PSTN to VoIP), such as speech recognition and TTS. The gateway platform comprises an audio subsystem card (e.g., telephony cards) and appropriate devices and software for playing prerecorded/synthesized audio prompts to callers, recording a caller's voice, capturing number information (DTMF), etc. The TEL 20 includes various codecs (DSR optimized codecs or conventional codecs (e.g., AMR)) and DSP support (e.g., for additional preprocessing like barge-in, speech activity detection, and speech processing (noise subtraction, cepstral mean subtraction, etc.)) for uplink and downlink coding/decoding. Preferably, the TEL 20 provides support for multiple connections/channels (e.g., telephone calls or VoIP/IMS sessions). The TEL 20 preferably supports barge-in and supports full-duplex audio to record and play audio at the same time.

A driver module 12 comprises one or more software drivers for allowing the host system to operate the TEL 20. Telephony/gateway and IVR/conversational exchanges occur between the voice response system 11, drivers 12 and audio I/O system 20.

In general, the router/load manager 21 provides functions such as assigning available speech engines to the applications and assigning applications to ports. More specifically, the router/load manager 21 associates ports of TEL 20 to the application 14. Further, in response to requests from the application 14 (via the task manager 15), the router/load manager 21 associates speech engines to the application 14 on an utterance, session or persistent basis. Each application and speech server registers its state with the router/load manager 21.

FIG. 2 is a high-level diagram of an embodiment of the system of FIG. 1, which illustrates a method for sending audio and control messages between different components of the system 10. In FIG. 2, a client application 25 (comprising conversational application 14 and associated task manager 15), speech server 26 and router 21 exchange control messages via control bus 13a. The control bus 13a comprises the portion of the IP network 13 between ports that are assigned to control protocols. In one embodiment, the control bus comprises a collection of point-to-multi point connections that pass XML messages over TCP sockets, wherein the messages comprise strings with fixed length fields and a list of tag-value pairs. In this embodiment, the application 14 is programmed using, e.g., VoiceXML, FDM (form-based dialog management), C++, Java, scripts, and may comprise an XML parser.
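The control-bus message shape mentioned above (a string with fixed-length fields followed by a list of tag-value pairs) can be illustrated with a minimal sketch. The text does not specify the field names, widths, or separators, so everything below (the `msg_type` and `call_id` fields, their widths, and the `tag=value;` syntax) is a hypothetical layout chosen only for illustration; the XML envelope is omitted.

```python
# Hypothetical header layout: (field name, fixed width in characters).
# Neither the names nor the widths come from the source text.
HEADER_FIELDS = (("msg_type", 8), ("call_id", 12))


def encode_message(header, pairs):
    """Build a string of fixed-length header fields plus tag=value pairs."""
    parts = []
    for name, width in HEADER_FIELDS:
        value = header[name]
        if len(value) > width:
            raise ValueError(f"{name} exceeds {width} characters")
        parts.append(value.ljust(width))  # pad each field to its fixed width
    # Assumes tags and values contain neither ';' nor '='.
    body = ";".join(f"{tag}={value}" for tag, value in pairs.items())
    return "".join(parts) + body


def decode_message(raw):
    """Split a message back into its header fields and tag-value pairs."""
    header, offset = {}, 0
    for name, width in HEADER_FIELDS:
        header[name] = raw[offset:offset + width].rstrip()
        offset += width
    pairs = {}
    for item in raw[offset:].split(";"):
        if item:
            tag, _, value = item.partition("=")
            pairs[tag] = value
    return header, pairs
```

A string produced this way could then be written to a connected TCP socket with `sock.sendall(raw.encode("utf-8"))`; the receiver reads the fixed-width header first and parses the remaining pairs.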

The speech server 26 comprises a plurality of speech engines (ASR, TTS, SPID) and an audio subsystem (comprising pre/post processors, codecs), which pass audio data over an audio bus 13b. The audio bus 13b comprises the portion of the IP network 13 between ports that exchange, e.g., RTP streamed audio to and from the TEL 20 system. The audio bus 13b comprises a collection of point-to-multipoint connections passing audio and begin/end indicators. In one embodiment, TCP sockets are connected when speech begins and are terminated at the end of speech. RTP-based audio exchanges may also be implemented.

In the system 10, audio streams are exchanged between the different speech servers, e.g., TEL 20, SPID server 19, TTS server 18, ASR server 16 (and other engines), and control messages are exchanged between the task manager 15, router 21, speech servers, and TEL 20 and between engines. The task manager 15 programs the engines using, e.g., SOAP/WSDL messages (as discussed below). The task manager maintains the context for engines that are programmed asynchronously by algorithm strings and are assumed stateless. Sockets and HTTP can be used for the messaging following the same principles.

Various protocols may be used for streaming audio and exchanging control messages. Preferably, the implementation of distributed conversational systems according to the present invention is based, in part, on suitably defined conversational coding, transport and control protocols, as described for example in U.S. patent application Ser. No. 10/104,925, filed on Mar. 21, 2002, entitled "Conversational Networking Via Transport, Coding and Control Conversational Protocols," which is commonly assigned and fully incorporated herein by reference.

Briefly, in a preferred embodiment, distributed transport protocols comprise, for example, RT-DSR (real-time distributed speech recognition protocols) that implement DSR optimized and DSR non-optimized speech codecs. Such protocols are preferably VoIP-based protocols (such as SIP or H.323), which implement RTP and RTSP. Such protocols preferably support end-point detection and barge-in and exchange of speech meta-information such as speech activity (speech/no-speech), barge-in messages, end of utterance, possible DTMF exchanges, front-end setting and noise compensation parameters, client messages (settings of audio sub-system, client events, externally acquired parameters), annotations (e.g., partial results), and application specific messages. Such meta-information can be transmitted out-of-band or sent or received interleaved with "audio in" or "audio out" data. Furthermore, control protocols include session control protocols, protocols for exchanging speech meta-information, and SERCP (speech engine remote control protocols). Preferably, SERCP is based on SOAP, using algorithm string instructions as described below. The engines are preferably programmed to listen for input, perform a particular task, and send the results to a particular port. Such protocols further provide control of the audio I/O system, such as programming a DTMF recognizer.

Furthermore, in a preferred embodiment, the routing functions are performed using SOAP messaging, as are the exchanges between engines. One or more ports of the audio I/O system are associated with an application. The engines are associated with applications on either an utterance, session or persistent basis. Each engine server registers its state with the router/load manager. The application sends a request to the router/load manager (via the task manager) for a particular functionality or resource. As a result, the router/load manager passes the address of the engines to the task manager and the audio I/O system. The task manager ships SOAP instructions to the engines. The audio I/O system ships RTP audio to and from the engine address. After processing, the engines ship their results or RTP in the same manner. Preferred coding, transport and control protocols for implementation in a distributed telephony system according to the invention will be discussed in detail below with reference to FIG. 13, for example.
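The registration and assignment steps just described can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: the class and method names (`RouterLoadManager`, `register_engine`, `request_engine`, `release_engine`) and the first-free selection policy are hypothetical, and the SOAP messaging and RTP streaming described in the text are not modeled.

```python
from dataclasses import dataclass


@dataclass
class EngineRecord:
    engine_type: str   # e.g. "asr", "tts", "spid"
    address: str       # where instructions and audio would be sent
    busy: bool = False


class RouterLoadManager:
    """Tracks registered engines and hands out free ones on request."""

    def __init__(self):
        self._engines = []
        self._port_to_app = {}

    def register_engine(self, engine_type, address):
        # Each engine server registers its state with the router.
        self._engines.append(EngineRecord(engine_type, address))

    def assign_port(self, port, app_id):
        # Associate a port of the audio I/O system with an application.
        self._port_to_app[port] = app_id

    def application_for_port(self, port):
        return self._port_to_app.get(port)

    def request_engine(self, engine_type):
        # Assumed policy: return the first idle engine of the requested type.
        for record in self._engines:
            if record.engine_type == engine_type and not record.busy:
                record.busy = True
                return record.address
        return None  # no engine of that type is currently available

    def release_engine(self, address):
        # Called when the utterance or session that held the engine ends.
        for record in self._engines:
            if record.address == address:
                record.busy = False
```

In this sketch a task manager would call `request_engine("asr")` on behalf of its application, pass the returned address to the audio I/O system, and call `release_engine` when the utterance or session ends; a persistent association would simply never release.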

FIGS. 5 and 6 are diagrams illustrating distributed systems/applications according to other embodiments of the invention. The architecture of the exemplary system depicted in FIG. 5 is similar to the system 10 of FIG. 1, except that the system comprises a plurality of client application hosts 70, 71, 72, each running a plurality of applications and associated task managers and speech engines. In the framework of FIG. 5, the router/load manager 21 can select any available speech engine across the different server/machines. For instance, as specifically depicted in FIG. 5, the task manager on remote host 70 can access and program the speech engines on remote hosts 71 and 72, as selected and assigned by the router/load manager 21.

The system architecture of FIG. 6 comprises a plurality of client application hosts 73, 74, 75, each running a plurality of applications and associated speech engines. In the framework of FIG. 6, the applications associated with a given host can use only those speech servers residing on the host. In FIG. 5, the task managers can control engines that are located on different machines or networks. The task manager as such abstracts the engine control from the application. In the embodiment of FIG. 6, engines are on the same machine, cluster or network as the application. It is to be understood that when the engines are local, it is not necessary to have a task manager drive the engines remotely through SERCP, and the application can talk directly (via local APIs) to the local engines to drive them.

The diagrams of FIGS. 3a, 3b, 3c and 3d illustrate various application programming frameworks that may be implemented in a telephony system according to the invention. It is to be understood that the conversational architectures described herein may be implemented using one or more of the following application authoring frameworks. In FIG. 3a, the application framework comprises a VoiceXML browser application which accesses and renders VoiceXML documents received from a WAS (Web Application Server) and which passes received data to the WAS for backend processing. Preferably, any suitable speech browser that is capable of processing VoiceXML scripts may be employed herein. A preferred embodiment of the VoiceXML browser 31 is described, for example, in International Appl. No. PCT/US99/23008, entitled "Conversational Browser and Conversational Systems", which has been filed in the United States National Phase and assigned U.S. Ser. No. 09/806,544, and which is commonly assigned and incorporated herein by reference. This application also describes a conversational markup language. A speech browser is preferably capable of parsing the declarative framework (including any imperative specification) of a VoiceXML page (or any other form of conversational ML) and rendering the conversational UI of the target content or transaction to a user.

The application framework illustrated in FIG. 3b comprises a FDM (form-based dialog manager) that manages a plurality of forms (e.g., scripts) and communicates with backend logic for processing the forms and accessing backend data. Any suitable form-based dialog manager, such as a web-based FDM, which employs NLU (natural language understanding) to provide a "free-flow" dialog, and which can operate in a distributed environment, may be implemented herein. In one preferred embodiment, techniques that may be used for providing NLU-based FDM are disclosed in the following references: Papineni, et al., "Free-flow Dialog Management Using Forms," Proc. Eurospeech, 1999; and K. Davies et al., "The IBM Conversational Telephony System For Financial Applications", Proc. Eurospeech, Budapest, 1999, which are incorporated herein by reference. In addition, the techniques disclosed in U.S. Pat. No. 6,246,981, issued to Papineni, et al., entitled "Natural Language Task-Oriented Dialog Manager and Method", which is incorporated herein by reference, may be used for providing NLU-based FDM.

The application framework illustrated in FIG. 3c comprises an XML-based FDM application 34. The application 34 comprises a VoiceXML browser 35, a DOM (document object model) layer 36 (that provides at least access to the interaction events and allows update of the presentation (through page push or DOM mutation)), a wrapper layer 37, a multi-modal shell 38 and FDM 39. In this framework, the FDM 39 uses the VoiceXML browser 35 for audio I/O management and speech engine functions. Following a Web programming model, the FDM 39 submits messages for backend calls and scripts. The multi-modal shell 38 supports launching of objects or managing the forms and sending snippets to the VoiceXML browser 35. It provides the link between the VoiceXML browser 35 and the FDM 39 for audio I/O and engine management. This application framework supports context sharing objects. The DOM layer 36 comprises supporting mechanisms for controlling the VoiceXML browser 35 and mechanisms for event notification. The wrapper 37 comprises an interface and filters to the VoiceXML browser 35 (e.g., the wrapper 37 implements a DOM filter and interfaces). The DOM implementation provides an efficient and universal mechanism for exchanging events and manipulating the presentation layer for existing VoiceXML browsers, without having to modify the code of such VoiceXML browsers. The application framework 34 and various functions and implementations of the DOM and wrapper layers are described in detail in U.S. patent application Ser. No. 10/007,092, filed on Dec. 4, 2001, entitled "Systems and Methods For Implementing Modular DOM (Document Object Model)-Based Multi-Modal Browsers", and U.S. Provisional Application Serial No. 60/251,085, filed on Dec. 4, 2000, which are both fully incorporated herein by reference.

The application framework illustrated in FIG. 3d comprises a DOM-based multi-modal application 40. Various frameworks for application 40 are described in the above incorporated U.S. patent application Ser. No. 10/007,092 and No. 60/251,085. Briefly, the multi-modal browser application 40 comprises a multi-modal shell 47 that maintains the state of the application, manages the synchronization between the supported browsers (e.g., GUI browser 41 and VoiceXML browser 44), and manages the interface with the WAS and backend. The multi-modal browser application 40 comprises a GUI browser 41 and associated DOM interface 42 and wrapper layer 43, as well as a voice browser 44 and associated DOM interface 45 and wrapper layer 46. It is to be understood that notwithstanding that two channels are shown in FIG. 3d, additional channels can be supported (e.g., WML (wireless markup language), etc.). The DOM interfaces 42 and 45 preferably provide mechanisms to enable the GUI browser 41 and voice browser 44 to be at least DOM Level 2 compliant. The DOM interfaces 42 and 45 comprise supporting mechanisms for controlling the respective browsers (presentation update) and mechanisms for event notification and exchange.

Each wrapper 43 and 46 comprises interfaces and filters to the different views (browsers) (e.g., the wrappers implement a DOM filter and interfaces). The wrappers support granularity of the synchronization between the different channels by filtering and buffering DOM events. Further, the wrappers 43 and 46 preferably implement the support for synchronization protocols for synchronizing the browsers.
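As a rough illustration of how a wrapper might control synchronization granularity by filtering and buffering events, consider the sketch below. The class name `EventWrapper`, the choice of which event types are ignored or trigger a flush, and the callback interface are all assumptions made for illustration; the text does not define a wrapper API.

```python
class EventWrapper:
    """Filters and buffers browser events before forwarding them in batches."""

    def __init__(self, forward, flush_on=("change", "submit"), ignore=("mousemove",)):
        self._forward = forward          # callable that receives a list of events
        self._flush_on = set(flush_on)   # event types that trigger synchronization
        self._ignore = set(ignore)       # event types dropped by the filter
        self._buffer = []

    def handle(self, event_type, target, value=None):
        if event_type in self._ignore:
            return  # filtered out: never reaches the shell
        self._buffer.append((event_type, target, value))
        if event_type in self._flush_on:
            # Forward the buffered events as one batch, then start a new batch.
            self._forward(list(self._buffer))
            self._buffer.clear()
```

Widening `flush_on` makes synchronization finer-grained (more frequent, smaller batches), while narrowing it to `("submit",)` would synchronize only at form level.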

Instead of wrappers, it is possible to use other interfaces or mechanisms that rely on the same principles. For example, it is possible to load in the user agents (or in the pages) an ECMAScript library that captures DOM events that result from the user interaction and handles them by sending them to the multi-modal shell.

The synchronization protocols implement the information exchange behind an MVC framework: when the user interacts on a View (via a (controller) browser), the action impacts the Model (supported by the multi-modal shell 47) that updates the Views. Preferably, the synchronization framework enables transport of DOM commands to remote browsers from the multi-modal shell and the transport of DOM events from the remote browser to the multi-modal shell. The synchronization framework preferably enables page push instructions to remote browsers and can also enable DOM manipulation to update the presentation. Further, the synchronization framework preferably enables (i) status queries from the multi-modal shell to the different views, (ii) status queries from views to the multi-modal shell, (iii) registration of new views, (iv) discovery between the view and the multi-modal shell, (v) description and configuration of a new view, and (vi) clean disconnect of views from the multi-modal shell. The synchronization framework preferably supports replication of the multi-modal shell state. The synchronization framework clearly identifies the associated view (channel type, user and session). In other embodiments, the synchronization framework preferably supports negotiation on what component plays the role of the multi-modal shell and negotiation/determination of where to perform what function. The synchronization framework is compatible with existing network infrastructure/protocol stacks and gateways, and the synchronization protocols can bind to HTTP (and similar protocols like WSP) and TCP/IP. Preferably, the synchronization protocols fit the evolution towards web services and XML protocols.
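The MVC flow described above (a user action in one view updates the model held by the multi-modal shell, which then updates every registered view) can be sketched as follows. The class and method names are hypothetical, the model is reduced to a flat dictionary of field values, and the transport of DOM events and commands over a network is replaced by direct method calls.

```python
class View:
    """Stands in for one channel-specific browser (e.g. GUI or voice)."""

    def __init__(self, name):
        self.name = name
        self.presented = {}  # what this view currently shows or speaks

    def update(self, field, value):
        self.presented[field] = value


class MultiModalShell:
    """Holds the model and keeps all registered views consistent with it."""

    def __init__(self):
        self.model = {}
        self.views = {}

    def register_view(self, view):
        self.views[view.name] = view
        # Bring a newly registered view up to date with the current model.
        for field, value in self.model.items():
            view.update(field, value)

    def disconnect_view(self, name):
        self.views.pop(name, None)

    def on_event(self, source, field, value):
        # The action on one view impacts the model, which updates all views,
        # including the one that originated the event (identified by `source`).
        self.model[field] = value
        for view in self.views.values():
            view.update(field, value)
```

For example, after registering a GUI view and a voice view, a spoken input reported as `shell.on_event("voice", "city", "Boston")` would leave both views presenting the same value for `city`.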

It is to be appreciated and understood that the systems and methods described herein can support numerous runtime platform execution models and numerous programming models. Therefore, programming paradigms such as NL, script, state table, Java, VoiceXML, single authoring and multiple authoring frameworks can be implemented in the systems described herein. For example, the present invention supports programming models that are premised on the concept of "single-authoring" wherein content is expressed in a "user-interface" (or modality) neutral manner. In particular, the present invention can support interaction-based programming models that separate application programming into content aspects, presentation aspects and interaction aspects. An example of a single authoring, interaction-based programming paradigm that can be implemented herein is described in U.S. patent application Ser. No. 09/544,823, filed on Apr. 6, 2000, entitled: "Methods and Systems For Multi-Modal Browsing and Implementation of A Conversational Markup Language", which is commonly assigned and fully incorporated herein by reference.

As described in the above-incorporated U.S. Ser. No. 09/544,823, one embodiment of IML preferably comprises a high-level XML (eXtensible Markup Language)-based script for representing interaction "dialogs" or "conversations" between user and machine, which is preferably implemented in a modality-independent, single authoring format using a plurality of "conversational gestures." The conversational gestures comprise elementary dialog components (interaction-based elements) that characterize the dialog interaction with the user. Each conversational gesture provides an abstract representation of a dialog independent from the characteristics and UI offered by the device or application that is responsible for rendering the presentation material. In other words, the conversational gestures are modality-independent building blocks that can be combined to represent any type of intent-based user interaction. A gesture-based IML, for example, allows an application to be written in a manner which is independent of the content/application logic and presentation (i.e., gesture-based CML encapsulates man-machine interaction in a modality-independent manner).

The use of a single authoring, modality independent application (e.g., gesture-based IML as described above) together with a multi-modal shell in the application framework of FIG. 3d advantageously provides tight synchronization between the different Views supported by the multi-modal browser. Techniques for processing multi-modal documents (single and multiple authoring) via multi-modal browsers are described in the above-incorporated patent application U.S. Ser. No. 09/544,823, as well as U.S. patent application Ser. No. 09/507,526, entitled: "Systems And Methods For Synchronizing Multi-Modal Interactions", which is commonly assigned and fully incorporated herein by reference. These applications describe architectures and protocols for building a multi-modal shell.

FIG. 4 is a block diagram of a conversational system according to an embodiment of the present invention which implements a DOM-based multi-modal browser and NLU to provide "free-flow" or conversational dialog (details of which are disclosed in U.S. patent application Ser. No. 10/156,618, filed May 28, 2002, entitled: "Methods and System For Authoring of Mixed-Initiative Multi-Modal Interactions and Related Browsing Mechanisms," which is commonly assigned and incorporated herein by reference). The task manager 15, router 21, ASR 16, TTS 18 and network 13 interface and communicate as described, for example, with reference to FIG. 1. The exemplary system comprises a multi-modal browser comprising the VoiceXML browser 44 and associated interfaces 45, 46, an additional view (GUI, WML, etc.) 50 and associated interfaces 51, 52, and a multi-modal shell 53. The multi-modal shell 53 coordinates and synchronizes (via synchronization module 54) the information exchange between the registered channel-specific browser applications 44 and 50 (and possibly other views) via synchronization protocols (e.g., Remote DOM or SOAP). The multi-modal shell 53 interacts with backend logic comprising Web server 59, Event Handler 60 and canonicalizer 61. The multi-modal shell 53 parses a multi-modal application/document received from the WAS 59, builds the synchronization between the registered browser applications via a registration table 55 (e.g., an interaction logic layer DOM Tree and associated instances that include the data model and interaction, and possibly, the interaction history), and then sends the relevant modality specific information (e.g., presentation markup language) comprising the multi-modal application/document to each registered browser application for rendering based on its interaction modality. A transcoder 56 enables the conversion of, e.g., IML scripts to VoiceXML scripts 32 and other scripts supported by the multi-modal browser. An FDM 57 is accessible by the multi-modal shell 53 to perform functions such as focus determination and context resolution. An NLU system 58 that provides NL parsing/classification enables the generation of semantic representations of spoken utterances and, thus, free-flow, mixed initiative dialog.

FIGS. 7 and 8 are diagrams illustrating methods for processing a call in a telephony system according to an embodiment of the invention. More specifically, FIG. 7 is a high-level diagram illustrating methods for initiating a conversational system according to an embodiment of the invention, as well as the flow of audio and control messages between components of the conversational system. FIG. 8 is a flow diagram illustrating methods for processing a call or incoming voice communication (i.e., start of a new voice session). The diagram of FIG. 7 illustrates various telephony components comprising a controller 80, speech server 81 and audio I/O source 82. Each system component 80, 81 and 82 can reside on a separate machine. For instance, the audio I/O source (comprising the TEL) may reside on a PC having client application enabling software (e.g., Dialogic, NMS, DT). The controller 80 (which comprises the conversational application (App) and task manager (TMR)) may reside on a remote machine over a network, and the speech server 81 (which comprises speech engines ASR, TTS, SPID, NLU and other processing engines, if needed) may reside on a separate remote machine over the network. As illustrated, control messages are passed between the controller 80 and TEL host 82 and between the controller 80 and speech server 81, using control messaging protocols as described herein. Further, audio data is streamed between the TEL host 82 and the speech server 81, using RT-DSR protocols as discussed herein.

Each host 80, 81, 82 comprises a PM (process manager) 83 for initializing the conversational system. Each process manager 83 comprises a daemon that is installed on each machine, which is launched when the machine is booted or rebooted to initiate the different components. In one embodiment, the PM is installed as a daemon (per host, without remote access). There are various processes that are launched at initiation. For instance, the process manager launches a router. For a router on a client, a one-to-many configuration is initiated. For a router on a server, a many-to-many configuration is initiated. The client is configured with a list of routers, one of which is selected according to a predefined scheduling scheme. Another process relates to server activity. The process manager launches a server and the server connects to a well-known router port from the configuration. The server will send a registration message to the router (travel.asr.0.hosty:222:333). Another process relates to client activity. When a client is launched, a ClientRouter Object is launched and processed. The router will notify the ClientRouter Object of what resources are available (darpa.asr, us.tts . . . ). The ClientRouter Object can request a server of a particular type. Each application will register with the router (via the task manager) the application type and the application's Call ID.
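
The registration and request exchange described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the specification does not define the layout of the registration message, so the code assumes that the first two dot-separated fields (e.g., "travel.asr") name the resource type, by analogy with the resource names "darpa.asr" and "us.tts", and treats the rest of the message as opaque. The Router class, its method names, the second host name and the round-robin choice among servers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class Router:
    """Toy router that tracks registered servers by resource type."""

    def __init__(self) -> None:
        self._servers: Dict[str, List[str]] = defaultdict(list)
        self._next: Dict[str, int] = {}

    def register(self, message: str) -> str:
        """Record a server registration message and return its resource type.

        Assumption: the first two dot-separated fields are the resource type.
        """
        parts = message.split(".", 2)
        if len(parts) < 3:
            raise ValueError(f"unrecognized registration message: {message!r}")
        resource = ".".join(parts[:2])
        self._servers[resource].append(message)
        return resource

    def available(self) -> List[str]:
        """Resource types a ClientRouter Object would be notified about."""
        return sorted(self._servers)

    def request(self, resource: str) -> Optional[str]:
        """Hand out a server of the requested type (assumed round-robin)."""
        servers = self._servers.get(resource)
        if not servers:
            return None
        index = self._next.get(resource, 0)
        self._next[resource] = (index + 1) % len(servers)
        return servers[index]


# Usage: two ASR servers register; a client then requests servers by type.
router = Router()
first_type = router.register("travel.asr.0.hosty:222:333")
router.register("travel.asr.1.hostz:222:333")  # hypothetical second server
picked = [router.request("travel.asr") for _ in range(3)]
```

A request for a type with no registered server returns None here; how an actual router reports an unavailable resource is not stated in the text.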

Methods for processing an incoming call will now be discussed with reference to the system of FIG. 1, as well as the flow diagram of FIG. 8. Initially, a call (new incoming voice communication, i.e., establishment of a new voice session which, in telephony, is a call and which, on VoIP, could be a SIP session, etc.) is received by the client voice response system 11 (step 90). The client determines a Call ID based on, e.g., the configuration (e.g., number to application mapping, application profiles, state tables, the telephone number dialed, etc.). Call control exchanges/signaling occur to pass the incoming call information (e.g., Call ID, IP address, etc.) from the client 11 to the TEL 20 via the driver 21 (step 91). The TEL 20 sends an application instance request (with Call ID, address) to the router/load manager 21 (step 92). The router/load manager 21 attempts to find and assign an application that can take the call (step 93).
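
Steps 90 through 93 above can be illustrated with a short Python sketch. It assumes, purely for illustration, that the Call ID is obtained from a number-to-application mapping and that the router/load manager assigns the first idle application instance registered under that Call ID; the names AppInstance, LoadManager and call_id_for, and the sample telephone number, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class AppInstance:
    """An application instance registered with its Call ID (assumed fields)."""

    call_id: str
    busy: bool = False


def call_id_for(dialed_number: str, number_to_app: Dict[str, str]) -> Optional[str]:
    """Step 90: derive a Call ID from an assumed number-to-application mapping."""
    return number_to_app.get(dialed_number)


class LoadManager:
    """Toy router/load manager that assigns idle application instances."""

    def __init__(self, instances: Iterable[AppInstance]) -> None:
        self.instances = list(instances)

    def assign(self, call_id: str) -> Optional[AppInstance]:
        """Steps 92-93: find an idle instance for the Call ID and assign it."""
        for instance in self.instances:
            if instance.call_id == call_id and not instance.busy:
                instance.busy = True
                return instance
        return None  # no application can take the call


# Usage: one "travel" instance is available, so a second call finds none.
mapping = {"5551234": "travel"}  # hypothetical dialed number
manager = LoadManager([AppInstance("travel")])
call_id = call_id_for("5551234", mapping)
first = manager.assign(call_id)
second = manager.assign(call_id)
```

What happens when no application can take the call (queuing, rejection, retry) is not covered by the passage above, so the sketch simply returns None.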

When the application is assigned, an application presentation layer is loaded (via a backend access) (step 94). The router/load manager 21 then passes to the application 14 (via the associated task manager 15) the appropriate TEL address for, e.g., the duration of the call (session based) (step 95). The applicati

