Method and apparatus for multiplying and accumulating complex numbers in a digital filter. Number: 6,823,353, from the United States Patent and Trademark Office (PTO)

Title: Method and apparatus for multiplying and accumulating complex numbers in a digital filter

Abstract: The invention provides a method and apparatus for performing complex digital filters. According to one aspect of the invention, a method for performing a complex digital filter is described. The complex digital filter is performed using a set of data samples and a set of complex coefficients. In addition, the complex digital filter is performed using an inner and an outer loop. The outer loop steps through a number of corresponding relationships between the set of complex coefficients and the set of data samples. The inner loop steps through each complex coefficient in the set of complex coefficients. Within the inner loop, the data sample corresponding to the current complex coefficient (the complex coefficient currently identified by the inner loop) is determined according to the current corresponding relationship (the corresponding relationship currently identified by the outer loop). Then, in response to receiving an instruction, eight data elements are read and used to generate a currently calculated complex number. These eight data elements were previously stored as packed data and include two representations of each of the components of the current complex coefficient and its current corresponding data sample. Each of these data elements is either the positive or negative of the component it represents. As a result of the manner in which these eight data elements are stored, the currently calculated complex number represents the product of the current complex coefficient and its current corresponding data sample. The currently calculated complex number is then added to the current output packed data.

Patent Number: 6,823,353 Issued on 11/23/2004 to Fischer, et al.


Inventors: Fischer; Stephen A. (Rancho Cordova, CA); Mennemeier; Larry M. (Boulder Creek, CA); Peleg; Alexander D. (Haifa, IL); Dulong; Carole (Saratoga, CA); Kowashi; Eiichi (Ibaraki, JP)
Assignee: Intel Corporation (Santa Clara, CA)
Appl. No.: 211203
Filed: August 2, 2002


Related U.S. Patent Documents

Application Number    Filing Date    Patent Number    Issue Date
760969                Jan., 2001     6470370
905506                Jul., 1997     6237016
575778                Dec., 1995     6058408
523211                Sep., 1995

Current U.S. Class: 708/622
Current International Class: G06F 17/10 (20060101)
Field of Search: 708/622,511,524


References Cited

U.S. Patent Documents
3202805 August 1965 Amdahl et al.
3711692 January 1973 Batcher
3723715 March 1973 Chen et al.
4161784 July 1979 Cushing et al.
4344151 August 1982 White
4393468 July 1983 New
4418383 November 1983 Doyle et al.
4498177 February 1985 Larson
4707800 November 1987 Montrone et al.
4771379 September 1988 Ando et al.
4779218 October 1988 Jauch
4989168 January 1991 Kuroda et al.
5095457 March 1992 Jeong
5111422 May 1992 Ullrich
5187679 February 1993 Vassiliadis et al.
5222037 June 1993 Taniquchi
5227994 July 1993 Mitsuharu
5241492 August 1993 Girardeau, Jr.
5243624 September 1993 Paik et al.
5262976 November 1993 Young et al.
5293558 March 1994 Narita et al.
5321644 June 1994 Schibinger
5325320 June 1994 Chiu
5381357 January 1995 Wedgewood et al.
5420815 May 1995 Nix et al.
5442799 August 1995 Murakami et al.
5457805 October 1995 Nakamura
5473557 December 1995 Harrison et al.
5487022 January 1996 Simpson et al.
5500811 March 1996 Corry
5506865 April 1996 Weaver, Jr.
5509129 April 1996 Guttag et al.
5517438 May 1996 Dao-Troung et al.
5528529 June 1996 Seal
5566101 October 1996 Kodra
5576983 November 1996 Shiokawa
5675526 October 1997 Peleg et al.
5677862 October 1997 Peleg et al.
5742538 April 1998 Guttag et al.
5896543 April 1999 Garde
5983253 November 1999 Fischer et al.
6058408 May 2000 Fischer et al.
6237016 May 2001 Fischer et al.
6470370 October 2002 Fischer et al.

Other References

J. Shipnes, Graphics Processing with the 88110 RISC Microprocessor, IEEE (1992), pp. 169-174.
MC88110 Second Generation RISC Microprocessor User's Manual, Motorola, Inc. (1991).
Errata to MC88110 Second Generation RISC Microprocessor User's Manual, Motorola, Inc. (1992), pp. 1-11.
MC88110 Programmer's Reference Guide, Motorola, Inc. (1992), p. 1-4.
i860™ Microprocessor Family Programmer's Reference Manual, Intel Corporation (1992), Ch. 1, 3, 8, 12.
R.B. Lee, Accelerating Multimedia With Enhanced Microprocessors, IEEE Micro (Apr. 1995), pp. 22-32.
TMS320C2x User's Guide, Texas Instruments (1993), pp. 3-2 through 3-11; 3-28 through 3-34; 4-1 through 4-22; 4-41; 4-103; 4-119 through 4-120; 4-122; 4-150 through 4-151.
L. Gwennap, New PA-RISC Processor Decodes MPEG Video, Microprocessor Report (Jan. 1994), pp. 16, 17.
SPARC Technology Business, UltraSPARC Multimedia Capabilities On-Chip Support for Real-Time Video and Advanced Graphics, Sun Microsystems (Sep. 1994).
Y. Kawakami et al., LSI Applications: A Single-Chip Digital Signal Processor for Voiceband Applications, Solid State Circuits Conference, Digest of Technical Papers; IEEE International (1980).
B. Case, Philips Hopes to Displace DSPs with VLIW, Microprocessor Report (Dec. 1994), pp. 12-15.
L. Gwennap, UltraSparc Adds Multimedia Instructions, Microprocessor Report (Dec. 1994), pp. 16-18.
N. Margulis, i860 Microprocessor Architecture, McGraw Hill, Inc. (1990), Ch. 6, 7, 8, 10, 11.
Pentium Processor User's Manual, vol. 3: Architecture and Programming Manual, Intel Corporation (1993), Ch. 1, 3, 4, 6, 8, and 18.
Desktop Video Data Handbook, Philips Semiconductors (1993), pp. iii-v and 3-311 through 3-319.
Jack, K., Video Demystified, A Handbook for the Digital Engineer, (1955), pp. vii-x and 197-256.

Primary Examiner: Mai; Tan V.
Attorney, Agent or Firm: Blakely, Sokoloff, Taylor & Zafman LLP

Parent Case Text



This application is a continuation of application Ser. No. 09/760,969, filed Jan. 16, 2001, now U.S. Pat. No. 6,470,370, which claims priority to a divisional application Ser. No. 08/905,506, filed Jul. 31, 1997, now issued U.S. Pat. No. 6,237,016, which claims priority to application Ser. No. 08/575,778, filed Dec. 20, 1995, now issued U.S. Pat. No. 6,058,408, which is a continuation-in-part of and claims priority to Ser. No. 08/523,211, filed Sep. 5, 1995, now abandoned.
Claims



What is claimed is:

1. An apparatus comprising: a number of multipliers to receive a number of source operands that include packed data based on a single instruction, wherein the packed data in a first source operand includes two representations of each component of a first complex coefficient and the packed data in a second source operand includes two representations of each component of a second complex coefficient, the number of multipliers to generate a number of intermediate results based on multiplication of the components of the complex coefficients; and a number of adders to generate a complex number result having a real component and an imaginary component based on addition of the number of intermediate results.

2. The apparatus of claim 1, wherein each component of the first and second complex coefficient is represented by N bits and the components of the complex number result represented by 2N bits.

3. The apparatus of claim 1, wherein only one component of the first and second complex coefficients is negative.

4. An apparatus comprising: a memory to store a single instruction that identifies packed data operands having stored therein at least eight data elements, wherein the packed data operands include two representations of each component of a first complex coefficient and two representations of each component of a second complex coefficient; and a processor to execute the single instruction, wherein the execution of the single instruction comprises, generating a complex number representing a product of the first complex coefficient and the second complex coefficient using the two representations of each component of the first and second complex coefficients.

5. The apparatus of claim 4, wherein the processor is to add a real component and an imaginary component of the complex number to a first data element and a second data element of an accumulation packed data item.

6. The apparatus of claim 5, wherein the processor is to shift both the first and second data elements of the accumulation packed data item to the right by N bits.

7. The apparatus of claim 4, wherein each of the at least eight data elements is represented by N bits and components of the complex number are represented by 2N bits.

8. The apparatus of claim 4, wherein only one data element in the at least eight data elements is negative.
Description



BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to the field of computer systems. More specifically, the invention relates to operations on complex numbers.

2. Background Information

Many devices in use today (e.g., modems, radar, TV, telephone, etc.) transmit data using in-phase and out-of-phase signals (e.g., orthogonal signals). This data is typically processed using complex numbers (e.g., the real number is used for the in-phase signal, while the imaginary number is used for the out-of-phase signal). The multiplication of two complex numbers (e.g., $r_1\,i_1$ and $r_2\,i_2$, where $r$ denotes the real component and $i$ the imaginary component) is performed according to Equation 1 shown below:

$$\text{Real component} = r_1 r_2 - i_1 i_2, \qquad \text{Imaginary component} = r_1 i_2 + r_2 i_1 \qquad \text{(Equation 1)}$$
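As a minimal sketch of the complex multiplication described above, the two components can be computed with four real multiplications, one subtraction, and one addition. The function name `complex_multiply` and the argument order are illustrative assumptions, not names taken from the patent.

```python
def complex_multiply(r1, i1, r2, i2):
    """Multiply the complex number (r1, i1) by (r2, i2) using real arithmetic only.

    Hypothetical helper for illustration; r = real part, i = imaginary part.
    """
    real = r1 * r2 - i1 * i2   # real component of the product
    imag = r1 * i2 + r2 * i1   # imaginary component of the product
    return real, imag


if __name__ == "__main__":
    # (1 + 2j) * (3 + 4j), checked against Python's built-in complex type
    print(complex_multiply(1, 2, 3, 4), (1 + 2j) * (3 + 4j))
```

The four multiplications are independent of each other, which is what makes the operation a candidate for parallel execution on packed data.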

The multiplication of complex numbers is required in operations such as the multiply-accumulate operation (see Equation 2 below). In Equation 2, a(n) and b(n) represent the n-th complex numbers in two series of complex numbers:

$$\sum_{n} a(n) \cdot b(n) \qquad \text{(Equation 2)}$$

Digital discrete time filters, such as FIR and IIR filters, require many multiply-accumulate operations. A FIR filter is an operation used in applications such as real time digital signal processing (e.g., complex demodulation and equalization found in high speed data modems; ghost canceling in terrestrial broadcasting) to recover the transmitted information from the signal. The equation for the FIR filter is shown below as Equation 3:

$$y(k) = \sum_{n=0}^{L-1} c(n) \cdot x(k-n) \qquad \text{(Equation 3)}$$

With reference to Equation 3, the complex variable y(k) represents the current output sample of the filter, the input value c(n) represents the n-th filter coefficient of the filter, the constant L is the number of coefficients in c(n), and the input value x(k-n) represents the n-th past value of the input sequence (also termed "samples"). The output of the filter is a weighted average of the past L complex samples. Typically, there are more samples than there are coefficients. For the computation of the k-th output sample y(k), the first complex coefficient corresponds to the k-th sample, the second corresponds to the (k-1)-th sample, and so on. Each complex coefficient is multiplied by the sample to which it corresponds, and these products are accumulated to generate the k-th output sample of the filter. For the computation of the (k+1)-th output sample y(k+1), the first complex coefficient corresponds to the (k+1)-th sample, the second complex coefficient corresponds to the k-th sample, and so on. Each complex coefficient is multiplied by the sample to which it corresponds, and these products are accumulated to generate the (k+1)-th output of the filter. Thus, the correspondence between the samples and the complex coefficients slides up by one for each successive output sample. As a result, FIR filters are typically coded using an outer and an inner loop. The outer loop steps through the successive outputs (the different corresponding relationships between the samples and complex coefficients), while the inner loop steps through the complex coefficients and current corresponding samples to perform the multiply-accumulate.

When a FIR filter is first begun, there are insufficient samples to compute the entire length (L) of the filter (i.e., index k-n into the input samples x() is negative). In such situations, the missing samples are typically substituted with zero, the first sample, or some other relevant input.
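The outer and inner loop structure, together with the start-up handling just described, can be sketched as follows. This is an illustrative sketch only: the function name `complex_fir` is hypothetical, Python's built-in complex type stands in for the real and imaginary components, and zero substitution is chosen from the options the text lists for missing samples.

```python
def complex_fir(samples, coeffs):
    """Compute y(k) = sum over n of c(n) * x(k - n) for every sample index k.

    Hypothetical helper: missing samples (k - n < 0) are replaced by zero.
    """
    outputs = []
    for k in range(len(samples)):           # outer loop: one output per alignment
        acc = 0j
        for n, c in enumerate(coeffs):      # inner loop: multiply-accumulate
            x = samples[k - n] if k - n >= 0 else 0j
            acc += c * x
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    print(complex_fir([1 + 1j, 2, 3j], [1, 1j]))
```

Each pass of the outer loop slides the coefficient set up by one sample, which matches the changing correspondence between coefficients and samples described in the text.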

The equation for the IIR filter is shown below as Equation 4: ##EQU2##

With reference to Equation 4, the input value d(i) represents the i-th filter coefficient of the filter, and the constant M is the number of coefficients in d(i).

One prior art technique for supporting multiply-accumulate operations is to couple a separate digital signal processor (DSP) to an existing general purpose processor (e.g., the Intel® 486 manufactured by Intel Corporation of Santa Clara, Calif.). The general purpose processor allocates jobs to the DSP.

One such prior art DSP is the TMS320C2x DSP manufactured by Texas Instruments, Inc. of Dallas, Tex. A prior art method for performing a complex multiply-accumulate operation on this DSP is to perform the multiply and add operations to generate the real component and add that real component to an accumulation value representing the accumulated real component, and then perform the multiply and add operations to generate the imaginary component and add that imaginary component to an accumulation value representing the accumulated imaginary component. A pseudo code representation of the inner loop of the FIR filter is shown below in Table 1.

TABLE 1

            ZAC                 ;ACC <= 0, other setup code to initialize pointers
    YRSTART                     ;Loop label
            LT    *x++          ;T <= x.i(n)
            MPY   *c++          ;P <= T* c.i(n)
            LT    *x++          ;T <= x.r(n)
            MPYS  *c++          ;ACC <= ACC - P, P <= T* c.r(n)
            APAC  lc--          ;ACC <= ACC + P, decrement loop counter register
            BANZ  YRSTART       ;Jump back to beginning of loop if lc is not zero
            SA    *y++          ;Store y.r
            ZAC                 ;ACC <= 0, reset the pointers here.
    YISTART                     ;
            LT    *x++          ;T <= x.i(n)
            MPY   *c++          ;P <= T* c.r(n)
            LT    *x++          ;T <= x.r(n)
            MPYA  *c++          ;ACC <= ACC + P, P <= T* c.i(n)
            APAC  lc--          ;ACC <= ACC + P
            BANZ  YISTART
            SA    *y

One limitation of the TMS320C2x DSP is its limited efficiency when performing complex number multiplication and FIR filters. As illustrated by the above pseudo code, the algorithm is basically serial in nature. Thus, it requires approximately 10 instructions to accumulate the result of multiplying together two complex numbers.
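To make the serial nature of Table 1 concrete, the following sketch models only the arithmetic of its two passes (one pass accumulating the real component, a second pass accumulating the imaginary component). It is not TMS320C2x code: the function name `serial_complex_mac` is hypothetical, and the samples and coefficients are assumed to be already paired up as (real, imaginary) integer tuples.

```python
def serial_complex_mac(x, c):
    """Model the two accumulation passes of Table 1.

    x and c are equal-length lists of (real, imag) integer pairs (assumed layout).
    """
    acc = 0
    for (xr, xi), (cr, ci) in zip(x, c):   # first pass: real component
        acc -= xi * ci                     # ACC <= ACC - x.i * c.i
        acc += xr * cr                     # ACC <= ACC + x.r * c.r
    y_real = acc

    acc = 0
    for (xr, xi), (cr, ci) in zip(x, c):   # second pass: imaginary component
        acc += xi * cr                     # ACC <= ACC + x.i * c.r
        acc += xr * ci                     # ACC <= ACC + x.r * c.i
    y_imag = acc
    return y_real, y_imag


if __name__ == "__main__":
    print(serial_complex_mac([(1, 2), (3, 4)], [(5, 6), (7, 8)]))
```

Every product is computed and accumulated one at a time, and the input data is traversed twice, which is the inefficiency the text attributes to this approach.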

Multimedia applications (e.g., applications targeted at computer supported cooperation (CSC--the integration of teleconferencing with mixed media data manipulation), 2D/3D graphics, image processing, video compression/decompression, recognition algorithms and audio manipulation) require the manipulation of large amounts of data which may be represented in a small number of bits. For example, graphical data typically requires 16 bits and sound data typically requires 8 bits. Each of these multimedia applications requires one or more algorithms, each requiring a number of operations. For example, an algorithm may require add, compare, and shift operations.

To improve efficiency of multimedia applications (as well as other applications that have the same characteristics), prior art processors provide packed data formats. A packed data format is one in which the bits typically used to represent a single value are broken into a number of fixed sized data elements, each of which represents a separate value. For example, a 64-bit register may be broken into two 32-bit elements, each of which represents a separate 32-bit value. In addition, these prior art processors provide instructions for separately manipulating each element in these packed data types in parallel. For example, a packed add instruction adds together corresponding data elements from a first packed data item and a second packed data item. Thus, if a multimedia algorithm requires a loop containing five operations that must be performed on a large number of data elements, it is desirable to pack the data and perform these operations in parallel using packed data instructions. In this manner, these processors can more efficiently process multimedia applications.
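The packed add described above can be modeled in software as a sketch. The helper names (`pack2x32`, `unpack2x32`, `packed_add_2x32`) are hypothetical, and wraparound on overflow is an assumption made for the example; the text does not say whether the prior art instruction wraps or saturates.

```python
MASK32 = 0xFFFFFFFF  # selects one 32-bit element


def pack2x32(hi, lo):
    """Pack two 32-bit values into one 64-bit word (hypothetical layout)."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def unpack2x32(word):
    """Split a 64-bit word back into its two 32-bit elements."""
    return (word >> 32) & MASK32, word & MASK32


def packed_add_2x32(a, b):
    """Add corresponding 32-bit elements; a carry never crosses into the
    neighboring element (wraparound behavior is assumed here)."""
    a_hi, a_lo = unpack2x32(a)
    b_hi, b_lo = unpack2x32(b)
    return pack2x32(a_hi + b_hi, a_lo + b_lo)


if __name__ == "__main__":
    result = packed_add_2x32(pack2x32(1, 0xFFFFFFFF), pack2x32(2, 1))
    print(unpack2x32(result))
```

A plain 64-bit addition of the same two words would let the carry out of the low element corrupt the high element; keeping the elements independent is the point of a packed instruction.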

However, if the loop of operations contains an operation that cannot be performed by the processor on packed data (i.e., the processor lacks the appropriate instruction), the data will have to be unpacked to perform the operation. For example, if the multimedia algorithm requires an add operation and the previously described packed add instruction is not available, the programmer must unpack both the first packed data item and the second packed data item (i.e., separate the elements comprising both the first packed data item and the second packed data item), add the separated elements together individually, and then pack the results into a packed result for further packed processing. The processing time required to perform such packing and unpacking often negates the performance advantage for which packed data formats are provided. Therefore, it is desirable to incorporate in a computer system a set of packed data instructions that provide all the required operations for typical multimedia algorithms. However, due to the limited die area on today's general purpose microprocessors, the number of instructions which may be added is limited. Therefore, it is desirable to invent instructions that provide both versatility (i.e. instructions which may be used in a wide variety of multimedia algorithms) and the greatest performance advantage.

SUMMARY

A method and apparatus for performing complex digital filters is described. According to one aspect of the invention, a method for performing a complex digital filter is described. The complex digital filter is performed using a set of data samples and a set of complex coefficients. In addition, the complex digital filter is performed using an inner and an outer loop. The outer loop steps through a number of corresponding relationships between the set of complex coefficients and the set of data samples. Each of these corresponding relationships is used by the digital filter to generate an output which is stored in the form of a packed data item. Each output packed data item has a first and second data element respectively storing the real and imaginary components of the filter's complex output. The inner loop steps through each complex coefficient in the set of complex coefficients. Within the inner loop, the data sample corresponding to the current complex coefficient (the complex coefficient currently identified by the inner loop) is determined according to the current corresponding relationship (the corresponding relationship currently identified by the outer loop). Then, in response to receiving an instruction, eight data elements are read and used to generate a currently calculated complex number. These eight data elements were previously stored as packed data and include two representations of each of the components of the current complex coefficient and its current corresponding data sample. Each of these data elements is either the positive or negative of the component it represents. As a result of the manner in which these eight data elements are stored, the currently calculated complex number represents the product of the current complex coefficient and its current corresponding data sample. The currently calculated complex number is then added to the current output packed data. As a result, the current output packed data stores the sum of the complex numbers generated in the current inner loop. According to another aspect of the invention, a machine-readable medium is described. This machine-readable medium has stored thereon data representing sequences of instructions which, when executed by a processor, cause that processor to perform the above described method.
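The eight-element arrangement can be illustrated with a small software model. This is a sketch under stated assumptions: the element ordering below is one arrangement that satisfies the description (two copies of every component, one of them negated), not necessarily the layout of the patent's figures, and the names `packed_multiply_add`, `packed_add`, and `complex_mac` are hypothetical.

```python
def packed_multiply_add(src1, src2):
    """Model of a packed multiply-add: multiply four element pairs, then
    add adjacent products to give two results."""
    a0, a1, a2, a3 = src1
    b0, b1, b2, b3 = src2
    return (a0 * b0 + a1 * b1, a2 * b2 + a3 * b3)


def packed_add(a, b):
    """Model of a packed add on two-element packed data."""
    return tuple(x + y for x, y in zip(a, b))


def complex_mac(acc, sample, coeff):
    """Accumulate sample * coeff into acc = (real, imag)."""
    xr, xi = sample
    cr, ci = coeff
    src1 = (xr, xi, xr, xi)      # two representations of each sample component
    src2 = (cr, -ci, ci, cr)     # two of each coefficient component, one negated
    product = packed_multiply_add(src1, src2)   # first instruction: complex product
    return packed_add(acc, product)             # second instruction: accumulate


if __name__ == "__main__":
    acc = (0, 0)
    acc = complex_mac(acc, (1, 2), (3, 4))
    acc = complex_mac(acc, (3, 4), (7, 8))
    print(acc)
```

Because the sign and duplication are baked into the stored data, the first result element is `xr*cr - xi*ci` and the second is `xr*ci + xi*cr`, so no separate subtract or shuffle step is needed inside the inner loop.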

According to another aspect of the invention, a method for updating complex coefficients used in a digital filter is described. This updating is performed using a set of complex data, a set of complex coefficients, an error distance, and a rate of convergence. A loop is implemented to step through each complex coefficient in the set of complex coefficients. Within the loop, the complex data sample corresponding to the current complex coefficient (the complex coefficient currently identified by the loop) is determined. In addition, an instruction is executed that causes eight data elements to be read and used to generate a currently calculated complex number. These eight data elements were previously stored as packed data and include two representations of each of the components of the error distance and the current corresponding complex data sample. Each of these data elements is either the positive or negative of the component it represents. As a result of the manner in which these eight data elements are stored, the currently calculated complex number represents the product of the error distance and the complex conjugate of the current corresponding data sample. The real and imaginary components of the currently calculated complex number are then shifted right by the rate of convergence to generate a current complex factor. The real and imaginary components of this current complex factor are subtracted from the respective real and imaginary components of the current complex coefficient to generate the updated components of the current complex coefficient.
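The coefficient update can be sketched in scalar form as follows. The function name `update_coefficients` and the (real, imaginary) tuple representation are assumptions made for illustration, and the sketch uses ordinary integer arithmetic rather than the packed instruction the text describes.

```python
def update_coefficients(coeffs, samples, error, shift):
    """Return coefficients updated as c - ((error * conj(x)) >> shift).

    coeffs, samples: equal-length lists of (real, imag) integer pairs.
    error: (real, imag) error distance; shift: rate of convergence in bits.
    All names are hypothetical.
    """
    er, ei = error
    updated = []
    for (cr, ci), (xr, xi) in zip(coeffs, samples):
        # product of the error distance and the conjugate of the sample
        prod_r = er * xr + ei * xi
        prod_i = ei * xr - er * xi
        # arithmetic right shift scales the product by the rate of convergence
        factor_r = prod_r >> shift
        factor_i = prod_i >> shift
        updated.append((cr - factor_r, ci - factor_i))
    return updated


if __name__ == "__main__":
    print(update_coefficients([(100, 50)], [(4, 2)], (8, -4), 2))
```

Using a shift as the scaling step restricts the step size to powers of two, which avoids a further multiplication per coefficient.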

BRIEF DESCRIPTION OF THE DRAWINGS

The invention may best be understood by referring to the following description and accompanying drawings which illustrate the invention. In the drawings:

FIG. 1 shows a block diagram illustrating an exemplary computer system according to one embodiment of the invention;

FIGS. 2A-2B illustrate the operation of the packed multiply-add instruction according to one embodiment of the present invention;

FIG. 3 illustrates a technique for performing a multiply-accumulate operation on two numbers according to one embodiment of the invention;

FIG. 4 illustrates the operation of a pack instruction according to one embodiment of the invention;

FIG. 5 illustrates the operation of an unpack instruction according to one embodiment of the invention;

FIG. 6 illustrates the operation of a packed add instruction according to one embodiment of the invention;

FIG. 7 illustrates the operation of a packed shift instruction according to one embodiment of the invention;

FIG. 8a illustrates a technique for storing data in one of the described formats which allows for efficient complex number multiplication according to one embodiment of the invention;

FIG. 8b illustrates a second technique for storing data in one of the described formats which allows for efficient complex number multiplication according to one embodiment of the invention;

FIG. 9 illustrates a technique for storing data in another of the described formats which allows for efficient complex number multiplication according to one embodiment of the invention;

FIG. 10 illustrates a technique for performing a complex FIR filter according to one embodiment of the invention;

FIG. 11 illustrates the technique for updating the complex coefficients according to one embodiment of the invention;

FIG. 12A is a general block diagram illustrating the use of a digital filter for ghost canceling a TV broadcast signal according to one embodiment of the invention;

FIG. 12B is a general block diagram illustrating the use of a digital filter for transmitting data to another computer according to one embodiment of the invention; and

FIG. 12C is a general block diagram illustrating the use of a digital filter for transforming mono audio into stereo audio with phase shift according to one embodiment of the invention.

DETAILED DESCRIPTION

In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. However, it is understood that the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the invention.

According to one aspect of the invention, a method and apparatus for storing complex data in formats which allow efficient complex multiplication operations to be performed and for performing such complex multiplication operations is described. In one embodiment of the invention, complex data is arranged in a manner which allows the multiplication of two complex numbers to be accomplished with one instruction. In addition, the result of this multiplication can be accumulated in a second instruction. In this manner, a multiply-accumulate operation is performed on two complex numbers in two instructions. According to another aspect of this invention, a method and apparatus for performing complex digital filters is generally described.

According to another aspect of the invention, a computer system generally having a transmitting unit, a processor, and a storage device is described. The storage device is coupled to the processor and has stored therein a routine. When executed by the processor, the routine causes the processor to perform a digital filter on unfiltered data items using complex coefficients to generate an output data stream. The transmitting unit is coupled to the processor for transmitting out of the computer system analog signals that are generated based on this output data stream.

According to another aspect of the invention, a similar computer system is described. However, the storage device of this computer system has stored therein a digital filter routine that includes a least mean square routine for updating the set of complex coefficients used by the digital filter.

According to another aspect of the invention, a computer system generally having a set of speakers, a conversion unit, a processor, and a storage device is described. The storage device is coupled to the processor and has stored therein a routine. When executed by the processor, the routine causes the processor to perform a complex digital filter on unfiltered data items, which represent mono audio signals, to generate an output data stream representing stereo audio signals with three dimensional sound displacement. The conversion unit is coupled to the speakers and the storage device to provide analog signals to the speakers for conversion into sound waves. The analog signals are generated based on the output data stream generated by the complex digital filter.

FIG. 1 shows a block diagram illustrating an exemplary computer system 100 according to one embodiment of the invention. The exemplary computer system 100 includes a processor 105, a storage device 110, and a bus 115. The processor 105 is coupled to the storage device 110 by the bus 115. In addition, a number of user input/output devices, such as a keyboard 120 and a display 125, are also coupled to the bus 115. The processor 105 represents a central processing unit of any type of architecture, such as a CISC, RISC, VLIW, or hybrid architecture. In addition, the processor 105 could be implemented on one or more chips. The storage device 110 represents one or more mechanisms for storing data. For example, the storage device 110 may include read only memory (ROM), random access memory (RAM), magnetic disk storage mediums, optical storage mediums, flash memory devices, and/or other machine-readable mediums. The bus 115 represents one or more busses (e.g., PCI, ISA, X-Bus, EISA, VESA, etc.) and bridges (also termed as bus controllers). While this embodiment is described in relation to a single processor computer system, the invention could be implemented in a multi-processor computer system. In addition, while this embodiment is described in relation to a 64-bit computer system, the invention is not limited to a 64-bit computer system.

In addition to other devices, one or more of a network 130, a TV broadcast signal receiver 131, a fax/modem 132, a digitizing unit 133, and a sound unit 134 may optionally be coupled to bus 115. The network 130 represents one or more network connections (e.g., an Ethernet connection). While the TV broadcast signal receiver 131 represents a device for receiving TV broadcast signals, the fax/modem 132 represents a fax and/or modem for receiving and/or transmitting analog signals representing data. As previously described, such signals often need to be filtered using a digital filter. The digitizing unit 133 represents one or more devices for digitizing images (e.g., a scanner, camera, etc.). The sound unit 134 represents one or more devices for inputting and/or outputting sound (e.g., microphones, speakers, magnetic storage devices, optical storage devices, etc.).

FIG. 1 also illustrates that the storage device 110 has stored therein complex data 135 and software 136. Complex data 135 represents data stored in one or more of the formats described herein. Software 136 represents the necessary code for performing any and/or all of the techniques described with reference to FIGS. 3, 8a, 8b, 9, and 10. Of course, the storage device 110 preferably contains additional software (not shown), which is not necessary to understanding the invention.

FIG. 1 additionally illustrates that the processor 105 includes a decode unit 140, a set of registers 141, an execution unit 142, and an internal bus 143 for executing instructions. Of course, the processor 105 contains additional circuitry, which is not necessary to understanding the invention. The decode unit 140, registers 141 and execution unit 142 are coupled together by internal bus 143. The decode unit 140 is used for decoding instructions received by processor 105 into control signals and/or microcode entry points. In response to these control signals and/or microcode entry points, the execution unit 142 performs the appropriate operations. The decode unit 140 may be implemented using any number of different mechanisms (e.g., a look-up table, a hardware implementation, a PLA, etc.). While the decoding of the various instructions is represented herein by a series of if/then statements, it is understood that the execution of an instruction does not require a serial processing of these if/then statements. Rather, any mechanism for logically performing this if/then processing is considered to be within the scope of the implementation of the invention.

The decode unit 140 is shown including packed data instruction set 145 for performing operations on packed data. In one embodiment, the packed data instruction set 145 includes the following instructions: a packed multiply-add instruction(s) (PMADD) 150, a pack instruction(s) (PACK) 155, an unpack/interleave instruction(s) (PUNPCK) 160, a packed shift instruction(s) 165, a PXOR instruction(s) (PXOR) 170, a packed add instruction(s) (PADD) 175, a packed subtract instruction(s) (PSUB) 180, and a move instruction(s) 185. The operation of each of these instructions is further described herein. While these packed data instructions can be implemented to perform any number of different operations, in one embodiment these packed data instructions are those described in "A Set of Instructions for Operating on Packed Data," filed on Aug. 31, 1995, Ser. No. 08/521,360. Furthermore, in one embodiment the processor 105 is a pipelined processor (e.g., the Pentium processor) capable of completing one or more of these packed data instructions per clock cycle (ignoring any data dependencies and pipeline freezes). In addition to the packed data instructions, processor 105 can include new instructions and/or instructions similar to or the same as those found in existing general purpose processors. For example, in one embodiment the processor 105 supports an instruction set which is compatible with the Intel Architecture instruction set used by existing processors, such as the Pentium processor. Alternative embodiments of the invention may contain more or fewer, as well as different, packed data instructions and still utilize the teachings of the invention.

The registers 141 represent a storage area on processor 105 for storing information, including control/status information, integer data, floating point data, and packed data. It is understood that one aspect of the invention is the described instruction set for operating on packed data. According to this aspect of the invention, the storage area used for storing the packed data is not critical. The term data processing system is used herein to refer to any machine for processing data, including the computer system(s) described with reference to FIG. 1.

FIG. 2A illustrates the operation of the packed multiply-add instruction according to one embodiment of the present invention. FIG. 2A shows, in a simplified format, the operation of the multiply-add instruction on a first operand 210 and a second operand 220. The term operand is interchangeably used herein to refer to the data on which an instruction operates or the storage area (e.g., register, memory location, etc.) in which that data can be found. The first operand 210 is a packed data item containing A3, A2, A1, and A0 as its data elements, while the second operand 220 is a packed data item containing B3, B2, B1, and B0 as its data elements. The described embodiment of the multiply-add instruction multiplies together corresponding data elements of the first and second operands, generating four intermediate results (e.g., A3*B3, A2*B2, A1*B1, and A0*B0). These intermediate results are summed by pairs, producing two results (e.g., A3*B3 + A2*B2 and A1*B1 + A0*B0) that are packed into their respective elements of a result 230. Thus, the result 230 is a packed data item including a first data element storing A3*B3 + A2*B2 and a second data element storing A1*B1 + A0*B0. Accordingly, the described embodiment of the multiply-add instruction performs, in parallel, two "multiply-add operations". In one embodiment, each data element of the first and second operands contains 16 bits, while each intermediate result and each data element in the result 230 contains 32 bits. This increase in the number of bits allows for increased precision.
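The behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the circuit of the invention: the function names `pmadd` and `to_signed`, the high-order-first list layout, and the choice to treat the elements as signed and to wrap the 32-bit results on overflow are all assumptions made for the example.

```python
def to_signed(value, bits):
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pmadd(a, b):
    """Model a packed multiply-add on two lists of four 16-bit elements.

    Elements are listed high-order first: [x3, x2, x1, x0].
    Returns two 32-bit elements: [a3*b3 + a2*b2, a1*b1 + a0*b0].
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("each operand must hold four data elements")
    a = [to_signed(x, 16) for x in a]
    b = [to_signed(x, 16) for x in b]
    # Four intermediate results, one per pair of corresponding elements.
    products = [x * y for x, y in zip(a, b)]
    # Sum the intermediate results by pairs (assumed to wrap at 32 bits).
    return [to_signed(products[0] + products[1], 32),
            to_signed(products[2] + products[3], 32)]


print(pmadd([1, 2, 3, 4], [5, 6, 7, 8]))  # [17, 53]
```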

FIG. 2B illustrates a circuit for the multiply-add instruction according to one embodiment of the invention. A control unit 240 processes the control signal for the multiply-add instruction. The control unit 240 outputs signals on an enable line 242 to control a packed multiply-adder 244.

The packed multiply-adder 244 has the following inputs: a first operand 250 having bits [63:0], a second source operand 252 having bits [63:0], and the enable line 242. The packed multiply-adder 244 includes four 16x16 multiplier circuits: a first multiplier 260, a second multiplier 262, a third multiplier 264 and a fourth multiplier 266. The first multiplier 260 has as inputs bits [15:0] of the first and second operands. The second multiplier 262 has as inputs bits [31:16] of the first and second operands. The third multiplier 264 has as inputs bits [47:32] of the first and second operands. The fourth multiplier 266 has as inputs bits [63:48] of the first and second operands.

The 32-bit intermediate results generated by the first multiplier 260 and the second multiplier 262 are received by a first adder 270, while the 32-bit intermediate results generated by the third multiplier 264 and the fourth multiplier 266 are received by a second adder 272. These adders add their respective 32-bit inputs. In one embodiment, these adders are composed of four 8-bit adders with the appropriate propagation delays. However, alternative embodiments could implement these adders in any number of ways (e.g., two 32-bit adders). The output of the first adder 270 (i.e., bits [31:0] of the result) and the output of the second adder 272 (i.e., bits [63:32] of the result) are combined into the 64-bit result and communicated to a result register 280. The result is then communicated out a result bus 290 for storage in the appropriate register.
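The data path just described (four multipliers fed by 16-bit bit fields, two adders, and a 64-bit combined result) can be sketched at the bit level as follows. The names `lane16` and `packed_multiply_adder` are hypothetical, and treating the 16-bit fields as signed values is an assumption; the text only specifies which bit ranges feed which multiplier and adder.

```python
def lane16(value, index):
    """Return bits [16*index+15 : 16*index] of a 64-bit value as a signed integer."""
    raw = (value >> (16 * index)) & 0xFFFF
    return raw - 0x10000 if raw & 0x8000 else raw


def packed_multiply_adder(op1, op2):
    """Bit-level model of the multiply-adder operating on two 64-bit operands."""
    m0 = lane16(op1, 0) * lane16(op2, 0)  # first multiplier: bits [15:0]
    m1 = lane16(op1, 1) * lane16(op2, 1)  # second multiplier: bits [31:16]
    m2 = lane16(op1, 2) * lane16(op2, 2)  # third multiplier: bits [47:32]
    m3 = lane16(op1, 3) * lane16(op2, 3)  # fourth multiplier: bits [63:48]
    low = (m0 + m1) & 0xFFFFFFFF   # first adder -> bits [31:0] of the result
    high = (m2 + m3) & 0xFFFFFFFF  # second adder -> bits [63:32] of the result
    return (high << 32) | low


result = packed_multiply_adder(0x0001000200030004, 0x0005000600070008)
print(hex(result))  # 0x1100000035
```

Here the low half holds 3*7 + 4*8 = 53 (0x35) and the high half holds 1*5 + 2*6 = 17 (0x11).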

While one circuit implementation of the multiply-add instruction has been provided, alternative embodiments could implement this instruction in any number of ways. For example, alternative embodiments could use different sized multipliers (e.g., 8x16, 8x8) and include the additional adder circuitry to perform the necessary passes through the multipliers. As another example, alternative embodiments could include circuitry capable of doing only one multiply-add operation at a time. In such embodiments, the two multiply-add operations would have to be performed serially.

FIG. 3 illustrates a technique for performing a multiply-accumulate operation on two numbers according to one embodiment of the invention. In this application, data is represented by ovals, while instructions are represented by rectangles.

At step 300, a complex number A and a complex number B are stored in a first packed data item 310 and a second packed data item 320. The first packed data item 310 stores data elements representing the complex number A in a first format (such that the data elements are Ar, Ai, Ar, Ai), while the second packed data item 320 stores data elements representing the complex number B in a second format (such that the data elements are Br, -Bi, Bi, Br). Of course, one or both of these numbers could be real numbers. In such situations, the real number(s) would be stored in these complex formats by storing zero as the imaginary components. In fact, this is useful for a number of applications.

As shown by step 330, the multiply-add instruction is performed on the first packed data item 310 and the second packed data item 320 to generate a resulting packed data item 340. Thus, the multiply-add instruction causes the processor 105 to read the first packed data item 310 and the second packed data item 320, and to perform the multiply-add operations. As a result of the multiply-add instruction, the resulting packed data item contains a first data element storing ArBr-AiBi (the real component of multiplying together complex numbers A and B) and a second data element storing ArBi+AiBr (the imaginary component of multiplying together complex numbers A and B).

Thus, by arranging data representing complex numbers in the appropriate formats, the multiplication of two complex numbers may be performed in a single multiply-add instruction. This provides a significant performance advantage over prior art techniques of performing complex multiplication. Of course, the advantages of this invention are greater when many such complex multiplication operations are required.
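To see why the two formats yield a complex product, the following sketch arranges A as Ar, Ai, Ar, Ai and B as Br, -Bi, Bi, Br and applies a single modeled multiply-add. The helper names are hypothetical, plain Python integers stand in for the 16-bit and 32-bit data elements, and overflow is ignored for simplicity.

```python
def pmadd(a, b):
    """Hypothetical model of the multiply-add: pairwise sums of four products."""
    return [a[0] * b[0] + a[1] * b[1], a[2] * b[2] + a[3] * b[3]]


def first_format(re, im):
    """First format from the text: Ar, Ai, Ar, Ai."""
    return [re, im, re, im]


def second_format(re, im):
    """Second format from the text: Br, -Bi, Bi, Br."""
    return [re, -im, im, re]


def complex_multiply(a, b):
    """Multiply complex numbers given as (real, imaginary) integer pairs."""
    real, imag = pmadd(first_format(*a), second_format(*b))
    return real, imag


# (3 + 2i)(1 + 4i) = 3 + 12i + 2i - 8 = -5 + 14i
print(complex_multiply((3, 2), (1, 4)))  # (-5, 14)
```

A real number is handled by the same code with a zero imaginary component, e.g. `complex_multiply((3, 0), (2, 0))`.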

FIG. 3 also shows an accumulation packed data item 345. The accumulation packed data item 345 has two 32-bit data elements. If this is the first multiply-accumulate operation, the data elements of the accumulation packed data item 345 are zero. However, if previous multiply-accumulate operations have been performed, the data elements of the accumulation packed data item 345 store the accumulation of the real and imaginary results of the previous multiply-accumulate operations.

At step 350, a packed add dword instruction is performed on the resulting packed data item 340 and the accumulation packed data item 345. The results of this packed add instruction are stored back in the data elements of the accumulation packed data item 345. If the data elements of the accumulation packed data item 345 were storing zero, the data elements now store ArBr-AiBi and ArBi+AiBr, respectively. Otherwise, the data elements now store the accumulated total of the real and imaginary component results, respectively. In this manner, the accumulation of the complex multiplication is stored.
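The two-instruction multiply-accumulate can be sketched as a loop that repeats step 330 (multiply-add) and step 350 (packed add) for each pair of complex numbers. The function names are hypothetical, and the model uses unbounded Python integers, so 32-bit overflow of the accumulator is not represented.

```python
def pmadd(a, b):
    """Hypothetical model of the multiply-add: pairwise sums of four products."""
    return [a[0] * b[0] + a[1] * b[1], a[2] * b[2] + a[3] * b[3]]


def padd_dword(x, y):
    """Hypothetical model of the packed add on two 32-bit data elements."""
    return [x[0] + y[0], x[1] + y[1]]


def complex_multiply_accumulate(pairs):
    """Accumulate the products of ((Ar, Ai), (Br, Bi)) pairs.

    Returns [real_total, imaginary_total].
    """
    accumulator = [0, 0]  # zero before the first multiply-accumulate
    for (ar, ai), (br, bi) in pairs:
        product = pmadd([ar, ai, ar, ai], [br, -bi, bi, br])  # step 330
        accumulator = padd_dword(product, accumulator)        # step 350
    return accumulator


# (1 + 2i)(5 + 6i) + (3 + 4i)(7 + 8i) = (-7 + 16i) + (-11 + 52i)
print(complex_multiply_accumulate([((1, 2), (5, 6)), ((3, 4), (7, 8))]))  # [-18, 68]
```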

Of course, if only the product of complex numbers is required, then step 350 and the accumulation packed data item 345 are not required.

While two formats for storing data represented as complex numbers are shown in FIG. 3, other formats allow complex multiplication to be performed in a single multiply-add instruction and are within the scope of the invention. For example, the complex data can be stored as Ar, -Ai, Ar, Ai and Br, Bi, Bi, Br. As another example, the complex data could be rearranged (e.g., formats Ar, Ai, Br, -Ai and Bi, Br, Ar, Bi). Thus, one aspect of the invention is storing data representing complex numbers in a manner which allows complex multiplication to be performed in a single multiply-add instruction.
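As a quick check, the alternative formats named above can be run through the same modeled multiply-add and compared against ordinary complex multiplication. The function names are hypothetical. Note that in the rearranged format the imaginary component lands in the first result element and the real component in the second.

```python
def pmadd(a, b):
    """Hypothetical model of the multiply-add: pairwise sums of four products."""
    return [a[0] * b[0] + a[1] * b[1], a[2] * b[2] + a[3] * b[3]]


def multiply_fig3(ar, ai, br, bi):
    """Formats of FIG. 3: Ar, Ai, Ar, Ai and Br, -Bi, Bi, Br."""
    real, imag = pmadd([ar, ai, ar, ai], [br, -bi, bi, br])
    return real, imag


def multiply_alt_sign(ar, ai, br, bi):
    """Alternative formats: Ar, -Ai, Ar, Ai and Br, Bi, Bi, Br."""
    real, imag = pmadd([ar, -ai, ar, ai], [br, bi, bi, br])
    return real, imag


def multiply_rearranged(ar, ai, br, bi):
    """Rearranged formats: Ar, Ai, Br, -Ai and Bi, Br, Ar, Bi."""
    imag, real = pmadd([ar, ai, br, -ai], [bi, br, ar, bi])
    return real, imag


expected = complex(3, 2) * complex(1, 4)
for multiply in (multiply_fig3, multiply_alt_sign, multiply_rearranged):
    assert multiply(3, 2, 1, 4) == (expected.real, expected.imag)
```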

Alternative embodiments may employ a multiply-subtract instruction in addition to or instead of the multiply-add instruction. The multiply-subtract instruction is the same as the multiply-add operation, except the adds are replaced with subtracts. Thus, the described embodiment of the multiply-subtract instruction performs, in parallel, two "multiply-subtract operations". One circuit implementation of this instruction would be to make the first adder 270 and the second adder 272 capable of adding or subtracting. In this implementation, based on whether the current instruction is a multiply/add or multiply/subtract instruction, the first adder/subtractor 270 and the second adder/subtractor 272 would add or subtract their respective 32-bit inputs.

The multiplication of two complex numbers may also be performed in a single multiply-subtract instruction by storing the data in the appropriate formats (e.g., formats Ar, Ai, Ar, -Ai and Br, Bi, Bi, Br). Thus, another aspect of the invention is storing data representing complex numbers in formats which allow complex multiplication to be performed in a single multiply-subtract instruction. If both the multiply-add and multiply-subtract instructions are implemented, the data may be stored in formats to allow the multiply-add instruction to calculate the real components of complex multiplications (e.g., formats Ar, Ai, Cr, Ci and Br, -Bi, Dr, Di) and the multiply-subtract instruction to calculate the imaginary components of the complex multiplications (e.g., formats Ar, Ai, Cr, Ci and Bi, Br, Di, Dr). In this example, two complex numbers are respectively multiplied by two other complex numbers in parallel using two instructions. Thus, another aspect of the invention is storing data representing complex numbers in formats that allow complex multiplication to be performed efficiently by using multiply-add and multiply-subtract operations.
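The single-instruction multiply-subtract case can be sketched in the same way. The example below covers only the first pair of formats mentioned above (Ar, Ai, Ar, -Ai and Br, Bi, Bi, Br); `pmsub` is a hypothetical model in which each pair of products is subtracted rather than added, and overflow is ignored.

```python
def pmsub(a, b):
    """Hypothetical model of the multiply-subtract: pairwise differences of four products."""
    return [a[0] * b[0] - a[1] * b[1], a[2] * b[2] - a[3] * b[3]]


def complex_multiply_msub(ar, ai, br, bi):
    """Complex multiply using formats Ar, Ai, Ar, -Ai and Br, Bi, Bi, Br."""
    # First element:  Ar*Br - Ai*Bi        (real component)
    # Second element: Ar*Bi - (-Ai)*Br     (imaginary component)
    real, imag = pmsub([ar, ai, ar, -ai], [br, bi, bi, br])
    return real, imag


print(complex_multiply_msub(3, 2, 1, 4))  # (-5, 14)
```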

Of course, alternative embodiments may implement variations of these instructions. For example, alternative embodiments may include an instruction which performs at least one multiply-add operation or at least one multiply-subtract operation. As another example, alternative embodiments may include an instruction which performs at least one multiply-add operation in combination with at least one multiply-subtract operation. As another example, alternative embodiments may include an instruction which performs multiply-add operation(s) and/or multiply-subtract operation(s) in combination with some other operation.

The step 300 of storing represents a variety of ways of storing the first and second packed data items in the appropriate formats. For example, the complex data may already be stored on a CD ROM (represented by the storage device 110) in the described formats. In that case, step 300 may be performed by copying the complex data from the CD ROM into the main memory (also represented by the storage device 110), and then into registers (not shown) on the processor 105. As another example, the fax/modem 132 (see FIG. 1) connecting the computer system 100 to network 130 may receive complex data and store it in the main memory in one or more of the formats described herein--storing two representations of each of the components of the complex data such that it may be read in as a packed data item in the described formats. This complex data may then be accessed as packed data and copied into registers on the processor 105. Since the data is stored in the disclosed formats, the processor 105 can easily and efficiently perform the complex multiplication (e.g., the processor 105 can access the first packed data item 310 in a single instruction). Although these formats for storing complex numbers require more storage space, the performance advantage for complex multiplication is worth the additional storage space in some situations.

If some or all of the data representing the complex numbers is stored in the storage device 110 according to the prior art format (e.g., Ar, Ai, Br, Bi), the processor 105 must rearrange this data before performing the multiply-add instruction. For example, the data may be stored on a CD ROM in the prior art format and the routine which loads it into main memory may be implemented to store it in the described formats. As another example, the modem may store (in the main memory) the complex data it receives in the prior art format. In that case, the processor 105 will need to read this complex data from main memory and rearrange it accordingly. Prearranging or rearranging the data in the above described formats can be efficiently accomplished using instructions from the packed data instruction set 145.
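The effect of this rearrangement, as opposed to the particular instruction sequence that achieves it, can be sketched as follows. The function name `rearrange_prior_art` is hypothetical, and the sketch uses plain list manipulation rather than packed data instructions. It also assumes that -Bi is representable in the element width (negating the most negative 16-bit value would overflow).

```python
def rearrange_prior_art(packed):
    """Convert one prior art item (Ar, Ai, Br, Bi) into the two described formats.

    Returns (first, second), where first is Ar, Ai, Ar, Ai and
    second is Br, -Bi, Bi, Br.
    """
    ar, ai, br, bi = packed
    first = [ar, ai, ar, ai]    # each component of A is stored twice
    second = [br, -bi, bi, br]  # B is stored with one negated imaginary copy
    return first, second


first, second = rearrange_prior_art([3, 2, 1, 4])
print(first, second)  # [3, 2, 3, 2] [1, -4, 4, 1]
```

The duplication of each component is what accounts for the additional storage space when data is kept in the described formats.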

In one embodiment of the invention, the processor 105, executing the packed data instructions, can operate on packed data in several different packed data formats. For example, in one embodiment, packed data can be operated on in one of four formats: a "packed byte" format (e.g., PADDb), a "packed word" format (e.g., PADDw), a "packed double word" (dword) format (e.g., PADDd), or a "packed quad word" (qword) format (e.g., PADDq). The packed byte format includes eight separate 8-bit data elements; the packed word format includes four separate 16-bit data elements; the packed dword format includes two separate 32-bit data elements; and the packed quad word format includes one 64-bit data element. While certain instructions are discussed below with reference to one or two packed data formats, the instructions may be similarly applied to the other packed data formats of the invention. Additionally, many of the instructions of packed data instruction set 145 can operate on signed or unsigned data and can be performed with or without "saturation". If an operation is performed using saturation, the value of the data element is clamped to a predetermined maximum or minimum value when the result of the operation exceeds the range of the data element. Exceeding the range of the data element is also referred to as data overflow or underflow. The use of saturation avoids the effects of data overflow or underflow. If the operation is performed without saturation, the data may be truncated or may indicate a data overflow or underflow in another manner.
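The difference between saturating and non-saturating arithmetic can be illustrated with a signed 16-bit addition. This is a minimal sketch with hypothetical function names; the non-saturating variant models truncation to 16 bits, which is one of the behaviors the text allows.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def add_saturate16(x, y):
    """Signed 16-bit add with saturation: clamp to the representable range."""
    return max(INT16_MIN, min(INT16_MAX, x + y))


def add_wrap16(x, y):
    """Signed 16-bit add without saturation: keep only the low 16 bits."""
    total = (x + y) & 0xFFFF
    return total - 0x10000 if total & 0x8000 else total


print(add_saturate16(30000, 10000))  # 32767
print(add_wrap16(30000, 10000))      # -25536
```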

FIG. 4 illustrates the operation of the pack instruction according to one embodiment of the invention. In this example, the pack instruction converts data from packed words into packed bytes--the pack word instruction (PACKSSw). The low-order byte of each packed word data element in a first operand 410 is packed into the low-order bytes of a result 430 as shown. The low-order byte of each packed word data element in a second operand 420 is packed into the high-order bytes of the result 430 as shown. In an alternate embodiment, the high-order bytes of each data element in the first and second operands are packed into the result. The instruction PACKSS performs a pack operation with signed saturation.
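A pack with signed saturation can be sketched as follows. The function names are hypothetical, the lists are ordered with data element 0 first, and the assumption is that each signed 16-bit word is clamped to the signed 8-bit range before being stored as a byte.

```python
def saturate_to_int8(value):
    """Clamp a signed value to the signed 8-bit range."""
    return max(-128, min(127, value))


def packss_word(first, second):
    """Pack two operands of four signed words into eight signed bytes.

    Words from `first` fill the low-order bytes of the result and words
    from `second` fill the high-order bytes. Lists hold element 0 first.
    """
    if len(first) != 4 or len(second) != 4:
        raise ValueError("each operand must hold four word elements")
    return ([saturate_to_int8(w) for w in first] +
            [saturate_to_int8(w) for w in second])


print(packss_word([1, -200, 300, 127], [-1, 128, -128, 0]))
# [1, -128, 127, 127, -1, 127, -128, 0]
```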

FIG. 5 illustrates the operation of the unpack instruction according to one embodiment of the invention. In one embodiment, the unpack instruction interleaves the low-order data elements from a first operand 510 and a second operand 520. The numbers inside each packed data item identify the data elements for purposes of illustration. Thus, data element 0 of the first operand 510 is stored as data element 0 of a result 530. Data element 0 of the second operand 520 is stored as data element 1 of the result 530. Data element 1 of the first operand 510 is stored as data element 2 of the result 530 and so forth, until all data elements of the result 530 store data elements from either the first operand 510 or the second operand 520. The high-order data elements of both the first and second operand are ignored. By choosing either the first operand 510 or the second operand 520 to be all zeroes, the unpack may be used to unpack packed byte data elements into packed word data elements, or to unpack packed word data elements into packed dword data elements, etc. In an alternate embodiment, the high-order bytes of each packed data item are interleaved into the result.
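The interleaving of low-order elements can be sketched with lists ordered so that data element 0 comes first. The function name `punpck_low` is hypothetical. The second call shows the zero-operand case mentioned above, where each byte ends up paired with a zero high byte and so reads as a wider element.

```python
def punpck_low(first, second):
    """Interleave the low-order halves of two packed data items.

    Lists hold data element 0 first; the high-order halves are ignored.
    """
    if len(first) != len(second):
        raise ValueError("operands must hold the same number of elements")
    half = len(first) // 2
    result = []
    for x, y in zip(first[:half], second[:half]):
        result.extend([x, y])  # element from first, then element from second
    return result


print(punpck_low([10, 11, 12, 13], [20, 21, 22, 23]))  # [10, 20, 11, 21]
print(punpck_low([1, 2, 3, 4, 5, 6, 7, 8], [0] * 8))   # [1, 0, 2, 0, 3, 0, 4, 0]
```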

FIG. 6 illustrates the operation of the packed add instruction according to one embodiment of the invention. FIG. 6 illustrates a packed add word operation (PADDw). The data elements of a first operand 610 are added to the respective packed data elements of a second operand 620 to generate a result 630. For example, data element 0 of the first operand 610 is added to data element 0 of the second operand 620 and the result is stored as data element 0 of the result 630. The packed subtract instruction acts in a similar manner to the packed add instruction, except subtractions are performed.

FIG. 7 illustrates the operation of a packed shift instruction according to one embodiment of the invention. One embodiment of the invention includes instructions for shifting data elements right or left and for both arithmetic and logical shifts. The shift operation shifts the bits of each individual data element by a specified number of bits in a specified direction. FIG. 7 illustrates a packed shift right arithmetic double word operation (PSRAd). FIG. 7 shows a first operand 710 having two 32-bit data elements representing Ar (Ar_HIGH and Ar_LOW) and Ai (Ai_HIGH and Ai_LOW), respectively. A second operand 720 stores an unsigned 64-bit scalar data element indicating the shift count. In FIG. 7, the shift count value is 16 in base 10 notation. Thus, in the example shown in FIG. 7, each data element in the first operand 710 is shifted right by 16 bits to generate a result 730. Since the shift shown in FIG. 7 is arithmetic, the 16 high-order bits left open by the shift operation are filled with the initial value of the sign bit of the respective data element. In contrast, a logical shift fills the high or low-order bits (depending on the direction of the shift) of the data element with zeroes. Since the shift in the illustration is to the right by 16 bits, the result 730 can be logically thought of as having four 16-bit data elements--data element 2 is Ar_HIGH and data element 0 is Ai_HIGH. In an alternative embodiment, the second operand is a packed data item in which each data element indicates a shift count by which the corresponding data element in the first operand 710 is shifted.
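The contrast between arithmetic and logical right shifts on 32-bit data elements can be sketched as follows. The function names are hypothetical, a single shift count is applied to every element, and counts are assumed to be smaller than the element width.

```python
def to_signed32(value):
    """Interpret the low 32 bits of a value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def psra_dword(elements, count):
    """Arithmetic right shift: vacated high-order bits copy the sign bit."""
    return [to_signed32(x) >> count for x in elements]


def psrl_dword(elements, count):
    """Logical right shift: vacated high-order bits are filled with zeroes."""
    return [(x & 0xFFFFFFFF) >> count for x in elements]


# Shifting right by 16 moves the high-order half of each element into the low-order half.
print(psra_dword([0x12345678, 0xFFFF0000], 16))  # [4660, -1]
print(psrl_dword([0x12345678, 0xFFFF0000], 16))  # [4660, 65535]
```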

The PXOR instruction performs a logical exclusive OR on respective data elements from two packed data items to generate data elements in a result. Exclusive OR operations are well known in the art. Alternative embodiments also provide several other packed logical instructions, such as packed AND, OR, and ANDNOT i

