Senior Fitness - Exercise and Nutrition for Aging Men and Women
FREE Article Feed for your website.
Home Ownership Magazine
Party Planning Information
Article Marketing Resources
Bio-Medical Research Article Database
Informative Articles on Life, Love and Happiness
Tutorials on Business to Writing
Famous Quotes from Famous People
Song Lyric Information
New US Patent Information
Comprehensive List of Content by Category
Online Auctions and Shopping Related Articles
Article Search
Most Recent Articles
Title: Modulation circuit with integrated microelectro-mechanical system (MEMS) components
Patent Number: 7,417,511 Issued on 08/26/2008 to Menke

Title: Direct digital interpolative synthesis
Patent Number: 7,417,510 Issued on 08/26/2008 to Huang

Title: Spread spectrum modulation of a clock signal for reduction of electromagnetic interference
Patent Number: 7,417,509 Issued on 08/26/2008 to Kultgen

Title: Multiple RF path amplifiers
Patent Number: 7,417,508 Issued on 08/26/2008 to Quaglietta

Title: Bias circuit for power amplifier having a low degradation in distortion characteristics
Patent Number: 7,417,507 Issued on 08/26/2008 to Yamamoto,   et al.

Title: Amplifier having switchable negative feedback
Patent Number: 7,417,506 Issued on 08/26/2008 to Klein,   et al.

Title: CMOS amplifiers with frequency compensating capacitors
Patent Number: 7,417,505 Issued on 08/26/2008 to Forbes,   et al.

Title: Startup and shutdown click noise elimination for class D amplifier
Patent Number: 7,417,504 Issued on 08/26/2008 to Strydom,   et al.

Title: Method for high efficiency audio amplifier
Patent Number: 7,417,503 Issued on 08/26/2008 to Xu,   et al.

Title: Selectable power supply for audio amplifier
Patent Number: 7,417,502 Issued on 08/26/2008 to Cochrane

Title: Variable inductor, oscillator including the variable inductor and radio terminal comprising this oscillator, and amplifier including the variable inductor and radio terminal comprising this am
Patent Number: 7,417,501 Issued on 08/26/2008 to Fujimoto,   et al.

Title: Control of an adjustable gain amplifier
Patent Number: 7,417,500 Issued on 08/26/2008 to Arnott

Title: Gain controlled amplifier and cascoded gain controlled amplifier based on the same
Patent Number: 7,417,499 Issued on 08/26/2008 to Moon,   et al.

Title: PWM modulator and class-D amplifier having the same
Patent Number: 7,417,497 Issued on 08/26/2008 to Lee

Title: Demodulator circuit of RFID system
Patent Number: 7,417,496 Issued on 08/26/2008 to Kang,   et al.

Title: Reconfigurable frequency filter
Patent Number: 7,417,495 Issued on 08/26/2008 to Li

Title: Voltage generator in a flash memory device
Patent Number: 7,417,493 Issued on 08/26/2008 to Lee

Title: Internal voltage generator of semiconductor integrated circuit
Patent Number: 7,417,490 Issued on 08/26/2008 to Kim

Title: Regulation circuit for inductive charge pump
Patent Number: 7,417,488 Issued on 08/26/2008 to Ahmed,   et al.

Title: Voltage-current conversion circuit, amplifier, mixer circuit, and mobile appliance using the circuit
Patent Number: 7,417,486 Issued on 08/26/2008 to Koutani,   et al.

Title: Differential energy difference integrator
Patent Number: 7,417,485 Issued on 08/26/2008 to Vecera

Title: Level shifter with boost and attenuation programming
Patent Number: 7,417,484 Issued on 08/26/2008 to Voo

Title: Wide-band wide-swing CMOS gain enhancement technique and method therefor
Patent Number: 7,417,483 Issued on 08/26/2008 to Wong,   et al.

Title: Adaptive voltage scaling for an electronics device
Patent Number: 7,417,482 Issued on 08/26/2008 to Elgebaly,   et al.

Title: Controlling signal states and leakage current during a sleep mode
Patent Number: 7,417,481 Issued on 08/26/2008 to Ahsanullah,   et al.

Title: Duty cycle correction circuit whose operation is largely independent of operating voltage and process
Patent Number: 7,417,480 Issued on 08/26/2008 to Boerstler,   et al.

Title: Duty detection circuit and method for controlling the same
Patent Number: 7,417,479 Issued on 08/26/2008 to Kitayama

Title: Delay line circuit
Patent Number: 7,417,478 Issued on 08/26/2008 to Kim,   et al.

Title: Power-on-reset circuit with output reset to ground voltage during power off
Patent Number: 7,417,476 Issued on 08/26/2008 to Hung

Title: Circuit and method for generating power up signal
Patent Number: 7,417,475 Issued on 08/26/2008 to Byeon,   et al.

Title: Clock frequency division methods and circuits
Patent Number: 7,417,474 Issued on 08/26/2008 to Jamal

Title: Multi-channel integrated circuit
Patent Number: 7,417,472 Issued on 08/26/2008 to Tumer,   et al.

Title: Voltage comparator having hysteresis characteristics
Patent Number: 7,417,471 Issued on 08/26/2008 to Gong,   et al.

Title: Phase frequency detector with a novel D flip flop
Patent Number: 7,417,470 Issued on 08/26/2008 to Riley

Title: Compensation for leakage current from dynamic storage node variation by the utilization of an automatic self-adaptive keeper
Patent Number: 7,417,469 Issued on 08/26/2008 to Cheng,   et al.

Title: Dynamic and differential CMOS logic with signal-independent power consumption to withstand differential power analysis
Patent Number: 7,417,468 Issued on 08/26/2008 to Verbauwhede,   et al.

Title: Flip-flop circuit and frequency divider using the flip-flop circuit
Patent Number: 7,417,466 Issued on 08/26/2008 to Akahori

Title: N-domino output latch
Patent Number: 7,417,465 Issued on 08/26/2008 to Lundberg,   et al.

Title: Bi-directional signal transmission system
Patent Number: 7,417,464 Issued on 08/26/2008 to Crawford

Title: Wireline transmission circuit
Patent Number: 7,417,463 Issued on 08/26/2008 to Danesh,   et al.

Title: Variable external interface circuitry on programmable logic device integrated circuits
Patent Number: 7,417,462 Issued on 08/26/2008 to Wong,   et al.

Title: Multi-standard transmitter
Patent Number: 7,417,460 Issued on 08/26/2008 to De Laurentiis,   et al.

Title: On-die offset reference circuit block
Patent Number: 7,417,459 Issued on 08/26/2008 to Wilson,   et al.

Title: Gate driving circuit and display apparatus having the same
Patent Number: 7,417,458 Issued on 08/26/2008 to Ahn,   et al.

Title: Scalable non-blocking switching network for programmable logic
Patent Number: 7,417,457 Issued on 08/26/2008 to Pani,   et al.

Title: Dedicated logic cells employing sequential logic and control logic functions
Patent Number: 7,417,456 Issued on 08/26/2008 to Verma,   et al.

Title: Programmable function generator and method operating as combinational, sequential and routing cells
Patent Number: 7,417,455 Issued on 08/26/2008 to Verma,   et al.

Title: Low-swing interconnections for field programmable gate arrays
Patent Number: 7,417,454 Issued on 08/26/2008 to Rahman,   et al.

Title: System and method for dynamically executing a function in a programmable logic array
Patent Number: 7,417,453 Issued on 08/26/2008 to Goodnow,   et al.

Title: Techniques for providing adjustable on-chip termination impedance
Patent Number: 7,417,452 Issued on 08/26/2008 to Wang,   et al.

Title: Leakage power management with NDR isolation devices
Patent Number: 7,417,451 Issued on 08/26/2008 to Kawa

Title: Testing combinational logic die with bidirectional TDI-TMS/TDO chanel circuit
Patent Number: 7,417,450 Issued on 08/26/2008 to Whetsel

Title: Wafer stage storage structure speed testing
Patent Number: 7,417,449 Issued on 08/26/2008 to Posey,   et al.

Title: System to calibrate on-die temperature sensor
Patent Number: 7,417,448 Issued on 08/26/2008 to Lim,   et al.

Title: Probe cards employing probes having retaining portions for potting in a retention arrangement
Patent Number: 7,417,447 Issued on 08/26/2008 to Kister

Title: Probe for combined signals
Patent Number: 7,417,446 Issued on 08/26/2008 to Hayden,   et al.

Title: Probing method and prober for measuring electrical characteristics of circuit devices
Patent Number: 7,417,445 Issued on 08/26/2008 to Sakagawa,   et al.

Title: Method and apparatus for inspecting integrated circuit pattern
Patent Number: 7,417,444 Issued on 08/26/2008 to Shinada,   et al.

Title: Determination of effective resistance between a power sourcing equipment and a powered device
Patent Number: 7,417,443 Issued on 08/26/2008 to Admon,   et al.

Title: Method and apparatus for testing tunnel magnetoresistive effect element, manufacturing method of tunnel magnetoresistive effect element and tunnel magnetoresistive effect element
Patent Number: 7,417,442 Issued on 08/26/2008 to Hachisuka,   et al.

Title: Methods and systems for guarding a charge transfer capacitance sensor for proximity detection
Patent Number: 7,417,441 Issued on 08/26/2008 to Reynolds

Title: Methods and systems for the rapid detection of concealed objects
Patent Number: 7,417,440 Issued on 08/26/2008 to Peschmann,   et al.

Title: Impedance conversion circuit and integrated circuit including thereof
Patent Number: 7,417,439 Issued on 08/26/2008 to Hirabayashi,   et al.

Title: Battery voltage measurement apparatus
Patent Number: 7,417,438 Issued on 08/26/2008 to Miyamoto

Title: Vehicle battery testing assembly
Patent Number: 7,417,437 Issued on 08/26/2008 to Torres

Title: Selectable tap induction coil
Patent Number: 7,417,436 Issued on 08/26/2008 to Chesser,   et al.

Title: Method for generating a homogeneous magnetization in a spatial examination volume of a magnetic resonance installation
Patent Number: 7,417,435 Issued on 08/26/2008 to Diehl

Title: Magnetic resonance imaging system with iron-assisted magnetic field gradient system
Patent Number: 7,417,434 Issued on 08/26/2008 to Overweg

Title: Method, examination apparatus and antenna array for magnetic resonance data acquisition
Patent Number: 7,417,433 Issued on 08/26/2008 to Heid,   et al.

Title: Asymmetric ultra-short gradient coil for magnetic resonance imaging system
Patent Number: 7,417,432 Issued on 08/26/2008 to Overweg

Title: Coil array for magnetic resonance imaging with reduced coupling between adjacent coils
Patent Number: 7,417,431 Issued on 08/26/2008 to Lanz,   et al.

Title: Continuous moving-table MRI contrast manipulation and/or update of scanning parameters
Patent Number: 7,417,430 Issued on 08/26/2008 to Aldefeld,   et al.

Title: Moving table MRI with subsampling in parallel
Patent Number: 7,417,429 Issued on 08/26/2008 to Fuderer,   et al.

Title: Fibre tracking magnetic resonance imaging
Patent Number: 7,417,428 Issued on 08/26/2008 to Hoogenraad,   et al.

Title: Magnetic resonance data acquisition method and apparatus
Patent Number: 7,417,427 Issued on 08/26/2008 to Porter

Microprocessor including return prediction unit configured to determine whether a stored return address corresponds to more than one call instruction Number:6,973,563 from the United States Patent and Trademark Office (PTO) owispatent

Home    Author Login    Submit Article    Article Search    Add Your Link    Edit Your Link    Contact Us    Advertising    Disclaimer

   

 
Web LinkGrinder.com

Top Breaking News
     Greek, Cypriot Leaders Resume Unification Talks in Nicosia by Nathan Morley
     Indonesia Tobacco Sales Grow, Raising Health Fears
     South Korea Allows Top Defector to Travel Overseas by VOA News

Title: Microprocessor including return prediction unit configured to determine whether a stored return address corresponds to more than one call instruction

Abstract: Various embodiments of methods and systems for implementing a return address prediction mechanism in a microprocessor are disclosed. In one embodiment, a return address prediction mechanism includes a return storage and a controller. The return storage includes one or more entries. One of the entries includes a count and a first return address that corresponds to a recently detected call operation. The controller is configured to receive a new return address that corresponds to a call instruction and to compare the new return address to the return address stored in the entry. If the new return address equals the return address in the entry, the controller may be configured to increase the value of the count in the entry. By increasing the count instead of allocating a new entry in the return storage to the new return address, the effective size of the return storage may be increased.

Patent Number: 6,973,563 Issued on 12/06/2005 to Sander


Inventors: Sander; Benjamin T. (Austin, TX)
Assignee: Advanced Micro Devices, Inc. (Sunnyvale, CA)
Appl. No.: 037403
Filed: January 4, 2002

Current U.S. Class: 712/242; 712/239
Intern'l Class: G06F 009/42
Field of Search: 712/233,234,239,241,242


References Cited [Referenced By]

U.S. Patent Documents
5732272Mar., 1998Gochee.
5768576Jun., 1998Hoyt et al.
5822575Oct., 1998Tran.
6170054Jan., 2001Poplingher.
6269436Jul., 2001Tran et al.


Other References

"Stacks," www.funducode.com/freec/datastructure/datastruct1.htm, Apr. 22, 2001, pp. 1-4.

Primary Examiner: Chan; Eddie
Assistant Examiner: Huisman; David J.
Attorney, Agent or Firm: Meyertons Hood Kivlin Kowert & Goetzel, P.C., Kivlin; B. Noël

Claims



1. A microprocessor, comprising:

an instruction cache configured to fetch instructions for execution; and

a return prediction unit coupled to said instruction cache, wherein said return prediction unit comprises:

a return storage including a first entry configured to store a count value and a return address of a previously detected call instruction; and

a controller coupled to the return storage and configured to compare the return address stored in the first entry to a new return address of a subsequently detected call instruction, and in response to determining that the stored return address is the same as the new return address, to modify the count value to indicate that the stored return address corresponds to more than one call instruction;

wherein if the count value indicates that the stored return address corresponds to more than one call instruction, the controller is configured to provide a predicted return address to the instruction cache for fetching by providing the stored return address and correspondingly modifying the count value.

2. The microprocessor as recited in claim 1, wherein in response to determining that the return address stored in the first entry is the same as the new return address, the controller is configured to modify the count value without allocating a new entry within the return storage.

3. The microprocessor as recited in claim 1, wherein the controller is configured to modify the count value to indicate that the stored return address corresponds to more than one call instruction by increasing the count value.

4. The microprocessor as recited in claim 1, wherein if the count value indicates that the stored return address corresponds to more than one call instruction, the controller is configured to correspondingly modify the count value by decreasing the count value.

5. The microprocessor as recited in claim 1, wherein if the new return address is not the same as the stored return address, the controller is configured to allocate a new entry within the return storage for the new return address.

6. The microprocessor as recited in claim 5, wherein the new entry is configured to store a count value and wherein the controller is configured to initialize the count value of the new entry to a minimum count value.

7. The microprocessor as recited in claim 1, wherein if the new return address is the same as the stored return address and the count value equals a maximum count value, the controller is configured to allocate a new entry within the return storage for the new return address.

8. The microprocessor as recited in claim 1, wherein the return storage is implemented as a stack structure, and wherein the first entry is identified by a top of stack pointer.

9. The microprocessor as recited in claim 8, wherein if the new return address is not the same as the stored return address, the controller is configured to allocate a new entry for the new return address and to modify the top of stack pointer to identify the new entry.

10. The microprocessor as recited in claim 8, wherein in response to a branch prediction being made, the controller is configured to save a copy of a current value of the top of stack pointer.

11. The microprocessor as recited in claim 10, wherein the controller is further configured to save a copy of the count value associated with the first entry if the first entry is identified by the top of stack pointer when the branch prediction is made.

12. The microprocessor as recited in claim 1, wherein each entry in the return storage is configured to store a respective count value.

13. The microprocessor as recited in claim 1, wherein fewer than all entries in the return storage are configured to store a respective count value.

14. The microprocessor as recited in claim 1, wherein the controller is further configured to provide the predicted return address to the instruction cache for fetching in response to a return instruction being detected.

15. The microprocessor as recited in claim 14, wherein if the count value associated with the stored return address is equal to a minimum value after being correspondingly modified, the controller is further configured to remove the first entry from the return storage.

16. The microprocessor as recited in claim 1, wherein if the count value indicates that the stored return address corresponds to a single call operation, the controller is configured to provide the return address by providing the stored return address and removing the first entry from the return storage.

17. A method, comprising:

storing a count value and a return address of a previously detected call instruction in a first entry of a return storage;

comparing the return address stored in the first entry to a new return address of a subsequently detected call instruction;

in response to determining that the stored return address is the same as the new return address, modifying the count value to indicate that the stored return address corresponds to more than one call instruction; and

if the count value indicates that the stored return address corresponds to more than one call instruction, providing a predicted return address for fetching by providing the stored return address and correspondingly modifying the count value.

18. The method as recited in claim 17, wherein modifying the count value to indicate that the stored return address corresponds to more than one call instruction occurs without allocating a new entry within the return storage.

19. The method as recited in claim 17, wherein modifying the count value to indicate that the stored return address corresponds to more than one call instruction includes increasing the count value.

20. The method as recited in claim 17, wherein correspondingly modifying the count value if the count value indicates that the stored return address corresponds to more than one call instruction includes decreasing the count value.

21. The method as recited in claim 17, further comprising allocating a new entry within the return storage for the new return address in response to determining that the stored return address is the same as the new return address.

22. The method as recited in claim 21, wherein the new entry is configured to store a count value and wherein the method further comprises initializing the count value of the new entry to a minimum count value.

23. The method as recited in claim 17, further comprising allocate a new entry within the return storage for the new return address in response to determining that the new return address is the same as the stored return address and the count value equals a maximum count value.

24. The method as recited in claim 17, wherein the return storage is implemented as a stack structure, and wherein the first entry is identified by a top of stack pointer.

25. The method as recited in claim 24, further comprising allocating a new entry for the new return address and modifying the top of stack pointer to identify the new entry in response to determining that the new return address is not the same as the stored return address.

26. The method as recited in claim 24, further comprising saving a copy of a current value of the top of stack pointer in response to a branch prediction being made.

27. The method as recited in claim 26, further comprising saving a copy of the count value associated with the first entry in response to determining that the first entry is identified by the top of stack pointer when the branch prediction is made.

28. The method as recited in claim 17, wherein providing the predicted return address to the instruction cache for fetching occurs in response to a return instruction being detected.

29. The method as recited in claim 28, further comprising removing the first entry from the return storage in response to determining that the count value associated with the stored return address is equal to a minimum value after being correspondingly modified.

30. A system, comprising:

a system memory; and

a microprocessor coupled to the system memory, wherein the microprocessor includes:

an instruction cache configured to fetch instructions for execution; and

a return prediction unit coupled to said instruction cache, wherein said return prediction unit comprises:

a return storage including a first entry configured to store a count value and a return address of a previously detected call instruction; and

a controller coupled to the return storage and configured to compare the return address stored in the first entry to a new return address of a subsequently detected call instruction, and in response to determining that the stored return address is the same as the new return address, to modify the count value to indicate that the stored return address corresponds to more than one call instruction;

wherein if the count value indicates that the stored return address corresponds to more than one call instruction, the controller is configured to provide a predicted return address to the instruction cache for fetching by providing the stored return address and correspondingly modifying the count value.

31. The system as recited in claim 30, wherein in response to determining that the return address stored in the first entry is the same as the new return address, the controller is configured to modify the count value without allocating a new entry within the return storage.

32. The system as recited in claim 30, wherein the controller is configured to modify the count value to indicate that the stored return address corresponds to more than one call instruction by increasing the count value.

33. The system as recited in claim 30, wherein if the count value indicates that the stored return address corresponds to more than one call instruction, the controller is configured to correspondingly modify the count value by decreasing the count value.

34. The system as recited in claim 30, wherein if the new return address is not the same as the stored return address, the controller is configured to allocate a new entry within the return storage for the new return address.

35. The system as recited in claim 30, wherein the return storage is implemented as a stack structure, wherein the first entry is identified by a top of stack pointer, and wherein if the new return address is not the same as the stored return address, the controller is configured to allocate a new entry for the new return address and to modify the top of stack pointer to identify the new entry.
Description



BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention is related to the field of microprocessors and, more particularly, to return address prediction mechanisms.

2. Description of the Related Art

Superscalar microprocessors are capable of executing multiple instructions per clock cycle. However, in order to take advantage of this ability, a superscalar microprocessor needs to be provided with a steady stream of instructions. If a microprocessor's instruction stream stalls, its pipeline may contain "bubbles" or gaps and, as a result, fewer instructions may be executed each cycle. Consequentially, if the rate at which instructions are being fetched declines below a certain threshold, a processor's performance may be reduced.

When a processor executes a typical set of instructions, control flows sequentially from one instruction to the next. In this situation, supplying the processor with instructions may simply involve fetching the next sequential instruction in the set. However, if certain instructions alter the control flow, determining which instruction to fetch next becomes more difficult. For example, a branch instruction is an instruction that causes the next instruction to be fetched from one of at least two possible addresses. One address is the address immediately following the branch instruction. This address is referred to as the "next sequential address." The second address is specified by the branch instruction, and is referred to as the "branch target address" or simply the "target address." Branch instructions typically select between the target address and the next sequential address based on a particular condition flag that is set by a previously executed instruction.

Since branches potentially alter the control flow in an instruction sequence, they create difficulty in superscalar microprocessors because the actual outcome of the branch may not be determined until the branch exits the pipeline. As a result, the location of the instruction following the branch instruction (at either the next sequential address or the branch target address) may not be known until the branch has executed. To avoid bubbles in the pipeline, however, the next instruction needs to be fetched as soon as possible after the branch instruction is fetched. In order to handle this situation, most pipelined processors employ some sort of branch prediction so that they can speculatively begin fetching instructions based on a predicted outcome of the branch without having to wait for the branch to actually resolve. Branch instructions are typically predicted when the branch instruction is decoded or when the branch instruction is fetched, depending on the branch prediction scheme and the configuration of the microprocessor.

One consequence of using branch prediction is that some predictions may be incorrect. If a branch prediction is incorrect, or "mispredicted" (as determined when the branch instruction executes), then the instructions that were speculatively fetched after the branch instruction was fetched are discarded from the instruction processing pipeline and the correct instructions are fetched.

A particularly difficult type of branch instruction to predict is a subroutine return instruction (e.g., RET, the return instruction defined for the x86 microprocessor architecture). For example, some superscalar microprocessor architectures include a "stack" area in memory. One of the uses of a stack is for passing information between a program and a subroutine called by that program. One such item of information that may be passed between a program and a subroutine may be the address of the program instruction that should be executed when the subroutine finishes execution (i.e., the return address). Typically, the return address identifies the instruction following the call to the subroutine in the calling program. A return from a subroutine may be initiated by a subroutine return instruction (or, in some architectures, an indirect jump instruction) that pops the return address off the stack. However, since instructions may be executed in parallel, the value that is at the top of the stack when a return instruction is fetched or decoded is not necessarily the address that will be at the top of the stack when the return instruction is executed. As a result, unlike some other types of branches, the target address of a return instruction cannot be determined until the return instruction is actually executed, frustrating branch prediction.

In order to be able to avoid stalls when return instructions are encountered, many microprocessors use return stacks or return address buffers to predict return addresses. These mechanisms store return addresses whenever a call instruction is detected. When a return instruction is detected, the top value of the return stack is typically removed from the return address storage and provided as the predicted return address for the detected return instruction.

Return address prediction mechanisms typically include several entries in order to be able to handle nested subroutine calls in which one subroutine calls another. One potential problem that may be encountered in many return address prediction mechanisms is that the return address storage may overflow if subroutines are nested too deeply. For example, if X is the number of call instructions in a particular code segment, Y is the number of return instructions in that segment, and Z is the number of return address storage locations in the return address storage, the return address storage may overflow if X-Y>Z. Typically, when the return address storage overflows, the oldest stored return address is overwritten. Thus, whenever the return instruction that corresponds to the overwritten entry is detected, the return address prediction mechanism may not be able to correctly predict the return address. As a result, return address prediction may suffer if the return address storage overflows.

SUMMARY

Various embodiments of methods and systems for implementing a return address prediction mechanism in a microprocessor are disclosed. In one embodiment, a return address prediction mechanism includes a return storage and a controller. The return storage may include one or more entries. One of the entries may include a count and a first return address that corresponds to a recently detected call operation. The controller may be coupled to the return storage and configured to receive a new return address that corresponds to a call instruction. The controller may be configured to compare the new return address to the return address stored in the entry. If the new return address equals the return address in the entry, the controller may be configured to increase the value of the count in the entry instead of allocating a new entry to the new return address. In contrast, if the new return address does not equal the return address in the entry, the controller may be configured to allocate a new entry for the new return address. Additionally, if the new return address equals the return address in the entry and the value of the count equals a maximum count value, the controller may be configured to allocate a new entry for the new return address. When allocating a new entry to the new return address, the controller may be configured to initialize a count in the new entry to a minimum count value.

In some embodiments, the return storage may be implemented as a stack structure. A top of stack pointer may identify the next entry to use when predicting a return address. The controller may be configured to compare each new return address to the return address stored in the entry identified by the top of stack pointer. If the controller allocates a new entry, the controller may modify the top of stack pointer so that it identifies the new entry. The controller may be configured save a copy of a current value of the top of stack pointer whenever a branch prediction is made. The controller may also save a copy of the value of the count in the entry at the top of the stack. If the branch prediction is incorrect, the controller may restore the stored value of the pointer to the top of the stack and/or the stored value of the count.

In several embodiments, each entry in the return storage may include a respective count. However, fewer than all of the entries in the return storage may include a respective count in some embodiments.

In response to a return instruction being detected, the controller may be configured to provide the most recently detected return address as a predicted return address. Additionally, the controller may be configured to decrease a value of the count associated with the entry storing the most recently detected return address. If the value of the count associated with the most recently detected return address is equal to a minimum value after being decreased, the controller may be configured to remove the most recently detected return address's entry from the return storage.

In another embodiment, a method of predicting return addresses involves detecting a call instruction in an instruction stream, comparing a first return address stored in a first entry in a return storage to a second return address that is associated with the call instruction, and if the first return address equals the second return address, increasing a first count associated with the first entry. If the second return address does not equal the first return address, a second entry in the return storage may be allocated to the second return address.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows one embodiment of a microprocessor.

FIG. 2 is a block diagram of one embodiment of a return address prediction mechanism.

FIG. 3A is a flowchart showing one embodiment of a method of saving return addresses in response to detecting a call instruction in an instruction stream.

FIG. 3B shows one embodiment of a method of providing a predicted return address in response to detecting a return instruction.

FIG. 3C is a flowchart illustrating one embodiment of a method of checkpointing to prevent a return address prediction mechanism from being corrupted by mispredicted branches.

FIG. 4 shows an example of one embodiment of a computer system that may include a return address prediction mechanism.

FIG. 5 shows another embodiment of a computer system that may include a return address prediction mechanism.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. Note, the headings are for organizational purposes only and are not meant to be used to limit or interpret the description or claims. Furthermore, note that the word "may" is used throughout this application in a permissive sense (i.e., having the potential to, being able to), not a mandatory sense (i.e., must). The term "include" and derivations thereof mean "including, but not limited to." The term "connected" means "directly or indirectly connected," and the term "coupled" means "directly or indirectly connected."

DETAILED DESCRIPTION OF EMBODIMENTS

FIG. 1—Microprocessor

FIG. 1 is a block diagram of one embodiment of a superscalar microprocessor 200 that may include a return address prediction mechanism. Before proceeding with a detailed description of the return address prediction mechanism employed within microprocessor 200, general aspects regarding other subsystems employed within the exemplary superscalar microprocessor 200 of FIG. 1 will be described.

Superscalar microprocessor 200 may include a prefetch/predecode unit 202 and a branch prediction unit 220 coupled to an instruction cache 204. An instruction alignment unit 206 may be coupled between instruction cache 204 and a plurality of decode units 208A-208D (referred to collectively as decode units 208). Each decode unit 208A-208D may be coupled to a respective reservation station unit 210A-210D (referred to collectively as reservation stations 210), and each reservation station 210A-210D may be coupled to a respective functional unit 212A-212D (referred to collectively as functional units 212). Decode units 208, reservation stations 210, and functional units 212 may also be coupled to a reorder buffer 216, a register file 218 and a load/store unit 222. A data cache 224 may be coupled to load/store unit 222, and a microcode unit 209 may be coupled to instruction alignment unit 206. In one embodiment, the microprocessor 200 may be designed to be compatible with the x86 architecture.

Instruction cache 204 may temporarily store instructions prior to their dispatch to decode units 208. Instruction code may be provided to instruction cache 204 by prefetching code from a main memory (not shown) through prefetch/predecode unit 202. Instruction cache 204 may be implemented in a set-associative, a fully-associative, or a direct-mapped configuration.

Prefetch/predecode unit 202 may prefetch instruction code from the main memory for storage within instruction cache 204. In one embodiment, prefetch/predecode unit 202 may be configured to burst code from the main memory into instruction cache 204. Prefetch/predecode unit 202 may employ a variety of specific code prefetching techniques and algorithms.

In the embodiment shown in FIG. 1, each of the decode units 208 may include decoding circuitry for decoding certain instructions. Some instructions may be directly decoded by decode units 208. The remaining instructions may be executed by invoking microcode unit 209. For example, when a microcoded instruction is encountered, microcode unit 209 may parse and serialize the instruction into a series of instructions that can be directly decoded by decode unit 208. In addition, each decode unit 208A-208D may route displacement and immediate data to a corresponding reservation station unit 210A-210D. Output signals from the decode units 208 may include bit-encoded execution instructions for the functional units 212 as well as operand address information, immediate data and/or displacement data.

The superscalar microprocessor of FIG. 1 supports out of order execution. Reorder buffer 216 may keep track of the original program sequence for register read and write operations, facilitate register renaming, allow for speculative instruction execution and branch misprediction recovery, and facilitate precise exceptions. It is noted that in some embodiments, the functionality of the reorder buffer 216 may be distributed among other microprocessor components, such as reservation stations 210. Upon decode of an instruction that involves the update of a register, a temporary storage location within reorder buffer 216 may be reserved to store speculative register states (or, if some of the functionality of the reorder buffer 216 is distributed among the other microprocessor components, a register renaming map may be used to store speculative register states). In some embodiments, reorder buffer 216 may be implemented in a first-in-first-out configuration wherein speculative results move to the "bottom" of the buffer as they are validated and written to the register file, thus making room for new entries at the "top" of the buffer. By storing speculative register states until the instructions that generated those states are validated, the results of speculatively-executed instructions along a mispredicted path may be invalidated in the buffer before they are written to register file 218 if a branch prediction is incorrect.

The bit-encoded execution instructions and immediate data provided at the outputs of decode units 208A-208D may be routed directly to respective reservation station units 210A-210D. In one embodiment, each reservation station unit 210A-210D may be capable of holding instruction information (e.g., bit encoded execution bits as well as operand values, operand tags and/or immediate data) for several pending instructions awaiting issue to the corresponding functional unit. Instructions aligned and dispatched through decode unit 208A may be passed to reservation station unit 210A and subsequently to functional unit 212A for execution. Similarly, instructions aligned and dispatched to decode unit 208B may be passed to reservation station unit 210B and into functional unit 212B, and so on. It is noted that while the embodiment of FIG. 1 shows that decode units 208A-208D are each associated with a dedicated reservation station unit 210A-210D, and that each reservation station unit 210A-210D is similarly associated with a dedicated functional unit 212A-212D, other configurations are possible. For example, in one embodiment, a single reservation station or scheduler may schedule operations for all of the functional units. In such an embodiment, the integrated scheduler may incorporate all or some of the functionality provided by reorder buffer 216.

Upon decode of a particular instruction, if a required operand is a register location, register address information may be routed to reorder buffer 216 (or, in an alternative embodiment, to a register renaming map) and register file 218. The register file 218 may contain the architected registers of the microprocessor. For example, in the x86 architecture, the x86 register file includes eight 32-bit real registers (e.g., EAX, EBX, ECX, EDX, EBP, ESI, EDI and ESP). Reorder buffer 216 (or a register renaming map) may contain temporary storage locations for results that change the contents of these registers, allowing out of order execution. A temporary storage location of reorder buffer 216 is reserved for each instruction which, upon decode, is determined to modify the contents of one of the real registers. Therefore, at various points during execution of a particular program, reorder buffer 216 may have one or more locations that contain the speculatively executed contents of a given register. Following decode of a given instruction, it may be determined that reorder buffer 216 has a previous location or locations assigned to a register used as an operand in the given instruction. If so, the reorder buffer 216 may forward to the corresponding reservation station either: 1) the value in the most recently assigned location, or 2) a tag for the most recently assigned location if the value has not yet been produced by the functional unit that will eventually execute the previous instruction. If the reorder buffer 216 has a location reserved for a given register, the operand value (or tag) may be provided from reorder buffer 216 rather than from register file 218. If there is no location reserved for a required register in reorder buffer 216, the value may be taken directly from register file 218. If the operand corresponds to a memory location, the operand value may be provided to the reservation station unit through load/store unit 222.

Reservation station units 210A-210D may be provided to temporarily store instruction information to be speculatively executed by the corresponding functional units 212A-212D. As stated previously, each reservation station unit 210A-210D may store instruction information for up to three pending instructions. Each of the four reservation stations 210A-210D may contain locations to store bit-encoded execution instructions to be speculatively executed by the corresponding functional unit and the values of operands to be operated on by those instructions. If a particular operand is not available, a tag for that operand may be provided from reorder buffer 216 and stored within the corresponding reservation station until the result has been generated (e.g., by completion of the execution of a previous instruction). It is noted that when an instruction is executed by one of the functional units 212A-212D, the result of that instruction may be passed directly to any reservation station units 210A-210D that are waiting for that result at the same time the result is passed to update reorder buffer 216 (this technique is commonly referred to as "result forwarding"). Instructions are issued to functional units for execution after the values of any required operand(s) are made available. That is, if an operand associated with a pending instruction within one of the reservation station units 210A-210D has been tagged with a location of a previous result value within reorder buffer 216 that corresponds to an instruction which modifies the required operand, the instruction is not issued to the corresponding functional unit 212 until the operand result for the previous instruction has been obtained. Accordingly, the order in which instructions are executed may not be the same as the order of the original program instruction sequence. Reorder buffer 216 may maintain data coherency in situations where read-after-write dependencies occur.

In one embodiment, each of the functional units 212 may be configured to perform integer arithmetic operations of addition and subtraction, as well as shifts, rotates, logical operations, and branch operations. A floating point unit (not shown) may also be used to accommodate floating point operations.

Each of the functional units 212 may also provide information regarding the execution of conditional branch instructions to the branch prediction unit 220. If a branch prediction was incorrect, branch prediction unit 220 flushes instructions subsequent to the mispredicted branch that have entered the instruction processing pipeline and redirects prefetch/predecode unit 202. The redirected prefetch/predecode unit 302 may then begin fetching the correct set of instructions from instruction cache 204 or main memory. In such situations, the results of instructions in the original program sequence that occurred after the mispredicted branch instruction may be discarded, including those which were speculatively executed and temporarily stored in load/store unit 222 and reorder buffer 216.

Results produced by functional units 212 may be sent to the reorder buffer 216 if a register value is being updated or to the load/store unit 222 if the contents of a memory location are being changed. If the result is to be stored in a register, the reorder buffer 216 may store the result in the location that was reserved for the value of the register when the instruction was decoded. As stated previously, results may also be broadcast to reservation station units 210A-210D where pending instructions may be waiting for the results of previous instruction executions to obtain the required operand values.

Generally speaking, load/store unit 222 provides an interface between functional units 212A-212D and data cache 224. In one embodiment, load/store unit 222 may be configured with a load/store buffer with several storage locations for data and address information for pending loads or stores. Decode units 208 arbitrate for access to the load/store unit 222. If the buffer is full, a decode unit may wait until the load/store unit 222 has room for the pending load or store request information. The load/store unit 222 may also perform dependency checking for load instructions against pending store instructions to ensure that data coherency is maintained.

Data cache 224 is a high speed cache memory provided to temporarily store data being transferred between load/store unit 222 and the main memory subsystem. Like the instruction cache 204 described above, the data cache 224 may be implemented in a variety of specific memory configurations, including a set associative configuration. Additionally, data cache 224 and instruction cache 204 may be implemented in a unified cache in some embodiments.

FIGS. 2-4—Return Address Prediction Mechanism

Turning now to FIG. 2, one embodiment of a return prediction unit 250 is shown. In some embodiments of microprocessor 200, return prediction unit 250 may be included within branch prediction unit 220. Generally, return prediction unit 250 may be configured to provide return address predictions for return instructions as they are detected. In embodiments of microprocessor 200 employing the x86 microprocessor architecture, return instructions include the RET instruction and the IRET instruction. Generally, the term "return instruction" may refer to any instruction that causes a subroutine to return to a calling program.

Return prediction unit 250 stores return addresses associated with call instructions within return storage 252, and predicts the return address associated with a return instruction based on the stored return addresses. It is noted that in embodiments of microprocessor 200 employing the x86 microprocessor architecture, call instructions include the CALL instruction and the INT instruction.

It is noted that return storage 252 may employ multiple registers as its storage locations, or the storage locations may be rows of a storage array. Each entry in the return storage 252 may include a return address (e.g., a return program counter (PC)) and a count. The return address is the return address associated with the call instruction associated with the return stack entry. The count indicates how many times a particular return address should be provided as a predicted return address. For example, many recursive subroutines reuse the same call site. In a typical return stack, each time a recursive subroutine calls itself, a new entry is pushed onto the return stack. In contrast, instead of allocating a new entry for each recursive call, the return prediction unit 250 may increase the count associated with the return address for the recursive subroutine each time a recursive call (i.e., a call that has the same predicted return address as the previously detected call) is detected. Including a count in a return storage entry may increase the effective size of the return storage 252 if nested subroutine calls are encountered. Increasing the effective size of the return storage 252 may in turn decrease the likelihood of the return storage 252 overflowing.

In many embodiments, the return storage 252 may be implemented as a stack structure. Generally, a stack structure is a Last-In, First-Out (LIFO) structure in which values are placed on the stack in a certain order and are removed from the stack in the reverse order. Therefore, the top of the stack contains the last item placed on the stack. The action of placing a value on the stack is known as a "push," and requesting that a push be performed is a "push command." Similarly, the action of removing a value from the stack is referred to as a "pop," and requesting that a pop be performed is a "pop command." When a push command is performed, the pointer to the top of the stack may be decremented by the size (in bytes) of the value specified by the push command. The value is then stored at the address pointed to by the decremented pointer value. When a pop command is performed, a number of bytes specified by the pop command are copied from the top of the stack to a destination specified by the pop command, and then the pointer to the top of the stack may be incremented by the number of bytes.

In some embodiments, the return storage 252 may be implemented as a linked list structure. A pointer may indicate which entry should be used to provide the next predicted return address, and each entry may maintain one or more pointers to the next entries that should be used to predict return addresses. In one embodiment, each entry may include several pointers that each correspond to a value of that entry's count. For example, if an entry being used to provide a predicted return address has a current count of three (e.g., indicating that the entry should be used to provide a return address three times), the entry may be used to provide the return address and the count may be decremented to two. The third pointer in that entry may be used to identify the next entry in the list, and thus that pointer may now be used as the pointer to the first entry of the linked list. Similarly, if that entry is later used to provide a predicted return address again, its count may be decremented to one and the second pointer in that entry may identify the next entry to be used. In such an embodiment, nonconsecutive subroutine calls that use the same return address may be stored together in a single entry, increasing the effective size of return storage 252. Generally, the larger the size of the return addresses, the more such an embodiment may increase the effective size of return storage 252. In alternative embodiments, the order in which the return storage entries should be used to predict return addresses may be maintained in a separate stack structure whose entries each store tags that identify entries in the return storage 252. The order of the tags in this other stack structure may indicate the order in which the entries in return storage 252 should be accessed.

Return prediction unit 250 may also include a return control unit 254, an adder circuit 256, a multiplexer 258, and a comparator block 260. Return control unit 254 is configured to control the storage of data within return storage 252. Adder circuit 256 and multiplexer 258 may be configured to generate a return address for a particular call instruction during a clock cycle in which the call instruction is detected. Comparator block 260 compares the value of the return address generated by adder circuit 256 to one or more of the addresses already stored in return storage 252 in order to determine whether to increase a particular entry's count or to allocate a new entry when a call instruction is detected by the instruction cache 204 or decode units 208.

Return control unit 254 may be coupled to return storage 252 via a data bus 262, a pointer bus 264, and a counter bus 276. Data bus 262 allows the reading and writing of data into each of the storage locations (or entries) within return storage 252. Control signals conveyed along with the data upon data bus 262 may indicate a read or write operation as well as which entry or entries are selected for the indicated operation. Pointer bus 264 may convey a pointer indicative of the "top" or "first" entry or storage location in return storage 252. The top or first entry of return storage 252 may be the entry within return storage 252 that contains the most recently allocated call instruction data. In some embodiments, the entry above the entry indicated by pointer 264 may not contain valid data. The count bus 276 may allow the control of the count in each return entry. Control signals transmitted along the count bus may be used to identify one or more of the entries and to cause the count(s) associated with each selected entry to be increased, decreased, cleared, etc. For example, if the count is maintained in a counter, the control signals may cause the counter to be incremented, decremented, or reset. Similarly, if the count is maintained in a shift register, the control signals may cause bits to be shifted into or out of the shift register.

Additionally, return control unit 254 receives a plurality of buses from other units within microprocessor 200. A call bus 266 and a return bus 268 may convey call and return signals from branch prediction unit 220. When asserted, the call and return signals indicate that a call and return instruction (respectively) has been detected by branch prediction unit 220. Call and return instructions may be stored as predicted branches within the branch prediction structure, along with an indication that the instruction is a call or return instruction. The branch prediction structure may include a plurality of storage locations indexed by the instruction fetch address. Branches and call instructions may be detected according to information stored in each entry and predicted according to that information. If a call instruction is detected within a set of instructions fetched by instruction cache 204, then the call signal may be asserted to return prediction unit 250. Similarly, if a return instruction is detected within a set of instructions fetched by instruction cache 204, then the return signal may be asserted to return prediction unit 250.

Upon receipt of the asserted call signal from branch prediction unit 220, return control unit 254 may either allocate an entry to the call instruction or, if the predicted return address for that call instruction is the same as a return address already contained in the return storage 252, increase the count associated with that entry. For example, if the return storage 252 is implemented as a stack, the new return address may be compared to the return address at the top of the stack. If the new return address equals the return address at the top of the stack, the count associated with the top of the stack may be increased instead of allocating a new entry at the top of the stack.

The return address may be calculated by adder circuit 256 from the offset of the call instruction within the fetched line (the offset is transferred to return control unit 254 from branch prediction unit 220 upon call bus 266) and from the address being fetched (transferred to multiplexer 258 from instruction cache 204). Multiplexer 258 may be controlled by return control unit 254 to select the address conveyed by instruction cache 204 in this case. Comparator block 260 compares the return address calculated by the adder circuit 256 to the return address contained in one or more of the entries in the return storage 252 (e.g., in a return stack embodiment, the comparator block 260 may compare the new return address to the return address contained in the top entry in the stack). If the two return addresses are the same, comparator block 260 provides a signal to the return control unit 254 indicating their equality and the return control unit 254 asserts a signal on the count bus 276 that increases the count associated with the entry that already contains a copy of that return address. If the two addresses are not the same, return control unit 254 may allocate a new entry in the return storage 252. The allocated entry is determined according to the pointer on pointer bus 264, and the entry becomes the top or first entry in the return storage 252. The return address calculated by the adder circuit 256 is stored into the allocated entry within return storage 252. Additionally, the count associated with the new entry may be initialized to a value indicating that the return address should be provided one time before being removed from the return storage 252.

In one embodiment, if the count associated with a return address reaches its maximum value (e.g., the largest value the count is capable of indicating), the control unit 254 may be configured to allocate a new entry the next time a call instruction that has that return address is detected.

Upon receipt of the asserted return signal from branch prediction unit 220, return control unit 254 predicts a return address for the return instruction. The return address may be predicted to be the return address stored within the entry nearest the top or first entry in the return storage 252 (as indicated by the pointer upon pointer bus 264). The selected return address may be conveyed upon return address prediction bus 270 to instruction cache 204 for use in fetching subsequent instructions.

The number of times a particular return address is provided as a predicted return address depends on the value of the count associated with the entry storing that return address. If the count indicates that the return address should be provided once, the return address's entry may be removed from the return storage the first time that return address is used as a predicted return address. On the other hand, if the count indicates that the return address should be provided several times, the control unit 254 may decrease the count each time that return address is provided instead of removing its entry from the return storage 252. The control unit may not remove the return address from the return storage 252 until its count indicates that it has been provided the specified number of times.

Looking back at FIG. 1, it is noted that the branch prediction structure within branch prediction unit 220 is a speculative structure that may not store an indication for each call or return instruction that may be encountered. In one embodiment, an indication of whether or not a particular call or return instruction was detected by branch prediction unit 220 may be conveyed with each call and return instruction through the instruction processing pipeline of microprocessor 200. When a decode unit 208 decodes a call or return instruction and the associated indication signifies that the instruction was not detected by branch prediction unit 220, then the decode unit may assert a call signal upon a decode call bus 272 or a decode return bus 274 to return prediction unit 250. During a clock cycle in which these signals are asserted, the instruction fetching portion of the instruction processing pipeline within instruction cache 204 and instruction alignment unit 206 may be stalled.

When decode units 208 decode a call or return instruction not detected by branch prediction unit 220, return control unit 254 may allocate or remove an entry or increase or decrease a count for the detected call or return instruction just as it would for call or return instructions detected by branch prediction unit 220. Multiplexer 258 may be directed to accept the PC address from decode units 208 in the case of a call instruction so that the return address for the instruction may be calculated. The instruction offset may be calculated according to the particular decode unit 208 which decodes the call or return instruction. The offset may be conveyed by that decode unit to return control unit 254.

Entries may be removed from the return storage 252 in several ways. In one embodiment, the contents of storage locations within return storage 252 between the storage location indicated by pointer bus 264 and the storage location just before the one being deleted may be copied to the next lower storage location. In other words, storage locations between the storage location indicated by pointer bus and the storage location before the one being deleted may be shifted down by one location. The storage location is thereby overwritten by another entry and is deleted from return storage 252.

Because microprocessor 200 is configured to speculatively execute instructions out of order, branch mispredictions may indicate that portions of return storage 252 are storing incorrect information. As used herein, the term "mispredicted branch" refers to a branch, call, or return instruction for which the target address has been mispredicted. A branch may be mispredicted because the speculatively generated target address is incorrect. Additionally, a branch may be mispredicted because it was predicted to be taken (i.e., the next instruction to be executed resides at the target address) and the branch is found to be not taken. Alternatively, a branch may be mispredicted because it was predicted to be not taken (i.e., the next instruction to be executed resides in memory contiguous to the branch instruction) and the branch is found to be taken.

When a branch instruction is mispredicted, information associated with instructions within the instruction processing pipeline that are subsequent to the mispredicted branch may be incorrect. As noted above, the instructions subsequent to the mispredicted branch are flushed from the pipeline. Return storage 252 may be purged of information related to the flushed instructions. In some embodiments, purging information from the return storage 252 may simply involve invalidating all of the entries in the return storage 252.

In other embodiments, return prediction unit 250 may be configured to flush information from the return storage 252 without invalidating the entire storage. For example, in order to recover the contents of return storage 252 upon detection of a mispredicted branch, the return prediction unit 250 may employ a checkpointing mechanism to save certain contents of the return storage 252 before they are modified by the calls and returns in a predicted control path.

In some embodiments, saving the contents of the return storage 252 may involve saving a copy of the current value of the pointer to the top or first entry in the return storage 252 and/or a copy of the return address stored in that entry. In a return stack embodiment of return storage 252, merely saving the current value of the pointer to the top or first entry may allow the return storage 252 to be restored as long as more call instructions than return instructions are detected in the mispredicted path (e.g., as long as more entries are pushed onto the stack than are popped off of the stack). However, if more entries are popped from the return stack than are pushed onto the return stack, the mispredicted path may corrupt the return storage 252. Additional protection against corruption may be provided by saving the value of the return address currently in the top or first entry whenever a branch prediction is detected.

Additionally, the current value of the count associated with the top or first entry may be saved as part of the checkpointing process. This prevents a mispredicted path from corrupting the count associated with that entry. So long as the mispredicted path does not alter any entries below or after the checkpointed entry, the return storage 252 may not be corrupted by the mispredicted path.

Thus, if portions of the return storage 252 are checkpointed when a branch prediction is made and the prediction turns out to be incorrect, the saved contents of the stack may be restored. For example, a misprediction signal upon branch misprediction conductor 278 may indicate that a mispredicted branch has been detected. Depending on the embodiment, the misprediction signal may be conveyed from branch prediction unit 220, reorder buffer 216, or functional unit 212 (note that in some embodiments, only one of these components may be configured to convey the misprediction signal to the return prediction unit 250). In response to receiving the misprediction signal, the control unit 254 may restore the saved contents of the return storage 252. As described above, this may effectively undo the effects of the speculation on the return storage 252 (depending on the modifications made to the return storage 252 in response to the call and return instructions detected along the mispredicted path). If the prediction is validated, the saved checkpointing information may be discarded.

It is noted that different embodiments may support numbers of outstanding branch predictions at any given time. For example, in one embodiment, the return prediction unit 250 may be able to checkpoint for several outstanding branch predictions, while another embodiment may only provide checkpointing for a single outstanding branch prediction. Also, some embodiments may provide additional repair mechanisms such as branch tags and validity indications. Branch tags may track which new entries were allocated in the return storage 252 or which preexisting entries were modified during any particular predicted path. Validity bits may allow "deleted" entries to be saved until the branch prediction resolves. This may provide additional protection against return storage corruption caused by a mispredicted path.

Instead of speculatively updating the return storage 252 (e.g., by updating before a branch prediction has resolved), the contents of the return storage 252 may be updated non-speculatively at retire time in some embodiments. Since the return storage is not speculatively updated, there may be no need to provide checkpointing. However, situations may arise where a call instruction is detected along a predicted path and the corresponding return instruction is encountered before the return storage has been updated based on the call instruction.

As noted earlier, in some embodiments, the entire return storage 252 may be invalidated when a mispredicted return instruction is detected. An additional situation where the entire storage may be invalidated is if the contents of instruction cache 204 are invalidated. For example, instruction cache 204 may be invalidated due to a task switch. Since the contents of return storage 252 may be invalid for the new task, the return storage's entries may be invalidated. The contents of the return storage 252 may also be invalidated if the return prediction unit 250 incorrectly predicts a return address (as determined when the return instruction for which the prediction was made completes execution).

Return storage 252 includes a finite number of entries, and may therefore become full before any entries are deleted. When a call instruction is detected and return storage 252 is full of valid entries, the oldest entry (e.g., the entry stored at the "bottom" of the stack or the end of a linked list) may be deleted. In one embodiment, the pointer upon pointer bus 254 may wrap around to the oldest storage location within return storage 252 and that location may be allocated to the newly detected call instruction. In another embodiment, all storage locations within return storage 252 may be shifted down one location (similar to when an entry is deleted due to return instruction retirement) and the top or first storage location may be allocated to the new return instruction. The pointer upon pointer bus 264 may be unmodified in this embodiment.

It is noted that certain instructions within various microprocessor architectures may generate an "exception." Like an interrupt, an exception causes program execution to jump to an exception handling routine. If an instruction generates an exception, the return storage 252 may be checkpointed as described above. It is further noted that, when multiple call and/or return instructions are detected simultaneously by decode units 208, return control unit 254 may be configured to select the first instruction detected in program order. The other call and/or return instructions may be purg


Free Web Sudoku Puzzles.
Solve with your browser.
    1     9 8 5  
      5       4  
3   6   2   1    
        4       9
    4       2    
6       1        
    5   6   4   1
  6       3      
  9 2 4     6    
What is it?



Add Your Site · Terms Of Service · Privacy Policy


DISCLAIMER
Linkgrinder is a free service that searches the Internet and indexes all files found so that you may search quickly and easily for shared files. These files are created and made available individually by users whose identity we are not aware of and who we have no control over. In essence we function like a search engine tool; these files ARE NOT STORED OR SERVED BY OUR NETWORK. We are not responsible for any materials obtained by using our service. We do not monitor any of the contents of these files. These files may contain viruses, illegal materials, materials inappropriate for minors, offensive files and the like. BY USING OUR SERVICE, YOU ASSUME FULL RESPONSIBILITY FOR DOWNLOADING THESE MATERIALS AND WILL INDEMNIFY US FOR ANY DAMAGES THAT MAY BE INCURRED.

For More Specific Information VIEW OUR TERMS OF SERVICE.

Thank you and Enjoy!