Reducing number of rejected snoop requests by extending time to respond to snoop request, Patent Number 7,386,681, from the United States Patent and Trademark Office (PTO)

Title: Reducing number of rejected snoop requests by extending time to respond to snoop request

Abstract: A cache, system and method for reducing the number of rejected snoop requests. A "stall/reorder unit" in a cache receives a snoop request from an interconnect. The snoop request is entered in the first available latch of the stall/reorder unit unless the stall/reorder unit is full, in which case the new snoop request is transmitted to a second unit configured to transmit a request to retry resending the new snoop request. Snoop requests have a higher priority than requests from processors, so the arbitration mechanism selects snoop requests over processor requests unless it issues a request to the contrary (a "stall request") to the stall/reorder unit. Because snoop requests have a higher priority than processor requests, the number of rejected snoop requests is reduced. Because the arbitration mechanism can issue a stall request, the processor is not starved.

Patent Number: 7,386,681 Issued on 06/10/2008 to Guthrie,   et al.


Inventors: Guthrie; Guy L. (Austin, TX), Shen; Hugh (Austin, TX), Starke; William J. (Round Rock, TX), Williams; Derek E. (Austin, TX)
Assignee: International Business Machines Corporation (Armonk, NY)
Appl. No.: 11/056,679
Filed: February 11, 2005


Current U.S. Class: 711/146 ; 710/107; 710/109; 710/112; 711/141; 711/142; 711/143
Field of Search: 711/141-143,146 710/107,109,112


References Cited

U.S. Patent Documents
4961136 October 1990 Sato
5426765 June 1995 Stevens et al.
5559988 September 1996 Durante et al.
5572703 November 1996 MacWilliams et al.
5673414 September 1997 Amini et al.
5737758 April 1998 Merchant
5774700 June 1998 Fisch et al.
5996036 November 1999 Kelly
6192458 February 2001 Arimilli et al.
6263389 July 2001 LaBerge
6470427 October 2002 Arimilli et al.
6473833 October 2002 Arimilli et al.
6487643 November 2002 Khare et al.
6587923 July 2003 Benveniste et al.
6615325 September 2003 Mailloux et al.
6779086 August 2004 Arimilli et al.
6848023 January 2005 Teramoto
2003/0131203 July 2003 Berg et al.
2004/0068623 April 2004 Augsburg et al.

Primary Examiner: Shah; Sanjiv
Assistant Examiner: Savla; Arpan
Attorney, Agent or Firm: Roberts-Gerhardt; Diana L. Voigt, Jr.; Robert A. Winstead, P.C.

Claims



The invention claimed is:

1. A cache, comprising: a stall/reorder unit configured to receive snoop requests and store information about said snoop requests in a plurality of queues; a plurality of latches, wherein said plurality of latches store clock cycle information indicative of durations that information about said snoop requests reside in associated queues of said plurality of queues; an arbiter coupled to said stall/reorder unit, wherein said arbiter selects a processor request or a snoop request to be serviced, wherein if said arbiter selects said processor request, then information about said snoop request which was not selected for servicing is maintained in an associated queue of said plurality of queues for a next clock cycle; and a realignment unit coupled to said stall/reorder unit, wherein if said arbiter selects said snoop request to be serviced, then said realignment unit receives one or more counter bits associated with said selected snoop request that indicates a number of clock cycles that information about said selected snoop request resided in said associated queue of said plurality of queues; wherein if said one or more counter bits indicates that said number of clock cycles that information about said selected snoop request resided in said associated queue of said plurality of queues is less than n clock cycles, then said realignment unit stores a result for said selected snoop request in a queue for said n clock cycles minus said number of clock cycles that information about said selected snoop request resided in said associated queue of said plurality of queues, wherein n is a positive integer.

2. The cache as recited in claim 1, wherein said stall/reorder unit further comprises: a control unit, wherein said control unit increments a counter in a latch associated with said snoop request not selected thereby indicating that information about said snoop request not selected will have resided in said associated queue of said plurality of queues for an additional period of time.

3. The cache as recited in claim 1, wherein said realignment unit transmits said result for said selected snoop request to an interconnect.

4. The cache as recited in claim 1, wherein said plurality of queues and said plurality of latches reside within said stall/reorder unit.

5. A system, comprising: a processor; and a cache coupled to said processor, wherein said cache comprises: a stall/reorder unit configured to receive snoop requests and store information about said snoop requests in a plurality of queues; a plurality of latches, wherein said plurality of latches store clock cycle information indicative of durations that information about said snoop requests reside in associated queues of said plurality of queues; an arbiter coupled to said stall/reorder unit, wherein said arbiter selects a processor request or a snoop request to be serviced, wherein if said arbiter selects said processor request, then information about said snoop request which was not selected for servicing is maintained in an associated queue of said plurality of queues for a next clock cycle; and a realignment unit coupled to said stall/reorder unit, wherein if said arbiter selects said snoop request to be serviced, then said realignment unit receives one or more counter bits associated with said selected snoop request that indicates a number of clock cycles that information about said selected snoop request resided in said associated queue of said plurality of queues; wherein if said one or more counter bits indicates that said number of clock cycles that information about said selected snoop request resided in said associated queue of said plurality of queues is less than n clock cycles, then said realignment unit stores a result for said selected snoop request in a queue for said n clock cycles minus said number of clock cycles that information about said selected snoop request resided in said associated queue of said plurality of queues, wherein n is a positive integer.

6. The system as recited in claim 5, wherein said stall/reorder unit further comprises: a control unit, wherein said control unit increments a counter in a latch associated with said snoop request not selected thereby indicating that information about said snoop request not selected will have resided in said associated queue of said plurality of queues for an additional period of time.

7. The system as recited in claim 5, wherein said realignment unit transmits said result for said selected snoop request to an interconnect.

8. The system as recited in claim 5, wherein said plurality of queues and said plurality of latches reside within said stall/reorder unit.
Description



TECHNICAL FIELD

The present invention relates to the field of caches in a multiprocessor system, and more particularly to reducing the number of rejected snoop requests by extending the time to respond to snoop requests.

BACKGROUND INFORMATION

A multiprocessor system may comprise multiple processors coupled to a common shared system memory. The multiprocessor system may further include one or more levels of cache associated with each processor. A cache includes a relatively small, high speed memory ("cache memory") that contains a copy of information from one or more portions of the system memory. A Level-1 (L1) cache or primary cache may be built into the integrated circuit of the processor. The processor may be associated with additional levels of cache, such as a Level-2 (L2) cache and a Level-3 (L3) cache. These lower level caches, e.g., L2, L3, may be employed to stage data to the L1 cache and typically have progressively larger storage capacities but longer access latencies.

The cache memory may be organized as a collection of spatially mapped, fixed size storage region pools commonly referred to as "congruence classes." Each of these storage region pools typically comprises one or more storage regions of fixed granularity. These storage regions may be freely associated with any equally granular storage region in the system as long as the storage region spatially maps to a congruence class. The position of the storage region within the pool may be referred to as the "set." The intersection of each congruence class and set contains a cache line. The size of the storage granule may be referred to as the "cache line size." A unique tag may be derived from an address of a given storage granule to indicate its residency in a given congruence class and set.
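To make the mapping concrete, the sketch below splits an address into the three fields this paragraph implies: a byte offset within the cache line (set by the cache line size), a congruence class index, and the tag that identifies the line within its class. It is a minimal illustration under stated assumptions, not the cache's actual logic: the function name `split_address` and the example geometry (128-byte lines, 1,024 congruence classes) are hypothetical, and both sizes are assumed to be powers of two so that each field is a contiguous run of address bits.

```python
def split_address(addr: int, line_size: int, num_classes: int) -> tuple[int, int, int]:
    """Split an address into (tag, congruence_class, offset).

    Hypothetical helper: assumes line_size and num_classes are powers of two,
    so each field occupies a contiguous run of address bits.
    """
    for value in (line_size, num_classes):
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{value} is not a power of two")
    offset_bits = line_size.bit_length() - 1   # log2(line_size)
    class_bits = num_classes.bit_length() - 1  # log2(num_classes)
    offset = addr & (line_size - 1)                               # byte within the cache line
    congruence_class = (addr >> offset_bits) & (num_classes - 1)  # pool the line maps to
    tag = addr >> (offset_bits + class_bits)                      # identifies the line in its class
    return tag, congruence_class, offset


if __name__ == "__main__":
    # Hypothetical geometry: 128-byte lines, 1,024 congruence classes.
    tag, cclass, offset = split_address(0x12345678, line_size=128, num_classes=1024)
    print(tag, cclass, offset)  # 2330 172 120
```

Every address that yields the same congruence class competes for the same sets of that class; the stored tag is what distinguishes those lines from one another.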

When a processor generates a read request and the requested data resides in its cache memory (e.g., cache memory of L1 cache), then a cache read hit takes place. The processor may then obtain the data from the cache memory without having to access the system memory. If the data is not in the cache memory, then a cache read miss occurs. The memory request may be forwarded to the system memory and the data may subsequently be retrieved from the system memory as would normally be done if the cache did not exist. On a cache miss, the data that is retrieved from the system memory may be provided to the processor and may also be written into the cache memory due to the statistical likelihood that this data will be requested again by that processor. Likewise, if a processor generates a write request, the write data may be written to the cache memory without having to access the system memory over the system bus.

Hence, data may be stored in multiple locations. For example, data may be stored in a cache of a particular processor as well as in system memory. If a processor alters the copy of a system memory location held in its cache memory (e.g., the cache memory of its L1 cache), that cache memory may be said to hold "modified" data, and the system memory may be said to hold "stale" or invalid data. Problems may result if another processor (other than the processor whose cache memory holds the "modified" data) or a bus agent, e.g., a Direct Memory Access (DMA) controller, inadvertently obtains this "stale" or invalid data from system memory. Consequently, the other processors and bus agents must be provided the most recent copy of the data, from either system memory or the cache memory where it resides. This is commonly referred to as "maintaining cache coherency." To maintain cache coherency, therefore, it may be necessary to monitor the system bus to see whether another processor or bus agent accesses cacheable system memory. This method of monitoring the system bus is referred to in the art as "snooping."

Each cache may be associated with logic circuitry, commonly referred to as a "snoop controller," configured to monitor the system bus for snoopable addresses requested by a different processor or other bus agent. Snoopable addresses are the addresses requested by the other processor or bus agent that are to be snooped by the snoop controllers on the system bus. Snoop controllers may snoop these addresses to determine whether copies of the data at the snoopable addresses are within their associated cache memories, using a protocol commonly referred to as Modified, Exclusive, Shared and Invalid (MESI). In the MESI protocol, an indication of a coherency state is stored in association with each unit of storage in the cache memory. This unit of storage is commonly referred to as a "coherency granule." A "cache line" may be the size of one or more coherency granules. The indication of the coherency state for each coherency granule in the cache memory may be stored in a cache state directory in the cache subsystem. Each coherency granule may have one of four coherency states: modified (M), exclusive (E), shared (S), or invalid (I), which may be indicated by two or more bits in the cache state directory. The modified state indicates that a coherency granule is valid only in the cache memory containing the modified or updated coherency granule and that the updated value has not been written to system memory. When a coherency granule is indicated as exclusive, it is resident only in the cache memory holding it in the exclusive state; however, the data in the exclusive state is consistent with system memory. If a coherency granule is marked as shared, it is resident in the associated cache memory and may also be in one or more other cache memories in addition to the system memory, and all of the copies so marked are consistent with the system memory. Finally, the invalid state may indicate that the data and the address tag associated with the coherency granule are both invalid, and thus the granule is not contained within that cache memory.
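The four states and the properties this paragraph assigns to them can be captured in a small table of predicates. The sketch below is illustrative only: the 2-bit encodings are an arbitrary choice (the text says only that two or more bits are used), and the helper names are hypothetical. `needs_write_back` reflects the role the background assigns to snoop machines 111, which write modified data back to system memory to maintain cache coherency.

```python
from enum import IntEnum


class MESI(IntEnum):
    """Coherency state of one coherency granule (2-bit encodings are hypothetical)."""
    INVALID = 0b00
    SHARED = 0b01
    EXCLUSIVE = 0b10
    MODIFIED = 0b11


def is_valid(state: MESI) -> bool:
    # Invalid: data and address tag are both invalid, so the granule is not held here.
    return state is not MESI.INVALID


def consistent_with_memory(state: MESI) -> bool:
    # Exclusive and shared copies match system memory; a modified copy does not.
    return state in (MESI.EXCLUSIVE, MESI.SHARED)


def only_cached_copy(state: MESI) -> bool:
    # Modified and exclusive granules reside in this cache alone;
    # a shared granule may also be held by other caches.
    return state in (MESI.MODIFIED, MESI.EXCLUSIVE)


def needs_write_back(state: MESI) -> bool:
    # Only a modified granule holds data that system memory lacks.
    return state is MESI.MODIFIED


if __name__ == "__main__":
    for s in MESI:
        print(f"{s.name:<9} bits={s.value:02b} valid={is_valid(s)} "
              f"consistent={consistent_with_memory(s)} only_copy={only_cached_copy(s)} "
              f"write_back={needs_write_back(s)}")
```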

To determine whether a "cache hit" or a "cache miss" occurred for an address requested by the processor, or whether a copy of the data at a snoopable address requested by another processor or bus agent is within the cache memory, there may be logic in the cache to search what is referred to as a "cache directory". The cache directory may be searched using a portion of the bits in the snoopable address or the address requested by the processor. As mentioned above, the cache directory stores the coherency state for each coherency granule in the cache memory. The cache directory also stores a unique tag used to indicate whether data from a particular address is stored in the cache memory. This unique tag may be compared with particular bits from the snoopable address or the address requested by the processor. If there is a match, then the data contained at the requested address lies within the cache memory. Hence, the cache directory may be searched to determine whether the data contained at the requested or snoopable address lies within the cache memory.
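A directory search of this kind can be sketched as follows: the congruence class bits of the address select a row, and the stored tag of every set in that row is compared with the tag bits of the address. The same lookup serves processor addresses and snoopable addresses. This is a minimal model under assumed parameters: the class `CacheDirectory`, its methods, and the single-letter state strings ("M", "E", "S", "I") are hypothetical, and hardware would perform the tag comparisons in parallel rather than in a loop.

```python
from typing import Optional


class CacheDirectory:
    """Hypothetical set-associative cache directory holding (tag, state) per entry."""

    def __init__(self, num_classes: int, num_sets: int, line_size: int) -> None:
        # Sizes are assumed to be powers of two in a real design; the arithmetic
        # below works for any positive values.
        self.num_classes = num_classes
        self.line_size = line_size
        # entries[congruence_class][set] is None or a (tag, coherency_state) pair.
        self.entries: list[list[Optional[tuple[int, str]]]] = [
            [None] * num_sets for _ in range(num_classes)
        ]

    def _split(self, addr: int) -> tuple[int, int]:
        line = addr // self.line_size  # drop the byte offset within the line
        return line // self.num_classes, line % self.num_classes  # (tag, congruence class)

    def install(self, addr: int, set_index: int, state: str) -> None:
        tag, cclass = self._split(addr)
        self.entries[cclass][set_index] = (tag, state)

    def lookup(self, addr: int) -> Optional[str]:
        """Return the coherency state on a hit, or None on a miss."""
        tag, cclass = self._split(addr)
        for entry in self.entries[cclass]:  # compare the tag of every set in the class
            if entry is not None and entry[0] == tag and entry[1] != "I":
                return entry[1]
        return None


if __name__ == "__main__":
    directory = CacheDirectory(num_classes=1024, num_sets=8, line_size=128)
    directory.install(0x12345678, set_index=3, state="M")
    print(directory.lookup(0x12345600))  # same line, hit: prints M
    print(directory.lookup(0x22345678))  # same class, different tag, miss: prints None
```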

An example of a processor associated with multiple levels of caches incorporating the above-mentioned concepts is described below in association with FIG. 1. FIG. 1 illustrates a processor 101 coupled to an L2 cache 102, which is coupled to an L3 cache 103. Processor 101, L2 cache 102 and L3 cache 103 may be implemented on an integrated circuit 104. L3 cache 103 may include a multiplexer 105 configured to receive requests from processor 101, such as the read or write requests described above, as well as snoop requests carrying snoopable addresses via an interconnect 106. Interconnect 106 is connected to a system bus (not shown), which is connected to other processors (not shown) or bus agents (not shown). An arbitration mechanism 107 (also referred to as arbiter 107) may determine which of the two requests (the request from interconnect 106 or the request from processor 101) gets serviced. The selected request is dispatched into a dispatch pipeline 108. If the snoop request is not selected, it may be sent on a bypass pipeline 113. Bypass pipeline 113 may be configured to indicate to interconnect 106 to retry resending the snoop request that was denied.

Dispatch pipeline 108 is coupled to a cache directory 109. Dispatch pipeline 108 may contain logic configured to determine whether the data at the requested address lies within a cache memory 114 of L3 cache 103 by comparing the tag values in cache directory 109 with the value stored in particular bits of the requested address. As mentioned above, if there is a match, then the data contained at the requested address lies within cache memory 114; otherwise, cache memory 114 does not store the data at the requested address. The result may be transmitted to a response pipeline 110, which is configured to transmit an indication as to whether the data at the requested address lies within cache memory 114. The result may be transmitted to either processor 101 or to another processor (not shown) or bus agent (not shown) via interconnect 106.

Referring to FIG. 1, response pipeline 110 and bypass pipeline 113 may be coupled to a multiplexer 115. Multiplexer 115 may be configured to forward either the result from response pipeline 110 or, from bypass pipeline 113, the request to retry resending the denied snoop request. Arbiter 107 makes this choice by sending particular bit values to the select input of multiplexer 115.

Referring again to FIG. 1, dispatch pipeline 108 may further be configured to dispatch requests from processor 101, together with the result of the directory search (e.g., a cache hit), to read/write machines 112A-N, where N is any number. Read/write machines 112A-N may collectively or individually be referred to as read/write machines 112 or read/write machine 112, respectively. Read/write machines 112 may be configured to execute these requests, e.g., a read request, for processor 101.

Dispatch pipeline 108 may further be configured to dispatch requests from interconnect 106, together with the result of the directory search, to snooping logic, referred to herein as "snoop machines" 111A-N, where N is any number. Snoop machines 111A-N may collectively or individually be referred to as snoop machines 111 or snoop machine 111, respectively. Snoop machines 111 may be configured to respond to the requests from other processors or bus agents. Snoop machines 111 may further be configured to write modified data in the cache memory of L3 cache 103 to the system memory (not shown) to maintain cache coherency.

Referring to FIG. 1, interconnect 106 may transfer a received snoop request to multiplexer 105 every cycle. The response to the snoop request may be transmitted at a given fixed number of cycles after interconnect 106 transmits the snoop request to L3 cache 103. For example, interconnect 106 may transmit the snoop request to multiplexer 105 on a given cycle followed by a determination by arbiter 107 as to whether the snoop request is selected to be dispatched to dispatch pipeline 108 or is to be transmitted on bypass pipeline 113 to response pipeline 110. If the snoop request is selected, it enters dispatch pipeline 108 and response pipeline 110 some cycle(s) later. A search in cache directory 109 is made some cycle(s) later by dispatch pipeline 108. The result as to whether data at the snoop address lies within cache memory 114 is transmitted to response pipeline 110. The response may be generated and transmitted to interconnect 106 some cycle(s) later by response pipeline 110. All these actions occur on a fixed schedule as illustrated in FIG. 2.

FIG. 2 is a timing diagram illustrating the actions described above occurring on a fixed schedule. Referring to FIG. 2, in conjunction with FIG. 1, interconnect 106 sends snoop requests A, B, C, and D to multiplexer 105 during the indicated clock cycles. Processor 101 (labeled "processor" in FIG. 2) sends requests M and N to multiplexer 105 during the indicated clock cycles. As illustrated in FIG. 2, snoop requests B and C are transmitted during the same cycles as requests M and N, respectively. In each cycle, arbiter 107 selects one request (either the snoop request or the request sent by processor 101) and dispatches it to dispatch pipeline 108 (labeled "dispatch pipeline" in FIG. 2). As illustrated in FIG. 2, arbiter 107 selects snoop request A, then selects requests M and N instead of snoop requests B and C, respectively, and then selects snoop request D. These selected requests are dispatched to dispatch pipeline 108 in the clock cycles indicated in FIG. 2.

FIG. 2 further illustrates the clock cycles in which the results for snoop requests A and D, indicating whether the data at the requested addresses was found within cache memory 114, are input to response pipeline 110. Snoop requests B and C are input to bypass pipeline 113 (labeled "bypass pipeline" in FIG. 2) at the illustrated clock cycles since they were not selected by arbiter 107. At the end of response pipeline 110 for snoop request A (corresponding to the time to respond to snoop request A, as labeled in FIG. 2), the result is transmitted to interconnect 106. At the end of bypass pipeline 113 for snoop request B (corresponding to the time to respond to snoop request B, as labeled in FIG. 2), the result (a request to retry resending snoop request B) is transmitted to interconnect 106 in the cycle following the transmission of the result for snoop request A, and so forth. As illustrated in FIG. 2, the time to respond to each snoop request follows a fixed schedule.
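The fixed schedule of FIG. 1 and FIG. 2 can be approximated with a simple behavioral model (not a cycle-accurate one). The cycle numbers and the fixed response latency of 10 cycles used below are hypothetical, since the text does not give FIG. 2's actual values, and the model assumes, as in the FIG. 2 example, that the arbiter grants the processor whenever a processor request and a snoop request arrive in the same cycle. The function name `baseline_schedule` is likewise illustrative.

```python
from typing import Optional


def baseline_schedule(snoops: dict[int, str], proc: dict[int, str], latency: int):
    """Return (dispatch order, snoop responses) for the FIG. 1 design.

    snoops / proc map an arrival cycle to a request name. Every snoop is answered
    exactly `latency` cycles after it arrives: with its directory result if it was
    dispatched, or with a retry request if it went down the bypass pipeline.
    """
    dispatched: list[str] = []
    responses: list[tuple[int, str, str]] = []
    for cycle in sorted(set(snoops) | set(proc)):
        snoop: Optional[str] = snoops.get(cycle)
        request: Optional[str] = proc.get(cycle)
        if request is not None:
            dispatched.append(request)  # assumed policy: processor wins a conflict
            if snoop is not None:
                responses.append((cycle + latency, snoop, "retry"))  # bypass pipeline 113
        elif snoop is not None:
            dispatched.append(snoop)
            responses.append((cycle + latency, snoop, "result"))  # response pipeline 110
    return dispatched, responses


if __name__ == "__main__":
    # Hypothetical cycles for FIG. 2: snoops A-D on cycles 0-3, processor requests M, N on 1-2.
    order, responses = baseline_schedule({0: "A", 1: "B", 2: "C", 3: "D"}, {1: "M", 2: "N"}, latency=10)
    print(order)      # ['A', 'M', 'N', 'D']
    print(responses)  # [(10, 'A', 'result'), (11, 'B', 'retry'), (12, 'C', 'retry'), (13, 'D', 'result')]
```

In this model every snoop is answered exactly on schedule, but B and C are answered only with a retry request.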

As stated above, if the snoop request is not selected by arbiter 107 (i.e., arbiter 107 selected the request from processor 101 instead), then the snoop request, e.g., snoop request B or C, is sent to bypass pipeline 113 some cycle(s) later. Some cycles after that, bypass pipeline 113 generates a response indicating that the snoop request should be resent and transmits it to interconnect 106 at a given cycle. Consequently, a snoop request from interconnect 106 may be denied and have to be retried, which may add hundreds of clock cycles of delay. Reducing the number of rejected snoop requests could therefore improve performance.

Therefore, there is a need in the art to improve the performance by reducing the number of snoop requests denied.

SUMMARY

The problems outlined above may at least in part be solved in some embodiments by extending the time to respond to snoop requests by "n" clock cycles. When the arbitration mechanism denies a snoop request, these "n" cycles provide additional time in which the snoop request may be resent and accepted by the arbitration mechanism. By giving a snoop request additional opportunities to be accepted by the arbitration mechanism, fewer snoop requests may ultimately be rejected, thereby improving performance.

In one embodiment of the present invention, a method for reducing the number of rejected snoop requests may comprise the step of receiving a new snoop request. The method may further comprise entering the new snoop request in a first available latch in a first unit if the first unit is not full. The method may further comprise sending the new snoop request to a second unit if the first unit is full, where the second unit is configured to transmit a request to retry resending the new snoop request. The method may further comprise implementing a hold operation upon receipt of a request from an arbitration mechanism to accept a request from a processor instead of a snoop request. Snoop requests have a higher priority than requests from the processor: the arbitration mechanism is configured to select snoop requests over requests from the processor unless the arbitration mechanism issues the request to accept the request from the processor instead of the snoop request.
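
The priority rule in this summary can be illustrated with a minimal sketch. The `accept_processor` flag stands in for the arbitration mechanism's request to accept a processor request instead of a snoop request, and `hold_snoop` marks the case in which the hold operation would apply. Both names are hypothetical, and the sketch does not model what the hold operation does internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArbiterDecision:
    selected: Optional[str]  # request granted this cycle, if any
    hold_snoop: bool         # True when a pending snoop request must be held


def arbitrate(snoop: Optional[str], proc: Optional[str], accept_processor: bool) -> ArbiterDecision:
    """Snoop requests win unless the arbiter requests that a processor request be accepted.

    accept_processor is a hypothetical stand-in for that arbiter request.
    """
    if snoop is not None and not (accept_processor and proc is not None):
        return ArbiterDecision(snoop, hold_snoop=False)
    if proc is not None:
        # The processor request is accepted; a pending snoop is held, not rejected.
        return ArbiterDecision(proc, hold_snoop=snoop is not None)
    return ArbiterDecision(None, hold_snoop=False)


print(arbitrate("S", "P", accept_processor=False))  # ArbiterDecision(selected='S', hold_snoop=False)
print(arbitrate("S", "P", accept_processor=True))   # ArbiterDecision(selected='P', hold_snoop=True)
```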

The foregoing has outlined rather generally the features and technical advantages of one or more embodiments of the present invention in order that the detailed description of the present invention that follows may be better understood. Additional features and advantages of the present invention will be described hereinafter which may form the subject of the claims of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

A better understanding of the present invention can be obtained when the following detailed description is considered in conjunction with the following drawings, in which:

FIG. 1 illustrates internal components of a cache, such as an L3 cache;

FIG. 2 is a timing diagram illustrating that the time to respond to a snoop request occurs on a fixed schedule;

FIG. 3 illustrates a multiprocessor system configured in accordance with an embodiment of the present invention;

FIG. 4 illustrates a cache, such as an L3 cache, incorporated with a mechanism to reduce the number of snoop requests that get rejected in accordance with an embodiment of the present invention;

FIG. 5 illustrates a realignment unit in accordance with an embodiment of the present invention;

FIG. 6A is a timing diagram illustrating the time to respond to a snoop request using the mechanism of FIG. 1;

FIG. 6B is a timing diagram illustrating the time to respond to a snoop request using the mechanism of FIG. 4 in accordance with an embodiment of the present invention;

FIG. 7 is an illustration of expanding the time to respond to a snoop request using the mechanism of FIG. 4 in accordance with an embodiment of the present invention;

FIGS. 8A-B are a flowchart of a method for reducing the number of snoop requests that get rejected in accordance with an embodiment of the present invention;

FIG. 9 illustrates the stall/reorder unit in accordance with an embodiment of the present invention;

FIG. 10 is a flowchart of a method detailing the operation of the embodiment of the stall/reorder unit described in FIG. 9 in accordance with an embodiment of the present invention;

FIG. 11 illustrates the stall/reorder unit in accordance with another embodiment of the present invention;

FIG. 12 is a timing diagram illustrating some conditions when a shift-down operation occurs in the pipeline of the stall/reorder unit in accordance with an embodiment of the present invention;

FIG. 13 is an additional timing diagram illustrating other conditions when a shift-down operation occurs in the pipeline of the stall/reorder unit in accordance with an embodiment of the present invention;

FIGS. 14A-D are a flowchart of a method detailing the operation of the embodiment of the stall/reorder unit described in FIG. 11 in accordance with an embodiment of the present invention; and

FIG. 15 is a flowchart of a method for issuing a high priority request by the control unit in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced without such specific details. In other instances, well-known circuits have been shown in block diagram form in order not to obscure the present invention in unnecessary detail. For the most part, details concerning timing considerations and the like have been omitted inasmuch as such details are not necessary to obtain a complete understanding of the present invention and are within the skills of persons of ordinary skill in the relevant art.

FIG. 3--Multiprocessor System

FIG. 3 illustrates an embodiment of the present invention of a multiprocessor system 300. System 300 may include one or more processors 301A-B. Processors 301A-B may collectively or individually be referred to as processors 301 or processor 301, respectively. Processors 301A-B may each include a level one (L1) cache 302A-B, e.g., L1 instruction/data cache, respectively. L1 caches 302A-B may be configured to store instruction and data values that may be repeatedly accessed by processors 301A-B, respectively. L1 caches 302A-B may collectively or individually be referred to as L1 caches 302 or L1 cache 302, respectively. It is noted that those skilled in the art will recognize that multiple L1 caches, e.g., L1 instruction cache, L1 data cache, may be implemented instead of a unified L1 cache.

In order to minimize data access latency, one or more additional levels of cache coupled to processors 301 may be implemented, such as a level two (L2) cache 303A-B coupled to processors 301A-B, respectively. L2 caches 303A-B may collectively or individually be referred to as L2 caches 303 or L2 cache 303, respectively. Furthermore, FIG. 3 illustrates a level three (L3) cache 304 coupled to L2 cache 303A. The lower cache levels may be employed to stage data to an L1 cache 302 and typically have progressively larger storage capacities but longer access latencies. It is noted that processors 301 may each be coupled to any number of additional levels of cache. It is further noted that in one embodiment, each processor 301A-B and its associated lower cache levels may reside on a single integrated circuit 305A-B, respectively.

Referring to FIG. 3, each processor 301 may be coupled to a bus 306. Bus 306 may permit the transfer of information, e.g., addresses and data, between processors 301 and a system memory 307. It is noted that system 300 may include any number of processors 301 coupled to system memory 307 via bus 306. It is further noted that FIG. 3 is not to be limited in scope to any particular embodiment and that FIG. 3 is illustrative.

Referring to FIG. 3, processor 301 may generate a transfer request to be received by bus 306. A "transfer request" may refer to either a request to read an address not within its associated cache memory(ies) or a request to write to an address not exclusively owned by its associated cache memory(ies).

Bus 306 may contain logic configured to determine if the received transfer request is snoopable ("snoopable transfer request"). That is, bus 306 may contain logic configured to determine if the received transfer request is to be broadcast to the other snoop controllers (not shown), i.e., those not associated with the processor 301 that generated the transfer request. The other snoop controllers (not shown) may be configured to determine if a copy of the requested snoopable address, i.e., a copy of the requested coherency granule(s), is within their associated cache memories. The broadcast transfer request may commonly be referred to as a "snoop request."
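
As a rough illustration of the broadcast, the following Python sketch delivers a snoop request to every snoop controller except the requester's and reports which caches hold a copy of the requested coherency granule. The 128-byte granule size and the set-of-granules representation of each cache are assumptions made for the example, and the sketch does not model how bus 306 decides that a transfer request is snoopable.

```python
from typing import Dict, List, Set

GRANULE_BYTES = 128  # hypothetical coherency-granule size


def granule(address: int) -> int:
    """Index of the coherency granule containing an address."""
    return address // GRANULE_BYTES


def broadcast_snoop(address: int, requester: str, caches: Dict[str, Set[int]]) -> List[str]:
    """Deliver a snoop request to every controller except the requester's.

    caches: processor name -> set of granule indices held in its cache memories.
    Returns the processors whose caches hold a copy of the requested granule.
    """
    target = granule(address)
    return sorted(p for p, held in caches.items() if p != requester and target in held)


caches = {"301A": {granule(0x1000)}, "301B": {granule(0x1040), granule(0x2000)}}
print(broadcast_snoop(0x1000, "301A", caches))  # ['301B']
print(broadcast_snoop(0x3000, "301A", caches))  # []
```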

As stated in the Background Information section, a snoop request from an interconnect of a cache may have to be denied and requested to be retried again which may result in hundreds of additional clock cycles of delay. If the number of rejected snoop requests could be reduced, then the performance could be improved. Therefore, there is a need in the art to improve the performance by reducing the number of snoop requests denied.

A mechanism for reducing the number of snoop requests that get rejected is described below in association with FIGS. 4, 5, 6A-B, 7 and 8A-B. FIG. 4 illustrates the mechanism for reducing the number of snoop requests that get rejected in L3 cache 304 in integrated circuit 305A. It is noted that the mechanism may be implemented in any cache and that the principles of the present invention described in FIG. 4 may be applied to any cache. FIG. 5 illustrates an embodiment of a realignment unit and how the realignment unit calculates the number of clock cycles for which to store the result of the snoop request or the request to retry resending the snoop request. FIG. 6A illustrates the time to respond to a snoop request using the mechanism of FIG. 1. FIG. 6B illustrates extending the time to respond to the snoop request, thereby reducing the number of snoop requests that get rejected. FIG. 7 is a timing diagram illustrating the extension of the time to respond to the snoop request. FIGS. 8A-B are a flowchart of a method for reducing the number of snoop requests that get rejected using the mechanism described in FIG. 4.

FIG. 4--L3 Cache Incorporated with Mechanism to Reduce the Number of Snoop Requests that get Rejected

FIG. 4 illustrates an embodiment of the present invention of an L3 cache 304 (FIG. 3) that includes a mechanism for reducing the number of snoop requests that get rejected.

Referring to FIG. 4, L3 cache 304 includes a multiplexer 401 configured to receive a request from processor 301A as well as a snoop request received via an interconnect 402. The snoop request is received by multiplexer 401 via a unit, referred to herein as the "stall/reorder unit" 403. Stall/reorder unit 403 may be configured to store information, e.g., address, of the received snoop request for up to a maximum number of n clock cycles. By being able to store information about the snoop request for up to a maximum number of n clock cycles, the time to respond to a snoop request is expanded. By expanding the time to respond to a snoop request, there will be fewer snoop requests that get rejected as explained in further detail below.

Stall/reorder unit 403 may include a series of queues 404A-N, where N is any number. Queues 404A-N may collectively or individually be referred to as queues 404 or queue 404, respectively. Queues 404 may be configured to store information, e.g., address, about the snoop requests. Stall/reorder unit 403 may further include a series of latches 405A-N, where N is any number, each storing a count of the number of clock cycles that information about a particular snoop request has resided in stall/reorder unit 403. Latches 405A-N may collectively or individually be referred to as latches 405 or latch 405, respectively. Each latch 405, e.g., latch 405A, may store a count of the number of clock cycles that information about a snoop request has resided in an associated queue 404, e.g., queue 404A. Stall/reorder unit 403 may further include a control unit 406, which will be described in more detail below.

Upon receiving a snoop request from interconnect 402, stall/reorder unit 403 forwards the snoop request to multiplexer 401. Arbiter 407 determines which of the two requests (requests from interconnect 402 and from processor 301A) gets serviced. The selection performed by arbiter 407 is communicated to control unit 406.

If arbiter 407 denies the snoop request, control unit 406 may be configured to maintain the information stored in queue 404 for that denied snoop request. Further, control unit 406 may be configured to increment the counter in the associated latch 405, thereby indicating that the information about the snoop request will continue to reside in queue 404. Control unit 406 may be further configured to determine if any of the counters has counted "n" cycles, indicating that the information about a snoop request has resided in the associated queue 404 for "n" clock cycles. As stated above, stall/reorder unit 403 may be configured to store the information about a snoop request for up to a maximum of n clock cycles. When a latch 405 indicates that its counter has counted "n" clock cycles, stall/reorder unit 403 may transmit the snoop request to a unit, referred to herein as the "realignment unit" 413, via bypass line 417. Along with the snoop request, stall/reorder unit 403 may transmit the counter bit(s) indicated by the associated latch 405 to realignment unit 413 via bypass line 417.
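
The per-cycle bookkeeping of control unit 406 can be sketched as follows. In this simplified Python model, assumed for illustration, every snoop request still held in queues 404 at the end of a cycle has its latch 405 count incremented, an accepted snoop request leaves the unit together with its count, and a snoop request whose count reaches n is handed off (over bypass line 417) so that realignment unit 413 can issue a retry response. The function name and the dictionary representation are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def control_step(
    counts: Dict[str, int], accepted: Optional[str], n: int
) -> Tuple[Optional[Tuple[str, int]], List[Tuple[str, int]]]:
    """One clock cycle of a simplified control unit 406.

    counts: latch 405 count for every snoop request held in queues 404 (mutated).
    accepted: the snoop request arbiter 407 selected this cycle, or None.
    n: maximum number of cycles a snoop request may reside in the unit.
    Returns (dispatched, expired): the accepted request with its count, and the
    requests sent over bypass line 417 because they resided n cycles.
    """
    dispatched = None
    if accepted is not None:
        # The count travels with the accepted snoop request toward realignment unit 413.
        dispatched = (accepted, counts.pop(accepted))
    expired = []
    for name in list(counts):
        counts[name] += 1  # denied or not yet ready: held for one more cycle
        if counts[name] >= n:
            expired.append((name, counts.pop(name)))
    return dispatched, expired


counts = {"B": 0, "C": 0}
print(control_step(counts, None, 2))  # (None, [])
print(control_step(counts, "B", 2))   # (('B', 1), [('C', 2)])
```

Here snoop request B is accepted after one cycle of stall and carries a count of 1, while C reaches n = 2 cycles without being accepted and is handed off for a retry response.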

Upon receiving a snoop request and associated counter bit(s) that indicate that the information about the snoop request resided in stall/reorder unit 403 for n clock cycles, realignment unit 413 transmits a response to interconnect 402 indicating to retry resending the snoop request.

Stall/reorder unit 403 may further be configured to transmit a snoop request received from interconnect 402 to realignment unit 413 over bypass line 416 if queues 404 are full and are not able to store information about the received snoop request. Along with the transmitted snoop request, an indication that zero clock cycles were counted by a counter(s) may be transmitted to realignment unit 413 over bypass line 416.
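
A minimal data-structure sketch of stall/reorder unit 403 follows, assuming one `Entry` per occupied queue 404 with its latch 405 count stored alongside. When all queues are occupied, `receive` hands the new snoop request back with a count of zero, corresponding to the transfer over bypass line 416. The class and field names and the `capacity` parameter are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    address: int
    count: int = 0  # latch 405: clock cycles spent in stall/reorder unit 403


@dataclass
class StallReorderUnit:
    capacity: int  # number of queues 404 (hypothetical parameter)
    entries: List[Entry] = field(default_factory=list)

    def receive(self, address: int) -> Optional[Entry]:
        """Store a new snoop request in the first free queue.

        Returns None when the request was stored; otherwise returns an entry
        with a count of zero, to be sent over bypass line 416 so that the
        realignment unit issues a retry response.
        """
        if len(self.entries) >= self.capacity:
            return Entry(address, 0)
        self.entries.append(Entry(address))
        return None


unit = StallReorderUnit(capacity=2)
print(unit.receive(0xA0), unit.receive(0xB0))  # None None
print(unit.receive(0xC0))                       # Entry(address=192, count=0)
```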

Upon receiving a snoop request and associated counter bit(s) indicating that the information about the snoop request did not reside in stall/reorder unit 403 for any clock cycles, realignment unit 413 transmits a response to interconnect 402 indicating to retry resending the snoop request after n clock cycles have elapsed, as described in further detail below.

As stated above, when arbiter 407 denies a snoop request, control unit 406 maintains the information stored in queue 404 for that snoop request. Further, control unit 406 may increment the counter in the associated latch 405, thereby indicating that the snoop request will have resided in queue 404 for an additional period of time. After a snoop request is denied by arbiter 407, stall/reorder unit 403 may be configured to resend it to multiplexer 401 at a later point in time.

If, on the other hand, arbiter 407 selects the snoop request, stall/reorder unit 403 may be configured to transmit to multiplexer 401, along with the accepted snoop request, the counter bit(s) stored in the associated latch 405, which indicate the number of clock cycles, if any, that the information about the accepted snoop request had resided in stall/reorder unit 403. Upon being accepted by arbiter 407, the selected snoop request may be sent to dispatch pipeline 408. Dispatch pipeline 408 is coupled to a cache directory 409. Dispatch pipeline 408 may contain logic configured to determine if the data at the address of the snoop request lies within a cache memory 410 of L3 cache 304. Dispatch pipeline 408 may make this determination by comparing the tag values in cache directory 409 with the value stored in particular bits of the requested address. If there is a match, then the data at the requested address lies within cache memory 410; otherwise, cache memory 410 does not store the data at the requested address. Dispatch pipeline 408 may transmit the result to response pipeline 411, which is configured to transmit an indication as to whether the data at the requested address lies within cache memory 410. The result is transmitted to realignment unit 413.
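
The directory comparison performed by dispatch pipeline 408 follows the usual tag-match pattern of a set-associative cache. The sketch below assumes 128-byte lines and 1024 sets (both hypothetical, since the text does not give a cache geometry), splits the address into offset, index, and tag bits, and reports a hit when the tag is present in the indexed set of the directory.

```python
from typing import List, Tuple

LINE_BYTES = 128  # hypothetical line (coherency granule) size
NUM_SETS = 1024   # hypothetical number of sets in cache directory 409

OFFSET_BITS = LINE_BYTES.bit_length() - 1  # 7
INDEX_BITS = NUM_SETS.bit_length() - 1     # 10


def split_address(addr: int) -> Tuple[int, int]:
    """Return (tag, set index) for an address."""
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index


def directory_lookup(directory: List[List[int]], addr: int) -> bool:
    """Hit if the address tag matches a tag stored in the indexed set."""
    tag, index = split_address(addr)
    return tag in directory[index]


# Directory with one list of tags (one per way) per set, holding a single line.
directory: List[List[int]] = [[] for _ in range(NUM_SETS)]
tag, index = split_address(0x12345680)
directory[index].append(tag)

print(directory_lookup(directory, 0x12345690))  # True: same 128-byte line
print(directory_lookup(directory, 0x12345680 + NUM_SETS * LINE_BYTES))  # False: same set, different tag
```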

Dispatch pipeline 408 may further be configured to dispatch requests from processor 301A, together with the result, e.g., a cache hit, to read/write machines 414A-N, where N is any number. Read/write machines 414A-N may collectively or individually be referred to as read/write machines 414 or read/write machine 414, respectively. Read/write machines 414 may be configured to execute these requests, e.g., read requests, for processor 301A.

Dispatch pipeline 408 may further be configured to dispatch requests from interconnect 402, together with the result, to snooping logic, referred to herein as "snoop machines" 415A-N, where N is any number. Snoop machines 415A-N may collectively or individually be referred to as snoop machines 415 or snoop machine 415, respectively. Snoop machines 415 may be configured to respond to requests from other processors or bus agents. Snoop machines 415 may further be configured to write modified data in the cache memory of L3 cache 304 to system memory 307 (FIG. 3) to maintain cache coherency.

As stated above, realignment unit 413 receives the counter bit(s) associated with the accepted snoop request, which indicate the number of clock cycles, if any, that the information about the accepted snoop request had resided in stall/reorder unit 403. If the counter bit(s) indicate fewer than n clock cycles, then realignment unit 413 stores the result for the snoop request in a queue 412 for n clock cycles minus the number of clock cycles indicated by the counter bit(s). After waiting for n clock cycles minus the number of clock cycles indicated by the counter bit(s), realignment unit 413 transmits the result to interconnect 402. If the counter bit(s) indicate n clock cycles, then realignment unit 413 transmits the result to interconnect 402 without storing the result in queue 412. By storing a snoop request denied by arbiter 407 for up to n cycles, and storing the result of an accepted snoop request for n cycles minus the number of clock cycles that the information about the snoop request was stored in stall/reorder unit 403, the time to respond is extended by n clock cycles, thereby providing additional time for a snoop request to be accepted instead of being rejected. That is, by increasing the number of snoop requests that get serviced by the cache directory, the number of snoop requests that get rejected is reduced. An illustration of extending the time to respond to a snoop request is provided in FIGS. 6A-B. An embodiment of realignment unit 413 illustrating how realignment unit 413 calculates the number of clock cycles to store the result for a snoop request, or the request to retry resending the snoop request, in queue 412 is provided below in association with FIG. 5.
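
The realignment rule reduces to a single subtraction. In this sketch (function name hypothetical), `count` is the number of cycles the snoop request resided in stall/reorder unit 403, `n` is the extension of the time to respond, and the return value is the number of cycles the response is held in queue 412 before it is transmitted to interconnect 402.

```python
def realign_delay(count: int, n: int) -> int:
    """Cycles a response waits in queue 412 of realignment unit 413.

    count: cycles the snoop request resided in stall/reorder unit 403 (0..n).
    n: number of cycles by which the time to respond is extended.
    """
    if not 0 <= count <= n:
        raise ValueError("count must lie between 0 and n")
    # The stall and realign periods always add up to n.
    return n - count


N = 4  # hypothetical extension
print(realign_delay(0, N))  # 4: accepted at once, or retried via bypass line 416
print(realign_delay(2, N))  # 2: stalled two cycles before acceptance
print(realign_delay(N, N))  # 0: stalled n cycles, retried via bypass line 417
```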

Referring to FIG. 5, realignment unit 413 may include latches 501A-D. Latches 501A-D may collectively or individually be referred to as latches 501 or latch 501, respectively. It is noted that realignment unit 413 may include any number of latches 501 and that FIG. 5 is illustrative. Realignment unit 413 may further include multiplexers 502A-C coupled to latches 501A-C, respectively. Multiplexers 502A-C may collectively or individually be referred to as multiplexers 502 or multiplexer 502, respectively. It is noted that realignment unit 413 may include any number of multiplexers 502. The number of multiplexers 502 corresponds to "n" clock cycles as defined above. The number of latches 501 corresponds to one more than the number of multiplexers 502. Latches 501 and multiplexers 502 may form queue 412 of FIG. 4. Realignment unit 413 may further include a control module 503 coupled to the selector input of multiplexers 502A-C.

Referring to FIG. 5, in conjunction with FIG. 4, multiplexer 502A receives as input the output of latch 501A and the response (the result or request to retry resending the snoop request) to the snoop request. Similarly, multiplexer 502B receives as input the output of latch 501B and the response to the snoop request and multiplexer 502C receives as input the output of latch 501C and the response to the snoop request. Latch 501A receives as input the response to the snoop request and latch 501D receives the result that is to be transmitted to interconnect 402.

Control module 503 receives the count value transmitted from stall/reorder unit 403 as discussed above. Based on this value, control module 503 will select a particular multiplexer 502 to output the response (the result) to the snoop request. If there are any succeeding multiplexers 502, then those multiplexers 502 will output the result stored in the previous latch 501 in the following clock cycle. For example, suppose the count value received by control module 503 indicated that queue 404 had stored the information, e.g., address, for that snoop request for zero clock cycles. Control module 503 may then ensure that the result is stored in queue 412 (represented by latches 501A-D) for "n" clock cycles, which in the example of FIG. 5 is three clock cycles. Consequently, control module 503 inputs a value to the selector of multiplexer 502A indicating to output the response (the result) to the snoop request. The response is then stored in latch 501B for a clock cycle. Control module 503 then inputs a value to the selector of multiplexer 502B indicating to output the response stored in latch 501B. That output is stored in latch 501C for a clock cycle. Control module 503 then inputs a value to the selector of multiplexer 502C indicating to output the response stored in latch 501C. That output is stored in latch 501D for a clock cycle, after which realignment unit 413 transmits the result to interconnect 402.
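
The multiplexer and latch chain can be simulated cycle by cycle. The Python model below is a simplification: it represents only the n latches fed by multiplexers 502 (latches 501B-D when n is three), omits latch 501A, and assumes that control module 503 injects the new response at the stage given by the count value, after which each later stage passes along the previous latch's output. The function name is hypothetical.

```python
def simulate_realignment(n: int, count: int) -> int:
    """Cycles a response spends in the latch chain before reaching the interconnect.

    n: number of multiplexers 502 (the extension of the time to respond).
    count: stall-period count received from stall/reorder unit 403 (0..n).
    """
    if not 0 <= count <= n:
        raise ValueError("count must be between 0 and n")
    if count == n:
        return 0  # no realign period remains: the result is sent without queuing
    # stages[k] models the latch fed by multiplexer 502[k] (latch 501B is stages[0]).
    stages = [None] * n
    stages[count] = "response"  # control module 503 selects the new response here
    cycles = 0
    while True:
        cycles += 1  # the response is held in a latch for one clock cycle
        if stages[-1] is not None:
            return cycles  # the last latch (501D when n is 3) drives the interconnect
        # Every later multiplexer selects the output of the previous latch.
        stages = [None] + stages[:-1]


print(simulate_realignment(3, 0))                      # 3, as in the FIG. 5 example
print([simulate_realignment(3, c) for c in range(4)])  # [3, 2, 1, 0]
```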

FIG. 6A illustrates the time to respond (indicated by "TTR" in FIG. 6A) using the mechanism of FIG. 1. As illustrated in FIG. 6A, the length of the response pipeline, such as response pipeline 110 of FIG. 1, equals the TTR. Using the mechanism of FIG. 4, however, the TTR is expanded, as illustrated in FIG. 6B.

FIG. 6B illustrates the time to respond (indicated by "TTR" in FIG. 6B) using the mechanism of FIG. 4. Referring to FIG. 6B, in conjunction with FIG. 4, the TTR includes the length of the response pipeline, such as response pipeline 411, plus the time durations labeled "stall" and "realign". The time duration of "stall" refers to the number of clock cycles, if any, that a snoop request resides in stall/reorder unit 403. That is, the time duration of "stall" refers to the number of clock cycles that the information, e.g., address, of a snoop request resides in queue 404 of stall/reorder unit 403. The time duration of "realign" refers to the number of clock cycles, if any, that the result to a snoop request resides in realignment unit 413. That is, the time duration of "realign" refers to the number of clock cycles that the result to a snoop request resides in queue 412 of realignment unit 413. It is noted that either the time duration of "stall" or "realign" may be a length of zero clock cycles. However, the total number of clock cycles of the "stall" and "realign" time periods equals "n" clock cycles.
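
The relationship described in this paragraph can be written compactly. Using notation introduced here for illustration, let P be the length of the response pipeline, s the stall period, and r the realign period, all in clock cycles:

```latex
\begin{aligned}
\mathrm{TTR} &= P + s + r, \qquad s + r = n, \qquad 0 \le s \le n,\\
\mathrm{TTR} &= P + n .
\end{aligned}
```

Because s and r always sum to n, every accepted snoop request has the same time to respond, P + n, regardless of how long it stalled.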

Another illustration of expanding the TTR using the mechanism of FIG. 4 is provided in FIG. 7. FIG. 7 is a timing diagram illustrating the expansion of the time to respond to a snoop request using the mechanism of FIG. 4 in accordance with an embodiment of the present invention. Referring to FIG. 7, in conjunction with FIG. 4, interconnect 402 sends snoop requests A, B, C and D to stall/reorder unit 403 during the indicated clock cycles. Processor 301A (labeled "processor" in FIG. 7) sends requests M and N to multiplexer 401 during the indicated clock cycles. As illustrated in FIG. 7, snoop requests B and C are transmitted during the same clock cycles as requests M and N. As further illustrated in FIG. 7, requests A, M and N are initially selected by arbiter 407, and requests B, C and D are initially denied by arbiter 407. The selected requests are dispatched by arbiter 407 to dispatch pipeline 408 (labeled "dispatch pipeline" in FIG. 7) in the indicated clock cycles. The responses to these requests are later input to response pipeline 411 (labeled "response pipeline" in FIG. 7). For ease of understanding, FIG. 7 includes a count value following the label of each snoop request, e.g., A, B, C, D, indicating the number of clock cycles until the scheduled time to transmit the response to interconnect 402. In the example of FIG. 7, the time to respond to a snoop request is ten clock cycles; hence, each snoop request carries a count value (ranging from zero to nine) indicating the number of clock cycles until the scheduled time to transmit the response to interconnect 402. It is noted that the time to respond may be any number of clock cycles and that FIG. 7 is illustrative.

As stated above, snoop request A was immediately accepted by arbiter 407. The response to snoop request A (the result as to whether the data at the address of snoop request A lies within cache memory 410) is later input to response pipeline 411 in the clock cycle indicated in FIG. 7. Since snoop request A was never denied by arbiter 407, its stall period, as discussed above, was zero clock cycles. Consequently, the result of snoop request A is stored in queue 412 of realignment unit 413 for the full number of clock cycles by which the time to respond was extended, as described above. In the example illustrated in FIG. 7, the length of response pipeline 411 is six clock cycles and the time to respond is ten clock cycles, so the result of snoop request A is stored in queue 412 of realignment unit 413 for four clock cycles. These four clock cycles occur during the realign period (labeled "realign pipeline" in FIG. 7). At the end of the time to respond, the result is transmitted to interconnect 402 in the indicated clock cycle.

Similarly, as illustrated in FIG. 7, processor requests M and N were immediately accepted by arbiter 407. These requests were dispatched to dispatch pipeline 408 in the indicated clock cycles, and the responses to requests M and N were later input to response pipeline 411 (labeled "response pipeline" in FIG. 7). At the end of the response pipeline, these results are transmitted to processor 301A.

As stated above, when arbiter 407 denies a snoop request, control unit 406 maintains the information stored in queue 404 for that snoop request for another clock cycle, and the snoop request is retried in the next clock cycle. This process is repeated until arbiter 407 selects the snoop request or until the counter(s) have counted "n" clock cycles of queue 404 storing the information, e.g., address, for that snoop request. As illustrated in FIG. 7, the information for snoop request B was stored in queue 404 for two clock cycles, corresponding to the two times that snoop request B was denied by arbiter 407 (indicated by retrying snoop request B for two clock cycles). These two clock cycles occur during the stall period (labeled "stall pipeline" in FIG. 7). Likewise, snoop request C was stored in queue 404 for two clock cycles during the stall period, and snoop request D was denied for two clock cycles and hence was stored in queue 404 for two clock cycles during the stall period, as illustrated in FIG. 7.

Once the previously denied snoop requests B, C and D are accepted by arbiter 407, they are dispatched to dispatch pipeline 408 in the clock cycles indicated in FIG. 7. The responses to snoop requests B, C and D (the results as to whether the data at their addresses lies within cache memory 410) are later input to response pipeline 411. At the end of the response pipeline, the results of snoop requests B, C and D are input to realignment unit 413 and stored in queue 412 for n cycles minus the number of clock cycles indicated by the counter bit(s) received from stall/reorder unit 403. That is, the results of snoop requests B, C and D are stored in queue 412 for the length of the realign period, which FIG. 7 shows to be two clock cycles for each of these requests. At the end of the realign period (which, in general, may be zero clock cycles), realignment unit 413 transmits the result to interconnect 402, as illustrated in FIG. 7.

In the example for snoop request A, as illustrated in FIG. 7, the realign period is four clock cycles and the stall period is zero clock cycles and hence "n" clock cycles (total number of clock cycles in addition to the response pipeline to formulate the total time to respond) corresponds to four clock cycles. Hence, the total time to respond to snoop request A is the length of the response pipeline plus four clock cycles (realign period plus the stall period) thereby extending the time to respond to a snoop request by four clock cycles over the mechanism of FIG. 1.

Similarly, in the example for snoop requests B, C and D, as illustrated in FIG. 7, the realign period is two clock cycles and the stall period is two clock cycles and hence "n" clock cycles (total number of clock cycles in addition to the response pipeline to formulate the total time to respond) corresponds to four clock cycles. Hence, the total time to respond to snoop requests B, C and D is the length of the response pipeline plus four clock cycles (realign period plus the stall period) thereby extending the time to respond to a snoop request by four clock cycles over the mechanism of FIG. 1.
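
The FIG. 7 arithmetic can be checked with a few lines of Python, using the values stated in the text: a ten-cycle time to respond, a six-cycle response pipeline, and stall periods of zero cycles for snoop request A and two cycles for snoop requests B, C, and D.

```python
TTR = 10               # time to respond in FIG. 7, in clock cycles
RESPONSE_PIPELINE = 6  # length of response pipeline 411 in FIG. 7
N = TTR - RESPONSE_PIPELINE  # cycles added by the mechanism of FIG. 4


def realign_period(stall: int) -> int:
    """Cycles the result waits in queue 412 after a given stall period."""
    return N - stall


stalls = {"A": 0, "B": 2, "C": 2, "D": 2}
for name, stall in stalls.items():
    realign = realign_period(stall)
    print(f"{name}: stall={stall} realign={realign} total={RESPONSE_PIPELINE + stall + realign}")
# A: stall=0 realign=4 total=10
# B: stall=2 realign=2 total=10
# C: stall=2 realign=2 total=10
# D: stall=2 realign=2 total=10
```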

By extending the time to respond to a snoop request, there is additional time for a snoop request to be accepted instead of being rejected. That is, by increasing the number of snoop requests that get serviced by the cache directory, the number of snoop requests that get rejected is reduced.

A method for reducing the number of snoop requests that get rejected using the mechanism of FIG. 4 is described below in association with FIGS. 8A-B.

FIGS. 8A-B--Method for Reducing the Number of Snoop Requests that get Rejected

FIGS. 8A-B are a flowchart of one embodiment of the present invention of a method 800 for reducing the number of snoop requests that get rejected.

Referring to FIG. 8A, in conjunction with FIGS. 3-5, 6A-B and 7, in step 801, stall/reorder unit 403 receives a snoop request from interconnect 402. In step 802, stall/reorder unit 403 determines if queues 404 are full. If queues 404 are full, then, in step 803, stall/reorder unit 403 sends the snoop request along with its counter value(s) to realignment unit 413 via bypass line 416.

If, however, queues 404 are not full, then, in step 804, the snoop request enters stall/reorder unit 403. In step 805, stall/reorder unit 403 determines if the snoop request is ready to be dispatched to multiplexer 401.

If the snoop request is ready to be dispatched to multiplexer 401, then, in step 806, stall/reorder unit 403 attempts to dispatch the snoop request to multiplexer 401. In step 807, stall/reorder unit 403 determines if the dispatch of the snoop request was successful. If the dispatch was successful, then, in step 808, stall/reorder unit 403 removes the information, e.g., address, about the snoop request from queue 404.

If, however, the dispatch was not successful or if the snoop request was not ready to be dispatched, then, in step 809, stall/reorder unit 403 determines if the information, e.g., address, about the snoop request has been stored in queues 404 for "n" clock cycles.

If the information about the snoop request has not been stored in queues 404 for n clock cycles, then stall/reorder unit 403 determines if the snoop request is ready to be dispatched to multiplexer 401 in step 805.

If, however, the information about the snoop request has been stored in queues 404 for n clock cycles, then, in step 810, stall/reorder unit 403 transmits the snoop request along with its counter value(s) to realignment unit 413 via bypass line 417. In step 811, stall/reorder unit 403 removes the information, e.g., address, about the snoop request from queues 404.
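The control flow of FIG. 8A can be sketched as a small cycle-level model. This is a hypothetical illustration, not the patented circuit: the class and field names, the queue `capacity`, and the `is_ready` and `try_dispatch` callbacks (standing in for steps 805 and 806-807) are all assumptions, and every queued request is assumed to be re-evaluated once per clock cycle.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SnoopRequest:
    address: int
    age: int = 0  # hypothetical counter value: cycles spent in queues 404


class StallReorderUnit:
    """Hypothetical cycle-level model of stall/reorder unit 403 (steps 801-811)."""

    def __init__(self, capacity: int, n: int,
                 is_ready: Callable[[SnoopRequest], bool],
                 try_dispatch: Callable[[SnoopRequest], bool]) -> None:
        self.capacity = capacity          # assumed size of queues 404
        self.n = n                        # timeout in clock cycles
        self.is_ready = is_ready          # step 805 (assumed callback)
        self.try_dispatch = try_dispatch  # steps 806-807: True if the dispatch succeeded
        self.queue: List[SnoopRequest] = []
        self.bypass_416: List[SnoopRequest] = []  # sent because queues 404 were full
        self.bypass_417: List[SnoopRequest] = []  # sent after waiting n cycles

    def receive(self, req: SnoopRequest) -> None:
        """Steps 801-804: accept a snoop request from interconnect 402."""
        if len(self.queue) >= self.capacity:
            self.bypass_416.append(req)   # step 803
        else:
            self.queue.append(req)        # step 804

    def tick(self) -> List[SnoopRequest]:
        """Advance one clock cycle; return the requests dispatched in it."""
        dispatched: List[SnoopRequest] = []
        remaining: List[SnoopRequest] = []
        for req in self.queue:
            # `and` short-circuits, so try_dispatch is only called for ready requests
            if self.is_ready(req) and self.try_dispatch(req):
                dispatched.append(req)        # step 808: drop it from queues 404
                continue
            req.age += 1                      # step 809: one more cycle spent waiting
            if req.age >= self.n:
                self.bypass_417.append(req)   # steps 810-811
            else:
                remaining.append(req)         # back to step 805 next cycle
        self.queue = remaining
        return dispatched
```

With `capacity=2` and a dispatch that always fails, a third arriving request lands on `bypass_416` immediately, while the two queued ones move to `bypass_417` on the n-th tick; in the described design both bypass paths then feed realignment unit 413 together with their counter values.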

FIG. 8B describes the operations performed by realignment unit 413. Referring to FIG. 8B, in conjunction with FIGS. 3-5, 6A-B and 7, upon the snoop request being successfully dispatched to multiplexer 401 and accepted by arbiter 407, realignment unit 413 receives the result of the snoop request, along with its counter value(s), from response pipeline 411 in step 811. Further, in step 811, realignment unit 413 receives any snoop request sent by stall/reorder unit 403 via bypass line 416 or 417, along with its counter value(s).

In step 812, realignment unit 413 examines the counter value(s) received with the snoop request. In step 813, realignment unit 413 stores the result, or the request to retry resending the snoop request, in the appropriate latch 501. In step 814, realignment unit 413 delays the result or request to retry by the appropriate number of clock cycles ("the realignment period"). In step 815, realignment unit 413 issues the result or request to retry to interconnect 402.
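One way to picture steps 812-815 is as a delay line indexed by the counter value. The sketch below is hypothetical: it assumes every response must reach interconnect 402 exactly `window` cycles after its snoop request arrived, that the counter value carried with each result is the number of cycles already elapsed, and that the list `slots` stands in for latches 501.

```python
from typing import List, Optional, Tuple

Response = Tuple[int, str]  # (snoop address, result or "retry")


class RealignmentUnit:
    """Hypothetical model of realignment unit 413 (steps 811-815).

    Assumes every response must be issued to interconnect 402 exactly
    `window` cycles after its snoop request arrived, and that the counter
    value received with each result counts the cycles already elapsed.
    """

    def __init__(self, window: int) -> None:
        self.window = window
        # slots[i] plays the role of a latch 501 whose contents are due in i cycles
        self.slots: List[Optional[Response]] = [None] * (window + 1)

    def accept(self, address: int, response: str, elapsed: int) -> None:
        """Steps 812-813: choose the latch from the counter value."""
        delay = self.window - elapsed     # remaining realignment period
        if not 0 <= delay <= self.window:
            raise ValueError("counter value outside the response window")
        if self.slots[delay] is not None:
            raise RuntimeError("latch already occupied")
        self.slots[delay] = (address, response)

    def tick(self) -> Optional[Response]:
        """Steps 814-815: issue whatever is due this cycle, then shift."""
        issued = self.slots[0]
        self.slots = self.slots[1:] + [None]
        return issued
```

Two responses accepted in the same cycle with different counter values (say 3 and 1, with `window=5`) come out on the third and fifth ticks respectively, so each one reaches the interconnect the same fixed number of cycles after its own request arrived, whether it was serviced by the cache directory or bypassed.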

It is noted that method 800 may include other and/or additional steps that, for clarity and brevity, are not depicted. It is further noted that method 800 may be executed in a different order than presented and that the order presented in the discussion of FIGS. 8A-B is illustrative. It is further noted that certain steps in method 800 may be executed in a substantially simultaneous manner.

A detailed description of an embodiment of the present invention of stall/reorder unit 403 is provided below in association with FIG. 9.

FIG. 9--Embodiment of Stall/Reorder Unit

Referring to FIG. 9, stall/reorder unit 403 includes a plurality of multiplexers 901A-N, where N is any number. Multiplexers 901A-N may collectively or individually be referred to as multiplexers 901 or multiplexer 901, respectively. Stall/reorder unit 403 may further include a plurality of latches 902A-N coupled to multiplexers 901A-N. Latches 902A-N may collectively or individually be referred to as latches 902 or latch 902, respectively. Stall/reorder unit 403 may further include control unit 406 as described above. It is noted that stall/reorder unit 403 may include elements in addition to those depicted, but these additional elements were not depicted for ease of understanding.

The embodiment of stall/reorder unit 403 described in FIG. 9 is used when the snoop request has priority over a request from the processor. That is, when arbiter 407 (FIG. 4) receives a request from processor 301A (FIG. 4) and a snoop request from stall/reorder unit 403 (FIG. 4), arbiter 407 selects the snoop request unless it receives an acknowledgment from control unit 406 to select the request from processor 301A, as described below.
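The priority rule for this embodiment fits in a few lines. The function below is a hypothetical sketch: the name `arbitrate`, the string-valued requests, and the `ack_processor` flag (standing in for the acknowledgment from control unit 406) are illustrative, and it additionally assumes the arbiter passes the processor request through whenever no snoop request is pending.

```python
from typing import Optional


def arbitrate(snoop: Optional[str], processor: Optional[str],
              ack_processor: bool) -> Optional[str]:
    """Hypothetical model of arbiter 407 in the FIG. 9 embodiment.

    The snoop request from stall/reorder unit 403 wins by default; the
    request from processor 301A is chosen only when control unit 406
    asserts the acknowledgment (ack_processor) and such a request exists.
    """
    if ack_processor and processor is not None:
        return processor
    if snoop is not None:
        return snoop
    return processor  # no snoop pending: the processor request (or None)
```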

Referring to FIG. 9, in conjunction with FIG. 4, an incoming snoop request may be transmitted on bypass line 416 when latches 902 are full and no shift operation, as described below, is about to occur. If latches 902 are not full, then the incoming snoop request is input to multiplexers 901. Each multiplexer 901 also receives as inputs the snoop request (address, type, etc.) stored in its succeeding latch 902 and, except for multiplexer 901A, the snoop request stored in its preceding latch 902.

As to which input of multiplexer 901 will be selected, control unit 406 issues a command to each multiplexer 901 to perform the following actions. Control unit 406 may issue a command to each multiplexer 901 to "shift-down." Shift-down may refer to multiplexers 901 outputting the snoop request, if any, stored in the preceding latch 902, and to the last latch 902 in the stack of latches 902 outputting its snoop request either to multiplexer 401 or to bypass line 417, depending on whether the count value associated with the snoop request is n clock cycles. The count value associated with the snoop request may be determined by counters 405 as described above in FIG. 4. In one embodiment, counters 405 may reside in control unit 406.

Control unit 406 may further issue a command to each multiplexer 901 to "hold." Hold may refer to multiplexers 901 outputting the snoop request stored in the succeeding latch 902.
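The shift-down and hold commands can be modeled with a list in which index 0 is latch 902A. This is a hypothetical sketch: the class name, the `(address, count)` tuples, and the string destinations are illustrative; counts are treated as already maintained by counters 405 rather than incremented here; the same command is applied to every multiplexer 901 at once; and loading an incoming request is not modeled. The top latch is assumed to become empty on a shift because multiplexer 901A has no preceding latch.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # (snoop address, count value from counters 405)


class LatchStack:
    """Hypothetical model of latches 902A-N behind multiplexers 901A-N (FIG. 9)."""

    def __init__(self, depth: int, n: int) -> None:
        self.n = n
        # index 0 is latch 902A (top of the stack); the last index is the bottom latch
        self.latches: List[Optional[Entry]] = [None] * depth

    def step(self, command: str) -> Tuple[Optional[str], Optional[Entry]]:
        """Apply one command from control unit 406 to every multiplexer 901.

        Returns (destination, entry) for the entry leaving the bottom latch,
        or (None, None) if nothing leaves this cycle.
        """
        if command == "hold":
            # Each multiplexer reselects the entry already in its own latch.
            return None, None
        if command != "shift-down":
            raise ValueError(f"unsupported command: {command}")
        leaving = self.latches[-1]
        # Each multiplexer takes the entry from the latch above it; multiplexer
        # 901A has no preceding latch, so latch 902A is assumed to become empty.
        self.latches = [None] + self.latches[:-1]
        if leaving is None:
            return None, None
        _, count = leaving
        # Entries whose count has reached n leave on bypass line 417.
        destination = "bypass_417" if count >= self.n else "multiplexer_401"
        return destination, leaving
```

For a three-deep stack holding counts 0, 1 and 4 with `n=4`, a hold leaves everything in place, while the first shift-down sends the bottom entry (count 4) to bypass line 417 and the next sends the count-1 entry to multiplexer 401.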

Control unit 406 may further issue a command to a particular multiplexer 901 to output the incoming snoop request

