State tracking methodology, Patent Number 7,385,608, from the United States Patent and Trademark Office (PTO)


Title: State tracking methodology

Abstract: Redundant changes of tracked state issued by an application are filtered out by comparing the new state value with the old value, and if they are the same, no update is made. State changes are collected in on-chip memory and added to the bin if the state vector associated with the bin is out of date. State changes within a bin are done incrementally in temporal order, and a bin is only brought up to date prior to adding in a new primitive if the state has changed since the last primitive was added to it.

Patent Number: 7,385,608 Issued on 06/10/2008 to Baldwin


Inventors: Baldwin; David R. (Weybridge, GB)
Assignee: 3DLabs Inc. Ltd. (Hamilton, BM)
Appl. No.: 10/917,427
Filed: August 11, 2004


Related U.S. Patent Documents

Application Number    Filing Date    Patent Number    Issue Date
60/533,675            Dec., 2003

Current U.S. Class: 345/506 ; 345/419; 345/421; 345/422; 345/426; 345/581; 345/582; 345/589
Field of Search: 345/506,419,421,422,426,581,582,589


References Cited

U.S. Patent Documents
5889994 March 1999 Brown et al.
6323860 November 2001 Zhu et al.
6344852 February 2002 Zhu
6738069 May 2004 Doyle
Primary Examiner: Nguyen; Kimbinh T.
Attorney, Agent or Firm: Groover & Associates

Parent Case Text



CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority from U.S. Provisional Application 60/533,675 filed Dec. 31, 2003, which is hereby incorporated by reference.
Claims



What is claimed is:

1. A method for rendering 3D graphics, comprising the steps of: when a state item is received for one or more bins, comparing the new state item received with the old state items of each of said respective bins in dependence on temporal information to track said state items; and for at least some bins for which the state items are the same, making no updates to said state items for said bins; and rendering in accordance with said state items; wherein redundant changes of tracked state items issued by an application are filtered out.

2. The method of claim 1, wherein said state items are sorted in temporal order with the most recent state items first.

3. The method of claim 1, wherein said state items are sorted using a double-linked list.

4. The method of claim 1, wherein said state items are added to said bin in reverse of the temporal order in which they were updated.

5. The method of claim 1, wherein said bin's timestamp is updated after a state item has been added.

6. The method of claim 1, wherein said state items are compressed.

7. The method of claim 1, wherein said state items are classified as either high frequency state or low frequency state.

8. A method for rendering 3D graphics, comprising the steps of: collecting state items in on-chip memory; associating one or more bins with a respective timestamp; adding said state items to a bin only if the timestamp associated with said bin is out of date; wherein redundant changes of said state items issued by an application are filtered out; and rendering in accordance with said state items.

9. The method of claim 8, wherein said state items are sorted in temporal order with the most recent state items first.

10. The method of claim 8, wherein said state items are sorted using a double-linked list.

11. The method of claim 8, wherein said state items are added to said bin in reverse of the temporal order in which they were updated.

12. The method of claim 8, wherein said bin's timestamp is updated after a state item has been added.

13. The method of claim 8, wherein said state items are compressed.

14. The method of claim 8, wherein said state items are classified as either high frequency state or low frequency state.

15. A method for rendering 3D graphics, comprising the steps of: collecting state items for one or more bins; associating each state item with a respective timestamp; updating said timestamp in dependence of newly received state items; sorting said state items in temporal order; associating each of said bins with a respective timestamp to indicate when said respective bin was last updated; updating a bin's state in dependence of result of comparing said bin's timestamp with the timestamp of the state item to be added; and copying the most recent state item to said bin; and rendering in accordance with said state items.

16. The method of claim 15, wherein said state items are sorted in temporal order with the most recent state items first.

17. The method of claim 15, wherein said state items are sorted using a double-linked list.

18. The method of claim 15, wherein said state items are added to said bin in reverse of the temporal order in which they were updated.

19. The method of claim 15, wherein said bin's timestamp is updated after a state item has been added.

20. The method of claim 15, wherein said state items are compressed.

21. The method of claim 15, wherein said state items are classified as either high frequency state or low frequency state.

22. A method for tracking state items issued by an application, comprising the steps of: classifying said state items as either high frequency or low frequency; compressing said high frequency state items into a compact format; collecting said high frequency state items in on-chip memory; associating one or more bins with a respective timestamp; adding a high frequency state item to a bin for tracking only if the timestamp associated with said bin is out of date; wherein redundant changes of tracked high frequency state items issued by an application are filtered out; and rendering in accordance with said state items.

23. The method of claim 22, wherein said state items are sorted in temporal order with the most recent state items first.

24. The method of claim 22, wherein said state items are sorted using a double-linked list.

25. The method of claim 22, wherein said state items are added to said bin in reverse of the temporal order in which they were updated.

26. The method of claim 22, wherein said bin's timestamp is updated after a state item has been added.

27. A method for tracking state items issued by an application, comprising the steps of: classifying said state items as either high frequency or low frequency; compressing said high frequency state items into a compact format; collecting high frequency state items for one or more bins in on-chip memory; associating each high frequency state item with a respective timestamp; updating said timestamp when a new high frequency state item is received; sorting said high frequency state items in temporal order; associating each of said bins with a respective timestamp to indicate when said respective bin was last updated; when a bin's state is to be updated, comparing said bin's timestamp with the timestamp of the high frequency state item to be added; and copying the most recent high frequency state item to said bin; and rendering in accordance with said state items.

28. The method of claim 27, wherein said state items are sorted in temporal order with the most recent state items first.

29. The method of claim 27, wherein said state items are sorted using a double-linked list.

30. The method of claim 27, wherein said state items are added to said bin in reverse of the temporal order in which they were updated.

31. The method of claim 27, wherein said bin's timestamp is updated after a state item has been added.

32. A computer system for 3D graphics rendering comprising: a host processor; and a 3D graphics accelerator comprising: a device for a) collecting state items; b) associating one or more bins with a respective timestamp; and c) adding said state items to a bin only if the timestamp associated with said bin is out of date.

33. The system of claim 32, wherein said state items are sorted in temporal order with the most recent state items first.

34. The system of claim 32, wherein said state items are sorted using a double-linked list.

35. The system of claim 32, wherein said state items are added to said bin in reverse of the temporal order in which they were updated.

36. The system of claim 32, wherein said bin's timestamp is updated after a state item has been added.

37. The system of claim 32, wherein said state items are compressed.

38. The system of claim 32, wherein said state items are classified as either high frequency state or low frequency state.

39. A computer system for 3D graphics rendering comprising: a host processor; and a 3D graphics accelerator comprising: a device for tracking state items by a) classifying state items as either high frequency or low frequency; b) compressing said high frequency state items into a compact format; c) collecting said high frequency state items for one or more bins in on-chip memory; d) associating said bins with a respective timestamp; and e) adding a high frequency state item to a bin only if the timestamp associated with said bin is out of date.

40. The system of claim 39, wherein said state items are sorted in temporal order with the most recent state items first.

41. The system of claim 39, wherein said state items are sorted using a double-linked list.

42. The system of claim 39, wherein said state items are added to said bin in reverse of the temporal order in which they were updated.

43. The system of claim 39, wherein said bin's timestamp is updated after a state item has been added.

44. A graphics rendering module, comprising: a double-linked, state-tracking table comprising: a list of the state items for a particular frame; and a timestamp for each state item; a bin table comprising: a list of the bins for a particular scene; a timestamp for each bin; and a bin record update pointer for each bin; and a rendering hardware using a binning architecture which sorts said state items in temporal order; copies a state item to a bin only if the timestamp for said state item is more recent than the timestamp for said bin; and updates the timestamp for a bin once a state item has been copied to said bin.

45. The module of claim 44, wherein said state items are sorted in temporal order with the most recent state items first.

46. The module of claim 44, wherein said state items are sorted using a double-linked list.

47. The module of claim 44, wherein said state items are added to said bin in reverse of the temporal order in which they were updated.

48. The module of claim 44, wherein said state items are compressed.

49. The module of claim 44, wherein said state items are classified as either high frequency state or low frequency state.
Description



FIELD OF THE INVENTION

The present inventions relate to computer graphics and, more particularly, to state tracking.

BACKGROUND AND SUMMARY OF THE INVENTION

Background: 3D Computer Graphics

One of the driving features in the performance of most single-user computers is computer graphics. This is particularly important in computer games and workstations, but is generally very important across the personal computer market.

For some years, the most critical area of graphics development has been in three-dimensional ("3D") graphics. The peculiar demands of 3D graphics are driven by the need to present a realistic view, on a computer monitor, of a three-dimensional scene. The pattern written onto the two-dimensional screen must, therefore, be derived from the three-dimensional geometries in such a way that the user can easily "see" the three-dimensional scene (as if the screen were merely a window into a real three-dimensional scene). This requires extensive computation to obtain the correct image for display, taking account of surface textures, lighting, shadowing, and other characteristics.

The starting point (for the aspects of computer graphics considered in the present application) is a three-dimensional scene, with specified viewpoint and lighting (etc.). The elements of a 3D scene are normally defined by sets of polygons (typically triangles), each having attributes such as color, reflectivity, and spatial location. (For example, a walking human, at a given instant, might be translated into a few hundred triangles which map out the surface of the human's body.) Textures are "applied" onto the polygons, to provide detail in the scene. (For example, a flat, carpeted floor will look far more realistic if a simple repeating texture pattern is applied onto it.) Designers use specialized modelling software tools, such as 3D Studio, to build textured polygonal models.

The 3D graphics pipeline consists of two major stages, or subsystems, referred to as geometry and rendering. The geometry stage is responsible for managing all polygon activities and for converting three-dimensional spatial data into a two-dimensional representation of the viewed scene, with properly-transformed polygons. The polygons in the three-dimensional scene, with their applied textures, must then be transformed to obtain their correct appearance from the viewpoint of the moment; this transformation requires calculation of lighting (and apparent brightness), foreshortening, obstruction, etc.
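The geometry stage's conversion of 3D data into a 2D view can be illustrated with a standard pinhole perspective projection followed by a viewport mapping. This is a generic textbook sketch, not the P20 T&L pipeline; the function name and the conventions (camera at the origin looking down +z, symmetric vertical field of view, screen y growing downward) are assumptions. Dividing by z is what produces foreshortening: the same lateral offset lands half as far from the screen center when the point is twice as far away.

```python
import math

def project_to_screen(point, fov_y_deg, width, height):
    """Map a camera-space point to pixel coordinates (hypothetical helper).

    Assumed conventions: camera at the origin looking down +z, z > 0 is in
    front of the camera, and screen y grows downward.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Focal length derived from the vertical field of view.
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    aspect = width / height
    # Perspective divide: dividing by z produces foreshortening.
    ndc_x = (f / aspect) * x / z
    ndc_y = f * y / z
    # Viewport transform from [-1, 1] normalized coordinates to pixels.
    sx = (ndc_x + 1.0) * 0.5 * width
    sy = (1.0 - ndc_y) * 0.5 * height
    return sx, sy
```

For example, with a 90 degree field of view on a 100 by 100 viewport, `project_to_screen((1.0, 0.0, 2.0), 90.0, 100, 100)` returns approximately (75.0, 50.0), while the same lateral offset at z = 1.0 maps to approximately (100.0, 50.0).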

However, even after these transformations and extensive calculations have been done, there is still a large amount of data manipulation to be done: the correct values for EACH PIXEL of the transformed polygons must be derived from the two-dimensional representation. (This requires not only interpolation of pixel values within a polygon, but also correct application of properly oriented texture maps.) The rendering stage is responsible for these activities: it "renders" the two-dimensional data from the geometry stage to produce correct values for all pixels of each frame of the image sequence.
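Deriving per-pixel values inside a transformed polygon is commonly done with edge functions and barycentric weights. The sketch below is a generic software illustration, not the rendering hardware described later; the names `edge` and `rasterize` are hypothetical, pixels are sampled at their centers, and no tie-breaking (top-left) rule is applied, so a pixel center lying exactly on an edge shared by two triangles is claimed by both. The weights interpolate linearly in screen space; perspective-correct results need an extra per-pixel divide.

```python
def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p); the sign gives p's side of a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(v0, v1, v2, width, height):
    """Yield (x, y, (w0, w1, w2)) for every pixel center inside the triangle.

    The weights are barycentric coordinates, so a per-vertex value a_i is
    interpolated as w0*a0 + w1*a1 + w2*a2. Dividing by the signed area makes
    the inside test independent of the triangle's winding order.
    """
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(*v1, *v2, px, py) / area
            w1 = edge(*v2, *v0, px, py) / area
            w2 = edge(*v0, *v1, px, py) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                yield x, y, (w0, w1, w2)
```

For the right triangle (0, 0), (4, 0), (0, 4) on a 4 by 4 grid, 10 pixel centers are covered, and the first pixel, (0, 0), gets the weights (0.75, 0.125, 0.125).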

The most challenging 3D graphics applications are dynamic rather than static. In addition to changing objects in the scene, many applications also seek to convey an illusion of movement by changing the scene in response to the user's input. Whenever a change in the orientation or position of the camera is desired, every object in a scene must be recalculated relative to the new view. As can be imagined, a fast-paced game needing to maintain a high frame rate will require many calculations and many memory accesses.

Background: Texturing

There are different ways to add complexity to a 3D scene. Creating more and more detailed models, consisting of a greater number of polygons, is one way to add visual interest to a scene. However, adding polygons necessitates paying the price of having to manipulate more geometry. 3D systems have what is known as a "polygon budget," an approximate number of polygons that can be manipulated without unacceptable performance degradation. In general, fewer polygons yield higher frame rates.

The visual appeal of computer graphics rendering is greatly enhanced by the use of "textures". A texture is a two-dimensional image which is mapped into the data to be rendered. Textures provide a very efficient way to generate the level of minor surface detail which makes synthetic images realistic, without requiring transfer of immense amounts of data. Texture patterns provide realistic detail at the sub-polygon level, so the higher-level tasks of polygon-processing are not overloaded. See Foley et al., Computer Graphics: Principles and Practice (2nd ed. 1990, corr. 1995), especially at pages 741-744; Paul S. Heckbert, "Fundamentals of Texture Mapping and Image Warping," Thesis submitted to Dept. of EE and Computer Science, University of California, Berkeley, Jun. 17, 1994; Heckbert, "Survey of Texture Mapping," IEEE Computer Graphics and Applications, November 1986, pp. 56-67; all of which are hereby incorporated by reference. Game programmers have also found that texture mapping is generally a very efficient way to achieve very dynamic images without requiring a hugely increased memory bandwidth for data handling.

A typical graphics system reads data from a texture map, processes it, and writes color data to display memory. The processing may include mipmap filtering which requires access to several maps. The texture map need not be limited to colors, but can hold other information that can be applied to a surface to affect its appearance; this could include height perturbation to give the effect of roughness. The individual elements of a texture map are called "texels".

Awkward side-effects of texture mapping occur unless the renderer can apply texture maps with correct perspective. Perspective-corrected texture mapping involves an algorithm that translates "texels" (pixels from the bitmap texture image) into display pixels in accordance with the spatial orientation of the surface. Since the surfaces are transformed (by the host or geometry engine) to produce a 2D view, the textures will need to be similarly transformed by a linear transform (normally projective or "affine"). (In conventional terminology, the coordinates of the object surface, i.e. the primitive being rendered, are referred to as an (s,t) coordinate space, and the map of the stored texture is referred to as a (u,v) coordinate space.) The transformation in the resulting mapping means that a horizontal line in the (x,y) display space is very likely to correspond to a slanted line in the (u,v) space of the texture map, and hence many additional reads will occur, due to the texturing operation, as rendering walks along a horizontal line of pixels.
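The difference between affine and perspective-correct texture mapping can be seen along a single span of pixels. A common approach, shown here as a sketch rather than the method of any particular chip, interpolates u/w and 1/w linearly in screen space (both are linear there after projection) and divides per pixel; u on its own is not linear in screen space. Here u stands for a texture coordinate, w for each vertex's homogeneous w (its view depth), and t for the fraction of the way along the span; the function names are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation from a (t = 0) to b (t = 1)."""
    return a + (b - a) * t

def affine_u(u0, u1, t):
    """Naive screen-space interpolation of a texture coordinate."""
    return lerp(u0, u1, t)

def perspective_u(u0, w0, u1, w1, t):
    """Perspective-correct interpolation between two projected vertices.

    u/w and 1/w vary linearly across the screen, so interpolate those and
    divide once per pixel to recover u.
    """
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    return u_over_w / one_over_w
```

Halfway across a span from a near vertex (u = 0, w = 1) to a far vertex (u = 1, w = 3), `affine_u(0.0, 1.0, 0.5)` gives 0.5 but `perspective_u(0.0, 1.0, 1.0, 3.0, 0.5)` gives about 0.25: the screen-space midpoint corresponds to a surface point closer to the near vertex, which the affine version gets wrong.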

High fill and texturing rates are among the key requirements of many 3D graphics applications (especially gaming applications). Gaming and DCC (digital content creation) applications use complex textures, and may often use multiple textures with a single primitive. (CAD and similar workstation applications, by contrast, make much less use of textures, and typically use smaller polygons but more of them.) Achieving an adequately high rate of texturing and fill operations requires a very large memory bandwidth.

Background: Binning

In a tiled, binning, chunking, or bucket rendering architecture, the primitives are sorted into screen regions before they are rendered. This architecture allows all the primitives within a screen region to be rendered together to exploit the higher locality of reference to the z and color buffers, thereby allowing more efficient memory usage, typically by using only on-chip memory. This also enables other whole-scene rendering opportunities such as deferred rendering, order-independent transparency, and new types of antialiasing. In the present application, "transparent" is used generally to designate anything with alpha < 1.

The primitives and state are recorded in a spatial database in memory that represents the frame being rendered. This is done after any T&L (transform and lighting) processing, so everything is in screen coordinates. Ideally, no rendering occurs until the frame is complete; however, rendering will be started early on a user flush if the amount of binned data exceeds a programmable threshold, or if the memory set aside to hold the database is exhausted. While the database for one frame is being constructed, the database for an earlier frame is rendered.

The screen is divided up into rectangular regions called bins, and each bin heads a linked list of bin records that hold the state and primitives that overlap this bin region. A primitive and its associated state may be repeated across several bins. Vertex data is held separately and is not replicated when a primitive overlaps multiple bins; this allows more efficient storage mechanisms to be used. Primitives are maintained in temporal order within a bin.
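The sorting of primitives into bins can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the P20 binning hardware: the 32-pixel bin size, the function name `bin_primitives`, and the conservative bounding-box overlap test are hypothetical choices, and Python lists stand in for the linked lists of bin records. Each bin stores primitive indices rather than copies of the vertices, so vertex data is not replicated, and iterating in submission order keeps primitives in temporal order within each bin.

```python
from collections import defaultdict

BIN_SIZE = 32  # hypothetical bin edge length in pixels

def bin_primitives(triangles, width, height, bin_size=BIN_SIZE):
    """Sort screen-space triangles into bins; return {(bin_x, bin_y): [indices]}.

    Each triangle is a sequence of (x, y) vertices in screen coordinates.
    Bins hold primitive indices, so vertex data is stored once and shared.
    """
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):  # submission (temporal) order
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen; skip fully off-screen triangles.
        min_x, max_x = max(0.0, min(xs)), min(width - 1.0, max(xs))
        min_y, max_y = max(0.0, min(ys)), min(height - 1.0, max(ys))
        if min_x > max_x or min_y > max_y:
            continue
        # Conservative test: every bin touched by the bounding box gets the primitive.
        for by in range(int(min_y) // bin_size, int(max_y) // bin_size + 1):
            for bx in range(int(min_x) // bin_size, int(max_x) // bin_size + 1):
                bins[(bx, by)].append(index)
    return dict(bins)
```

Because the bounding-box test is conservative, a long thin triangle may be recorded in bins it does not actually cover; a finer overlap test trades binning cost for fewer wasted bin records.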

Opaque primitives can be rendered in any order and are usually rendered in the order the primitives are submitted. Generally, the depth test ensures that the final result is the same. However, different rendering orders of co-planar polygons will give different results.

To render transparent primitives correctly, they need to be drawn either in a front-to-back or back-to-front order after all the opaque primitives have been rendered. The application sorts the transparent primitives into order before submitting them for rendering, and there are two basic algorithms used:

The application can sort the transparent primitives in a manner similar to the Painter's algorithm (an early method for hidden surface removal). There may be no correct rendering order when transparent primitives are cyclically interleaved or penetrated, and in these cases, the application would need to clip the primitives against each other to generate a definitive order.

The application can submit the transparent primitives multiple times with a dual depth test to render the transparent surfaces one layer at a time. A layer is the set of farthest transparent primitives (or parts thereof) that are in front of the nearest opaque primitives. After each layer is rendered, it is incorporated into the opaque primitives for the next pass. Subsequent layers move closer to the eye position. This technique is called depth peeling. Alternatively, it can be implemented with subsequent layers moving farther away from the eye; however, this requires a triple depth test and is more expensive to render, but it has the advantage of allowing early termination once a certain number of layers has been rendered (extra layers add very little to the fidelity of the image).
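The back-to-front depth-peeling scheme can be modelled per pixel as below. This is a hypothetical software sketch that ignores how the passes are scheduled on hardware; a fragment is assumed to be `(depth, (r, g, b), alpha)`, with smaller depth nearer the eye and `alpha == 1.0` marking an opaque fragment. Each loop iteration is one peeling pass: the first depth test keeps only fragments strictly in front of the current limit (initially the nearest opaque surface), the second keeps the farthest of those, and the resulting layer is blended over the accumulated color and becomes the new limit.

```python
def depth_peel_pixel(fragments, background=(0.0, 0.0, 0.0)):
    """Composite one pixel's fragments by back-to-front depth peeling.

    fragments: list of (depth, (r, g, b), alpha); smaller depth is nearer
    the eye and alpha == 1.0 means opaque (hypothetical per-pixel model).
    """
    opaque = [f for f in fragments if f[2] >= 1.0]
    transparent = [f for f in fragments if f[2] < 1.0]

    # Opaque pass: the ordinary depth test keeps the nearest opaque fragment.
    if opaque:
        limit, color, _ = min(opaque, key=lambda f: f[0])
    else:
        limit, color = float("inf"), background

    # Peeling passes: test 1 keeps fragments strictly in front of the current
    # limit; test 2 keeps the farthest of those. That layer is blended "over"
    # the result and becomes the new limit, so each pass moves toward the eye.
    while True:
        in_front = [f for f in transparent if f[0] < limit]
        if not in_front:
            return color
        limit, src, alpha = max(in_front, key=lambda f: f[0])
        color = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, color))
```

Because the first test is strict, transparent fragments at exactly the same depth fall into one layer and only one of them is blended; a practical implementation needs a tie-breaking rule for that case.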

Binning has the following benefits:

Reduces the rendering bandwidth by keeping all the depth and color data on-chip except for the final write to memory once a bin has been processed. For aliased rendering, the frame buffer bandwidth is, therefore, a constant one-pixel write per frame irrespective of overdraw or the amount of alpha-blending or depth read-modify-write operations. Also, note that in many cases there is no need to save the depth buffer to memory, thereby halving the bandwidth. For full scene antialiasing (FSAA), this is even more dramatic, as approximately 4 times more reads and writes occur while rendering (assuming 4-sample FSAA). The down-sampling is also done from on-chip memory, so the bandwidth demand remains the same as in the non-FSAA case. Some of these bandwidth savings are lost to the bandwidth needed to build and parse the bin data structures, and this will be exacerbated with FSAA, as the caches will cover a smaller area of screen (the database will be traversed more times). The overall bandwidth saving depends on the scene and on triangle size.

Fragment computation and texturing are saved by using deferred rendering. A bin is traversed twice: on the first (simpler) pass, the visibility buffer is set up and no color calculations are done; on the second pass, only those fragments determined to be visible are rendered, effectively reducing the opaque depth complexity to 1. As most games have an average depth complexity greater than 3, this can boost the apparent fill rate by a factor of 3 or more (depending on the original primitive submission order).

Less FSAA work. During the first pass of the deferred rendering operation, the location of edges (both geometric edges and edges inferred from penetrating faces) can be ascertained, and only those sub-tiles holding edges need to have the multi-sample depth values calculated and the color replicated to the covered sample points. This saves the cycles needed to update the multi-sample buffers, as well as any program cost for alpha-blending.

Stochastic supersampling FSAA. The contents of a bin are rendered multiple times, with the post-transformed primitives jittered on each pass. This is similar to accumulation buffering at the application level, but occurs without any application involvement (unlike application-level accumulation buffering, motion blur and depth of field effects cannot be done this way). It has superior quality and a smaller memory footprint than multi-sample FSAA; however, it is slower, as the color is computed at each sample point (unlike multi-sample FSAA, where one color per fragment is calculated).

The T&L and rasterization work proceed in parallel with no fine-grain dependencies, so a bottleneck in one part will not stall the other. Stalls can still occur at frame granularity, but within a frame, the work flow will be much smoother.

Memory footprint can be reduced when the depth buffer does not need to be saved to memory. With FSAA, the depth and color sample buffers are rarely needed after the filtered color has been determined. Note that, as all the memory is virtual, space can be allocated for these buffers (in case of a premature flush), but the demand will only be made on the working set if a flush occurs. Note that the semantics of OpenGL can make this hard to use.

State Tracking Methodology

Redundant changes of tracked state issued by the application are filtered out by comparing the new state value with the old value, and if they are the same, no update is made.

State changes are collected in on-chip memory and added to the bin if the state vector associated with the bin is out of date. State changes within a bin are done incrementally in temporal order, and a bin is only brought up to date prior to adding in a new primitive if the state has changed since the last primitive was added to it.

To determine when the state in a bin needs to be updated, each item of state has a timestamp associated with it, and this is updated whenever that state is received. The state items are sorted in temporal order with the most recent state items first. Each bin has a timestamp to indicate when it was last updated. When a bin's state is to be updated, the bin's timestamp is compared against the state timestamp, working from most recent backwards, and the more recent state is copied to the bin. The state items will be added in reverse of the temporal order that they were updated in (which should not cause a problem), and once this has completed, the bin's timestamp is updated. The timestamp is reset at the start of every frame and incremented on the first primitive after a series of state changes. The sorting is done by a double-linked list, and new state items are moved to the head.
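The timestamp scheme described above can be sketched in Python as follows. This is an illustrative model, not the P20 implementation: the class and method names are hypothetical, an `OrderedDict` (whose `move_to_end(key, last=False)` moves an item to the head) stands in for the double-linked list, bin records are plain Python lists, and the behavior of the per-frame reset (re-stamping every state item with 0 and marking every bin as never updated, so that each bin receives the full state vector before its first primitive) is an assumption. Redundant changes are filtered by comparing with the old value, the timestamp advances on the first primitive after a series of state changes, and a bin is only brought up to date when it is older than the current timestamp.

```python
from collections import OrderedDict

class StateTracker:
    """Hypothetical sketch of timestamp-based state tracking for binned rendering."""

    def __init__(self, num_bins):
        # name -> (value, timestamp), most recently updated item first.
        # OrderedDict stands in for the double-linked list.
        self.state = OrderedDict()
        self.time = 0                    # current timestamp
        self.changed = False             # state changed since the last primitive?
        self.bin_time = [-1] * num_bins  # when each bin was last brought up to date
        self.bins = [[] for _ in range(num_bins)]  # bin records in temporal order

    def begin_frame(self):
        # Assumed reset behavior: re-stamp every item with 0 and mark every bin
        # as never updated, keeping the most-recent-first order of the items.
        self.time = 0
        self.changed = False
        self.state = OrderedDict(
            (name, (value, 0)) for name, (value, _) in self.state.items())
        self.bin_time = [-1] * len(self.bins)
        self.bins = [[] for _ in self.bins]

    def set_state(self, name, value):
        """Record a state change; return False if it was redundant and filtered out."""
        old = self.state.get(name)
        if old is not None and old[0] == value:
            return False  # same as the current value: no update is made
        # Stamp with the timestamp the next primitive will receive.
        self.state[name] = (value, self.time + 1)
        self.state.move_to_end(name, last=False)  # move to the head of the list
        self.changed = True
        return True

    def add_primitive(self, bin_index, primitive):
        if self.changed:
            self.time += 1  # first primitive after a series of state changes
            self.changed = False
        last = self.bin_time[bin_index]
        records = self.bins[bin_index]
        if last < self.time:
            # Walk from the most recent item backwards, copying newer state.
            for name, (value, stamp) in self.state.items():
                if stamp <= last:
                    break  # this and all older items are already in the bin
                records.append(("state", name, value))
            self.bin_time[bin_index] = self.time
        records.append(("prim", primitive))


tracker = StateTracker(num_bins=2)
tracker.set_state("depth_test", True)
tracker.set_state("blend", False)
tracker.begin_frame()
tracker.add_primitive(0, "tri0")
tracker.set_state("blend", False)  # redundant: filtered out
tracker.add_primitive(0, "tri1")   # bin 0 already up to date: no state copied
tracker.set_state("blend", True)
tracker.add_primitive(1, "tri2")   # bin 1 receives the full state vector
tracker.add_primitive(0, "tri3")   # bin 0 receives only the new blend state
```

In this usage, bin 1 receives `blend` before `depth_test`, that is, in reverse of the order in which the items were last updated, matching the behavior described above.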

In addition to the above-listed advantages, the disclosed innovations, in various embodiments, also provide one or more of at least the following advantages: increased speed; increased efficiency; and compatibility with OpenGL and similar APIs.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosed inventions will be described with reference to the accompanying drawings, which show important sample embodiments of the invention and which are incorporated in the specification hereof by reference, wherein:

FIG. 1 is an example of a state tracking table and bin table of the present inventions.

FIGS. 2A and 2B show a flowchart of the rendering process utilized by the methods and systems of the present inventions.

FIG. 1A is a block diagram of the P20 core architecture.

FIG. 1B is a block diagram of T&L Subsystem 1A100.

FIG. 1C is a block diagram of Binning Subsystem 1A110.

FIG. 1D is a block diagram of WID Subsystem 1A150.

FIG. 1E is a block diagram of Visibility Subsystem 1A160.

FIG. 1F is a block diagram of the first half of Fragment Subsystem 1A170.

FIG. 1G is a block diagram of the second half of Fragment Subsystem 1A170.

FIG. 1H is a block diagram of SD Subsystem 1A180.

FIG. 1I is a block diagram of Pixel Subsystem 1A190.

FIG. 1J is an overview of a computer system, with a rendering subsystem, which advantageously incorporates the disclosed graphics architecture.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The numerous innovative teachings of the present application will be described with particular reference to the presently preferred embodiment (by way of example, and not of limitation).

P20 Architecture

The following description gives details of a sample embodiment of the preferred rendering accelerator chip (referred to as "P20" in the following document, although not all details may apply to every chip revision marketed as P20). The following description gives an overview of the P20 Core Architecture and largely ignores other important parts of P20 such as GPIO and the Memory subsystem.

P20 is an evolutionary step from P10 and extends many of the ideas embodied in P10 to accommodate higher performance and extensions in APIs, particularly OpenGL 2 and DX9.

The main functional enhancements over P10 are the inclusion of a binning subsystem and a fragment shader targeted specifically at high level language support.

The P20 architecture is a hybrid design employing fixed-function units where the operations are very well defined and programmable units where flexibility is needed. No attempt has been made to make it backwards compatible, and a major rewrite of the driver software is expected. (The architecture will be less friendly towards software--changes in the API state will no longer be accomplished by setting one or more mode bits in registers, but will need a new program to be generated and downloaded when state changes. More work is pushed onto software to do infrequent operations such as aligning stipple or dither patterns when a window moves.)

General Performance Goals

The general raw performance goals are:

64 fragments/cycle WID/scissor/area stipple processing; 64 fragments/cycle Z failure (visibility testing); 16 fragments/cycle fill rate at 32 bpp (depth buffered with flat or Gouraud shading); 6 fragments/cycle for single texture (trilinear) operations; a 3-cycle single pixel Gouraud shaded, depth buffered triangle rate; 4-sample multi-sample operation basically for free; and a 400 MHz operational frequency (this frequency assumes a 0.13 micron process; according to TSMC, a 200 MHz design speed at 0.18 micron scales by 25% going to a 0.15 micron process, and again by 25% going to 0.13 micron).

The architecture has been designed to allow a range of performance trade-offs to be made, and the first-instantiated version will lie somewhere in the middle of the performance landscape.

Isochronous Operation

Isochronous operation is where some type of rendering is scheduled to occur at a specific time (such as during frame blanking) and has to be done then, irrespective of whatever other rendering may be in progress. GDI+/Longhorn is introducing this notion to the Windows platform. There are two solutions to this problem: have an independent unit do this so the main graphics core does not see these isochronous commands, or allow the graphics core to respond to pre-emptive multi-tasking.

The first solution sounds the simplest and easiest to implement, and probably is if the isochronous stream is limited to simple blits; however, the functionality does not have to grow very much (fonts, lines, stretch blits, color conversion, cubic filtering, video processing, etc.) before this side unit starts to look more and more like a full graphics core.

The second solution is future-proof and may well be more gate-efficient, as it reuses resources already needed for other things. However, it requires an efficient way to context switch, preferably without any host intervention, and a way to suspend the rasterizer in the middle of a primitive.

Fast context switching can be achieved by duplicating registers and using either a bit per Tile message to indicate which context should be used or a command to switch register sets. This is the fastest method, but duplicating all the registers (and WCS) will be very expensive, and subsetting them may not be future-proof if a register that turns out to be needed is left out.

As any context-switchable state flows into the rasterizer, part of the pipeline that it goes through is the Context Unit. This unit caches all context data and maintains a copy in local memory. A small cache is needed so that frequently updated values, such as mode registers, do not cause a significant amount of memory traffic. When a context switch is needed, the cache is flushed, and the new context record is read from memory and converted into a message stream to update downstream units. The message tags will be allocated to allow simple decoding and mapping into the context record for both narrow and wide message formats. Some special cases for capturing the context, as well as for restoring it, will be needed to handle the cases where keyhole loading is used, for example during program loading.

Context switching the rasterizer part way through a primitive is avoided by having a second rasterizer dedicated to the isochronous stream. This second rasterizer is limited to just rectangles as this fulfils all the anticipated uses of the isochronous stream. (If the isochronous stream wants to draw lines, for example, then the host software can always decompose them into tiles and send the tile messages just as if the rasterizer had generated them.)

There are some special cases where intermediate values (such as the plane equations) will need to be regenerated, and extra messages will be sent following a context switch to force these to occur. Internal state that is incremented, such as glyph position and line stipple position, needs to be handled separately.

T&L context is saved by the Bin Manager Unit and restored via the GPIO Context Restore Unit. The Bin Manager, Bin Display, Primitive Setup and Rasterizer units are saved by the Context Unit and restored via the GPIO Context Restore Unit.

Memory Bandwidth

Memory bandwidth is a crucial design factor, and every effort has been made to use the bandwidth effectively; however, there is no substitute for having sufficient bandwidth in the first place. A simple calculation shows that 32 bits per pixel, Z-buffered, alpha-blended rendering takes 16 bytes per fragment (4 bytes each for a depth read, a depth write, a color read and a color write), so a 16 fragment-per-cycle architecture running at 400 MHz needs a memory bandwidth of 102 GB/s. Adding memory inefficiencies (page breaks, refresh) and video refresh (fairly insignificant in comparison to the rendering bandwidth) probably raises this to 107 GB/s or so. (With an 8-filter pipe system, turning on textures will decrease this figure to approximately 51 GB/s because the number of fragments per cycle will halve. Textures can be stored compressed, so a 32-bit texel takes one byte of storage, which reduces the increase in bandwidth due to texture fetches. Five bytes per fragment were assumed in the calculations: 4 bytes from the high-resolution texture map per fragment and 4 bytes per four fragments for the low-resolution map.)
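The figures above follow from simple arithmetic. The sketch below, with hypothetical function and constant names, reproduces the 102 GB/s and 51 GB/s numbers (using decimal gigabytes) and the 5 bytes per fragment assumed for texture fetches; it does not model the page-break and refresh overheads that lift the total to about 107 GB/s.

```python
def rendering_bandwidth_gbs(fragments_per_cycle, clock_hz, bytes_per_fragment):
    """Frame-buffer traffic in GB/s (decimal: 1 GB = 10**9 bytes)."""
    return fragments_per_cycle * bytes_per_fragment * clock_hz / 1e9


# 32 bpp, Z-buffered, alpha-blended: depth read + depth write + color read + color write.
BYTES_PER_FRAGMENT = 4 + 4 + 4 + 4

untextured = rendering_bandwidth_gbs(16, 400e6, BYTES_PER_FRAGMENT)  # about 102.4 GB/s
textured = rendering_bandwidth_gbs(8, 400e6, BYTES_PER_FRAGMENT)     # about 51.2 GB/s

# Compressed trilinear texturing at one byte per texel: 4 texels per fragment from the
# high-resolution map plus 4 texels shared by four fragments from the low-resolution map.
TEXTURE_BYTES_PER_FRAGMENT = 4 * 1 + (4 * 1) / 4  # 5.0
```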

The memory options are as follows:

DDR2 SDRAM. Running at 500 MHz, DDR2 SDRAM has a peak bandwidth of 16 GB/s when the memory is 128 bits wide, or 32 GB/s when 256 bits wide. There are no real impediments to using this type of memory, but increasing the width beyond 256 bits is not feasible due to pin count and cost.

Embedded DRAM or 1T RAM. eRAM is the only technology that can provide these very high bandwidth rates, by enabling very wide memory configurations. However, eRAM comes with a number of serious disadvantages: there is a high premium on the cost of the chips as they require more manufacturing steps (for eDRAM); they are foundry-specific; and with some foundries, the logic speed suffers. Only a modest amount of eRAM (say 8 MBytes) can fit onto a chip economically, which is far short of what is needed, particularly with higher-resolution and deep-pixel displays. eRAM really needs to be used as a cache, which brings back the reliance on high locality of reference and reuse of pixel data to give a high apparent bandwidth to an economical, external memory system.

Change the rules. If the screen were small enough to fit into an on-chip cache (made from eRAM or more traditional RAM), then most of this rendering bandwidth would be absorbed internally. Clearly, the screen cannot be made small enough, or the internal caches big enough, but sorting the incoming geometry and state into small, cache-sized, screen-aligned regions (called bins, buckets, chunks and, confusingly, tiles in the literature) and rendering each bin in turn achieves the same effect. The memory bandwidth is spent in a different way (writing and reading the bin database), so provided that the database bandwidth is less than the rendering bandwidth and can be accommodated by the external memory bandwidth, the goal has effectively been achieved.
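The DDR2 figures can be checked the same way. This sketch assumes, consistent with the numbers quoted, that "500 MHz" is the clock frequency and that data is transferred on both clock edges; the function name is hypothetical.

```python
def ddr_peak_gbs(clock_hz, bus_width_bits):
    """Peak bandwidth of a double-data-rate memory bus in GB/s (1 GB = 10**9 bytes)."""
    transfers_per_second = 2 * clock_hz       # two transfers per clock (double data rate)
    bytes_per_transfer = bus_width_bits // 8  # bus width in bytes
    return transfers_per_second * bytes_per_transfer / 1e9


narrow = ddr_peak_gbs(500e6, 128)  # 16 GB/s
wide = ddr_peak_gbs(500e6, 256)    # 32 GB/s
# Even the 256-bit bus supplies under a third of the roughly 102 GB/s that an
# unbinned 16 fragment-per-cycle design would need.
shortfall = 102.4 / wide
```

The gap of roughly 3x between demand and supply is what motivates spending bandwidth on a bin database instead.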

P20 uses an (optional) binning style architecture together with state of the art DDR2 memory to get the desired performance. Binning also offers some other interesting opportunities that will be described later.

Binning

Binning works by building a spatially-sorted scene description before rendering, so that the rendering of each region (or bin) can be constrained to fit in the caches. The bin database for one frame is built while the previous frame is rendered. (Frame means more than just the displayed frame. Intermediate "frames", such as those generated by render-to-texture operations, are also included in this definition. Any number of frames may be held in the bin data structures for subsequent rendering; however, it is normal to buffer only one final display frame to preserve interactivity and reduce the transport delay in an application or game.)

Binning has the following benefits:

Reduces the rendering bandwidth by keeping all the depth and color data on-chip except for the final write to memory once a bin has been processed. For aliased rendering, the frame buffer bandwidth is, therefore, a constant one write per pixel per frame, irrespective of overdraw or the amount of alpha-blending or depth read-modify-write operations. Also, note that in many cases there is no need to save the depth buffer to memory, thereby halving the bandwidth. For FSAA, this is even more dramatic, as approximately 4× more reads and writes occur while rendering (assuming 4-sample FSAA). The down-sampling is also done from on-chip memory, so the bandwidth demand remains the same as in the non-FSAA case. Some of these bandwidth savings are lost due to the bandwidth needed to build and parse the bin data structures, and this will be exacerbated with FSAA, as the caches will cover a smaller area of the screen (so the database will be traversed more times). The overall bandwidth saving is scene- and triangle-size dependent.

Fragment computations or texturing is saved by using deferred rendering. A bin is traversed twice: on the first (simpler) pass, the visibility buffer is set up and no color calculations are done. On the second pass, only those fragments determined to be visible are rendered, effectively reducing the opaque depth complexity to 1. As most games have an average depth complexity greater than 3, this can give a boost of 3× or more to the apparent fill rate (depending on the original primitive submission order).

Less FSAA work. During the first pass of the deferred rendering operation, the location of edges (geometric edges, and those inferred from penetrating faces) can be ascertained, and only those sub-tiles holding edges need to have the multi-sample depth values calculated and the color replicated to the covered sample points. This saves cycles to update the multi-sample buffers and any program cost for alpha-blending.
Order Independent Transparency. Each bin region has a pair of bin buffers: one holds the opaque primitives and the other holds the transparent primitives. After the opaque bin is rendered, the transparent bin is rendered multiple times until all the transparency layers have been resolved. The layers are resolved in back-to-front order, and successive layers touch fewer and fewer fragments.

Stochastic super sampling FSAA. The contents of a bin are rendered multiple times, with the post-transformed primitives jittered on each pass. This is similar to accumulation buffering at the application level but occurs without any application involvement (motion blur and depth of field effects cannot be done this way). It has superior quality and a smaller memory footprint than multi-sample FSAA; however, it is slower because the color is computed at each sample point (unlike multi-sample FSAA, where one color per fragment is calculated).

The T&L and rasterization work proceed in parallel with no fine-grain dependencies, so a bottleneck in one part will not stall the other. Stalls can still occur at frame granularity, but within a frame the workflow will be much smoother.

Memory footprint can be reduced when the depth buffer does not need to be saved to memory. With FSAA, the depth and color sample buffers are rarely needed after the filtered color has been determined. Because all the memory is virtual, space can be allocated for these buffers (in case of a premature flush), but demand is only made on the working set if a flush occurs. Note that the semantics of OpenGL can make this hard to use.
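Two of the per-bin techniques listed above, deferred rendering and stochastic super sampling, can be modelled in a few lines of Python. The sketch is illustrative only: fragments are hypothetical `(pixel, depth, primitive)` tuples rather than tiles, only opaque geometry is handled, the super sampling uses stratified random jitter (one common way to choose jitter offsets; the text does not specify P20's pattern), and all function names are invented for the example.

```python
import random


def immediate_shade_count(fragments):
    """Fragments colored by a conventional depth-buffered renderer: every fragment
    that passes the depth test when it arrives gets shaded."""
    zbuf = {}
    shaded = 0
    for pixel, depth, _prim in fragments:
        if depth < zbuf.get(pixel, float("inf")):
            zbuf[pixel] = depth
            shaded += 1
    return shaded


def deferred_shade(fragments):
    """Two passes over the same (opaque) bin contents."""
    # Pass 1: visibility only, no color work; keep the nearest primitive per pixel.
    visible = {}
    for pixel, depth, prim in fragments:
        if pixel not in visible or depth < visible[pixel][0]:
            visible[pixel] = (depth, prim)
    # Pass 2: color only the fragments that won the visibility test.
    return [(pixel, prim) for pixel, _depth, prim in fragments if visible[pixel][1] == prim]


def stochastic_supersample(scene, px, py, grid=4, seed=0):
    """Average grid * grid passes, each jittered by a random offset inside its own
    stratum of the pixel; every pass computes a full color at its sample point."""
    rng = random.Random(seed)
    total = 0.0
    for i in range(grid):
        for j in range(grid):
            # Jittering the primitives by (-dx, -dy) is equivalent to moving the
            # sample point by (+dx, +dy), which is what is done here.
            dx = (i + rng.random()) / grid
            dy = (j + rng.random()) / grid
            total += scene(px + dx, py + dy)
    return total / (grid * grid)


def half_plane(x, y):
    """Hypothetical scene: intensity 1.0 left of x = 0.5, 0.0 elsewhere."""
    return 1.0 if x < 0.5 else 0.0
```

For three opaque layers drawn back to front over one pixel, the conventional renderer colors all three fragments while the deferred renderer colors only the nearest one; averaging the jittered passes over a pixel half-covered by `half_plane` gives 0.5.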

The bin database holds the post-transformed primitive data and state. Only primitives that have passed clipping and culling will be added to the database, and great care is taken to ensure this data is held in a compact format with a low build and traversal cost.

However, if there is not enough memory to hold the bin data structures, then two portions of the memory are allocated: one for state and primitive information and the other for vertex data. Both regions can be 256 MB in size. It is unlikely, therefore, that the bins will need to be prematurely flushed before all the data has been seen. Reserving such large amounts of memory, however, may be problematic in some systems. This memory is virtual memory. Therefore, in these extreme scenes, performance will gradually degrade (as pages are swapped out of on-card memory), but all the algorithms and optimizations will continue. Nevertheless, the problem of running out of memory on the ultra-extreme scenes, or maybe because less generous state/primitive and vertex buffers have been allocated, must be addressed.

When the buffers overflow, the scene is effectively rendered in several "passes", and the memory footprint savings are lost, but most of the bandwidth savings still remain. For each pass, the results of the previous pass need to be loaded and the results of the current pass saved. The rendering bandwidth requirement for the depth and color buffers is, therefore, #pixels*((#passes*2)-1)*bytes per pixel for depth and color (every pass saves its results, and every pass except the first loads the previous results). Therefore, provided each pass holds a reasonable amount of geometry, there are still large savings. Clearly, depth complexity plays an important role in this, but complex scenes that overflow the bin data structure buffers will usually have high depth complexity.
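The pass-count formula can be written out directly. In the sketch below, the function name, the 1024 x 768 resolution and the 8 bytes per pixel (32-bit color plus 32-bit depth) are assumptions chosen for illustration.

```python
def overflow_buffer_bytes(pixels, passes, bytes_per_pixel):
    """Depth and color buffer traffic when a bin overflow splits a frame into `passes`.

    Every pass saves its results (passes writes) and every pass after the first
    reloads the previous results (passes - 1 reads): (2 * passes - 1) transfers.
    """
    if passes < 1:
        raise ValueError("at least one pass is needed")
    return pixels * (2 * passes - 1) * bytes_per_pixel


# Hypothetical example: 1024 x 768 pixels, 4-byte color plus 4-byte depth per pixel.
one_pass = overflow_buffer_bytes(1024 * 768, 1, 8)      # 6,291,456 bytes
three_passes = overflow_buffer_bytes(1024 * 768, 3, 8)  # 31,457,280 bytes
```

Traffic grows linearly with the number of passes, so a frame split into three passes costs five buffer transfers per pixel, which is still independent of how much overdraw each pass contains.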

When there is a premature flush, the order-independent transparency and stochastic super-sampling algorithms break, as they rely on having the whole scene present before they start. A premature flush also disables edge tracking so that the correct image is still generated, albeit at lower performance.

A block diagram for the core of P20 is shown in FIG. 1A. Some general observations:

General control, register loading, and synchronising internal operations are all done via the message stream.

The message stream, for the most part, does not carry any vertex parameter data (other than the coordinate data).

The message stream does not carry any pixel data except for upload/download data and fragment coverage data.

The private data paths give more bandwidth and can be tailored to the specific needs of the sending and receiving units.

The Fragment Subsystem can be thought of as working in parallel but is, in fact, physically connected as a daisy chain to make the physical layout easier.

GPIO

There are two independent command streams--one servicing the GP stream (for 3D and general 2D commands), and one servicing the Isochronous stream. The isochronous command unit has less functionality as it does not need, for example, to support vertex arrays.

GPIO performs the following distinct operations:

Input DMA

The command stream is fetched from memory (host or local, as determined by the page tables) and broken into messages based on the tag format. The message data is padded out to 128 bits, if necessary, with zeros, except for the last 32 bits, which are set to floating point 1.0. (This allows the shorthand formats for vertex parameters to be handled automatically.) The DMA requests can be queued up in a command FIFO or can be embedded into the DMA buffer itself, thereby allowing hierarchical DMA (to two levels). The hierarchical DMA is useful for pre-assembling common command or message sequences.
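The padding rule can be sketched as follows. It treats the message data as a hypothetical list of raw 32-bit words (the real tag format is not described here) and shows why the rule helps: shorthand vertex parameters pick up the usual defaults of 0 for missing middle components and 1.0 for the final (w or q) component.

```python
import struct

ONE_F32 = struct.unpack("<I", struct.pack("<f", 1.0))[0]  # 0x3F800000


def float_to_word(x):
    """Raw 32-bit pattern of x as an IEEE-754 single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def pad_message(words):
    """Pad a message payload of up to four 32-bit words out to 128 bits: missing words
    become zero, except that the last word becomes floating point 1.0."""
    if len(words) > 4:
        raise ValueError("payload is wider than 128 bits")
    if len(words) == 4:
        return list(words)
    padded = list(words) + [0] * (4 - len(words))
    padded[3] = ONE_F32
    return padded


# A two-component texture coordinate (s, t) becomes (s, t, 0.0, 1.0);
# a three-component position (x, y, z) becomes (x, y, z, 1.0).
st = pad_message([float_to_word(0.5), float_to_word(0.25)])
```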

Circular Buffers

The circular buffers provide a mechanism whereby P20 can be given work in very small packets without incurring the cost of an escape call to the operating system. These escape calls are relatively expensive, so work is normally packaged up into large amounts before being given to the graphics system. This can leave the graphics system idle while work accumulates in a DMA buffer that is not yet full enough to be dispatched, to the obvious detriment of performance. The circular buffers are preferably stored in local memory and mapped into the ICD, and chip-resident write pointer registers are updated when work has been added to the circular buffers (this does not require any operating system intervention). When a circular buffer goes empty, the hardware will automatically search the pool of circular buffers for more work and instigate a context switch if necessary.
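The write-pointer handshake and the search for more work can be sketched as below. The names are hypothetical, one slot is left unused to distinguish a full ring from an empty one (a common ring-buffer convention, not something the text specifies), the round-robin search order is an assumption, and the pool size is a parameter rather than the 16 buffers P20 provides.

```python
class CircularBuffer:
    """A command ring: the driver advances `write`, the hardware advances `read`."""

    def __init__(self, size):
        self.slots = [None] * size
        self.read = 0
        self.write = 0

    def is_empty(self):
        return self.read == self.write

    def push(self, command):
        nxt = (self.write + 1) % len(self.slots)
        if nxt == self.read:
            raise BufferError("circular buffer full")
        self.slots[self.write] = command
        self.write = nxt  # models the update of the chip-resident write pointer

    def pop(self):
        command = self.slots[self.read]
        self.read = (self.read + 1) % len(self.slots)
        return command


def drain(buffers, current=0):
    """Execute commands; when the current buffer goes empty, search the pool
    round-robin for another buffer with work (a possible context switch point)."""
    executed = []
    n = len(buffers)
    while True:
        if buffers[current].is_empty():
            for k in range(1, n + 1):
                candidate = (current + k) % n
                if not buffers[candidate].is_empty():
                    current = candidate
                    break
            else:
                return executed  # every buffer is empty: go idle
        executed.append((current, buffers[current].pop()))
```

Because the producer only ever writes `write` and the consumer only ever writes `read`, adding work needs no lock and no operating system call, which is the property the circular buffers rely on.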

There are 16 circular buffers, and the command stream is processed in the same way as for input DMA, including the ability to "call" DMA buffers.

Vertex Arrays

Vertex arrays are a more compact way of holding vertex data and allow a lot of flexibility on how the data is laid out in memory. Each element in the array can hold up to 16 parameters, and each parameter can be from one to four floats in size. The parameters can be held consecutively in memory or held in their own arrays. The vertex elements can be accessed sequentially or via one or two-index arrays.

Vertex Cache Control for Indexed Arrays

When vertex array elements are accessed via index arrays and the arrays hold lists of primitives (lines, triangles or quads, independent or strips), then frequently the vertices are meshed in some way that can be discovered by comparing the indices for the current primitive against a recent history of indices. If a match is found, then the vertex does not need to be fetched from memory (or indeed processed again in the Vertex Shading Unit), thus saving the memory bandwidth and processing costs. The 16 most recent indices are held.
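The index-matching scheme can be sketched as below. The function name is hypothetical, and the replacement policy is an assumption: the text says only that the 16 most recent indices are held, so this model keeps the 16 most recently fetched indices in first-in, first-out order and does not reorder them on a hit.

```python
from collections import deque


def fetch_indexed(indices, history_size=16):
    """Return (indices actually fetched and shaded, number of reuse hits).

    A vertex is reused when its index matches one of the `history_size` most
    recently fetched indices; hits do not reorder the history (FIFO replacement).
    """
    history = deque(maxlen=history_size)  # appending to a full deque drops the oldest
    fetched = []
    hits = 0
    for index in indices:
        if index in history:
            hits += 1
        else:
            fetched.append(index)
            history.append(index)
    return fetched, hits


# Two triangles sharing an edge (a quad drawn as an indexed triangle list).
fetched, hits = fetch_indexed([0, 1, 2, 2, 1, 3])  # fetched == [0, 1, 2, 3], hits == 2
```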

Output DMA

The output DMA is mainly used to load data from the core into host memory. Typical uses of this are for image upload and returning current vertex state. The output DMA is initiated via messages that pass through the core and arrive via the Host Out Unit. This allows any number of output DMA requests to be queued.

Shadow Cache

The shadow cache keeps a copy of the input command stream in memory so it can be reused without an explicit copy. This helps with caching models in on-card memory behind the application's back, particularly when parts of the model are liable to change.

Format Conversion

The Pack and UnPack units provide programmable support for format conversion during download and upload of pixel data.

T&L Subsystem

Transform and Lighting Subsystem 1A100 is shown in FIG. 1B.

The main thing to note is that the clipping and culling can be done before or after the vertex shading operation, depending on the Geometry Router Unit 1B103 setting. Doing the clipping and culling prior to an expensive shading operation can, in some cases, avoid doing work that would later be discarded. A side effect of the cull operation is that the face direction is ascertained, so only the correct side in two-sided lighting needs to be evaluated. (This is handled automatically and is hidden from the programmer. Silhouette vertices, i.e. those that belong to both front- and back-facing triangles, are processed twice.)

Vertex Parameter Unit 1B101's main tasks are to track current parameter values (for context switching and Get operations), remap input parameters to the slots a vertex shader has been compiled to expect them in, assist with color material processing, and convert parameter formats to normalized floating point values.

Vertex Transformation Unit 1B102 transforms the incoming vertex position using a 4×4 transformation matrix. This is done as a standalone operation outside of Vertex Shading Unit 1B106 to allow clipping and culling to be done prior to vertex shading.

The Geometry Router Unit 1B103 reorders the pipeline into one of two orders, Transform → Clipping → Shading → Vertex Generator or Transform → Shading → Clipping → Vertex Generator, so that expensive shading operations can be avoided on vertices that are not part of visible primitives.

Cull Clipping Unit 1B104 calculates the sign of the area of a primitive and culls it (if so enabled). The primitive is tested against the view frustum and (optionally) user clipping planes and discarded if it is found to be out of view. In-view primitives pass unchanged. Partially in-view primitives are (optionally) guard-band clipped before being submitted for full clipping. The results of the clipping process are the barycentric coordinates for the intermediate vertices.
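The area-sign test, and the way it also yields the facing needed for two-sided lighting, can be sketched as follows. This assumes OpenGL's default conventions (y up, counter-clockwise front faces) and discards zero-area triangles; the function names are hypothetical, and the frustum and user-clip-plane tests are not shown.

```python
def signed_area2(v0, v1, v2):
    """Twice the signed area of a triangle in window coordinates (x right, y up).
    Positive means the vertices are in counter-clockwise order."""
    return (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])


def cull_test(v0, v1, v2, cull_face="back"):
    """Return (culled, front_facing). Front faces are taken to be counter-clockwise,
    as with OpenGL's default glFrontFace(GL_CCW)."""
    area = signed_area2(v0, v1, v2)
    front_facing = area > 0
    if area == 0:
        return True, front_facing  # zero-area triangle: discarded
    culled = (cull_face == "back" and not front_facing) or (
        cull_face == "front" and front_facing
    )
    return culled, front_facing
```

The `front_facing` result is the by-product mentioned earlier: once it is known, only the matching side of a two-sided lighting program needs to run.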

Vertex Shading Unit 1B106 is where the lighting and texture coordinate generation are done using a user-defined program. Programs can be up to 1024 instructions long, and conditionals, subroutines, and loops are supported. The matrices, lighting parameters, etc. are held in a 512 Vec4 Coefficient memory. Intermediate results are stored either in a 64-deep vec2 memory or an 8-deep scalar memory, providing a total of 136 registers. These registers are typeless but are typically used to store 36-bit floats. The vertex input consists of 24 Vec4s, which are also typeless. (One parameter is identified as the trigger parameter, and this is the last parameter for a vertex.) The vertex results are output as a coordinate and up to 16 Vec4 parameter results. The parameters are typeless, and their interpretation depends on the program loaded into Fragment Shading Unit 1F171.

Vertices are entered into the double-buffered input registers in round robin fashion. When 16 input vertices have been received or an attempt is made to update the program or coefficient memories, the program is run. Non-unit messages do not usually cause the program to run, but they are correctly interleaved with the vertex results on output to maintain temporal ordering.

Vertex Shading Unit 1B106 is implemented as a 16-element SIMD array, with each element (VP) working on a separate vertex. Each VP consists of two FP multipliers, an FP adder, a transcendental unit, and an ALU. The floating point operations are done using 36-bit numbers (similar to IEEE but with an extra 4 mantissa bits). Dual mathematical instructions can be issued so multiple paths exist between the arithmetic elements, the input storage elements, and the output storage elements.

Vertex Generator Unit 1B105 holds a 16-entry vertex cache and implements the vertex machinery to associate the stream of processed vertices with the primitive type. When enough vertices for the given primitive type have been received, a GeomPoint, GeomLine, or GeomTriangle message is issued. Clipped primitives have their intermediate vertices calculated here using the barycentric coordinates from clipping and the post-shading parameter data. Flat shading, line stipple, and cylindrical texture wrapping are also controlled here.

Viewport Transform Unit 1B107 performs the perspective divide on the (selected) vertex parameters and viewport-maps the coordinate data.
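A sketch of these two steps, using the standard OpenGL viewport and depth-range mapping, is given below. The function name and argument layout are hypothetical, and which parameters are divided by w is passed in as a set because the text says only that the division applies to "selected" parameters.

```python
def viewport_transform(clip, params, perspective_params, viewport, depth_range=(0.0, 1.0)):
    """Map a clip-space vertex to window coordinates and divide selected parameters by w.

    clip: (x, y, z, w); viewport: (x0, y0, width, height) as in glViewport;
    depth_range: (near, far) as in glDepthRange.
    Returns (window_xyz, params_out, 1/w).
    """
    x, y, z, w = clip
    inv_w = 1.0 / w
    xd, yd, zd = x * inv_w, y * inv_w, z * inv_w  # normalized device coordinates
    x0, y0, width, height = viewport
    near, far = depth_range
    window = (
        (xd + 1.0) * width / 2.0 + x0,
        (yd + 1.0) * height / 2.0 + y0,
        (far - near) / 2.0 * zd + (near + far) / 2.0,
    )
    # Dividing selected parameters by w (and keeping 1/w) lets later stages interpolate
    # them linearly in screen space and still recover perspective-correct values.
    params_out = [p * inv_w if i in perspective_params else p for i, p in enumerate(params)]
    return window, params_out, inv_w
```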

Polygon Mode Unit 1B108 decomposes the input triangle or quad primitives into points and/or lines as needed to satisfy OpenGL's polymode processing requirements.

The context data for the T&L subsystem is stored in the context record by Bin Manager Unit 1A113.

Binning Subsystem

Binning Subsystem 1A110 is largely passive when binning is not enabled, and the messages just flow through; however, it does convert the coordinates to be screen relative. Stippled lines are decomposed, and vertex parameters are still intercepted and forwarded to the PF Cache 1C118 to reduce message traffic through the rest of the system. The following description assumes binning is enabled.

Binning Subsystem 1A110 is shown in FIG. 1C.

Bin Setup Unit 1C111 takes the primitive descriptions (the Render* messages) together with the vertex positions and prepares the primitive for rasterization. For triangles, this is simple as the triangle vertices are given, but for lines and points, the vertices of the rectangle or square to be rasterized must be computed from the input vertices and size information. Stippled lines are decomposed into their individual segments as these are binned separately. Binning and rasterization occur in screen space so the input window-relative coordinates are converted to screen space coordinates here.

Bin Rasterizer Unit 1C112 takes the primitive description prepared by the Bin Setup Unit and calculates the bins that a primitive touches. As far as rasterization is concerned, a bin can be viewed as a "fat" pixel, as it is some multiple of 32 pixels in width and height. The rasterizer uses edge functions and does an inside test for each corner of the candidate bin to determine if the primitive touches it. The primitive and the group of bins that it touches are passed to Bin Manager Unit 1C113 for processing. The bin seeking accurately tracks the edges of the primitive for aliased rendering; however, antialiased rendering can sometimes include bins not actually touched by the primitive (this is a slight inefficiency but doesn't cause any problems downstream).
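The corner test can be sketched as follows. This is a software model under assumptions, not the Bin Rasterizer's actual logic: it loops over every bin instead of seeking from a start bin, uses floating point rather than fixed point, and rejects a bin only when all four of its corners lie outside a single edge. That test is conservative (a bin lying just off a vertex can pass even though the triangle misses it), and the bounding-box check only removes some of those cases. The names are hypothetical.

```python
def edge(a, b, p):
    """Edge function of a->b evaluated at p; its sign says which side p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def bins_touched(tri, bin_size, bins_x, bins_y):
    """Return the (bx, by) bins a triangle may touch, treating each bin as a 'fat' pixel."""
    v0, v1, v2 = tri
    if edge(v0, v1, v2) < 0:  # fix the winding so that inside means edge >= 0
        v1, v2 = v2, v1
    edges = [(v0, v1), (v1, v2), (v2, v0)]
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    touched = []
    for by in range(bins_y):
        for bx in range(bins_x):
            x0, y0 = bx * bin_size, by * bin_size
            x1, y1 = x0 + bin_size, y0 + bin_size
            # Cheap rejection against the triangle's bounding box.
            if max(xs) < x0 or min(xs) > x1 or max(ys) < y0 or min(ys) > y1:
                continue
            corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
            # Inside test at each corner: if all four corners are outside any one edge,
            # the bin cannot overlap the triangle.
            if any(all(edge(a, b, c) < 0 for c in corners) for a, b in edges):
                continue
            touched.append((bx, by))
    return touched
```

For example, with 32-pixel bins, the triangle (0, 0), (100, 0), (0, 100) is reported in exactly the ten bins whose top-left corner satisfies x + y < 100.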

Bin Manager Unit 1C113 maintains a spatial database in memory that describes the current frame being built while Bin Display Unit 1C114 is rendering the previous frame. All writes to memory go via Bin Write Cache 1C115. The database is divided between a Vertex Buffer and a Bin Record Buffer. The vertex buffer holds the vertex data (coordinate and parameters), and these are appended to the buffer whenever they arrive. The buffer works in a pseudo circular buffer fashion and is used collectively by all the bins. The Bin Record Buffer is a linked list of bin records with one linked list per bin region on the screen (up to 256) and holds state data as well as primitive data. A linked list is used because the number of primitives per bin region on the screen can vary wildly. When state data is received, it is stored locally until a primitive arrives. When a primitive arrives, each bin it touches is checked to see if any state has changed since the last primitive was written to that bin, and the bin is updated with the changed state. Compressed pointers to the vertices used by a primitive are calculated and, together with the primitive details, are appended to the linked list for this bin.

Bin Manager Unit 1C113 only writes to memory, and Bin Write Cache 1A115 handles the traditional cache functions to minimize memory bandwidth and read/modify/write operations as many of the writes will only update partial memory words.

Bin Manager Unit 1C113 also can be used as a conduit for vertex data to be written directly to memory to allow the results of one vertex shader to be fed back into a second vertex shader and can be used, for example, for surface tessellation. The same mechanism can also be used to load memory with texture objects and programs.

Bin Display Unit 1C114 will traverse the bin record linked list for each bin and parse the records, thereby recreating the temporal stream of commands this region of the screen would have seen had there been no binning. Prior to doing the parsing, the initial state for the bin is sent downstream to ensure all units start in the correct state. Parsing of state data is simple--it is just packaged correctly and forwarded. Parsing primitives is more difficult as the vertex data needs to be recovered from the compressed vertex pointers and sent on before the primitive itself. Only the coordinate data is extracted at this point--the parameter data is handled later, after primitive visibility has been determined. A bin may be parsed several times to support deferred rendering, stochastic super sampling, and order-independent transparency. Clears and multi-sampling filter operations can also be done automatically per bin.

The second half of the binning subsystem is later in the pipeline, but is described now.

Overlap Unit 1C116 is basically a soft FIFO (i.e. if the internal hardware FIFO becomes full, it overflows to memory) and provides buffering between Visibility Subsystem 1A160 and Fragment Subsystem 1A170 so that visibility testing can run ahead and not get stalled by fragment processing. This is particularly useful when deferred rendering is used, as the first pass produces no fragment processing work and so can be hidden under the second pass of the previous bin. Tiles are run-length encoded to keep the memory bandwidth down.
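The text does not give the Overlap Unit's encoding, so the sketch below only illustrates the general idea of run-length encoding a tile stream before it spills to memory: a run of identical values is stored once with a count. The tile values (for example, coverage masks) and the function names are hypothetical.

```python
def rle_encode(tiles):
    """Collapse runs of identical tile values into (value, count) pairs."""
    runs = []
    for tile in tiles:
        if runs and runs[-1][0] == tile:
            runs[-1][1] += 1
        else:
            runs.append([tile, 1])
    return [tuple(r) for r in runs]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original tile stream."""
    return [tile for tile, count in runs for _ in range(count)]


# Hypothetical coverage masks for a stream of tiles: fully covered tiles repeat.
stream = [0xFFFF, 0xFFFF, 0xFFFF, 0x00F0, 0xFFFF, 0xFFFF]
encoded = rle_encode(stream)  # [(0xFFFF, 3), (0x00F0, 1), (0xFFFF, 2)]
```

Runs are common in the interiors of large primitives, where consecutive tiles are fully covered, which is where such an encoding saves the most spill bandwidth.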

The Parameter Fetch (PF) Units will fetch the binned parameter data for a primitive if, and only if, the primitive has passed visibility testing (i.e. at least one tile from the primitive is received in the PF Subsystem). This is particularly useful with deferred rendering where in the first pass everything is consumed by the Visibility Subsystem. The PF Units are also involved in loading texture object data (i.e. the state to control texture operations for one of the 32 potentially active texture maps) and can be used to load programs from memory into Pixel Subsystem 1A190 (to avoid having to treat them as tracked state while binning).

PF Address Unit 1C117 calculates the address in memory where the parameters for the vertices used by a primitive are stored and makes a request to PF Cache 1C118 for that parameter data to be fetched. The parameter data will be passed directly to PF Data Unit 1C119. It also will calculate the addresses for texture objects and pixel programs.

PF Data Unit 1C119 will convert the parameter data for the vertices into plane equations and forward these to Fragment Subsystem 1A170 (over their own private connection). For 2D rendering, planes can also be set up directly without having to supply vertex data. The texture object data and pixel programs are also forwarded on the message stream.
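Converting per-vertex parameters into a plane equation amounts to solving for the coefficients of p(x, y) = a*x + b*y + c that pass through the three vertex values. The sketch below uses the standard closed-form (Cramer's rule) solution; the function names are hypothetical, and the hardware's actual number formats and any perspective correction are not modelled.

```python
def plane_equation(v0, v1, v2):
    """Return (a, b, c) such that p(x, y) = a*x + b*y + c interpolates
    the parameter p given at three (x, y, p) vertices."""
    (x0, y0, p0), (x1, y1, p1), (x2, y2, p2) = v0, v1, v2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    a = ((p1 - p0) * (y2 - y0) - (p2 - p0) * (y1 - y0)) / det
    b = ((x1 - x0) * (p2 - p0) - (x2 - x0) * (p1 - p0)) / det
    c = p0 - a * x0 - b * y0
    return a, b, c


def evaluate(plane, x, y):
    """Evaluate the plane equation at one sample position."""
    a, b, c = plane
    return a * x + b * y + c


plane = plane_equation((0, 0, 1.0), (4, 0, 5.0), (0, 2, 3.0))
print(plane)                    # (1.0, 1.0, 1.0)
print(evaluate(plane, 1, 1))    # 3.0
```

Once a parameter is in plane form, each fragment needs only a multiply-add per axis (or a single add when stepping to a neighbouring pixel), which is one plausible reason to ship plane equations rather than raw vertex values.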

Rasterizer Subsystem

The Rasterizer subsystem consists of a Primitive Setup Unit, a Rasterizer Unit and a Rectangle Rasterizer Unit.

Rectangle Rasterizer Unit 1A120, as the name suggests, will only rasterize rectangles and is located in the isochronous stream. The rasterization direction can be specified.
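A rectangle rasterizer with a selectable scan direction can be sketched as below. The text does not say why the direction is selectable; one common reason in 2D hardware is copying between overlapping source and destination rectangles, where scanning toward the overlap would overwrite source pixels before they are read, and the `copy_rect` helper illustrates that case as an assumption. Both function names are hypothetical.

```python
def rasterize_rect(x0, y0, width, height, x_dir=1, y_dir=1):
    """Yield the pixels of a rectangle, scanning x and y in the given directions
    (+1 for increasing coordinates, -1 for decreasing)."""
    xs = list(range(x0, x0 + width))
    ys = list(range(y0, y0 + height))
    if x_dir < 0:
        xs.reverse()
    if y_dir < 0:
        ys.reverse()
    for y in ys:
        for x in xs:
            yield x, y


def copy_rect(buf, src_x, dst_x, y, width, height):
    """Copy pixels horizontally within one buffer, scanning away from the overlap."""
    x_dir = -1 if dst_x > src_x else 1
    for x, row in rasterize_rect(dst_x, y, width, height, x_dir=x_dir):
        buf[row][x] = buf[row][x - dst_x + src_x]


print(list(rasterize_rect(0, 0, 2, 2, x_dir=-1)))  # [(1, 0), (0, 0), (1, 1), (0, 1)]
buf = [[1, 2, 3, 4, 0]]
copy_rect(buf, src_x=0, dst_x=1, y=0, width=4, height=1)
print(buf)  # [[1, 1, 2, 3, 4]]
```

Scanning left to right in the same copy would produce `[[1, 1, 1, 1, 1]]`, because each write would clobber the next source pixel.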

Primitive Setup Unit 1A130 takes the primitive descriptions (the Render* messages) together with the vertex positions and prepares the primitive for rasterization. This includes calculating the area of triangles, splitting stippled lines (aliased and antialiased) into individual line segments (some of this work has already been done in Bin Setup Unit 1C111), converting lines into quads for rasterization, converting points into screen-aligned squares for rasterization, and converting antialiased (AA) points into polygons. Finally, it calculates the projected x and y gradients from the floating-point coordinates; these are used elsewhere in the pipeline to calculate parameter and depth gradients for all primitives.
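Three of these setup steps can be sketched in floating point: the signed triangle area (whose sign also gives the winding), expanding a line into a quad, and expanding a point into a screen-aligned square. The function names are hypothetical, the quad is built without end caps, and stipple splitting, AA point polygons and gradient calculation are not shown; this is an illustration, not the P20's setup arithmetic.

```python
import math


def triangle_area(v0, v1, v2):
    """Signed area: positive for counter-clockwise winding with y pointing up."""
    return 0.5 * ((v1[0] - v0[0]) * (v2[1] - v0[1])
                  - (v2[0] - v0[0]) * (v1[1] - v0[1]))


def line_to_quad(p0, p1, width):
    """Expand a line segment into a quad of the given width (no end caps)."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("zero-length line")
    half = width / 2
    # Unit normal to the line, scaled to half the line width.
    nx, ny = -dy / length * half, dx / length * half
    return [(p0[0] + nx, p0[1] + ny), (p1[0] + nx, p1[1] + ny),
            (p1[0] - nx, p1[1] - ny), (p0[0] - nx, p0[1] - ny)]


def point_to_square(center, size):
    """Expand a point into a screen-aligned square centered on it."""
    h = size / 2
    cx, cy = center
    return [(cx - h, cy - h), (cx + h, cy - h), (cx + h, cy + h), (cx - h, cy + h)]


print(triangle_area((0, 0), (4, 0), (0, 4)))   # 8.0
print(line_to_quad((0, 0), (4, 0), 2.0))       # [(0.0, 1.0), (4.0, 1.0), (4.0, -1.0), (0.0, -1.0)]
print(point_to_square((2, 2), 3))              # [(0.5, 0.5), (3.5, 0.5), (3.5, 3.5), (0.5, 3.5)]
```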

The xy coordinate input to Rasterizer Unit 1A140 consists of 2's complement 15.10 fixed-point numbers. When a Draw* command is received, the unit calculates the 3 or 4 edge functions for the primitive type, identifies which edges are inclusive edges (i.e. edges that return inside if a sample point lies exactly on them; this needs to vary depending on which is the top or right edge so that abutting triangles do not write to a pixel twice) and identifies the start tile.
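A minimal sketch of the fixed-point conversion and the inside test follows, assuming that "15.10" means 15 integer bits (including the sign bit) and 10 fractional bits, held here in a Python int. The function names and the tie-break rule in `is_inclusive` are hypothetical: the rule is chosen only because reversing an edge flips its answer, which is the property that stops abutting triangles from both claiming a sample on their shared edge. It is not claimed to be the P20's rule, and only the 3-edge triangle case is shown.

```python
FRAC_BITS = 10                   # the ".10" in 15.10
TOTAL_BITS = 15 + FRAC_BITS      # assumption: the 15 integer bits include the sign
RAW_MIN = -(1 << (TOTAL_BITS - 1))
RAW_MAX = (1 << (TOTAL_BITS - 1)) - 1


def to_fixed(value):
    """Convert a float to a 2's complement 15.10 value held in a Python int."""
    raw = round(value * (1 << FRAC_BITS))
    if not RAW_MIN <= raw <= RAW_MAX:
        raise OverflowError(f"{value} does not fit in 15.10 fixed point")
    return raw


def edge_fn(a, b, p):
    """Positive when p is to the left of the directed edge a->b (y up)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def is_inclusive(a, b):
    """Hypothetical tie-break: reversing an edge flips the answer, so an edge
    shared by two consistently wound triangles is inclusive for exactly one."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return dy > 0 or (dy == 0 and dx < 0)


def covers(triangle, sample):
    """Inside test for one sample point against a triangle's 3 edge functions."""
    v = [tuple(to_fixed(c) for c in vertex) for vertex in triangle]
    p = tuple(to_fixed(c) for c in sample)
    if edge_fn(v[0], v[1], v[2]) < 0:
        v[1], v[2] = v[2], v[1]      # normalize to counter-clockwise winding
    for i in range(3):
        a, b = v[i], v[(i + 1) % 3]
        e = edge_fn(a, b, p)
        if e < 0 or (e == 0 and not is_inclusive(a, b)):
            return False
    return True


# Two triangles sharing the diagonal of a square; (2, 2) lies on that edge.
t1 = [(0, 0), (4, 0), (4, 4)]
t2 = [(0, 0), (4, 4), (0, 4)]
print(covers(t1, (2, 2)), covers(t2, (2, 2)))   # False True
```

Working in integers means the on-edge case (`e == 0`) is detected exactly, which is what makes a deterministic tie-break possible.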

Once the edges of the primitive and a start tile are known, the rasterizer seeks out screen-aligned super tiles (32×32 pixels) that are inside the edges or intersect the edges of the primitive. (In a dual P20 system, only those super tiles owned by a rasterizer are visited.) Super tiles that pass this stage are further divided into 8×8 tiles for finer testing. Tiles that pass this second stage will be either totally inside or partially inside the primitive. Partial tiles are further tested to determine which pixels in the tile are inside the primitive.
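The coarse-to-fine search can be sketched as below. For simplicity this version scans every super tile in raster order instead of seeking out from the start tile, samples at pixel centers, treats a sample exactly on an edge as inside (ignoring the inclusive-edge tie-break), assumes counter-clockwise triangles and a screen whose width and height are multiples of 32, and does not model super tile ownership in a dual P20 system. The function names are hypothetical. Because edge functions are linear, testing the four corner samples of a block is enough to decide whether it is fully outside one edge, fully inside all edges, or partial.

```python
SUPER_TILE = 32  # super tile edge length in pixels, from the text
TILE = 8         # tile edge length in pixels, from the text


def edge(a, b, x, y):
    """Edge function: >= 0 on the inner side of a counter-clockwise edge a->b."""
    return (b[0] - a[0]) * (y - a[1]) - (b[1] - a[1]) * (x - a[0])


def classify(edges, x, y, size):
    """Classify the pixel-center samples of a size x size block at (x, y)
    as 'out', 'in' or 'partial' by testing its four corner samples."""
    lo_x, hi_x = x + 0.5, x + size - 0.5
    lo_y, hi_y = y + 0.5, y + size - 0.5
    corners = [(lo_x, lo_y), (hi_x, lo_y), (lo_x, hi_y), (hi_x, hi_y)]
    fully_inside = True
    for a, b in edges:
        values = [edge(a, b, cx, cy) for cx, cy in corners]
        if all(v < 0 for v in values):
            return "out"          # whole block is outside this edge
        if any(v < 0 for v in values):
            fully_inside = False  # block straddles this edge
    return "in" if fully_inside else "partial"


def rasterize(tri, width, height):
    """Return the set of covered pixels using super tile -> tile -> pixel tests."""
    edges = [(tri[i], tri[(i + 1) % 3]) for i in range(3)]
    covered = set()
    for sy in range(0, height, SUPER_TILE):
        for sx in range(0, width, SUPER_TILE):
            if classify(edges, sx, sy, SUPER_TILE) == "out":
                continue
            for ty in range(sy, sy + SUPER_TILE, TILE):
                for tx in range(sx, sx + SUPER_TILE, TILE):
                    kind = classify(edges, tx, ty, TILE)
                    if kind == "out":
                        continue
                    for py in range(ty, ty + TILE):
                        for px in range(tx, tx + TILE):
                            # Only partial tiles need per-pixel tests.
                            if kind == "in" or classify(edges, px, py, 1) == "in":
                                covered.add((px, py))
    return covered


triangle = [(2.0, 2.0), (60.0, 10.0), (20.0, 50.0)]  # counter-clockwise
pixels = rasterize(triangle, 64, 64)
```

A block whose corners fall outside different edges is classified as partial rather than rejected, so the coarse stages are conservative and the per-pixel stage makes the final decision.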

