Clip-less rasterization using line equation-based traversal, Number: 6,765,575, from the United States Patent and Trademark Office (PTO)


Title: Clip-less rasterization using line equation-based traversal

Abstract: Clip-less rasterization is provided by a plurality of operations. First, a primitive is received that is defined by a plurality of vertices. Each of such vertices includes a W-value. Thereafter, an area is identified based on the W-values. Such area is representative of a portion of a display to be drawn corresponding to the primitive.

Patent Number: 6,765,575 Issued on 07/20/2004 to Voorhies, et al.


Inventors: Voorhies; Douglas A. (Menlo Park, CA), Foskett; Nicholas J. (Mountain View, CA), Papakipos; Matthew N. (Palo Alto, CA)
Assignee: Nvidia Corporation (Santa Clara, CA)
Appl. No.: 09/455,728
Filed: December 6, 1999


Current U.S. Class: 345/441
Current International Class: G06T 11/40 (20060101)
Field of Search: 345/441,442,443


References Cited

U.S. Patent Documents
5694143 December 1997 Fielder et al.
5838337 November 1998 Kimura et al.
5977997 November 1999 Vainsencher
6000027 December 1999 Pawate et al.
6057855 May 2000 Barkans
6198488 March 2001 Lindholm et al.
6417858 July 2002 Bosch et al.
6518974 February 2003 Taylor et al.
Foreign Patent Documents
0690430 Jan., 1996 EP
WO 98/28695 Jul., 1998 WO
WO 99/52040 Oct., 1999 WO

Other References

Marc Olano and Trey Greer, Triangle Scan Conversion using 2D Homogeneous Coordinates, 1997, SIGGRAPH/Eurographics Workshop on Graphics Hardware.
Niizeke et al., Projectively Invariant Intersection Detections for Solid Modeling, 1994, ACM Press, pp. 277-299.
Blinn et al., Clipping using Homogeneous Coordinates, 1978, ACM Press, pp. 245-251.
Ivan E. Sutherland and Gary W. Hodgman, Reentrant Polygon Clipping, Jan. 1974, Communications of the ACM, vol. 17, no. 1, pp. 32-42.

Primary Examiner: Brier; Jeffery
Attorney, Agent or Firm: Silicon Valley IP Group, PC Zilka; Kevin J.

Parent Case Text



RELATED APPLICATIONS

The present application is related to applications entitled "Method, Apparatus and Article of Manufacture for Area Rasterization using Sense Points" which was filed on Dec. 6, 1999 under Ser. No. 09/455,305, "Method, Apparatus and Article of Manufacture for Boustrophedonic Rasterization" which was filed on Dec. 6, 1999 under Ser. No. 09/454,505, "Transform, lighting and rasterization system embodied on a single semiconductor platform" which was filed on Dec. 6, 1999 under Ser. No. 09/454,516, and issued under U.S. Pat. No. 6,198,488, "Method, Apparatus and Article of Manufacture for a Vertex Attribute Buffer in a Graphics Processor" which was filed on Dec. 6, 1999 under Ser. No. 09/454,525, "Method, Apparatus and Article of Manufacture for a Transform Module in a Graphics Processor" which was filed on Dec. 6, 1999 under Ser. No. 09/456,102, "Method and Apparatus for a Lighting Module in a Graphics Processor" which was filed on Dec. 6, 1999 under Ser. No. 09/454,524, and "Method, Apparatus and Article of Manufacture for a Sequencer in a Transform/Lighting Module Capable of Processing Multiple Independent Execution Threads" which was filed on Dec. 6, 1999 under Ser. No. 09/456,104, and which are all incorporated herein by reference in their entirety.
Claims



What is claimed is:

1. A method for rendering primitives, comprising: receiving a primitive defined by a plurality of vertices each including a W-value; and identifying an area based on the W-values, wherein the area is representative of a portion of a display to be drawn corresponding to the primitive; wherein the space in the area is identified by utilizing a plurality of region-defining points that define an enclosed polygonal region, and stepping the region-defining points with respect to the area.

2. The method as recited in claim 1, wherein the region-defining points are initially positioned on top of the display if a top vertex of the area is positioned outside boundaries of the display.

3. The method as recited in claim 1, wherein the region-defining points are initially positioned on a top vertex of the area if the top vertex is positioned inside boundaries of the display.

4. The method as recited in claim 1, and further comprising searching for a top point of the area based on signs of slopes of the primitive.

5. The method as recited in claim 4, wherein the searching includes determining whether a current point is outside the area utilizing a test involving line equations that define the primitive.

6. The method as recited in claim 5, wherein the searching further includes determining a direction in which to step, if it is determined that the current point is outside the area.

7. A computer program embodied on a computer readable medium for rendering primitives, comprising: a code segment for receiving a primitive defined by a plurality of vertices each including a W-value; and a code segment for identifying an area based on the W-values, wherein the area is representative of a portion of a display to be drawn corresponding to the primitive; wherein the space in the area is identified by utilizing a plurality of region-defining points that define an enclosed polygonal region, and stepping the region-defining points with respect to the area.

8. The computer program as recited in claim 7, wherein the region-defining points are initially positioned on top of the display if a top vertex of the area is positioned outside boundaries of the display.

9. The computer program as recited in claim 7, wherein the region-defining points are initially positioned on a top vertex of the area if the top vertex is positioned inside boundaries of the display.

10. The computer program as recited in claim 7, and further comprising a code segment for searching for a top point of the area based on signs of slopes of the primitive.

11. The computer program as recited in claim 10, wherein the searching includes determining whether a current point is outside the area utilizing a test involving line equations that define the primitive.

12. The computer program as recited in claim 10, wherein the searching further includes determining a direction in which to step, if it is determined that the current point is outside the area.
Description



FIELD OF THE INVENTION

The present invention relates generally to rasterizers and, more particularly, to the conversion of primitives defined by vertexes to equivalent images composed of pixel patterns without clipping the primitives.

BACKGROUND OF THE INVENTION

In a traditional computer graphics pipeline, drawn primitives such as lines or polygons sometimes extend from on-screen to off-screen, and a "clip stage" in the pipeline alters the shape of the primitives to clip them to include only the on-screen portion. The rasterizer stage then generates the pixels corresponding to the on-screen portion of the primitive. This "clip stage," however, often reduces the performance of the graphics pipeline because the computations involved are complex and highly variable.

There are several ways in which vertex coordinates of a primitive may extend beyond the portion that should be drawn. The most obvious way is extending beyond the top, bottom, left, and/or right edges of the screen, or window. FIG. 1 shows a triangle 100 clipped by the screen edges thus rendering a seven-sided polygon 102. It should be noted that in some prior art systems, a "guard band" is defined about an outer perimeter of the screen. By employing the guard band as the "cut-off," some primitives that extend beyond the screen, but not beyond the guard band may avoid clipping. By employing a guard band, clipping is thus reduced, but not completely avoided.

Yet another type of clipping involves the third "Z" dimension or, in other words, the dimension extending "into" or "out of" the screen. Most graphics systems clip primitives against "near" and "far" limits, or planes. This is primarily done to control the range of the Z coordinate, so it can be expressed with adequate precision. This, however, also can add two more imposed edges on a polygon.

Another consequence of not clipping is more obscure. Before a 3-D triangle is drawn, it is projected onto a 2-D image plane. Commonly, the mathematical projection involves perspective, making nearby objects appear larger and far objects appear smaller. In a 3-D Cartesian coordinate system with X, Y, and Z axes that has the viewer looking down the +Z axis, this perspective is often accomplished by dividing the 3-D X and Y coordinates by the Z coordinate. Thus objects twice as far away, or twice the distance along the Z axis, have only half the screen-space X and Y extent.

A problem arises when an object spans both the +Z and -Z half-spaces. Division by a negative Z can produce reasonable on-screen X and Y values. As a result, points with a negative Z may project upside down and backwards onto the screen, much like positive Z points. An example of this phenomenon is shown in FIG. 1A. As shown, one end 110 of a long triangle 112 is in front of the hypothetical "camera" or viewpoint 114, with +Z coordinates, while the other end 116 is behind the camera with a -Z coordinate. FIG. 1B shows that after projection, all three vertexes 118, 120 and 122 are on-screen when only two of them (118 and 120) are legitimate and the other one (122) is a projection from behind the camera.
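The arithmetic behind this problem can be reproduced in a few lines. The following is a minimal sketch only, assuming a hypothetical helper named project that divides X and Y by Z as described above; it is not part of the disclosed hardware.

```python
def project(x, y, z):
    """Perspective-project a 3-D point by dividing X and Y by Z (hypothetical helper)."""
    return (x / z, y / z)

near = project(2.0, 1.0, 1.0)     # point in front of the camera
far = project(2.0, 1.0, 2.0)      # same X and Y at twice the distance
behind = project(2.0, 1.0, -1.0)  # same X and Y, but behind the camera (negative Z)

print(near)    # (2.0, 1.0)
print(far)     # (1.0, 0.5): half the screen-space extent
print(behind)  # (-2.0, -1.0): mirrored in both axes, yet a plausible screen position
```

Nothing in the projected coordinates of the third point reveals that it came from behind the camera, which is why the sign of the divisor has to be carried along separately.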

Ordinarily, conventional clipping procedures would have eliminated the part of the triangle behind the camera, cutting off the -Z portion at the bottom of the screen. One example of avoiding the foregoing conventional clipping procedures may be found with reference to "Triangle Scan Conversion using 2D Homogeneous Coordinates" authored by Marc Olano and Trey Greer, published in the 1997 SIGGRAPH/Eurographics Workshop, and which is incorporated herein by reference in its entirety. Such technique, however, fails to deliver a sizable increase in performance.

There is therefore a need for a rasterization system that can handle primitives with any post-transform vertex coordinates that extend beyond the edges of the screen for the purpose of avoiding standard clipping operations while increasing processing performance.

DISCLOSURE OF THE INVENTION

A method, apparatus and article of manufacture are provided for clip-less rasterization. First, a primitive, e.g., a line, triangle, etc., is received that is defined by a plurality of vertices. Each of such vertices includes a W-value. Thereafter, an area is identified based on the W-values of the vertices. Such area is representative of a portion of a display to be drawn corresponding to the primitive. As will become apparent, the area enclosed by the primitive is not always the area to be rasterized.

The instant process employs a variable, W, that is commonly used for projection, i.e., for viewing objects in perspective. The variable W is a number that the other coordinates, X, Y and Z, are divided by in order to make nearby things larger and far things smaller. The variable W is representative of a distance between a "center of projection" and the corresponding vertex.

In one embodiment where the primitive is a triangle, if none of the vertices have a negative W-value, the sides of the triangle enclose the area. On the other hand, if only one of the vertices has a negative W-value, the area is positioned outside an edge of the primitive opposite the vertex having the negative W-value. In particular, the area to be drawn is bounded by two lines that are co-linear with the two triangle sides sharing the -W vertex, and further bounded by a side of the triangle that shares the two +W vertexes. Still yet, if only two of the vertices have negative W-values, the area is positioned outside the primitive opposite both of the vertices having the negative W-values. In other words, the area to be drawn is bounded by two lines that are co-linear with the two triangle sides sharing the +W vertex, and further contiguous to the +W vertex.
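The three cases above can be expressed as a single point test. The following is a minimal sketch under stated assumptions, not the disclosed hardware method: it represents a screen point by its weights relative to the three projected vertices (weights summing to one), and accepts the point when each weight is non-negative for a +W vertex and non-positive for a -W vertex. The names project, screen_weights and in_drawn_area are hypothetical, vertices are (x, y, w) triples with non-zero W, and the projected triangle is assumed not to be degenerate.

```python
def project(v):
    """Divide X and Y by W; v is an (x, y, w) triple with w != 0."""
    x, y, w = v
    return (x / w, y / w)

def screen_weights(p, a, b, c):
    """Weights (wa, wb, wc) with p = wa*a + wb*b + wc*c and wa + wb + wc = 1."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    wb = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    wc = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return (1.0 - wb - wc, wb, wc)

def in_drawn_area(p, verts):
    """True if screen point p lies in the area to be drawn for the given vertices."""
    a, b, c = (project(v) for v in verts)
    weights = screen_weights(p, a, b, c)
    for weight, (_, _, w) in zip(weights, verts):
        if w > 0 and weight < 0:
            return False  # wrong side of the edge opposite a +W vertex
        if w < 0 and weight > 0:
            return False  # wrong side of the edge opposite a -W vertex
    return True

# All three examples project to the same screen triangle (0,0), (4,0), (0,4).
no_neg = [(0, 0, 1), (4, 0, 1), (0, 4, 1)]
one_neg = [(0, 0, 1), (4, 0, 1), (0, -4, -1)]
two_neg = [(0, 0, 1), (-4, 0, -1), (0, -4, -1)]

print(in_drawn_area((1, 1), no_neg))     # True: inside the triangle sides
print(in_drawn_area((1, 1), one_neg))    # False: the enclosed area is not drawn
print(in_drawn_area((1, -1), one_neg))   # True: outside the edge opposite the -W vertex
print(in_drawn_area((-1, -1), two_neg))  # True: beyond the single +W vertex
```

In this sketch the area is unbounded when a W-value is negative; limiting it to the display is a separate step.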

The present invention is thus capable of handling all three of the foregoing cases. If the vertices are off-screen, it draws only the on-screen portion. If part of the triangle is beyond the near and/or far plane, it draws only the portion within those planes. If the triangle has one or two negative Z vertexes, only the correct +Z portion is drawn. By limiting the traversal using line equations and Z testing, little time is wasted exploring "bad" pixels that fail an edge or Z test. This is possible because all clipping, by screen edge or near or far plane, results in a convex region on-screen which can be "explored" easily because of its convex nature.
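To illustrate how line equations make screen-edge clipping unnecessary, the following minimal sketch visits only on-screen pixels and keeps those that pass all three edge tests, so the triangle itself is never reshaped. It is a brute-force illustration under stated assumptions, not the traversal of the present invention: it covers only the case with no negative W-values, samples pixels at integer coordinates, assumes a non-degenerate triangle, and uses the hypothetical names edge_coefficients and covered_pixels.

```python
def edge_coefficients(a, b):
    """Return (A, B, C) such that A*x + B*y + C >= 0 on or to the left of a->b."""
    A = a[1] - b[1]
    B = b[0] - a[0]
    C = -(A * a[0] + B * a[1])
    return (A, B, C)

def covered_pixels(tri, width, height):
    """List on-screen pixels (x, y) that satisfy all three line equations."""
    a, b, c = tri
    # Signed doubled area; a negative value means the opposite winding,
    # so swap two vertices to keep the interior on the non-negative side.
    area2 = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    if area2 < 0:
        b, c = c, b
    edges = [edge_coefficients(a, b), edge_coefficients(b, c), edge_coefficients(c, a)]
    pixels = []
    for y in range(height):          # only rows that exist on the display
        for x in range(width):       # only columns that exist on the display
            if all(A * x + B * y + C >= 0 for (A, B, C) in edges):
                pixels.append((x, y))
    return pixels

# All three vertices lie off-screen; only a corner of the triangle is visible.
tri = [(-5, -5), (6, -5), (-5, 6)]
print(covered_pixels(tri, 4, 4))  # [(0, 0), (1, 0), (0, 1)]
```

A practical traversal would avoid testing every pixel; the point of the sketch is only that the line equations remain valid for off-screen vertices.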

In one aspect of the present invention, the space in the area is identified by positioning a plurality of points on or near the area and moving the points about in the area. The points may define an enclosed polygonal region. As an option, the points may be initially positioned on top of the display if a top vertex of the area is positioned outside boundaries of the display.

These and other advantages of the present invention will become apparent upon reading the following detailed description and studying the various figures of the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other aspects and advantages are better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:

FIGS. 1, 1A, and 1B illustrate a prior art method of rasterization;

FIG. 1C is a flow diagram illustrating the various components of one embodiment of the present invention implemented on a single semiconductor platform;

FIG. 2 is a schematic diagram of a vertex attribute buffer (VAB) in accordance with one embodiment of the present invention;

FIG. 2A is a chart illustrating the various commands that may be received by VAB in accordance with one embodiment of the present invention;

FIG. 2B is a flow chart illustrating a method of loading and draining vertex attributes to and from VAB in accordance with one embodiment of the present invention;

FIG. 2C is a schematic diagram illustrating the architecture of the present invention employed to implement the operations of FIG. 2B;

FIG. 3 illustrates the mode bits associated with VAB in accordance with one embodiment of the present invention;

FIG. 4 illustrates the transform module of the present invention;

FIG. 4A is a flow chart illustrating a method of running multiple execution threads in accordance with one embodiment of the present invention;

FIG. 4B is a flow diagram illustrating a manner in which the method of FIG. 4A is carried out in accordance with one embodiment of the present invention;

FIG. 5 illustrates the functional units of the transform module of FIG. 4 in accordance with one embodiment of the present invention;

FIG. 6 is a schematic diagram of the multiplication logic unit (MLU) of the transform module of FIG. 5;

FIG. 7 is a schematic diagram of the arithmetic logic unit (ALU) of the transform module of FIG. 5;

FIG. 8 is a schematic diagram of the register file of the transform module of FIG. 5;

FIG. 9 is a schematic diagram of the inverse logic unit (ILU) of the transform module of FIG. 5;

FIG. 10 is a chart of the output addresses of output converter of the transform module of FIG. 5 in accordance with one embodiment of the present invention;

FIG. 11 is an illustration of the micro-code organization of the transform module of FIG. 5 in accordance with one embodiment of the present invention;

FIG. 12 is a schematic diagram of the sequencer of the transform module of FIG. 5 in accordance with one embodiment of the present invention;

FIG. 13 is a flowchart delineating the various operations associated with use of the sequencer of the transform module of FIG. 12;

FIG. 14 is a flow diagram delineating the operation of the sequencing component of the sequencer of the transform module of FIG. 12;

FIG. 14A is a flow diagram illustrating the components of the present invention employed for handling scalar and vector components during graphics-processing;

FIG. 14B is a flow diagram illustrating one possible combination 1451 of the functional components of the present invention shown in FIG. 14A which corresponds to the transform module of FIG. 5;

FIG. 14C is a flow diagram illustrating another possible combination 1453 of the functional components of the present invention shown in FIG. 14A;

FIG. 14D illustrates a method implemented by the transform module of FIG. 12 for performing a blending operation during graphics-processing in accordance with one embodiment of the present invention;

FIG. 15 is a schematic diagram of the lighting module of one embodiment of the present invention;

FIG. 16 is a schematic diagram showing the functional units of the lighting module of FIG. 15 in accordance with one embodiment of the present invention;

FIG. 17 is a schematic diagram of the multiplication logic unit (MLU) of the lighting module of FIG. 16 in accordance with one embodiment of the present invention;

FIG. 18 is a schematic diagram of the arithmetic logic unit (ALU) of the lighting module of FIG. 16 in accordance with one embodiment of the present invention;

FIG. 19 is a schematic diagram of the register unit of the lighting module of FIG. 16 in accordance with one embodiment of the present invention;

FIG. 20 is a schematic diagram of the lighting logic unit (LLU) of the lighting module of FIG. 16 in accordance with one embodiment of the present invention;

FIG. 21 is an illustration of the flag register associated with the lighting module of FIG. 16 in accordance with one embodiment of the present invention;

FIG. 22 is an illustration of the micro-code fields associated with the lighting module of FIG. 16 in accordance with one embodiment of the present invention;

FIG. 23 is a schematic diagram of the sequencer associated with the lighting module of FIG. 16 in accordance with one embodiment of the present invention;

FIG. 24 is a flowchart delineating the manner in which the sequencers of the transform and lighting modules are capable of controlling the input and output of the associated buffers in accordance with one embodiment of the present invention;

FIG. 25 is a diagram illustrating the manner in which the sequencers of the transform and lighting modules are capable of controlling the input and output of the associated buffers in accordance with the method of FIG. 24;

FIG. 25B is a schematic diagram of the various modules of the rasterizer of FIG. 1B;

FIG. 26 illustrates a schematic of the set-up module of the rasterization module of the present invention;

FIG. 26A is an illustration showing the various parameters calculated by the set-up module of the rasterizer of FIG. 26;

FIG. 27 is a flowchart illustrating a method of the present invention associated with the set-up and traversal modules of the rasterizer component shown in FIG. 26;

FIG. 27A illustrates sense points that enclose a convex region that is moved to identify an area in a primitive in accordance with one embodiment of the present invention;

FIG. 28 is a flowchart illustrating a process of the present invention associated with the process row operation 2706 of FIG. 27;

FIG. 28A is an illustration of the sequence in which the convex region of the present invention is moved about the primitive;

FIG. 28B illustrates another example of the sequence in which the convex region of the present invention is moved about the primitive;

FIG. 29 is a flowchart illustrating an alternate boustrophedonic process of the present invention associated with the process row operation 2706 of FIG. 27;

FIG. 29A is an illustration of the sequence in which the convex region of the present invention is moved about the primitive in accordance with the boustrophedonic process of FIG. 29;

FIG. 30 is a flowchart illustrating an alternate boustrophedonic process using boundaries;

FIG. 31 is a flowchart showing the process associated with operation 3006 of FIG. 30;

FIG. 31A is an illustration of the sequence in which the convex region of the present invention is moved about the primitive in accordance with the boundary-based boustrophedonic process of FIGS. 30 and 31;

FIG. 32 is a flowchart showing the process associated with operation 2702 of FIG. 27;

FIG. 32A is an illustration showing which area is drawn if no negative W-values are calculated in the process of FIG. 32;

FIG. 32B is an illustration showing which area is drawn if only one negative W-value is calculated in the process of FIG. 32; and

FIG. 32C is an illustration showing which area is drawn if only two negative W-values are calculated in the process of FIG. 32.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIGS. 1, 1A and 1B show the prior art. FIGS. 1C-32C show a graphics pipeline system of the present invention.

FIG. 1C is a flow diagram illustrating the various components of one embodiment of the present invention. As shown, the present invention is divided into four main modules including a vertex attribute buffer (VAB) 50, a transform module 52, a lighting module 54, and a rasterization module 56 with a set-up module 57. In one embodiment, each of the foregoing modules is situated on a single semiconductor platform in a manner that will be described hereinafter in greater detail. In the present description, the single semiconductor platform may refer to a sole unitary semiconductor-based integrated circuit or chip.

The VAB 50 is included for gathering and maintaining a plurality of vertex attribute states such as position, normal, colors, texture coordinates, etc. Completed vertices are processed by the transform module 52 and then sent to the lighting module 54. The transform module 52 generates vectors for the lighting module 54 to light. The output of the lighting module 54 is screen space data suitable for the set-up module which, in turn, sets up primitives. Thereafter, rasterization module 56 carries out rasterization of the primitives. It should be noted that the transform and lighting modules 52 and 54 might only stall on the command level such that a command is always finished once started.

In one embodiment, the present invention includes a hardware implementation that at least partially employs Open Graphics Library (OpenGL®) and D3D™ transform and lighting pipelines. OpenGL® is the computer industry's standard application program interface (API) for defining 2-D and 3-D graphic images. With OpenGL®, an application can create the same effects in any operating system using any OpenGL®-adhering graphics adapter. OpenGL® specifies a set of commands or immediately executed functions. Each command directs a drawing action or causes special effects.

FIG. 2 is a schematic diagram of VAB 50 in accordance with one embodiment of the present invention. As shown, VAB 50 passes command bits 200 while storing data bits 204 representative of attributes of a vertex and mode bits 202. In use, VAB 50 receives the data bits 204 of vertices and drains the same.

The VAB 50 is adapted for receiving and storing a plurality of possible vertex attribute states via the data bits 204. In use, after such data bits 204, or vertex data, are received and stored in VAB 50, the vertex data is outputted from VAB 50 to a graphics-processing module, namely the transform module 52. Further, the command bits 200 are passed by VAB 50 for determining a manner in which the vertex data is inputted to VAB 50 in addition to other processing which will be described in greater detail with reference to FIG. 2A. Such command bits 200 are received from a command bit source such as a microcontroller, CPU, data source or any other type of source which is capable of generating command bits 200.

Further, mode bits 202 are passed which are indicative of the status of a plurality of modes of process operations. As such, mode bits 202 are adapted for determining a manner in which the vertex data is processed in the subsequent graphics-processing modules. Such mode bits 202 are received from a command bit source such as a microcontroller, CPU, data source or any other type of source which is capable of generating mode bits 202.

It should be noted that the various functions associated with VAB 50 may be governed by way of dedicated hardware, software or any other type of logic. In various embodiments, 64, 128, 256 or any other number of mode bits 202 may be employed.

The VAB 50 also functions as a gathering point for the 64-bit data that needs to be converted into a 128-bit format. The VAB 50 input is 64 bits/cycle and the output is 128 bits/cycle. In other embodiments, VAB 50 may function as a gathering point for 128-bit data, and VAB 50 input may be 128 bits/cycle or any other combination. The VAB 50 further has reserved slots for a plurality of vertex attributes that are all IEEE 32-bit floats. The number of such slots may vary per the desires of the user. Table 1 illustrates exemplary vertex attributes employed by the present invention.

TABLE 1
Position: x,y,z,w
Diffuse Color: r,g,b,a
Specular Color: r,g,b
Fog: f
Texture0: s,t,r,q
Texture1: s,t,r,q
Normal: nx,ny,nz
Skin Weight: w

During operation, VAB 50 may operate assuming that the x,y data pair is written before the z,w data pair since this allows for defaulting the z,w pair to (0.0,1.0) at the time of the x,y write. This may be important for default components in OpenGL® and D3D™. It should be noted that the position, texture0, and texture1 slots default the third and fourth components to (0.0,1.0). Further, the diffuse color slot defaults the fourth component to (1.0) and the texture slots default the second component to (0.0).
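The write-ordering assumption for the position slot can be sketched as follows. This is a minimal software model only, assuming a hypothetical class named PositionSlot; the actual VAB is described as hardware, and the other slots and their defaults are not modeled here.

```python
class PositionSlot:
    """Hypothetical model of one 128-bit position slot holding x, y, z, w."""

    def __init__(self):
        self.xyzw = [0.0, 0.0, 0.0, 1.0]

    def write_xy(self, x, y):
        # Writing the x,y pair also resets the z,w pair to its default (0.0, 1.0).
        self.xyzw = [x, y, 0.0, 1.0]

    def write_zw(self, z, w):
        # An optional later write of the z,w pair overrides the defaults.
        self.xyzw[2:] = [z, w]

slot = PositionSlot()
slot.write_xy(3.0, 4.0)
print(slot.xyzw)  # [3.0, 4.0, 0.0, 1.0]
slot.write_zw(5.0, 2.0)
print(slot.xyzw)  # [3.0, 4.0, 5.0, 2.0]
slot.write_xy(1.0, 1.0)
print(slot.xyzw)  # [1.0, 1.0, 0.0, 1.0]
```

Because the defaults are applied at the time of the x,y write, a vertex specified with only two coordinates never inherits a stale z or w from the previous vertex.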

The VAB 50 includes still another slot 205 used for assembling the data bits 204 that may be passed into or through the transform and lighting modules 52 and 54, respectively, without disturbing the data bits 204. The data bits 204 in the slot 205 can be in a floating point or integer format. As mentioned earlier, the data bits 204 of each vertex have an associated set of mode bits 202 representative of the modes affecting the processing of the data bits 204. These mode bits 202 are passed with the data bits 204 through the transform and lighting modules 52 and 54, respectively, for purposes that will be set forth hereinafter in greater detail.

In one embodiment, there may be 18 valid VAB, transform, and lighting commands received by VAB 50. FIG. 2A is a chart illustrating the various commands that may be received by VAB 50 in accordance with one embodiment of the present invention. It should be understood that all load and read context commands, and the passthrough command shown in the chart of FIG. 2A transfer one data word of up to 128 bits or any other size.

Each command of FIG. 2A may contain control information dictating whether each set of data bits 204 is to be written into a high double word or low double word of one VAB address. In addition, a 2-bit write mask may be employed for providing control to the word level. Further, there may be a launch bit that informs the VAB controller that all of the data bits 204 are present for a current command to be executed.

Each command has an associated stall field that allows a look-up to find information on whether the command is a read command in that it reads context memory or is a write command in that it writes context memory. By using the stall field of currently executing commands, the new command may be either held off in case of conflict or allowed to proceed.

In operation, VAB 50 can accept one input data word up to 128 bits (or any other size) per cycle and output one data word up to 128 bits (or any other size) per cycle. For the load commands, this means that it may take two cycles to load the data into VAB 50 to create a 128-bit quad-word and one cycle to drain it. For the scalar memories in the lighting module 54, it is not necessary to accumulate a full quad-word, and these can be loaded in one cycle/address. For one vertex, it can take up to 14 cycles to load the 7 VAB slots while it only takes 7 cycles to drain them. It should be noted, however, that it is only necessary to update the vertex state that changes between executing vertex commands. This means that, in one case, the vertex position may be updated taking 2 cycles, while the draining of the vertex data takes 7 cycles. It should be noted that only 1 cycle may be required in the case of the x,y position.
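The cycle counts quoted above follow from the bus widths. The short sketch below reproduces the arithmetic, assuming the 64 bits/cycle input and 128 bits/cycle output of the embodiment described earlier; the constant and function names are illustrative only.

```python
# Assumed widths, taken from the embodiment with a 64-bit input path.
INPUT_BITS_PER_CYCLE = 64
OUTPUT_BITS_PER_CYCLE = 128
SLOT_BITS = 128  # one quad-word per VAB slot

def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def load_cycles(num_slots):
    """Cycles needed to fill num_slots quad-word slots through the input."""
    return num_slots * ceil_div(SLOT_BITS, INPUT_BITS_PER_CYCLE)

def drain_cycles(num_slots):
    """Cycles needed to empty num_slots quad-word slots through the output."""
    return num_slots * ceil_div(SLOT_BITS, OUTPUT_BITS_PER_CYCLE)
```

With these assumptions, `load_cycles(7)` gives the 14 cycles and `drain_cycles(7)` the 7 cycles mentioned in the text, and updating only the position costs `load_cycles(1)`, that is, 2 cycles.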

FIG. 2B is a flow chart illustrating one method of loading and draining vertex attributes to and from VAB 50 during graphics-processing. Initially, in operation 210, at least one set of vertex attributes is received in VAB 50 for being processed. As mentioned earlier, each set of vertex attributes may be unique, and correspond to a single vertex.

In use, the vertex attributes are stored in VAB 50 upon the receipt thereof in operation 212. Further, each set of stored vertex attributes is transferred to a corresponding one of a plurality of input buffers of the transform module 52. The received set of vertex attributes is also monitored in order to determine whether a received vertex attribute has a corresponding vertex attribute of a different set currently stored in VAB 50, as indicated in operation 216.

Upon it being determined that a stored vertex attribute corresponds to the received vertex attribute in decision 217, the stored vertex attribute is outputted to the corresponding input buffer of the transform module 52 out of order. See operation 218. Immediately upon the stored vertex attribute being outputted, the corresponding incoming vertex attribute may take its place in VAB 50. If no correspondence is found, however, each set of the stored vertex attributes may be transferred to the corresponding input buffer of the transform module 52 in accordance with a regular predetermined sequence. Note operation 219.

It should be noted that the stored vertex attribute might not be transferred in the aforementioned manner if it has an associated launch command. Further, in order for the foregoing method to work properly, the bandwidth of an output of VAB 50 must be at least the bandwidth of an input of VAB 50.

FIG. 2C is a schematic diagram illustrating the architecture of the present invention employed to implement the operations of FIG. 2B. As shown, VAB 50 has a write data terminal WD, a read data terminal RD, a write address terminal WA, and a read address terminal RA. The read data terminal is coupled to a first clock-controlled buffer 230 for outputting the data bits 204 from VAB 50.

Also included is a first multiplexer 232 having an output coupled to the read address terminal of VAB 50 and a second clock-controlled buffer 234. A first input of the first multiplexer 232 is coupled to the write address terminal of VAB 50 while a second input of the first multiplexer 232 is coupled to an output of a second multiplexer 236. A logic module 238 is coupled between the first and second multiplexers 232 and 236, the write address terminal of VAB 50, and an output of the second clock-controlled buffer 234.

In use the logic module 238 serves to determine whether an incoming vertex attribute is pending to drain in VAB 50. In one embodiment, this determination may be facilitated by monitoring a bit register that indicates whether a vertex attribute is pending or not. If it is determined that the incoming vertex attribute does have a match currently in VAB 50, the logic module 238 controls the first multiplexer 232 in order to drain the matching vertex attribute so that the incoming vertex attribute may be immediately stored in its place. On the other hand, if it is determined that the incoming vertex attribute does not have a match currently in VAB 50, the logic module 238 controls the first multiplexer 232 such that VAB 50 is drained and the incoming vertex attribute is loaded sequentially or in some other predetermined order, per the input of the second multiplexer 236 which may be updated by the logic module 238.

As a result, there is no requirement for VAB 50 to drain multiple vertex attributes before a new incoming vertex attribute may be loaded. The pending vertex attribute forces out the corresponding VAB counterpart if possible, thus allowing it to proceed. As a result, VAB 50 can drain in an arbitrary order. Without this capability, it would take 7 cycles to drain VAB 50 and possibly 14 more cycles to load it. By overlapping the loading and draining, higher performance is achieved. It should be noted that this is only possible if an input buffer is empty and VAB 50 can drain into input buffers of the transform module 52.
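The overlapped loading and draining described above can be modelled in a few lines. The sketch below is a simplification under stated assumptions: the class name `VabModel`, its fields, and the lowest-slot-first regular order are hypothetical, and the launch command and the availability of transform input buffers are ignored.

```python
class VabModel:
    """Toy model of out-of-order draining; not the actual circuit of FIG. 2C."""

    def __init__(self, num_slots=7):
        self.data = [None] * num_slots
        self.pending = [False] * num_slots  # bit register: slot waiting to drain
        self.drained = []                   # (slot, value) pairs in drain order

    def write(self, slot, value):
        """Store an incoming attribute, forcing out a pending counterpart."""
        if self.pending[slot]:
            # A matching attribute is still pending: drain it out of order
            # so the incoming attribute can take its place immediately.
            self.drained.append((slot, self.data[slot]))
        self.data[slot] = value
        self.pending[slot] = True

    def drain_next(self):
        """Drain the next pending slot in the regular (assumed ascending) order."""
        for slot, is_pending in enumerate(self.pending):
            if is_pending:
                self.pending[slot] = False
                self.drained.append((slot, self.data[slot]))
                return slot
        return None

vab = VabModel()
vab.write(0, "position_v0")
vab.write(3, "texture0_v0")
vab.write(0, "position_v1")  # forces position_v0 out ahead of texture0_v0
```

In this model a new position never waits for the other pending slots to drain, which is the property the text relies on for higher throughput.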

FIG. 3 illustrates the mode bits associated with VAB 50 in accordance with one embodiment of the present invention. The transform/light mode information is stored in a register via mode bits 202. Mode bits 202 are used to drive the sequencers of the transform module 52 and lighting module 54 in a manner that will become apparent hereinafter. Each vertex has associated mode bits 202 that may be unique, and can therefore execute a specifically tailored program sequence. While mode bits 202 may generally map directly to the graphics API, some of them may be derived.

In one embodiment, the active light bits (LIS) of FIG. 3 may be contiguous. Further, the pass-through bit (VPAS) is unique in that when it is turned on, the vertex data is passed through with scale and bias, and no transforms or lighting is done. Possible mode bits 202 used when VPAS is true are the texture divide bits (TDV0,1), and foggen bits (used to extract fog value in D3D™). VPAS is thus used for pre-transformed data, and TDV0,1 are used to deal with a cylindrical wrap mode in the context of D3D™.

FIG. 4 illustrates the transform module of one embodiment of the present invention. As shown, the transform module 52 is connected to VAB 50 by way of 6 input buffers 400. In one embodiment, each input buffer 400 might be 7*128b in size. Each of the 6 input buffers 400 is capable of storing 7 quad words. Such input buffers 400 follow the same layout as VAB 50, except that the pass data is overlapped with the position data.

In one embodiment, a bit might be designated for each attribute of each input buffer 400 to indicate whether data has changed since the previous instance that the input buffer 400 was loaded. By this design, each input buffer 400 might be loaded only with changed data.

The transform module 52 is further connected to 6 output vertex buffers 402 in the lighting module 54. The output buffers include a first buffer 404, a second buffer 406, and a third buffer 408. As will become apparent hereinafter, the contents, i.e. position, texture coordinate data, etc., of the third buffer 408 are not used in the lighting module 54. The first buffer 404 and second buffer 406 are both, however, used for inputting lighting and color data to the lighting module 54. Two buffers are employed since the lighting module is adapted to handle two read inputs. It should be noted that the data might be arranged so as to avoid any problems with read conflicts, etc.

Further coupled to the transform module 52 are context memory 410 and micro-code ROM memory 412. The transform module 52 serves to convert object space vertex data into screen space, and to generate any vectors required by the lighting module 54. The transform module 52 also processes skinning and texture coordinates. In one embodiment, the transform module 52 might be a 128-bit design processing 4 floats in parallel, and might be optimized for doing 4 term dot products.

FIG. 4A is a flow chart illustrating a method of executing multiple threads in the transform module 52 in accordance with one embodiment of the present invention. In operation, the transform module 52 is capable of processing 3 vertices in parallel via interleaving. To this end, 3 commands can be simultaneously executed in parallel unless there are stall conditions between the commands such as writing and subsequently reading from the context memory 410. The 3 execution threads are independent of each other and can be any command since all vertices contain unique corresponding mode bits 202.

As shown in FIG. 4A, the method of executing multiple threads includes determining a current thread to be executed in operation 420. This determination might be made by identifying a number of cycles that a graphics-processing module requires for completion of an operation, and tracking the cycles. By tracking the cycles, each thread can be assigned to a cycle, thus allowing determination of the current thread based on the current cycle. It should be noted, however, that such determination might be made in any desired manner that is deemed effective.

Next, in operation 422, an instruction associated with a thread to be executed during a current cycle is retrieved using a corresponding program counter number. Thereafter, the instruction is executed on the graphics-processing module in operation 424.

In one example of use, the instant method includes first accessing a first instruction, or code segment, per a first program counter. As mentioned earlier, such program counter is associated with a first execution thread. Next, the first code segment is executed in the graphics-processing module. As will soon become apparent, such graphics-processing module might take the form of an adder, a multiplier, or any other functional unit or combination thereof.

Since the graphics-processing module requires more than one clock cycle to complete the execution, a second code segment might be accessed per a second program counter immediately one clock cycle after the execution of the first code segment. The second program counter is associated with a second execution thread, wherein each of the execution threads process a unique vertex.

To this end, the second code segment might begin execution in the graphics-processing module prior to the completion of the execution of the first code segment in the graphics-processing module. In use the graphics-processing module requires a predetermined number of cycles for every thread to generate an output. Thus, the various steps of the present example might be repeated for every predetermined number of cycles.

This technique offers numerous advantages over the prior art. Of course, the functional units of the present invention are used more efficiently. Further, the governing code might be written more efficiently when the multiple threading scheme is assumed to be used.

For example, in the case where the graphics-processing module includes a multiplier that requires three clock cycles to output an answer, it would be necessary to include two no operation commands between subsequent operations such as a=b*c and d=e*a, since "a" would not be available until after the three clock cycles. In the present embodiment, however, the code might simply call d=e*a immediately subsequent a=b*c, because it can be assumed that such code will be executed as one of three execution threads that are called once every three clock cycles.
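The following sketch simulates why no no-operation commands are needed under the three-thread scheme. It is an illustrative model only: the names `run`, `PROGRAM`, and the dictionary register files are hypothetical, and a single multiplier with a three-cycle latency is assumed.

```python
LATENCY = 3      # assumed cycles for the multiplier to deliver a result
NUM_THREADS = 3  # one thread issues per cycle, in round robin order

# Each instruction is (destination, source1, source2) and means dest = s1 * s2.
PROGRAM = [("a", "b", "c"), ("d", "e", "a")]

def run(thread_regs):
    """Interleave PROGRAM across the threads; return the final registers."""
    pcs = [0] * NUM_THREADS
    in_flight = []  # (ready_cycle, thread, dest, value)
    cycle = 0
    while any(pc < len(PROGRAM) for pc in pcs) or in_flight:
        # Write back every result whose latency has elapsed.
        for item in [i for i in in_flight if i[0] <= cycle]:
            _, thread, dest, value = item
            thread_regs[thread][dest] = value
            in_flight.remove(item)
        # The thread that owns this cycle issues its next instruction.
        thread = cycle % NUM_THREADS
        if pcs[thread] < len(PROGRAM):
            dest, s1, s2 = PROGRAM[pcs[thread]]
            regs = thread_regs[thread]
            # A KeyError here would mean the operand was not ready yet.
            in_flight.append((cycle + LATENCY, thread, dest, regs[s1] * regs[s2]))
            pcs[thread] += 1
        cycle += 1
    return thread_regs

result = run([
    {"b": 2.0, "c": 3.0, "e": 4.0},
    {"b": 1.0, "c": 5.0, "e": 2.0},
    {"b": 0.5, "c": 4.0, "e": 3.0},
])
```

Each thread issues `d = e * a` exactly three cycles after `a = b * c`, by which time `a` has been written back, so the back-to-back instructions never read a stale or missing operand.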

FIG. 4B is a flow diagram illustrating a manner in which the method of FIG. 4A is carried out. As shown, each execution thread has an associated program counter 450 that is used to access instructions, or code segments, in instruction memory 452. Such instructions might then be used to operate a graphics-processing module such as an adder 456, a multiplier 454, and/or an inverse logic unit or register 459.

In order to accommodate a situation where at least two of the foregoing processing modules are used in tandem, at least one code segment delay 457 is employed between the graphics-processing modules. In the case where a three-thread framework is employed, a three-clock cycle code segment delay 457 is used. In one embodiment, the code segment delay 457 is used when a multiplication instruction is followed by an addition instruction. In such case, the addition instruction is not executed until three clock cycles after the execution of the multiplication instruction in order to ensure that time has elapsed which is sufficient for the multiplier 454 to generate an output.

After the execution of each instruction, the program counter 450 of the current execution thread is updated and the program counter of the next execution thread is called by module 458 in a round robin sequence to access an associated instruction. It should be noted that the program counters might be used in any fashion including, but not limited to, incrementing, jumping, calling and returning, performing a table jump, and/or dispatching. Dispatching refers to determining a starting point of code segment execution based on a received parameter. Further, it is important to understand that the principles associated with the present multiple thread execution framework might also be applied to the lighting module 54 of the graphics-processing pipeline of the present invention.

In the case where a three-thread framework is employed, each thread is allocated one input buffer and one output buffer at any one time. This allows loading of three more commands with data while processing three commands. The input buffers and output buffers are assigned in a round robin sequence in a manner that will be discussed later with reference to FIGS. 27 and 28.

The execution threads are thus temporally and functionally interleaved. This means that each function unit is pipelined into three stages and each thread occupies one stage at any one time. In one embodiment, the three threads might be set to always execute in the same sequence, i.e. zero then one then two. Conceptually, the threads enter a function unit at t=clock modulo three. Once a function unit starts work, it takes three cycles to deliver the result (except the ILU that takes six), at which time the same thread is again active.

FIG. 5 illustrates the functional units of the transform module 52 of FIG. 4 in accordance with one embodiment of the present invention. As shown, included are input buffers 400 that are adapted for being coupled to VAB 50 for receiving vertex data therefrom.

A memory logic unit (MLU) 500 has a first input coupled to an output of input buffers 400. As an option, the output of MLU 500 might have a feedback loop 502 coupled to the first input thereof.

Also provided is an arithmetic logic unit (ALU) 504 having a first input coupled to an output of MLU 500. The output of ALU 504 further has a feedback loop 506 connected to the second input thereof. Such feedback loop 506 may further have a delay 508 coupled thereto. Coupled to an output of ALU 504 is an input of a register unit 510. It should be noted that the output of register unit 510 is coupled to the first and second inputs of MLU 500.

An inverse logic unit (ILU) 512 is provided including an input coupled to the output of ALU 504 for performing an inverse or an inverse square root operation. In an alternate embodiment, ILU 512 might include an input coupled to the output of register unit 510.

Further included is a conversion, or smearing, module 514 coupled between an output of ILU 512 and a second input of MLU 500. In use the conversion module 514 serves to convert scalar vertex data to vector vertex data. This is accomplished by multiplying the scalar data by a vector so that the vector operators such as the multiplier and/or adder may process it. For example, a scalar A, after conversion, may become a vector (A,A,A,A). In an alternate embodiment, the smearing module 514 might be incorporated into the multiplexers associated with MLU 500, or any other component of the present invention. As an option, a register 516 might be coupled between the output of ILU 512 and an input of the conversion unit 514. Further, such register 516 might be threaded.

Memory 410 is coupled to the second input of MLU 500 and the output of ALU 504. In particular, memory 410 has a read terminal coupled to the second input of MLU 500. Further, memory 410 has a write terminal coupled to the output of ALU 504.

The memory 410 has stored therein a plurality of constants and variables for being used in conjunction with the input buffer 400, MLU 500, ALU 504, register unit 510, ILU 512, and the conversion module 514 for processing the vertex data. Such processing might include transforming object space vertex data into screen space vertex data, generating vectors, etc.

Finally, an output converter 518 is coupled to the output of ALU 504. The output converter 518 serves for being coupled to a lighting module 54 via output buffers 402 to output the processed vertex data thereto. All data paths except for the ILU might be designed to be 128 bits wide or other data path widths may be used.

FIG. 6 is a schematic diagram of MLU 500 of the transform module 52 of FIG. 5 in accordance with one embodiment of the present invention. As shown, MLU 500 of the transform module 52 includes four multipliers 600 that are coupled in parallel.

MLU 500 of transform module 52 is capable of multiplying two four-component vectors in three different ways, or passing one four-component vector. Table 2 illustrates the operations associated with MLU 500 of transform module 52.

TABLE 2
CMLU_MULT   o[0] = a[0] * b[0], o[1] = a[1] * b[1], o[2] = a[2] * b[2], o[3] = a[3] * b[3]
CMLU_MULA   o[0] = a[0] * b[0], o[1] = a[1] * b[1], o[2] = a[2] * b[2], o[3] = a[3]
CMLU_MULB   o[0] = a[0] * b[0], o[1] = a[1] * b[1], o[2] = a[2] * b[2], o[3] = b[3]
CMLU_PASA   o[0] = a[0], o[1] = a[1], o[2] = a[2], o[3] = a[3]
CMLU_PASB   o[0] = b[0], o[1] = b[1], o[2] = b[2], o[3] = b[3]

Possible A and B inputs are shown in Table 3.

TABLE 3
MA_M   MLU
MA_V   Input Buffer
MA_R   RLU (shared with MB_R)
MB_I   ILU
MB_C   Context Memory
MB_R   RLU (shared with MA_R)

Table 4 illustrates a vector rotate option capable of being used for cross products.

TABLE 4
MR_NONE   No change
MR_ALBR   Rotate A[XYZ] vector left, B[XYZ] vector right
MR_ARBL   Rotate A[XYZ] vector right, B[XYZ] vector left
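The MLU operations and the rotate option can be sketched in Python to show how two rotated multiplies yield a cross product. This is a hedged illustration: the function names are hypothetical, and the rotate directions (left meaning (x,y,z) becomes (y,z,x)) are an assumption not confirmed by the text. The final subtraction would be performed by the ALU with one input negated.

```python
def mlu(op, a, b):
    """Apply one MLU operation to two four-component vectors."""
    prod = [a[i] * b[i] for i in range(4)]
    if op == "CMLU_MULT":
        return prod
    if op == "CMLU_MULA":
        return prod[:3] + [a[3]]
    if op == "CMLU_MULB":
        return prod[:3] + [b[3]]
    if op == "CMLU_PASA":
        return list(a)
    if op == "CMLU_PASB":
        return list(b)
    raise ValueError(op)

def rotate_left(v):
    """Assumed meaning of a left rotate: (x, y, z, w) -> (y, z, x, w)."""
    return [v[1], v[2], v[0], v[3]]

def rotate_right(v):
    """Assumed meaning of a right rotate: (x, y, z, w) -> (z, x, y, w)."""
    return [v[2], v[0], v[1], v[3]]

def cross(a, b):
    """Cross product from two rotated multiplies and one subtraction."""
    first = mlu("CMLU_MULT", rotate_left(a), rotate_right(b))   # MR_ALBR
    second = mlu("CMLU_MULT", rotate_right(a), rotate_left(b))  # MR_ARBL
    return [first[i] - second[i] for i in range(3)]
```

For example, `cross([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0])` evaluates to the z axis under these assumptions.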

FIG. 7 is a schematic diagram of ALU 504 of transform module 52 of FIG. 5 in accordance with one embodiment of the present invention. As shown, ALU 504 of transform module 52 includes three adders 700 coupled in parallel/series. In use ALU 504 of transform module 52 can add two three component vectors, pass one four component vector, or smear a vector component across the output. Table 5 illustrates various operations of which ALU 504 of transform module 52 is capable.

TABLE 5
CALU_ADDA    o[0] = a[0] + b[0], o[1] = a[1] + b[1], o[2] = a[2] + b[2], o[3] = a[3]
CALU_ADDB    o[0] = a[0] + b[0], o[1] = a[1] + b[1], o[2] = a[2] + b[2], o[3] = b[3]
CALU_SUM3B   o[0123] = b[0] + b[1] + b[2]
CALU_SUM4B   o[0123] = b[0] + b[1] + b[2] + b[3]
CALU_SMRB0   o[0123] = b[0]
CALU_SMRB1   o[0123] = b[1]
CALU_SMRB2   o[0123] = b[2]
CALU_SMRB3   o[0123] = b[3]
CALU_PASA    o[0] = a[0], o[1] = a[1], o[2] = a[2], o[3] = a[3]
CALU_PASB    o[0] = b[0], o[1] = b[1], o[2] = b[2], o[3] = b[3]

Table 6 illustrates the A and B inputs of ALU 504 of transform module 52.

TABLE 6
AA_A   ALU (one instruction delay)
AA_C   Context Memory
AB_M   MLU

It is also possible to modify the sign bits of the A and B input by effecting no change, negation of B, negation of A, absolute value A,B. It should be noted that when ALU 504 outputs scalar vertex data, this scalar vertex data is smeared across the output in the sense that each output represents the scalar vertex data. The pass control signals of MLU 500 and ALU 504 are each capable of disabling all special value handling during operation.
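The ALU operations of Table 5, including the smearing of scalar results, can be sketched as follows. The function names are hypothetical, and the four-term dot product is shown only as one plausible use: component-wise products (as CMLU_MULT would produce) arrive on the B input and CALU_SUM4B sums and smears them.

```python
def alu(op, a, b):
    """Apply one ALU operation; scalar results are smeared over all outputs."""
    if op == "CALU_ADDA":
        return [a[i] + b[i] for i in range(3)] + [a[3]]
    if op == "CALU_ADDB":
        return [a[i] + b[i] for i in range(3)] + [b[3]]
    if op == "CALU_SUM3B":
        return [b[0] + b[1] + b[2]] * 4
    if op == "CALU_SUM4B":
        return [b[0] + b[1] + b[2] + b[3]] * 4
    if op in ("CALU_SMRB0", "CALU_SMRB1", "CALU_SMRB2", "CALU_SMRB3"):
        return [b[int(op[-1])]] * 4
    if op == "CALU_PASA":
        return list(a)
    if op == "CALU_PASB":
        return list(b)
    raise ValueError(op)

def dot4(u, v):
    """Four-term dot product: multiply component-wise, then sum and smear."""
    products = [u[i] * v[i] for i in range(4)]  # stands in for CMLU_MULT
    return alu("CALU_SUM4B", [0.0] * 4, products)
```

The smeared result means every output component carries the dot product, so a following vector operation can consume it without a separate broadcast step.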

FIG. 8 is a schematic diagram of the vector register file 510 of transform module 52 of FIG. 5 in accordance with one embodiment of the present invention. As shown, the vector register file 510 includes four sets of registers 800 each having an output connected to a first input of a corresponding multiplexer 802 and an input coupled to a second input of the corresponding multiplexer 802.

In one embodiment of the present invention, the vector register file 510 is threaded. That is, there are three copies of the vector register file 510 and each thread has its own copy. In one embodiment, each copy contains eight registers, each of which might be 128 bits in size and store four floats. The vector register file 510 is written from ALU 504 and the output is fed back to MLU 500. The vector register file 510 has one write and one read per cycle.

In operation, it is also possible to individually mask a write operation to each register component. The vector register file 510 exhibits zero latency when the write address is the same as the read address due to a bypass path 511 from the input to the output. In this case, unmasked components would be taken from the registers and masked components would be bypassed. The vector register file 510 is thus very useful for building up vectors component by component, or for changing the order of vector components in conjunction with the ALU SMR operations (See Table 5). Temporary results might be also stored in the vector register file 510.

FIG. 9 is a schematic diagram of ILU 512 of transform module 52 of FIG. 5 in accordance with one embodiment of the present invention. As shown, ILU 512 of transform module 52 is capable of generating a floating-point reciprocal (1/D) and a reciprocal square root (1/D^(1/2)). To carry out such operations, either one of two iterative processes might be executed on a mantissa. Such processes might be executed with any desired dedicated hardware, and are shown below:

                                  Reciprocal (1/D)               Reciprocal Square-root (1/D^(1/2))
                                  x_{n+1} = x_n (2 - x_n * D)    x_{n+1} = (1/2) * x_n (3 - x_n^2 * D)
1) table look up for x_n (seed)   x_n                            x_n * x_n
2) 1st iteration: multiply-add    2 - x_n * D                    3 - x_n^2 * D
3) 1st iteration: multiply        x_n (2 - x_n * D)              (1/2) * x_n (3 - x_n^2 * D)
4) 2nd iteration                  no-op: x_{n+1}                 square pass: x_{n+1}^2
5) 2nd iteration: multiply-add    2 - x_{n+1} * D                3 - x_{n+1}^2 * D
6) 2nd iteration: multiply        x_{n+1} (2 - x_{n+1} * D)      (1/2) * x_{n+1} (3 - x_{n+1}^2 * D)

As shown, the two processes are similar, affording a straightforward design. It should be noted that the iterations might be repeated until a threshold precision is met.
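The two iterations can be tried numerically with a short sketch. In the hardware the seed comes from a look-up table; here it is simply passed in as a parameter, which is an assumption made for illustration, as are the function names.

```python
def reciprocal(d, seed, iterations=2):
    """Approximate 1/d with x_{n+1} = x_n * (2 - x_n * d)."""
    x = seed
    for _ in range(iterations):
        x = x * (2.0 - x * d)
    return x

def reciprocal_sqrt(d, seed, iterations=2):
    """Approximate 1/sqrt(d) with x_{n+1} = 0.5 * x_n * (3 - x_n^2 * d)."""
    x = seed
    for _ in range(iterations):
        x = 0.5 * x * (3.0 - x * x * d)
    return x
```

Each pass roughly doubles the number of correct bits, so the accuracy after two passes depends on how close the seed is: `reciprocal(4.0, 0.3)` lands within 0.001 of 0.25, and a table seed with more correct bits would do correspondingly better.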

In operation, ILU 512 performs two basic operations including an inverse operation and inverse square root operation. Unlike the other units, it requires six cycles to generate the output. The input is a scalar, and so is the output. As set forth earlier, the threaded holding register 516 at ILU 512 output is relied upon to latch the result until the next time a valid result is generated. Further, the scalar output is smeared into a vector before being fed into MLU 500. The inverse unit 512 uses look-up tables and a two pass Newton-Raphson iteration to generate IEEE (Institute of Electrical and Electronics Engineers) outputs accurate to within about 22 mantissa bits. Table 7 illustrates the various operations that might be performed by ILU 512 of transform module 52.

TABLE 7
CILU_INV    o = 1.0/a
CILU_ISQ    o = 1.0/sqrt(a)
CILU_CINV   o = 1.0/a (with range clamp)
CILU_NOP    no output

The foregoing range clamp inversion operation of Table 7 might be used to allow clipping operations to be handled by rasterization module 56. Coordinates are transformed directly into screen space that can result in problems when the homogeneous clip space w is near 0.0. To avoid multiplying by 1.0/0.0 in the perspective divide, the 1/w calculation is clamped to a minimum and a maximum exponent.

In use the context memory 410 as shown in FIG. 5 reads and writes only using quad-words. The memory can be read by MLU 500 or ALU 504 each cycle, and can be written by ALU 504. Only one memory read is allowed per cycle. If a read is necessary, it is done at the start of an instruction and then pipelined down to ALU 504 three cycles later. Context memory 410 need not necessarily be threaded.

FIG. 10 is a chart of the output addresses of output converter 518 of transform module 52 of FIG. 5 in accordance with one embodiment of the present invention. The output converter 518 is responsible for directing the outputs to proper destinations, changing the bit precision of data, and some data swizzling to increase performance. All data destined for lighting module 54 is rounded to a 22 bit floating point format organized as S1E8M13 (one sign, eight exponent, 13 mantissa bits). The destination buffers 402 as shown in FIG. 4 in lighting module 54 are threaded.
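The S1E8M13 format keeps the sign and exponent fields of a 32-bit IEEE float and shortens the mantissa from 23 to 13 bits. The sketch below emulates that reduction in software. The rounding mode is not stated in the text, so round-to-nearest with ties away from zero is assumed here, and the function name is hypothetical.

```python
import struct

def round_to_s1e8m13(value):
    """Round a number to 13 mantissa bits, returned as an ordinary float.

    Assumption: round to nearest, ties away from zero. NaN and overflow
    cases are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Add half of the dropped range, then clear the 10 dropped mantissa bits.
    # A carry out of the mantissa correctly increments the exponent field.
    bits = (bits + (1 << 9)) & ~((1 << 10) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Under this assumption `round_to_s1e8m13(1.0 + 2**-13)` is returned unchanged, while anything finer than 2**-13 relative to 1.0 is rounded to the nearest representable neighbour.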

Data swizzling is useful when generating vectors. Such technique allows the generation of a distance vector (1,d,d*d) without penalty when producing a vector. The distance vector is used for fog, point parameter and light attenuation. This is done with an eye vector and light direction vectors. Table 8 illustrates the various operations associated with such vectors. It should be noted that, in the following table, squaring the vector refers to d^2 = dot[(x,y,z), (x,y,z)], and storing d^2 in the w component of (x,y,z).

TABLE 8
1. Square the vector               (x,y,z,d*d)       (output d*d to VBUF, 1.0 to VBUF)
2. Generate inverse sqrt of d*d    (1/d)
3. Normalize vector                (x/d,y/d,z/d,d)   (output x/d,y/d,z/d to WBUF, d to VBUF)

It should be noted that the math carried out in the present invention might not always be IEEE compliant. For example, it might be assumed that "0" multiplied by any number renders "0." This is particularly beneficial when dealing with equations such as d = d^2 * 1/(d^2)^(1/2), where d=0. Without making the foregoing assumption, such equation would afford an error, thus causing problems in making related computations.
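The benefit of the zero-times-anything rule can be seen in a small sketch of the distance vector computation. The helper names are hypothetical, and it is assumed for illustration that the inverse square root of zero is reported as infinity.

```python
import math

def mul(a, b):
    """Multiply with the non-IEEE convention that 0 times anything is 0."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b

def distance_vector(x, y, z):
    """Return (1, d, d*d) where d is recovered as d^2 * 1/sqrt(d^2)."""
    d2 = x * x + y * y + z * z
    # Assumption: the inverse square root of zero is treated as infinity.
    inv_d = math.inf if d2 == 0.0 else 1.0 / math.sqrt(d2)
    d = mul(d2, inv_d)
    return (1.0, d, d2)
```

With strict IEEE arithmetic, 0.0 multiplied by infinity would be NaN and would contaminate the fog and attenuation terms; with the convention above, `distance_vector(0.0, 0.0, 0.0)` is simply `(1.0, 0.0, 0.0)`.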

FIG. 11 is an illustration of the micro-code organization of transform module 52 of FIG. 5 in accordance with one embodiment of the present invention. The transform module micro-code might be arranged into 15 fields making up a total width of 44 bits. Fields might be delayed to match the data flow of the units. MLU 500 operations are executed at a delay of zero, ALU operations are executed at a delay of one, and RLU and output operations are executed at a delay of two. Each delay is equivalent to three cycles.

FIG. 12 is a schematic diagram of sequencer 1200 of transform module 52 of FIG. 5 in accordance with one embodiment of the present invention. As shown in FIG. 12, sequencer 1200 of transform module 52 includes a buffer 1202 adapted for receiving the mode bits from VAB 50 that are indicative of the status of a plurality of modes of process operations.

Also included is memory 412 capable of storing code segments that each are adapted to carry out the process operations in accordance with the status of the modes. A sequencing module 1206 is coupled between memory 412 and a control vector module 1205 which is in turn coupled to buffer 1202 for identifying a plurality of addresses in memory 412 based on a control vector derived from mode bits 202. The sequencing module 1206 is further adapted for accessing the addresses in memory 412 for retrieving the code segments that might be used to operate transform module 52 to transfer data to an output buffer 1207.

FIG. 13 is a flowchart delineating the various operations associated with use of sequencer 1200 of transform module 52 of FIG. 12. As shown, sequencer 1200 is adapted for sequencing graphics-processing in a transform or lighting operation. In operation 1320, mode bits 202 are first received which are indicative of the status of a plurality of modes of process operations. In one embodiment, mode bits 202 might be received from a software driver.

Then, in operation 1322, a plurality of addresses is identified in memory based on mode bits 202. Such addresses are then accessed in the memory in operation 1324 for retrieving code segments that each are adapted to carry out the process operations in accordance with the status of the modes. The code segments are subsequently executed with a transform or lighting module for processing vertex data. Note operation 1326.

FIG. 14 is a flow diagram delineating the operation of the sequencing module 1206 of sequencer 1200 of transform module 52 of FIG. 12. As shown, a plurality of mode registers 1430 each include a unique set of mode bits 202 which in turn correspond to a single vertex. It should be noted that mode registers 1430 are polled in a round robin sequence in order to allow the execution of multiple execution threads in the manner set forth earlier with reference to FIGS. 4A and 4B.

Once the current execution thread is selected, a corresponding group of mode bits 202 is decoded in operation 1432. Upon mode bits 202 being decoded in operation 1432, a control vector is afforded which includes a plurality of bits each of which indicates whether a particular code segment is to be accessed in ROM 1404 for processing the corresponding vertex data.

Upon determining whether a code segment should be accessed in ROM 1404 and executed, a pointer operation 1436 increments the current thread pointer to start the next execution thread to obtain a second group of mode bits 202 to continue a similar operation. This might be continued for each of the threads in a round robin sequence.

Once the control vector has been formed for a particular group of mode bits 202, a priority encoder operation 1438 determines, or identifies, a next "1" or enabled, bit of the control vector. If such a bit is found, the priority encoder operation 1438 produces an address in ROM 1404 corresponding to the enabled bit of the control vector for execution purposes.

Upon returning to the initial group of mode bits 202 after handling the remaining threads, and after the mode bits have been decoded and the control vector is again available, a masking operation 1434 might be used to mask the previous "1", or enabled, bit that was identified earlier. This allows analysis of all remaining bits after mask operation 1434.
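The priority encoder and masking steps for a single thread can be sketched as follows. This is an illustrative model only: the function names and ROM addresses are hypothetical, it is assumed that the lowest-numbered enabled bit has priority, and the round robin interleaving with other threads is omitted.

```python
def next_enabled_bit(control_vector, mask):
    """Priority encoder: index of the lowest enabled, unmasked bit, or None."""
    remaining = control_vector & ~mask
    if remaining == 0:
        return None
    # remaining & -remaining isolates the lowest set bit.
    return (remaining & -remaining).bit_length() - 1

def sequence(control_vector, rom_addresses):
    """Return the ROM addresses visited for one control vector, in order."""
    mask = 0
    visited = []
    while True:
        bit = next_enabled_bit(control_vector, mask)
        if bit is None:
            break
        visited.append(rom_addresses[bit])
        mask |= 1 << bit  # mask the bit so the next pass finds the following one
    return visited

# Hypothetical start addresses of five code segments in ROM.
ROM_ADDRESSES = [0x00, 0x10, 0x20, 0x30, 0x40]
```

For a control vector of `0b10110`, the sketch visits the segments for bits 1, 2 and 4 in turn, with each pass masking the bit found on the previous pass.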

The foregoing process might be illustrated using the following tables. Table 9 shows a plurality of equations that might be executed on subject vertex data.

TABLE 9
R = (a)
R = (a + d * e)
R = (a + b * c + f)
R = (a + b * c + d * e)
R = 1.0/(a)
R = 1.0/(a + d * e)
R = 1.0/(a + b * c + f)
R = 1.0/(a + b * c + d * e)

As shown, there are four possibilities of products that might be summed in addition to an inverse operation (a, b*c

