Compiler apparatus and method of optimizing a source program by reducing a hamming distance between two instructions. Number: 7,386,844, from the United States Patent and Trademark Office (PTO)


Title: Compiler apparatus and method of optimizing a source program by reducing a hamming distance between two instructions

Abstract: A compiler apparatus is capable of generating instruction sequences causing a processor to operate with lower power consumption. The compiler apparatus translates a source program into a machine language program for a processor including execution units which can execute instructions in parallel, and including instruction issue units which issue the instructions executed, respectively, by the execution units. The compiler apparatus includes a parser unit operable to parse the source program, an intermediate code conversion unit operable to convert the parsed source program into intermediate codes, an optimization unit operable to optimize the intermediate codes to reduce a hamming distance between instructions from the same instruction issue unit in consecutive instruction cycles, and includes a code generation unit operable to convert the optimized intermediate codes into machine language instructions.

Patent Number: 7,386,844, issued on 06/10/2008 to Heishi et al.


Inventors: Heishi; Taketo (Osaka, JP), Ogawa; Hajime (Suita, JP), Tani; Takenobu (Jyoyo, JP), Sasagawa; Yukihiro (Soraku-gun, JP)
Assignee: Matsushita Electric Industrial Co., Ltd. (Osaka, JP)
Appl. No.: 10/760,429
Filed: January 21, 2004


Foreign Application Priority Data

Jan 28, 2003 [JP] 2003-019365

Current U.S. Class: 717/161; 713/300; 713/320; 717/151; 717/159
Field of Search: 717/140-161 713/300-340


References Cited

U.S. Patent Documents
5537656 July 1996 Mozdzen et al.
5572736 November 1996 Curran
5574921 November 1996 Curran
5790874 August 1998 Takano et al.
5835776 November 1998 Tirumalai et al.
5854935 December 1998 Enomoto
6002878 December 1999 Gehman et al.
6535984 March 2003 Hurd
6725450 April 2004 Takayama
6826704 November 2004 Pickett
6938248 August 2005 Kitakami et al.
7073169 July 2006 Ogawa et al.
7076775 July 2006 Webster et al.
7299369 November 2007 Webster et al.
7302597 November 2007 Webster
2002/0161986 October 2002 Kamigata et al.
2002/0199177 December 2002 Ogawa et al.
2003/0212914 November 2003 Webster et al.
2004/0015922 January 2004 Kitakami et al.
2005/0005180 January 2005 Webster
2005/0010830 January 2005 Webster
2005/0022041 January 2005 Mycroft et al.
2005/0229017 October 2005 Webster

Foreign Patent Documents
63-126018 May 1988 JP
08-101777 April 1996 JP
2001-22591 January 2001 JP
2001-92661 April 2001 JP
2002-123331 April 2002 JP
2002-323982 November 2002 JP

Other References

Lee et al., "Compiler Optimization on Instruction Scheduling for Low Power," 2000, IEEE, pp. 55-60. Cited by examiner.
Pedram, Massoud, "Power Optimization and Management in Embedded Systems," 2001, ACM, pp. 239-244. Cited by examiner.
Chakrapani et al., "The Emerging Power Crisis in Embedded Processors: What can a poor compiler do?" 2001, ACM, pp. 287-291. Cited by examiner.
Hirohiko Ono, "A Practicing Engineer Discusses Computer Science--VLIW (Parallel ALU) Machine and Compiler," Interface, vol. 15, no. 11, CQ Publishing Company, Nov. 1, 1989, pp. 259-266. Cited by other.

Primary Examiner: Zhen; Wei
Assistant Examiner: Chen; Qing
Attorney, Agent or Firm: Wenderoth, Lind & Ponack, L.L.P.

Claims



What is claimed is:

1. A computer-readable storage medium encoded with a compiler apparatus for generating a machine language program for a processor, the processor including a plurality of instruction issue units and a plurality of corresponding execution units, each instruction issue unit issuing instructions to a corresponding execution unit, and each instruction issue unit including instruction registers for storing the instructions issued to the corresponding execution unit, the compiler apparatus comprising: a parser unit operable to parse a source program by extracting, from the source program, a reserved word stored in a storage unit and by carrying out a lexical analysis of the source program; an intermediate code conversion unit operable to receive the parsed source program and convert each statement included in the parsed source program into intermediate codes according to a predetermined rule stored in the storage unit, the intermediate codes including instructions; an optimization unit operable to receive the intermediate codes and optimize scheduling of the instructions of the intermediate codes by: scheduling the instructions of the intermediate codes for each instruction cycle of a plurality of instruction cycles without changing dependencies between the instructions of the intermediate codes, each of the instruction cycles being an instruction cycle that executes instructions in parallel using the execution units; and scheduling the instructions of the intermediate codes to reduce a hamming distance between two instructions including (i) an instruction in a target instruction cycle, and (ii) an instruction in an instruction cycle that immediately precedes the target instruction cycle, the two instructions being instructions stored in instruction registers of the same instruction issue unit, the optimization unit being operable to schedule the instructions of the intermediate codes to reduce the hamming distance of instructions which are scheduled for each of the instruction cycles; and a code generation unit operable to receive the optimized intermediate codes and convert the optimized intermediate codes into machine language instructions according to a conversion table stored in the storage unit.

2. The computer-readable storage medium according to claim 1, wherein the optimization unit is operable to optimize the instructions of the intermediate codes by determining an instruction to be executed in the target instruction cycle and determining an instruction issue unit in which the instruction is to be stored so as to reduce a hamming distance between the two instructions when the instructions of the intermediate codes are scheduled for each of the instruction cycles.

3. The computer-readable storage medium according to claim 2, wherein the optimization unit is operable to optimize the instructions of the intermediate codes by determining which instruction is to be executed in the target instruction cycle and determining which instruction register of the instruction issue unit storing the instruction is for storing the instruction, to reduce the hamming distance between the two instructions when the instructions of the intermediate codes are scheduled for each of the instruction cycles.

4. The computer-readable storage medium according to claim 1, wherein the optimization unit is operable to optimize the instructions of the intermediate codes by scheduling the instructions of the intermediate codes to reduce a hamming distance between operation codes of the two instructions, the two instructions being stored in instruction registers of the same instruction issue unit.

5. The computer-readable storage medium according to claim 1, wherein the optimization unit is operable to optimize the instructions of the intermediate codes by scheduling the instructions of the intermediate codes to reduce a hamming distance between register numbers of the two instructions when the instructions of the intermediate codes are scheduled for each of the instruction cycles, the two instructions being stored in instruction registers of the same instruction issue unit.

6. A computer-readable storage medium encoded with a compiler apparatus for generating a machine language program for a processor, the processor including a plurality of instruction issue units and a plurality of corresponding execution units, and each instruction issue unit issuing instructions to a corresponding execution unit, and each instruction issue unit including instruction registers for storing the instructions issued to the corresponding execution unit, the compiler apparatus comprising: a parser unit operable to parse a source program by extracting, from the source program, a reserved word stored in a storage unit and by carrying out a lexical analysis of the source program; an intermediate code conversion unit operable to receive the parsed source program and convert each statement included in the parsed source program into intermediate codes according to a predetermined rule stored in the storage unit, the intermediate codes including instructions; an optimization unit operable to receive the intermediate codes and optimize the instructions of the intermediate codes by: changing, for each instruction cycle of a plurality of instruction cycles, a correspondence between (i) instructions of the intermediate codes to be executed in the same instruction cycle and (ii) the instruction issue units from which the instructions are issued, the optimization unit changing the correspondence without changing dependencies between the instructions of the intermediate codes converted by the intermediate code conversion unit, and each of the instruction cycles being an instruction cycle that executes instructions in parallel using the execution unit; and changing the correspondence between (i) instructions to be executed in a target instruction cycle and (ii) the instruction issue units from which the instructions are issued, to reduce a hamming distance between two instructions including an instruction to be executed in the target instruction cycle and an instruction in an instruction cycle that immediately precedes the target instruction cycle, the two instructions being instructions stored in instruction registers of the same instruction issue unit; and a code generation unit operable to receive the optimized intermediate codes and convert the optimized intermediate codes into machine language instructions according to a conversion table stored in the storage unit.

7. The computer-readable storage medium according to claim 6, wherein the optimization unit is operable to optimize the instructions of the intermediate codes by changing the correspondence between (i) the instructions to be executed in the target instruction cycle, and (ii) the instruction issue units from which the instructions are issued, to reduce a sum of hamming distances, each of the hamming distances being calculated between the two instructions, the two instructions being issued to an identical instruction issue unit, and the two instructions used to calculate the sum of the hamming distances being included in the instruction issue units, respectively.

8. The computer-readable storage medium according to claim 6, wherein the optimization unit is operable to optimize the instructions of the intermediate codes by changing the correspondence between (i) the instructions to be executed in the target instruction cycle, and (ii) the instruction issue units in which the instructions are issued, to reduce a hamming distance between operation codes of the two instructions.

9. The computer-readable storage medium according to claim 6, wherein the optimization unit is operable to optimize the instructions of the intermediate codes by changing the correspondence between (i) the instructions to be executed in the target instruction cycle, and (ii) the instruction issue units in which the instructions are issued, to reduce a hamming distance between register numbers of the two instructions.

10. A method for generating a machine language program for a processor, the processor including a plurality of instruction issue units and a plurality of corresponding execution units, each instruction issue unit issuing instructions to a corresponding execution unit, and each instruction issue unit including instruction registers for storing the instructions issued to the corresponding execution unit, the method comprising: parsing a source program by extracting, from the source program, a reserved word stored in a storage unit and by carrying out a lexical analysis of the source program; converting each statement included in the parsed source program into intermediate codes according to a predetermined rule stored in the storage unit, the intermediate codes including instructions; optimizing scheduling of the instructions of the intermediate codes by: scheduling the instructions of the intermediate codes for each instruction cycle of a plurality of instruction cycles without changing dependencies between the instructions of the intermediate codes, each of the instruction cycles being an instruction cycle that executes instructions in parallel using the execution units; and scheduling the instructions of the intermediate codes to reduce a hamming distance between two instructions including (i) an instruction in a target instruction cycle, and (ii) an instruction in an instruction cycle that immediately precedes the target instruction cycle, the two instructions being instructions stored in instruction registers of the same instruction issue unit, and the scheduling of the instructions of the intermediate codes reducing the hamming distance of instructions scheduled for each of the instruction cycles; and converting the optimized intermediate codes into machine language instructions according to a conversion table stored in the storage unit.

11. A method for generating a machine language program for a processor, the processor including a plurality of instruction issue units and a plurality of corresponding execution units, each instruction issue unit issuing instructions to a corresponding execution unit, and each instruction issue unit including instruction registers for storing the instructions issued to the corresponding execution unit, the method comprising: parsing a source program by extracting, from the source program, a reserved word stored in a storage unit and by carrying out a lexical analysis of the source program; converting each statement included in the parsed source program into intermediate codes according to a predetermined rule stored in the storage unit, the intermediate codes including instructions; optimizing the instructions of the intermediate codes by: changing, for each instruction cycle of a plurality of instruction cycles, a correspondence between (i) instructions of the intermediate codes to be executed in the same instruction cycle and (ii) the instruction issue units from which the instructions are issued, the optimization unit changing the correspondence without changing dependencies between the instructions of the intermediate codes converted by the converting of each statement, and each of the instruction cycles being an instruction cycle that executes instructions in parallel using the execution units; and changing the correspondence between (i) instructions to be executed in a target instruction cycle and (ii) the instruction issue units from which the instructions are issued, to reduce a hamming distance between two instructions including an instruction to be executed in the target instruction cycle and an instruction in an instruction cycle that immediately precedes the target instruction cycle, the two instructions being instructions stored in instruction registers of the same instruction issue unit; and converting the optimized intermediate codes into machine language instructions according to a conversion table stored in the storage unit.

12. A computer-readable storage medium encoded with a compiler program for generating a machine language program for a processor, the processor including a plurality of instruction issue units and a plurality of corresponding execution units, each instruction issue unit issuing instructions to a corresponding execution unit, and each instruction issue unit including instruction registers for storing the instructions issued to the corresponding execution unit, the compiler program causing a computer to execute a method comprising: parsing a source program by extracting, from the source program, a reserved word stored in a storage unit and by carrying out a lexical analysis of the source program; converting each statement included in the parsed source program into intermediate codes according to a predetermined rule stored in the storage unit, the intermediate codes including instructions; optimizing scheduling of the instructions of the intermediate codes by: scheduling the instructions of the intermediate codes for each instruction cycle of a plurality of instruction cycles without changing dependencies between the instructions of the intermediate codes, each of the instruction cycles being an instruction cycle that executes instructions in parallel using the execution units; and scheduling the instructions of the intermediate codes to reduce a hamming distance between two instructions including (i) an instruction in a target instruction cycle, and (ii) an instruction in an instruction cycle that immediately precedes the target instruction cycle, the two instructions being instructions stored in instruction registers of the same instruction issue unit, the scheduling of the instructions of the intermediate codes reducing the hamming distance of instructions scheduled for each of the instruction cycles; and converting the optimized intermediate codes into machine language instructions according to a conversion table stored in the storage unit.

13. A computer-readable storage medium encoded with a compiler program for generating a machine language program for a processor, the processor including a plurality of instruction issue units and a plurality of corresponding execution units, each instruction issue unit issuing instructions to a corresponding execution unit, and each instruction issue unit including instruction registers for storing the instructions issued to the corresponding execution unit, the compiler program causing a computer to execute a method comprising: parsing a source program by extracting, from the source program, a reserved word stored in a storage unit and by carrying out a lexical analysis of the source program; converting each statement included in the parsed source program into intermediate codes according to a predetermined rule stored in the storage unit, the intermediate codes including instructions; optimizing the instructions of the intermediate codes by: changing, for each instruction cycle of a plurality of instruction cycles, a correspondence between (i) instructions of the intermediate codes to be executed in the same instruction cycle and (ii) the instruction issue units from which the instructions are issued, the optimization unit changing the correspondence without changing dependencies between the instructions of the intermediate codes converted by the converting of each statement, and each of the instruction cycles being an instruction cycle that executes instructions in parallel using the execution units; and changing the correspondence between (i) instructions to be executed in a target instruction cycle and (ii) the instruction issue units from which the instructions are issued, to reduce a hamming distance between two instructions including an instruction to be executed in the target instruction cycle and an instruction in an instruction cycle that immediately precedes the target instruction cycle, the two instructions being instructions stored in instruction registers of the same instruction issue unit; and converting the optimized intermediate codes into machine language instructions according to a conversion table stored in the storage unit.

Description



BACKGROUND OF THE INVENTION

(1) Field of the Invention

The present invention relates to a compiler for converting a source program written in a high-level language such as C or C++ into a machine language program, and particularly to a compiler that is capable of outputting a machine language program which can be executed with lower power consumption.

(2) Description of the Related Art

Mobile information processing apparatuses such as mobile phones and personal digital assistants (PDAs), which have become widespread in recent years, require reduced power consumption. Therefore, there is an increasing demand for a compiler that effectively exploits the advanced functions of the processor used in an information processing apparatus and generates machine-level instructions that the processor can execute with low power consumption.

As a conventional technique, Japanese Laid-Open Patent Application No. 8-101777 discloses an instruction sequence optimization apparatus that reduces the power consumption of a processor by changing the execution order of instructions.

This instruction sequence optimization apparatus reorders the instructions so as to reduce the hamming distances between their bit patterns without changing the dependencies between the instructions. In this way it optimizes the instruction sequence and reduces the power consumption of the processor.
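The idea of reordering to reduce hamming distance can be illustrated with a minimal sketch. The 8-bit instruction encodings below are hypothetical and chosen only for illustration, and the sketch assumes the second and third instructions are independent, so that swapping them does not change any dependency.

```python
from __future__ import annotations


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two instruction words differ."""
    return bin(a ^ b).count("1")


def total_toggles(sequence: list[int]) -> int:
    """Sum the hamming distances between consecutive instruction words."""
    return sum(hamming_distance(x, y) for x, y in zip(sequence, sequence[1:]))


# Hypothetical 8-bit encodings (not taken from any real instruction set).
original = [0b00001111, 0b11110000, 0b00001110]
# Same instructions with the last two swapped; assumed to be independent.
reordered = [0b00001111, 0b00001110, 0b11110000]

print(total_toggles(original), total_toggles(reordered))
```

Under these assumptions the reordered sequence flips fewer bits between consecutive instructions, which is the quantity the conventional apparatus tries to reduce.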

However, the conventional instruction sequence optimization apparatus does not assume a processor that can execute instructions in parallel. Therefore, the optimum instruction sequence cannot be obtained when the conventional optimization processing is applied to a processor with parallel processing capability.

SUMMARY OF THE INVENTION

The present invention has been conceived in view of the above, and aims to provide a compiler that is capable of generating instruction sequences that a processor with parallel processing capability can execute with low power consumption.

In order to achieve the above object, the compiler apparatus according to the present invention is a compiler apparatus that translates a source program into a machine language program for a processor including a plurality of execution units which can execute instructions in parallel and a plurality of instruction issue units which issue the instructions executed respectively by the plurality of execution units. The compiler apparatus includes a parser unit operable to parse the source program, and an intermediate code conversion unit operable to convert the parsed source program into intermediate codes. The compiler apparatus also includes an optimization unit operable to optimize the intermediate codes so as to reduce a hamming distance between instructions placed in positions corresponding to the same instruction issue unit in consecutive instruction cycles, without changing dependency between the instructions corresponding to the intermediate codes. Further, the compiler apparatus includes a code generation unit operable to convert the optimized intermediate codes into machine language instructions. Preferably, the optimization unit optimizes the intermediate codes by placing an instruction with higher priority in a position corresponding to each of the plurality of instruction issue units, without changing dependency between the instructions corresponding to the intermediate codes, the instruction with higher priority having a smaller hamming distance from an instruction being placed in a position corresponding to the same instruction issue unit in an immediately preceding cycle.

Accordingly, changes in the bit patterns of the instructions executed by each execution unit are restrained, so bit changes in the values held in the instruction registers of the processor are kept small. An instruction sequence that can be executed by the processor with low power consumption is thus generated.
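As a rough sketch of the slot-level idea, the code below takes the instructions already chosen for one cycle and tries every assignment of them to issue units, keeping the assignment with the smallest summed hamming distance to the previous cycle. This exhaustive search is a simplification for illustration, not the priority-based placement described above; the function names and the 4-bit encodings are hypothetical, and it assumes any instruction may be issued from any unit.

```python
from __future__ import annotations

from itertools import permutations


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two instruction words differ."""
    return bin(a ^ b).count("1")


def slot_cost(previous: list[int], current: list[int]) -> int:
    """Sum the per-issue-unit hamming distances between two consecutive cycles."""
    return sum(hamming_distance(p, c) for p, c in zip(previous, current))


def assign_slots(previous: list[int], current: list[int]) -> list[int]:
    """Order `current` across issue units to minimize slot_cost (brute force)."""
    best = min(permutations(current), key=lambda order: slot_cost(previous, list(order)))
    return list(best)


# Hypothetical 4-bit encodings for three issue units.
previous_cycle = [0b1100, 0b0011, 0b1111]
current_cycle = [0b0010, 0b1110, 0b1101]

placed = assign_slots(previous_cycle, current_cycle)
print(slot_cost(previous_cycle, current_cycle), slot_cost(previous_cycle, placed))
```

Brute force is workable here only because the number of issue units is small; the number of candidate assignments grows factorially with it.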

The compiler apparatus according to another aspect of the present invention is a compiler apparatus that translates a source program into a machine language program for a processor including a plurality of execution units which can execute instructions in parallel and a plurality of instruction issue units which issue the instructions executed respectively by the plurality of execution units. The compiler apparatus includes a parser unit operable to parse the source program, and an intermediate code conversion unit operable to convert the parsed source program into intermediate codes. The compiler apparatus also includes an optimization unit operable to optimize the intermediate codes so that a same register is accessed in consecutive instruction cycles, without changing dependency between instructions corresponding to the intermediate codes, and includes a code generation unit operable to convert the optimized intermediate codes into machine language instructions. Preferably, the optimization unit optimizes the intermediate codes by placing an instruction with higher priority in a position corresponding to each of the plurality of instruction issue units, without changing dependency between the instructions corresponding to the intermediate codes, the instruction with higher priority being for accessing a register of an instruction placed in a position corresponding to the same instruction issue unit in an immediately preceding instruction cycle.

Accordingly, the same register is accessed repeatedly and changes in the control signal for selecting a register are kept small. An instruction sequence that can be executed by the processor with low power consumption is thus generated.
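A minimal sketch of this register-based priority follows. It assumes a list of instructions that are already known to be ready, so no dependency is violated, and it picks the one sharing the most registers with the instruction that the same issue unit held in the preceding cycle. The `Instr` type, the function name, and the register names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    opcode: str
    registers: frozenset  # names of the registers this instruction accesses


def pick_for_slot(previous: Instr, ready: list[Instr]) -> Instr:
    """Prefer the ready instruction that reuses the most registers of `previous`.

    On a tie, max() keeps the earliest candidate in `ready`.
    """
    return max(ready, key=lambda instr: len(instr.registers & previous.registers))


previous = Instr("add", frozenset({"r1", "r2"}))
ready = [
    Instr("sub", frozenset({"r3", "r4"})),
    Instr("mul", frozenset({"r1", "r5"})),  # shares r1 with the previous instruction
    Instr("ld", frozenset({"r6"})),
]

print(pick_for_slot(previous, ready).opcode)
```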

The compiler apparatus according to still another aspect of the present invention is a compiler apparatus that translates a source program into a machine language program for a processor including a plurality of execution units which can execute instructions in parallel and a plurality of instruction issue units which issue the instructions executed respectively by the plurality of execution units, wherein an instruction which is to be issued with higher priority is predetermined for each of the plurality of instruction issue units. The compiler apparatus includes a parser unit operable to parse the source program, and an intermediate code conversion unit operable to convert the parsed source program into intermediate codes. The compiler apparatus also includes an optimization unit operable to optimize the intermediate codes by placing the predetermined instruction with higher priority in a position corresponding to each of the plurality of instruction issue units, without changing dependency between instructions corresponding to the intermediate codes, and includes a code generation unit operable to convert the optimized intermediate codes into machine language instructions.

Accordingly, if instructions using the same constituent element of a processor are assigned as instructions to be issued by priority by the same instruction issue unit, the instructions using the same constituent element are executed consecutively in the same execution unit. Therefore, an instruction sequence that can be executed by the processor with low power consumption is generated.

The compiler apparatus according to still another aspect of the present invention is a compiler apparatus that translates a source program into a machine language program for a processor including a plurality of execution units which can execute instructions in parallel and a plurality of instruction issue units which issue the instructions executed respectively by the plurality of execution units. The compiler apparatus includes a parser unit operable to parse the source program, and an intermediate code conversion unit operable to convert the parsed source program into intermediate codes. The compiler apparatus also includes an interval detection unit operable to detect an interval in which no instruction is placed in a predetermined number of positions, out of a plurality of positions corresponding respectively to the plurality of instruction issue units in which instructions are to be placed, consecutively for a predetermined number of instruction cycles. Further, the compiler apparatus includes a first instruction insertion unit operable to insert, immediately before the interval, an instruction to stop an operation of the instruction issue units corresponding to the positions where no instruction is placed, and includes a code generation unit operable to convert the optimized intermediate codes into machine language instructions.

Accordingly, when instructions are not placed in a location corresponding to the instruction issue unit for a certain interval, power supply to the instruction issue unit can be stopped during that interval. Therefore, an instruction sequence that can be executed by the processor with low power consumption is generated.
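The interval detection can be sketched as follows. This is a minimal illustration under the assumption that a scheduling result is held as a list of cycles, each cycle being a list with one entry per slot and `None` marking a position where no instruction is placed; the function name and the layout are hypothetical.

```python
def find_idle_intervals(schedule, slot, min_cycles):
    """Return (start, end) cycle pairs, end exclusive, in which `slot`
    holds no instruction for at least `min_cycles` consecutive cycles."""
    intervals = []
    start = None
    for cycle, row in enumerate(schedule):
        if row[slot] is None:
            if start is None:
                start = cycle          # an empty run begins here
        else:
            if start is not None and cycle - start >= min_cycles:
                intervals.append((start, cycle))
            start = None               # the run is broken by an instruction
    # A run that lasts until the end of the schedule is also checked.
    if start is not None and len(schedule) - start >= min_cycles:
        intervals.append((start, len(schedule)))
    return intervals


schedule = [
    ["ld", "mul1", "add1"],
    ["st", None,   "sub1"],
    ["ld", None,   "mov1"],
    ["ld", None,   "add1"],
    ["st", "mul2", None],
]
idle = find_idle_intervals(schedule, slot=1, min_cycles=3)
```

In this example the second slot (index 1) is empty in cycles 1 to 3, so an instruction to stop that slot would be inserted immediately before cycle 1.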

The compiler apparatus according to still another aspect of the present invention is a compiler apparatus that translates a source program into a machine language program for a processor including a plurality of execution units which can execute instructions in parallel and a plurality of instruction issue units which issue the instructions executed respectively by the plurality of execution units. The compiler apparatus includes a parser unit operable to parse the source program, and an intermediate code conversion unit operable to convert the parsed source program into intermediate codes. The compiler apparatus also includes an optimization unit operable to optimize the intermediate codes by placing instructions so as to operate only a specified number of instruction issue units, without changing dependency between the instructions corresponding to the intermediate codes, and includes a code generation unit operable to convert the optimized intermediate codes into machine language instructions. Preferably, the source program includes unit number specification information specifying the number of instruction issue units used by the processor, and the optimization unit optimizes the intermediate codes by placing the instructions so as to operate only the instruction issue units of the number specified by the unit number specification information, without changing dependency between the instructions corresponding to the intermediate codes.

Thus, according to the number specified by the unit number specification information, the optimization unit can leave one or more instruction issue units with no instruction to issue, and power supply to those instruction issue units can be stopped. Therefore, an instruction sequence that can be executed by the processor with low power consumption is generated.

More preferably, the above-mentioned compiler apparatus further comprises an acceptance unit operable to accept the number of instruction issue units used by the processor, wherein the optimization unit optimizes the intermediate codes by placing the instructions so as to operate only the instruction issue units of the number accepted by the acceptance unit, without changing dependency between the instructions corresponding to the intermediate codes.

Accordingly, it is possible to operate only the instruction issue units of the number accepted by the acceptance unit and to stop power supply to other instruction issue units. Therefore, an instruction sequence that can be executed by the processor with low power consumption is generated.
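As a rough illustration of placing instructions so that only a given number of instruction issue units operate, the following sketch uses plain greedy list scheduling. This is a generic technique chosen for brevity, not the procedure of the embodiment, and the function name, the dependency dictionary and the instruction names are all hypothetical.

```python
def schedule_with_limit(instructions, deps, max_slots):
    """Place instructions cycle by cycle using at most `max_slots` slots.

    instructions: instruction names in program order.
    deps:         dict mapping a name to the set of names it depends on.
    """
    done, cycles = set(), []
    remaining = list(instructions)
    while remaining:
        cycle = []
        for name in remaining:
            if len(cycle) == max_slots:
                break
            # An instruction is ready only when everything it depends on
            # was placed in an earlier cycle, so dependencies are preserved.
            if deps.get(name, set()) <= done:
                cycle.append(name)
        if not cycle:
            raise ValueError("dependencies cannot be satisfied")
        cycles.append(cycle)
        done.update(cycle)
        remaining = [n for n in remaining if n not in done]
    return cycles


program = ["a", "b", "c", "d"]
deps = {"d": {"a"}}
three_slots = schedule_with_limit(program, deps, max_slots=3)
one_slot = schedule_with_limit(program, deps, max_slots=1)
```

With three slots the program fits in two cycles; with one slot it takes four cycles, but the other two instruction issue units never receive an instruction and could therefore be stopped.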

It should be noted that the present invention can be realized not only as the compiler apparatus mentioned above, but also as a compilation method including steps executed by the units included in the compiler apparatus, as a compiler program having these characteristics, or as a computer-readable recording medium storing the program. Needless to say, the program and data file can be widely distributed via a recording medium such as a CD-ROM (Compact Disc-Read Only Memory) and a transmission medium such as the Internet.

As is obvious from the above explanation, the compiler apparatus according to the present invention restrains bit change in values held in an instruction register of a processor, and thus an instruction sequence that can be executed by the processor with low power consumption is generated.

Also, access to one register is repeated and a change in a control signal for selecting a register becomes small, and thus an instruction sequence that can be executed by the processor with low power consumption is generated.

Also, since the instructions using the same constituent element can be executed in the same slot consecutively for certain cycles, an instruction sequence that can be executed by the processor with low power consumption is generated.

Furthermore, since power supply to a free slot can be stopped, an instruction sequence that can be executed by the processor with low power consumption is generated.

As described above, the compiler apparatus according to the present invention allows a processor with parallel processing capability to operate with low power consumption. In particular, it can generate instruction sequences (a machine language program) suitable for a processor used in an apparatus that requires low-power operation, such as a mobile information processing apparatus like a mobile phone or a PDA, so the practical value of the present invention is extremely high.

As further information about technical background to this application, Japanese Patent Application No. 2003-019365 filed on Jan. 28, 2003 is incorporated herein by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other objects, advantages and features of the invention will become apparent from the following description thereof taken in conjunction with the accompanying drawings that illustrate a specific embodiment of the invention. In the Drawings:

FIG. 1A to FIG. 1D are diagrams showing structures of instructions decoded and executed by a processor in the present embodiment;

FIG. 2 is a block diagram showing a schematic structure of the processor in the present embodiment;

FIG. 3 is a diagram showing an example of a packet;

FIGS. 4 ((a) and (b)) are diagrams for explaining parallel execution boundary information included in a packet;

FIGS. 5A to 5C are diagrams showing examples of the unit of executing instructions which are created based on parallel execution boundary information of a packet and executed in parallel;

FIG. 6 is a block diagram showing a schematic structure of an arithmetic and logical/comparison operation unit;

FIG. 7 is a block diagram showing a schematic structure of a barrel shifter;

FIG. 8 is a block diagram showing a schematic structure of a divider;

FIG. 9 is a block diagram showing a schematic structure of a multiplication/product-sum operation unit;

FIG. 10 is a timing diagram showing each pipeline operation performed when the processor executes instructions;

FIG. 11 is a diagram showing instructions executed by the processor, the details of the processing and the bit patterns of the instructions;

FIG. 12 is a functional block diagram showing a structure of a compiler according to the present embodiment;

FIG. 13 is a flowchart showing operations of an instruction scheduling unit;

FIG. 14A and FIG. 14B are diagrams showing an example of a dependency graph;

FIG. 15 is a diagram showing an example of a result of instruction scheduling;

FIG. 16 is a flowchart showing operations of optimum instruction fetching processing as shown in FIG. 13;

FIG. 17A and FIG. 17B are diagrams for explaining how to calculate a hamming distance between bit patterns in operation codes;

FIG. 18A and FIG. 18B are diagrams for explaining how to calculate a hamming distance between operation codes with different bit lengths;

FIG. 19 is a flowchart showing operations of an intra-cycle permutation processing unit;

FIG. 20A to FIG. 20F are diagrams showing an example of six patterns of instruction sequences;

FIG. 21 is a diagram showing an example of placed instructions;

FIG. 22A to FIG. 22F are diagrams for explaining processing for creating instruction sequences (S61 in FIG. 19);

FIG. 23 is a diagram for explaining processing for calculating hamming distances between operation codes (S64 in FIG. 19);

FIG. 24 is a flowchart showing operations of a register assignment unit;

FIG. 25 is a diagram showing variables as assignment objects;

FIG. 26 is a diagram showing an interference graph of variables created based on the example of FIG. 25;

FIG. 27A to FIG. 27C are diagrams showing results obtained in the processing of instruction scheduling;

FIG. 28 is a flowchart showing operations of an instruction rescheduling unit;

FIG. 29 is a flowchart showing operations of optimum instruction fetching processing in FIG. 28;

FIG. 30A and FIG. 30B are diagrams for explaining processing for specifying placement candidate instructions (S152 in FIG. 29);

FIG. 31A and FIG. 31B are diagrams for explaining processing for specifying placement candidate instructions (S156 in FIG. 29);

FIG. 32A and FIG. 32B are diagrams for explaining processing for specifying placement candidate instructions (S160 in FIG. 29);

FIG. 33 is a flowchart showing operations of a slot stop/resume instruction generation unit;

FIG. 34 is a diagram showing an example of a scheduling result in which instructions are placed;

FIG. 35 is a diagram showing an example of a scheduling result in which instructions are written as processing for a case where only one specific slot is used consecutively;

FIG. 36 is a diagram showing an example of a scheduling result in which instructions are written as processing for a case where only two specific slots are used consecutively;

FIGS. 37 ((a) to (d)) are diagrams showing an example of a program status register;

FIGS. 38 ((a) to (h)) are diagrams showing another example of a program status register;

FIG. 39 is a flowchart showing other operations of the optimum instruction fetching processing as shown in FIG. 28;

FIG. 40A and FIG. 40B are diagrams for explaining processing for specifying a placement candidate instruction (S212 in FIG. 39);

FIG. 41 is a flowchart showing the first modification of the operations of the intra-cycle permutation processing unit 237;

FIG. 42 is a diagram for explaining processing for calculating a hamming distance between instructions (S222 in FIG. 41);

FIG. 43 is a flowchart showing the second modification of the operations of the intra-cycle permutation processing unit 237;

FIG. 44 is a diagram for explaining processing for calculating a hamming distance between register fields (S232 in FIG. 43);

FIG. 45 is a flowchart showing the third modification of the operations of the intra-cycle permutation processing unit 237;

FIG. 46 is a diagram showing an example of placed instructions;

FIG. 47A to FIG. 47F are diagrams for explaining processing for creating instruction sequences (S61 in FIG. 45);

FIG. 48 is a diagram for explaining processing for calculating the numbers of register fields (S242 in FIG. 45); and

FIG. 49 is a flowchart showing the fourth modification of the operations of the intra-cycle permutation processing unit 237.

DETAILED DESCRIPTION OF THE INVENTION

The embodiment of the compiler according to the present invention will be explained in detail referring to the drawings.

The compiler in the present embodiment is a cross compiler for translating a source program described in a high-level language such as C/C++ language into a machine language that can be executed by a specific processor (target), and has a feature of reducing power consumption of a processor.

(Processor)

First, an example of a processor targeted by the compiler in the present embodiment will be explained referring to FIG. 1A to FIG. 11.

The processor targeted by the compiler in the present embodiment uses a pipeline system with higher parallelism of executable instructions than that of an ordinary microcomputer, so that a plurality of instructions can be executed in parallel.

FIG. 1A to FIG. 1D are diagrams showing structures of instructions decoded and executed by the processor in the present embodiment. As shown in FIG. 1A to FIG. 1D, each instruction executed by the processor has a fixed length of 32 bits. The 0th bit of each instruction indicates parallel execution boundary information. When the parallel execution boundary information is "1", there exists a boundary of parallel execution between the instruction and the subsequent instructions. When the parallel execution boundary information is "0", there exists no boundary of parallel execution. How to use the parallel execution boundary information will be described later.

The operation is specified by the remaining 31 bits of each instruction, that is, the instruction length excluding the parallel execution boundary information. More specifically, in fields "Op1", "Op2", "Op3" and "Op4", operation codes indicating types of operations are specified. In register fields "Rs", "Rs1" and "Rs2", register numbers of registers that are source operands are specified. In a register field "Rd", a register number of a register that is a destination operand is specified. In a field "Imm", a constant operand for operation is specified. In a field "Disp", displacement is specified.

The first 2 bits (30th and 31st bits) of an operation code are used for specifying a type of operations (a set of operations). The detail of these two bits will be described later.

The operation codes Op2 to Op4 are data of 16-bit length, while the operation code Op1 is data of 21-bit length. Therefore, for convenience, the first half (16th to 31st bits) of the operation code Op1 is called an operation code Op1-1, while the second half (11th to 15th bits) thereof is called an operation code Op1-2.
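The bit positions given above can be turned into a small field extractor. The sketch below assumes that bit 0 is the least significant bit of the 32-bit word and applies only to the instruction format that carries the operation code Op1; the function name and the dictionary keys are illustrative.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into the fields named in the text.

    Assumption of this sketch: bit 0 is the least significant bit.
    """
    word &= 0xFFFFFFFF
    return {
        "boundary": word & 0x1,          # bit 0: parallel execution boundary
        "op_type": (word >> 30) & 0x3,   # bits 30-31: set of operations
        "op1_1": (word >> 16) & 0xFFFF,  # bits 16-31: operation code Op1-1
        "op1_2": (word >> 11) & 0x1F,    # bits 11-15: operation code Op1-2
    }


fields = decode_fields(0xC0000801)
```

Op1-1 (16 bits) and Op1-2 (5 bits) together account for the 21 bits of the operation code Op1.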

FIG. 2 is a block diagram showing a schematic structure of a processor in the present embodiment. A processor 30 includes an instruction memory 40 for storing sets of instructions (hereinafter referred to as "packets") described according to VLIW (Very Long Instruction Word), an instruction supply/issue unit 50, a decoding unit 60, an execution unit 70 and a data memory 100. Each of these units will be described in detail later.

FIG. 3 is a diagram showing an example of a packet. One packet is defined as the unit of an instruction fetch and is made up of four instructions. As mentioned above, one instruction is 32 bits long. Therefore, one packet is 128 (= 32 × 4) bits long.

Again referring to FIG. 2, the instruction supply/issue unit 50 is connected to the instruction memory 40, the decoding unit 60 and the execution unit 70, and receives packets from the instruction memory 40 based on a value of a PC (program counter) supplied from the execution unit 70 and issues three or fewer instructions in parallel to the decoding unit 60.

The decoding unit 60 is connected to the instruction supply/issue unit 50 and the execution unit 70, and decodes the instructions issued from the instruction supply/issue unit 50 and issues the decoded ones to the execution unit 70.

The execution unit 70 is connected to the instruction supply/issue unit 50, the decoding unit 60 and the data memory 100, and accesses data stored in the data memory 100 if necessary and executes the processing according to the instructions, based on the decoding results supplied from the decoding unit 60. The execution unit 70 increments the value of the PC one by one every time the processing is executed.

The instruction supply/issue unit 50 includes: an instruction fetch unit 52 that is connected to the instruction memory 40 and a PC unit to be described later in the execution unit 70, accesses an address in the instruction memory 40 indicated by the program counter held in the PC unit, and receives packets from the instruction memory 40; an instruction buffer 54 that is connected to the instruction fetch unit 52 and holds the packets temporarily; and an instruction register unit 56 that is connected to the instruction buffer 54 and holds three or fewer instructions included in each packet.

The instruction fetch unit 52 and the instruction memory 40 are connected to each other via an IA (Instruction Address) bus 42 and an ID (Instruction Data) bus 44. The IA bus 42 is 32-bit width and the ID bus 44 is 128-bit width. Addresses are supplied from the instruction fetch unit 52 to the instruction memory 40 via the IA bus 42. Packets are supplied from the instruction memory 40 to the instruction fetch unit 52 via the ID bus 44.

The instruction register unit 56 includes instruction registers 56a to 56c, each of which is connected to the instruction buffer 54 and holds one instruction.

The decoding unit 60 includes: an instruction issue control unit 62 that controls issue of the instructions held in the three instruction registers 56a to 56c in the instruction register unit 56; and a decoding subunit 64 that is connected to the instruction issue control unit 62 and the instruction register unit 56, and decodes the instructions supplied from the instruction register unit 56 under the control of the instruction issue control unit 62.

The decoding subunit 64 includes instruction decoders 64a to 64c that are connected to the instruction registers 56a to 56c respectively, each of which basically decodes one instruction per cycle and outputs control signals.

The execution unit 70 includes: an execution control unit 72 that is connected to the decoding subunit 64 and controls each constituent element of the execution unit 70 to be described later based on the control signals outputted from the three instruction decoders 64a to 64c in the decoding subunit 64; a PC unit 74 that holds an address of a packet to be executed next; a register file 76 that is made up of thirty-two 32-bit registers R0 to R31; arithmetic and logical/comparison operation units (AL/C operation units) 78a to 78c that execute operations of SIMD (Single Instruction Multiple Data) type instructions; and multiplication/product-sum operation units (M/PS operation units) 80a and 80b that are capable of executing SIMD type instructions like the arithmetic and logical/comparison operation units 78a to 78c and calculate a result of 65-bit or less length without lowering the bit precision.

The execution unit 70 further includes: barrel shifters 82a to 82c that execute arithmetic shifts (shifts in the 2's complement system) or logical shifts (unsigned shifts) of data respectively; a divider 84; an operand access unit 88 that is connected to the data memory 100 and sends and receives data to and from the data memory 100; data buses 90 of 32-bit width (an L1 bus, an R1 bus, an L2 bus, an R2 bus, an L3 bus and an R3 bus); and data buses 92 of 32-bit width (a D1 bus, a D2 bus and a D3 bus).

The register file 76 includes thirty-two 32-bit registers R0 to R31. The registers in the register file 76 for outputting data to the L1 bus, the R1 bus, the L2 bus, the R2 bus, the L3 bus and the R3 bus are selected, respectively, based on the control signals CL1, CR1, CL2, CR2, CL3 and CR3 supplied from the execution control unit 72 to the register file 76. The registers in which data transmitted through the D1 bus, the D2 bus and the D3 bus are written are selected, respectively, based on the control signals CD1, CD2 and CD3 supplied from the execution control unit 72 to the register file 76.

Two input ports of the arithmetic and logical/comparison operation unit 78a are respectively connected to the L1 bus and the R1 bus, and the output port thereof is connected to the D1 bus. Two input ports of the arithmetic and logical/comparison operation unit 78b are respectively connected to the L2 bus and the R2 bus, and the output port thereof is connected to the D2 bus. Two input ports of the arithmetic and logical/comparison operation unit 78c are respectively connected to the L3 bus and the R3 bus, and the output port thereof is connected to the D3 bus.

Four input ports of the multiplication/product-sum operation unit 80a are respectively connected to the L1 bus, the R1 bus, the L2 bus and the R2 bus, and the two output ports thereof are respectively connected to the D1 bus and the D2 bus. Four input ports of the multiplication/product-sum operation unit 80b are respectively connected to the L2 bus, the R2 bus, the L3 bus and the R3 bus, and the two output ports thereof are respectively connected to the D2 bus and the D3 bus.

Two input ports of the barrel shifter 82a are respectively connected to the L1 bus and the R1 bus, and the output port thereof is connected to the D1 bus. Two input ports of the barrel shifter 82b are respectively connected to the L2 bus and the R2 bus, and the output port thereof is connected to the D2 bus. Two input ports of the barrel shifter 82c are respectively connected to the L3 bus and the R3 bus, and the output port thereof is connected to the D3 bus.

Two input ports of the divider 84 are respectively connected to the L1 bus and the R1 bus, and the output port thereof is connected to the D1 bus.

The operand access unit 88 and the data memory 100 are connected to each other via an OA (Operand Address) bus 96 and an OD (Operand Data) bus 94. The OA bus 96 and the OD bus 94 are each 32 bits wide. The operand access unit 88 specifies an address of the data memory 100 via the OA bus 96, and reads and writes data at that address via the OD bus 94.

The operand access unit 88 is also connected to the D1 bus, the D2 bus, the D3 bus, the L1 bus and the R1 bus and sends and receives data to and from any one of these buses.

The processor 30 is capable of executing three instructions in parallel. As described later, a collection of circuits that are capable of executing a set of pipeline processing including an instruction assignment stage, a decoding stage, an execution stage and a writing stage that are executed in parallel is defined as a "slot" in the present description. Therefore, the processor 30 has three slots, the first, second and third slots. A set of the processing executed by the instruction register 56a and the instruction decoder 64a belongs to the first slot, a set of the processing executed by the instruction register 56b and the instruction decoder 64b belongs to the second slot, and a set of the processing executed by the instruction register 56c and the instruction decoder 64c belongs to the third slot, respectively.

Instructions called default logic are assigned to the respective slots, and instruction scheduling is performed so that the same kind of instruction is executed in the same slot whenever possible. For example, instructions (default logic) regarding memory access are assigned to the first slot, default logic regarding multiplication is assigned to the second slot, and other default logic is assigned to the third slot. Note that default logic corresponds one to one to a set of operations explained referring to FIG. 1A to FIG. 1D. In other words, instructions with the first 2 bits of "01", "10" and "11" indicate default logic for the first, second and third slots, respectively.

Default logic for the first slot includes "ld" (load instruction), "st" (store instruction) and the like. Default logic for the second slot includes "mul1", "mul2" (multiplication instructions) and the like. Default logic for the third slot includes "add1", "add2" (addition instructions), "sub1", "sub2" (subtraction instructions), "mov1", "mov2" (transfer instructions between registers) and the like.
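The correspondence between the first 2 bits of an instruction and its default slot can be expressed as a simple lookup. The sketch assumes that the two bits are the 30th and 31st bits of the word and that each bit string is written with the 31st bit first; the pattern "00" is not covered by the description, so the function returns `None` for it. All names are illustrative.

```python
# First 2 bits (31st bit, then 30th bit) -> slot whose default logic applies.
DEFAULT_SLOT = {0b01: 1, 0b10: 2, 0b11: 3}


def default_slot(word):
    """Return the slot number (1 to 3) for which the instruction is default
    logic, or None when the first 2 bits are "00"."""
    return DEFAULT_SLOT.get((word >> 30) & 0x3)


memory_access_slot = default_slot(0x40000000)   # first 2 bits "01"
multiplication_slot = default_slot(0x80000000)  # first 2 bits "10"
other_slot = default_slot(0xC0000000)           # first 2 bits "11"
```

A scheduler that favors these slots places memory access instructions, multiplication instructions and the remaining instructions in the first, second and third slots respectively whenever the dependencies allow it.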

FIG. 4 is a diagram for explaining parallel execution boundary information included in a packet. It is assumed that a packet 112 and a packet 114 are stored in the instruction memory 40 in this order. It is also assumed that the parallel execution boundary information for the instruction 2 in the packet 112 and the instruction 5 in the packet 114 is "1" and that the parallel execution boundary information for the other instructions is "0".

The instruction fetch unit 52 reads the packet 112 and the packet 114 in this order based on values of the program counter in the PC unit 74, and issues them to the instruction buffer 54 in sequence. The execution unit 70 executes, in parallel, the instructions up to the instruction whose parallel execution boundary information is 1.

FIGS. 5A to 5C are diagrams showing an example of the unit of executing instructions which are created based on parallel execution boundary information of a packet and executed in parallel. Referring to FIG. 4 and FIGS. 5A to 5C, by separating the packet 112 and the packet 114 at the position of the instructions whose parallel execution boundary information is "1", the units of execution 122 to 126 are generated. Therefore, instructions are issued from the instruction buffer 54 to the instruction register unit 56 in order of the units of execution 122 to 126. The instruction issue control unit 62 controls issue of these instructions.
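The separation into units of execution can be sketched as follows. The sketch assumes that the packet 112 holds the instructions 1 to 4 and the packet 114 holds the instructions 5 to 8, and that instructions following the last boundary form a trailing group; both points are assumptions made for illustration, as are the function and variable names.

```python
def split_into_units(instructions):
    """Group instructions into units of execution.

    instructions: list of (name, boundary_bit) pairs in program order.
    A unit ends at each instruction whose boundary bit is 1. Instructions
    after the last boundary are returned as a trailing group (assumption).
    """
    units, current = [], []
    for name, boundary in instructions:
        current.append(name)
        if boundary == 1:
            units.append(current)
            current = []
    if current:
        units.append(current)
    return units


packet_112 = [("instruction 1", 0), ("instruction 2", 1),
              ("instruction 3", 0), ("instruction 4", 0)]
packet_114 = [("instruction 5", 1), ("instruction 6", 0),
              ("instruction 7", 0), ("instruction 8", 0)]
units = split_into_units(packet_112 + packet_114)
```

Under these assumptions the result contains three groups, none longer than the three instructions that the processor can issue in parallel.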

The instruction decoders 64a to 64c respectively decode the operation codes of the instructions held in the instruction registers 56a to 56c, and output the control signals to the execution control unit 72. The execution control unit 72 exercises various types of control on the constituent elements of the execution unit 70 based on the analysis results in the instruction decoders 64a to 64c.

Take an instruction "add1 R3, R0" as an example. This instruction means to add the value of the register R3 and the value of the register R0 and write the addition result in the register R0. In this case, the execution control unit 72 exercises the following control as an example. The execution control unit 72 supplies to the register file 76 a control signal CL1 for outputting the value held in the register R3 to the L1 bus. Also, the execution control unit 72 supplies to the register file 76 a control signal CR1 for outputting the value held in the register R0 to the R1 bus.

The execution control unit 72 further supplies to the register file 76 a control signal CD1 for writing the execution result obtained via the D1 bus into the register R0. The execution control unit 72 further controls the arithmetic and logical/comparison operation unit 78a so that it receives the values of the registers R3 and R0 via the L1 bus and the R1 bus, adds them, and then writes the addition result in the register R0 via the D1 bus.
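The control exercised for "add1 R3, R0" can be modeled in a few lines. This is only an illustrative software model, assuming that the register file is a list of 32 integers, that each control signal simply carries a register number, and that the sum wraps to 32 bits; none of these details are specified in the description.

```python
def execute_add1(regs, src, dst):
    """Model "add1 src, dst" executed in the first slot (illustrative)."""
    # Control signals select the registers for the L1, R1 and D1 buses.
    control = {"CL1": src, "CR1": dst, "CD1": dst}
    l1_bus = regs[control["CL1"]]            # register file drives the L1 bus
    r1_bus = regs[control["CR1"]]            # register file drives the R1 bus
    d1_bus = (l1_bus + r1_bus) & 0xFFFFFFFF  # unit 78a adds (32-bit wrap assumed)
    regs[control["CD1"]] = d1_bus            # result is written from the D1 bus
    return control


regs = [0] * 32
regs[3], regs[0] = 5, 7
signals = execute_add1(regs, src=3, dst=0)
```

After the call, the register R0 holds the sum and the register R3 is unchanged. When the same registers are selected again in the next cycle, the values of CL1, CR1 and CD1 do not change.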

FIG. 6 is a block diagram showing a schematic structure of each of the arithmetic and logical/comparison operation units 78a to 78c. Referring to FIG. 6 and FIG. 2, each of the arithmetic and logical/comparison operation units 78a to 78c includes: an ALU (Arithmetic and Logical Unit) 132 which is connected to the register file 76 via the data bus 90; a saturation processing unit 134 which is connected to the register file 76 via the ALU 132 and the data bus 92 and executes processing such as saturation, maximum/minimum value detection and absolute value generation; and a flag unit 136 which is connected to the ALU 132 and detects overflows and generates condition flags.

FIG. 7 is a block diagram showing a schematic structure of each of the barrel shifters 82a to 82c. Referring to FIG. 7 and FIG. 2, each of the barrel shifters 82a to 82c includes: an accumulator unit 142 having accumulators M0 and M1 for holding 32-bit data; a selector 146 which is connected to the accumulator M0 and the register file 76 via the data bus 90 and receives the values of the accumulator M0 and a register; a selector 148 which is connected to the accumulator M1 and the register file 76 via the data bus 90 and receives the value of the accumulator M1 and a register; a higher bit barrel shifter 150 which is connected to the output of the selector 146; a lower bit barrel shifter 152 which is connected to the output of the selector 148; and a saturation processing unit 154 which is connected to the outputs of the higher bit barrel shifter 150 and the lower bit barrel shifter 152.

The output of the saturation processing unit 154 is connected to the accumulator unit 142 and the register file 76 via the data bus 92.

Each of the barrel shifters 82a to 82c executes arithmetic shift (shift in 2's complement system) or logical shift (unsigned shift) of data by operating its own constituent elements. It normally receives or outputs 32-bit or 64-bit data. The data to be shifted is stored in a register in the register file 76 or an accumulator in the accumulator unit 142, and the shift amount is specified by a value stored in another register or by an immediate value. Arithmetic or logical shift of data is executed within a range between 63 bits to the left and 63 bits to the

