Senior Fitness - Exercise and Nutrition for Aging Men and Women
FREE Article Feed for your website.
Home Ownership Magazine
Party Planning Information
Article Marketing Resources
Bio-Medical Research Article Database
Informative Articles on Life, Love and Happiness
Tutorials on Business to Writing
Famous Quotes from Famous People
Song Lyric Information
New US Patent Information
Comprehensive List of Content by Category
Online Auctions and Shopping Related Articles
Article Search
Most Recent Articles
Title: Ball cage
Patent Number: 6,922,897 Issued on 08/02/2005 to Adams,   et al.

Title: Overvoltage protection circuits that utilize capacitively bootstrapped variable voltages
Patent Number: 6,798,629 Issued on 09/28/2004 to Proebsting

Title: Power frequency electromagnetic field compensation system
Patent Number: 6,798,632 Issued on 09/28/2004 to Holmes,   et al.

Title: Trip device comprising an improved man-machine interface and circuit breaker comprising such a trip device
Patent Number: 6,798,630 Issued on 09/28/2004 to Del Vecchio,   et al.

Title: Control processor dynamically loading shadow instruction register associated with memory entry of coprocessor in flexible coupling mode
Patent Number: 6,865,663 Issued on 03/08/2005 to Barry

Title: Intake module having integrally housed ECU
Patent Number: 6,910,456 Issued on 06/28/2005 to Umemoto,   et al.

Title: Predictive processing method in a semiconductor processing facility
Patent Number: 6,766,285 Issued on 07/20/2004 to Allen, Jr.,   et al.

Title: Weighing apparatus
Patent Number: 6,809,270 Issued on 10/26/2004 to Fujita

Title: Internal combustion engine comprising a hydraulic system
Patent Number: 6,854,431 Issued on 02/15/2005 to Gaessler,   et al.

Title: Scarecrow gene, promoter and uses thereof
Patent Number: 6,809,234 Issued on 10/26/2004 to Benfey,   et al.

Title: Biopsy needle
Patent Number: 7,131,951 Issued on 11/07/2006 to Angel

Title: Biopsy needle
Patent Number: 7,131,951 Issued on 11/07/2006 to Angel

Title: Method for controlling a circumferential register in a web-fed rotary press
Patent Number: 6,766,737 Issued on 07/27/2004 to Glockner,   et al.

Title: Method and apparatus for measuring spinal distortions
Patent Number: 7,131,952 Issued on 11/07/2006 to Dickholtz, Sr.,   et al.

Title: Method and apparatus for measuring spinal distortions
Patent Number: 7,131,952 Issued on 11/07/2006 to Dickholtz, Sr.,   et al.

Title: Liquid crystal display having compensation capacitor
Patent Number: 7,142,261 Issued on 11/28/2006 to Chiang,   et al.

Title: Interchangeable flexible die
Patent Number: 6,766,733 Issued on 07/27/2004 to Collins

Title: Substituted cycloalkyl P1' hepatitis C virus inhibitors
Patent Number: 6,878,722 Issued on 04/12/2005 to Campbell,   et al.

Title: Multiaxis punch device
Patent Number: 6,766,723 Issued on 07/27/2004 to Yasoda,   et al.

Title: Image pickup device
Patent Number: 7,142,241 Issued on 11/28/2006 to Mukai

Title: Automatic self cleaning bladder relief system and failsafe
Patent Number: 7,131,964 Issued on 11/07/2006 to Harvie

Title: Automatic self cleaning bladder relief system and failsafe
Patent Number: 7,131,964 Issued on 11/07/2006 to Harvie

Title: Household appliance using water, namely, a washing machine, with improved device for reducing the water hardness
Patent Number: 6,766,812 Issued on 07/27/2004 to Gadini

Title: Semiconductor laser device
Patent Number: 6,768,755 Issued on 07/27/2004 to Inoue,   et al.

Title: Apparatus and method for performing symbolic resolution of modules using static representations of a trace
Patent Number: 6,766,511 Issued on 07/20/2004 to Berry,   et al.

Title: Polymer coated capacitor films
Patent Number: 6,798,642 Issued on 09/28/2004 to Decker,   et al.

Title: Gas discharge laser, method of operating a gas discharge laser, and use of a sintered filter
Patent Number: 6,798,814 Issued on 09/28/2004 to Geiger,   et al.

Title: Rotation sensor
Patent Number: 6,860,159 Issued on 03/01/2005 to Jin,   et al.

Title: Methods and apparatus for encoding LDPC codes
Patent Number: 6,961,888 Issued on 11/01/2005 to Jin,   et al.

Title: Lithographic apparatus, programmable patterning structure, device manufacturing method, and device manufactured thereby
Patent Number: 7,141,340 Issued on 11/28/2006 to Bleeker

Title: Method and apparatus for measurement using piezoelectric sensor
Patent Number: 6,989,623 Issued on 01/24/2006 to Zeighami

Title: Multi-mode mobile communications device with continuous mode transceiver and methods therefor
Patent Number: 6,957,081 Issued on 10/18/2005 to Leyh,   et al.

Title: Electrical wiring device with multiple types of wire terminations
Patent Number: 7,140,887 Issued on 11/28/2006 to Poh,   et al.

Title: System and method for annotation on a moving image
Patent Number: 7,119,814 Issued on 10/10/2006 to Meron,   et al.

Title: Dual-function three-axis positioning system
Patent Number: 7,084,533 Issued on 08/01/2006 to Botos,   et al.

Title: Hard bodied high capacity catch basin filtration system
Patent Number: 6,872,029 Issued on 03/29/2005 to Allard,   et al.

Title: Article information providing system and mediate apparatus
Patent Number: 7,020,682 Issued on 03/28/2006 to Homma,   et al.

Title: Floor hockey puck
Patent Number: 7,140,989 Issued on 11/28/2006 to Poruchny

Title: Method of allowing multiple, hardware embedded configurations to be recognized by an operating system
Patent Number: 7,020,723 Issued on 03/28/2006 to Beaudoin,   et al.

Title: Method for producing alkanolamines
Patent Number: 7,119,231 Issued on 10/10/2006 to Frauenkron,   et al.

Title: Digital watermark screening and detection strategies
Patent Number: 6,768,809 Issued on 07/27/2004 to Rhoads,   et al.

Title: Single data line sensing scheme for TCCT-based memory cells
Patent Number: 7,006,398 Issued on 02/28/2006 to Yoon,   et al.

Title: Quick-connecting coupler for hoses, pipes and faucets
Patent Number: 7,140,645 Issued on 11/28/2006 to Cronley

Title: Multi-layer golf ball
Patent Number: 7,140,978 Issued on 11/28/2006 to Nealon,   et al.

Title: Radio data communication apparatus and radio data communication method
Patent Number: 6,970,710 Issued on 11/29/2005 to Kikuchi

Title: Recessed luminaire having a dome-shaped reflector
Patent Number: 6,883,940 Issued on 04/26/2005 to Grajetzky,   et al.

Title: Interpolating a pixel from an intermediate line of a field
Patent Number: 7,142,249 Issued on 11/28/2006 to Hahn,   et al.

Title: Method for performing a camera function in a mobile communication terminal
Patent Number: 7,119,827 Issued on 10/10/2006 to Kang

Title: Bearing assembly equipped with rotation sensor to determine rotation and position of rotating element
Patent Number: 6,956,367 Issued on 10/18/2005 to Fujikawa,   et al.

Title: Solid electrolytic capacitor and method for producing the same
Patent Number: 6,790,384 Issued on 09/14/2004 to Konuma,   et al.

Title: Disposable diaper
Patent Number: 6,890,327 Issued on 05/10/2005 to Suzuki,   et al.

Title: Case tab-lock slitting and flap sealer in combination with a continuous radial motion case packing apparatus and method
Patent Number: 6,883,296 Issued on 04/26/2005 to Hartness,   et al.

Title: Jar lid opener
Patent Number: 6,935,207 Issued on 08/30/2005 to Mazza

Title: Multiple discharge-servo curve control method and device for an electrical discharge machine
Patent Number: 6,941,187 Issued on 09/06/2005 to Lu,   et al.

Title: Solar cell unit with removable layer
Patent Number: 6,809,252 Issued on 10/26/2004 to Winkeler

Title: Tire with improved endurance
Patent Number: 6,766,840 Issued on 07/27/2004 to Pereira,   et al.

Title: Receiving circuit, mobile terminal with receiving circuit, and method of receiving data
Patent Number: 6,768,769 Issued on 07/27/2004 to Hokao

Title: Gas laser oscillator
Patent Number: 6,768,761 Issued on 07/27/2004 to Hongu,   et al.

Title: System and method for noise reduction in thermodilution for cardiac measurement
Patent Number: 7,131,950 Issued on 11/07/2006 to Hamilton

Title: System and method for noise reduction in thermodilution for cardiac measurement
Patent Number: 7,131,950 Issued on 11/07/2006 to Hamilton

Title: Active pixel sensor array reset
Patent Number: 7,142,240 Issued on 11/28/2006 to Hua,   et al.

Title: CDMA receiver, path detection method, and recording medium on which path detection control program is recorded
Patent Number: 6,768,729 Issued on 07/27/2004 to Ohsuge

Title: Field adjustable pilot guard
Patent Number: 6,766,820 Issued on 07/27/2004 to Hoss

Title: Binaural synchronization
Patent Number: 6,768,802 Issued on 07/27/2004 to Baechler

Title: Vehicle seat air conditioning system
Patent Number: 6,871,696 Issued on 03/29/2005 to Aoki,   et al.

Title: Valve mechanism with a variable valve opening cross-section
Patent Number: 6,766,778 Issued on 07/27/2004 to Hammer

Title: Cooling module with axial blower and pressure regulated cross-flow fan
Patent Number: 6,766,774 Issued on 07/27/2004 to Kussmann

Title: Fluid conduction utilizing a reversible unsaturated siphon with tubarc porosity action
Patent Number: 6,766,817 Issued on 07/27/2004 to da Silva

Title: System and method for molecular optical emission
Patent Number: 7,115,916 Issued on 10/03/2006 to Avouris,   et al.

Title: Semiconductor device and semiconductor device producing system
Patent Number: 7,115,903 Issued on 10/03/2006 to Isobe,   et al.

Title: Administrating system of image forming apparatus and image forming apparatus
Patent Number: 6,999,191 Issued on 02/14/2006 to Yamada,   et al.

Title: Gimballed conditioning apparatus
Patent Number: 6,949,016 Issued on 09/27/2005 to de la Llera,   et al.

Title: Restrictive costs in network systems
Patent Number: 7,042,846 Issued on 05/09/2006 to Bauer

Title: Organic light emitting diode including ditches in a substrate
Patent Number: 6,825,612 Issued on 11/30/2004 to Lai

Title: Color liquid crystal panel having opening in color filter varied according to color, method for manufacturing the same, and color liquid crystal display device employing the same
Patent Number: 7,142,269 Issued on 11/28/2006 to Ikeno,   et al.

Method and apparatus for accurate profiling of computer programs Number:6,934,935 from the United States Patent and Trademark Office (PTO) owispatent

Home    Author Login    Submit Article    Article Search    Add Your Link    Edit Your Link    Contact Us    Advertising    Disclaimer

   

 
Web LinkGrinder.com

Top Breaking News
     Greek, Cypriot Leaders Resume Unification Talks in Nicosia by Nathan Morley
     Indonesia Tobacco Sales Grow, Raising Health Fears
     South Korea Allows Top Defector to Travel Overseas by VOA News

Title: Method and apparatus for accurate profiling of computer programs

Abstract: An object code expansion profiler equips a program for execution profiling by preprocessing the object code files of the program so as to add profiling monitoring code to the beginning of all or substantially all functions. The preprocessing includes, for each function, the steps of grouping the function's instructions into basic blocks, counting the number of cycles required to execute the instructions of the basic block, and inserting special monitoring code with the basic block. The special monitoring code is executed each time the basic block is executed, and updates the profiling information to reflect the number of cycles required to execute the basic block. Special handling is provided for profiling calls to the Operating System (OS). The resultant profiling information is converted into a call graph image most useful for human users. For each arc in the graph connecting a calling-function/parent-node to a called-function/child node, the displayed arc image has a width logarithmically proportional to the self+descendants time for the called function.

Patent Number: 6,934,935 Issued on 08/23/2005 to Bennett,   et al.


Inventors: Bennett; James (Menlo Park, CA); Anderson; Mark (Palo Alto, CA); Na; Choon Piaw (Saratoga, CA); Hastings; Reed (La Honda, CA)
Assignee: International Business Machines Corporation (Armonk, NY)
Appl. No.: 557737
Filed: April 25, 2000

Current U.S. Class: 717/127; 717/150; 717/151; 717/156; 717/158; 717/130; 718/100; 718/102
Intern'l Class: G06F 009/44; G06F 009//46
Field of Search: 717/127,130,151,156,158,136,150 718/100,102


References Cited [Referenced By]

U.S. Patent Documents
4845615Jul., 1989Blasciak.
4937740Jun., 1990Agarwal et al.
5047919Sep., 1991Sterling et al.
5142679Aug., 1992Owaki et al.
5164969Nov., 1992Alley et al.
5193180Mar., 1993Hastings.
5212794May., 1993Pettis et al.
5247651Sep., 1993Clarisse.
5265254Nov., 1993Blasciak et al.
5313616May., 1994Cline et al.
5333304Jul., 1994Christensen et al.
5335344Aug., 1994Hastings.
5355484Oct., 1994Record et al.
5355487Oct., 1994Keller et al.
5359533Oct., 1994Ricka et al.
5450586Sep., 1995Kuzara et al.
5465258Nov., 1995Adams.
5539907Jul., 1996Srivastava et al.
5732273Mar., 1998Srivastava et al.
5740443Apr., 1998Carini.
5828883Oct., 1998Hall.
5963740Oct., 1999Srivastava et al.
6049666Apr., 2000Bennett et al.
6126329Oct., 2000Bennett et al.


Other References

"Effective Simulation of Multiprogramming", W.P. Dawkins et al., ACM SIGMETRICS Performance Evaluation Review, Proceedings of the conference on Measurement and modeling of computer systems, Apr. 1990, vol. 18 Issue 1.
"Profile Guided Code Positioning", Karl Pettis and Robert C. Hansen, Hewlett-Packard Company, California Language Laboratory, Proceedings of the ACM SIGPLAN'90 Conference on Programming Language Design and implementation, Jun. 1990.
Graham et al., GPROF; "A Call Graph Execution Profiler," p. 120-126 (1-16) 1982.
Hilfingen. "A Memory Allocation Profiler for C and Lisp Programs," p. 223-47, 1988.
Ponder et al., "Inaccuracies in Program Profilers," p. 459-467, 1987.
Graham et al., "An Execution Profiler for Modular Programs," p. 671-685, Mar. 10, 1983.
Smith, Michael D., "Tracing With Pixie," Apr. 4, 1991, Stanford University Technical Report No. CSL-TR-91-497, p. 1-29.
"Pixie", UNIX man p. 1-2.
Larus, James R. et al., "Re-Writing Executable Files to Measure Program Behavior," Mar. 25, 1992, Univ. of Wisconsin, Computer Science Dept.
Ball, Thomas, "Optimally Profiling and Tracing Programs," Sep. 6, 1991, Univ. of Wisconsin, Computer Science Department.

Primary Examiner: Banankhah; Majid

Parent Case Text



CROSS REFERENCE TO RELATED APPLICATIONS

This is a continuation of application Ser. No. 08/871,247, filed Jun. 9, 1997, now U.S. Pat. No. 6,126,329 which is a continuation of application Ser. No. 08/074,428, filed, Jun. 8, 1993, now abandoned which are both hereby incorporated by reference.
Claims



1. A computer implemented method for profiling execution of a computer program having functions, the method comprising the steps of:

a) augmenting the program by performing the following augmenting steps for substantially all functions of the program:

i) selecting a function;

ii) adding function entry profiling code to the selected function;

iii) adding function exit profiling code to the selected function; and

b) executing the augmented program and collecting profiling data for substantially all executed functions of the program.

2. The method of claim 1, wherein the program augmenting step comprises performing said augmenting steps for all functions of said program, and wherein the executing step comprises collecting profiling data for all executed program functions.

3. The method of claim 1, wherein the step of adding function entry profiling code comprises adding function entry parallel stack simulation code for maintaining a parallel stack indicating call chains, and wherein the step of adding function exit profiling code comprises adding function exit parallel stack simulation code for maintaining the parallel stack.

4. The method of claim 1, wherein the step of adding function exit profiling code comprises adding code to determine a self+descendants time for a call to the selected function and to incorporate the self+descendants time into profiling information indexed by an identifier for the selected function.

5. The method of claim 4, wherein the step of adding function exit profiling code comprises adding code to incorporate the self+descendants time into profiling information indexed by a partial call chain for the call to the selected function.

6. The method of claim 5, wherein the partial call chain identifies the selected function.

7. The method of claim 6, wherein the partial call chain also identifies an immediate caller of the selected function.

8. The method of claim 1, wherein the step of adding function exit profiling code comprises adding code to determine timing statistics for a call to the selected function and to incorporate the timing statistics into profiling information indexed by an identifier for the selected function and by a partial call chain for the call to the selected function.

9. The method of claim 1, wherein the program has relocatable object files containing substantially all of the functions of the program, and wherein the step of augmenting the program comprises adding profiling code to substantially all relocatable object files of the program.

10. The method of claim 9, wherein said relocatable object files comprise instructions, and wherein the step of augmenting the program further comprises for each function the steps of:

parsing the function instructions into blocks; and

adding block profiling code to each block.

11. The method of claim 9, wherein said relocatable object files comprise instructions, and wherein the step of augmenting the program further comprises for each function the steps of:

parsing the function instructions into basic blocks, wherein a basic block consists of a contiguous sequence of at least one instruction of which the first instruction is a program control transfer destination, the last instruction is a program control transfer instruction, and any instructions between the first and last instructions are neither program control transfer destinations nor program control transfer instructions; and

for each basic block, performing the steps of selecting a basic block;

analyzing the selected basic block to determine a fixed number of clock cycles for the selected basic block; and

adding basic block profiling code so as to be executed whenever the basic block is executed, said basic block profiling code operating to add the determined fixed number of clock cycles to a clock cycle accumulator for the function.

12. The method of claim 11, wherein the added basic block profiling code operates to add the determined fixed number of clock cycles to a clock cycle accumulator for the selected basic block of the function.

13. The method of claim 11, wherein the step of analyzing the selected basic block comprises determining whether the selected basic block contains a variably determinate length operating system call, and wherein the step of adding basic block profiling code comprises, if the selected basic block contains a variably determinate length operating system call, the steps of

adding simulation maintenance code to simulate constraints that determine execution time for the variably determinate length operating system call;

adding simulation evaluation code to evaluate the constraints and to determine a number of clock cycles required to execute the variably determinate length operating system call; and

adding simulation profiling code to add the determined number of clock cycles to a clock cycle accumulator for the function.

14. The method of claim 13, wherein the added simulation profiling code operates to add the determined fixed number of clock cycles to a clock cycle accumulator for the selected basic block of the function.

15. The method of claim 11, wherein the step of analyzing the selected basic block comprises determining whether the selected basic block contains an operating system (OS) call, and wherein the step of adding basic block profiling code comprises, for at least one selected basic block containing an OS call, the OS call timing profiling steps of

adding timing start code to determine a start time of day immediately before executing the OS call;

adding timing end code to determine an end time of day immediately after executing the OS call;

adding OS call timing calculation code to determine from the start time and end time a number of clock cycles required to execute the OS call; and

adding OS call timing recording code to add the determined number of clock cycles to a clock cycle accumulator for the function.

16. The method of claim 15, wherein said OS call timing profiling steps are performed for substantially only basic blocks containing indeterminate length OS calls.

17. The method of claim 15, wherein the added OS call timing recording code operates to add the determined fixed number of clock cycles to a clock cycle accumulator for the selected basic block of the function.

18. A computer system configured as a new machine by a computer program, comprising means for performing the steps as recited in claim 1.

19. A computer system configured as a new machine by a computer program, comprising means for performing the steps as recited in claim 11.

20. A computer implemented method for profiling execution of a computer program having a function, the method comprising the steps of:

a) augmenting the program by adding function profiling code to the function, said function profiling code, for each particular call to the function when executed, serving to determine an execution time for the particular call to the function and to incorporate said execution time into profiling data for the function, said function profiling code including at least function entry profiling code;

b) executing the augmented program and collecting said profiling data for the function; and

c) presenting said profiling data to a user.

21. The method of claim 20, wherein the function has a varying execution time over a plurality of calls when said program is executed, wherein said profiling data comprises a global minimum execution time for the function and a global maximum execution time for the function over all calls to the function, and wherein said step of augmenting the program by adding function profiling code comprises adding function profiling code to determine the global minimum execution time for the function and the global maximum execution time for the function.

22. The method of claim 21, wherein substantially all calls to the function can be identified by one of a set of partial call chains, and wherein said step of augmenting the program by adding function profiling code comprises adding function profiling code to determine profiling data indexed by each partial call chain in said set of partial call chains.

23. The method of claim 22, wherein each said partial call chain identifies a caller of said function, wherein said profiling data comprises for each said caller a caller-wise minimum execution time for the function and a caller-wise maximum execution time for the function over all calls to the function originating said caller, and wherein said step of augmenting the program by adding function profiling code comprises adding function profiling code to determine for each said caller the caller-wise minimum execution time for the function and the caller-wise maximum execution time for the function.

24. A computer system configured as a new machine by a computer program, comprising means for performing the steps as recited in claim 20.

25. A computer implemented method for profiling execution of a computer program having functions, the method comprising the steps of:

a) augmenting the program by performing the following augmenting steps for substantially all functions of the program:

i) selecting a function;

ii) adding function profiling code to the function, the function profiling code including;

function entry profiling code added to the beginning of the selected function, the function entry profiling code including function entry parallel stack simulation code for maintaining a parallel stack indicating call chains,

function exit profiling code added to the end of the selected function, the function exit profiling code including function exit parallel stack simulation code for maintaining the parallel stack, and

function call profiling code for recording dynamic call information obtained from the parallel stack;

b) executing the augmented program and collecting profiling data for substantially all executed functions of the program;

c) processing the recorded dynamic call information to create a dynamic call graph; and

d) presenting the dynamic call graph to a user.

26. A computer system configured as a new machine by a computer program, comprising means for performing the steps as recited in claim 25.

27. A computer implemented method for profiling execution of a computer program having functions, the functions having instructions, the method comprising the steps of:

a) augmenting the program by adding profiling code to the program, including at least function entry profiling code;

b) executing the augmented program and collecting profiling data for functions of the program, said profiling data including self+descendants times for the functions;

c) constructing a call graph for the program; and

d) displaying to a user a representation of the call graph including arcs connecting nodes, wherein each arc links a parent function to a child function and has a width corresponding to the self+descendants time for the child function.

28. A computer system configured as a new machine by a computer program, comprising means for performing the steps as recited in claim 27.

29. A computer system configured as a new machine by a computer program, comprising means for performing the steps of:

a) augmenting the program by adding function profiling code to the program;

b) executing the augmented program and collecting profiling data for functions of the program, said profiling data including self+descendants times for the functions;

c) constructing a call graph showing function calls for the program;

d) providing a pruning time filter value; and

e) displaying to a user a representation of the call graph including arcs connecting nodes, wherein each arc links a parent function to a child function, wherein a set of arcs is automatically pruned from said displayed call graph representation, wherein each of said set of said automatically pruned arcs connects to a child function having a self+descendants time less than said pruning time filter value.
Description



BACKGROUND OF THE INVENTION

The present invention relates generally to a method and apparatus for profiling computer programs. In particular, one aspect of the present invention relates to a method for inserting additional instructions and data into existing relocatable object files of a computer program to collect very detailed and accurate profiling information. Another aspect of the present invention relates to a method for converting profiling information into image data for communication to a user/developer.

Profiling is a technique commonly used by software developers to gain information about the operation of their code. This information can then be used to improve and optimize the code. Profiling is done by processing the program under development with a profiler. The profiler adds code to the executable file so that it records various types of statistical data as the program runs. The type of data recorded varies with the profiler used.

One traditional and widely used profiler under UNIX is known as prof. Object code files are supplied to prof, which links the object files and adds monitoring routines to be executed at the beginning and end of the program. The initial routines set up a program sampler triggered by timing interrupts. The program sampler, which is invoked with a period of 1/100 of a second, records the value of the program counter register. The routines added to the end of the program take the saved data and create an output file showing the time apparently spent in the various routines; this output can be organized as a histogram.

This information can be useful to a programmer by indicating which routines consume the most execution time and are therefore the best places to focus on improvement. prof suffers from a number of limitations: it requires Operating System ("OS") support for its interrupts; it shows where the execution is at the sampling times, but does not show "why" (does not identify the call chain that led to current function being called); and it relies on timed interrupts with a 1/100 second sampling period. The sampling period causes sampling errors because with today's fast processing speeds a great many on instructions can be executed in 1/100 of a second; many routines have total execution times shorter than this. The execution of short routines can go entirely unnoticed by prof, and sampling errors can cause significant inaccuracies in the measured execution times even for longer routines. The 1/100 sampling period could be theoretically shortened, but this has been found to add excessive overhead, so prof implementations generally do not allow "tuning" of the sampling rate. Furthermore, because the timing interrupts that trigger the sampling mechanism are not generated during OS function calls, OS function calls appear "free" under prof, whereas in actuality they (and therefore the program functions calling them) might be responsible for a primary portion of the total execution time.

Another commonly used profiler, known as gprof, relies on recompilation of source code to add more monitoring features than are present in prof. In addition to the interrupt sampling performed by prof, gprof adds code to the beginning of each function to record which function called it. A significant problem arises, however, if the program employs library code for which source files are unavailable. gprof is then unable to process the functions defined in these files, and their callers cannot then be recorded. When the execution sequence passes into one of these unmonitored functions, the call trace is severed. If the unmonitored function initiates calls that lead back into a monitored function, this calling of the monitored function will be disconnected from the remainder of the call trace (such situations are termed "spontaneous function calls").

A simple program, in which start calls main, main calls function A, function A calls functions B and C, and function B calls function D, might result in gprof output statistics similar to those shown in Table I. The time column shows in seconds first the self time for the function, then the self+descendants time, which is the sum of the self time for the function of concern plus the self times of all the descendants of the function of concern. Typical gprof output tables would typically also include indications of how many times each function was called by each of its caller functions.
TABLE I
routine callers time
start  0.1/101.1
main (start)  1.0/101.0
A (main)  10.0/100.0
B (A) 20.0/60.0
C (A) 30.0/30.0
D (B) 40.0/40.0


Typically, a programmer might then manually convert this table into a graph, such as shown in FIG. 1A, but this process is very complicated and painstaking for larger programs. To improve the ease of use of gprof output, some postprocessors are now available that can process the source code for a program to produce a static call graph and then overlay the gprof statistics on top of the static call graph. (In prior art systems the gprof statistics have been shown either by histograms next to each graph node, or by color coding of graph nodes.) The need for source code, however, is a serious drawback to such postprocessing, and significantly limits its usefulness.

Further limitations to the usefulness of gprof include those stemming from the use of timing interrupts as discussed above with reference to prof (sampling errors and inability to track OS calls), and poor handling of multiply called functions. For instance, suppose that in the above example, function D was called by both functions B and C, in which case the call graph would be as illustrated in FIG. 1B. The self time for function D would then have to be split up for allocation between the self+descendants times for functions B and C.

To make this allocation gprof relies upon an assumption of an average case distribution. That is, gprof assumes that all calls to a particular function require the same amount of time for completion. The self time for function D would then be allocated to the self+descendants times for functions B and C proportionally to the number of times that each of them called function D. However, because the arguments passed to a function can drastically effect the amount of time required to complete the function call, the average case distribution assumption employed by gprof can result in significant inaccuracies in reported times.

For example, if functions B and C both called function D 40 times, and the calls originated by function B consumed three fourths of function D's time (with the calls originated by function C consuming the remaining one fourth), the correct profiling statistics would be as shown in Table II. Because of its average case distribution assumption, however, gprof would allocate function D's time equally between functions B and C, and would prepare inaccurate profiling statistics as shown in Table III.
TABLE II
routine callers time
start  0.1/101.1
main (start)  1.0/101.0
A (main)  10.0/100.0
B (A) 20.0/50.0
C (A) 30.0/40.0
D (B) 40.0/40.0
TABLE III
routine callers time
start  0.1/101.1
main (start)  1.0/101.0
A (main)  10.0/100.0
B (A) 20.0/40.0
C (A) 30.0/50.0
D (B) 40.0/40.0


Like prof, gprof concentrates on zero order profiling statistics. That is, timing statistics for a particular function are recorded with that function as the only reference point. gprof does not record any first order or other higher order timing distribution statistics, that is, timing statistics regarding a particular first function having been called by a particular second function. The only rudimentary first order statistics of any kind recorded by gprof are the number of times a called function was called by each of its caller functions. Because gprof cannot record first order timing statistics, however, gprof is forced to use the average case distribution assumption to calculate self+descendants time in many situations, as described above.

An improved profiler that remedies the deficiencies of prior art profiling techniques is clearly desirable.

SUMMARY OF THE INVENTION

According to the present invention an improved method and apparatus are provided for performing execution profiling. A first aspect of the invention is directed to a method for equipping a program for execution profiling that comprises the step of preprocessing the object code files of the program so as to add profiling monitoring code to the beginning of all or substantially all functions, without needing source code, so that profiling information is available for the entire execution of the program.

According to one facet of this first aspect of the invention, timing information is collected by forming tallies of the number of cycles required to execute the profiled instructions. The preprocessing includes, for each function, the steps of grouping the function's instructions into basic blocks, counting the number of cycles required to execute the instructions of the basic block, and inserting special monitoring code with the basic block. (Later, when the target program is run, the special monitoring code is executed each time the basic block is executed, and updates the profiling information to reflect the number of cycles required to execute the basic block. In some embodiments the initial profiling code for a function also records at least a partial call chain, indicating where the function was called from, to enable compiling second order or higher order timing statistics.)

According to another facet of this first aspect of the invention, special handling is provided for profiling calls to the Operating System (OS). OS calls can be categorized into three groups: (1) calls that may require an indeterminate amount of time ("indeterminate length"); (2) calls that substantially always require a certain, fixed amount of time ("fixed determinate length"); and (3) calls that substantially always require an amount of time determined by identifiable program status constraints, but which therefore vary according to program status ("variably determinate length"). The preprocessing step includes inserting code before and after at least indeterminate length OS calls to start a timing procedure before the OS call, end the timing procedure after the OS call, determine the elapsed time and then incorporate this into the profiling information. In some embodiments, the time required for fixed determinate length OS calls is determined during preprocessing and then tracked during execution in the same manner as for basic blocks, described above. In preferred embodiments, the preprocessing step includes adding simulation code to profile variably determinate length OS calls. The simulation code models or tracks the program status constraints that determine how long the OS call will require. For each variably determinate length OS call, code is added that evaluates the simulated program status constraints, determines from this the time required for the OS call, and updates the profiling information accordingly.

According to yet another facet of this first aspect of the invention, the preprocessing step includes adding terminal profiling code at the end of each function of the program. The terminal profiling code for a called function passes profiling information, including the self-plus-descendants time for the called function, back to the caller, so the profiling code for the caller function can accurately maintain self-plus-descendants time without the need for inaccurate distribution assumptions such as the average case distribution assumed by the prior art.

A second basic aspect of the present invention is directed to a method of converting profiling timing information into useful image data for human users. A first facet of this aspect of the invention is directed to using the detailed profiling information gathered as discussed above to automatically create a dynamic call graph, rather than having to process source code and simply producing a static call graph.

A second facet of this aspect of the invention results from discovery of the fact that human users have a superior intuitive understanding of much of the basic timing statistics, especially the self+descendants time, as a "river of time," in which the total amount of time used by the program is allocated to, or "passes through" the main function, after which it splits into main "tributaries" for the primary routines called from the main function, with these main tributaries splitting into smaller tributaries for the secondary routines called by the primary routines, etc. According to this facet of the invention, therefore, visual presentations of profiling data are constructed so as to correspond to the human user's "river of time" intuitive understanding of the data. A method of converting profiling information into a call graph according to this facet of the invention comprises, for each arc in the graph connecting a calling-function/parent-node to a called-function/child node, the steps of converting the self+descendants time for the called function into a width measure, and displaying the call graph with each arc having a display width corresponding to the width measure determined for that arc. In a preferred embodiment, the display width of an arc is computed to be logarithmically proportional to the self+descendants time for the called function.

A further understanding of the nature and advantages of the invention may be realized by reference to the remaining portions of the specification and the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a call graph with profiling information for a simple program.

FIG. 2 is a block diagram showing a relocatable object file being expanded by a profiling expansion means according to an aspect of the invention into a new relocatable object file.

FIG. 3 is a flowchart illustrating a general procedure for implementing the profiling scheme of the present invention by modifying the object files for an executable program.

FIG. 4 is a flowchart illustrating the general process of adding to a function the profiling code of the preferred embodiment.

FIG. 5 illustrates a flowchart showing details of the process of adding time profiling code to each basic block, according to a preferred embodiment of the present invention.

FIGS. 6A-6E illustrate the operation of various types of profiling code during execution of the target application.

FIG. 7 shows a view selection window of a particular embodiment of the present invention that allows a user to select various view windows for displaying data.

FIG. 8 shows a reports display window of a particular embodiment of the present invention.

FIG. 9 shows a coverage display window of a particular embodiment of the present invention.

FIG. 10 shows a function detail display window of a particular embodiment of the present invention.

FIGS. 11A-D illustrate function detail windows for functions A, B, C, and D (of the program described with reference to FIG. 1B and Table II), respectively.

FIGS. 12A-C show call graph windows of a particular embodiment of the present invention.

FIG. 13 shows a flow chart illustrating the process of automatically time filtering functions for display according to an aspect of the present invention.

FIGS. 14A-C show examples of a call graph displayed according to this aspect of the preferred embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The preferred embodiment of the present invention employs numerous novel techniques for providing enhanced profiling capabilities to a user. These techniques improve all three major prongs of the profiling process: preprocessing an application to add profiling code, executing the augmented application to collect profiling information, and then presenting the collected information to the user.

I. Augmenting the Application and Collecting Profiling Data

One aspect of the present invention is directed to a method for equipping a program for execution profiling by preprocessing the object code files of the program so as to add profiling monitoring code to all or substantially all functions of the program. The detailed analysis and preprocessing made possible by object code insertion improves both the profiling coverage of the program and the profiling accuracy. Object code insertion also enables the use of a technique according to another aspect of the invention, which is the maintenance of profiling timing information according to the number of CPU cycles required to perform the instructions of the application routines. The execution cycles for most instructions are determined during preprocessing; for other instructions, profiling code is added to dynamically determine their timing during execution.

A. Profiling by Object Code Insertion

In the preferred embodiment, the insertion of additional code is performed in the manner taught by U.S. Pat. No. 5,193,180, which is herein incorporated by reference in its entirety, for all purposes, and with special reference to its teachings regarding the implementation of monitoring schemes. The particular embodiment described herebelow is intended for a SUN SPARC station, so the description is particular in some respects to that system.

FIG. 2 illustrates a preexisting object code file 1 ("oldfile.o") being augmented by profiling expansion means 5 to form a new object code file 1′ ("newfile.o"). In the preferred embodiment, profiling expansion means 5 is a general purpose computer having a memory and operating under the control of a computer program. In the this embodiment, all object files of the target application program are processed by a profiling program that uses the above expansion means to insert profiling monitoring code to each function of the program. All of the object code files for an executable program are processed, and instructions are added to implement the profiling scheme described below when the application program is executed. Because source code is not needed, all of the files and functions of the program can be monitored. This provides complete profiling coverage of the program, rather than only partial coverage where source code was unavailable (as with the prior art).

The general procedure of implementing a profiling scheme to have full coverage for an executable program, by modifying all of the object code files for the executable program, linking the modified program and then running it, is illustrated in FIG. 3. A first object file or library for the executable program is selected in block 100. If the file is determined to be a simple object file rather than a library, in block 110, then the object file is processed in block 120 to implement the profiling, by the expansion process described in the above referenced U.S. Pat. No. 5,193,180. If the file is determined to be a library in block 110, then each object file that contributes to the library is processed in block 130, in the same manner that a simple object file is processed by block 120. Then, in block 140, the library is rebuilt from the modified object files.

After the object file or library has been processed, block 150 determines if any unprocessed files remain for the executable file. If so, block 160 selects an unprocessed file and then the steps of blocks 110-150 are repeated. Once it is determined in block 150 that all files for the original executable program have been processed, all necessary linkage is performed in block 170. In the preferred embodiment this linkage includes linkage to a special library of profiling routines. These profiling routines include a program start routine and a program exit routine that are linked to execute at the start and finish, respectively, of the program. Also included in the profiling library are function entry and exit profiling routines, and various other timing routines described below. After linkage, the program is executed in block 180; during this execution, the desired profiling is performed.

B. Preprocessing for Cycle-Counting Profiling

In the preferred embodiment, profiling expansion means 5 is provided with detailed information regarding the computer system on which the profiled program is to be run. This information includes the type of CPU employed by the computer system, the clock speed or clock cycle period of the CPU, the cache architecture, the presence of coprocessors, and any other information necessary to determine the number of clock cycles required to execute the instructions in the instruction set for that CPU.

Profiling initialization and profiling termination routines are added so as to be executed at the beginning and end, respectively, of the program. Also, every function in every object file for the application program is processed to have added to it the profiling code of the present invention. The general process of adding to a function the profiling code of the preferred embodiment is illustrated by the flowchart of FIG. 4. Step 230 of FIG. 4, the step of adding basic block profiling code, is illustrated in further detail in FIG. 5. The operation of the profiling code is explained primarily with reference to FIG. 6.

In the process of FIG. 4, first, at step 200, the instructions of the function are parsed into basic blocks. A basic block is a contiguous sequence of instructions that has no branches into or out of its body except as follows: a basic block begins with an instruction that is a program control transfer destination-either the destination of a branch (including the default or fall-through destination of a conditional branch) or the destination of a call instruction, and ends with a program control transfer instruction-either a branch or exit instruction. If the computer architecture entails delay slots following branches, such delay slots are considered part of the basic block of the branch statement they follow. The smallest possible basic block is two instructions: the destination of a branch/call instruction and that is itself a branch/exit instruction, followed by its delay slot. In some embodiments a call instruction, which is akin to a branch followed by a return, is treated as a basic block terminator, although in the preferred embodiment call statements do not terminate basic blocks.

The usefulness of parsing the code into basic blocks is that each basic block is always executed as a unit: if the first instruction is executed then the entire block is executed; if the first instruction is not, then the remainder of the block is not. After parsing the function code into basic blocks, each of the basic blocks is analyzed at step 210, as described further below. Next, at step 220, function entry profiling code, basic block profiling code, and function exit profiling code are added to the function as determined according to the above analysis. In the preferred embodiment the function entry and exit profiling codes take the form of calls to special routines in a profiler library, which are added to the beginning of the function and to any basic blocks ending in a function exit instruction, respectively. It should be noted that the parsing, analysis, and code insertion steps can be intermixed.

The process of profiling code to each basic block is illustrated in further detail in FIG. 5. At step 300, basic block entry code is added to the start of the basic block. (Later, during execution of the application program, this basic block entry code will signal another entry into the basic block, by incrementing an executions accumulator, which tallies how many times the basic block was executed). In the preferred embodiment step 300 is only performed if source code is available for the function and if the source code has been compiled for debugging, because this particular profiling information is only used in the show-coverage display of source code, described below. The profiling code determines whether source code is available by looking for debugging symbols provided by the compiler.

Next, at step 310, for all of the instructions requiring a fixed number of clock cycles to execute, these cycles are tallied, including the delay slot for the terminating branch. This step includes an architecture-specific analysis of the instructions to determine the required number of cycles for execution. In the preferred embodiment, this analysis includes identifying stalls, such as caused by loads and moves, and accounting for the amount of time required, including the stall. (On many systems, the instruction following a load or move will be executed when the load has been initiated, without waiting for it to be completed. An exception occurs when the instruction following the load requires the use of the destination address or register of the load instruction, in which case a stall occurs until the load has been completed.)

After the fixed instruction cycles for the basic block have been tallied, fixed cycle profiling code is added to the basic block at step 320. (Later, during execution of the application program, each time the basic block is executed the fixed cycle profiling code will increment the self time accumulator for the current function by the determined number of cycles.) In the preferred embodiment, fixed length Operating System (OS) calls, discussed further below, are treated in steps 310 and 320 just like other fixed execution length instructions and charged to the self time of the current function. In alternative embodiments, however, they are tracked as independent functions and their execution time is not tallied in step 310. Instead, step 320 includes adding code to update self time accumulators for the OS call and to charge the execution time for the OS call to the self+descendants accumulator for the present function. Similarly, in the preferred embodiment other OS calls, discussed below, are charged against the self time of the current function, but could instead be tracked as independent functions.

After step 320, at step 330 any desired simulation profiling code is added to the basic block. This is generally added only to basic blocks containing any variably determinate length instructions, but the completeness of the simulation may require that simulation code be added even to some basic blocks that do not contain any variably determinate length instructions. Finally, at step 340, if the basic block includes an indeterminate length instruction, the indeterminate length instruction is wrapped in timing profiling code. Variably determinate length instructions and indeterminate length instructions are described in further detail below, in the discussion regarding OS calls.

By performing execution timing according to information obtained during preprocessing, profilers according to the present invention essentially create virtual cycle counters. That is, they generally do not count the actual number of cycles taken by the program during the profiling run, but rather, most cycles are counted according to what the preprocessing determines the unaltered program would take in the absence of additional profiling code.

C. Execution of Cycle-Counting Profiling Code

FIGS. 6A-6E illustrate the operation of various types of profiling code during execution of the target application at step 180 of FIG. 3. The two main data structures relied on by the profiling code are individual function accumulators, and a simulated parallel call stack.

In some embodiments, the function accumulators are statically linked into the application code, substantially adding to the symbol table. In the preferred embodiment, however, function accumulators are located dynamically, as discussed below. Each function accumulator has a set of primary profiling registers for the overall function, and an array of registers for maintaining information on the usage distribution for basic blocks (including the number of times "hit" and the total number of cycles consumed). The primary registers of the function accumulator track 1) the number of calls to the function; 2) the minimum and maximum self times for the function, over all calls; 3) the sum of the self times for the function, over all calls; 4) the sum of the self+descendants times for the function, over all calls; and 5) the head of a linked list of caller records for all callers of the function. The min and max self times are particularly helpful when profiling functions that undergo substantial initialization the first time they are called. The primary registers optionally also track the number of register window spills (see below) and the aggregate amount of time spent in them, and the number of and aggregate time for OS calls.

Each of the caller records contains an identification of the caller, a cumulative counter for the number of times the current function has been called by the particular caller function, and a cumulative counter for the total number of self+descendants cycles from the current function that are attributable to the particular caller function. Note that the caller identification can simply be a pointer to the caller's accumulator. In some embodiments the caller records also track min and max self times for the current function when called by that particular caller function. These caller records serve to track profiling data by a call chain one call deep (the caller function); more extensive caller records can be maintained and indexed by longer call chains so as to provide higher order timing statistics.

The parallel stack serves both as a source for local counters that are used to store the timing statistics for the each extant instantiation of each function, and also as a call chain structure to allow for the proper attribution of self+descendants time. Each frame on the parallel stack identifies the corresponding function, contains or points to the local variables tracking the that particular function call, and, so as to properly profile "tail call" optimized code, identifies the exit point for the function. This latter tracking can often cause the simulated parallel stack to differ substantially from the actual stack.

Tail call optimization is a technique allowable when function A calls function B, and function B as its last instruction calls function C and returns this value to function A. Under a standard compiler, the call to function C would result in another frame being pushed on the stack, and the return from function C would be followed by an immediate return with the same value from function B to function A. In order to minimize the depth of the stack, in call chain optimization the call to function C would result in function B essentially giving its position on the stack over to function C, so that at the return from function C, the immediately preceding function on the stack, to which function C would return, would be function A (bypassing function B). If the parallel stack were also maintained in this manner the correct profiling information could not be attributed to function B. So, according to an aspect of the present invention, the frame for function B would remain on the parallel stack, but the frame for function C would point to function A as its exit point.

In some embodiments, frames on the parallel call stack simply identify each calling function by name, or by some other identifier for the function as a whole. In other embodiments, however, frames on the parallel stack frame also identify call site. In these embodiments, if function A has two distinct call instructions to function B, each of these calls would be distinguished on the parallel stack frame, and would be distinguished in the tracked profiling data as well. For example, the execution times for function B would get rolled up to function A in the usual manner, but presentations of the allocation of function B's times to its various callers would have one allocation to the first call site in function A and a separate allocation to the second call site in function A.

When presenting information about a call site for which a linker symbol is unavailable, such as in the above case with the two call sites within function A, the call site will be labeled as an offset from the closest previous location for which a symbol is available. For example, the first of the above mentioned call sites in function A might be identified as "A+20", and the second call site might be identified as "A+65".

1) The Function Entry Profiling Routine

FIG. 6A illustrates the operation of the function entry profiling routine. First, at step 400 the location of the function accumulator is looked up in a global table. The first time the function is run, the accumulator location table entry is initialized by executing a set of instructions created during the preprocessing step. These instructions are
sethi V1, %g3
jmpl [%g4 + FNOFFSET], %g2
or V2, %g3

During the preprocessing step the position of the accumulator relative to the function offset was fixed and the values V1 and V2 were determined so that the first and third instructions would cause g3 to contain the location of the accumulator relative to the jump1 statement. The second statement produces an object code having a constant value, and is thus used as a flag that can be searched for either to discover the accumulator locating instructions, or to determine the existence of the function. This latter aspect can be used for counting and reporting the total number of functions in the program, whether executed or not, and whether or not they are assigned symbols in the symbol table.

The second step of the function entry code is to initialize a new frame on the parallel stack at step 401, and initialize the local counters, which include self time and self+descendants time. Next, at step 402, the caller record is updated. This step includes the substeps of first determining the caller from the parallel call stack, and then examining the caller linked list from the current function accumulator to see if there is already an entry. If not, the an entry is created. Then, the number of times hit counter for the caller record is updated. Next, at step 402, the return point for the current function is recorded. Finally, at step 403, the global number of times hit counter for the current function is updated.

2) The Function Exit Profiling Routine

The operation of the function exit profiling code is illustrated in FIG. 6B. When executed, this routine first updates the current function accumulator according to the local counts, as indicated by step 410. Next, at step 411, the caller record and function accumulator for the call function indicated at the end of the stack are updated according to the current local counts. This updating includes updating self+descendants time for the caller function. Following step 411, the parallel stack is rolled back to the next stack frame at step 412, and at step 413 the new stack end frame is compared to the recorded exit point. If not equal, steps 411-413 are repeated. The self+descendants times are thus "rolled up" from the lower level functions to the higher. So, the profiling code for the caller function can accurately maintain self-plus-descendants time without the need for inaccurate distribution assumptions such as the average case distribution assumed by the prior art.

3) Basic Block Timing

FIG. 6C illustrates the steps performed by the fixed-cycle profiling code. These steps are step 420 of adding the number of fixed cycles for the basic block (previously determined during preprocessing) to the self time and to the self+descendants time for the current function, and step

Free Web Sudoku Puzzles.
Solve with your browser.
2   5   1   9 8  
                 
7     2       4  
          6   7  
  4 3 9   1 8 6  
  1   4          
  7       8     5
                 
  9 8   4   7   3
What is it?



Add Your Site · Terms Of Service · Privacy Policy


DISCLAIMER
Linkgrinder is a free service that searches the Internet and indexes all files found so that you may search quickly and easily for shared files. These files are created and made available individually by users whose identity we are not aware of and who we have no control over. In essence we function like a search engine tool; these files ARE NOT STORED OR SERVED BY OUR NETWORK. We are not responsible for any materials obtained by using our service. We do not monitor any of the contents of these files. These files may contain viruses, illegal materials, materials inappropriate for minors, offensive files and the like. BY USING OUR SERVICE, YOU ASSUME FULL RESPONSIBILITY FOR DOWNLOADING THESE MATERIALS AND WILL INDEMNIFY US FOR ANY DAMAGES THAT MAY BE INCURRED.

For More Specific Information VIEW OUR TERMS OF SERVICE.

Thank you and Enjoy!