Comparing hierarchically-structured documents. Patent Number: 7,437,664, from the United States Patent and Trademark Office (PTO)

Title: Comparing hierarchically-structured documents

Abstract: Described is a method and system for comparing two XML documents, usually represented as two logical dependency trees, and providing their differences as a set of tree operations. The set of tree operations may be used to transform one tree to the other. A first phase constructs an XML tree of nodes for each file, and a second, link tree construction phase builds a tree of link objects that relate nodes in the left tree to nodes in the right tree. Construction of the link tree generally operates by mapping equal subtrees in the left and right trees to each other, linking mapped subtrees to each other, removing any crossing links, linking groups, and filling gaps in the link tree. A third output phase uses the link tree to write an output file, such as comprising an XML document of change (e.g., insert and delete) operations.

Patent Number: 7,437,664 Issued on 10/14/2008 to Borson


Inventors: Borson; Niklas (Seattle, WA)
Assignee: Microsoft Corporation (Redmond, WA)
Appl. No.: 10/174,210
Filed: June 18, 2002


Current U.S. Class: 715/234 ; 715/209
Current International Class: G06F 15/00 (20060101)
Field of Search: 715/513,514,209,234


References Cited

U.S. Patent Documents
6237006 May 2001 Weinberg et al.
6714939 March 2004 Saldanha et al.
6732102 May 2004 Khandekar
6848078 January 2005 Birsan et al.
7096421 August 2006 Lou
2003/0084424 May 2003 Reddy et al.
2004/0250211 December 2004 Wakita et al.
2005/0144598 June 2005 Sabadell et al.
2006/0159272 July 2006 Ishiguro et al.

Other References

Sleator, Daniel and Robert Tarjan, "A data structure for dynamic trees", Annual ACM Symposium on Theory of Computing, ACM Press, 1981, pp. 114-122. cited by examiner.
Chakrabarti, K. and S. Mehrotra, "The Hybrid Tree: an index structure for high dimensional feature spaces", Mar. 23-26, 1999, pp. 440-447. cited by examiner.
Zhang et al., "Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems", Society for Industrial and Applied Mathematics, vol. 18, No. 6, pp. 1245-1262, Dec. 1989. cited by other.

Primary Examiner: Desai; Rachna
Attorney, Agent or Firm: Workman Nydegger

Claims



What is claimed is:

1. In a computer system, a method, comprising: accessing a first file of hierarchically structured data; generating, from the first file, a first tree structure, the first tree structure having at least one first group of data that may be referenced as a group, the at least one first group being less than the whole first tree; accessing a second file of hierarchically structured data; generating, from the second file, a second tree structure, the second tree structure having at least one second group of data that may be referenced as a group, the at least one second group being less than the whole second tree; recursively iterating over and comparing nodes in the first tree structure with nodes in the second tree structure to develop a third tree structure, the third tree structure being a sparse link tree based on a comparison of the first and second tree structures, the sparse link tree comprising link objects relating nodes and groups of the first tree structure to nodes and groups of the second tree structure which are equal as between the first tree structure and the second tree structure, and excluding link objects relating nodes and groups of the first tree structure to nodes and groups of the second tree structure which are not equal as between the first tree structure and the second tree structure, the sparse link tree having gaps therein for unlinked nodes and groups which are not equal as between the first tree structure and the second tree structure; determining whether any links of the sparse link tree cross and, when links cross, removing at least one link object relating nodes or groups of the first tree structure to nodes and groups of the second tree structure so as to eliminate crossing of links in the sparse tree; converting the sparse link tree into a fourth tree structure, the fourth tree structure being a complete link tree formed by filling the gaps of unlinked nodes within the sparse link tree, the complete link tree relating the first tree structure to the second tree structure; linking the at least one group in the first tree to the at least one group in the second tree; and processing the complete link tree to output a set of at least one difference between the first file and the second file such that at least one difference is identified between the at least one first group and the at least one second group.

2. The method of claim 1 wherein developing a sparse link tree comprises, mapping equal subtrees in the first and second trees to each other.

3. The method of claim 1 wherein developing a sparse link tree comprises, detecting a subtree in the first tree that equals a subtree in the second tree, and inserting a link node into the sparse link tree, the link node referencing the subtree in each of the first and second trees.

4. The method of claim 3 wherein determining if any links of the sparse tree cross further comprises determining whether the link node references to the first tree are in the same order as the link node references to the second tree, and if not, removing at least one link object from the sparse link tree.

5. The method of claim 1 wherein linking groups includes, detecting whether a group in one tree equals a group in the other tree, and if so, linking the roots of the groups.

6. The method of claim 1 wherein linking the at least one group in the first tree to the at least one group in the second tree comprises, inserting a link node into the sparse link tree.

7. The method of claim 1 wherein linking groups includes, determining whether a group in one tree structure is the union of two or more groups in the other tree structure, and if so, modifying the complete link tree, including: creating a link object for the root of a larger group; creating link objects for the roots of smaller groups; and adjusting pointers in the complete link tree such that the link objects for the smaller groups are children of the link object for the larger group.

8. The method of claim 1 wherein linking groups includes, determining whether a group in one tree structure is the union of two or more groups in the other tree structure, and if so, inserting one-way links into the complete link tree for the root of a group that occurs in one tree structure but not the other.

9. The method of claim 1 wherein removing at least one link object comprises unlinking at least one node in each of the first and second tree structures to eliminate the intersection.

10. The method of claim 9 wherein unlinking at least one node comprises, unlinking nodes that are in both the first and second tree structures.

11. The method of claim 9 wherein unlinking at least one node comprises, unlinking nodes that are only in one of the first and second tree structures.

12. The method of claim 9 wherein unlinking at least one node comprises, determining a first value corresponding to unlinking nodes that are in both the first and second tree structures, determining a second value corresponding to unlinking nodes that are in one of the first and second tree structures, determining a third value corresponding to unlinking nodes that are in the other of the first and second tree structures, and unlinking the nodes that correspond to the lowest of the first, second or third values.

13. The method of claim 9 wherein determining if any links of the sparse link tree cross includes, determining whether two or more intersections are related, finding a least-cost set of nodes that can be unlinked to eliminate the related intersections, and unlinking the nodes of that set.

14. The method of claim 13 wherein finding the least-cost set of nodes comprises constructing a Boolean expression for each intersection, and finding the least-cost set of nodes for which the expression is true.

15. The method of claim 1 wherein filling gaps in the sparse link tree comprises traversing the first and second tree to detect unlinked ancestor nodes, and linking unlinked ancestor nodes.

16. The method of claim 1 wherein filling gaps in the sparse link tree comprises, traversing the first and second tree to detect adjacent unlinked sibling nodes, and grouping unlinked siblings under a single unlinked node.

17. The method of claim 1 wherein processing the complete link tree comprises, outputting a set of tree instructions.

18. The method of claim 1 wherein processing the complete link tree comprises, outputting an insert instruction for content that corresponds to a node present in the second tree structure but not present in the first tree structure.

19. The method of claim 1 wherein processing the complete link tree comprises, outputting a delete instruction for content that corresponds to a node present in the first tree structure but not present in the second tree structure.

20. The method of claim 1, wherein generating, from the first file, a first tree structure comprises placing each element of the first file in the first tree structure, and wherein generating, from the second file, a second tree structure, comprises placing each element of the second file in the second tree structure.

21. The method of claim 1, further comprising: computing a hash of each node in said first tree structure and said second tree structure; using the hash of any leaf node in computing a hash for its parent node, such that each subtree has a hash value computed that depends on its child nodes; and comparing hash values of subtrees of said first tree structure to said second tree structure, wherein if a hash value of a subtree in said first tree structure matches a hash value of a subtree in said second tree structure, said subtree is added to the sparse link tree, wherein if a hash value of a subtree in said first tree structure does not match a hash value of a subtree in said second tree structure, said subtree is not added to said sparse link tree.

22. A computer-readable storage medium having computer-executable instructions for performing a method, comprising: accessing a first file of hierarchically structured data to provide a first tree structure therefrom; accessing a second file of hierarchically structured data to provide a second tree structure therefrom; recursively iterating over and comparing nodes in the first tree structure with nodes in the second tree structure to develop a third tree structure based on a comparison of the first and second tree structures, the third tree structure being a sparse link tree comprising link objects relating nodes and groups of the first tree structure to nodes and groups of the second tree structure which are equal as between the first tree structure and the second tree structure, and excluding link tree objects relating nodes and groups of the first tree structure to nodes and groups of the second tree structure which are not equal as between the first tree structure and the second tree structure, the sparse link tree having gaps therein for unlinked nodes and groups which are not equal as between the first tree structure and the second tree structure; determining whether any links of the sparse link tree cross and, when links cross, removing at least one link object relating nodes or groups of the first tree structure to nodes and groups of the second tree structure so as to eliminate crossing of links in the sparse tree; converting the sparse link tree into a fourth tree structure, the fourth tree structure being a complete link tree formed by filling the gaps of unlinked nodes within the sparse link tree, the complete link tree relating the first tree structure to the second tree structure; linking at least one group in the first tree structure to at least one group in the second tree structure, wherein the at least one first group in the first tree is less than the first file and the at least one group in the second tree is less than the second file; and processing the complete link tree to output a set of at least one difference between the first file and the second file such that at least one difference is identified between the at least one first group and the at least one second group.

23. The computer-readable storage medium of claim 22 wherein developing a sparse link tree comprises, mapping equal subtrees in the first and second trees to each other.

24. The computer-readable storage medium of claim 22 wherein developing a sparse link tree comprises, detecting a subtree in the first tree that equals a subtree in the second tree, and inserting a link node into the sparse link tree, the link node referencing the subtree in each of the first and second trees.

25. The computer-readable storage medium of claim 24 having further computer-executable instructions comprising, determining whether the link node references to the first tree are in the same order as the link node references to the second tree, and if not, removing at least one link node from the link tree.

26. The computer-readable storage medium of claim 22 wherein linking groups includes, detecting whether a group in one tree equals a group in the other tree, and if so, linking the roots of the groups.

27. The computer-readable storage medium of claim 22 wherein linking at least one group in the first tree structure to at least one group in the second tree structure comprises, inserting a link node into the sparse link tree.

28. The computer-readable storage medium of claim 22 wherein linking groups includes, determining whether a group in one tree structure is the union of two or more groups in the other tree structure, and if so, modifying the complete link tree, including: creating a link object for the root of a larger group; creating link objects for the roots of smaller groups; and adjusting pointers in the complete link tree such that the link objects for the smaller groups are children of the link object for the larger group.

29. The computer-readable storage medium of claim 22 wherein linking groups includes, determining whether a group in one tree structure is the union of two or more groups in the other tree structure, and if so, inserting one-way links into the complete link tree for the root of a group that occurs in one tree structure but not the other.

30. The computer-readable storage medium of claim 22 wherein removing at least one link object comprises unlinking at least one node in each of the first and second tree structures to eliminate the intersection.

31. The computer-readable storage medium of claim 30 wherein unlinking at least one node comprises, unlinking nodes that are in both the first and second tree structures.

32. The computer-readable storage medium of claim 30 wherein unlinking at least one node comprises, unlinking nodes that are only in one of the first and second tree structures.

33. The computer-readable storage medium of claim 30 wherein unlinking at least one node comprises, determining a first value corresponding to unlinking nodes that are in both the first and second tree structures, determining a second value corresponding to unlinking nodes that are in one of the first and second tree structures, determining a third value corresponding to unlinking nodes that are in the other of the first and second tree structures, and unlinking the nodes that correspond to the lowest of the first, second or third values.

34. The computer-readable storage medium of claim 30 wherein determining if any links of the sparse link tree cross includes, determining whether two or more intersections are related, finding a least-cost set of nodes that can be unlinked to eliminate the related intersections, and unlinking the nodes of that set.

35. The computer-readable storage medium of claim 34 wherein finding the least-cost set of nodes comprises constructing a Boolean expression for each intersection, and finding the least-cost set of nodes for which the expression is true.

36. The computer-readable storage medium of claim 22 wherein filling gaps in the sparse link tree comprises traversing the first and second tree to detect unlinked ancestor nodes, and linking unlinked ancestor nodes.

37. The computer-readable storage medium of claim 22 wherein filling gaps in the sparse link tree comprises, traversing the first and second tree to detect adjacent unlinked sibling nodes, and grouping unlinked siblings under a single unlinked node.

38. The computer-readable storage medium of claim 22 wherein processing the complete link tree comprises, outputting a set of tree instructions.

39. The computer-readable storage medium of claim 22 wherein processing the complete link tree comprises, outputting an insert instruction for content that corresponds to a node present in the second tree structure but not present in the first tree structure.

40. The computer-readable storage medium of claim 22 wherein processing the complete link tree comprises, outputting a delete instruction for content that corresponds to a node present in the first tree structure but not present in the second tree structure.

41. A computer-implemented method for comparing hierarchically-structured documents, comprising: accessing a first file of hierarchically structured data; generating, from the first file, a first tree structure, the first tree structure having at least one first group of data that may be referenced as a group, the at least one first group being less than the whole first tree; accessing a second file of hierarchically structured data; generating, from the second file, a second tree structure, the second tree structure having at least one second group of data that may be referenced as a group, the at least one second group being less than the whole second tree; recursively iterating over and comparing nodes in the first tree structure with nodes in the second tree structure to develop a third tree structure, the third tree structure being a sparse link tree based on a comparison of the first and second tree structures, the sparse link tree comprising link objects specifically pointing to nodes and groups of the first tree structure and to nodes and groups of the second tree structure, and pointing to only those nodes and groups which are equal as between the first tree structure and the second tree structure, and excluding link objects with pointers to nodes and groups of the first tree structure to nodes and groups of the second tree structure, the sparse link tree having gaps therein for unlinked nodes and groups which are not equal as between the first tree structure and the second tree structure; determining whether the link objects related to the equal nodes and groups of the first and second tree structures are in the same order as the nodes and groups of both the first and second tree structures, and when they are not, determining that the link objects have crossing links; after determining the link objects have crossing links, determining which of the crossing links to remove, wherein determining which of the crossing links to remove comprises determining that one of a pair of link nodes needs to be unlinked and, for each of the pair of link nodes, summing values of subtree members that must also be unlinked if the respective link node is unlinked; unlinking the link node of the pair of link nodes which has the lowest summed value; repeating the steps of determining that one of a pair of link nodes needs to be unlinked and unlinking the link node until no pairs remain to be checked for crossing links; converting the sparse link tree into a fourth tree structure, the fourth tree structure being a complete link tree formed by filling the gaps of unlinked nodes within the sparse link tree, the complete link tree relating the first tree structure to the second tree structure; linking the at least one group in the first tree to the at least one group in the second tree; and processing the complete link tree to output a set of at least one difference between the first file and the second file such that at least one difference is identified between the at least one first group and the at least one second group.
Description



FIELD OF THE INVENTION

The present invention relates generally to computer systems, and more particularly to hierarchically-structured documents such as XML (eXtensible Markup Language) formatted documents.

BACKGROUND OF THE INVENTION

The eXtensible Markup Language (XML) is a markup language that allows users to describe data in hierarchically-structured documents or equivalent files. In general, the data is not only present in an XML document, but is described in some way. For example, various sets of text in an XML document might be tagged as separate paragraphs, whereby a program interpreting the document would know something about the text's organization.

XML is a simplified subset of SGML (Standard Generalized Markup Language) that removes some of SGML's more complex features to simplify programming. XML is a defined non-proprietary standard, so XML-formatted information is accessible and reusable by any XML-compatible software, in contrast to proprietary formats used by many conventional programs such as traditional word processors. In other words, XML can be used to store any kind of structured information in a manner that enables it to be communicated between computers, including those that are otherwise unable to communicate. The format is robust, persistable and verifiable.

XML allows the flexible development of user-defined document types that are stored, transmitted and/or processed in some manner, while providing information content that is richer and easier to use (e.g., relative to HTML), because the descriptive and hypertext linking abilities of XML are much greater than those of HTML.

As XML and XML documents are becoming extremely popular, various tools are needed to work with XML technology. One such tool that would benefit users would provide a way to compare two XML documents. File comparison has a wide range of uses, generally known from word processor utilities and the like that perform line-oriented comparisons, such as those that compare text.

However, while such line-oriented comparison systems are straightforward to implement, they are also rather limited, and do not fit the hierarchical nature of the structure of XML documents. What is needed is a comparison method and system that are tree-oriented, to match the hierarchical structure of structured documents such as XML documents.

SUMMARY OF THE INVENTION

Briefly, the present invention provides a tree-oriented comparison system and method that compares two XML (or other hierarchically-structured) documents and reports their differences as a set of tree operations. The tree operations may be stored in a well-formed XML document. A tree-oriented comparison is more useful than a line-oriented comparison because with tree operations, it is possible to selectively roll back changes in the original hierarchically structured documents, while still maintaining a well-formed tree. For example, an application may use a change document (e.g., an XML document) comprising tree operations that was created with the present invention to provide users with a tool that enables interactive acceptance or rejection of changes that had previously been made to one of the two XML input files.

To construct the set of tree operations, in a first phase referred to as an input phase, a comparison mechanism (and/or process) reads both input files into memory, and constructs an XML tree of nodes for each file, referred to as a left tree and a right tree, respectively. Once the left and right trees are built, a second, link tree construction phase builds a tree of link objects that relate nodes in the left tree to nodes in the right tree. Then, a third, or output phase uses the link tree to write an output file, such as comprising an XML document of change operations. With this change document, for example, a tool that applied all of the changes therein to the left tree would wind up with the right tree, or vice-versa, while in another application, a tool enables the changes to be individually viewed and selectively applied.

In general, in the input phase, the comparison mechanism reads both input files into memory, and converts them to a standard character encoding that is used internally. The comparison mechanism then constructs an XML tree of nodes for each input file. In one implementation, each node in an XML tree is an object of type XmlRoot, XmlElem, or XmlText, wherein the three node types are based on the abstract base class XmlNode, such that an XML tree may be thought of as a tree of XmlNode objects. Further, each node may have a beginning, zero or more children, and an end.
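
As an illustration only, the input phase might be sketched in Python as follows. The class names XmlNode, XmlRoot, XmlElem and XmlText come from the description above; the fields (children, tag, text), the helper build_tree, and the use of Python's standard xml.etree.ElementTree parser are assumptions made for this sketch, and attributes and character-encoding conversion are ignored.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class XmlNode:
    """Common base for all node types (an abstract base class in the text)."""
    children: List["XmlNode"] = field(default_factory=list)


@dataclass
class XmlRoot(XmlNode):
    """Document root; its single child is the top-level element."""


@dataclass
class XmlElem(XmlNode):
    tag: str = ""  # assumed field: the element name


@dataclass
class XmlText(XmlNode):
    text: str = ""  # assumed field: the character data


def _convert(elem: ET.Element) -> XmlElem:
    """Recursively copy a parsed element into XmlElem/XmlText nodes."""
    node = XmlElem(tag=elem.tag)
    if elem.text and elem.text.strip():
        node.children.append(XmlText(text=elem.text.strip()))
    for child in elem:
        node.children.append(_convert(child))
        # Text that follows a child element belongs to the parent.
        if child.tail and child.tail.strip():
            node.children.append(XmlText(text=child.tail.strip()))
    return node


def build_tree(xml_source: str) -> XmlRoot:
    """Parse an XML string and return the root of the node tree."""
    return XmlRoot(children=[_convert(ET.fromstring(xml_source))])
```

Calling build_tree once per input file would give the left tree and the right tree used by the later phases.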

A second, link tree construction phase builds a tree of link objects that relates nodes in the left tree to nodes in the right tree, including subtrees, wherein a subtree is a node together with its descendants. Construction of the link tree generally operates by a number of steps, including mapping equal subtrees in the left and right trees to each other, linking mapped subtrees to each other, removing any crossing links, linking groups, and filling gaps in the link tree.

The mapping equal subtrees step finds subtrees of the left tree that equal subtrees of the right tree, and maps the corresponding nodes of the left and right subtrees to each other. Once equal subtrees are mapped, the other steps create the link tree, comprising a tree of link objects (or nodes), wherein each link object points to a node in the left tree, the right tree, or both; each node in the left and right trees may be associated with a link node, except for descendants of mapped nodes (as only the root nodes of mapped subtrees are linked to each other); and the order and hierarchy of the link nodes matches the order and hierarchy of the corresponding input tree (e.g., XML) nodes in both the left and right trees.
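
One way to find equal subtrees, in line with the hash comparison recited in claim 21, is to give every node a digest that depends on its own content and on the digests of its children, and then to look up left-tree digests among the right-tree digests. The following Python sketch is a simplification under stated assumptions: the Node class, the SHA-256 digest, and the function names are hypothetical, and the sketch does not prevent a right-tree node from being linked when one of its ancestors is already linked.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple


class Node:
    """Hypothetical minimal tree node: a label plus ordered children."""

    def __init__(self, label: str, children=()):
        self.label = label
        self.children = list(children)
        self.digest = ""


def compute_hashes(node: Node) -> str:
    """Bottom-up hash: a node's digest depends on its label and its children."""
    h = hashlib.sha256(f"{len(node.label)}:{node.label}".encode("utf-8"))
    for child in node.children:
        h.update(compute_hashes(child).encode("ascii"))
    node.digest = h.hexdigest()
    return node.digest


def all_nodes(node: Node) -> Iterator[Node]:
    """Yield the node and its descendants in document order."""
    yield node
    for child in node.children:
        yield from all_nodes(child)


def map_equal_subtrees(left: Node, right: Node) -> List[Tuple[Node, Node]]:
    """Link roots of equal subtrees; descendants of a linked root are skipped."""
    compute_hashes(left)
    compute_hashes(right)
    by_digest: Dict[str, List[Node]] = {}
    for n in all_nodes(right):
        by_digest.setdefault(n.digest, []).append(n)

    links: List[Tuple[Node, Node]] = []

    def visit(n: Node) -> None:
        candidates = by_digest.get(n.digest)
        if candidates:
            # Equal subtree found: link the roots only and do not descend.
            links.append((n, candidates.pop(0)))
            return
        for child in n.children:
            visit(child)

    visit(left)
    return links
```

Walking the left tree from the top down means the largest equal subtrees are linked first, which matches the statement that only the root nodes of mapped subtrees are linked to each other.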

Because the mapped nodes in the two input trees may be equal but not in the same order, links may logically cross other links. Such crossing links are removed by comparing each pair of adjacent link nodes in the left tree to determine whether the nodes they point to in the right tree are in order. When two adjacent link nodes are not in order, one is unlinked, including unmapping the XML nodes in the corresponding subtrees. Since the crossing link may be removed by unlinking either of the adjacent nodes, a least-cost option is used to decide which to unlink.

When crossing links are removed, the order of the nodes in the link tree matches the order of the corresponding nodes in the left and right trees. Groups are then linked, wherein a group is a set of linked nodes in an XML tree that is defined by a common ancestor nearer than the root of the tree. The nearest common ancestor of all the nodes in the group is called the root of the group. A first step in processing groups enumerates the groups in the left and right trees, giving a left group tree and a right group tree.

Relationships between groups in the left tree and groups in the right tree are found, and group rules applied to each. For example, groups that intersect are found, and subtrees selectively unlinked until there are no intersecting groups. Another group rule links the roots of equal groups to each other. Then remaining groups are linked. Linking groups may include inserting one-way links into the link tree for the roots of groups that occur in one XML tree but not the other, wherein such groups are the unions of single- or multi-element groups in the other tree.

Application of the group rules provides an intermediate link tree. To complete the link tree, vertical and horizontal gaps are filled in by traversing the tree in separate passes, inserting link objects in the vertical pass, and linking unlinked siblings in a horizontal pass.

Once the link tree is fully constructed, an output (e.g., XML change) file is generated that describes differences between the two input files in terms of tree operations. For example, the change file may include insertion and deletion tags to represent the changes from one file to the other. In this manner, the change file is tree oriented to match the hierarchical structure of structured documents.

Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram representing an exemplary computer system into which the present invention may be incorporated;

FIG. 2 is a block diagram generally representing components for constructing a link tree and an output file of tree operations from the link tree in accordance with an aspect of the present invention;

FIG. 3 is a representation of a tree constructed from an XML file;

FIG. 4 is a flow diagram generally representing the overall construction process in accordance with an aspect of the present invention;

FIG. 5 is a flow diagram generally representing the mapping of equal subtrees in the left and right input trees to one another, in accordance with an aspect of the present invention;

FIG. 6 is a diagram generally representing left and right trees constructed from hierarchical files and having logical links between mapped subtrees, in accordance with an aspect of the present invention;

FIG. 7 is a flow diagram generally representing the linking of mapped subtrees to one another, in accordance with an aspect of the present invention;

FIG. 8 is a diagram generally representing left and right trees used to build a sparse tree containing mapped subtrees, in accordance with an aspect of the present invention;

FIG. 9 is a flow diagram generally representing the removing of crossing links, in accordance with an aspect of the present invention;

FIGS. 10A and 10B are diagrams generally representing the removal of crossing links in left and right trees constructed from hierarchical files and having logical links between mapped subtrees, in accordance with an aspect of the present invention;

FIG. 11 is a flow diagram generally representing the linking of groups, in accordance with an aspect of the present invention;

FIG. 12 is a diagram generally representing the construction of a complete link tree from left and right input trees, in accordance with an aspect of the present invention;

FIG. 13 is a diagram generally representing groups of nodes in input trees;

FIG. 14 is a diagram generally representing relationships between groups of nodes in input trees;

FIG. 15 is a diagram generally representing the simplification of trees;

FIGS. 16-19 are diagrams generally representing the linking of roots of equal groups, in accordance with an aspect of the present invention;

FIGS. 20-23, 24A-24C, 25A and 25B are diagrams generally representing the handling of unions of groups, in accordance with an aspect of the present invention;

FIGS. 26A-26C and 27-29 are diagrams generally representing the handling of intersecting groups, in accordance with an aspect of the present invention;

FIGS. 30-33, 34A, 34B, 35A and 35B are diagrams generally representing the handling of related intersecting groups, in accordance with an aspect of the present invention;

FIG. 36 is a diagram generally representing the unlinking of nodes in accordance with an aspect of the present invention;

FIG. 37 is a diagram generally representing the linking of equal groups in accordance with an aspect of the present invention;

FIGS. 38A-38D are diagrams generally representing the linking of unions of groups in accordance with an aspect of the present invention;

FIGS. 39A and 39B are flow diagrams generally representing the filling of gaps in the link tree in respective vertical and horizontal passes, in accordance with an aspect of the present invention; and

FIGS. 40-42 comprise a flow diagram representing the construction of an output file of tree operations from the link tree, in accordance with an aspect of the present invention.

DETAILED DESCRIPTION

Exemplary Operating Environment

FIG. 1 illustrates an example of a suitable computing system environment 100 on which the invention may be implemented. The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.

The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

With reference to FIG. 1, an exemplary system for implementing the invention includes a general purpose computing device in the form of a computer 110. Components of the computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120. The system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.

The computer 110 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 110 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 110. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.

The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, FIG. 1 illustrates operating system 134, file system 135, application programs 136, other program modules 137 and program data 138.

The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 1 illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.

The drives and their associated computer storage media, discussed above and illustrated in FIG. 1, provide storage of computer-readable instructions, data structures, program modules and other data for the computer 110. In FIG. 1, for example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146 and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 136, other program modules 137, and program data 138. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers herein to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 110 through input devices such as a tablet (electronic digitizer) 164, a microphone 163, a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. The monitor 191 may also be integrated with a touch-screen panel or the like. Note that the monitor and/or touch screen panel can be physically coupled to a housing in which the computing device 110 is incorporated, such as in a tablet-type personal computer. In addition, computers such as the computing device 110 may also include other peripheral output devices such as speakers 195 and printer 196, which may be connected through an output peripheral interface 194 or the like.

The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in FIG. 1. The logical connections depicted in FIG. 1 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet. For example, in the present invention, the computer system 110 may comprise a source machine from which data is being migrated, and the remote computer 180 may comprise the destination machine. Note however that source and destination machines need not be connected by a network or any other means, but instead, data may be migrated via any media capable of being written by the source platform and read by the destination platform or platforms.

When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160 or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 1 illustrates remote application programs 185 as residing on memory device 181. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

Comparing Hierarchically-Structured Documents

As generally represented in FIG. 2, the present invention is directed to a tree-oriented comparison system and method 200 that compares two XML (or similarly hierarchically-structured) document files 202.sub.1 and 202.sub.2, and reports their differences as a set of tree operations. To this end, in a first phase referred to as an input phase, a comparison mechanism (and/or process) 204.sub.1 reads both input files 202.sub.1 and 202.sub.2 into memory, and constructs an XML tree of nodes for each file. For purposes of description, the two XML trees that are constructed in the first phase are referred to as a left tree 206.sub.1 and a right tree 206.sub.2, respectively, although as can be readily appreciated, other names would be equivalent, e.g., first and second trees, current and previous trees (based on document versions), and so forth.

Some processing of the left and right trees may also be performed at this time. For example, to simplify later comparisons between the two trees, a hash computation is taken of each node, and a hash value associated with that node. To this end, the computation starts with each of the leaf nodes, computes a hash, and stores it in association with each respective leaf node. The hash of the leaf node is then used in computing a hash value for its parent node, and that hash for the next parent up, and so on. In this manner, each subtree has a hash value computed therefor that depends on its child nodes and their child nodes. If the hash value of a node equals the hash value of another node, then it is known that those nodes and the subtrees thereunder are equal. Other processing may be done at this time, such as to determine size, relative values and the like of each node, however such processing may be deferred until needed for a given node.
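By way of illustration only, the bottom-up hash computation described above might be sketched in Python as follows. The Node class, the compute_hashes function, and the choice of SHA-256 are assumptions made for this sketch and are not taken from the described implementation. With any practical hash function, equal hash values indicate equal subtrees only up to the (very small) probability of a collision.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: a label (start tag or character data) plus children."""
    label: str
    children: List["Node"] = field(default_factory=list)
    digest: bytes = b""  # filled in by compute_hashes


def compute_hashes(node: Node) -> bytes:
    """Post-order walk: children are hashed first, then folded into the parent."""
    data = node.label.encode("utf-8")
    h = hashlib.sha256()
    # Length-prefix the label so label bytes cannot be confused with child digests.
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)
    for child in node.children:
        h.update(compute_hashes(child))  # child order matters
    node.digest = h.digest()
    return node.digest
```

After one call on the root, every node carries a digest that depends on its own label and on all of its descendants, so two subtrees can be compared by comparing the digests stored at their root nodes.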

In accordance with one aspect of the present invention, a second, link tree construction phase 204.sub.2 builds a tree of link objects 208 that relates nodes in the left tree 206.sub.1 to nodes in the right tree 206.sub.2. The link tree 208 need not be an XML document, but rather is a temporary tree that is manipulated as described below to relate the left and right trees to one another. In general, the link tree 208 is built by walking the left and right trees 206.sub.1 and 206.sub.2, matching subtrees therein by their parent nodes' hash values, and maintaining pointers between equal subtrees. The link tree 208 is then manipulated according to various link group rules.

When the link tree 208 is complete, a third, or output phase of the comparison mechanism 204.sub.3 uses the link tree 208 to write an output file 210. In one implementation, the output file 210 comprises a well-formed XML document, also referred to as the change tree, or change document, since it is a tree structured document that contains the change operations that describe differences between the trees. Note that while in FIG. 2 a single comparison mechanism/process is shown as accomplishing the three phases, (as indicated in FIG. 2 by the dashed line connecting the blocks 204.sub.1-204.sub.3), it is understood that the phases may be implemented by more than one component, e.g., a separate component may perform each phase.

Turning to a general explanation of the first, or input phase, in this phase the comparison mechanism 204.sub.1 reads both input files 202.sub.1 and 202.sub.2 into memory, such as the RAM 132 (FIG. 1) and converts them to a standard character encoding that is used internally. The comparison mechanism (phase 204.sub.1) then constructs the left and right XML tree of nodes 206.sub.1 and 206.sub.2 for each input file 202.sub.1 and 202.sub.2, respectively. Hash values and possibly size values may be determined at this time, as described above.

In one implementation, each node in the XML trees 206.sub.1 and 206.sub.2 is an object, either of type XmlRoot, XmlElem or XmlText, wherein the three node types are based on the abstract base class XmlNode, such that an XML tree may be thought of as a tree of XmlNode objects. Each node has a beginning, zero or more children, and an end.

The XmlRoot object represents the document as a whole, and its beginning comprises everything before the document element's start tag, as generally described below with reference to FIG. 3. In a typical XML document, this may include processing instructions, declarations, and white space. The XmlRoot object's end comprises everything after the document element's end tag, typically comprising any trailing white space. The only child of an XmlRoot object should be the XmlElem object for the document element. Note however, that XmlRoot objects may sometimes be used later, to combine multiple sibling elements into a single subtree; in such a case, the XmlRoot object represents the root of a subtree, rather than the root of the entire document.

An XmlElem object represents an XML element, and its beginning is the element's start tag, including any attributes. An XmlElem object's children are child elements and text nodes, and (later) possibly XmlRoot objects for nodes that have been grouped into subtrees. The XmlElem object's end comprises the end tag, if any. The start and end tags are converted to canonical form for comparison purposes.

An XmlText object represents a block of text, i.e., parsed character data. The XmlText object's beginning is the character data, which may be normalized according to a white space handling option. An XmlText object should have no children, and its end should be the empty string.
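As a rough illustration of the node types just described, the following Python sketch models each node as a beginning, zero or more children, and an end. Only the class names XmlNode, XmlRoot, XmlElem and XmlText come from the description; the constructor signatures, the append and serialize helpers, and build_example are hypothetical, and white space between elements is omitted for brevity.

```python
from typing import List, Optional


class XmlNode:
    """Base type: a beginning, zero or more children, and an end."""

    def __init__(self, begin: str, end: str = "") -> None:
        if type(self) is XmlNode:
            raise TypeError("XmlNode is abstract; use XmlRoot, XmlElem or XmlText")
        self.begin = begin
        self.end = end
        self.parent: Optional["XmlNode"] = None
        self.children: List["XmlNode"] = []

    def append(self, child: "XmlNode") -> "XmlNode":
        child.parent = self
        self.children.append(child)
        return child


class XmlRoot(XmlNode):
    """The document as a whole (or, later, the root of a grouped subtree)."""


class XmlElem(XmlNode):
    """An element: begin is the start tag, end is the end tag (if any)."""


class XmlText(XmlNode):
    """A block of character data: no children, and an empty end."""

    def __init__(self, text: str) -> None:
        super().__init__(begin=text, end="")

    def append(self, child: XmlNode) -> XmlNode:
        raise TypeError("XmlText nodes have no children")


def serialize(node: XmlNode) -> str:
    """Concatenate beginning, serialized children, and end."""
    return node.begin + "".join(serialize(c) for c in node.children) + node.end


def build_example() -> XmlRoot:
    """Build a small tree: a root, a document element, a title and two paragraphs."""
    root = XmlRoot(begin='<?xml version="1.0"?>', end="")
    topic = root.append(XmlElem('<topic type="overview">', "</topic>"))
    title = topic.append(XmlElem("<title>", "</title>"))
    title.append(XmlText("Example Page"))
    body = topic.append(XmlElem("<body>", "</body>"))
    for text in ("First paragraph.", "Second paragraph."):
        body.append(XmlElem("<p>", "</p>")).append(XmlText(text))
    return root
```

Because every node stores its own beginning and end, serializing a tree is simply a depth-first concatenation, which is one reason this representation is convenient for writing an output document later.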

FIG. 3, based on the example in the table below, shows how a document may be converted to an XML tree 300:

TABLE-US-00001
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="my_stylesheet.xsl"?>
<!DOCTYPE topic SYSTEM "my_schema.dtd">
<topic type="overview">
  <title>Example Page</title>
  <body>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</topic>

As represented in FIG. 3, the XmlRoot node 302 contains the information prior to the <topic> XmlElem node 304. Hierarchically below the <topic> XmlElem node 304 are <title> and <body> XmlElem nodes, 306 and 308, respectively. The <title> XmlElem node 306 has an example page XmlText node 310 as a child, while the <body> XmlElem node 308 has two paragraph XmlElem nodes as children, 312 and 314, each paragraph XmlElem node having its respective first or second paragraph of text as an XmlText child, 316 and 318.

As represented in the flow diagram of FIG. 4, following completion of the input phase (step 400), the comparison mechanism compares the left and right trees 206.sub.1 and 206.sub.2, as represented by step 402. If the trees are equal, (e.g., as determined by equal hash values associated with their root nodes), an appropriate output indicating "no differences" is generated at step 404, and the comparison mechanism/process ends.

In the event that the left and right trees 206.sub.1 and 206.sub.2 are unequal, a link tree 208 is constructed at step 406 that relates the left tree 206.sub.1 and the right tree 206.sub.2. As will be understood, construction of the link tree 208 is typically the most significant part of the comparison mechanism, processing-wise, and is represented by steps 408, 410, 412, 414 and 416, each of which is further explained via corresponding FIGS. 5, 7, 9, 11 and 39A-39B, respectively. Thus, construction of the link tree 208 involves step 408, which maps equal subtrees in the left and right trees to each other, wherein a subtree is a node, together with its descendants. Step 410, described below, links mapped subtrees to each other, while step 412 removes crossing links. Step 414 links groups, while step 416 represents filling gaps in the link tree 208.

As represented by step 408, the flow diagram of FIG. 5, and the example trees of FIG. 6, a general goal of mapping equal subtrees is to find subtrees of the left tree that equal subtrees of the right tree, and map the nodes of the left and right subtrees that correspond to each other. In an XML tree, a subtree may be a simple leaf node, such as a text node or empty element, or it may comprise an element together with the elements and/or text nodes it contains. Two subtrees are equal if their root nodes are equal, and their corresponding subtrees are equal. Note that this corresponds to a recursive comparison. The hash calculations facilitate the comparisons.

One way of mapping equal subtrees works as generally represented in FIGS. 5 and 6, wherein step 500 finds a pair of anchor points A and A', where A is the root of a unique subtree in the left tree 602, A' is the root of a unique subtree in the right tree 604, and subtree A equals subtree A'. In FIG. 6, unique subtrees in the trees 602 and 604 that have an equal counterpart subtree are each represented via a dashed box. A subtree is unique if the tree of which it is a part contains no other subtree equal to it.

Step 502 maps the subtrees A and A' to each other, such as via a data structure or the like that contains pointers to the root nodes (e.g., their offsets) of the subtrees. Step 502 entails mapping nodes A and A' to each other, and mapping the descendants of A and the corresponding descendants of A' to each other. In FIG. 6, links are represented by dashed arrows between the subtrees. Note that for purposes of clarity, individual links between mapped root nodes and mapped descendants are not shown.

Step 504 tests whether adjacent siblings of A and A', such as B and B', are the roots of equal (but not necessarily unique) subtrees. If so, step 504 branches to step 506 to map those subtrees to each other. Step 508 then repeats the above-described process for other adjacent siblings until none remain.

Once any adjacent siblings have been mapped, step 510 repeats the above process for other anchor points, until there are none remaining. When none remain, step 512 looks for any remaining unmapped text nodes, and if at least one is found, splits each into smaller pieces, e.g., one node per word (as delineated by whitespace), via step 514. Then the process is run again (e.g., once) to find additional matches among the smaller pieces.
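The anchor-based mapping of steps 500 through 510 might look roughly like the following Python sketch. It is a simplified illustration under stated assumptions: the Node class and all function names are hypothetical, a nested-tuple key stands in for the hash value of each subtree, and the splitting of unmapped text nodes (steps 512 and 514) is omitted.

```python
from collections import Counter
from typing import Iterator, Optional, Sequence


class Node:
    """Hypothetical node whose `key` plays the role of the subtree hash."""

    def __init__(self, label: str, children: Sequence["Node"] = ()) -> None:
        self.label = label
        self.children = list(children)
        self.parent: Optional["Node"] = None
        for child in self.children:
            child.parent = self
        # Equal keys mean equal subtrees (same label, same children in order).
        self.key = (label, tuple(c.key for c in self.children))
        self.mate: Optional["Node"] = None  # mapped node in the other tree


def walk(node: Node) -> Iterator[Node]:
    """Pre-order traversal, so larger subtrees are considered first."""
    yield node
    for child in node.children:
        yield from walk(child)


def map_pair(a: Node, b: Node) -> None:
    """Map two equal subtrees: the roots and all corresponding descendants."""
    a.mate, b.mate = b, a
    for ca, cb in zip(a.children, b.children):
        map_pair(ca, cb)


def sibling(node: Node, step: int) -> Optional[Node]:
    """Return the sibling `step` positions away, or None at either end."""
    if node.parent is None:
        return None
    sibs = node.parent.children
    i = next(k for k, s in enumerate(sibs) if s is node) + step
    return sibs[i] if 0 <= i < len(sibs) else None


def extend(a: Optional[Node], b: Optional[Node], step: int) -> None:
    """From an anchor pair, map adjacent siblings while they remain equal."""
    a, b = sibling(a, step), sibling(b, step)
    while (a is not None and b is not None
           and a.mate is None and b.mate is None and a.key == b.key):
        map_pair(a, b)
        a, b = sibling(a, step), sibling(b, step)


def map_equal_subtrees(left: Node, right: Node) -> None:
    left_count = Counter(n.key for n in walk(left))
    right_count = Counter(n.key for n in walk(right))
    right_by_key = {n.key: n for n in walk(right)}
    for a in walk(left):
        # An anchor is unmapped, unique in the left tree, and unique in the right.
        if a.mate is None and left_count[a.key] == 1 and right_count[a.key] == 1:
            b = right_by_key[a.key]
            map_pair(a, b)
            extend(a, b, +1)   # following siblings
            extend(a, b, -1)   # preceding siblings
```

Note that the sibling extension is what allows non-unique subtrees to be mapped: a repeated subtree cannot serve as an anchor itself, but it can be paired with its counterpart when both sit next to an already-mapped anchor.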

At this time, the mapped subtrees are known, whereby a general goal of the remaining steps is to create the link tree 208 (FIG. 2). The link tree 208 is a tree of link objects in which each link node points to a node in the left tree 206.sub.1, the right tree 206.sub.2, or both. Each node in the left and right trees 206.sub.1 and 206.sub.2 is associated with exactly one link node, except for descendants of mapped nodes, that is, only the root nodes of mapped subtrees are linked to each other. The order and hierarchy of the link nodes matches the order and hierarchy of the corresponding XML nodes in both the left and right trees, 206.sub.1 and 206.sub.2, respectively.

These requirements can be expressed more formally:

Given two XML nodes A and B (in either the left or right tree), the link tree 208 must contain two nodes L(A) and L(B) that point to A and B respectively. Moreover, if A is an ancestor of B then L(A) must be an ancestor of L(B), and if A comes before B then L(A) must come before L(B). This may be expressed as follows (wherein an arrow with the head pointing toward the root is used to denote ancestry, and the inequality operators denote order):

For any two nodes A and B in an XML tree, $A \leftarrow B \equiv L(A) \leftarrow L(B)$ and $A < B \equiv L(A) < L(B)$

These requirements may be met via steps 410, 412, 414 and 416 of FIG. 4, which are each further described in corresponding FIGS. 7, 9, 11 and 39A-39B, respectively. The following sections describe steps in the creation of the link tree 208.

The first part of the link tree 208 that is created is the root node, (represented in FIG. 7 by step 700), which points to the roots of the left and right trees. Note that the roots of the left and right trees are empty of content (except for the content outside the document element, which is not subject to comparison) and always compare equal.

In FIG. 7, the process iterates (via steps 702, 708, and 710) over the nodes of the left tree to find the root nodes of the mapped subtrees, essentially looking for whether each left subtree is mapped to a right subtree. For each left subtree that is mapped, step 704 branches to step 706 where a new link node is added to the link tree 208. For example, as represented in FIG. 8, if B is the root of a mapped subtree in the left tree 800 and B' is the node it is mapped to in the right tree 802, a new link node L(B, B') is added as the last child of the link root, where B and B' can be considered as the pointers to their respective nodes in the left and right trees 800, 802. Note that at the time that the subtrees have been mapped, the link tree 804 is "flat" (its height is two), and there are no one-way links, that is, every link points to nodes in both the left and right trees. Such a tree is referred to herein as a sparse tree 804, as it only contains mapped subtrees, not unmapped ones.
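The construction of the sparse link tree can be illustrated with the following Python sketch. The Node and Link classes and the function build_sparse_link_tree are hypothetical names introduced here, and the sketch assumes that mapping has already been performed, so that each mapped node carries a `mate` reference to its counterpart in the other tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical input-tree node; `mate` is set when the node is mapped."""
    name: str
    children: List["Node"] = field(default_factory=list)
    mate: Optional["Node"] = field(default=None, repr=False)


@dataclass(eq=False)
class Link:
    """Link object pointing to a left node, a right node, or both."""
    left: Optional[Node]
    right: Optional[Node]
    children: List["Link"] = field(default_factory=list)


def build_sparse_link_tree(left_root: Node, right_root: Node) -> Link:
    # The link root always points to the roots of both input trees.
    root = Link(left_root, right_root)

    def visit(node: Node) -> None:
        for child in node.children:
            if child.mate is not None:
                # Root of a mapped subtree: add a two-way link as the last
                # child of the link root, and do not descend any further.
                root.children.append(Link(child, child.mate))
            else:
                visit(child)

    visit(left_root)
    return root
```

Because only the roots of mapped subtrees are linked and each new link is appended to the link root, the result has a height of two and its children follow the traversal order of the left tree.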

At this point, because the links were added in the order that the left tree was traversed, the order of the nodes in the link tree matches the order of the corresponding linked nodes in the left tree. However, the order of the link nodes does not necessarily match the order of the corresponding nodes in the right tree. This is because there may be crossing links, such as represented in the example of FIG. 6, where the link between the A and A' nodes/subtrees crosses the link between the B and B' nodes/subtrees. To find crossing links, each pair of adjacent link nodes is compared, as shown in the flow diagram of FIG. 9 via steps 900 and 902. The link nodes are in order if the nodes they point to in the right tree are in order, that is, $L(A,A') < L(B,B') \equiv A' < B'$.

If two adjacent link nodes $L_n$ and $L_{n+1}$ are not in order at step 902, the crossing link can be removed by unlinking either $L_n$ or $L_{n+1}$. When a link node is unlinked, the nodes in the corresponding subtrees are also unmapped. However, choosing not to unlink $L_n$ may mean that not only $L_{n+1}$ needs to be unlinked, but also one or more subsequent link nodes. Conversely, choosing not to unlink $L_{n+1}$ may mean that not only $L_n$ needs to be unlinked, but also one or more previous link nodes. To determine which is the better choice, in one implementation, the values of the subtree members that would have to be unlinked in each case are summed (steps 904 and 906), with the least-cost option (lowest summed value) chosen as the solution at step 908. Note that the values may have been previously determined, or can be determined at the time of the unlink operation. Steps 910 and 912 repeat the process until no pairs remain to be checked for crossing links.

By way of example, consider the following fragments from the first and second XML files:

TABLE-US-00002
Left File:
  <p>This paragraph gets moved.</p>
  <p>This is the first paragraph of the rest of the document.</p>
  <p>This is the second paragraph of the rest of the document.</p>
Right File:
  <p>This is the first paragraph of the rest of the document.</p>
  <p>This is the second paragraph of the rest of the document.</p>
  <p>This paragraph gets moved.</p>

FIG. 10A shows how this appears. For example, when the comparison mechanism 204.sub.2 builds a tree 1002 for the left file, an XmlElem node is created for each of the three <p> elements, along with a child XmlText node for the text within each <p> element. Similarly, the right tree 1004 would contain three XmlElem nodes, each with a child XmlText node (not shown). For purposes of the present example, the XmlElem nodes in the left tree are designated A, B, and C, and the elements to which they are mapped are designated A', B', and C', respectively, while the child text nodes are not shown.

When the comparison mechanism 204.sub.2 maps equal subtrees, each paragraph in the left tree is mapped to the matching paragraph in the right tree, as indicated by the dashed lines between the nodes. Thus, each XmlElem node is the root of a mapped subtree composed of two nodes (the other node being the XmlText node, not shown).
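As a rough illustration of mapping equal subtrees for the fragments above, the following sketch uses Python's standard xml.etree.ElementTree module. It is not the comparison mechanism described here: the wrapping <doc> root element, the helper names, the restriction to top-level children, and the choice to count one node per element plus one per non-empty text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The fragments are wrapped in a hypothetical <doc> root so they parse as XML.
LEFT = ("<doc><p>This paragraph gets moved.</p>"
        "<p>This is the first paragraph of the rest of the document.</p>"
        "<p>This is the second paragraph of the rest of the document.</p></doc>")
RIGHT = ("<doc><p>This is the first paragraph of the rest of the document.</p>"
         "<p>This is the second paragraph of the rest of the document.</p>"
         "<p>This paragraph gets moved.</p></doc>")


def subtree_key(elem):
    """Comparable form of a subtree: tag, stripped text, and child keys."""
    return (elem.tag, (elem.text or "").strip(),
            tuple(subtree_key(child) for child in elem))


def subtree_size(elem):
    """Count one node per element plus one per non-empty text node."""
    own = 1 + (1 if (elem.text or "").strip() else 0)
    return own + sum(subtree_size(child) for child in elem)


def map_equal_subtrees(left_root, right_root):
    """Return (left position, right position, subtree size) for equal children."""
    positions = {}
    for pos, child in enumerate(right_root):
        positions.setdefault(subtree_key(child), []).append(pos)
    mapping = []
    for pos, child in enumerate(left_root):
        candidates = positions.get(subtree_key(child))
        if candidates:
            mapping.append((pos, candidates.pop(0), subtree_size(child)))
    return mapping


mapping = map_equal_subtrees(ET.fromstring(LEFT), ET.fromstring(RIGHT))
```

Each entry pairs a left paragraph with its equal right paragraph, and each mapped subtree has size two, corresponding to the XmlElem node and its XmlText child.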

However, note that the order of the nodes in the right tree differs from that in the left tree; i.e., A<B<C, but B'<C'<A'. As a result, after the comparison mechanism links the mapped subtrees, L(A,A'), L(B,B'), and L(C,C') are the link nodes. When the comparison mechanism compares the first two link nodes (step 902), they are found to be not in order, as A'>B' because A is the first paragraph of the left file, but A' is the last paragraph of the right file. Because A' is also greater than C', the two choices are to unlink L(A,A'), or to unlink both L(B,B') and L(C,C'). Choosing the least-cost solution via steps 904 and 906, the comparison mechanism unlinks L(A,A') and unmaps the corresponding nodes at step 908, resulting in the link being logically removed as represented in FIG. 10B.
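The cost comparison in this example can be worked through with a few lines of Python. The tuple layout and the use of node count as the value of each subtree are assumptions for illustration; the text only states that each mapped subtree is composed of two nodes and that the values of the members to be unlinked are summed.

```python
# (name, left position, right position, subtree value); value 2 is the
# XmlElem node plus its XmlText child (node count as value is assumed).
links = [("A", 0, 2, 2), ("B", 1, 0, 2), ("C", 2, 1, 2)]

first, second = links[0], links[1]
out_of_order = first[2] > second[2]          # A' > B', so the links cross

# Option 1: unlink L(A,A') alone.
cost_unlink_a = first[3]

# Option 2: keep L(A,A') and unlink every later link that crosses it.
crossing_a = [link for link in links[1:] if link[2] < first[2]]
cost_keep_a = sum(link[3] for link in crossing_a)

choice = "unlink A" if cost_unlink_a <= cost_keep_a else "unlink B and C"
```

Under these assumptions, unlinking L(A,A') costs 2 while unlinking both L(B,B') and L(C,C') costs 4, so the first option is chosen, in agreement with the result described above.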

At this time, the order of the nodes in the link tree matches the order of the corresponding nodes in the left and right trees. However, the hierarchical relationships between the nodes in the link tree need to match those between the corresponding nodes in the left and right trees. To this end, an evaluation of how linked nodes are grouped in each XML tree, by virtue of having common ancestors, is performed.

In general, a

