Title: Method and apparatus for topology database re-synchronization in communications networks having topology state routing protocols
Abstract: There is provided a method and apparatus for synchronization of topology state information between two network nodes in a communications network. The communications network has a routing protocol for intermittent advertisement of local state information throughout the network. The two network nodes include a requesting node which initiates a request for topology state synchronization and a replying node which receives the request. The replying node communicates with the requesting node to provide topology state information to the requesting node which is not possessed by the requesting node when the requesting node initiates its request. The method includes the step of selecting, prior to the request being made by the requesting node to the replying node, between a first and a second mode of synchronization. The first mode provides for topology state synchronization which entails withdrawal of the intermittent advertisement of local state information as it pertains respectively to the requesting node and to the replying node. The second mode provides for topology state synchronization which maintains the intermittent advertisement of local state information as it pertains respectively to the requesting node and to the replying node.
Patent Number: 6,876,625 Issued on 04/05/2005 to McAllister, et al.
Inventors: McAllister; Shawn P. (Manotic, CA); Rajsic; Carl (Nepean, CA)
Assignee: Alcatel Canada Inc. (Kanata, CA)
Appl. No.: 663793
Filed: September 18, 2000
Current U.S. Class: 370/221; 370/225; 370/254
Intern'l Class: H04L 012/28
Field of Search: 370/16.1,84-85,216,223,252,254,377,410,221,225; 707/203-204; 709/201,227; 713/501; 714/1-6
References Cited
U.S. Patent Documents
| 5473599 | Dec., 1995 | Li et al. |
| 5687168 | Nov., 1997 | Iwata | 370/255 |
| 5970502 | Oct., 1999 | Salkewicz et al. | 707/201 |
| 5974114 | Oct., 1999 | Blum et al. | 379/9 |
| 6230164 | May, 2001 | Rekieta et al. | 707/201 |
| 6473408 | Oct., 2002 | Rochberger et al. | 370/255 |
Other References
Private Network-Network Interface Specification Version 1.0,
af-pnni-0055.000, Mar. 1996, The ATM Forum Technical Committee, pp
47-172.*
Knight, et. al., "Virtual Router Redundancy Protocol" (The Internet
Society, Network Working Group Request for Comments 2338, Apr. 1998).
Li, et. al, "Cisco Hot Standby Router Protocol (HSRP)" (The Internet
Society, Network Working Group Request for Comments 2281, Mar. 1998).
Moy, J., "OSPF Version 2" (The Internet Society, Network Working Group
Request for Comments 2328, Apr. 1998).
"Private Network-Network Interface Specification Version 1.0 (PNNI 1.0)",
af-pnni-0055.000 (The ATM Forum, Technical Committee, Mar. 1996).
Primary Examiner: Chin; Wellington
Assistant Examiner: Shew; John
Attorney, Agent or Firm: McCarthy Tetrault LLP
Claims
What is claimed is:
1. A method for recovery from a failure which affects an active routing
entity in a communications network, the active routing entity being
associated with a network node of the communications network, the
communications network comprising a routing protocol for intermittent
advertisement of local state information throughout the network and
further comprising an inactive routing entity to which network connections
of the network node can be diverted from the active routing entity upon
the said failure, the method comprising the steps of:
(a) upon the failure, executing an activity switch between the active
routing entity and the inactive routing entity, wherein network
connections of the network node are diverted from the active routing
entity to the inactive routing entity to thereby transform the inactive
routing entity into a newly active routing entity;
(b) following the activity switch, exchanging topology state information
between the newly active routing entity and each immediately adjacent
neighbour node of the network node associated with said failure such that
the newly active routing entity and every said immediately adjacent
neighbour node respectively possess synchronized topology state
information; and
wherein the exchange of topology state information between the newly active
routing entity and each said immediately adjacent neighbour node is
performed without withdrawal, by the network node associated with said
failure and by each said immediately adjacent neighbour node, of the said
intermittent advertisement of local state information as it pertains
respectively to the network node associated with said failure and to each
said immediately adjacent neighbour node.
2. The method according to claim 1, further comprising the step of
transmitting topology state information to the inactive routing entity
prior to the failure, whereby the active routing entity and the inactive
routing entity both share a common understanding of overall network
topology status immediately following said transmission of topology state
information.
3. The method according to claim 2, wherein the transmission of topology
state information is periodic.
4. The method according to claim 3, wherein the periodic transmission of
topology state information to the inactive routing entity is from the
active routing entity.
5. The method according to claim 2, further comprising the step of
transmitting local state information to the inactive routing entity prior
to the failure, whereby the active routing entity and the inactive routing
entity both share a common understanding of local status immediately
following said transmission of local state information.
6. The method according to claim 5, wherein the transmission of local state
information is periodic.
7. The method according to claim 6, wherein the periodic transmission of
local state information to the inactive routing entity is from the active
routing entity.
8. The method according to claim 5, wherein the local state information
comprises local link status information and local nodal status
information.
9. The method according to claim 8, wherein the local link status
information is selected from the group consisting of link characteristics,
link operational status and port identifiers.
10. The method according to claim 8, wherein the local nodal status
information is selected from the group consisting of node identifier, peer
group identifier, distinguished node election status, distinguished node
leadership state and local reachable addresses.
11. The method according to claim 7, wherein the active routing entity and
the inactive routing entity each forms part of the network node associated
with said failure.
12. The method according to claim 11, wherein the active routing entity and
the inactive routing entity are each implemented by way of distinct
physical components.
13. The method according to claim 5, wherein the communications network is
an Asynchronous Transfer Mode (ATM) network and the routing protocol for
intermittent advertisement of local state information throughout the
communications network is the PNNI protocol.
14. The method according to claim 5, wherein the communications network is
an Internet Protocol (IP) network and the routing protocol for
intermittent advertisement of local state information throughout the
communications network is the Open Shortest Path First (OSPF) protocol.
15. The method according to claim 13, wherein topology state information
transmitted from the active routing entity prior to the failure is
extracted from a topology database associated with the active routing
entity, and wherein topology state information which is exchanged
following the activity switch between the newly active routing entity and
every said immediately adjacent neighbour node is extracted from a
topology database respectively associated with the newly active routing
entity and every said immediately adjacent neighbour node.
16. The method according to claim 15, wherein topology state information is
transmitted from the active routing entity to the inactive routing entity
prior to the failure by bundling the topology state information into PNNI
Topology State Elements (PTSE).
17. The method according to claim 15, wherein topology state information is
exchanged following the activity switch between the newly active routing
entity and every said immediately adjacent neighbour node by bundling the
topology state information into PNNI Topology State Elements (PTSE).
18. The method according to claim 16, wherein each PTSE is encapsulated
within PNNI Topology State Packets (PTSP) for said transmission.
19. The method according to claim 17, wherein each PTSE is encapsulated
within PNNI Topology State Packets (PTSP) for said exchange.
20. The method according to claim 15, wherein the newly active routing
entity, prior to the said exchange of topology state information with each
said immediately adjacent neighbour node, notifies each said immediately
adjacent neighbour node that the said exchange is to take place without
said withdrawal of the said intermittent advertisement of local state
information.
21. The method according to claim 20, wherein the notifying of the said
exchange of topology state information without said withdrawal of the said
intermittent advertisement of local state information takes place by way of a
flag in a notification message sent to each immediately adjacent neighbour
node of the network node associated with said failure.
22. The method according to claim 21, wherein the notification message is a
PNNI Database Summary packet in which said flag is provisioned.
23. A network element for recovery from a failure in a communications
network which includes a routing protocol for intermittent advertisement
of local state information throughout the network, the network element
comprising:
an active routing entity, wherein the active routing entity is associated
with topology state information concerning the communications network;
an inactive routing entity, wherein an activity switch is executed between
the active routing entity and the inactive routing entity upon failure of
the active routing entity to thereby divert network connections from the
active routing entity to the inactive routing entity and transform the
inactive routing entity into a newly active routing entity;
a database synchronization processor, wherein the database synchronization
processor effects an exchange of topology state information between the
newly active routing entity and each immediately adjacent neighbour node
of the network element following the activity switch such that the newly
active routing entity and every said immediately adjacent neighbour node
respectively possess synchronized topology state information, and wherein
the said exchange of topology state information between the newly active
routing entity and each said immediately adjacent neighbour node is
performed without withdrawal, by the network node associated with said
failure and by each said immediately adjacent neighbour node, of the said
intermittent advertisement of local state information as it pertains
respectively to the network element and to each immediately adjacent
neighbour node.
24. The network element according to claim 23, wherein the topology state
information is transmitted to the inactive routing entity prior to the
failure of the active routing entity, such that the active routing entity
and the inactive routing entity both share a common understanding of
overall network topology status following said transmission of topology
state information.
25. The network element according to claim 24, wherein the transmission of
topology state information is periodic.
26. The network element according to claim 25, wherein the periodic
transmission of topology state information to the inactive routing entity
is from the active routing entity.
27. The network element according to claim 24, wherein local state
information is transmitted to the inactive routing entity prior to the
failure of the active routing entity, whereby the active routing entity
and the inactive routing entity both share a common understanding of local
status immediately following said transmission of local state information.
28. The network element according to claim 27, wherein the transmission of
local state information is periodic.
29. The network element according to claim 28, wherein the periodic
transmission of local state information to the inactive routing entity is
from the active routing entity.
30. The network element according to claim 24, wherein the local state
information comprises link status information and local nodal status
information.
31. The network element according to claim 30, wherein the local link status
information is selected from the group consisting of link characteristics,
link operational status and port identifiers.
32. The network element according to claim 31, wherein the local nodal
status information is selected from the group consisting of node
identifier, peer group identifier, distinguished node election status,
distinguished node leadership status and local reachable addresses.
33. The network element according to claim 32, wherein the active routing
entity and the inactive routing entity are each implemented by way of
distinct physical components.
34. The network element according to claim 26, wherein the communications
network is an Asynchronous Transfer Mode (ATM) network and the routing
protocol for intermittent advertisement of local state information
throughout the communications network is the PNNI protocol.
35. The network element according to claim 26, wherein the communications
network is an Internet Protocol (IP) network and the routing protocol for
intermittent advertisement of link status information throughout the
communications network is the Open Shortest Path First (OSPF) protocol.
36. The network element according to claim 34, wherein topology state
information transmitted from the active routing entity prior to the
failure of the active routing entity is extracted from a topology database
associated with the active routing entity, and wherein topology state
information which is exchanged following the activity switch between the
newly active routing entity and every said immediately adjacent neighbour
node is extracted from a topology database respectively associated with
the newly active routing entity and every said immediately adjacent
neighbour node.
37. The network element according to claim 23, wherein topology state
information is transmitted from the active routing entity to the inactive
routing entity prior to the failure by bundling the topology state
information into PNNI Topology State Elements (PTSE).
38. The network element according to claim 23, wherein topology state
information is exchanged following the activity switch between the newly
active routing entity and every said immediately adjacent neighbour by
bundling the topology state information into PNNI Topology State Elements
(PTSE).
39. The network element according to claim 37, wherein each PTSE is
encapsulated within PNNI Topology State Packets (PTSP) for said
transmission.
40. The network element according to claim 38, wherein each PTSE is
encapsulated within PNNI Topology State Packets (PTSP) for said exchange.
41. The network element according to claim 23, wherein the newly active
routing entity, prior to the said exchange of topology state information
with each said immediately adjacent neighbour node, notifies each said
immediately adjacent neighbour node that the said exchange is to take
place without said withdrawal of the said intermittent advertisement of
local state information.
42. The network element according to claim 41, wherein the notifying of the
said exchange of topology state information without said withdrawal of the
said intermittent advertisement of local state information takes place by
way of a flag in a notification message sent to each immediately adjacent
neighbour node of the network element.
43. The network element according to claim 42, wherein the notification
message is a PNNI Database Summary packet in which said flag is
provisioned.
44. A method for synchronization of topology state information between two
network nodes in a communications network, the communications network
comprising a routing protocol for intermittent advertisement of local
state information throughout the network, the two network nodes comprising
a requesting node which initiates a request for topology state
synchronization and a replying node which receives said request and which
communicates with the requesting node to provide topology state
information to the requesting node which is not possessed by the
requesting node when same initiates said request, the method comprising
the step of selecting prior to said request being made by the requesting
node to the replying node, between a first and a second mode of
synchronization, the first said mode providing for topology state
synchronization which entails withdrawal, by the said requesting node and
by the said replying node, of the said intermittent advertisement of local
state information as it pertains respectively to the requesting node and
to the replying node, and the second said mode providing for topology
state synchronization which maintains the said intermittent advertisement
of local state information as it pertains respectively to the requesting
node and to the replying node.
45. The method according to claim 44, further comprising the step of
exchanging topology state information between the requesting node and the
replying node once the request for topology state synchronization has been
made by the requesting node to the replying node.
46. The method according to claim 45, wherein prior to the said exchange of
topology state information, the requesting node notifies the replying node
that the said exchange is to take place according to one of the said first
mode and the said second mode of synchronization.
47. The method according to claim 46, wherein said notification takes place
by way of a flag in a notification message sent by the requesting node to
the replying node.
48. The method according to claim 47, wherein the topology state
information which is exchanged between the requesting node and the
replying node is extracted from topology state databases each respectively
associated with the said requesting and replying nodes.
49. The method according to claim 47, wherein the communications network is
an Asynchronous Transfer Mode (ATM) network and the routing protocol for
intermittent advertisement of local state information throughout the
network is the PNNI protocol.
50. The method according to claim 47, wherein the communications network is
an Internet Protocol (IP) network and the routing protocol for
intermittent advertisement of local state information throughout the
network is the Open Shortest Path First (OSPF) protocol.
51. The method according to claim 49, wherein the notification message is a
PNNI Database Summary packet in which said flag is provisioned.
52. The method according to claim 51, wherein the topology state
information is exchanged by bundling the topology state information into
PNNI Topology State Elements (PTSE).
53. The method according to claim 52, wherein each PTSE is encapsulated
within PNNI Topology State Packets (PTSP).
54. A network element for synchronization of topology state information
between two network nodes in a communications network, the communications
network comprising a routing protocol for intermittent advertisement of
local state information throughout the network, the two network nodes
comprising a requesting node which initiates a request for topology state
synchronization and a replying node which receives said request and which
communicates with the requesting node to provide topology state
information to the requesting node which is not possessed by the requesting
node when same initiates said request, the network element selectively
operating in one of two modes of synchronization, wherein a first mode
thereof effects topology state synchronization between the requesting node
and the replying node which entails withdrawal, by the said requesting
node and by the said replying node, of the said intermittent advertisement
of local state information as it pertains respectively to the requesting
node and to the replying node, and wherein a second mode thereof effects
topology state synchronization between the requesting node and the
replying node which maintains the said intermittent advertisement of local
state information as it pertains respectively to the requesting node and
to the replying node.
55. The network element according to claim 54, wherein topology state
information is exchanged between the requesting node and the replying node
once the request for topology state synchronization has been made by the
requesting node to the replying node.
56. The network element according to claim 55, wherein prior to the said
exchange of topology state information, the requesting node notifies the
replying node that the said exchange is to take place according to one of
the said first mode and the said second mode of synchronization.
57. The network element according to claim 56, wherein said notification
takes place by way of a flag in a notification message sent by the
requesting node to the replying node.
58. The network element according to claim 56, wherein the topology state
information which is exchanged between the requesting node and the
replying node is extracted from topology state databases each respectively
associated with the said requesting and replying nodes.
59. The network element according to claim 57, wherein the communications
network is an Asynchronous Transfer Mode (ATM) network and the routing
protocol for intermittent advertisement of local state information
throughout the network is the PNNI protocol.
60. The network element according to claim 57, wherein the communications
network is an Internet Protocol (IP) network and the routing protocol for
intermittent advertisement of local state information throughout the
network is the Open Shortest Path First (OSPF) protocol.
61. The network element according to claim 59, wherein the notification
message is a PNNI Database Summary packet in which said flag is
provisioned.
62. The network element according to claim 61, wherein the topology state
information is exchanged by bundling the topology state information into
PNNI Topology State Elements (PTSE).
63. The network element according to claim 62, wherein each PTSE is
encapsulated within PNNI Topology State Packets (PTSP).
64. A method for synchronization of topology state information between a
first network node and a second network node in a communications network,
the communications network comprising a routing protocol for exchange of
local state information throughout the network, the first network node
initiating a request for topology state synchronization and the second
network node receiving said request and communicating with the first
network node to provide topology state information to the first network
node, the topology state synchronization taking place according to a first
mode thereof wherein the said exchange of local state information, as it
pertains respectively to the first network node and to the second network
node, is not withdrawn.
65. The method according to claim 64, wherein a second mode of topology
state synchronization is provided, the second mode of topology state
synchronization entailing withdrawal, by the first network node and by the
second network node, of the said exchange of local state information as it
pertains respectively to the first network node and to the second network
node.
66. The method according to claim 65, further comprising the step of
selecting, prior to said request being initiated by the first network node
to the second network node, between the first mode of topology state
synchronization and the second mode of topology state synchronization.
67. The method according to claim 66, wherein the second network node
communicates with the first network node to provide topology state
information to the first network node which is not possessed by the first
network node when same initiates said request.
68. The method according to claim 67, wherein prior to the said exchange of
topology state information, the first network node notifies the second
network node that the said exchange of topology state information is to
take place according to one of the said first mode and the said second
mode of synchronization.
69. The method according to claim 68, wherein said notification takes place
by way of a flag in a notification message sent by the first network node
to the second network node.
70. The method according to claim 69, wherein the topology state
information which is exchanged between the first network node and the
second network node is extracted from topology state databases each
respectively associated with the said first and second network nodes.
71. The method according to claim 70, wherein the communications network is
an Asynchronous Transfer Mode (ATM) network and the routing protocol for
exchange of local state information throughout the network is the PNNI
protocol.
72. The method according to claim 70, wherein the communications network is
an Internet Protocol (IP) network and the routing protocol for exchange of
local state information throughout the network is the Open Shortest Path
First (OSPF) protocol.
73. The method according to claim 71, wherein the notification message is a
PNNI Database Summary packet in which said flag is provisioned.
74. The method according to claim 73, wherein the topology state
information is exchanged by bundling the topology state information into
PNNI Topology State Elements (PTSE).
75. The method according to claim 74, wherein each PTSE is encapsulated
within PNNI Topology State Packets (PTSP).
76. A network element for synchronization of topology state information
between a first network node and a second network node in a communications
network, the communications network comprising a routing protocol for
exchange of local state information throughout the network, the first
network node initiating a request for topology state synchronization and
the second network node receiving said request and communicating with the
first network node to provide topology state information to the first
network node, the topology state synchronization taking place according to
a first mode thereof wherein the said exchange of local state information,
as it pertains respectively to the first network node and to the second
network node, is not withdrawn.
77. The network element according to claim 76, wherein a second mode of
topology state synchronization is provided, the second mode of topology
state synchronization entailing withdrawal, by the first network node and
by the second network node, of the said exchange of local state
information as it pertains respectively to the first network node and to
the second network node, the network element selectively operating in one
of the two said modes of topology state synchronization.
78. The network element according to claim 77, wherein prior to said
request being initiated by the first network node to the second network
node, the network element selects between the first mode of topology state
synchronization and the second mode of topology state synchronization.
79. The network element according to claim 78, wherein prior to said
request being initiated by the first network node to the second network
node, the first network node notifies the second network node that the
said exchange of topology state information is to take place according to
one of the first mode of topology state synchronization and the second
mode of topology state synchronization.
80. The network element according to claim 79, wherein the second network
node communicates with the first network node to provide topology state
information to the first network node which is not possessed by the first
network node when same initiates said request.
81. The method according to claim 80, wherein said notification takes place
by way of a flag in a notification message sent by the first network node
to the second network node.
82. The method according to claim 81, wherein the topology state
information which is exchanged between the first network node and the
second network node is extracted from topology state databases each
respectively associated with the said first and second network nodes.
83. The method according to claim 82, wherein the communications network is
an Asynchronous Transfer Mode (ATM) network and the routing protocol for
exchange of local state information throughout the network is the PNNI
protocol.
84. The method according to claim 82, wherein the communications network is
an Internet Protocol (IP) network and the routing protocol for exchange of
local state information throughout the network is the Open Shortest Path
First (OSPF) protocol.
85. The method according to claim 83, wherein the notification message is a
PNNI Database Summary packet in which said flag is provisioned.
86. The method according to claim 85, wherein the topology state
information is exchanged by bundling the topology state information into
PNNI Topology State Elements (PTSE).
87. The method according to claim 86, wherein each PTSE is encapsulated
within PNNI Topology State Packets (PTSP).
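Claims 44 to 47 describe a requesting node that selects one of two synchronization modes before making its request, and that signals the chosen mode to the replying node by way of a flag in a notification message. The following is a minimal sketch of that selection and signalling, not the patented implementation. The names SyncMode, RESYNC_FLAG, DatabaseSummary, select_mode, build_request and handle_request are all hypothetical, the flag's bit position is invented for illustration, the message is a stand-in rather than the real PNNI Database Summary packet layout, and the selection policy shown (maintain advertisements when recovering from an activity switch) is an assumption suggested by claim 1.

```python
from dataclasses import dataclass
from enum import Enum


class SyncMode(Enum):
    WITHDRAW = 1  # first mode: advertisements are withdrawn during sync
    MAINTAIN = 2  # second mode: advertisements are maintained during sync


# Hypothetical flag bit. The claims only say "a flag", not where it lives.
RESYNC_FLAG = 0x01


@dataclass
class DatabaseSummary:
    """Stand-in for the notification message, not the real packet layout."""
    sender: str
    flags: int = 0


def select_mode(recovering_from_activity_switch: bool) -> SyncMode:
    # Assumed policy: keep advertisements when a redundant routing entity
    # has just taken over, otherwise fall back to the withdrawing mode.
    if recovering_from_activity_switch:
        return SyncMode.MAINTAIN
    return SyncMode.WITHDRAW


def build_request(sender: str, mode: SyncMode) -> DatabaseSummary:
    # The mode is chosen before the request is built, then carried as a flag.
    flags = RESYNC_FLAG if mode is SyncMode.MAINTAIN else 0
    return DatabaseSummary(sender=sender, flags=flags)


def handle_request(msg: DatabaseSummary, advertised: set) -> SyncMode:
    """Replying node: read the flag and withdraw or keep the advertisement."""
    mode = SyncMode.MAINTAIN if msg.flags & RESYNC_FLAG else SyncMode.WITHDRAW
    if mode is SyncMode.WITHDRAW:
        advertised.discard(msg.sender)
    return mode


# Usage: node "A" resynchronizes after an activity switch.
advertised_neighbours = {"A", "B"}
request = build_request("A", select_mode(recovering_from_activity_switch=True))
chosen = handle_request(request, advertised_neighbours)
```

In this sketch the replying node never has to guess the mode: an unset flag means the withdrawing behaviour, so a request built without the flag is handled in the first mode.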
Description
FIELD OF THE INVENTION
The present invention relates generally to the field of network topology
database re-synchronization in communications networks having topology
state routing protocols and more particularly, to a method and apparatus
for effecting network topology database re-synchronization in such
networks. For example, the present invention is well-suited to database
re-synchronization in the context of redundancy recovery following a nodal
failure or a reset affecting an active routing entity associated with a
network node.
BACKGROUND OF THE INVENTION
Topology state routing protocols are employed in communications networks in
order to disseminate or advertise topology state information among nodes
and node clusters within such networks. The advertised topology state
information is in turn utilized to compute optimized paths for
communications throughout a given network. As used in the present
application, reference to topology state information signifies state
information for the network domain as a whole. In certain network
protocols, topology state information includes both link state information
and nodal state information. For instance, link state information will
include such attributes as link characteristics, link operational status,
port identifiers and remote neighbour information concerning adjacent
neighbour nodes. Nodal state information will include such attributes as
node identifiers, peer group identifiers, distinguished node election
status, distinguished node leadership status and local reachable address
information.
Whereas topology state information will refer to state information for a
network domain as a whole, the present application will make reference to
local state information when dealing with state information which is
locally originated by a particular network node. Local link status
information will reflect a given node's understanding of the status of
communication with its peer nodes. Thus, local link status information,
similarly to topology link status information, will also include such
attributes as link characteristics, link operational status, port
identifiers and remote neighbour information concerning adjacent neighbour
nodes, but these will pertain to a given network node as opposed to a
variety of nodes forming part of a network domain. Likewise, local nodal
state information will comprise such attributes as node identifiers, peer
group identifiers, distinguished node election status, distinguished node
leadership status and local reachable address information. Again, these
will pertain to a given node when reference is made to local nodal state
information, instead of pertaining to the network domain as a whole when
reference is made to topology nodal state information. In the present
application, reference to state information will signify both topology
state information and local state information.
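By way of illustration only, the distinction drawn above between local
state information and topology state information can be sketched as a
small data model. The class and field names below are assumptions chosen
for readability and do not appear in the PNNI Specification; the neighbour
identifier "A.1.1" and the port number are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LinkState:
    """Link state attributes named in the text (field names are illustrative)."""
    port_id: int
    operational: bool                 # link operational status
    remote_node_id: str               # remote neighbour information
    characteristics: Dict[str, float] = field(default_factory=dict)


@dataclass
class NodalState:
    """Nodal state attributes named in the text (field names are illustrative)."""
    node_id: str
    peer_group_id: str
    is_leader: bool = False           # distinguished node leadership status
    reachable_addresses: List[str] = field(default_factory=list)


@dataclass
class LocalState:
    """State information locally originated by one particular node."""
    nodal: NodalState
    links: List[LinkState] = field(default_factory=list)


@dataclass
class TopologyState:
    """State information for the domain as a whole, keyed by originating node."""
    by_node: Dict[str, LocalState] = field(default_factory=dict)

    def install(self, local: LocalState) -> None:
        # Record (or overwrite) the local state advertised by one node.
        self.by_node[local.nodal.node_id] = local


# Hypothetical example: one node's local state becomes part of the topology view.
local = LocalState(
    nodal=NodalState("A.1.2", "PG(A.1)", is_leader=True),
    links=[LinkState(port_id=1, operational=True, remote_node_id="A.1.1")],
)
topology = TopologyState()
topology.install(local)
```

In this sketch the topology state is simply the collection of local state
records received from every node of the domain, which mirrors the
relationship between the two terms as they are used in the present
application.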
In some known topology state protocols, certain nodes in a communications
network may take on distinguished or additional responsibilities in order
to make the routing function for the network operate properly. For
instance, in the Open Shortest Path First (OSPF) IP routing protocol as
described in J. Moy: "OSPF Version 2", STD 54, RFC 2328, dated April 1998,
a node identified as the Designated Router (DR) would assume such
responsibilities. Similarly, in the Private Network-Node Interface or
Private Network-to-Network Interface (PNNI) protocol, responsibilities of
this nature are assumed by a node termed the Peer Group Leader (PGL). The
PNNI protocol is specified in the documents entitled: (i) "Private
Network-Network Interface Specification Version 1.0", ATM Forum document no.
af-pnni-0055.000 dated March 1996, (ii) "Private Network--Network
Interface Specification Version 1.0 Addendum (Soft PVC MIB)", ATM Forum
document no. af-pnni-0066.000 dated September 1996 and (iii) "Addendum to
PNNI V 1.0 for ABR parameter negotiation", ATM Forum document no.
af-pnni-0075.000 dated January 1997, together with amendments found in
(iv) "PNNI V1.0 Errata and PICS", ATM Forum document no. af-pnni-0081.000
dated May 1997 (hereafter all of the foregoing documents (i) through (iv),
inclusively, are collectively referred to as the "PNNI Specification").
The PNNI Specification is hereby incorporated by reference.
A given physical node within a network space may acquire distinguished
network responsibilities of the type mentioned above by a process known as
distributed election. In a scheme of distributed election, all nodes at a
particular level of a network hierarchy will communicate to select the
node which is to assume additional tasks or responsibilities in relation
to the topology state protocol. Those skilled in this art will understand
that performing the process of distributed election will take varying
amounts of time depending on the particular network environment. As well,
if due to downtime the distinguished position is not being filled by a
given network node, the routing functions of a portion of the network or
of the network domain as a whole may exhibit reduced capabilities or
inefficiency during the downtime interval. Thus, it can be expected that
in communications networks which utilize topology state protocols, a
recovery interval must be tolerated by the network routing system
subsequent to the failure of a network node. For instance, this may occur
to varying degrees of severity whenever the failed node impacts the
functions of an elected network node having the additional
responsibilities referred to earlier.
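The cost of re-running a distributed election after a failure can be
pictured with a minimal sketch. The selection rule used here (highest
priority wins, ties broken by the highest node identifier) and the
priority values are assumptions made for illustration; they are not taken
from the passage above, and the node identifiers are hypothetical.

```python
from typing import Iterable, NamedTuple, Optional


class Candidate(NamedTuple):
    node_id: str
    priority: int      # hypothetical leadership priority; 0 means "not eligible"
    reachable: bool    # False models a node that has failed


def elect_leader(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Pick the distinguished node among the reachable, eligible candidates."""
    eligible = [c for c in candidates if c.reachable and c.priority > 0]
    if not eligible:
        return None
    # Assumed rule: highest priority, then highest node identifier.
    return max(eligible, key=lambda c: (c.priority, c.node_id))


nodes = [
    Candidate("A.1.1", 50, True),
    Candidate("A.1.2", 100, True),
    Candidate("A.1.3", 50, True),
]
leader = elect_leader(nodes)
assert leader is not None

# The elected node fails; every remaining node must take part in a new election.
survivors = [
    c._replace(reachable=False) if c.node_id == leader.node_id else c
    for c in nodes
]
new_leader = elect_leader(survivors)
```

Until the second call completes across all participating nodes, the
distinguished position is vacant, which corresponds to the recovery
interval described above.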
Certain routing protocols specify a given level of node redundancy. This
redundancy is intended to reduce the recovery time of the network routing
system in the event of a failure that affects a node which performs
distinguished protocol functions of the kind mentioned previously. For
example, in the OSPF protocol, the use of a Backup Designated Router (BDR)
is specified. The Backup Designated Router is mandated to detect a failure
affecting the currently appointed Designated Router. Upon detecting such a
failure, the Backup Designated Router will be called upon to take recovery
action to declare itself the new Designated Router in the place of the
failed former Designated Router. All other routers on the affected portion
of the shared network will thereafter be notified of the existence of the
new Designated Router node. Thus, although it is not necessary to
re-execute a dynamic election process under the OSPF protocol following a
failure which impacts a Designated Router node, a network routing outage
of some duration will nevertheless be experienced by all routers and hosts
on the shared network that were originally served by the failed Designated
Router node. This is because the affected routers and hosts participate in
recovering the functions of the network routing system following a failure
which impacts their associated Designated Router node.
On the other hand, in the PNNI protocol, no provision is currently made for
distinguished node redundancy. As such, the distributed election process
and its associated protocol actions must be re-executed upon any failure
affecting a distinguished network node. In the PNNI protocol, a physical
node which performs the Peer Group Leader function at one level of the
topology hierarchy may be performing this function at several other levels
of the hierarchy. Thus, a failure affecting such a physical node may very
well impact a large part of the aggregated network. Furthermore, there is
no provision in the current PNNI protocol for a backup Peer Group Leader.
Thus, a failure which affects a multilevel Peer Group Leader of the kind
described above must be detected by all logical nodes which form part of
the various Peer Groups that are represented by the multilevel Peer Group
Leader. These logical nodes at different levels of the network hierarchy
must thereafter elect a new Peer Group Leader. As with the example given
previously in relation to the OSPF protocol, the failure of the Peer Group
Leader may be known to many nodes and hence such nodes must generally all
participate in recovering the affected functions of the routing system.
Given this, the failure of a Peer Group Leader in a PNNI network may
conceivably impact a large portion of the network and may in many
circumstances cause disruption of the routing behaviour of the network for
a period of time which may be unacceptable to service providers or end
users.
The discussion above has addressed the impact of a failure affecting a
network node which has distinguished responsibilities. However, it will be
appreciated by those versed in this art that a failure concerning an
ordinary physical or logical node which does not possess distinguished
responsibilities will also result in some measure of disruption to the
routing capabilities of the neighbouring nodes or devices that are
serviced by the failed ordinary node. Although in some node architectures
it may be possible to retain certain network functions such as packet
forwarding or call processing in the event of a routing function failure,
topology state protocols such as OSPF and PNNI require each network node
of a domain to synchronize a topology database with its neighbours before
being admitted to the routing system. Such topology database
synchronization must take place in these network protocols in order to
recover from the failure of a node. The synchronization process may
consume seconds or minutes in the overall scheme of recovery, depending on
the circumstances. During the synchronization, network devices serviced by
the failed node will be impacted and hence routing functionality may very
well suffer disruption. While the discussion above has focussed on the
challenges surrounding recovery from a nodal failure, those skilled in
this art will understand that analogous problems arise stemming from other
events which would require a node to undertake a synchronization of its
topology database, for instance a reset of the routing processor
associated with a network node.
Certain mechanisms have been developed in the prior art to ensure a
switchover between distinct routers in a manner that is transparent to
hosts which use a failed router. The Hot Standby Router Protocol described
in T. Li, B. Cole, P. Morton and D. Li: "Cisco Hot Standby Router Protocol
(HSRP)", RFC 2281, dated March 1998, and the IP Standby Protocol according
to P. Higginson and M. Shand: "Development of Router Clusters to Provide
Fast Failover in IP Networks", 9 Digital Technical Journal, No. 3, dated
Winter 1997, are two examples of such transparent router switchover
schemes. However, as will be explained in greater detail below, switchover
mechanisms of this type do not generally ensure that the switchover will
be universally transparent to the routers or nodes in the network beyond
the particular hosts or nodes immediately adjacent the failed node. In the
prior art, the failure of a node is typically recovered by means of a
distinct and different node. It would therefore be advantageous to provide
a mechanism that would allow the failure of a routing component of a node
to be recovered by another routing component of the same node in a manner
transparent to all nodes but its immediate neighbours.
Accordingly, prior art topology state routing protocols present problems
and challenges when faced with a situation of recovery from a nodal
failure or with other situations which may require a node to synchronize
its topology database once it has previously done so, and these problems
and challenges arise whether or not the node immediately affected by the
failure has distinguished responsibilities. First, known recovery
mechanisms typically disrupt the routing functions of at least a part of a
network and cause a service impact to certain of the devices utilizing the
network. The portion of the network affected will vary in the
circumstances. For instance, the impacted portion of the network can be
expected to be more extensive for a node performing distinguished
functions than is the case for a node that does not perform such
functions. As well, the impacted portion can be expected to be more
expansive for a failure concerning a PNNI Peer Group Leader than for one
which influences an OSPF Designated Router. Second, the time required to
recover from a node or link failure will vary, but may be in the order of
several minutes or longer. As mentioned above, this time frame may
be unacceptable to certain service providers or end users. Third, since
many nodes will have to be made aware of the failure and are therefore
required to participate in the recovery process, network resources in the
nature of bandwidth and processing time will be diverted. This will
detract from other network activities in general and may decrease the
performance and stability of the network routing system in particular.
It is therefore generally an object of the present invention to seek to
provide a method and apparatus for database re-synchronization in a
network having a topology state routing protocol, particularly well-suited
to the context of redundancy recovery following a nodal failure associated
with the routing entity of a network node, and pursuant to which some of
the problems exhibited by alternative prior art techniques and devices may
in some instances be alleviated or overcome.
SUMMARY OF THE INVENTION
According to a first broad aspect of the present invention, there is
provided a method for recovery from a failure which affects an active
routing entity in a communications network, the active routing entity
being associated with a network node of the communications network, the
communications network comprising a routing protocol for intermittent
advertisement of local state information throughout the network and
further comprising an inactive routing entity to which network connections
of the network node can be diverted from the active routing entity upon
the failure, the method comprising the steps of: (a) upon the failure,
executing an activity switch between the active routing entity and the
inactive routing entity, wherein network connections of the network node
are diverted from the active routing entity to the inactive routing entity
to thereby transform the inactive routing entity into a newly active
routing entity; (b) following the activity switch, exchanging topology
state information between the newly active routing entity and each
immediately adjacent neighbour node of the network node associated with
said failure such that the newly active routing entity and every said
immediately adjacent neighbour node respectively possess synchronized
topology state information; and wherein the exchange of topology state
information between the newly active routing entity and each said
immediately adjacent neighbour node is performed without withdrawal, by
the network node associated with said failure and by each said immediately
adjacent neighbour node, of the said intermittent advertisement of local
state information as it pertains respectively to the network node
associated with said failure and to each said immediately adjacent
neighbour node.
According to a second broad aspect of the present invention, there is
provided a network element for recovery from a failure in a communications
network which includes a routing protocol for intermittent advertisement
of local state information throughout the network, the network element
comprising: an active routing entity, wherein the active routing entity is
associated with topology state information concerning the communications
network; an inactive routing entity, wherein an activity switch is
executed between the active routing entity and the inactive routing entity
upon failure of the active routing entity to thereby divert network
connections from the active routing entity to the inactive routing entity
and transform the inactive routing entity into a newly active routing
entity; a database synchronization processor, wherein the database
synchronization processor effects an exchange of topology state
information between the newly active routing entity and each immediately
adjacent neighbour node of the network element following the activity
switch such that the newly active routing entity and every said
immediately adjacent neighbour node respectively possess synchronized
topology state information, and wherein the said exchange of topology
state information between the newly active routing entity and each said
immediately adjacent neighbour node is performed without withdrawal, by
the network node associated with said failure and by each said immediately
adjacent neighbour node, of the said intermittent advertisement of local
state information as it pertains respectively to the network element and
to each immediately adjacent neighbour node.
According to a third broad aspect of the present invention, there is
provided a method for synchronization of topology state information
between two network nodes in a communications network, the communications
network comprising a routing protocol for intermittent advertisement of
local state information throughout the network, the two network nodes
comprising a requesting node which initiates a request for topology state
synchronization and a replying node which receives said request and which
communicates with the requesting node to provide topology state
information to the requesting node which is not possessed by the
requesting node when same initiates said request, the method comprising
the step of selecting, prior to said request being made by the requesting
node to the replying node, between a first and a second mode of
synchronization, the first said mode providing for topology state
synchronization which entails withdrawal, by the said requesting node and
by the said replying node, of the said intermittent advertisement of local
state information as it pertains respectively to the requesting node and
to the replying node, and the second said mode providing for topology
state synchronization which maintains the said intermittent advertisement
of local state information as it pertains respectively to the requesting
node and to the replying node.
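The selection step of the third broad aspect may be pictured with the
following minimal sketch. The names SyncMode, select_mode and
advertised_during_sync are hypothetical, and the selection criterion (use
the second mode only for a re-synchronization that both nodes support) is
an assumption made for illustration rather than a statement of the claimed
method.

```python
from enum import Enum


class SyncMode(Enum):
    WITHDRAW = "first mode"    # advertisement of local state is withdrawn
    MAINTAIN = "second mode"   # advertisement of local state is maintained


def select_mode(is_resync: bool, both_support_maintain: bool) -> SyncMode:
    """Choose the mode before the requesting node issues its request.

    Assumed criterion: the second mode is chosen only when the node has
    synchronized before and both nodes are able to operate in that mode.
    """
    if is_resync and both_support_maintain:
        return SyncMode.MAINTAIN
    return SyncMode.WITHDRAW


def advertised_during_sync(mode: SyncMode) -> bool:
    """Whether local state for the two nodes stays advertised while they synchronize."""
    return mode is SyncMode.MAINTAIN


initial = select_mode(is_resync=False, both_support_maintain=True)
recovery = select_mode(is_resync=True, both_support_maintain=True)
```

The point of the sketch is only that the choice is made up front, so that
the requesting node and the replying node treat the advertisement of their
local state information consistently for the duration of the exchange.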
According to a fourth broad aspect of the present invention, there is
provided a network element for synchronization of topology state
information between two network nodes in a communications network, the
communications network comprising a routing protocol for intermittent
advertisement of local state information throughout the network, the two
network nodes comprising a requesting node which initiates a request for
topology state synchronization and a replying node which receives said
request and which communicates with the requesting node to provide
topology state information to the requesting node which is not possessed
by the requesting node when same initiates said request, the network
element selectively operating in one of two modes of synchronization,
wherein a first mode thereof effects topology state synchronization
between the requesting node and the replying node which entails
withdrawal, by the said requesting node and by the said replying node, of
the said intermittent advertisement of local state information as it
pertains respectively to the requesting node and to the replying node, and
wherein a second mode thereof effects topology state synchronization
between the requesting node and the replying node which maintains the said
intermittent advertisement of local state information as it pertains
respectively to the requesting node and to the replying node.
According to a fifth broad aspect of the present invention, there is
provided a method for synchronization of topology state information
between a first network node and a second network node in a communications
network, the communications network comprising a routing protocol for
exchange of local state information throughout the network, the first
network node initiating a request for topology state synchronization and
the second network node receiving said request and communicating with the
first network node to provide topology state information to the first
network node, the topology state synchronization taking place according to
a first mode thereof wherein the said exchange of local state information,
as it pertains respectively to the first network node and to the second
network node, is not withdrawn.
According to a sixth broad aspect of the present invention, there is
provided a network element for synchronization of topology state
information between a first network node and a second network node in a
communications network, the communications network comprising a routing
protocol for exchange of local state information throughout the network,
the first network node initiating a request for topology state
synchronization and the second network node receiving said request and
communicating with the first network node to provide topology state
information to the first network node, the topology state synchronization
taking place according to a first mode thereof wherein the said exchange
of local state information, as it pertains respectively to the first
network node and to the second network node, is not withdrawn.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic representation of a hierarchical network topology
associated with a network domain operating according to the PNNI routing
protocol in which the method and apparatus of the present invention may be
implemented, and showing a parent-child relationship between groups of
nodes forming part of the network topology;
FIG. 2 is a state machine diagram which illustrates various states and
transition events for a neighbouring peer Finite State Machine of the PNNI
routing protocol as known in the prior art;
FIG. 3 is a state machine diagram which illustrates various states and
transition events for a neighbouring peer Finite State Machine of the PNNI
routing protocol as modified to implement the present invention; and
FIG. 4 is a block diagram of a hot redundant network element in which the
method of the present invention may be implemented.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
Redundancy techniques for network components or devices, such as hot
redundancy techniques, are generally well known to those skilled in this
art. With reference to FIG. 1, these techniques will be explained using
the illustrative example of a communications network in the form of a PNNI
network domain 30. However, those skilled in this art will understand that
the present invention may be applied or adapted to other types of networks
as well, for instance Internet Protocol (IP) networks for which
intermittent advertisement of local state information is accomplished by
the Open Shortest Path First (OSPF) routing protocol. As well, the present
invention is suited not only to situations of recovery from failures
associated with a routing entity of a network node, but also to other
contexts where it may be necessary or desirable for a network node to
re-synchronize its topology database.
Topology State Routing Protocols and Topology Database Synchronization
The communications network 2 has a network domain 30 which is comprised of
a plurality of network nodes 32 to 41, each of which is typically a
switching system. The network nodes 32 to 41 are interconnected by way of
physical or logical links 42 to 53 that respectively attach two given
switching systems of the network domain. The network element or node 56
(also labelled "A.1.2") of the PNNI network domain 30 is shown as having
assumed the role of Peer Group Leader for the parent Peer Group labelled
"PG(A)", and the presence of node 36 at the level of the parent Peer Group
is the consequence of the leader status of the node 56. Node 36 also
represents a network domain in the form of the child Peer Group 55 (also
labelled "PG(A.1)") which comprises lower-level network nodes 56 to 60.
The lower-level network nodes 56 to 60 are interconnected by way of
physical or logical links 62 to 67 each attaching two given lower-level
switching systems. The functions which define the Peer Group Leader of
PG(A.1) are implemented on the switching system which contains lower-level
node 56 (also labelled "A.1.2"). PG(A.1) is a child peer group of PG(A)
and is represented in PG(A) as a logical node 36, implemented within the
physical switching system 56. Similarly, the parent Peer Group labelled
"PG(A)" may itself be a child Peer Group represented at a higher level of
the routing hierarchy by a single logical node (not shown).
According to known redundancy techniques, the particular node, switch or
other network entity for which fault tolerance protection is desired
usually provides at least two routing processors within a single network
element. A routing processor performs the function of maintaining
connectivity to its adjacent neighbour nodes and of sharing topology state
information with those nodes. Preferably, the routing processors will be
configured by way of distinct physical components. For instance, the
physical components may each be in the form of distinct hardware cards
provisioned within the same network switch, such as within the network
node 56 ("A.1.2"). Where two processors are provided for redundancy
purposes, one of the physical components in question will assume the role
of the active routing entity for the redundant network element and the
other of the physical components will thus be an inactive routing entity
therefor.
Upon detecting the failure of the active routing entity, the inactive
routing entity is called into service to take over the functions of the
failed active routing entity. This procedure is termed an activity switch.
Because both of these routing entities are associated with the same node
(e.g. the network node 56), the node itself need not relinquish any of its
distinguished responsibilities. As well, only immediate neighbouring nodes
of the failed node in the form of immediately adjacent parent Peer Group
nodes (e.g. the network nodes 34, 35, 37, 38) and any immediately adjacent
child Peer Group nodes (e.g. the network nodes 57, 59, 60) need be called
upon or otherwise enlisted to take part in network recovery. However, as
discussed below, current topology state protocols may nevertheless cause
more nodes than those immediately neighbouring the failed node to be
impacted during the recovery process (e.g. the network nodes 32, 33, 39,
40, 41, 58), thereby increasing the time required for recovery to take
place as well as the network resources consumed in the process.
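The activity switch described above can be sketched as follows. This is a
minimal illustration under stated assumptions: the class name
RedundantNode and the card labels are hypothetical, and the neighbour list
is simply the set of immediately adjacent nodes given in the example above
for the network node 56.

```python
from typing import FrozenSet, List


class RedundantNode:
    """A network node with one active and one inactive routing entity."""

    def __init__(self, node_id: int, neighbours: List[int]) -> None:
        self.node_id = node_id
        self.neighbours = frozenset(neighbours)
        self.active = "card-1"      # hypothetical label for the active entity
        self.inactive = "card-2"    # hypothetical label for the inactive entity

    def activity_switch(self) -> FrozenSet[int]:
        """Promote the inactive entity and return the nodes to re-synchronize with.

        The node identity does not change, so only immediately adjacent
        neighbours are returned.
        """
        self.active, self.inactive = self.inactive, self.active
        return self.neighbours


# Immediately adjacent neighbours of node 56 as listed in the text.
node_56 = RedundantNode(56, [34, 35, 37, 38, 57, 59, 60])
to_resync = node_56.activity_switch()
```

Nodes such as 58, which are not immediately adjacent to the node 56, do
not appear in the returned set; ideally they would not be involved in the
recovery at all.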
Existing capabilities and techniques may be utilized to implement a scheme
of redundancy protection in a given network architecture, such as the PNNI
network domain 30. For instance, these capabilities and techniques may
include the management of activity status within the various network nodes
and the synchronization of state information between the active and
inactive routing components. This state information for a network topology
is typically stored in a synchronization database, also called a topology
database, which is associated with each network node of a routing domain.
Typically, the synchronization database will be stored within the network
nodes in question. Database synchronization is an existing topology state
routing protocol mechanism which ensures that adjacent nodes within a
network share a common view of the overall topology of the network. Some
signalling protocols, for instance ITU-T Q.2931, have mechanisms such as
Status Enquiry schemes to perform a synchronization of the call state
between two network nodes.
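As a rough sketch of what database synchronization accomplishes, one node
may compare a summary of its neighbour's database with its own and request
only the entries it lacks or holds in an older version. The representation
used here (an identifier mapped to a sequence number) and the function
names are assumptions made for illustration and are not the message
formats of any particular protocol.

```python
from typing import Dict, List

# Hypothetical representation: topology element identifier -> sequence number.
Database = Dict[str, int]


def items_to_request(requester: Database, summary: Database) -> List[str]:
    """Identifiers that the requester is missing or holds in an older version."""
    return sorted(
        key for key, seq in summary.items() if requester.get(key, -1) < seq
    )


def synchronize(requester: Database, replier: Database) -> List[str]:
    """Copy the newer or missing entries from the replier into the requester."""
    wanted = items_to_request(requester, replier)
    for key in wanted:
        requester[key] = replier[key]
    return wanted


requesting = {"node-57": 4, "node-59": 2}
replying = {"node-57": 4, "node-59": 3, "node-60": 1}
fetched = synchronize(requesting, replying)
```

After the call, both databases in this example hold the same entries, so
the two nodes share a common view; a full exchange would also run in the
opposite direction.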
One problem with some known schemes of redundancy is that when a failure
occurs at a network node such as node 56, which implements a higher-level
node 36, the affected links 44, 45, 47 and 51 in PG(A) to and from the
failed node stop being advertised after some time or while a new PGL in
PG(A.1) begins taking over responsibility to implement higher-level node 36.
In other words, when the newly active routing processor initiates a
database synchronization with its peers, the current PNNI protocol will
call for the advertisement of local state information from each of the
nodes involved in the synchronization to be removed or withdrawn until
such time as the synchronization has take