Graceful restart and long-lived graceful restart

Available since MOSK 23.2 for Tungsten Fabric 21.4 only. TechPreview.

Graceful restart and long-lived graceful restart are BGP (Border Gateway Protocol) mechanisms that improve routing table convergence when a BGP router restarts or when a network failure interrupts router peering.

During a graceful restart, a router can signal its BGP peers about its impending restart, requesting them to retain the routes it had previously advertised as active. This allows for seamless network operation and minimal disruption to data forwarding during the router downtime.

Long-lived graceful restart extends this behavior beyond the usual restart duration: peers keep the retained routes for a longer, separately configured period. This adds resilience and stability to BGP routing updates and strengthens the network's ability to handle unforeseen disruptions.

Caution

Mirantis does not generally recommend using the graceful restart and long-lived graceful restart features with the Tungsten Fabric XMPP helper, unless the configuration is performed by proficient operators with at-scale expertise in the networking domain and exclusively to address specific corner cases.

Configuring graceful restart and long-lived graceful restart

Tungsten Fabric Operator lets you enable and configure the graceful restart and long-lived graceful restart features through the TFOperator custom resource:

spec:
  settings:
    settings:
      gracefulRestart:
        enabled: <BOOLEAN>
        bgpHelperEnabled: <BOOLEAN>
        xmppHelperEnabled: <BOOLEAN>
        restartTime: <TIME_IN_SECONDS>
        llgrRestartTime: <TIME_IN_SECONDS>
        endOfRibTimeout: <TIME_IN_SECONDS>

The equivalent definition when the resource uses the spec:features:control layout:

spec:
  features:
    control:
      gracefulRestart:
        enabled: <BOOLEAN>
        bgpHelperEnabled: <BOOLEAN>
        xmppHelperEnabled: <BOOLEAN>
        restartTime: <TIME_IN_SECONDS>
        llgrRestartTime: <TIME_IN_SECONDS>
        endOfRibTimeout: <TIME_IN_SECONDS>
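
For illustration, the placeholders might be filled in as in the sketch below. It assumes a deployment that only needs the BGP helper and keeps the documented default timers. The XMPP helper stays disabled, in line with the caution above. The values are illustrative, not a recommendation, and the same gracefulRestart block would sit under the other path shown above if your resource uses that layout.

```yaml
# Illustrative values only; adjust to your environment.
spec:
  features:
    control:
      gracefulRestart:
        enabled: true             # turn the feature on
        bgpHelperEnabled: true    # control services act as helper for BGP peers
        xmppHelperEnabled: false  # left disabled, per the caution above
        restartTime: 300          # documented default, in seconds
        llgrRestartTime: 300      # documented default, in seconds
        endOfRibTimeout: 300      # documented default, in seconds
```

Listing every key explicitly, even where the default is kept, makes the intended behavior visible to whoever reviews the resource later.
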
Graceful restart and long-lived graceful restart settings

| Parameter | Default value | Description |
|---|---|---|
| enabled | false | Enables or disables the graceful restart and long-lived graceful restart features. |
| bgpHelperEnabled | false | Specifies whether the Tungsten Fabric control services act as a graceful restart helper to the edge router or any other BGP peer by retaining the routes learned from this peer and advertising them to the rest of the network as applicable. Note: The BGP peer should support graceful restart and have it configured for all address families in use. |
| xmppHelperEnabled | false | Specifies whether the datapath agent retains the last route path from the Tungsten Fabric Controller when an XMPP-based connection is lost. |
| restartTime | 300 | Specifies the non-zero restart time, in seconds, advertised to peers in the graceful restart capability. |
| llgrRestartTime | 300 | Specifies the time, in seconds, for which the vRouter datapath keeps the routes advertised by the Tungsten Fabric control services when the XMPP connection between the control and vRouter agent services is lost. Note: When graceful restart and long-lived graceful restart are both configured, the duration of the long-lived graceful restart timer is the sum of both timers. |
| endOfRibTimeout | 300 | Specifies the time, in seconds, that a control node waits before removing stale routes from the Routing Information Base (RIB) of a vRouter agent. |
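
As a worked illustration of the timer note for llgrRestartTime, the sketch below uses non-default timer values chosen purely for the example. Applying the rule stated in that note, the long-lived graceful restart timer would last restartTime + llgrRestartTime = 60 + 600 = 660 seconds. With the documented defaults, the same rule gives 300 + 300 = 600 seconds.

```yaml
# Hypothetical timer values, chosen only to illustrate the sum rule.
spec:
  features:
    control:
      gracefulRestart:
        enabled: true
        bgpHelperEnabled: true
        xmppHelperEnabled: false
        restartTime: 60        # graceful restart timer, in seconds
        llgrRestartTime: 600   # long-lived graceful restart timer, in seconds
        endOfRibTimeout: 300   # documented default, in seconds
        # Per the note: long-lived timer duration = 60 + 600 = 660 seconds
```

When choosing values, keep the sum in mind rather than llgrRestartTime alone, since it determines how long peers may keep forwarding on retained routes.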