Use BGP Timers and BFD to Speed Up BGP Convergence

Out-of-the-box EBGP is notoriously slow to converge – it can take up to three minutes to detect a failed EBGP neighbor. It’s possible to tweak BGP timers to detect a failed neighbor in a few seconds. Still, it’s much better to combine BGP with Bidirectional Forwarding Detection (BFD) – a lightweight protocol that can detect a link- or node failure in milliseconds.

In this lab, you’ll use both mechanisms:

  • You’ll tweak the BGP timers to detect a link failure within 10 seconds.
  • You’ll enable BFD on an EBGP neighbor to reduce the failure detection time to approximately a second.

Lab topology

The routers in your lab use the following BGP AS numbers. Your routers advertise an IPv4 prefix each; X1 does not advertise BGP prefixes.

Node/ASN Router ID Advertised prefixes
AS65000
r1 10.0.0.1 192.168.42.0/24
AS65001
r2 10.0.0.2 192.168.43.0/24
AS65100
x1 10.0.0.10

Device Requirements

Start the Lab

You can start the lab on your own lab infrastructure or in GitHub Codespaces (more details):

  • Change directory to basic/7-bfd
  • Execute netlab up
  • Log into your devices (R1,R2) with netlab connect node and verify their configuration.

If you’re using netlab, you’ll get a fully configured lab, including BGP prefix origination on R1 and R2 and EBGP sessions between R1/R2 and X1. If you’re using another lab platform, you’ll have to do a fair amount of prep work1.

Checking the Convergence Time

  • Log into R2 and enable debugging/logging of BGP updates (example: Cisco IOS) or monitoring of the IP routing table (example: Arista EOS event monitor). If your platform does not have similar functionality, you’ll have to inspect the BGP table on R2 every few seconds.
  • Log into R1 and remove the IP address from the R1-X1 link.

Note

It would be easier to shut down the R1-X1 link, but that trick doesn’t work with devices like Arista EOS that tear down the BGP session before shutting down the link.

I used Arista EOS event monitor on R2, added an IP address on the R1 Ethernet1 interface, and removed it as soon as the BGP session was established. It took BGP almost exactly three minutes to detect the failed EBGP session between X1 and R1:

r2#sh event-monitor route match-ip 192.168.42.0/24
2023-09-16 16:02:25.953617|192.168.42.0/24|default|ebgp|0|200|updated|24
2023-09-16 16:05:27.071501|192.168.42.0/24|default|ebgp|0|0|removed|25

Reduce the BGP Timers

You can reduce BGP session timers to improve BGP convergence:

  • On the R1-X1 EBGP session, set the keepalive timer to three seconds and the hold timer (timeout) to nine seconds.
  • Clear the EBGP session if needed2 – the BGP timers are negotiated during the BGP session establishment phase.

Tip

FRRouting default BGP timers depend on its profile. The default profile (datacenter) already uses low BGP timers. netlab changes the FRRouting profile to traditional to return those timers to the default 60/180 seconds.

Verification:

  • Verify that you reduced the BGP timers with a command similar to show ip bgp neighbor detail.
  • Repeat the BGP convergence measurements – X1 should revoke the BGP prefix advertised by R1 within nine seconds.

Configure BFD

While some BGP implementations allow you to use very small BGP timers (for example, a one-second keepalive timer), you should use something other than that approach if you care about BGP convergence speed. It’s much better to combine BGP with BFD:

  • Configure BFD on the EBGP neighbor session on R1
  • Clear the BGP session if needed

Warning

Similarly to what you had to do to get BGP up and running, you have to modify the /etc/frr/daemons file and restart FRRouting to start the BFD daemon (more details). netlab starts the BFD daemon in FRRouting nodes when starting the lab, but you should still check whether it’s running – use this procedure, but check for bfdd.

Verification:

  • Verify that you have a working BFD session between R1 and X1. Most implementations display the BFD status of a BGP neighbor somewhere within the show ip bgp neighbor details (or similar) command. Some implementations have BFD-specific show commands, such as show bfd neighbors or show bfd peers.
  • Repeat the BGP convergence measurements – X1 should revoke the BGP prefix advertised by R1 almost immediately.

Automated Verification

You can use the netlab validate command if you use FRRouting on X1. The validation tests check the BGP timers on the R1-X1 EBGP session and the state of the R1-X1 BFD session.

This is the printout you should get after completing the lab exercise:

Reference Information

This lab uses a subset of the 4-router lab topology. The following information might help you if you plan to build custom lab infrastructure:

Lab Wiring

Origin Device Origin Port Destination Device Destination Port
r1 Ethernet1 x1 eth1
r2 Ethernet1 x1 eth3

Lab Addressing

Node/Interface IPv4 Address IPv6 Address Description
r1 10.0.0.1/32 Loopback
Ethernet1 10.1.0.1/30 r1 -> x1
r2 10.0.0.2/32 Loopback
Ethernet1 10.1.0.5/30 r2 -> x1
x1 10.0.0.10/32 Loopback
eth1 10.1.0.2/30 x1 -> r1
eth3 10.1.0.6/30 x1 -> r2

  1. I did tell you to use netlab, didn’t I? 

  2. Some BGP implementations tear down BGP sessions when you change the BGP timers.