Monday, December 27, 2004

BGP: Control route flaps using damping

Again, another BGP hints...

As a result of link failures and restorations, router reloads, and other events, repeated route withdrawals and re-announcements may occur. This instability, often referred to as flapping, imposes a processing burden on BGP routers, as they must process the flaps by repeatedly updating the route table and propagating the changes to their peers.

RFC 2439 describes a solution, called route flap damping, or sometimes also called dampening. The algorithm described in this RFC is based on assigning a penalty to each route flap. When the penalty exceeds a configured limit, the prefix will be suppressed. Further withdrawals and re-announcements of the prefix will not be accepted, nor propagated to peers. The penalty value will decay over time, so that eventually the prefix will be accepted again.
As a result, a few flaps in a short time, or multiple flaps over a longer period, will not cause a prefix to be suppressed, but multiple flaps in a short time will cause a prefix to be temporarily suppressed. The more unstable a prefix is, the longer it will be suppressed.
The RIPE Routing workgroup has published recommendations for setting appropriate configuration parameters for route flap damping. The document recommends to start damping after 4 consecutive flaps in a row.
The proposed decay values are dependent on prefix length. For short prefixes (/21 and shorter), the maximum time a prefix is suppressed is 30 minutes, for /22 and /23, it is 45 minutes, whereas /24 and longer prefixes can be suppressed for 60 minutes. In addition, several prefixes, such as the DNS root servers, should never be suppressed. These are called golden networks in the document.
The golden networks web page also shows example configuration fragments for Cisco and Juniper routers based on the parameters recommended by the RIPE routing work group. The open source Zebra routing suite cannot be configured to do prefix length based damping. If you use Zebra, you can only configure a single damping policy.
However, not everyone is convinced that route flap damping is actually beneficial to global BGP stability. In a presentation given at the October 2002 NANOG meeting, Randy Bush, Tim Griffin and Zhuoqing Morley Mao show that even a single withdrawal/re-announcement can be observed as multiple flaps across the internet.
As a result, even minor instabilities may lead to prefixes being suppressed. Since it is hard to see whether your prefix is being suppressed by another party, these situations may be hard to debug.


No comments: