Introduction
Many activities essential to modern life depend on the internet, and the internet depends on the Border Gateway Protocol (BGP). That means BGP performance can have an impact on a global scale. If BGP performance significantly degrades or there is an issue with the routing protocol, economies around the globe can lose massive amounts of productivity and dollars.
As a result, BGP monitoring is a vital aspect of modern networking. In simple terms, BGP monitoring is the process of monitoring BGP operations in real time to detect faults and performance issues. When a problem is detected, administrators are notified so they can perform remediation, or in some cases, systems can be configured to automatically self-heal or reroute traffic.
As a routing protocol, BGP is subject to a variety of potential problems. Performance issues, malfunctions, and outages can result from misconfigurations, equipment failure, accidents, or even malicious attacks. Further, because BGP enables connectivity between ISPs, troubleshooting and root cause analysis can become complex.
For this reason, effective BGP monitoring and observability is an essential tool for modern ISPs. With the right approach to BGP monitoring, ISPs can continuously monitor BGP performance to ensure optimal routing and detect and mitigate issues such as:
- Route leaks - RFC 7908 formally defines a route leak as the “propagation of routing announcement(s) beyond their intended scope.” The intended scope is defined by the BGP import and export policies that ASes use to regulate the set of routes exchanged over a BGP session.
- Route hijacking - Route hijacking occurs when an AS claims to be the origin of a route that belongs to another AS.
There are a couple of ways to approach BGP monitoring and observability.
The classic approach uses route collectors. These are simple servers that mimic border routers and establish sessions with BGP routers in various organizations. Unlike active BGP routers, route collectors typically only receive BGP messages; they do not generate routing announcements or forward traffic, apart from occasionally sending beacons used to study BGP convergence times. As a result, they receive, in real time, the best routes chosen by the BGP decision process of the connected devices, and the collected data allows analysis of the routing characteristics of the connected AS. Data is stored in the Multi-Threaded Routing Toolkit (MRT) export format, defined in RFC 6396, and the resulting files are shared publicly on the websites of participating projects.
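To make the MRT format a little more concrete, here is a minimal sketch that walks the common header RFC 6396 places in front of every record: a 32-bit timestamp, a 16-bit type, a 16-bit subtype, and a 32-bit length, all in network byte order. The names `read_mrt_headers` and `MrtHeader` are our own, the sample record is synthetic, and the record bodies are left undecoded.

```python
import struct
from typing import NamedTuple

# MRT common header (RFC 6396): timestamp, type, subtype, length, big-endian.
MRT_HEADER = struct.Struct("!IHHI")

# Two common type codes from RFC 6396; this is not an exhaustive list.
MRT_TYPES = {13: "TABLE_DUMP_V2", 16: "BGP4MP"}


class MrtHeader(NamedTuple):
    timestamp: int
    type: int
    subtype: int
    length: int


def read_mrt_headers(data: bytes):
    """Yield (header, body) pairs from a buffer of concatenated MRT records."""
    offset = 0
    while offset + MRT_HEADER.size <= len(data):
        header = MrtHeader(*MRT_HEADER.unpack_from(data, offset))
        offset += MRT_HEADER.size
        body = data[offset:offset + header.length]
        if len(body) < header.length:
            raise ValueError("truncated MRT record")
        yield header, body
        offset += header.length


# Synthetic record for illustration only (type 16 = BGP4MP, 3-byte dummy body).
sample = MRT_HEADER.pack(1700000000, 16, 4, 3) + b"\x01\x02\x03"
for header, body in read_mrt_headers(sample):
    print(MRT_TYPES.get(header.type, "other"), header.subtype, len(body))
```

Files downloaded from public collector archives are typically compressed, so they would need to be decompressed before being passed to a reader like this one.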
A more recent development is the BGP Monitoring Protocol (BMP), which is defined in RFC 7854 and helps standardize BGP monitoring. At a high level, routers use BMP to stream the state of their BGP sessions, and the routes received over them, to a monitoring station. However, the overall BGP monitoring process is much more nuanced. This series of articles is intended to help you better understand this fundamental aspect of modern routing, starting with a deep dive into BGP monitoring and observability.
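As a small illustration of what a BMP monitoring station sees on the wire, the sketch below decodes the 6-byte common header that RFC 7854 puts at the start of every BMP message: a version byte, a 4-byte message length, and a message type byte. The function name `parse_bmp_common_header` is hypothetical, the example bytes are synthetic, and nothing past the common header is parsed.

```python
import struct

# BMP common header (RFC 7854): version (1 byte), message length (4), type (1).
BMP_COMMON_HEADER = struct.Struct("!BIB")

# Message types defined in RFC 7854.
BMP_MESSAGE_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}


def parse_bmp_common_header(data: bytes) -> dict:
    """Decode the common header found at the start of a BMP message."""
    if len(data) < BMP_COMMON_HEADER.size:
        raise ValueError("need at least 6 bytes for the BMP common header")
    version, length, msg_type = BMP_COMMON_HEADER.unpack_from(data)
    if version != 3:
        raise ValueError(f"unsupported BMP version {version}")
    return {
        "version": version,
        "length": length,  # total message length, including this header
        "type": BMP_MESSAGE_TYPES.get(msg_type, f"unknown ({msg_type})"),
    }


# Synthetic header for illustration: version 3, length 6, type 4 (Initiation).
example = BMP_COMMON_HEADER.pack(3, 6, 4)
print(parse_bmp_common_header(example))
```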
BGP monitoring can rely on either of these methodologies or on a combination of both, which delivers a fuller picture of the current state of BGP. In this piece, we’ll set up the articles to come by exploring the key capabilities of BGP monitoring.
Executive summary
BGP monitoring systems leverage the above mechanisms to collect data from various sources. The collected data is processed, analyzed, visualized, and presented to administrators in the form of reports, graphs, and dashboards, allowing them to evaluate the operation of BGP in their own autonomous system (AS) as well as in neighboring ASes.
A typical BGP monitoring system should have the following capabilities:
Capabilities of BGP monitoring
Before MRT and BMP, BGP monitoring was achieved in various ways, including screen scraping and network utilities such as traceroute. These methods, however, were best-effort attempts to monitor BGP with tools that were not designed for it.
Traceroute can be used for ad hoc queries and has recently been leveraged for automation, but it still has limitations. Screen scraping, on the other hand, is an inelegant method that captures information from the output of commands executed at a device’s command line interface (CLI). For it to work, automation systems must be customized to each router vendor's specific output format, which can change between router OS versions.
New approaches to BGP monitoring are superior to legacy approaches because they are purpose-built for continuous real-time monitoring.
The key innovation that makes these approaches so powerful is the use of route collectors. Route collectors are devices that establish BGP sessions with the ASes they monitor. These ASes share routing information with the route collectors just like any other BGP peering. The only difference is that the route collectors do not forward user traffic, nor do they share any routes with BGP routers in the cooperating ASes. They are only observers.
By directly capturing routing data, collectors help enable BGP monitoring systems to perform the specific functions discussed in the next sections.
BGP route monitoring
BGP route monitoring involves actively monitoring BGP prefix advertisements from participating ASes. Specifically, the advertised prefixes are monitored to detect any deviation from the expected routing behavior.
The goal of route monitoring is to ensure prefixes are reachable from as many sources as possible, and the paths used to reach those prefixes are correct. Aspects of BGP that can and should be monitored via route monitoring are:
- Availability and downtime - Availability indicates whether a path to a particular prefix exists, while downtime is the amount of time during which no path exists.
- Withdrawn and restored routes - A record can be kept of routes that have been withdrawn and restored, and when these took place.
- Route flaps - A route flap occurs when a BGP route repeatedly disappears from and reappears in the routing table. This may be the result of a misconfiguration, or may simply be due to an unstable BGP peering session. Route flaps can be extremely detrimental to efficient routing and traffic forwarding and must be detected and mitigated as quickly as possible.
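The three aspects above can be sketched from a simple log of announce and withdraw events. This is a minimal illustration, assuming a single vantage point and a hypothetical `RouteEvent` record; the 300-second window and threshold of three withdrawals are arbitrary values chosen for the example, not recommended settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteEvent:
    timestamp: float  # seconds since an arbitrary start time
    prefix: str
    kind: str         # "announce" or "withdraw"


def downtime_seconds(events, prefix, window_end):
    """Total time the prefix had no path, measured up to window_end."""
    down_since = None
    total = 0.0
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.prefix != prefix:
            continue
        if ev.kind == "withdraw" and down_since is None:
            down_since = ev.timestamp
        elif ev.kind == "announce" and down_since is not None:
            total += ev.timestamp - down_since
            down_since = None
    if down_since is not None:  # still withdrawn at the end of the window
        total += window_end - down_since
    return total


def is_flapping(events, prefix, window=300.0, threshold=3):
    """True if `threshold` withdrawals of the prefix fall within `window` seconds."""
    times = sorted(e.timestamp for e in events
                   if e.prefix == prefix and e.kind == "withdraw")
    for i in range(len(times) - threshold + 1):
        if times[i + threshold - 1] - times[i] <= window:
            return True
    return False


# Synthetic event log using documentation prefixes.
events = [
    RouteEvent(0, "203.0.113.0/24", "announce"),
    RouteEvent(100, "203.0.113.0/24", "withdraw"),
    RouteEvent(130, "203.0.113.0/24", "announce"),
    RouteEvent(200, "203.0.113.0/24", "withdraw"),
    RouteEvent(220, "203.0.113.0/24", "announce"),
    RouteEvent(260, "203.0.113.0/24", "withdraw"),
    RouteEvent(300, "203.0.113.0/24", "announce"),
    RouteEvent(0, "198.51.100.0/24", "announce"),
    RouteEvent(500, "198.51.100.0/24", "withdraw"),
]

for prefix in ("203.0.113.0/24", "198.51.100.0/24"):
    print(prefix, downtime_seconds(events, prefix, window_end=600),
          is_flapping(events, prefix))
```

A production system would track these events per peer and per collector, since a prefix can be withdrawn at one vantage point while remaining reachable from others.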
BGP problem detection
Both misconfigurations and malicious attacks can cause BGP problems. Some of the vital attributes that must be monitored to detect such events include:
- Origin AS - To detect route leaks or hijacks, we must ensure that no other AS is advertising prefixes that belong to the local AS.
- Origin neighbor - Monitor any changes in the ASNs that are being advertised. If any ASN changes unexpectedly, or in a way that matches preset rules, this can trigger an alert or action.
- Prefix mismatch - The announcement of a more specific prefix may be an attempt to steer your traffic to a different destination, or, more commonly, may be a result of a configuration error.
- AS path - A deviation from the expected AS path for any particular prefix may be an indication of an attack or a misconfiguration.
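Two of these checks, origin AS and prefix mismatch, can be sketched with Python's standard `ipaddress` module. The `EXPECTED_ORIGINS` table and the `check_announcement` function are hypothetical names used for illustration, the prefixes and ASNs come from the ranges reserved for documentation, and the origin is assumed to be the last ASN in the AS path. Neighbor and full AS path checks are left out to keep the example short.

```python
import ipaddress

# Hypothetical expectations: prefixes the local AS originates, and from which ASN.
EXPECTED_ORIGINS = {
    ipaddress.ip_network("203.0.113.0/24"): 64500,
    ipaddress.ip_network("192.0.2.0/24"): 64500,
}


def check_announcement(prefix: str, as_path: list[int]) -> list[str]:
    """Return alerts for an observed announcement that touches a monitored prefix."""
    net = ipaddress.ip_network(prefix)
    origin = as_path[-1]  # assumption: the origin AS is the last ASN in the path
    alerts = []
    for expected_net, expected_origin in EXPECTED_ORIGINS.items():
        if net.version != expected_net.version:
            continue
        if not net.subnet_of(expected_net):
            continue  # the announcement does not fall inside this monitored prefix
        if origin != expected_origin:
            alerts.append(
                f"origin mismatch: {net} originated by AS{origin}, "
                f"expected AS{expected_origin}"
            )
        if net.prefixlen > expected_net.prefixlen:
            alerts.append(f"more-specific: {net} inside {expected_net}")
    return alerts


# Synthetic observations: a normal route, a possible hijack, and a more-specific.
print(check_announcement("203.0.113.0/24", [64510, 64500]))
print(check_announcement("203.0.113.0/24", [64510, 64511]))
print(check_announcement("192.0.2.128/25", [64510, 64500]))
```

An alert from a sketch like this is only a signal to investigate: as noted above, a more-specific announcement is more often a configuration error than an attack.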
Monitoring data sources
Publicly available BGP information sources can supplement and improve BGP monitoring efforts. Two useful public sources of BGP monitoring data are:
- Routing Information Service (RIS) - This is a service that provides BGP information from hundreds of active BGP peers on the internet. It is provided by the regional internet registry for Europe, the Middle East, and parts of Central Asia (RIPE NCC).
- RouteViews - This is a project run by the University of Oregon that includes dozens of peering sites that share their full BGP routing tables.
Two fundamental aspects that will affect the overall effectiveness of a BGP monitoring implementation are:
- The location and the number of data collection sources.
- The location of the data collectors.
The more diverse and distributed the data sources are, the more useful and valuable the collected data will be. BGP monitoring must rely on multiple data sources. Otherwise, there will be large visibility gaps resulting in “blindness” to significant portions of the internet.
The main objective of a monitoring network architecture is to place data collectors as close as possible to the monitored ASes while using the fewest collectors possible, with a minimum of two for redundancy. Collectors should be added sparingly since each one adds more BGP sessions, which increases CPU and memory consumption on BGP routers.
Monitoring local data sources
Leveraging the information that is generated by your local AS is vital for achieving successful BGP monitoring of your prefixes. An excellent supplemental technology to add security to your BGP routing is Route Origin Authorization (ROA) with the Resource Public Key Infrastructure (RPKI) repository of each regional internet registry (RIR).
A ROA is a cryptographically signed object that identifies the AS authorized to originate a particular prefix. Together with the RPKI, ROAs introduce mechanisms that help mitigate hijacking and malicious attacks on your internet routing.
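To show how ROA data can be applied to an observed route, here is a minimal sketch of the origin validation outcomes described in RFC 6811 (valid, invalid, not found). It assumes the ROAs have already been fetched and cryptographically verified by an RPKI validator and flattened into simple records; the `Roa` class and `validate_origin` function are hypothetical names, and the sample data uses documentation prefixes and ASNs.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    prefix: str      # prefix covered by the ROA
    max_length: int  # longest prefix length the AS may announce
    asn: int         # AS authorized to originate the prefix


def validate_origin(prefix: str, origin_asn: int, roas) -> str:
    """Classify a route as 'valid', 'invalid', or 'not-found' against ROAs."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue  # this ROA does not cover the route
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid.
    return "invalid" if covered else "not-found"


# Synthetic ROA set for illustration.
roas = [Roa("203.0.113.0/24", 24, 64500)]

print(validate_origin("203.0.113.0/24", 64500, roas))    # authorized origin
print(validate_origin("203.0.113.0/24", 64511, roas))    # wrong origin AS
print(validate_origin("203.0.113.128/25", 64500, roas))  # longer than max_length
print(validate_origin("198.51.100.0/24", 64500, roas))   # no covering ROA
```

A monitoring system could use a result of "invalid" for one of its own prefixes as a strong hint that the announcement is a hijack or a misconfiguration.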