Building a Comprehensive BGP Alerting Strategy

Published

January 21, 2020

Craig Lowell

Ensuring the health of your BGP routes is one of the most important parts of a robust network monitoring strategy. When any of your IP prefixes becomes unreachable for end users, you need to know as soon as possible, with direct, actionable data that pinpoint the exact nature of the problem. Without this kind of incisive real-time BGP alerting, troubleshooting end-user experience issues becomes much harder, leading to increased mean time to detection (MTTD) and mean time to resolution (MTTR).

Setting up a comprehensive BGP monitoring initiative involves inputting all of the public-facing IP prefixes that belong to your organization, and attaching alerts to each one that cover the route issues that could disrupt your end-user experience. Once alerted to a problem, you must then be able to drill down into the BGP data to pinpoint the exact nature and location of the issue so that you can start the remediation process.

All of this is available with Catchpoint Network Insights using real-time BGP data, and can be done directly in the Catchpoint platform; there’s no need to switch windows or input the data into another tool to conduct the analysis.

For the purposes of this blog, we’re going to focus on the alerting aspect of your BGP monitoring strategy. But before we do, let’s cover the different types of security, availability, and performance issues that could arise with your BGP routes.

Prefix Hijacks are one of the most destructive forms of BGP security issues, as they involve stealing your traffic and directing users to a different, unauthorized destination. Because routers prefer the route with the most specific (i.e. longest) matching IP prefix, a malicious actor can perpetrate a hijack by originating a prefix that has not been allocated to them but that is more specific than yours, and which therefore becomes the preferred route.
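To make the longest-match rule concrete, here is a minimal sketch in Python. The prefixes and AS numbers come from ranges reserved for documentation, and `best_route` is a hypothetical helper written for illustration, not part of any router or of the Catchpoint platform.

```python
import ipaddress

def best_route(destination, routes):
    """Return (prefix, origin) for the most specific route covering destination.

    routes is a list of (prefix, origin) string pairs; both are illustrative.
    """
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, origin in routes:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network, origin))
    if not matches:
        return None
    # Longest prefix length wins, regardless of who originated the route.
    network, origin = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), origin

routes = [
    ("203.0.113.0/24", "AS64500 (legitimate)"),
    ("203.0.113.0/25", "AS64666 (unauthorized)"),
]
print(best_route("203.0.113.10", routes))   # covered by both; the /25 wins
print(best_route("203.0.113.200", routes))  # only covered by the /24
```

The first lookup shows the hijack scenario: the address is covered by both announcements, so the more specific unauthorized one is selected.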

Hijacks can take the form of blackholing, which creates a denial of service (DoS) for end users; impersonation of the intended destination’s site or application; or traffic sniffing, wherein the malicious actor stores sensitive information from the intercepted packets before sending them on to the intended destination.

Route Leaks are similar to hijacks in that traffic gets routed via paths that violate the BGP routing policies established among the ASes, which is likely to cause performance disruption and packet loss. Unlike hijacks, however, they typically occur due to simple human error, such as a typo in the origin prefix or a misconfigured AS router (in this way, they can end up having the same end result as a blackhole hijack).

Route Flaps occur when an AS announces a route an excessive number of times, meaning that the same prefix is announced repeatedly with some attribute changing each time. This can cause the preferred path to switch back and forth, thereby causing availability and performance problems for end users.
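One simple way to reason about flapping is to count how many updates arrive for a prefix inside a sliding time window. The sketch below assumes updates arrive as (prefix, timestamp-in-seconds) events; the `FlapDetector` class and its thresholds are hypothetical and chosen only for illustration.

```python
from collections import deque

class FlapDetector:
    """Flags a prefix when it is updated more than max_updates times per window."""

    def __init__(self, max_updates, window_seconds):
        self.max_updates = max_updates
        self.window_seconds = window_seconds
        self._times = {}  # prefix -> deque of recent update timestamps

    def record(self, prefix, timestamp):
        """Record one update; return True if the prefix now looks like it is flapping."""
        times = self._times.setdefault(prefix, deque())
        times.append(timestamp)
        # Drop updates that have aged out of the window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_updates

detector = FlapDetector(max_updates=3, window_seconds=60)
results = [detector.record("203.0.113.0/24", t) for t in (0, 10, 20, 30, 200)]
print(results)  # [False, False, False, True, False]
```

The fourth update within a minute trips the detector, and the quiet period before the fifth update resets it.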

To detect issues like these, it’s important to get your BGP data in real time. Many BGP monitoring tools rely entirely on data that’s only collected every 15 minutes, which means even the fastest alerting and analysis strategies will be delayed. There are five different types of alerts available in the Catchpoint platform that can be set up for each BGP test and which are powered by real-time BGP data:

Availability & Downtime

Downtime within BGP occurs when there is no established path to the destination prefix being monitored, making availability one of the most (if not the most) important metrics to be alerted about. When creating a BGP test within Catchpoint, you can set alerts if the availability for the destination prefix drops below a certain percentage over a set period of time; e.g. send an alert if availability drops below 90% over a five-minute period.
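As an illustration of the threshold logic described above (not of how Catchpoint computes it internally), the sketch below assumes availability is measured as the share of samples in the window for which a path to the prefix existed. The sample format and function names are assumptions.

```python
def availability_percent(samples, now, window_seconds):
    """Percentage of samples within the window that had a path to the prefix.

    samples is a list of (timestamp_seconds, reachable) pairs.
    Returns None when the window contains no samples.
    """
    recent = [ok for ts, ok in samples if now - window_seconds <= ts <= now]
    if not recent:
        return None
    return 100.0 * sum(recent) / len(recent)

def should_alert(samples, now, window_seconds=300, threshold_percent=90.0):
    """True when availability over the window falls below the threshold."""
    pct = availability_percent(samples, now, window_seconds)
    return pct is not None and pct < threshold_percent

# One sample every 30 seconds; the prefix had no path at t=120 and t=150.
samples = [(t, t not in (120, 150)) for t in range(0, 300, 30)]
print(availability_percent(samples, now=270, window_seconds=300))  # 80.0
print(should_alert(samples, now=270))                              # True
```

With two of ten samples failing, availability over the five-minute window is 80%, which is below the 90% threshold used in the example.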

However, the destination prefix is not the only thing that needs to be monitored, because it will also become unavailable if the ASes that it peers with go down. Therefore, Catchpoint enables alerting for peer availability as well, so that you can be notified if a certain percentage of peers are failing over a set time period.

Origin AS

A route hijack occurs when someone else announces a prefix that is controlled by your organization and that announcement becomes the preferred path; a route leak can occur when the prefix is changed incorrectly within your own autonomous system. It is therefore necessary to set an alert for any change that leaves the origin AS no longer an exact match for the one controlled by your organization.
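The origin check itself is small. The sketch below assumes the AS path is available as a flat list of AS numbers with the origin last (ignoring AS sets), and that you keep a set of origin ASNs you trust; the names and the documentation-range ASNs are illustrative.

```python
def origin_as(as_path):
    """The origin is the last AS in the path, i.e. the one that announced the prefix."""
    return as_path[-1] if as_path else None

def origin_changed(as_path, expected_origins):
    """True when the observed origin is not one of the trusted origin ASNs."""
    origin = origin_as(as_path)
    return origin is not None and origin not in expected_origins

EXPECTED = {64500}  # hypothetical ASN controlled by your organization

print(origin_changed([64510, 64505, 64500], EXPECTED))  # False
print(origin_changed([64510, 64505, 64666], EXPECTED))  # True
```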

Origin Neighbor

Similar to the Origin AS alert, you can set alerts for when changes occur to the prefixes of the ASes that your AS peers with, as this can also be the cause of a route leak or a traffic-sniffing hijack. It’s important to note that this alert requires you to know the IP prefixes of all your AS peers.

Prefix Mismatch

One of the ways that traffic can be steered away from your destination prefix is by another AS announcing a longer (i.e. more specific) prefix. The prefix mismatch alert is a way to guard against that, as it notifies you whenever a prefix is being returned that’s not an exact match for the one that’s trusted by your organization.
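A rough sketch of how an observed prefix can be compared with the trusted one, using Python's standard `ipaddress` module. The `classify_prefix` helper and its labels are hypothetical; the idea is that anything other than an exact match would raise the alert, with a more-specific result being the case described above.

```python
import ipaddress

def classify_prefix(announced, trusted):
    """Classify an announced prefix relative to the trusted prefix."""
    a = ipaddress.ip_network(announced)
    t = ipaddress.ip_network(trusted)
    if a == t:
        return "exact"
    if a.version != t.version:
        return "unrelated"  # subnet_of() raises TypeError across IP versions
    if a.subnet_of(t):
        return "more-specific"
    if a.supernet_of(t):
        return "less-specific"
    return "unrelated"

TRUSTED = "203.0.113.0/24"
for announced in ("203.0.113.0/24", "203.0.113.128/25", "203.0.112.0/23"):
    print(announced, classify_prefix(announced, TRUSTED))
```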

Path AS

Provided that you know at least part of the preferred path that users must take to reach your destination prefix, you can set alerts for when any changes occur along that preferred path. The Path AS alert can work as a comprehensive warning for both Origin AS and Origin Neighbor changes, thereby allowing you to receive just one alert rather than two.

For example, if you know that part of the preferred path must include three specific ASNs, you can set an alert for whenever a change occurs to that group.
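That check can be sketched as a search for the expected ASNs as a contiguous segment of the observed AS path. This is one possible interpretation, not necessarily how the Path AS alert is implemented; it assumes the path is a flat list of ASNs and treats repeated ASNs (AS-path prepending) as a single hop. All names and ASNs are illustrative.

```python
from itertools import groupby

def collapse_prepends(as_path):
    """Collapse consecutive duplicate ASNs so prepending does not hide a match."""
    return [asn for asn, _ in groupby(as_path)]

def contains_segment(as_path, segment):
    """True when segment appears as a contiguous run in the collapsed path."""
    path = collapse_prepends(as_path)
    segment = list(segment)
    n = len(segment)
    return any(path[i:i + n] == segment for i in range(len(path) - n + 1))

EXPECTED_SEGMENT = [64501, 64502, 64500]  # hypothetical: two transit ASes, then the origin

ok_path = [64510, 64501, 64501, 64502, 64500]  # 64501 is prepended once
bad_path = [64510, 64501, 64666, 64500]        # an unexpected AS has replaced 64502

print(contains_segment(ok_path, EXPECTED_SEGMENT))   # True
print(contains_segment(bad_path, EXPECTED_SEGMENT))  # False
```

An alert would fire when `contains_segment` returns False for a newly observed path.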