The routing of Internet packets is one of the most important Internet governance issues you have probably never heard of. Yet Internet routing security made the popular press this summer. Two events in particular were noteworthy:
- Swiss-based operator Safe Host improperly updated its routers and advertised BGP routes to its peering partner(s), including China Telecom, which accepted and leaked them. As a result, a lot of third-party traffic was routed through China Telecom’s network. Despite causing the incident, Safe Host seemed to escape the press firestorm. Instead, some leveled incorrect and jingoistic accusations of BGP “hijacking” at China Telecom, a term that has a specific meaning in Internet routing circles: a hijack occurs when an adversary manipulates BGP by announcing a victim’s address prefix, thus rerouting the victim’s traffic.
- DQE Communications, a fiber network company that is a subsidiary of Pittsburgh-based Duquesne Light Holdings, advertised BGP routes to its customer(s), including US-based operator Allegheny Technologies Inc. Allegheny accepted these routes and leaked them to its other transit provider, Verizon, which in turn accepted and propagated them. Combined, these mistakes caused large amounts of traffic to be improperly routed through Allegheny’s network. In this event, both Allegheny and Verizon drew the ire of other operators, who angrily accused them of sloppy route filtering.
Together, these events highlight two issues: 1) route leaks, which affect network operators and can degrade Internet connectivity; 2) the incentives of operators to pursue solutions to route leaks.
What exactly is a route leak?
A route leak is the propagation of a valid BGP routing announcement “beyond its intended scope”. As the IETF explains in RFC 7908:
“The intended scope is usually defined by a set of local redistribution/filtering policies distributed among the ASes involved. Often, these intended policies are defined in terms of the pair-wise peering business relationship between ASes (e.g., customer, transit provider, peer).”
That is, a route leak is not just a technical issue; what counts as a leak depends on the business relationships between networks. Although these rules are not universally applied, operators typically announce BGP routes such that:
- Routes learned from Customers can be announced to other Customers, Peers, and Transit Providers
- Routes learned from Peers and Transit Providers should only be announced to Customers
Preventing route leaks requires network operators to identify potentially troublesome route announcements and to maintain filters in their routers that enforce these rules.
Applying the above to the events mentioned previously, China Telecom and Allegheny Technologies could have had route filters in place to prevent their routers from re-announcing routes they had learned from a peer (Safe Host) and a provider (DQE), respectively. Verizon didn’t appear to violate the above rules, since it propagated routes learned from a customer, but it apparently wasn’t implementing another best practice: setting an upper limit on the number of routes accepted from a customer. Had that limit been in place, it could have reduced the damage caused by propagating route announcements beyond their intended scope. But if the rules and best practices to prevent route leaks seem straightforward, why aren’t operators implementing filtering policies like these? We’ll get to that, but let’s first look at the prevalence of route leaks.
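The two export rules listed above are often described as the Gao-Rexford or “valley-free” model. As a minimal sketch, assuming each route is tagged with the business relationship of the neighbor it was learned from, an export filter enforcing them could look like the Python below. The names (`Rel`, `may_export`, `Route`, `export_filter`) are hypothetical. Real routers express such policies through vendor-specific configuration (typically BGP communities and route maps), not code like this.

```python
from dataclasses import dataclass
from enum import Enum


class Rel(Enum):
    """Business relationship with a neighboring AS (hypothetical labels)."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "transit provider"


def may_export(learned_from: Rel, announce_to: Rel) -> bool:
    """Apply the customary export rules described in the text."""
    if learned_from is Rel.CUSTOMER:
        # Customer routes may go to customers, peers and transit providers.
        return True
    # Routes from peers and transit providers go to customers only.
    return announce_to is Rel.CUSTOMER


@dataclass(frozen=True)
class Route:
    prefix: str
    learned_from: Rel


def export_filter(routes: list[Route], neighbor: Rel) -> list[Route]:
    """Keep only the routes that may be announced to this neighbor."""
    return [r for r in routes if may_export(r.learned_from, neighbor)]


# Example table using documentation prefixes (RFC 5737), not real data.
table = [
    Route("192.0.2.0/24", Rel.CUSTOMER),
    Route("198.51.100.0/24", Rel.PEER),
    Route("203.0.113.0/24", Rel.PROVIDER),
]
print([r.prefix for r in export_filter(table, Rel.PROVIDER)])  # ['192.0.2.0/24']
print([r.prefix for r in export_filter(table, Rel.CUSTOMER)])  # all three prefixes
```

In the Allegheny case, `may_export(Rel.PROVIDER, Rel.PROVIDER)` is `False`: a route learned from one provider (DQE) should not be sent to another (Verizon). Verizon re-announcing routes learned from a customer passes this check, which is why a different safeguard matters there.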
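The upper limit on routes accepted from a customer, often called a maximum-prefix limit, can be sketched as a simple counter per session. This is a simplified model that assumes the operator sets the limit near the number of prefixes the customer normally announces, and that exceeding the limit tears down the session and drops what was learned over it. Actual router behavior (tearing down, restarting after a timer, or only logging a warning) is vendor- and configuration-specific. `MaxPrefixSession` and its parameters are hypothetical names for illustration.

```python
class MaxPrefixSession:
    """Hypothetical model of a BGP session with a maximum-prefix limit."""

    def __init__(self, neighbor: str, max_prefixes: int) -> None:
        self.neighbor = neighbor
        self.max_prefixes = max_prefixes
        self.accepted: set[str] = set()
        self.shut_down = False

    def receive(self, prefix: str) -> bool:
        """Process one announcement; return True if the prefix is accepted."""
        if self.shut_down:
            return False
        if prefix in self.accepted:
            return True  # a repeated announcement does not count twice
        if len(self.accepted) >= self.max_prefixes:
            # Limit exceeded: drop everything learned from this neighbor
            # so none of it is propagated further.
            self.accepted.clear()
            self.shut_down = True
            return False
        self.accepted.add(prefix)
        return True


session = MaxPrefixSession("example-customer", max_prefixes=3)
for p in ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]:
    session.receive(p)
print(session.receive("10.0.0.0/8"), session.shut_down)  # False True
```

When a customer that normally announces a handful of prefixes suddenly sends far more, a limit set close to the normal count trips early. The leaked routes could then stop at that session instead of spreading further.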
How many route leaks?
Several systems attempt to capture evidence of route leaks (with the usual caveats about visibility into a global network of networks). These include bgpstream.com, which is operated by OpenDNS (part of Cisco Systems). Its website publishes data on route leaks, possible hijacks and service outages. Not much is publicly known about its methodology, but the data it collects is apparently used by ISOC’s newly launched MANRS Observatory.
Figure 1 below shows data from the website for 2019 so far. It covers just eight months, but it gives a sense of the relative importance of different types of problems. Outages, where no perpetrator is identified, far and away make up the majority of observations in every month and overall (4066, 60.19%). Route leaks are the second-largest category overall (1650, 24.43%), while possible hijacks are the smallest (1039, 15.38%).
One further observation: there seem to be major discrepancies between the route leak numbers reported by bgpstream.com and the “incident” numbers published by the MANRS Observatory, with the monthly MANRS figures ranging anywhere from 139% higher to 20% lower than bgpstream.com’s. Lower numbers are what we would expect, since MANRS consolidates one or more route leaks into “incidents”, but higher? One explanation could be that bgpstream.com doesn’t make all of its observations public, but is giving the MANRS project access to them and allowing it to share aggregate-level data publicly and operator-level data privately. If this is the case, it’s a clever way to gain collective benefits from private route monitoring, and a good use of ISOC resources. But more transparency about both organizations’ methodologies is needed to know for sure what is occurring.
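The percentages above follow directly from the counts. A short Python check reproduces them, assuming the three categories are the only ones in the 2019 data. It also shows one way the MANRS/bgpstream.com comparison could be expressed, assuming the percentage differences are measured relative to the bgpstream.com count (the baseline is not stated, and `relative_difference` is a hypothetical name).

```python
# Counts of bgpstream.com observations for 2019 so far, as reported in the text.
counts = {"Outage": 4066, "Route leak": 1650, "Possible hijack": 1039}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage share of each category, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


def relative_difference(manrs: int, bgpstream: int) -> float:
    """How far a MANRS monthly count sits above (+) or below (-) bgpstream.com's, in %."""
    return 100 * (manrs - bgpstream) / bgpstream


print(sum(counts.values()))  # 6755
print(shares(counts))  # {'Outage': 60.19, 'Route leak': 24.43, 'Possible hijack': 15.38}
```

With this definition, a month in which MANRS reported 239 incidents against 100 bgpstream.com leaks would be 139% higher. These are illustrative numbers, not the actual monthly counts.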
Who is leaking routes?
As noted above, China Telecom has frequently been portrayed in the Western press as a culprit when it comes to routing incidents. But is this reputation deserved? Table 1 shows the top 10 leaker ASes in the data described above, ranked by total leaks. China Telecom tops the list and has two ASes (AS 4809 and AS 4134) in the top 10. Although large route leak events (in terms of the number of prefixes affected), like the ones involving Allegheny and China Telecom, are troubling and make the news, they appear to be unusual. For example, Allegheny leaked routes in only one month, and China Telecom had large numbers of leaks in two months but otherwise low numbers of route leaks per month. By contrast, other operators have relatively high numbers of route leaks consistently from month to month. Two Bangladesh-based operators (AS 58601 and AS 17494), US-based operator CenturyLink, and a Myanmar-based operator (AS 132167) have the highest median numbers of route leaks per month. Yet we rarely, if ever, see stories about the routing security problems of these less politically charged actors.
Arguably, these differences in how route leaks occur across operators matter. Where an operator’s leaks are concentrated in one or two months, they are likely associated with a single event caused by one or a few specific factors, meaning the underlying cause can potentially be addressed relatively simply. Where leaks are spread more evenly across time, there are multiple events, which increases the likelihood of multiple contributing factors and of more complicated solutions being required.
This helps explain why fixing route leaks and other routing security problems is more difficult than one might think. As we have known for some time about networked forms of governance like routing security:
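The distinction between one-off and persistent leakers comes from looking at the monthly distribution rather than only the total. A minimal Python sketch, assuming per-AS monthly leak counts are available, summarizes each series by its total, its median and the share of leaks falling in its peak month. The 0.5 threshold for calling a profile “concentrated” is an arbitrary assumption, and the two series below are made-up illustrations, not bgpstream.com data.

```python
from statistics import median


def leak_profile(monthly: list[int], threshold: float = 0.5) -> dict:
    """Summarize an AS's monthly route-leak counts."""
    total = sum(monthly)
    peak_share = max(monthly) / total if total else 0.0
    return {
        "total": total,
        "median": median(monthly),
        "peak_share": round(peak_share, 2),
        # Hypothetical rule: most leaks in a single month suggests one event.
        "concentrated": peak_share > threshold,
    }


# Made-up eight-month series with the same total.
bursty = [0, 0, 120, 0, 0, 0, 0, 0]
steady = [14, 16, 15, 13, 17, 15, 14, 16]
print(leak_profile(bursty))  # total 120, median 0.0, concentrated True
print(leak_profile(steady))  # total 120, median 15.0, concentrated False
```

Ranking by total alone would treat these two series identically. The median separates the single burst from the steady, recurring problem.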
“cybersecurity at the individual and system levels is influenced by how the incentives of different actors align. Sometimes individual and group incentives are compatible with both the private and social costs and benefits so that decentralized decisions will be workable and effective to achieve desirable levels of security.” (Asghari, van Eeten and Bauer, 2012)
At one level, fixing routing security is a classic collective action problem: improving overall routing security requires more than one operator to implement best practices like those MANRS promotes. For example, the collectively run Internet Routing Registry (IRR) system is plagued by data governance problems caused by misaligned incentives, high transaction costs, and unmanageable interdependencies. But any solution to the problem is also influenced by individual operator incentives. The complexity and costs of implementing those best practices can vary dramatically depending on a variety of factors (e.g., the number of actors and routes, the frequency of announcements, etc.). Applying Asghari et al.’s thinking, rational operators will make routing security investment decisions based on several relevant factors, including the risks they face, the financial and non-financial consequences of routing security events, the resilience of their networks, etc.
Given this, how can we improve the situation? The primary responsibility for routing security lies with the private network operators who rely on and operate the shared, transnational infrastructure that makes up the global Internet. First, aggregated reporting of routing security incidents, like that provided by MANRS, only helps analysis so much. Researchers need to know which operators are experiencing problems in order to analyze the institutional, economic and other factors affecting why and how they implement solutions. Second, research and press stories driven by geopolitical conflict and national security concerns that equate transnational operators with governments (as with China Telecom) and treat them as adversaries are doing the global Internet a disservice. Getting details correct and substantiating claims with evidence matters. Cybersecurity can be a positive-sum game, where routing security gains in the overall infrastructure make everyone better off. Acknowledging the bigger picture will help advance that objective.