BEYOND TECHNICAL SOLUTIONS: UNDERSTANDING THE ROLE OF GOVERNANCE STRUCTURES IN INTERNET ROUTING SECURITY
The research problem
Routing is fundamental to how the Internet works. Routing protocols direct the movement of packets between your computer and any other computers it is communicating with. The Internet’s routing protocol, Border Gateway Protocol (BGP), is known to be susceptible to errors and attacks. These problems can literally knock entire networks off the Internet or divert traffic to an unintended party.
Most research on Internet routing security concentrates on technical solutions (new standards and protocols). But what if the obstacles to improved routing security are not just technical? What if the susceptibility of networks to malicious route hijacks and path manipulations has as much to do with the way organizations implement routing policies and technologies as with the technical standards and protocols per se? What if a new technology designed to “solve” routing security problems creates new, unanticipated implementation and cooperation issues that could undermine many of the theoretical security gains of the better design? Despite the role of socio-economic factors in security, studies of routing security are not adequately supported by social science studies of the actual behavior of network operators.
This project is based on the premise that organizational and institutional factors – known as governance structures in institutional economics – are as important to Internet routing security as technological design. Internet routing involves decentralized decision making among tens of thousands of autonomous network operators. In this environment, an individual operator’s decisions regarding implementation, organization and monitoring of routing policies powerfully affect the adoption and performance of security technologies.
The research will investigate the institutional and organizational arrangements among ISPs that affect routing. The study will also test and extend social science theories about the performance of complex, networked forms of governance.
This research is supported by the U.S. National Science Foundation, Award Number SES-1422629. Begin date: August 15, 2014; end date July 30, 2016.
A social science look at routing security
While the technical aspects of routing insecurity are well-researched, the socio-economic influences on routing security are poorly understood.
How does routing work?
Routing network traffic from a source to a destination on the Internet requires cooperation between multiple distinct autonomous systems. An autonomous system (AS) is the basic unit of network operations and routing policy. Prefixes are blocks of contiguous IP addresses. For example, the notation 128.100.0.0/16 represents all the IP addresses starting with 128.100, a 16-bit prefix. Routing between the tens of thousands of ASes on the Internet is accomplished using the Border Gateway Protocol (BGP). BGP structures the way ASes communicate information about how to reach IP prefixes. BGP messages (sometimes referred to as announcements) include the path chosen by the AS to the prefix, the prefix itself and additional headers that allow for traffic engineering. By announcing a path to a neighboring AS, an AS implicitly agrees to carry traffic for the neighbor along the advertised path.
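The prefix notation can be checked with a short Python sketch using the standard ipaddress module. The helper name in_prefix and the sample addresses are ours, chosen only for illustration.

```python
import ipaddress

# 128.100.0.0/16: the first 16 bits (128.100) are fixed, the remaining 16 vary.
prefix = ipaddress.ip_network("128.100.0.0/16")

def in_prefix(address, network=prefix):
    """Return True if the given IP address falls inside the prefix."""
    return ipaddress.ip_address(address) in network

print(prefix.prefixlen)           # length of the prefix in bits
print(prefix.num_addresses)       # number of addresses the prefix covers
print(in_prefix("128.100.37.5"))  # an address that starts with 128.100
print(in_prefix("128.101.0.1"))   # an address that does not
```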
Each AS running BGP will choose a single path to the destination prefix based on the policies set by the network manager. The choice will be based on the properties of the path, such as its length (the number of ASes it must pass through) and the business relationship between the network and the neighboring AS. Routing policy also dictates to which neighbors an AS will announce its chosen path. Networks tend to avoid transiting traffic that does not generate revenue (the valley-free assumption). Because BGP empowers ASes to customize routing policies based on their organizational needs, routing policies vary from one AS to another, largely due to socio-economic concerns.
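A minimal sketch of such a policy, under simplifying assumptions, might look as follows. The ranking of customer over peer over provider routes, the Route record and the function names are illustrative choices of ours, not a description of any operator's actual configuration; real BGP decision processes involve many more attributes. The AS numbers and prefix come from ranges reserved for documentation.

```python
from dataclasses import dataclass

# Assumed preference order: customer routes earn revenue, peer routes are
# settlement-free, provider routes cost money. Lower rank is preferred.
RELATIONSHIP_RANK = {"customer": 0, "peer": 1, "provider": 2}

@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple      # ASes the route passes through
    learned_from: str   # "customer", "peer" or "provider"

def select_best(routes):
    """Choose one route: prefer the better business relationship, then the shorter AS path."""
    return min(routes, key=lambda r: (RELATIONSHIP_RANK[r.learned_from], len(r.as_path)))

def should_export(route, neighbor_relationship):
    """Valley-free export rule: customer routes are announced to every neighbor;
    peer and provider routes are announced only to customers."""
    return route.learned_from == "customer" or neighbor_relationship == "customer"

candidates = [
    Route("192.0.2.0/24", (64500, 64510), "provider"),
    Route("192.0.2.0/24", (64501, 64502, 64510), "customer"),
]
best = select_best(candidates)
```

In this sketch the longer customer route is preferred over the shorter provider route, which illustrates how business relationships, rather than path length alone, can drive routing decisions.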
What are the causes of routing security incidents?
BGP was designed when the Internet was made up of a smaller number of ASes with strong social and institutional incentives to cooperate (e.g., university research networks). With the Internet’s commercialization and global adoption, BGP poses greater risks of routing incidents caused by mistaken configurations or by deliberate attacks. Routing incidents may be divided into two main categories:
- Prefix hijacks. These are incidents in which an IP prefix is announced (originated) by an AS other than the one intended by the AS to which the IP prefix is allocated.
- Path Manipulation. These are incidents in which an AS path is altered by an unauthorized party, either intentionally or unintentionally.
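The first category can be illustrated with a simplified check that compares the origin AS of an announcement with the AS expected to originate the prefix. The allocation table, the labels and the function name below are hypothetical, and the sketch only handles IPv4 or IPv6 prefixes of the same family; operational detection systems use far richer data and heuristics.

```python
import ipaddress

# Hypothetical allocation table: prefix -> AS expected to originate it.
# Prefixes and AS numbers are taken from ranges reserved for documentation.
EXPECTED_ORIGINS = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
}

def classify_origin(prefix, origin_as):
    """Label an announcement by comparing its origin AS with the expected one."""
    announced = ipaddress.ip_network(prefix)
    for allocated, expected in EXPECTED_ORIGINS.items():
        same_family = announced.version == allocated.version
        if same_family and announced.subnet_of(allocated):
            if origin_as == expected:
                return "expected"
            # A more specific prefix from an unexpected origin is a sub-prefix hijack.
            return "hijack" if announced == allocated else "sub-prefix hijack"
    return "unknown"  # no allocation record covers this prefix

print(classify_origin("192.0.2.0/24", 64496))
print(classify_origin("192.0.2.128/25", 64511))
```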
Notable routing incidents have been reported publicly for almost two decades. In 2008, Pakistan Telecom (AS 17557), fulfilling a request of its government to block access to certain videos, inadvertently hijacked prefixes allocated to YouTube (AS 36561), making the popular video site inaccessible worldwide for two hours for most users. More recently, in 2010, China Telecom (AS 23724) hijacked more than 50,000 prefixes allocated to ASes worldwide for around 18 minutes, including several operated by U.S. government agencies. From these outages and diversions it is clear that routing insecurity not only creates substantial risks of economic loss but also has the potential to create political and military risks.
The imperfect data about routing incidents we already have suggest that routing incidents are not uniformly distributed across ISPs. Certain operators have fewer incidents than others. The explanation for this divergence is not likely to be the use of different technologies, as the new routing security technologies have not been widely implemented. It is more plausible to attribute differences in security performance to socio-economic factors, such as the different policies and practices of network operators.
New security technologies such as RPKI and BGPSEC will also raise implementation and institutional issues that could dramatically affect their performance. In theory, RPKI and BGPSEC are more secure than their predecessors, but implementation in the real world poses problems that could undermine their effectiveness. Many network operators have expressed fears that a hierarchical RPKI rooted in the address allocation authorities will involve relinquishing too much control over their operations. These fears could deter adoption, or lead to complex contractual and legal negotiations regarding certificate revocation policies and practices. Computer science research has also shown that RPKI may enable new kinds of attacks that in worst-case scenarios could have devastating effects.
The Internet comprises hundreds of thousands of distinct organizations with varying incentives and operational goals. Routing is a decentralized, cooperative process in which network operators exchange information and use contracts or other kinds of voluntary agreements based on common technical standards to exchange traffic. In Internet routing, institutional and regulatory authority is also decentralized; while there is global connectivity, there are approximately 200 separate national legal jurisdictions and no common, hierarchical global regulatory authority over all the organizations that comprise the Internet and their routing practices. Routing security, therefore, must be achieved through a bottom-up process of self-governance. As a result, deploying secure Internet routing is much more challenging than just installing a piece of hardware or performing a software upgrade. It requires understanding the socio-economic factors that influence operators’ cooperative practices and technology implementation decisions. This study will use an innovative combination of institutional economics and network analysis to isolate and understand the governance structures underlying Internet routing, and attempt to determine which governance structures lead to more or fewer routing security incidents.
Our research method requires quantitative measures of routing incidents over time (the dependent variable) and a set of independent variables that reflect variations in the governance structures among ASes. Below we describe how we will define these variables and gather data for them.
Dependent variable: a longitudinal view of routing incidents
Numerous systems and heuristics have been proposed to monitor Internet routing and specifically identify incidents. At a high level, routing incident systems work by identifying differences between an AS’s intended and observed BGP announcements collected from numerous route announcement monitoring projects (e.g., BGPmon, RouteViews, RIPE-RIS, Abilene, and Packet Clearing House) and other observation points on the Internet. These systems continue to evolve to address shortcomings such as limited views of the Internet, and over- or under-estimation of routing incidents.
For example, the Cyclops system operated by UCLA’s Internet Research Lab generates “anomalies” data on a daily basis, including transient announcement of prefixes, unexpected removal of prefix announcements, and announcement of bogus or incorrectly configured prefixes. Most recently, researchers at Tsinghua University and Tsinghua National Laboratory for Information Science and Technology have developed the Argus system. Recognized by the research community for its contributions, Argus addresses weaknesses observed in previous systems, including distinguishing between types of prefix hijacks and eliminating false negatives. Given the absence of unified criteria for quantifying the occurrence of incidents, and of any comprehensive study of the prevalence and impact of routing incidents over time, we intend to collect and archive routing incident data from the Argus system with the AS as the unit of analysis. Our project will be designed to allow us to monitor ongoing research in this area and adapt or update our measures of incidents accordingly.
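As a rough illustration of what a longitudinal, AS-level measure could look like, the sketch below aggregates incident records into counts per AS per month. The record layout and the sample values are invented for illustration and do not reflect the actual Argus data format.

```python
from collections import Counter
from datetime import date

# Hypothetical incident records: (date observed, affected AS, incident type).
# This is a stand-in for illustration, not the Argus data format.
incidents = [
    (date(2015, 1, 3), 64496, "prefix_hijack"),
    (date(2015, 1, 20), 64496, "path_manipulation"),
    (date(2015, 2, 11), 64497, "prefix_hijack"),
    (date(2015, 2, 15), 64496, "prefix_hijack"),
]

def incidents_per_as_month(records):
    """Count incidents for each (AS, year, month) combination."""
    counts = Counter()
    for day, asn, _kind in records:
        counts[(asn, day.year, day.month)] += 1
    return counts

panel = incidents_per_as_month(incidents)
for (asn, year, month), count in sorted(panel.items()):
    print(asn, year, month, count)
```

Keeping the AS and the time period together in the key makes it straightforward to join these counts with other AS-level variables measured over the same periods.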
Independent variables: macro, meso and micro
As the diagram of our analytical model shows, data will be collected and analyzed at the macro, meso and micro levels. The first step is to detect variations in the organizational structures among ASes at the meso level. We expect to find that distinct configurations at the meso level will vary in their susceptibility to routing incidents. Using these observations as a guide for further inquiry, we will identify institutional arrangements at the macro level and use interviews to explore the different factors shaping the incentives and decisions of operators at the micro level.
Meso-level research will involve analysis of the Internet’s routing topology using graph data generated by computer science researchers. For instance, data generated by the Center for Applied Internet Data Analysis (CAIDA) at the University of California, San Diego provides longitudinal snapshots of AS routing relationships, including whether AS pairs have peer, provider or customer relationships. Using this information we will measure the structural embeddedness of an AS within the routing topology graph. In the context of routing, structural embeddedness (SE) refers to the extent to which the mutual routing partners of interconnected ASes are also connected to one another. The role and importance of structural embeddedness within social, economic and governance networks is well known. In other policy domains, theory suggests that SE is a precondition for effective network governance, with higher levels of SE helping to ensure the spread of information and permitting actors to scrutinize one another effectively, thereby invoking trust and minimizing uncertainty. Our method will allow us to explore how embeddedness of specific ASes within the graph may evolve over time, highlighting different types of networked governance arrangements and different organizational policies and incentives.
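One simple way to put a number on this idea is the share of an AS's routing partners that are also connected to one another (the local clustering coefficient from network analysis). The sketch below uses that measure as an assumed stand-in; the project's actual operationalization of SE may differ, and the AS links shown are invented.

```python
from itertools import combinations

# Hypothetical undirected AS links (AS numbers reserved for documentation).
links = [(64496, 64497), (64496, 64498), (64497, 64498), (64496, 64499)]

def build_graph(edges):
    """Build an adjacency map: AS -> set of neighboring ASes."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def embeddedness(graph, asn):
    """Fraction of pairs of an AS's neighbors that are themselves linked.
    Returns 0.0 when the AS has fewer than two neighbors."""
    neighbors = graph.get(asn, set())
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(sorted(neighbors), 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)

graph = build_graph(links)
for asn in sorted(graph):
    print(asn, round(embeddedness(graph, asn), 3))
```

Computing this value for each AS on successive topology snapshots would give one possible longitudinal series of embeddedness scores.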
Macro-level research will track the evolution of RPKI and IRR implementation, the diffusion of RPKI use by ISPs, and IRR usage. It will involve (1) monitoring the deployment of RPKI by the Regional Internet Registries and (2) understanding the usage of IRRs by ISPs, including analyzing the organizational, contractual or policy elements of the RPKI and IRR systems and the differences between implementations. This research will allow us to find out whether macro-level institutional structures such as RPKI or IRR use are associated with a reduction in incidents among the operators who use them. It will also allow us to determine whether particular institutional arrangements governing RPKI or IRRs facilitate or impede their widespread adoption, and the adoption of related technologies like BGPSEC.
Micro-level research will involve both qualitative interviews with operators and the collection of qualitative data about their incentives to adopt and use IRRs and RPKI. These interviews will elicit their policies and views concerning RPKI diffusion, IRR use and any other methods of protecting routing security. They will also provide a reality check on our explanations of some of the patterns observed in the meso-level analysis.
To summarize, meso-level analyses will allow us to correlate levels of structural embeddedness in the Internet’s routing topology with the number of incidents. Both types of data (SE and incidents) will be available longitudinally, allowing us to assess trends. Corresponding macro- and micro-level analysis will supplement this more general form of analysis, providing a reality check and supplying concrete cases and examples of institutional arrangements and operator incentives and practices. In doing so, we will be able to identify distinct governance structures which can be correlated with routing incident data, to compare their routing security performance. The research will allow us to test at least three propositions in quantitative terms:
Proposition 1. Distinct levels of structural embeddedness among ASes are correlated with variation in the number and severity of routing incidents.
Proposition 2. ASes with low rates of routing incidents become more embedded over time.
Proposition 3. The diffusion of RPKI reduces the number of routing incidents.
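To make the kind of test implied by Proposition 1 concrete, the sketch below computes a Pearson correlation between embeddedness scores and incident counts. The numbers are invented purely to show the calculation and are not project results; the choice of Pearson correlation is also an assumption, and a real analysis would need to consider other estimators and controls.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)

# Made-up values for five hypothetical ASes (not real measurements).
embeddedness_scores = [0.1, 0.3, 0.5, 0.7, 0.9]
incident_counts = [9, 7, 6, 3, 1]

r = pearson(embeddedness_scores, incident_counts)
print(round(r, 3))
```

A coefficient of this kind only indicates association; it cannot by itself establish that embeddedness causes fewer or more incidents.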
The method will also allow us to explore a number of other questions of interest, such as:
- Are ASes with similar levels of embeddedness more likely to develop routing relationships?
- Are ASes with relatively lower/higher numbers of routing incidents implementing particular systems (e.g., RPKI, IRR)?
- Are there correlations between SE and the type or impact of routing incidents?
This page will host research working papers and publications developed by the project team. Keep an eye on the IGP blog for developments, research outcomes and publications.
Conference and Other Academic Presentations
Presentation at Ivan Allen College ICT Policy Speaker Series (slides)
Poster “Beyond technical solutions: Understanding the role of governance structures in Internet routing security” presented at the 2015 Research Conference on Communications, Information and Internet Policy in Washington DC. (poster)
Paper “Beyond technical solutions: Understanding the role of governance structures in Internet routing security” presented at the 10th Annual GigaNet Symposium held in conjunction with the 2015 Internet Governance Forum in João Pessoa, Brazil. (paper) (video)