PathRadar

Introduction

Currently, over 80,000 autonomous systems (ASes) on the Internet utilize the Border Gateway Protocol (BGP) to route packets between source and destination ASes. Accurately inferring fine-grained AS routing paths, such as prefix-specific paths and multiple paths between source-destination AS pairs, is critical for the operation and optimization of the Internet. The valley-free principle is commonly employed to characterize these paths, where an AS path typically consists of zero or more customer-to-provider (c2p) links, followed by zero or one peer-to-peer (p2p) link, and concludes with zero or more provider-to-customer (p2c) links.

Existing approaches attempt to infer AS paths by leveraging AS policies or stitching together observed AS paths to form new inferred paths. However, these methods face challenges in terms of accuracy and completeness. Specifically, overly simplistic generalizations of AS policies are inadequate for capturing complex AS routing behaviors, while limited and biased observations hinder the comprehensive reconstruction of large-scale Internet paths. Furthermore, existing methods neglect the inference of valley-free paths, which constitute approximately 10% of all paths.

To address these limitations, we propose PathRadar, a framework designed to infer fine-grained AS paths. PathRadar categorizes ASes into core ASes and shell ASes based on policy complexity and observation adequacy, enabling policy-based and stitching-based methods to leverage their respective strengths. Additionally, it introduces a progressive learning process (PLP) to enhance inference accuracy and completeness. As a result, PathRadar improves the accuracy and completeness of AS-level path inference by 1.5–6.7 times, boosts prefix-specific path inference accuracy by up to 8.9 times, and achieves ~90% accuracy and completeness in inferring non-valley-free paths.

Challenges and Basic Ideas

#(AS pairs) with different #(optimal paths)

#opt paths	#s-d AS pairs	#opt paths	#s-d AS pairs
1	15,743,908 (56.3%)	6	426,218
2	6,096,392	7	269,305
3	2,260,803	8	186,557
4	1,111,899	9	125,795
5	623,551	≥ 10	1,103,783

Major patterns of non valley-free paths

patterns of AS paths	# AS paths	Type
(c2p)* (p2p) (p2p)	1,340,436 (31.2%)	1
(c2p)* (p2p)? (p2c)+ (p2p)	979,192 (22.8%)	1
(c2p)* (p2p) (p2p) (p2c)+	780,424 (18.2%)	2
(c2p)* (p2p) (p2p) (p2p)	177,946 (4.1%)	1
(c2p)* (p2p)? (p2c)+ (p2p) (p2p)	169,302 (3.9%)	1
others	850,227 (19.8%)	-

Complex AS relationships. The p2c and p2p relationships are only the simplest relationships between neighboring ASes, and more complex ones are widely used in practice. Additionally, since ASes often establish their connections in private, it is not easy to deduce the actual types of relationships they use, even for the simplest ones. Therefore, path inference algorithms based solely on error-prone p2p and p2c relationships will inevitably face high error probabilities.
Prefix-specific routing policies. AS relationship is not the only factor that affects how ASes configure their routing policies. An AS sometimes needs to implement prefix-specific policies , for example, when a traffic engineering optimizer decides to shift the traffic destined to one particular prefix through a longer but less loaded path, rather than a cheaper one. As a result, an AS may choose different paths for different prefixes announced by the same destination AS. Therefore, multiple optimal paths often exist between the same source-destination (denoted as s-d) AS pairs (for different prefixes or even the same prefix).
Non valley-free paths. We also find a non-negligible fraction of non valley-free paths in the Internet (~10%). Existing path inference algorithms, either policy-based or stitching-based, rely on valley-freeness to improve accuracy and avoid state explosion, with a complete loss of these paths.
Limited and biased observations. Public available BGP routes and traceroute records provided by various measurement platforms can be viewed as our observations on the Internet paths, whose limitation and bias have been recognized by many studies, not only because the number of VPs we have is much smaller than the scale of the Internet, but also because they are more concentrated in the upper tier of the Internet or in some large ASes. Thus, we have very limited knowledge on the routing behavior of a large number of low-tier ASes. Among all ASes we can observe, more than half are ASes with multiple providers or peer paths, but the VPs cannot observe any path starting from them. It is also known that, many p2p links between low-tier ASes are hidden from our observations, as are the AS paths containing them.
The extraordinary scale of the Internet. There are around 80,000 ASes in the Internet, and the number of our observed AS paths goes well beyond tens of millions. This brings the combinatorial state explosion problem: enumerating all possible paths between an s-d AS pair, derived either from AS routing policies or from observed path segments, becomes an impossible task within limited time and space.

To address these challenges, PathRadar categorizes ASes into core and shell groups, effectively balancing the complex AS policies and sparse observations. Additionally, it introduces PLP to efficiently generate negative samples, mitigating the impact of large-scale Internet on inference. Finally, PathRadar incorporates the analysis of non valley-free paths, facilitating the inference of two primary types of such paths.

Algorithm Design of PathRadar

division of core ASes and shell ASes

overall framework of PathRadar

PathRadar works in four steps as follows.

We divide ASes into two categories, i.e., core ASes whose routing behaviors are complex but well observed, and shell ASes whose behaviors are less observed but tend to be simpler. Through the study of AS degree (corresponding to the complexity of AS routing policies) and the number of stitching points (corresponding to the difficulty of stitching), we determined that this type of division can cover almost all ASes in the Internet.. With this division, we can adjust our preference to policy or stitching when dealing with different ASes, thus making the best use of both.
We generate routing paths towards an AS with an algorithm that starts from this AS and propagates across the AS graph, akin to BGP message propagation. We intentionally allow multiple paths between an s-d AS pair, akin to different prefixes announced by the same AS propagating different paths. This step is realized in a progressive learning process (PLP). PLP has two levels of iterations. In each iteration of the inner loop, a standard path generator first propagates a path of length N (i.e., N-1 hops away from an origin AS) to multiple next hops to generate paths of length N+1. (Noting this step may cause an explosion if we keep all generated paths.) Then these paths are fed into two different discriminators according to the AS classification results in step 1. With a configurable threshold t, the discriminators only accept some of the generated paths, which will be further propagated in the next iteration. Especially, for those accepted but unobserved paths, a small part of them has a higher score. They may actually exist in the Internet but not observed by us, so they are neither treated as positive nor negative samples to avoid the misclassification of discriminators. For other accepted but unobserved paths, they will be added to the training sets of the two discriminators as negative samples. At the end of the inner loop, the discriminators will be trained again using the updated training sets. Then we repeat this loop for several rounds (i.e., outer level iterations), where the two discriminators are updated once in each round. The accepted paths in the last round are output as our inference results. With PLP, we can infer paths between any s-d AS pair.
To further infer prefix-specific paths for different prefixes originated by the same AS, for each prefix, a few observed prefix-specific path segments are fed into our generator, and one best path from each source AS to this prefix will be chosen by the discriminators.
At last, based on our analysis of non valley-free paths observed in the Internet, we use stitching method to further infer two special kinds of such paths.

Publications

Which way to go? Inferring Fine-grained AS paths with PathRadar

Zitong Jin, Xingang Shi, Qiang Ma, Letong Sun, Zhiliang Wang, Xia Yin, Jianping Wu from Zhongguancun Laborarty and Tsinghua University

PathRadar employs policy-based and stitching-based methods to infer AS paths with distinct observational features, and utilizes PLP to enhance inference precision. Consequently, it achieves comprehensive and accurate inference of AS-level paths, prefix-specific paths, and non-valley-free paths. You can learn more about PathRadar in INFOCOM 2025.

[PDF]

Code

We release the following code on Github:

PathRadar

You can also download the zip file here.

Presentation

TBA.

Authors

Zitong Jin, Zhongguancun Laborarty, jinzt@mail.zgclab.edu.cn
Xingang Shi, Tsinghua University, shixg@cernet.edu.cn
Qiang Ma, Zhongguancun Laborarty, maq@mail.zgclab.edu.cn
Letong Sun, Tsinghua University, slt20@mails.tsinghua.edu.cn
Zhiliang Wang, Tsinghua University, wzl@cernet.edu.cn
Xia Yin, Tsinghua University, yxia@tsinghua.edu.cn
Jianping Wu, Tsinghua University, jianping@cernet.edu.cn