Bounding the bias of tree-like sampling in IP topologies
-
1.
Department of Mathematics, Bar-Ilan University, Ramat-Gan 52900
-
2.
School of Electrical Engineering, Tel Aviv University, Ramat Aviv 69978
-
Received:
01 September 2007
Revised:
01 January 2008
-
-
Primary: 90B15, 90B18.
-
-
It is widely believed that the Internet's AS-graph degree
distribution obeys a power-law form. However, it was
recently argued that since Internet data is collected
in a tree-like fashion,
it only produces a sample of the degree distribution, and this sample
may be biased.
This argument was backed by simulation data and
mathematical analysis, which demonstrated that under certain
conditions a tree sampling procedure can produce an artificial
power-law in the degree distribution.
Thus, although the observed
degree distribution of the AS-graph follows a power-law, this
phenomenon may be an artifact of the sampling process.
In this work we provide some evidence to the contrary. We show, by
analysis and simulation, that when the underlying graph degree
distribution obeys a power-law with an exponent $\gamma>2$, a
tree-like sampling process produces a negligible bias in the sampled
degree distribution. Furthermore, recent data collected from the
DIMES project, which is not based on single source sampling, indicates that the
Internet indeed obeys a power-law degree distribution
with an exponent $\gamma>2$. Combining this empirical data with our
simulation of traceroute experiments on
DIMES-measured AS-graph as the underlying graph, and with our
analysis, we conclude that the bias in the degree distribution
calculated from BGP data is negligible.
Citation: Reuven Cohen, Mira Gonen, Avishai Wool. Bounding the bias of tree-like sampling in IP topologies[J]. Networks and Heterogeneous Media, 2008, 3(2): 323-332. doi: 10.3934/nhm.2008.3.323
-
Abstract
It is widely believed that the Internet's AS-graph degree
distribution obeys a power-law form. However, it was
recently argued that since Internet data is collected
in a tree-like fashion,
it only produces a sample of the degree distribution, and this sample
may be biased.
This argument was backed by simulation data and
mathematical analysis, which demonstrated that under certain
conditions a tree sampling procedure can produce an artificial
power-law in the degree distribution.
Thus, although the observed
degree distribution of the AS-graph follows a power-law, this
phenomenon may be an artifact of the sampling process.
In this work we provide some evidence to the contrary. We show, by
analysis and simulation, that when the underlying graph degree
distribution obeys a power-law with an exponent $\gamma>2$, a
tree-like sampling process produces a negligible bias in the sampled
degree distribution. Furthermore, recent data collected from the
DIMES project, which is not based on single source sampling, indicates that the
Internet indeed obeys a power-law degree distribution
with an exponent $\gamma>2$. Combining this empirical data with our
simulation of traceroute experiments on
DIMES-measured AS-graph as the underlying graph, and with our
analysis, we conclude that the bias in the degree distribution
calculated from BGP data is negligible.
-
-
-
-