Estimating Residential Broadband Capacity using Big Data from M-Lab
Authors: Xiaohong Deng, Yun Feng, Hassan Habibi Gharakheili, Vijay Sivaraman
Knowing residential broadband capacity profiles across a population is of interest to both consumers and regulators who want to compare or audit the performance of various broadband service offerings. Unfortunately, extracting broadband capacity from speed tests in public datasets like M-Lab is challenging, because tests are indexed by client IP address, which can be dynamic and/or obfuscated by NAT, and because variable network conditions can affect measurements. This paper presents the first systematic effort to isolate households and extract their broadband capacity using 63 million speed test measurements recorded over a 12-month period in the M-Lab dataset. We first identify a key parameter, the correlation between measured speed and congestion count for a specific client IP address, as an indicator of whether the IP address represents a single house, or a plurality of houses that may be dynamically sharing addresses or be aggregated behind a NAT. We then validate our approach by comparing to ground truth taken from a few known houses, and at larger scale by checking internal consistency across ISPs and across months. Lastly, we present results that isolate households and estimate their broadband capacity based on measured data, and additionally reveal insights into the prevalence of NAT and variations in service capacity tiers across ISPs.
Related Publications
M-Lab: User Initiated Internet Data for the Research Community
Phillipa Gill, Christophe Diot, Lai Yi Ohlsen, Matt Mathis, and Stephen Soltesz
Measurement Lab (M-Lab) is an open, distributed server platform on which researchers have deployed measurement tools. Its mission is to measure the Internet, save the data, and make it universally accessible and useful. This paper serves as an update on the M-Lab platform 10+ years after its initial introduction to the research community. Here, we detail the current state of the M-Lab distributed platform, highlight existing measurements/data available on the platform, and describe opportunities for further engagement between the networking research community and the platform.
The importance of contextualization of crowdsourced active speed test measurements
Udit Paul, Jiamo Liu, Mengyang Gu, Arpit Gupta, Elizabeth Belding
Crowdsourced speed test measurements, such as those by Ookla® and Measurement Lab (M-Lab), offer a critical view of network access and performance from the user's perspective. However, we argue that taking these measurements at surface value is problematic. It is essential to contextualize these measurements to understand better what the attained upload and download speeds truly measure. To this end, we develop a novel Broadband Subscription Tier (BST) methodology that associates a speed test data point with a residential broadband subscription plan. Our evaluation of this methodology with the FCC's MBA dataset shows over 96% accuracy. We augment approximately 1.5M Ookla and M-Lab speed test measurements from four major U.S. cities with the BST methodology. We show that many low-speed data points are attributable to lower-tier subscriptions and not necessarily poor access. Then, for a subset of the measurement sample (80k data points), we quantify the impact of access link type (WiFi or wired), WiFi spectrum band and RSSI (if applicable), and device memory on speed test performance. Interestingly, we observe that measurement time of day only marginally affects the reported speeds. Finally, we show that the median throughput reported by Ookla speed tests can be up to two times greater than M-Lab measurements for the same subscription tier, city, and ISP due to M-Lab's employment of different measurement methodologies. Based on our results, we put forward a set of recommendations for both speed test vendors and the FCC to contextualize speed test data points and correctly interpret measured performance.
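The core idea of tier association, mapping a raw speed test result onto an advertised subscription plan so that a "slow" measurement can be attributed to a low tier rather than poor access, can be illustrated with a toy matcher. This is a simplified stand-in, not the paper's BST methodology; the tier list and function below are hypothetical.

```python
def nearest_tier(measured_mbps, advertised_tiers_mbps):
    """Associate a measured download speed with the closest advertised
    subscription tier (all values in Mbps).

    Illustrative simplification of the tier-association idea: the
    published BST methodology uses richer signals than a simple
    nearest-value match.
    """
    return min(advertised_tiers_mbps, key=lambda tier: abs(tier - measured_mbps))
```

Under this toy matcher, a 93 Mbps measurement on an ISP selling 25/100/200/940 Mbps plans would be attributed to the 100 Mbps tier, i.e. a near-capacity result on a mid-tier plan rather than evidence of degraded access.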
The Ukrainian Internet under attack: an NDT perspective
Akshath Jain, Deepayan Patra, Peijing Xu, Justine Sherry, Phillipa Gill
On February 24, 2022, Russia began a large-scale invasion of Ukraine, the first widespread conflict in a country with high levels of network penetration. Because the Internet was designed with resilience under warfare in mind, the war in Ukraine offers the networking community a unique opportunity to evaluate whether and to what extent this design goal has been realized. We provide an early glimpse at Ukrainian network resilience over 54 days of war using data from Measurement Lab's Network Diagnostic Tool (NDT). We find that NDT users' network performance did indeed degrade - e.g. with average packet loss rates increasing by as much as 500% relative to pre-wartime baselines in some regions - and that the intensity of the degradation correlated with the presence of Russian troops in the region. Performance degradation also correlated with changes in traceroute paths; we observed an increase in path diversity and significant changes to routing decisions at Ukrainian border Autonomous Systems (ASes) post-invasion. Overall, the use of diverse and changing paths speaks to the resilience of the Internet's underlying routing algorithms, while the correlated degradation in performance highlights a need for continued efforts to ensure usability and stability during war.
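The degradation figures quoted above are increases relative to a pre-wartime baseline, a calculation worth making explicit since "a 500% increase" in packet loss means six times the baseline rate, not a 500% absolute loss rate. A minimal sketch of that arithmetic (the function name is our own):

```python
def pct_increase(baseline, observed):
    """Percentage increase of an observed metric over its baseline.

    E.g. packet loss rising from a 1% pre-wartime baseline to 6%
    during the conflict is a 500% increase (6x the baseline).
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (observed - baseline) / baseline
```

This baseline-relative framing is what allows regional comparisons even when regions start from very different absolute loss rates.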