Data
Overview
Measurement data from many experiments hosted on M-Lab are processed via the ETL pipeline and published in two forms:
- Archival Data
  - M-Lab publishes raw output from many measurement tests on Google Cloud Storage as file archives.
  - See M-Lab Archival Data documentation for more information.
- Google BigQuery
  - M-Lab parses data for a subset of tests and publishes the data on BigQuery so that users can run SQL queries on the data.
  - See M-Lab BigQuery QuickStart for more information.
Some M-Lab hosted tests do not use our ETL pipeline. Data for these tests are published independently by the test developers.
There is typically at least a 24-hour delay between data collection and data publication. Below we provide links to data for our Current Tests and archival data from Inactive or Retired Tests. Additionally, we list data from Current M-Lab Core Services as well as Retired M-Lab Core Services.
Measurement Data (Active Tests)
- Network Diagnostic Tool (NDT) measures characteristics of a TCP connection under heavy load.
  - NDT data is processed by the M-Lab ETL Pipeline.
  - More technical information is available on GitHub.
- Neubot measured the Internet to gather data useful for studying broadband performance, network neutrality, and Internet censorship.
  - More information is available at Nexa Center and GitHub.
- Reverse Traceroute measures the network path back to a user from selected network endpoints, and provides a rich source of information on network routing and topology.
  - Reverse Traceroute data is not processed by the M-Lab ETL Pipeline.
  - More information is available at Reverse Traceroute.
- Wehe uses your device to exchange Internet traffic recorded from real, popular apps like YouTube and Spotify, and attempts to tell you whether your ISP is giving different performance to an app's network traffic.
  - More information is available from the Wehe website and GitHub.
- The IP Route Survey (IPRS) is a continuous survey of IP-level routing across the Internet.
  - IPRS data is not processed by the M-Lab ETL Pipeline.
  - More information is available from the IPRS home page.
Current M-Lab Core Services and Platform Data
- Collects packet headers for all incoming TCP flows and saves each stream of packet header captures into a per-stream .pcap file.
  - More information is available on GitHub.
  - Packet Headers Raw Data
- Collects statistics about the TCP connections running on the M-Lab platform using tcp-info.
  - More information is available on GitHub.
- M-Lab uses the Scamper traceroute tool from CAIDA to collect network path (traceroute) data for connections made to the M-Lab platform.
  - More information is available from CAIDA.
- M-Lab Utilization Telemetry Data
  - Since June 2016, M-Lab has collected high-resolution switch telemetry for each M-Lab server and site uplink and published it in the utilization dataset.
  - The blog post announcing this dataset provides more information about the utilization dataset.
Historical Data Sets (Inactive/Retired Tests)
- BISmark measures Internet service provider (ISP) performance and traffic inside home networks.
  - BISmark data is not processed by the M-Lab ETL Pipeline.
  - More information is available on the Project BISmark website and on the Project BISmark Open Development Portal.
- Glasnost detected prioritization or censorship of network traffic.
- MobiPerf is an open source application for measuring network performance on mobile platforms.
  - MobiPerf data is not processed by the M-Lab ETL Pipeline.
  - More information is available on the MobiPerf website.
- OONI measures censorship, surveillance, and traffic manipulation on the Internet.
  - OONI data is not processed by the M-Lab ETL Pipeline.
  - More information is available at OONI.
- Pathload2 measured the available bandwidth of an Internet connection.
  - More information is available at https://code.google.com/p/pathload2-gatech/.
- The SamKnows performance testing platform is used by the USA's Federal Communications Commission (FCC), European Commission, UK government (Ofcom), Brazilian government (Anatel), Singapore's IDA, and other government-backed studies worldwide.
  - SamKnows infrastructure includes off-net test servers hosted by M-Lab, and the M-Lab and SamKnows teams coordinate regularly to support the various regulatory reporting periods of data collection conducted by SamKnows.
  - SamKnows data is not processed by the M-Lab ETL Pipeline.
  - More information is available at the SamKnows website.
- ShaperProbe detected prioritization of network traffic.
- WindRider attempted to detect whether your mobile provider was performing application- or service-specific differentiation.
Retired M-Lab Core Services
- Paris Traceroute maps network topology between two points on the Internet.
  - Paris Traceroute data is processed by the M-Lab ETL Pipeline.
  - More information is available at Paris Traceroute.
  - Paris Traceroute Raw Data - Paris Traceroute BigQuery Dataset
- SideStream collects TCP state information about completed TCP connections on a system.
  - SideStream data is processed by the M-Lab ETL Pipeline.
  - More information is available on GitHub.
M-Lab Data Documentation
Here we document how to work with M-Lab data, covering some of the most common topics, from basic to advanced use. If you have questions beyond what is covered here, please contact us.
Querying BigQuery (Basic)
The links below provide the basics of querying M-Lab data.
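As a starting point, here is a minimal sketch of building a preview query in Python. The table name `measurement-lab.ndt.download` is taken from the citation example on this page and may not match the tables currently published. The helper name `build_preview_query` is hypothetical, and no column names are assumed.

```python
import re

# Table name copied from the BibTeX example on this page; it may be outdated.
DEFAULT_TABLE = "measurement-lab.ndt.download"

# Accept only identifiers of the form project.dataset.table.
_TABLE_RE = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def build_preview_query(table: str = DEFAULT_TABLE, limit: int = 10) -> str:
    """Return a SQL string that previews a few rows of `table`.

    Hypothetical helper: it only builds the query text and does not
    contact BigQuery.
    """
    if not _TABLE_RE.match(table):
        raise ValueError(f"unexpected table identifier: {table!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    # SELECT * may scan a large amount of data; narrow the column list
    # once you know the schema of the table you are querying.
    return f"SELECT *\nFROM `{table}`\nLIMIT {int(limit)}"


if __name__ == "__main__":
    print(build_preview_query())
```

The resulting string can be pasted into the BigQuery console, or passed to `Client.query` from the `google-cloud-bigquery` package if you have it installed and have credentials configured.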
Querying BigQuery (Advanced)
For researchers and others interested in advanced querying techniques, we provide guidance on common use cases in the advanced BigQuery topics below.
Accessing Raw Data via GCS
Advanced users may also be interested in obtaining raw M-Lab test data for detailed analyses. For example, TCP packet captures are collected for each NDT test and are available only in M-Lab's raw data archives.
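To illustrate how an archive location might be addressed, the sketch below builds a `gs://` prefix in Python. The bucket name `archive-measurement-lab` comes from the citation example on this page. The `YYYY/MM/DD` sub-path and the helper name `archive_prefix` are assumptions made for illustration, so check the archival data documentation for the actual layout.

```python
from datetime import date
from typing import Optional

# Bucket name copied from the citation example on this page.
ARCHIVE_BUCKET = "archive-measurement-lab"


def archive_prefix(test: str, day: Optional[date] = None) -> str:
    """Return a gs:// prefix for a test's raw archives (hypothetical helper)."""
    if not test or "/" in test:
        raise ValueError(f"unexpected test name: {test!r}")
    prefix = f"gs://{ARCHIVE_BUCKET}/{test}"
    if day is not None:
        # ASSUMPTION: archives are grouped by YYYY/MM/DD under the test name.
        prefix += f"/{day:%Y/%m/%d}"
    return prefix


if __name__ == "__main__":
    print(archive_prefix("ndt"))
    print(archive_prefix("ndt", date(2015, 12, 21)))
```

A prefix built this way could then be passed to a tool such as `gsutil ls` to list the objects stored under it.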
Querying the DISCO Switch Dataset
Analyses
Data License and Citing M-Lab Data
All data collected by M-Lab tests are available to the public without restriction under a No Rights Reserved Creative Commons Zero Waiver.
Please cite M-Lab data sets as follows:
The M-Lab <test name> Data Set, <date range used>. <M-Lab test URL>
For example:
The M-Lab NDT Data Set, 2009-02-11–2015-12-21. https://measurementlab.net/tests/ndt
or, in BibTeX format:
@misc{mlab,
author="{Measurement Lab}",
title="The {M}-{L}ab {NDT} Data Set",
year="(2009-02-11 -- 2015-12-21)",
howpublished="\url{https://measurementlab.net/tests/ndt}",
comment="Depending on whether you used viz.measurementlab.net, BigQuery, or the raw data, please use one of the following notes:",
note="BigQuery table {\tt measurement-lab.ndt.download}",
note1="Google Cloud Storage {\tt gs://archive-measurement-lab/ndt}",
note2="Data visualization system \url{https://viz.measurementlab.net}",
}
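If you cite several data sets, the citation format above can be generated programmatically. The following is a minimal sketch in Python. The function names `format_citation` and `format_bibtex` are hypothetical, the plain-text output follows the template given above (with a comma after "Data Set"), and the BibTeX output keeps only a single `note` field.

```python
from datetime import date


def format_citation(test_name: str, start: date, end: date, url: str) -> str:
    """Plain-text citation following the template on this page."""
    # An en dash (U+2013) separates the two ISO dates, as in the example above.
    date_range = f"{start.isoformat()}\u2013{end.isoformat()}"
    return f"The M-Lab {test_name} Data Set, {date_range}. {url}"


def format_bibtex(test_name: str, start: date, end: date, url: str,
                  note: str, key: str = "mlab") -> str:
    """BibTeX entry modeled on the example above, with one note field."""
    return "\n".join([
        f"@misc{{{key},",
        '  author="{Measurement Lab}",',
        f'  title="The {{M}}-{{L}}ab {{{test_name}}} Data Set",',
        f'  year="({start.isoformat()} -- {end.isoformat()})",',
        f'  howpublished="\\url{{{url}}}",',
        f'  note="{note}",',
        "}",
    ])


if __name__ == "__main__":
    args = ("NDT", date(2009, 2, 11), date(2015, 12, 21),
            "https://measurementlab.net/tests/ndt")
    print(format_citation(*args))
    print(format_bibtex(*args, note="BigQuery table {\\tt measurement-lab.ndt.download}"))
```

Pass the note that matches the access method you used (BigQuery, Google Cloud Storage, or the visualization site), as listed in the example entry.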