How It Works
How WiFi Monitor measures network performance, and why each measurement works the way it does. This is a living document — update it when probe behavior changes or when questions come up about what the monitor captures.
We publish this in full because the credibility of any network-quality tool comes from its methodology being inspectable. If you can't see what we're measuring and why, you have no basis for trusting the numbers. Cloudflare, the IETF responsiveness draft, and the FCC Measuring Broadband America program all publish their methodologies for the same reason. Our framework is closer to FCC MBA / SamKnows than to Speedtest: characterize a link over minutes-to-hours of multi-protocol probing rather than a 30-second snapshot.
Overview
WiFi Monitor measures a network connection at every layer of the stack, all writing to a single timestamped CSV. The goal isn't "how many probes" — it's coverage: each layer is tested with more than one source so the layers cross-check each other. That's what makes the data tell you things a single-protocol speed test never could (TCP PEPs, ICMP proxies, per-CDN throttling, the gap between "this link could stream 4K" and "this player actually does").
The design center is in-flight WiFi — satellite handover detection, TCP PEP detection, per-CDN rate-limiting policy — but the same probes work on any network.
Coverage by layer
| Layer | What we probe | Cross-check / why more than one |
|---|---|---|
| WiFi radio (L1/L2) | RSSI, noise floor, channel, channel width, band, MCS index, NSS (spatial streams), PHY mode, TX rate, SSID/BSSID | Correlate signal drops with throughput drops — distinguishes RF problems from network problems. |
| IP / path (L3) | ICMP ping to default gateway + two internet targets (5/sec); traceroute every 30s | Gateway vs internet split distinguishes local from upstream issues; two targets catch asymmetric routing and per-target rate-limiting. |
| Transport (L4) | TCP SYN, UDP (DNS-port), QUIC handshake — all to the same target where possible | Three transports against one target reveal PEPs (TCP fast + QUIC slow = TCP proxy), ICMP proxying (ICMP fast + everything else slow), and UDP filtering. |
| Application protocols (L7) | HTTP HEAD against CDN + obscure "canary" targets; system-resolver DNS | CDN-vs-canary HTTP delta reveals transparent proxies that cache popular sites; system DNS catches captive portals and DNS interception. |
| CDN diversity | Self-hosted Cloudflare Worker, Cloudflare public speed test, Akamai (dash.akamaized.net), Netflix Open Connect via fast.com | Per-CDN rate-limiting is real and invisible to single-source tests: Netflix may be capped at 2.3 Mbps on a link where Cloudflare sees 10+ Mbps. Four sources surface the policy. |
| Real video player | YouTube segment probe (yt-dlp, no browser) + YouTube ABR probe (Playwright + headless Chromium, real player); Netflix max-bitrate tier from OCA throughput | Segment probe says what should work; ABR probe says what the player actually does. When they disagree, the link or platform is doing something to the player. |
The next two tables show how that coverage maps to each shipping platform — which probes run on which OS, and which features are implementation-complete vs platform-specific.
We deliberately operate at the invasive end of the measurement spectrum: 5/sec ICMP ping to the gateway, multi-protocol probes running concurrently the entire capture, round-scheduled throughput tests that consume up to 900 MB / 30 min. We accept this link impact in exchange for measurement fidelity. A passive home monitor would be a different tool with a different design center.
WiFi Monitor runs on four platforms. The CLI is the reference
implementation; macOS and Windows apps wrap it via subprocess (uv run
--json), inheriting all CLI capabilities. The iOS app is a native Swift
reimplementation — it covers most probes but not all. On iPad, the app
uses an adaptive layout with live latency charts, multi-column stat
panels, and wider post-capture result views — closer to the macOS
experience than the phone layout.
Probes by Platform
| Probe | CLI | macOS App | Windows App | iOS App |
|---|---|---|---|---|
| ICMP Ping (internet) | ✓ | ✓ | ✓ | ✓ |
| ICMP Ping (gateway) | ✓ | ✓ | ✓ | — |
| UDP DNS | ✓ | ✓ | ✓ | — |
| TCP Connect | ✓ | ✓ | ✓ | ✓ |
| QUIC Handshake | ✓ | ✓ | ✓ | ✓ |
| HTTP HEAD + canary | ✓ | ✓ | ✓ | ✓ |
| Speed Test (Worker/CF/Akamai) | ✓ | ✓ | ✓ | ✓ |
| Netflix Max Bitrate (OCA) | ✓ | ✓ | ✓ | ✓ |
| YouTube Segment (no browser) | ✓ | ✓ | ✓ | — |
| YouTube ABR (real player, Chromium) | ✓ | ✓ | ✓ | — |
| DNS (system resolver) | ✓ | ✓ | ✓ | ✓ |
| Traceroute | ✓ | ✓ | ✓ | — |
| WiFi Signal (basic) | ✓ | ✓ | ✓ | — |
| WiFi Signal (PHY: MCS, width, band) | ✓ | ✓ | — | — |
| Loss Burst Detection | ✓ | ✓ | ✓ | — |
iOS uses Akamai-only speed testing. Netflix OCA discovery (via the fast.com API) is implemented on all platforms including iOS. The YouTube probes (which depend on either Playwright + Chromium or yt-dlp) are not implemented on iOS — YouTube streaming measurement is available on CLI, macOS, and Windows.
iOS probe frequencies differ from CLI. The CLI runs TCP at 2s, QUIC and HTTP at 5s (active profile). iOS runs all three at 15s intervals (~4×/min) as a battery/radio compromise — sufficient for trend lines without hammering the cellular radio on GEO satellite. DNS is 30s on both platforms. ICMP ping is 1/sec on iOS (vs 5/sec on CLI).
Features by Platform
| Feature | CLI | macOS App | Windows App | iOS App |
|---|---|---|---|---|
| Adaptive probe backoff | ✓ | ✓ | ✓ | — |
| Adaptive round scheduler (5/10/30-min) | ✓ | ✓ | ✓ | mode-based |
| On-demand speed test (s key) | ✓ | ✓ | ✓ | — |
| On-demand YouTube test (y key) | ✓ | ✓ | ✓ | — |
| Observed throughput | ✓ | ✓ | ✓ | — |
| Upload to R2 | ✓ | ✓ | ✓ | ✓ |
| Privacy transform | ✓ | ✓ | ✓ | ✓ |
| PNG analysis charts | ✓ | — | ✓ | — |
| Capture history | — | ✓ | ✓ | ✓ |
| Live latency charts | — | ✓ | — | iPad only |
| iPad adaptive layout | — | — | — | ✓ |
Probe Types
ICMP Ping
| Property | Value |
|---|---|
| Frequency | 5/sec (0.2s interval) |
| Targets | 8.8.8.8, 1.1.1.1 |
| + Gateway | Auto-detected default gateway |
| Timeout | interval + 2s |
| Logged as | ping |
What it measures: Base round-trip time and packet loss. The most fundamental network measurement — ICMP echo request/reply with no application-layer overhead.
Implementation: On macOS/Linux, uses the system ping binary (setuid,
SOCK_RAW). This is deliberate: macOS's SOCK_DGRAM ICMP handling (used by
icmplib's non-privileged mode) causes ~1-second quantized latency spikes
under concurrent load. The system ping avoids this artifact entirely. See
docs/vpn-periodic-latency-spikes.md for the investigation.
Falls back to icmplib on Windows or when the system ping binary is unavailable. The icmplib path uses a fire-and-forget pattern: pings are sent at strict intervals regardless of RTT, allowing multiple pings in-flight simultaneously. This matters on GEO satellite links (~600ms RTT) where a synchronous ping-wait-ping approach would reduce effective frequency.
Gateway ping: Pings the default gateway (usually 192.168.1.1) to distinguish local WiFi issues from internet path issues. If a spike appears in gateway ping simultaneously with internet ping, the problem is local. Auto-disables after 20 consecutive losses (some networks block ICMP to the gateway).
Why two internet targets: If one target has issues (Google's 8.8.8.8 occasionally rate-limits ICMP), the other provides a control. Also detects asymmetric routing — different targets may take different paths.
UDP (DNS)
| Property | Value |
|---|---|
| Frequency | 5/sec (0.2s interval) |
| Targets | 8.8.8.8, 1.1.1.1 |
| Timeout | 2s |
| Logged as | udp |
What it measures: UDP round-trip time, independent of ICMP. Sends a minimal DNS A-record query and times the response. This is a real UDP round-trip, not just an ICMP echo.
Why it matters: Some networks treat ICMP differently from real traffic — lower priority, rate limiting, or filtering. UDP probes via DNS provide a second latency measurement using actual data-plane traffic. On networks with ICMP deprioritization, you'll see UDP RTT match TCP/HTTP while ICMP shows higher latency.
Implementation: Builds a raw DNS query packet (A record for www.google.com), sends it over SOCK_DGRAM to port 53, and verifies the transaction ID in the response. No DNS library needed — the query is ~30 bytes.
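A minimal sketch of the packet-building and verification steps described above, using only the standard library. This is an illustration, not the monitor's actual code: the function names (build_dns_query, response_matches, udp_dns_rtt_ms) are hypothetical, and the check on the QR (response) flag is an added assumption beyond the transaction-ID check the text mentions.

```python
import random
import socket
import struct
import time


def build_dns_query(hostname: str, txid: int) -> bytes:
    """Build a minimal DNS query for one A record with recursion desired."""
    # Header: ID, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def response_matches(response: bytes, txid: int) -> bool:
    """True if the reply echoes our transaction ID and has the QR bit set."""
    if len(response) < 12:
        return False
    resp_id, flags = struct.unpack("!HH", response[:4])
    return resp_id == txid and bool(flags & 0x8000)


def udp_dns_rtt_ms(server: str, hostname: str = "www.google.com",
                   timeout: float = 2.0):
    """Send one query to server:53 and return the RTT in ms, or None on loss."""
    txid = random.randrange(0x10000)
    query = build_dns_query(hostname, txid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(query, (server, 53))
            response, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable-network errors
            return None
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms if response_matches(response, txid) else None
```

For www.google.com the query built this way is 32 bytes (12-byte header, 16-byte name, 4 bytes of type and class), in line with the "~30 bytes" figure above.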
TCP Connect
| Property | Value |
|---|---|
| Frequency | Every 2s |
| Targets | 8.8.8.8:443, 1.1.1.1:443 |
| Timeout | 5s |
| Logged as | tcp |
What it measures: TCP SYN-ACK time — the time for a TCP three-way handshake to complete. Connects to port 443 and immediately closes.
Why it matters: This is the key probe for detecting TCP Performance Enhancing Proxies (PEPs). On satellite connections, airlines often deploy transparent TCP proxies that intercept TCP connections and respond with a local SYN-ACK, making TCP appear much faster than the actual satellite round-trip. Compare:
- ICMP: 600ms (real satellite RTT)
- TCP: 5ms (proxy responding locally)
This gap is the signature of a TCP PEP. Without the multi-protocol comparison, you'd never know the proxy exists.
TCP retransmit behavior: When the SYN or SYN-ACK is lost, TCP retransmits after about 1 second (the initial retransmission timeout, RTO). TCP loss events therefore show up as 1000ms+ spikes rather than as the gaps you see in ICMP. This difference is itself informative: it shows how TCP's loss recovery affects application performance.
QUIC Handshake
| Property | Value |
|---|---|
| Frequency | Every 5s |
| Targets | google.com:443, cloudflare.com:443 |
| Timeout | 10s |
| Logged as | quic |
What it measures: QUIC handshake time — a full UDP + TLS 1.3
connection setup negotiating HTTP/3 (ALPN h3). Connects to port 443 and
immediately closes after the handshake completes.
Why it matters: QUIC uses UDP, not TCP. This makes it the best probe for detecting TCP Performance Enhancing Proxies (PEPs) — the same proxies that make TCP SYN look artificially fast can't intercept QUIC because it's UDP-based. Compare:
- TCP SYN: 5ms (proxy responding locally)
- QUIC: 600ms (actual satellite round-trip, proxy can't help)
This gap is the inverse of the ICMP/TCP gap and confirms the proxy hypothesis from a different angle. QUIC is also increasingly the protocol passengers actually use — YouTube, Google services, and Cloudflare all default to QUIC/HTTP/3 when available.
Why hostnames instead of IPs: QUIC requires TLS SNI (Server Name Indication), so raw IPs won't work. DNS resolution is included in the first probe of each round, which is intentionally realistic — it measures what a real QUIC connection costs.
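Expected RTT vs TCP SYN: QUIC handshake is typically 1.5-3x TCP SYN time on the same path. A TCP SYN/SYN-ACK exchange is a bare round-trip, while the QUIC handshake also carries the TLS 1.3 key exchange and certificate chain, which adds server-side cryptographic work and, when the server's first flight exceeds QUIC's anti-amplification limit, an extra round-trip. On proxied satellite links, the ratio grows dramatically, since the proxy only accelerates TCP: QUIC sees the full satellite RTT while TCP sees the proxy's local response.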
Networks that block QUIC: Some enterprise/airline networks block UDP on port 443, forcing fallback to TCP/HTTP/2. The QUIC probe will show 100% loss on these networks — this is itself useful data, confirming that the network actively filters QUIC traffic.
Implementation: Uses the aioquic library (optional dependency). If
not installed, the probe is silently skipped. A monkey-patch suppresses
harmless ValueError noise from aioquic's StreamWriter cleanup of
server-initiated push streams.
HTTP HEAD
| Property | Value |
|---|---|
| Frequency | Every 5s |
| Targets | google.com, cloudflare.com + proxy canaries |
| Timeout | 12s |
| Logged as | http |
What it measures: Full-stack latency: DNS resolution + TCP handshake + TLS negotiation + HTTP request/response. This is what users actually experience when loading a webpage.
Proxy canary targets: In addition to CDN targets (Google, Cloudflare), the probe tests obscure "canary" sites (httpbin.org, icanhazip.com, example.com). On startup, it calibrates which canaries are reachable, then selects one HTTP and one HTTPS candidate.
The idea: transparent proxies often cache/accelerate popular CDN content but
pass through traffic to obscure sites unmodified. By comparing CDN latency
vs canary latency (cdn_delta_ms in the log), you can detect:
- Positive delta: Canary is slower — normal, canary is further away
- Negative delta: Canary is faster than CDN — suggests CDN content is
being intercepted and re-served (proxy artifact)
- Large delta on HTTP but not HTTPS: Port-80-only transparent proxy
Response header inspection: Logs Via, X-Cache, X-Cache-Hits,
X-Forwarded-For, and Server headers — additional proxy fingerprints.
Session reuse: Uses a persistent requests.Session so the underlying
connection and its TLS session are reused after the first request.
Subsequent measurements therefore reflect steady-state request latency,
not cold-start DNS, TCP, and TLS setup.
Concurrency: All HTTP probes in a round fire concurrently via
asyncio.gather(). This prevents slow canary sites from delaying CDN
measurements.
Throughput
| Property | Value |
|---|---|
| Frequency | Adaptive round schedule (see below) + on-demand s key |
| Backend | 3-tier: self-hosted Worker → Cloudflare → Akamai |
| Streams | 4 parallel TCP connections |
| Logged as | throughput_down, throughput_up, calibration |
What it measures: Download and upload speed via multi-stream HTTP transfers. Saturates the link (like Ookla) to measure true capacity.
Backend fallback
Speed tests use a 3-tier backend priority system:
- Self-hosted Worker (primary): Download endpoint at howismywifi.com/speed-test/download?bytes=N, our Cloudflare Worker streaming zeros via ReadableStream. Zero egress cost, no rate limiting, no bot detection. Upload goes to our existing upload-signer Worker.
- Cloudflare (fallback): speed.cloudflare.com/__down?bytes=N and __up. Free, no API key, reliable. Falls back here if the Worker is unreachable.
- Akamai (last resort): dash.akamaized.net test files. Activated after consecutive Cloudflare 429/403 errors (shared NAT IPs on satellite links trigger per-IP rate limits).
Each tier tracks consecutive failures independently. After a threshold
(3 failures, or 5 on GEO satellite), the next tier becomes active.
Every 30 minutes, blocked tiers are re-probed for recovery. The active
backend is logged with each speed test result (backend=worker|cloudflare|akamai).
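The tier selection described above can be sketched as a small state holder. This is a hedged illustration under stated assumptions, not the shipped logic: the class name BackendSelector and its methods are hypothetical, a success is assumed to reset that tier's counter, and when every tier is blocked the sketch simply stays on the last one.

```python
from dataclasses import dataclass, field

TIERS = ("worker", "cloudflare", "akamai")  # priority order from the text
REPROBE_INTERVAL_S = 30 * 60                # blocked tiers are re-probed every 30 min


@dataclass
class BackendSelector:
    is_geo: bool = False
    failures: dict = field(default_factory=lambda: {t: 0 for t in TIERS})

    @property
    def threshold(self) -> int:
        # 3 consecutive failures, or 5 on GEO satellite
        return 5 if self.is_geo else 3

    def active(self) -> str:
        """Highest-priority tier whose consecutive-failure count is under the threshold."""
        for tier in TIERS:
            if self.failures[tier] < self.threshold:
                return tier
        return TIERS[-1]  # assumption: stay on the last tier if all are blocked

    def record(self, tier: str, ok: bool) -> None:
        """Each tier tracks its own consecutive failures; a success clears them."""
        self.failures[tier] = 0 if ok else self.failures[tier] + 1

    def reprobe_due(self, now_s: float, last_reprobe_s: float) -> bool:
        """True when it is time to re-test blocked tiers for recovery."""
        return now_s - last_reprobe_s >= REPROBE_INTERVAL_S
```

A recovery re-probe would call record(tier, ok=True) on success, which makes that tier active again on the next active() call.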
Calibration (runs once at startup)
Before the first throughput test, a two-step calibration determines how to size transfers:
- Latency probe (1KB download): Measures time-to-first-byte to classify the link. >400ms = GEO satellite, everything else = terrestrial. This sets the test duration: 4s for terrestrial links, 15s for GEO satellite (TCP slow-start takes ~7 RTTs = 4.2s to fill the congestion window at 600ms RTT).
PEP-aware GEO detection: On PEP'd satellite links, the TCP latency probe may return ~80ms (the proxy's local response) instead of the true ~700ms satellite RTT, causing GEO misclassification. To counter this, GEO detection is an OR-latch: if any signal indicates GEO (calibration TTFB, QUIC handshake >400ms, or ICMP median >400ms), the link is classified as GEO and never downgraded. QUIC is the strongest signal since it uses UDP and bypasses the PEP entirely.
- Speed calibration (iterative): Downloads and uploads at increasing sizes (5, 10, 20, 40, 100 MB) until the estimate converges (<20% change between rounds). Slow links converge quickly (2 rounds); fast links iterate further. Max calibration data: 175 MB. Download and upload converge independently — once one direction converges, it stops probing.
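The OR-latch for GEO detection can be sketched as follows. The class name GeoLatch and its observe method are hypothetical; only the 400 ms threshold and the three signals come from the text above.

```python
GEO_THRESHOLD_MS = 400.0


class GeoLatch:
    """Once any signal exceeds the threshold, the link stays classified as GEO."""

    def __init__(self) -> None:
        self.is_geo = False

    def observe(self, ttfb_ms=None, quic_ms=None, icmp_median_ms=None) -> bool:
        # Any single signal above 400 ms sets the latch; nothing ever clears it.
        signals = (ttfb_ms, quic_ms, icmp_median_ms)
        if any(v is not None and v > GEO_THRESHOLD_MS for v in signals):
            self.is_geo = True
        return self.is_geo
```

Because the latch never resets, a later PEP-accelerated TCP reading of ~80 ms cannot downgrade a link that QUIC or ICMP has already identified as GEO.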
The calibrated speed determines transfer size: speed * duration / streams
bytes per stream, clamped to 500KB-50MB per stream.
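As a worked version of that formula, here is a minimal sketch. It assumes the calibrated speed is in Mbps (so it is divided by 8 to get bytes), that KB and MB are decimal units, and that the function name bytes_per_stream is hypothetical.

```python
MIN_STREAM_BYTES = 500_000      # 500 KB floor (decimal units assumed)
MAX_STREAM_BYTES = 50_000_000   # 50 MB ceiling


def bytes_per_stream(speed_mbps: float, is_geo: bool, streams: int = 4) -> int:
    """Size each stream so the whole test lasts roughly the target duration."""
    duration_s = 15.0 if is_geo else 4.0          # from the latency probe
    total_bytes = speed_mbps * 1_000_000 / 8 * duration_s
    per_stream = int(total_bytes / streams)
    return max(MIN_STREAM_BYTES, min(MAX_STREAM_BYTES, per_stream))
```

For example, a 100 Mbps terrestrial link yields 12.5 MB per stream, a 24 Mbps GEO link yields 11.25 MB per stream, and a 1 Mbps link is raised to the 500 KB floor.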
Loss-aware early exit: On severely lossy links (sustained ICMP loss
above ~8%, or self-observed throughput variance above 40%), calibration
exits after round 2 instead of grinding through all five sizes. The
extra rounds rarely converge on a noisy link and only waste data
budget. The completion record carries an early_exit flag so analysis
can distinguish "this is the converged value" from "this is the best
estimate we got before bailing."
Adaptive Round Schedule
Captures are organized into rounds. Each round is a fixed bundle of work (speed test, Netflix, ICMP, etc.) sized to take a known amount of time. The number of rounds in a capture scales with the capture duration so that short captures still produce meaningful coverage:
| Capture duration | Rounds | Notes |
|---|---|---|
| 5 min (Quick) | 3 | Two work rounds + one final round |
| 10 min (Standard) | 5 | Evenly spaced across the window |
| 30 min (Extended) | 8 | Full bursty + spread coverage |
| Continuous | 8 / 30 min, repeats | 30-min cycle, no end |
Rounds are scheduled with even spacing across the available time
(rather than the older 30-min wave's "burst then spread" cadence). Live
displays and JSON events include round_num / total_rounds so the
apps can show "Round 3 of 8: testing download…" — the user sees what's
happening and when the capture will finish.
On-demand triggers: The CLI and apps support manual test triggering during a capture without breaking the round schedule:
- s: run a speed test immediately
- y: run a YouTube test immediately
Useful for "the connection just got worse, test it again" without having to start a new capture.
Passive profile: Long-running passive captures (not the default) keep the heavy probes on a wider cadence — speed tests every 2 hours instead of every round — to limit data usage for unattended monitoring.
Out-of-Family Detection
If a throughput result deviates >50% from the median of the last 5 results, it's flagged as "out of family" and a retest is scheduled 90 seconds later. Max 1 retest per round. Retests never trigger further retests.
This catches transient anomalies — a single bad result gets verified before it skews analysis. The 90-second delay is long enough for transient congestion to clear but short enough to catch real changes.
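A minimal sketch of the out-of-family check and the retest rules, assuming results are compared in Mbps and that the check also runs when fewer than five prior results exist. The function names are hypothetical.

```python
from statistics import median

RETEST_DELAY_S = 90


def is_out_of_family(result_mbps: float, history: list,
                     window: int = 5, tolerance: float = 0.5) -> bool:
    """True if the result deviates more than 50% from the median of recent results."""
    recent = history[-window:]
    if not recent:
        return False  # nothing to compare against yet
    baseline = median(recent)
    if baseline <= 0:
        return False
    return abs(result_mbps - baseline) / baseline > tolerance


def should_retest(out_of_family: bool, is_retest: bool,
                  retests_this_round: int) -> bool:
    """At most one retest per round, and a retest never triggers another."""
    return out_of_family and not is_retest and retests_this_round < 1
```

With a recent median of 20 Mbps, a 9 Mbps result (55% below) is flagged while a 12 Mbps result (40% below) is not.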
Data Budget
Each 30-minute window has a 900 MB data budget. When cumulative download
bytes exceed this, remaining tests are skipped (logged as
skipped=data_budget). This prevents Cloudflare rate limiting (~1 GB
threshold triggers 429/403 responses).
On rate limit (429 or 403): exponential backoff (30s, doubling up to 120s) AND transfer size is halved. On success, backoff decays gradually (halved each success) rather than resetting instantly — prevents immediately re-triggering the limit.
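The backoff behavior can be sketched like this. It is an illustration under assumptions: the class name RateLimitBackoff is hypothetical, a rate limit after partial decay restarts from at least 30 s, and the 1-second floor below which the delay is treated as cleared is invented for the sketch.

```python
class RateLimitBackoff:
    INITIAL_S = 30.0
    MAX_S = 120.0

    def __init__(self) -> None:
        self.delay_s = 0.0

    def on_rate_limited(self, transfer_bytes: int) -> int:
        """Grow the delay (30s, doubling up to 120s) and return the halved transfer size."""
        if self.delay_s < self.INITIAL_S:
            self.delay_s = self.INITIAL_S
        else:
            self.delay_s = min(self.delay_s * 2, self.MAX_S)
        return max(1, transfer_bytes // 2)

    def on_success(self) -> None:
        """Decay gradually: halve the delay instead of resetting it to zero."""
        self.delay_s /= 2
        if self.delay_s < 1.0:  # hypothetical floor: treat as fully recovered
            self.delay_s = 0.0
```

Halving on success means a link that just recovered from a 120 s backoff still waits 60 s, then 30 s, before returning to full cadence.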
Measurement Method
- Download: Streaming with per-chunk samples. Uses raw throughput (total_bytes / wall_clock_time). The warmup-discard approach (dropping first 30% of samples) was found to overcorrect on satellite links due to bursty beam-scheduled delivery.
- Upload: Wall-clock timing only. Sample-based timing measures TCP buffer fill speed (how fast the OS accepts data), not actual network throughput. Wall-clock timing includes the time for the server to acknowledge receipt.
- PEP burst guard: On PEP'd satellite links, data arrives in bursts: the proxy prefetches from the server at ground speed and delivers from its buffer over the satellite downlink. Without correction, the steady-state algorithm can compute throughput over a 0.6s burst window and report 445 Mbps on a 24 Mbps link (18x inflation). The burst guard compares the steady-state estimate to raw throughput (total_bytes / wall_clock); when the steady-state estimate exceeds the raw value by more than 3x, the raw value is used instead.
- Raw samples: Per-chunk timing data is written to a sidecar CSV (*_throughput_raw.csv) for TCP ramp-up analysis. Fields: timestamp_utc, test_id, direction, sample_num, elapsed_ms, bytes.
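The PEP burst guard can be sketched as a single comparison. The sketch assumes the ratio is steady-state divided by raw (the direction in which a burst inflates the estimate); the function name guarded_throughput_mbps is hypothetical.

```python
def guarded_throughput_mbps(total_bytes: int, wall_clock_s: float,
                            steady_state_mbps: float,
                            max_ratio: float = 3.0) -> float:
    """Prefer the steady-state estimate unless it looks burst-inflated."""
    if wall_clock_s <= 0:
        raise ValueError("wall_clock_s must be positive")
    raw_mbps = total_bytes * 8 / wall_clock_s / 1_000_000
    # A steady-state value more than 3x the raw average is treated as a
    # proxy burst artifact, so fall back to the raw wall-clock figure.
    if raw_mbps > 0 and steady_state_mbps / raw_mbps > max_ratio:
        return raw_mbps
    return steady_state_mbps
```

Using the numbers above: 30 MB delivered in 10 s is 24 Mbps raw, so a 445 Mbps steady-state estimate (about 18.5x) is rejected and 24 Mbps is reported.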
Netflix Max Bitrate (OCA)
| Property | Value |
|---|---|
| Frequency | Twice per round (paired with speed test bursts) |
| Targets | Netflix OCA (via fast.com API), Akamai (dash.akamaized.net) |
| Timeout | 45s wall-clock cap |
| Logged as | cdn_throughput |
What it measures: Download throughput from the same Netflix Open Connect Appliance (OCA) servers that serve Netflix video to real users. Measured Mbps is mapped to a streaming resolution tier so the result answers a question users actually have — "would Netflix play in HD on this link?" — instead of a raw number that requires interpretation.
Resolution tiers: Based on Netflix's published encoding bitrates:
| Tier | Measured throughput |
|---|---|
| 4K | ≥ 8 Mbps |
| 1080p | ≥ 2.5 Mbps |
| 720p | ≥ 1.5 Mbps |
| SD | < 1.5 Mbps |
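The tier table translates directly into a lookup. This is a minimal sketch with a hypothetical function name (netflix_tier); the thresholds are the ones in the table.

```python
# (minimum Mbps, tier), checked from highest to lowest
NETFLIX_TIERS = ((8.0, "4K"), (2.5, "1080p"), (1.5, "720p"))


def netflix_tier(measured_mbps: float) -> str:
    """Map measured OCA throughput to the highest tier it can sustain."""
    for floor_mbps, tier in NETFLIX_TIERS:
        if measured_mbps >= floor_mbps:
            return tier
    return "SD"
```

A link where Netflix traffic is shaped to 2.3 Mbps therefore maps to 720p, even if other CDNs measure far higher on the same link.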
Why it matters: Different CDNs are treated differently by in-flight networks. Airlines commonly deploy per-CDN rate limiting — Netflix OCA traffic may be capped at 2.3 Mbps on GEO satellite while Cloudflare sees 10+ Mbps on the same link. The Netflix probe surfaces these policies directly: it's not enough to know the link can do 25 Mbps if Netflix itself is shaped to 720p.
Netflix OCA discovery: The probe fetches the fast.com page,
extracts the API token from the embedded JS bundle, then calls
api.fast.com/netflix/speedtest/v2 to get OCA URLs. If an OCA URL
expires during a long capture (returns errors), the probe re-discovers
a fresh URL automatically. Akamai (dash.akamaized.net) is tested in
parallel as a generic CDN comparison point.
Segment sizing: Download size adapts to link latency (from calibration): 10 MB on GEO satellite (>400ms RTT), 5 MB on medium latency, 2 MB on low latency. Larger segments on high-latency links ensure enough data to get past TCP slow-start. Speed test fallback: after 3 consecutive Cloudflare blocks, the throughput probe also falls back to Akamai for the remainder of the round.
Measurements logged: Throughput (Mbps), resolution tier, time-to-first-byte (TTFB), DNS resolution time, bytes transferred, content type.
YouTube (real-player ABR + segment probe)
| Property | Value |
|---|---|
| Frequency | Once per round (when scheduled); also on demand via y key |
| Target | YouTube (rotating set of long-form public videos) |
| Logged as | youtube_segment, youtube_abr |
What it measures: Real video streaming performance — not just whether the link could support YouTube, but what YouTube actually does on it. This is the full-stack credibility test: ICMP and throughput can look fine while a real player still struggles with startup or quality selection. The probe runs in two stages: a fast segment-level measurement, and (when Chromium is available) a real headless player.
Segment probe (no browser, ~8s)
Uses yt-dlp to resolve a video's DASH/HLS manifest, then downloads
four real media segments over HTTP and times each one. Reports
per-segment throughput, time-to-first-byte, CDN host, codec, and
container format — plus a recommended streaming resolution:
| Recommended tier | Required throughput (incl. 30% headroom) |
|---|---|
| 4K | ≥ 9 Mbps |
| 1080p | ≥ 1.6 Mbps |
| 720p | ≥ 0.8 Mbps |
| 480p / lower | < 0.8 Mbps |
The segment probe runs on every YouTube test slot and ships with the Python CLI / macOS / Windows apps with no Chromium install required. It is the data behind the "Recommended quality" chip on the results page.
ABR probe (Playwright + headless Chromium, ~50s)
When Chromium is installed (offered on first run; bundled as an opt-in download in the apps), the ABR probe launches a real headless YouTube player and runs two phases:
- AUTO phase: let YouTube's ABR algorithm pick the quality on this link.
- FORCED phase: ask the player for the resolution the segment probe said it could support, and see what actually happens.
Per phase, the probe reports cold startup time (page navigation to
first frame), the resolution actually rendered, dropped frames, stall
count and total stall duration, ABR quality switches, codec, container,
and player-reported bandwidth and buffer health. An ad_seen flag
records whether YouTube served an ad during startup — important because
ads can mask or skew the cold-start measurement, and the flag flows all
the way through to the results page so post-hoc analysis can filter.
Video rotation and fallbacks: The probe maintains a list of long-form public videos and rotates through them; if a video is unavailable (regional restriction, takedown, geo-blocked CDN node), the next one in the list is tried. The full fallback loop is exercised in tests so a single bad video can't silently kill the probe.
Why two probes: The segment probe is fast, dependency-free, and gives a portable "what should work" number. The ABR probe is slower and heavier but tells you what a real player actually does, including startup latency that no throughput test can capture. Both numbers appear side-by-side: when they agree the link is well-behaved; when they disagree (segment says 1080p, ABR locks to 480p), the network or platform is doing something to the player itself.
Platforms: CLI, macOS, Windows. iOS does not run either probe (Playwright/Chromium and yt-dlp are not available on iOS); the iOS app focuses on link characterization only.
DNS
| Property | Value |
|---|---|
| Frequency | Every 30s |
| Target | www.google.com |
| Method | System resolver (getaddrinfo) |
| Logged as | dns |
What it measures: End-to-end DNS resolution time as applications experience it. Uses the system resolver, which includes OS DNS cache effects.
Why system resolver (not raw DNS): The UDP probe already measures raw DNS transport latency to 8.8.8.8 and 1.1.1.1. This probe captures what applications actually see — including local caching, search domain expansion, and any DNS interception by the network. On networks with captive portals or DNS-based filtering, this probe shows the real behavior.
Why 30s interval: DNS results are heavily cached. More frequent probing would just measure cache hits. 30 seconds balances between capturing DNS changes and not flooding the resolver.
Traceroute
| Property | Value |
|---|---|
| Frequency | Every 30s |
| Target | 8.8.8.8 |
| Max hops | 20 |
| Logged as | traceroute |
What it measures: The network path — every router hop between the device and the target, with per-hop latency.
Why it matters: Path changes correlate with performance changes. On satellite connections, a path change can indicate a beam handover or gateway switch. On terrestrial networks, path changes may indicate routing convergence events. The hop count itself is informative — satellite links typically show fewer hops (device → gateway → satellite → ground station → internet).
Implementation: Uses the system traceroute command on macOS/Linux
(-n -q 1 -w 2 -m 20 — numeric, one probe per hop, 2s timeout, max 20
hops). Falls back to icmplib on Windows. Non-responding hops (* * *) are
skipped in the output.
First hop tracking: The first responding hop is logged separately
(first_hop=) because it identifies the local network gateway and can
change if the device roams between access points.
WiFi Signal
| Property | Value |
|---|---|
| Frequency | Every 10s |
| Platform | macOS, Windows |
| Logged as | wifi |
What it measures: RF and PHY-layer metrics from the WiFi adapter:
- RSSI — signal strength in dBm
- Noise floor — ambient noise in dBm
- SNR — signal-to-noise ratio (RSSI minus noise)
- Channel — WiFi channel number
- Channel width — 20, 40, 80, or 160 MHz (macOS only)
- Channel band — 2.4 GHz, 5 GHz, or 6 GHz (macOS only)
- PHY mode — 802.11a/b/g/n/ac/ax, i.e., WiFi 4/5/6 (macOS only)
- TX rate — transmit rate in Mbps
- MCS index — modulation and coding scheme index (macOS only)
- NSS — number of spatial streams, i.e., MIMO configuration (macOS only)
- SSID and BSSID — network name and access point MAC address
Why it matters: Correlating signal metrics with performance metrics reveals whether issues are RF-related (weak signal, channel congestion) or network-related (routing, congestion, satellite handover). A latency spike that coincides with an RSSI drop is likely a local issue; one without signal change points to the upstream path.
The extended PHY metrics (channel width, band, PHY mode, MCS, spatial streams) are particularly useful for diagnosing why throughput is lower than expected. For example: connected on WiFi 6 (ax) but only 20 MHz channel width and 1 spatial stream explains a 100 Mbps ceiling. Or: PHY mode dropped from ac to n mid-capture, indicating the adapter fell back to a slower standard (often due to interference or range).
macOS implementation: Primary method is a compiled Swift binary using
CoreWLAN framework (works on macOS Sequoia 15+, where the deprecated
airport command was removed). The binary is compiled once on first run
and cached at a hash-versioned path so source changes trigger recompile.
Falls back to the airport -I command on older macOS (which does not
provide channel width, band, PHY mode, MCS, or spatial stream data).
MCS index and spatial stream count are read via undocumented KVC
(mcsIndex, numberOfSpatialStreams) on the CWInterface object. These
keys have been stable across macOS versions but are not part of the
public CoreWLAN API.
The macOS app can provide a pre-compiled binary via the WIFI_INFO_PATH
environment variable, avoiding the first-run compilation step.
Note: macOS Sequoia redacts SSID and BSSID without Location Services entitlement. The app captures what's available.
Windows implementation: Parses netsh wlan show interfaces output.
Signal percentage is converted to approximate dBm: (pct / 2) - 100.
Extended PHY metrics (channel width, band, MCS, spatial streams) are not
available via netsh.
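A minimal sketch of the Windows conversion. The regular expression assumes English-locale netsh output with a line such as "Signal : 86%", and clamping the percentage to 0-100 is an added assumption; both function names are hypothetical.

```python
import re


def parse_signal_pct(netsh_output: str):
    """Extract the signal percentage from `netsh wlan show interfaces` output."""
    match = re.search(r"^\s*Signal\s*:\s*(\d+)%", netsh_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def signal_pct_to_dbm(pct: float) -> float:
    """Approximate dBm from signal quality: (pct / 2) - 100."""
    pct = max(0.0, min(100.0, pct))  # assumption: clamp out-of-range values
    return pct / 2 - 100
```

The mapping is linear, so 100% corresponds to about -50 dBm and 0% to -100 dBm; values derived this way are approximations rather than true RSSI readings.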
Loss Burst Detection
Runs in the background, scanning new CSV rows every 30 seconds for
consecutive ping losses. When 5 or more consecutive ICMP losses are
detected, a loss_burst row is logged with:
- burst_size: Number of consecutive lost packets
- duration_ms: Approximate burst duration (count * 200ms at 5/sec rate)
- utc_second: Position within the UTC second when the burst started
The UTC second alignment is designed for satellite handover analysis: GEO satellites use 15-second beam cycles, so loss bursts that align to 15-second boundaries suggest handover events rather than random packet loss.
Uses incremental file reads — tracks file position between scans to avoid re-reading the entire CSV on long captures.
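The burst scan itself can be sketched as a run-length pass over ping results. This sketch assumes the rows are for a single target, in time order, as (epoch seconds, lost) pairs; the function name find_loss_bursts and the started_at key are hypothetical, and the utc_second alignment field is left out.

```python
def find_loss_bursts(ping_rows, min_burst: int = 5, interval_ms: int = 200):
    """Return one record per run of min_burst or more consecutive lost pings."""
    bursts = []
    run_start, run_len = None, 0

    def flush():
        if run_len >= min_burst:
            bursts.append({
                "burst_size": run_len,
                "duration_ms": run_len * interval_ms,  # count * 200ms at 5/sec
                "started_at": run_start,
            })

    for ts, lost in ping_rows:
        if lost:
            if run_len == 0:
                run_start = ts
            run_len += 1
        else:
            flush()
            run_start, run_len = None, 0
    flush()  # simplification: close a run that is still open at end of input
    return bursts
```

An incremental scanner would instead carry an open run over to the next 30-second scan rather than closing it at the end of the rows read so far.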
Loaded Latency (Bufferbloat Split)
Loaded latency measures how much latency increases when the network is under load — the defining characteristic of bufferbloat. While idle latency tells you how fast a quiet connection is, loaded latency tells you what users actually experience when downloads, uploads, video calls, and cloud sync are all running at once. Ookla's Speedtest now reports this as three separate measurements (idle, download-loaded, upload-loaded), and it's becoming a key metric for operators.
We compute loaded latency by post-hoc analysis of the continuous ping stream against speed test timing windows. Because pings run at 5/sec throughout the capture, we already have latency samples covering every speed test — no dedicated loaded-latency probe is needed.
How it works
During analysis, every ping is classified into one of three buckets:
- idle — no throughput test was running
- dl_loaded — a download speed test was in progress
- ul_loaded — an upload speed test was in progress
The classification uses throughput test timestamps with guard bands:
- Phase guard (250ms): Each throughput phase window is extended by 250ms on each end to capture pings that overlap phase boundaries.
- Idle guard (5000ms): Pings within 5 seconds of any phase boundary are excluded from the idle bucket. This prevents contamination from TCP ramp-up/cooldown — the network may still be buffered even though the speed test has technically ended.
When download and upload phases overlap (common on asymmetric GEO satellite links where upload finishes during a long download), pings in the overlap region are assigned to the shorter-duration phase. This prevents the longer phase from dominating the classification.
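The bucketing rules above can be sketched as one function. It assumes timestamps are in milliseconds and that each phase is a (label, start_ms, end_ms) tuple; the function name classify_ping is hypothetical, and returning None for pings excluded by the idle guard is a convention chosen for this sketch.

```python
PHASE_GUARD_MS = 250
IDLE_GUARD_MS = 5000


def classify_ping(ts_ms: float, phases):
    """Return 'dl_loaded', 'ul_loaded', 'idle', or None if the ping is excluded."""
    # Phase windows are widened by the phase guard on both ends.
    hits = [
        (end - start, label)
        for label, start, end in phases
        if start - PHASE_GUARD_MS <= ts_ms <= end + PHASE_GUARD_MS
    ]
    if hits:
        # When phases overlap, the shorter-duration phase wins.
        return min(hits)[1]
    # Pings close to any phase boundary are kept out of the idle bucket.
    near_boundary = any(
        abs(ts_ms - edge) <= IDLE_GUARD_MS
        for _, start, end in phases
        for edge in (start, end)
    )
    return None if near_boundary else "idle"
```

For a 15-second download with a 4-second upload running inside it, pings during the upload are counted as ul_loaded, and pings in the 5 seconds after the download ends are dropped rather than counted as idle.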
Gateway vs internet decomposition
Because we ping both the gateway and internet targets at 5/sec, we compute bufferbloat separately for each hop:
- Internet bufferbloat (ping to 8.8.8.8 / 1.1.1.1): Shows total path bufferbloat — router + ISP + upstream.
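- Gateway bufferbloat (ping to local gateway): Isolates the local WiFi/router contribution. Every internet ping also crosses the gateway hop, so if gateway bufferbloat is high, internet bufferbloat will be high too and the local WiFi/router is the bottleneck. If gateway bufferbloat is low but internet bufferbloat is high, the problem is upstream.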
No other consumer tool provides this decomposition, because no other tool pings the gateway during throughput tests.
Output
For each hop (gateway and internet), the analysis produces:
| Field | Description |
|---|---|
| idle.p50, idle.p95 | Median and 95th percentile RTT when network is quiet |
| dl_loaded.p50, dl_loaded.p95 | RTT during download speed tests |
| ul_loaded.p50, ul_loaded.p95 | RTT during upload speed tests |
| bufferbloat_dl_ms | dl_loaded.p50 − idle.p50 (the delta) |
| bufferbloat_ul_ms | ul_loaded.p50 − idle.p50 |
The bufferbloat delta is rated on the results page:
| Rating | Delta |
|---|---|
| excellent | < 20 ms |
| good | < 50 ms |
| fair | < 100 ms |
| impaired | < 200 ms |
| broken | ≥ 200 ms |
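As a rough sketch of how the delta and its rating fit together, assuming the RTT samples have already been split into idle and loaded buckets (the function names here are illustrative, not the ones used in the codebase):

```python
from statistics import median

# (exclusive upper bound in ms, rating), in the order of the rating table.
RATING_BOUNDS = [(20, "excellent"), (50, "good"), (100, "fair"), (200, "impaired")]


def bufferbloat_delta_ms(idle_rtts_ms, loaded_rtts_ms):
    """Loaded p50 minus idle p50, in milliseconds."""
    return median(loaded_rtts_ms) - median(idle_rtts_ms)


def rate_bufferbloat(delta_ms):
    """Map a bufferbloat delta to its rating."""
    for upper_ms, rating in RATING_BOUNDS:
        if delta_ms < upper_ms:
            return rating
    return "broken"


delta = bufferbloat_delta_ms([20, 22, 24], [60, 70, 80])  # 70 - 22 = 48 ms
rating = rate_bufferbloat(delta)                          # "good"
```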
Implementation
Implemented in Python (csv_extract.py:compute_bufferbloat_split())
and a parallel JavaScript port (bufferbloat.js) in the upload-signer
Worker, so both local analysis and server-side data.json generation
produce identical results. Displayed on results pages as an "Under Load"
panel showing idle/DL/UL latency bars with a quality rating.
PEP Detection
| Property | Value |
|---|---|
| Metric | pep.detected (boolean) |
| Method | QUIC/TCP RTT ratio at shared target |
| Target | cloudflare-quic.com:443 |
| Threshold | QUIC p50 / TCP p50 ≥ 5.0× |
| Min samples | 10 per protocol |
| Constraint | Only flags on GEO satellite links |
What it measures: Whether a TCP Performance Enhancing Proxy (PEP) is intercepting TCP connections on the link. PEPs respond to TCP SYN locally, making TCP appear much faster than the actual satellite round-trip. QUIC uses UDP and bypasses the proxy entirely, so the QUIC/TCP ratio reveals the proxy's presence.
Why a shared target matters: Earlier captures compared TCP to one set
of targets and QUIC to another. Different targets may have different
routing, introducing noise. Both probes now test cloudflare-quic.com:443
(which supports both TCP and QUIC), giving an apples-to-apples
comparison. Falls back to overall protocol stats for captures recorded
before the shared target was added.
Why GEO-only: A high QUIC/TCP ratio on terrestrial links usually means the QUIC target is simply further away or QUIC is being throttled — not a PEP. PEPs are a satellite-specific technology, so the metric only triggers on GEO links (median RTT > 400ms).
Display: When detected, the results page shows a "TCP Acceleration Detected" banner explaining what the proxy is and how it affects measurements. The protocol comparison bars also get a "PEP" chip, and protocol ordering changes to highlight the TCP/QUIC gap.
Implementation: Python (csv_extract.py:compute_pep_index()) and
JavaScript (pep.js in the upload-signer Worker).
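The decision rule in the table can be sketched as below. This is an illustration under stated assumptions, not compute_pep_index() itself: the function name and signature are hypothetical, and which RTT series feeds the GEO check is not specified above, so it is passed in as a separate argument.

```python
from statistics import median

PEP_RATIO_THRESHOLD = 5.0  # QUIC p50 / TCP p50
MIN_SAMPLES = 10           # per protocol
GEO_MEDIAN_RTT_MS = 400.0  # links above this median RTT are treated as GEO


def pep_detected(tcp_rtts_ms, quic_rtts_ms, link_median_rtt_ms):
    """True when TCP handshakes look locally accelerated on a GEO link."""
    if len(tcp_rtts_ms) < MIN_SAMPLES or len(quic_rtts_ms) < MIN_SAMPLES:
        return False  # not enough data for a stable median
    if link_median_rtt_ms <= GEO_MEDIAN_RTT_MS:
        return False  # GEO-only constraint
    tcp_p50 = median(tcp_rtts_ms)
    if tcp_p50 <= 0:
        return False  # guard against division by zero on bad samples
    return median(quic_rtts_ms) / tcp_p50 >= PEP_RATIO_THRESHOLD


geo_with_pep = pep_detected([5.0] * 10, [620.0] * 10, 610.0)  # True (ratio 124x)
terrestrial = pep_detected([5.0] * 10, [40.0] * 10, 30.0)     # False (not GEO)
```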
ICMP Proxy Detection
| Property | Value |
|---|---|
| Metric | icmp_proxied (boolean) |
| Method | ICMP vs HTTP/QUIC/TCP RTT ratio |
| Threshold | Ratio > 50× AND ICMP p50 < 5ms AND other protocol p50 > 50ms |
| Min samples | 10 ICMP, 5 for comparison protocol |
What it detects: Some in-flight networks have the onboard gateway answer ICMP echo replies locally instead of forwarding them over the satellite link. This makes ping latency appear sub-millisecond on a 600ms satellite connection — superficially great numbers that are completely misleading.
How it works: If ICMP p50 is below 5ms and a comparison protocol (HTTP, QUIC, or TCP) has p50 above 50ms and the ratio exceeds 50×, ICMP is flagged as proxied. The ratio guard is the primary discriminator — a 0.5ms ICMP with 355ms HTTP gives a ratio of 710×, far above the threshold. The 50ms floor prevents false positives on genuinely fast local networks.
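A minimal sketch of the three-part test, using the thresholds and minimum sample counts from the table. The function name is hypothetical and this is not the detect_icmp_proxy() implementation:

```python
from statistics import median


def looks_icmp_proxied(icmp_rtts_ms, other_rtts_ms):
    """Apply the ratio, ICMP-ceiling, and comparison-floor checks together."""
    if len(icmp_rtts_ms) < 10 or len(other_rtts_ms) < 5:
        return False  # minimum samples: 10 ICMP, 5 for the comparison protocol
    icmp_p50 = median(icmp_rtts_ms)
    other_p50 = median(other_rtts_ms)
    if icmp_p50 <= 0:
        return False  # guard against division by zero on bad samples
    return (icmp_p50 < 5.0
            and other_p50 > 50.0
            and other_p50 / icmp_p50 > 50.0)


proxied = looks_icmp_proxied([0.5] * 10, [355.0] * 5)  # ratio 710x -> True
fast_lan = looks_icmp_proxied([2.0] * 10, [30.0] * 5)  # below the 50 ms floor -> False
```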
What changes when detected: The results page shows an "ICMP Pings Answered Locally" banner. Latency ratings switch from ICMP to HTTP HEAD measurements, which the gateway can't fake. The latency spread panel re-labels as "Latency Spread (HTTP)" so the rating reflects real end-to-end performance, not the local gateway distance.
Coverage: Works on GEO satellite, LEO satellite, and hybrid networks (like EAN — European Aviation Network, a hybrid S-band + LTE system). Originally calibrated for GEO only; threshold was lowered from 400ms to 50ms after EAN captures showed the pattern at sub-GEO latencies.
Implementation: Python (io.py:detect_icmp_proxy()) and JavaScript
(icmpProxy.js in the upload-signer Worker).
Adaptive Probe Backoff
When a probe experiences sustained failures, the monitor adapts its polling frequency rather than hammering an unresponsive target at full speed. This operates at two levels:
Loop-level backoff (AdaptiveInterval)
Each probe loop tracks consecutive losses and doubles its interval after a threshold period of continuous failure. The behavior is network-aware when a shared NetworkHealth signal is available:
- Blocked (network is up, this probe is failing): The probe is likely filtered — e.g., QUIC blocked on a corporate network. Backs off to the configured max interval (e.g., 60s for QUIC) after 60 seconds of failures. This reduces noise without losing the data point entirely.
- Outage (entire network is down): Caps at 2 seconds regardless of how long the outage lasts. Fast polling during an outage ensures the probe detects recovery immediately — important for capturing satellite handover timing.
On the first success after being throttled, the interval resets immediately to its base rate.
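One way to picture the loop-level behavior is the sketch below. It is not the real AdaptiveInterval class. The class name, method signature, and the choice to apply the same 60-second threshold in both modes are assumptions; only the doubling, the 2-second outage cap, the configured max, and the reset on success come from the description above.

```python
class AdaptiveIntervalSketch:
    """Illustrative backoff state for one probe loop (hypothetical API)."""

    OUTAGE_CAP_S = 2.0  # outage cap from the text

    def __init__(self, base_s, max_s, fail_threshold_s=60.0):
        self.base_s = base_s
        self.max_s = max_s
        self.fail_threshold_s = fail_threshold_s
        self.interval_s = base_s
        self._failing_since = None  # time of the first failure in the current run

    def record(self, ok, now, network_up=True):
        """Record one probe result and return the interval to sleep next."""
        if ok:
            # First success resets straight to the base rate.
            self._failing_since = None
            self.interval_s = self.base_s
            return self.interval_s
        if self._failing_since is None:
            self._failing_since = now
        if now - self._failing_since >= self.fail_threshold_s:
            # Blocked probes may grow to max_s; outages stay fast.
            cap = self.max_s if network_up else self.OUTAGE_CAP_S
            self.interval_s = max(self.base_s, min(self.interval_s * 2, cap))
        return self.interval_s


quic = AdaptiveIntervalSketch(base_s=1.0, max_s=60.0)
quic.record(False, now=0.0)
blocked = [quic.record(False, now=t) for t in (60.0, 62.0, 66.0, 74.0, 90.0, 122.0)]
# blocked == [2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
after_recovery = quic.record(True, now=200.0)  # 1.0
```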
Per-target backoff (TargetBackoff)
Multi-target probe loops (ping, UDP, TCP) may have one target blocked while others work fine. TargetBackoff tracks failures per target and skips backed-off targets on most iterations, retrying them only every 60 seconds. When a backed-off target recovers, it returns to normal polling immediately.
This prevents a single blocked target from inflating the CSV with loss rows while keeping the other targets at full frequency.
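A per-target version might look like the sketch below. The class and method names are hypothetical, and the number of consecutive failures that triggers backoff is not stated above, so `fail_threshold` is an assumed parameter. The 60-second retry and the immediate recovery follow the description.

```python
class TargetBackoffSketch:
    """Illustrative per-target backoff bookkeeping (hypothetical API)."""

    def __init__(self, fail_threshold=10, retry_every_s=60.0):
        self.fail_threshold = fail_threshold  # assumed value, not from the source
        self.retry_every_s = retry_every_s
        self._fails = {}       # target -> consecutive failure count
        self._next_retry = {}  # target -> earliest time of the next attempt

    def should_probe(self, target, now):
        """False while a backed-off target is waiting for its next retry."""
        retry_at = self._next_retry.get(target)
        return retry_at is None or now >= retry_at

    def record(self, target, ok, now):
        if ok:
            # Recovery: return to normal polling immediately.
            self._fails.pop(target, None)
            self._next_retry.pop(target, None)
            return
        self._fails[target] = self._fails.get(target, 0) + 1
        if self._fails[target] >= self.fail_threshold:
            self._next_retry[target] = now + self.retry_every_s


backoff = TargetBackoffSketch(fail_threshold=3)
for t in (0.0, 0.2, 0.4):
    backoff.record("1.1.1.1", ok=False, now=t)
skipped = not backoff.should_probe("1.1.1.1", now=1.0)  # True: backed off
others_ok = backoff.should_probe("8.8.8.8", now=1.0)    # True: unaffected
```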
Network Detection
At startup, the tool auto-detects the network environment:
- Gateway: Default route via netstat -rn (macOS), route print (Windows), or ip route show default (Linux)
- VPN: Detects active VPN tunnels by finding utun interfaces with IPv4 addresses assigned (macOS), TAP/TUN/VPN/WireGuard adapters (Windows), or tun interfaces (Linux). Identifies known corporate VPNs via DNS search domains (e.g., corporate domain → "Corporate VPN")
- WiFi info: Signal strength, channel, SSID (see WiFi Signal section)
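For the Linux case, gateway detection comes down to parsing one line of command output. The sketch below assumes the usual "default via <address> dev <interface> ..." shape printed by ip route show default. The function name is hypothetical, and the tool's own parser may handle more cases.

```python
import re


def parse_linux_default_gateway(ip_route_output):
    """Return the IPv4 gateway from `ip route show default` output, or None."""
    for line in ip_route_output.splitlines():
        match = re.match(r"default via (\d{1,3}(?:\.\d{1,3}){3})\b", line.strip())
        if match:
            return match.group(1)
    return None  # e.g. a point-to-point default route with no "via" address


sample = "default via 192.168.1.1 dev wlan0 proto dhcp metric 600\n"
gateway = parse_linux_default_gateway(sample)  # "192.168.1.1"
```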
Quality Tiers and What They Mean
Every metric is classified into a 6-tier rating system: superior → excellent → good → fair → impaired → broken. The overall capture rating is the worst of any individual metric — one bad metric drags the overall down. This is intentional: a connection with great download but unusable latency for video calls should not get an "excellent" rating.
The tiers are calibrated to the link type. GEO satellite has a ~600ms physics floor, so latency tiers shift up and cap at "good" — a 700ms GEO RTT is as good as it physically gets and shouldn't be penalized for what physics enforces. Below are the actual thresholds.
Latency (median RTT)
| Tier | Non-GEO | GEO satellite |
|---|---|---|
| superior | < 20 ms | (n/a — physics floor) |
| excellent | < 50 ms | (n/a) |
| good | < 100 ms | ≤ 700 ms |
| fair | < 200 ms | ≤ 900 ms |
| impaired | < 500 ms | ≤ 1200 ms |
| broken | ≥ 500 ms | > 1200 ms |
GEO detection: median RTT > 400 ms is treated as GEO satellite.
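The latency table and the worst-metric rule can be sketched together. The function names are illustrative, and the link type is passed in explicitly rather than re-derived, since the 400 ms GEO check is a separate step.

```python
TIER_ORDER = ["superior", "excellent", "good", "fair", "impaired", "broken"]

# (exclusive upper bound in ms, tier) for non-GEO links.
NON_GEO_BOUNDS = [(20, "superior"), (50, "excellent"), (100, "good"),
                  (200, "fair"), (500, "impaired")]
# (inclusive upper bound in ms, tier) for GEO links; the best tier is "good".
GEO_BOUNDS = [(700, "good"), (900, "fair"), (1200, "impaired")]


def rate_latency(median_rtt_ms, geo):
    """Map a median RTT to its tier for the given link type."""
    if geo:
        for upper_ms, tier in GEO_BOUNDS:
            if median_rtt_ms <= upper_ms:
                return tier
        return "broken"
    for upper_ms, tier in NON_GEO_BOUNDS:
        if median_rtt_ms < upper_ms:
            return tier
    return "broken"


def overall_rating(tiers):
    """The overall rating is the worst individual tier."""
    return max(tiers, key=TIER_ORDER.index)


geo_tier = rate_latency(650, geo=True)                       # "good"
overall = overall_rating(["excellent", "impaired", "good"])  # "impaired"
```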
Packet loss
| Tier | Threshold |
|---|---|
| superior | < 0.1 % |
| excellent | < 0.5 % |
| good | < 1 % |
| fair | < 5 % |
| impaired | < 10 % |
| broken | ≥ 10 % |
These thresholds are harmonized with the fleet-wide dashboard's
connectivityLoss thresholds for cross-tool comparability.
Download throughput
| Tier | Mbps |
|---|---|
| superior | ≥ 200 |
| excellent | ≥ 100 |
| good | ≥ 25 |
| fair | ≥ 12 |
| impaired | ≥ 3 |
| broken | < 3 |
Upload throughput
| Tier | Mbps |
|---|---|
| superior | ≥ 100 |
| excellent | ≥ 25 |
| good | ≥ 5 |
| fair | ≥ 1 |
| impaired | ≥ 0.5 |
| broken | < 0.5 |
WiFi signal (RSSI)
| Tier | dBm |
|---|---|
| excellent | ≥ -50 |
| good | ≥ -67 |
| fair | ≥ -70 |
| impaired | < -70 |
The thresholds derive from the WiFi Alliance signal classification guidelines, where ≥ -67 dBm is the typical threshold for reliable streaming.
CSV Format
All measurements write to a single CSV with columns:
timestamp_utc, measurement_type, target, rtt_ms, loss_bool, seq,
hop_count, throughput_mbps, dns_ms, burst_size, notes
The notes field carries structured key=value pairs specific to each
measurement type (protocol details, proxy headers, calibration data, etc.).
A separate raw throughput samples CSV (*_throughput_raw.csv) records
per-chunk timing data for TCP ramp-up analysis.
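Reading the capture CSV needs nothing beyond the standard library. The sketch below assumes the file starts with a header row using the column names above and that the notes pairs are separated by semicolons. Neither detail is stated here, and the sample row is invented for illustration.

```python
import csv
import io


def parse_notes(notes, sep=";"):
    """Split "k1=v1;k2=v2" into a dict (the separator is an assumption)."""
    pairs = {}
    for item in notes.split(sep):
        key, eq, value = item.strip().partition("=")
        if eq and key:
            pairs[key] = value
    return pairs


def read_rows(fileobj):
    """Yield each CSV row as a dict, with the notes field expanded."""
    for row in csv.DictReader(fileobj):
        row["notes"] = parse_notes(row.get("notes") or "")
        yield row


# Invented sample data; real measurement_type values and notes keys may differ.
SAMPLE = (
    "timestamp_utc,measurement_type,target,rtt_ms,loss_bool,seq,"
    "hop_count,throughput_mbps,dns_ms,burst_size,notes\n"
    "2024-05-01T10:00:00+00:00,ping,8.8.8.8,612.4,0,17,,,,,proto=icmp;ttl=52\n"
)
rows = list(read_rows(io.StringIO(SAMPLE)))
# rows[0]["notes"] == {"proto": "icmp", "ttl": "52"}
```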
Post-Capture: Upload and Server-Side Analysis
After a capture completes, files are uploaded to Cloudflare R2. The upload Worker then generates the results page data automatically — clients upload raw measurements and are done.
Upload pipeline
- Client uploads raw files: CSV, meta.json (location, network, privacy settings), and throughput_raw.csv (per-chunk samples). Files are gzip-compressed (~5-6x) before upload.
- Worker generates data.json server-side by analyzing the CSV. The Worker runs the same analysis code (bufferbloat split, PEP detection, ICMP proxy detection, quality ratings, timeseries extraction) as a JavaScript port. Every upload automatically gets a results page — no client-side analysis needed.
- Worker updates captures-index.json with the new capture metadata, making it immediately visible on the browse page.
This replaced the earlier model where each client (Python, Swift, macOS app) generated data.json independently before upload. Server-side generation eliminated reliability gaps (5/358 historical captures had no results page due to client-side generation failures) and the maintenance burden of keeping Python and Swift analysis in sync.
Upload modes
- Worker proxy (default): Uploads via the Cloudflare Worker — no R2 credentials needed on the client. Config: upload_url + upload_token in ~/.wifi-monitor/r2.json. Used by all platforms.
- Direct S3 (admin): Uploads via boto3 with R2 credentials. Used by report.py --pull to download captures for report generation.
Config is auto-provisioned on first use from a bundled default_r2.json,
which generates a unique uploader_id per machine. Upload state is
tracked locally in ~/.wifi-monitor/uploaded.json to avoid
re-uploading. Pending uploads (e.g., laptop closed before upload
finished) are caught up on the next run.
Access control
The R2 bucket is served through a Cloudflare Worker at howismywifi.com.
The capture listing and index API are restricted to VPN users. Individual
results pages are accessible by direct link (capture IDs include a random
component to prevent enumeration). Crawlers are blocked from /c/ paths,
and brute-force scanning is rate-limited. See docs/r2-data-protection.md
for full details.
Capture Management
The wifi-manage CLI provides post-capture operations on R2-hosted data.
Link related captures
wifi-manage link <id1> <id2> [...] ties multiple captures together
(e.g., several short sessions from the same flight). Patches meta.json
and data.json.gz on R2 to add bidirectional related_captures entries.
Results pages render a "Related" banner with links to sibling captures.
Trim mixed captures
wifi-manage trim <id> splits a capture that spans two environments
(e.g., inflight GEO satellite + post-landing terrestrial WiFi).
Downloads the CSV from R2, filters to a specified time window,
re-uploads as a new capture with correct aggregate stats.
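The filtering step of a trim can be sketched as below. This covers only the time-window filter, not the R2 download, re-upload, or stats recomputation. It assumes timestamp_utc holds ISO 8601 strings with an explicit UTC offset, and the function name is hypothetical.

```python
import csv
import io
from datetime import datetime


def trim_csv(src, dst, start, end):
    """Copy rows whose timestamp_utc lies in [start, end]; return the row count."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        ts = datetime.fromisoformat(row["timestamp_utc"])
        if start <= ts <= end:
            writer.writerow(row)
            kept += 1
    return kept


# Invented sample: two in-flight GEO pings, then one post-landing ping.
src = io.StringIO(
    "timestamp_utc,measurement_type,rtt_ms\n"
    "2024-05-01T09:59:59+00:00,ping,610\n"
    "2024-05-01T10:00:00+00:00,ping,612\n"
    "2024-05-01T10:30:00+00:00,ping,35\n"
)
dst = io.StringIO()
kept = trim_csv(
    src, dst,
    start=datetime.fromisoformat("2024-05-01T10:00:00+00:00"),
    end=datetime.fromisoformat("2024-05-01T10:15:00+00:00"),
)
# kept == 1
```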
Combined analysis
When captures are linked, a combined analysis script merges their data into a unified view — aggregated stats, merged timelines, and an overall quality assessment across all sessions. This gives a single-flight summary instead of forcing users to mentally piece together 5 separate results pages.
Hide / unhide captures
wifi-manage hide <id> marks a capture as hidden — excluded from
the browse page and listings but still accessible by direct link.
Data stays on R2 (nothing is deleted). wifi-manage unhide <id>
reverses it. The results page also has a toggle button (VPN-only).
Captures shorter than 1 minute are auto-hidden on upload.
macOS App Integration
The macOS app (macos-app/) is a native SwiftUI wrapper that spawns the
Python tool as a subprocess with --json mode. Communication is JSON lines
on stdout (events) and stderr (diagnostics).
The app provides:
- SetupView: Configure location, network type, duration before capture
- RunningView: Live stats during capture (latency, loss, throughput, RSSI)
- ResultsView: Post-capture summary with quality ratings
- CapturePickerView: Browse and load past captures from the captures directory
- BootstrapView: First-run setup (installs Python dependencies via uv)
The Python side emits 20 JSON event types: info, status, probe,
throughput_status, throughput_result, cdn_throughput_result,
traceroute_result, complete, analysis_complete, upload_start,
upload_complete, upload_error, upload_skipped, upload_summary,
capture_reminder, capture_auto_stop, system_sleep,
network_change, note, error.
Auto-update is handled by Sparkle. publish-update.sh signs the DMG,
generates an appcast, and uploads to howismywifi.com/download/.
How We Compare to Other Tools
Network-quality tools occupy different points in a design space. Here's where WiFi Monitor sits relative to the major reference points, with sources cited so you can verify each comparison.
| Feature | Speedtest (Ookla) | Cloudflare Speed Test | Orb.net | WiFi Monitor |
|---|---|---|---|---|
| Test duration | ~30 sec single-shot | ~30 sec single-shot | continuous (24/7) | 2 min → 30 min → continuous |
| Concurrent multi-protocol | ❌ TCP only | ❌ HTTP only | ❌ HTTPS/h3 only | ✅ ICMP + UDP + TCP + HTTP + QUIC |
| Adaptive duration for high-RTT | ✅ scales to RTT | ⚠️ fixed | ⚠️ fixed | ✅ 4s default, 15s GEO |
| Loaded latency / bufferbloat | ✅ "working latency" | ✅ AIM | ⚠️ implicit | ✅ idle / DL-loaded / UL-loaded + gateway split |
| Packet loss measurement | ❌ not surfaced | ✅ via WebRTC TURN | ✅ over time | ✅ from ICMP seq + UDP |
| Traceroute / path | ❌ | ❌ | ❌ | ✅ every 30s with hop deltas |
| TCP PEP / proxy detection | ❌ | ❌ | ⚠️ implicit (h3 vs HTTPS) | ✅ explicit via QUIC handshake |
| Multi-CDN diversity | ❌ single server | ❌ Cloudflare only | ⚠️ Cloudflare + Fastly | ✅ Cloudflare + Akamai + Netflix OCA |
| Real video-player measurement | ❌ | ❌ | ❌ | ✅ headless Chromium YouTube ABR + startup time |
| Streaming resolution tier (4K/1080p/720p/SD) | ❌ | ❌ | ❌ | ✅ Netflix (OCA) + YouTube (segment + ABR) |
| WiFi signal correlation (RSSI/PHY) | ❌ | ❌ | ❌ | ✅ RSSI + channel + MCS + NSS |
| Composite quality score | ❌ | ✅ AIM (3 scenarios) | ✅ Orb Score 0–100 | ✅ 6-tier per-metric (link-type aware) |
| Designed for satellite | ❌ generic | ❌ generic | ❌ generic | ✅ that is the whole point |
The rows where only the WiFi Monitor column has a ✅ are its distinctive strengths: none of the consumer tools above do them. Bufferbloat and PEP detection are surfaced as named metrics on every results page, and the full-stack video tier (Netflix max bitrate + real YouTube playback) closes the gap between "this link can theoretically do X Mbps" and "this is what streaming actually looks like on it." Our remaining gaps are in summary scoring — RPM as a single-number responsiveness metric, and per-scenario quality ratings (see "Coming Soon" below).
The deepest empirical comparison of these tools is MacMillan et al. (2023), A Comparative Analysis of Ookla Speedtest and M-Lab NDT7. Headline finding: on high-latency links (500–600ms RTT), single-stream short-duration tests under-report throughput by 60–70%. This is the empirical justification for our calibrated multi-stream approach with extended duration on GEO links.
What We Don't Measure (and Why)
Honest disclosure of methodology gaps. These are deliberate choices rather than oversights, and we revisit them as priorities shift.
IPv6 path testing. All probes target IPv4 today. Some inflight satellite networks have IPv6 quirks worth surfacing, but the implementation cost is moderate and inflight v6 is rare at the passenger edge. Tracked as a future enhancement.
PMTU / MTU discovery. Path MTU black-holes are a known inflight gotcha (especially in VPN-in-VPN scenarios — corporate VPN inside satellite VPN). We don't probe for them today. No consumer tool does either, but it's a real diagnostic gap. Tracked.
HLS, DASH-outside-YouTube, and WebRTC measurement. The YouTube ABR probe simulates one real video player end-to-end — startup time, selected resolution, stalls, ABR switches, dropped frames — and the Netflix probe measures real OCA throughput mapped to a resolution tier. What we still don't do: generic HLS rebuffering tests against arbitrary streams (Disney+, Hulu, Apple TV), or WebRTC connection setup (Zoom / Teams / Webex call-quality simulation). The YouTube probe is the closest a passenger-side tool can get without operator cooperation, and it captures the failure modes that matter most for in-flight video. WebRTC simulation in particular is tracked as a roadmap item.
Real-time UDP packet loss (separate from ICMP). Packet loss today is inferred from ICMP sequence gaps. Real-time apps care about UDP loss, which can diverge from ICMP loss on networks that shape ICMP. We could add a UDP echo probe but it requires hosting a small echo server. Deferred until ICMP-shaping is shown to mislead our numbers.
Per-passenger fleet aggregation. howismywifi.com displays individual captures. We don't aggregate across captures to produce "median GEO Viasat capture by airline" or "this route compared to last month." That comparative context is a credibility flywheel and is tracked as a roadmap item.
Single-CDN comparisons across paths. We test multiple CDNs from one device, but we don't compare the same CDN reached from different satellite ground stations or from a single-passenger device vs. gateway-hosted measurements. A research-grade question, out of scope.
Long-term persistent monitoring of a single endpoint. Orb.net does this; we don't. Different tool, different design center.
Coming Soon
Active development, ordered by priority. See
docs/plan-research-improvements-priorities.md
for the full roadmap.
RPM (Round-trips Per Minute) as a responsiveness metric.
RPM = 60,000 / loaded_round_trip_ms. Apple ships it in the OS,
Cloudflare ships it via the mach CLI, and the IETF is standardizing it
in draft-ietf-ippm-responsiveness. "Higher is better" framing translates
across link types in a way that raw milliseconds doesn't — a great
GEO satellite link still has a poor RPM, which correctly forecasts that
interactive use will struggle. We'll integrate via mach (with AIM
upload disabled) and surface RPM alongside Mbps.
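The RPM formula is simple enough to check by hand. A small sketch (the function name is ours; the planned integration goes through mach rather than computing this directly):

```python
def rpm(loaded_round_trip_ms):
    """Round-trips per minute: 60,000 ms in a minute / loaded RTT in ms."""
    if loaded_round_trip_ms <= 0:
        raise ValueError("loaded RTT must be positive")
    return 60_000 / loaded_round_trip_ms


geo_rpm = rpm(600)   # 100.0: even a healthy GEO link completes few round trips
fiber_rpm = rpm(30)  # 2000.0
```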
Per-scenario quality scores. Instead of a single overall rating, "what can this link actually do right now?" — Voice call: ★★★, HD streaming: ★★★★, Gaming: ★★, etc. Calibrated for the link type (a great-for-aviation 600ms RTT GEO link shouldn't always score badly). Scenario list tuned for the actual audience: video conferencing on satellite (Zoom/Teams/Webex), VPN-tunneled work, cloud sync (Slack/Dropbox/iCloud), SSH/remote shell. Reference: Cloudflare AIM scoring and LibreQoS QoO.
Design Considerations
Why Multi-Protocol?
No single measurement tells the whole story:
- ICMP can be filtered, deprioritized, or proxied
- TCP timing reveals proxy/PEP presence invisible to ICMP
- QUIC confirms TCP proxy detection from the opposite direction (UDP-based, bypasses TCP proxies)
- HTTP shows real application-layer experience
- UDP via DNS provides a control for ICMP-specific behavior
- Throughput shows capacity, not just latency
- DNS resolution affects every connection but is often overlooked
The power is in the comparison. When ICMP says 600ms and TCP says 5ms, you've found a PEP. When QUIC says 600ms and TCP says 5ms, the PEP is confirmed — QUIC bypasses the proxy entirely. When ICMP is fine but HTTP spikes, the problem is at the application layer.
Why High Frequency?
Ping and UDP at 5/sec (not 1/sec) because:
- Satellite beam handovers last 0.5-2 seconds — 1/sec sampling aliases these events. 5/sec captures the actual loss duration.
- Latency spikes on congested WiFi are sub-second. 1/sec misses them.
- Statistical reliability: 5x the samples in the same time window.
TCP and HTTP probes run less often (every 2s and 5s, respectively) because they're heavier: each probe opens a connection and does real work. The trade-off is more detail from lightweight probes, broader coverage from heavyweight probes.
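The handover argument can be checked with a toy model: send probes at a fixed rate and count how many fall inside an outage window. The function and the 0.6-second outage starting at t = 3.1 s are made up for illustration.

```python
def lost_probes(outage_start_s, outage_len_s, rate_hz, capture_len_s=10.0):
    """Count probes sent during [outage_start, outage_start + outage_len)."""
    n_probes = int(capture_len_s * rate_hz)
    send_times = [i / rate_hz for i in range(n_probes)]
    outage_end_s = outage_start_s + outage_len_s
    return sum(outage_start_s <= t < outage_end_s for t in send_times)


at_1hz = lost_probes(3.1, 0.6, rate_hz=1)  # 0: the outage falls between probes
at_5hz = lost_probes(3.1, 0.6, rate_hz=5)  # 3: probes at 3.2, 3.4, and 3.6 s
```

At 1/sec the handover is invisible. At 5/sec the three lost probes bound its duration to roughly 0.6 seconds.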
Why Subprocess Ping on macOS?
macOS's ICMP implementation via SOCK_DGRAM (non-privileged sockets, used by
icmplib) introduces measurement artifacts: periodic ~1-second latency
spikes that appear real but are OS-level queuing artifacts. The system
ping binary uses SOCK_RAW (setuid) and doesn't have this issue.
This was a hard-won finding after investigating mysterious periodic spikes
that only appeared on macOS with VPN active. Full investigation in
docs/vpn-periodic-latency-spikes.md.
Why Cloudflare for Throughput?
- Free, no registration or API key required
- Precise byte-level control of download size (__down?bytes=N)
- Global CDN means the server is close to the user (low latency overhead)
- High capacity — won't bottleneck even on fast connections
- Rate limits are generous enough for our round schedule (~900 MB/30 min)
Why Iterative Calibration?
A fixed transfer size either wastes time on slow links (downloading 100 MB on a 3 Mbps hotel connection takes 4.5 minutes) or undersamples fast links (5 MB on a 500 Mbps fiber completes in 80ms, mostly TCP ramp-up).
The iterative approach (5→10→20→40→100 MB until convergence) adapts to the actual link speed in 2-3 rounds, using minimal data on slow links and scaling up only as needed. Total calibration overhead: 15-175 MB depending on speed.
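A sketch of the calibration loop and the arithmetic behind it. The convergence test (two consecutive estimates within 10% of each other) is an assumption, since the criterion is not spelled out here. `measure_mbps` stands in for a real timed download of the given size.

```python
CALIBRATION_SIZES_MB = [5, 10, 20, 40, 100]  # transfer-size ladder from the text


def transfer_seconds(size_mb, link_mbps):
    """Ideal transfer time, ignoring ramp-up: megabytes * 8 / megabits per second."""
    return size_mb * 8 / link_mbps


def calibrate(measure_mbps, tolerance=0.10):
    """Walk the ladder until two consecutive estimates agree within `tolerance`.

    Returns (estimate_mbps, total_mb_transferred).
    """
    previous = None
    estimate = None
    total_mb = 0
    for size_mb in CALIBRATION_SIZES_MB:
        estimate = measure_mbps(size_mb)
        total_mb += size_mb
        if previous is not None and abs(estimate - previous) <= tolerance * previous:
            break
        previous = estimate
    return estimate, total_mb


# A slow link reads the same at every size and converges after 5 + 10 MB.
slow = calibrate(lambda size_mb: 3.0)  # (3.0, 15)

# A fast link still ramping up at small sizes (invented numbers) uses the whole ladder.
ramping = {5: 100.0, 10: 180.0, 20: 300.0, 40: 420.0, 100: 480.0}
fast = calibrate(ramping.__getitem__)  # (480.0, 175)
```

The two totals match the 15-175 MB overhead range quoted above, and `transfer_seconds(100, 3)` gives about 267 seconds, the 4.5 minutes cited for the hotel example.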
Why Wall-Clock Upload Timing?
When uploading, sample-based timing (recording timestamps as data is generated) measures how fast the OS accepts data into the TCP send buffer, not how fast it actually leaves the machine. On fast local networks, the buffer fills almost instantly while the actual upload takes much longer.
Wall-clock timing (total_bytes / elapsed_time) is correct because
requests.post() only returns after the server acknowledges receipt. The
full round-trip is included in the measurement.
Download doesn't have this problem because iter_content() blocks until
data actually arrives from the network.
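The wall-clock approach reduces to timing a single blocking call. In this sketch `send` is any callable that returns only once the server has acknowledged the body. With requests that could be something like `lambda body: requests.post(url, data=body)`, where `url` is a placeholder. The injectable `clock` parameter exists only to make the example testable.

```python
import time


def upload_mbps(send, payload, clock=time.monotonic):
    """Wall-clock upload throughput: total bits / elapsed seconds, in Mbps."""
    start = clock()
    send(payload)  # must block until the server has acknowledged receipt
    elapsed_s = clock() - start
    return len(payload) * 8 / elapsed_s / 1_000_000


# Fake two-second upload of 1,000,000 bytes: 8 Mb / 2 s = 4 Mbps.
ticks = iter([0.0, 2.0])
measured = upload_mbps(lambda body: None, b"x" * 1_000_000, clock=lambda: next(ticks))
# measured == 4.0
```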
References & Further Reading
The methodology choices above draw on a body of network-measurement research and standards. The most useful sources, organized by topic.
Standards & specifications
- IETF draft-ietf-ippm-responsiveness — Responsiveness under Working Conditions. Defines RPM (Round-trips Per Minute), the loaded-latency measurement standard. Apple's networkQuality tool implements it; Cloudflare's mach implements it.
- RFC 9318 — IAB Workshop on Measuring Network Quality for End-Users — Consensus view from the internet measurement community on what metrics matter and why working latency > idle latency.
- Cheshire IETF 121 slides — Apple's framing of why RPM uses "higher is better" rather than milliseconds.
Comparative analysis
- MacMillan, Mangla, Saxon, Marwell, Feamster (2023) — A Comparative Analysis of Ookla Speedtest and Measurement Lab's NDT7. The single best academic source on how Ookla and NDT7 actually behave at high latency. Quantitative data on adaptive multi-stream behavior.
- CAIDA: Empirical Characterization of Ookla's Speed Test Platform (2024) — Server distribution analysis, server selection behavior, ISP proximity effects.
Bufferbloat
- bufferbloat.net: Tests for Bufferbloat — Catalog of every bufferbloat test tool with methodology notes.
- Waveform Bufferbloat Test — Reference UX for "letter grade for bufferbloat" approach.
- LibreQoS Bufferbloat & QoO Test — Implementation of IETF QoO (Quality of Outcome) per application class.
- Ookla: Introducing a Better Measure of Latency — Explains Speedtest's idle / download-loaded / upload-loaded latency split — the same decomposition we implement.
- Ookla: Loaded Latency and L4S — Why loaded latency is becoming the defining metric for modern networks; L4S/ECN as the next-generation mechanism to keep it low.
Tool methodologies
- Cloudflare: How does Cloudflare's Speed Test really work? (2025) — Best technical explanation of a modern consumer speed test. Documents the block-based "no saturation" methodology.
- Cloudflare AIM scoring — Scenario-specific scoring (Streaming / Gaming / RTC) framework.
- @cloudflare/speedtest source — npm package, MIT license. The default measurement sequence is in the README.
- Cloudflare mach (Rust RPM client) — Open-source CLI for RPM measurement.
- FCC Measuring Broadband America — Open Methodology — The SamKnows + FCC methodology. Closest analog to wifi-monitor's design philosophy. Annual technical appendices document the full test suite.
- Orb.net documentation — Most complete public description of Orb's metric definitions and continuous measurement approach.
Inflight-specific
- paxex.aero: Measuring more than Megabits — Inflight internet monitors take the spotlight (2024) — Coverage of NetForecast QMap and the Seamless Air Alliance QoE spec.
Internal documents
- docs/research-similar-tools-synthesis.md — Our internal synthesis of how Orb, Speedtest, and Cloudflare work, and what we plan to borrow / reject / leave alone.
- docs/plan-research-improvements.md — Per-platform implementation plan for the metrics in "Coming Soon" above.