How It Works

How WiFi Monitor measures network performance, and why each measurement works the way it does. This is a living document — update it when probe behavior changes or when questions come up about what the monitor captures.

We publish this in full because the credibility of any network-quality tool comes from its methodology being inspectable. If you can't see what we're measuring and why, you have no basis for trusting the numbers. Cloudflare, the IETF responsiveness draft, and the FCC Measuring Broadband America program all publish their methodologies for the same reason. Our framework is closer to FCC MBA / SamKnows than to Speedtest: characterize a link over minutes-to-hours of multi-protocol probing rather than a 30-second snapshot.

Overview

WiFi Monitor measures a network connection at every layer of the stack, with every probe writing to a single timestamped CSV. The goal isn't "how many probes" but coverage: each layer is tested with more than one source so the layers cross-check each other. That's what makes the data tell you things a single-protocol speed test never could (TCP PEPs, ICMP proxies, per-CDN throttling, the gap between "this link could stream 4K" and "this player actually does").

The design center is in-flight WiFi — satellite handover detection, TCP PEP detection, per-CDN rate-limiting policy — but the same probes work on any network.

Coverage by layer

Layer	What we probe	Cross-check / why more than one
WiFi radio (L1/L2)	RSSI, noise floor, channel, channel width, band, MCS index, NSS (spatial streams), PHY mode, TX rate, SSID/BSSID	Correlate signal drops with throughput drops — distinguishes RF problems from network problems.
IP / path (L3)	ICMP ping to default gateway + two internet targets (5/sec); traceroute every 30s	Gateway vs internet split distinguishes local from upstream issues; two targets catch asymmetric routing and per-target rate-limiting.
Transport (L4)	TCP SYN, UDP (DNS-port), QUIC handshake — all to the same target where possible	Three transports against one target reveal PEPs (TCP fast + QUIC slow = TCP proxy), ICMP proxying (ICMP fast + everything else slow), and UDP filtering.
Application protocols (L7)	HTTP HEAD against CDN + obscure "canary" targets; system-resolver DNS	CDN-vs-canary HTTP delta reveals transparent proxies that cache popular sites; system DNS catches captive portals and DNS interception.
CDN diversity	Self-hosted Cloudflare Worker, Cloudflare public speed test, Akamai (dash.akamaized.net), Netflix Open Connect via fast.com	Per-CDN rate-limiting is real and invisible to single-source tests: Netflix may be capped at 2.3 Mbps on a link where Cloudflare sees 10+ Mbps. Four sources surface the policy.
Real video player	YouTube segment probe (yt-dlp, no browser) + YouTube ABR probe (Playwright + headless Chromium, real player); Netflix max-bitrate tier from OCA throughput	Segment probe says what should work; ABR probe says what the player actually does. When they disagree, the link or platform is doing something to the player.

The next two tables show how that coverage maps to each shipping platform — which probes run on which OS, and which features are implementation-complete vs platform-specific.

We deliberately operate at the invasive end of the measurement spectrum: 5/sec ICMP ping to the gateway, multi-protocol probes running concurrently the entire capture, and round-scheduled throughput tests. (The public Cloudflare speed-test tier is held to a 900 MB / 30-min budget to respect its quota; our primary self-hosted server and the Akamai fallback are unmetered — see Throughput.) We accept this link impact in exchange for measurement fidelity. A passive home monitor would be a different tool with a different design center.

WiFi Monitor runs on four platforms. The CLI is the reference implementation; macOS and Windows apps wrap it via subprocess (uv run --json), inheriting all CLI capabilities. The iOS app is a native Swift reimplementation — it covers most probes but not all. On iPad, the app uses an adaptive layout with live latency charts, multi-column stat panels, and wider post-capture result views — closer to the macOS experience than the phone layout.

Probes by Platform

Probe	CLI	macOS App	Windows App	iOS App
ICMP Ping (internet)	✓	✓	✓	✓
ICMP Ping (gateway)	✓	✓	✓	—
UDP DNS	✓	✓	✓	✓
TCP Connect	✓	✓	✓	✓
QUIC Handshake	✓	✓	✓	✓
HTTP HEAD + canary	✓	✓	✓	✓
Speed Test (Worker/CF/Akamai)	✓	✓	✓	✓
Netflix Max Bitrate (OCA)	✓	✓	✓	✓
YouTube Segment (no browser)	✓	✓	✓	—
YouTube ABR (real player, Chromium)	✓	✓	✓	—
DNS (system resolver)	✓	✓	✓	✓
Traceroute	✓	✓	✓	—
WiFi Signal (basic)	✓	✓	✓	—
WiFi Signal (PHY: MCS, width, band)	✓	✓	—	—
Loss Burst Detection	✓	✓	✓	—

iOS uses Akamai-only speed testing. Netflix OCA discovery (via the fast.com API) is implemented on all platforms including iOS. The YouTube probes (which depend on either Playwright + Chromium or yt-dlp) are not implemented on iOS — YouTube streaming measurement is available on CLI, macOS, and Windows.

iOS probe frequencies differ from the CLI — see Probe Schedule by Platform for the full per-probe cadence on each platform.

Features by Platform

Feature	CLI	macOS App	Windows App	iOS App
Adaptive probe backoff	✓	✓	✓	—
Adaptive round scheduler (5/10/30-min)	✓	✓	✓	mode-based
On-demand speed test (`s` key)	✓	✓	✓	—
On-demand YouTube test (`y` key)	✓	✓	✓	—
Observed throughput	✓	✓	✓	—
Upload to R2	✓	✓	✓	✓
Privacy transform	✓	✓	✓	✓
Analysis charts (interactive Plotly)	✓	web	web	web
Capture history	—	✓	✓	✓
Live latency charts	—	✓	—	iPad only
iPad adaptive layout	—	—	—	✓

Probe Schedule by Platform

Every probe runs on its own cadence. Fast, lightweight probes (ICMP, UDP) sample several times a second to catch sub-second events like satellite beam handovers; heavier probes (TCP, QUIC, HTTP, speed tests) run less often because each one does real connection work. The interactive Probe Schedule page has the live, generated cadence and the full round-by-round timeline; the table below is a platform summary.

The CLI is the reference cadence, and the macOS and Windows apps shell out to the same Python — so they match the CLI exactly. iOS is a native reimplementation that runs a lighter schedule to protect the battery and the cellular radio. All numbers below are the default (active) profile.

Probe	CLI · macOS · Windows	iOS
ICMP ping	5 / sec (every 0.2s)	1 / sec
UDP (DNS-port)	5 / sec (every 0.2s)	1 / sec (3-target rotation)
TCP connect	every 2s	every 15s
QUIC handshake	every 5s	every 15s
HTTP HEAD	every 5s	every 15s
DNS (system resolver)	every 30s	every 30s
Traceroute	every 30s	not implemented
WiFi signal	every 2s	not available on iOS
Observed throughput (interface counters)	every 2s	not implemented
Speed test (throughput)	scheduled in measurement rounds — a speed test anchors most rounds, plus a second Akamai-backend test for CDN cross-validation; plus on-demand (`s`)	scheduled in measurement rounds (Quick / Standard / Extended tiers)
Netflix max bitrate (OCA)	scheduled in measurement rounds; short captures collect fewer	scheduled with the speed tests
YouTube bitrate (segment probe)	scheduled in measurement rounds; short captures collect fewer	not implemented
YouTube resolution (ABR: adaptive + forced)	scheduled in measurement rounds + on-demand (`y`)	not implemented
YouTube Shorts (short-form QoE)	scheduled in measurement rounds — a standard session plus a longer “deep” round; default-on	not implemented

Passive profile (--profile passive, CLI only). For unattended long-run monitoring the CLI can drop to a much wider cadence: ping 2s, UDP 5s, TCP 30s, QUIC 60s, HTTP 60s, DNS 120s, traceroute 300s, WiFi 10s, and a speed test every 2 hours instead of every few minutes. The desktop and mobile apps always use the active profile.

Video / streaming cadence

“How often does the video test run?” comes up a lot, and there isn’t one number — “video” is really four separate tests (Netflix bitrate, YouTube bitrate, YouTube resolution, and YouTube Shorts) that each run on their own schedule, woven into the capture’s measurement rounds so they don’t all fire together. Short captures collect fewer samples of each; a continuous in-flight capture repeats the 30-minute cycle for as long as it runs, so the longer you record, the more video samples you collect. Exact per-cycle counts are generated from the scheduler on the Probe Schedule page and summarized in the round schedule below — this prose stays deliberately count-free so it can’t drift.

Streaming probe	What it does	Cadence	Platforms
Netflix max bitrate (OCA)	Download throughput from the real Netflix Open Connect servers, mapped to a resolution tier	Scheduled in measurement rounds	All, including iOS
YouTube bitrate probe (segment, ~8s)	`yt-dlp` downloads real media segments and times them (no browser) → achievable bitrate and recommended resolution	Scheduled in measurement rounds	CLI · macOS · Windows
YouTube resolution probe (ABR, ~50s)	Real headless-Chromium player — runs an adaptive (auto) phase then a forced-resolution phase to see what the player actually does	Scheduled in measurement rounds, only when Chromium is installed	CLI · macOS · Windows
YouTube Shorts probe (short-form QoE)	Real headless Shorts feed — swipes through clips and reads comments to measure rebuffering and quality on short-form video; a standard session plus a longer “deep” round	Scheduled in measurement rounds, only when Chromium is installed	CLI · macOS · Windows

So a short desktop capture collects a sample or two of each video test while a longer capture collects several; a continuous flight repeats the cycle for as long as it runs. iOS measures Netflix bitrate but runs no YouTube probe at all — the YouTube probes (segment, ABR, and Shorts) depend on yt-dlp and headless Chromium, neither of which is available on iOS. On macOS and Windows the heavier YouTube ABR and Shorts probes only fire if the optional Chromium download has been installed; the lighter segment probe always runs. See Netflix Max Bitrate and YouTube for the per-probe detail, and the Probe Schedule for the live, generated cadence.

Concurrency model

Probes do not take turns — they run concurrently for the entire capture, all writing to one timestamped CSV. This is deliberate: we operate at the invasive end of the measurement spectrum and accept the link impact in exchange for fidelity.

CLI · macOS · Windows. Every probe is an independent task in a single async event loop. The lightweight probes (ping, UDP, TCP, QUIC, HTTP, DNS, traceroute, WiFi) each tick on their own interval in parallel and never block one another; within a single HTTP round all the URLs fire concurrently. Speed-test and streaming rounds run alongside the lightweight probes rather than pausing them — the only interaction is natural bandwidth contention, which is itself part of what we measure (loaded vs. idle latency).
iOS. Ping and UDP run as their own continuous tasks at 1/sec. TCP, QUIC, HTTP, and DNS share a single sequential diagnostic loop that cycles about every 15 seconds (DNS every other cycle), and speed tests run on a separate scheduled task. Batching the heavier probes into one loop is the battery/radio compromise that lets a phone sustain a long capture.

Probe Types

ICMP Ping

Property	Value
Frequency	5/sec (0.2s interval)
Targets	8.8.8.8, 1.1.1.1
+ Gateway	Auto-detected default gateway
Timeout	interval + 2s
Logged as	`ping`

What it measures: Base round-trip time and packet loss. The most fundamental network measurement — ICMP echo request/reply with no application-layer overhead.

Implementation: On macOS/Linux, uses the system ping binary (setuid, SOCK_RAW). This is deliberate: macOS's SOCK_DGRAM ICMP handling (used by icmplib's non-privileged mode) causes ~1-second quantized latency spikes under concurrent load. The system ping avoids this artifact entirely.

Falls back to icmplib on Windows or when the system ping binary is unavailable. The icmplib path uses a fire-and-forget pattern: pings are sent at strict intervals regardless of RTT, allowing multiple pings in-flight simultaneously. This matters on GEO satellite links (~600ms RTT) where a synchronous ping-wait-ping approach would reduce effective frequency.
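The fire-and-forget pattern can be sketched with asyncio. This is an illustrative sketch rather than the monitor's own code: `fake_ping` stands in for a real ICMP echo, and every name here is hypothetical. The point it shows is that the send loop sleeps on the interval only, so several pings are in flight at once whenever the RTT exceeds the interval.

```python
import asyncio


async def fake_ping(rtt_s: float, state: dict) -> float:
    """Stand-in for one ICMP echo; a real probe would send a packet here."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(rtt_s)  # simulated round-trip time
    state["in_flight"] -= 1
    return rtt_s * 1000.0


async def fire_and_forget(count: int, interval_s: float, rtt_s: float) -> dict:
    """Launch one ping per interval without waiting for earlier replies."""
    state = {"in_flight": 0, "peak": 0}
    tasks = []
    for _ in range(count):
        tasks.append(asyncio.create_task(fake_ping(rtt_s, state)))
        await asyncio.sleep(interval_s)  # send cadence is independent of RTT
    state["rtts_ms"] = await asyncio.gather(*tasks)
    return state


def run_demo() -> dict:
    # Scaled-down timing so the demo finishes quickly: RTT is 5x the interval.
    return asyncio.run(fire_and_forget(count=6, interval_s=0.01, rtt_s=0.05))
```

A synchronous ping-wait-ping loop would never have more than one ping outstanding; here `run_demo()["peak"]` is greater than 1 because replies are collected separately from sending.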

Gateway ping: Pings the default gateway (usually 192.168.1.1) to distinguish local WiFi issues from internet path issues. If a spike appears in gateway ping simultaneously with internet ping, the problem is local; if only the internet targets spike while the gateway stays flat, the problem is upstream. Auto-disables after 20 consecutive losses (some networks block ICMP to the gateway).

Why two internet targets: If one target has issues (Google's 8.8.8.8 occasionally rate-limits ICMP), the other provides a control. Also detects asymmetric routing — different targets may take different paths.

UDP (DNS)

Property	Value
Frequency	5/sec (0.2s interval), round-robin across the targets
Targets	8.8.8.8 (Google), 1.1.1.1 (Cloudflare), 9.9.9.9 (Quad9)
Timeout	2s
Logged as	`udp` (with the resolver IP in the `target` column)

What it measures: UDP round-trip time and packet loss, independent of ICMP. Sends a minimal DNS A-record query and times the response. This is a real UDP round-trip, not just an ICMP echo — and it now runs on every platform including iOS.

Why it matters: Some networks treat ICMP differently from real traffic — lower priority, rate limiting, or filtering. UDP probes via DNS provide a second latency measurement using actual data-plane traffic, and a real UDP loss signal (PEP doesn't repair UDP loss, so this stays honest on GEO where TCP/ICMP loss is masked). On networks with ICMP deprioritization, you'll see UDP RTT match TCP/HTTP while ICMP shows higher latency. UDP loss + RTT/jitter feed the Voice MOS estimate (see Voice MOS and Jitter below).

Why three resolvers (target diversity): A multi-capture sweep found 1.1.1.1 is destination-flaky (17 UDP + 23 TCP partial blocks, clustered in corporate/hotel firewalls) while 8.8.8.8 is never UDP-blocked. The third resolver keeps a healthy two-target quorum when one is dropped, so the UDP verdict never rests on a single survivor. The choice is IP-agnostic: the analyzer de-weights whichever target is flaky relative to a healthy sibling, rather than hardcoding any resolver's reputation.

Per-target handling: Because the three resolvers sit at different network distances (e.g. Quad9 is often ~30ms farther from a given location than Google/Cloudflare), stats are computed per target, then combined by metric meaning: loss is a cross-target consensus (isolates the access link from one resolver's flakiness), while delay and jitter come from a single representative resolver — never the blended round-robin stream, which would manufacture inter-target jitter and a pessimistic median. Per-target percentiles are exposed in proto_stats.udp.targets.

Implementation: Builds a raw DNS query packet (A record for www.google.com), sends it over SOCK_DGRAM to port 53, and verifies the transaction ID in the response. No DNS library needed — the query is ~30 bytes.
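A minimal sketch of that approach, using only the standard library. The function names are hypothetical and the monitor's real code may differ; the wire layout follows the standard DNS message format (12-byte header, length-prefixed labels, QTYPE and QCLASS).

```python
import os
import socket
import struct
import time


def build_dns_query(hostname: str, txid: int) -> bytes:
    """Build a minimal DNS A-record query in wire format."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def response_matches(response: bytes, txid: int) -> bool:
    """True if the reply echoes our transaction ID and has the QR bit set."""
    if len(response) < 12:
        return False
    resp_id, flags = struct.unpack("!HH", response[:4])
    return resp_id == txid and bool(flags & 0x8000)


def udp_dns_rtt_ms(resolver: str, timeout_s: float = 2.0):
    """Send one query to port 53; return the RTT in ms, or None on loss."""
    txid = int.from_bytes(os.urandom(2), "big")
    query = build_dns_query("www.google.com", txid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        start = time.perf_counter()
        try:
            sock.sendto(query, (resolver, 53))
            response, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable networks
            return None
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms if response_matches(response, txid) else None
```

For `www.google.com` the query comes to 32 bytes, in line with the ~30 bytes quoted above.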

TCP Connect

Property	Value
Frequency	Every 2s
Targets	8.8.8.8:443, 1.1.1.1:443, cloudflare-quic.com:443
Timeout	5s
Logged as	`tcp`

What it measures: TCP connect time, the interval from sending the SYN to receiving the SYN-ACK, at which point the client side of the three-way handshake completes (about one round-trip). Connects to port 443 and immediately closes.

Why it matters: This is the key probe for detecting TCP Performance Enhancing Proxies (PEPs). On satellite connections, airlines often deploy transparent TCP proxies that intercept TCP connections and respond with a local SYN-ACK, making TCP appear much faster than the actual satellite round-trip. Compare:

ICMP: 600ms (real satellite RTT)
TCP: 5ms (proxy responding locally)

This gap is the signature of a TCP PEP. Without the multi-protocol comparison, you'd never know the proxy exists.

TCP retransmit behavior: When the SYN or its SYN-ACK is lost, TCP retransmits the SYN after about 1 second (the initial RTO on most modern stacks). This means TCP loss events show up as 1000ms+ spikes rather than the gaps you see in ICMP. This difference is itself informative: it shows how TCP's loss recovery affects application performance.

QUIC Handshake

Property	Value
Frequency	Every 5s
Targets	google.com:443, cloudflare.com:443, cloudflare-quic.com:443
Timeout	10s
Logged as	`quic`

What it measures: QUIC handshake time — a full UDP + TLS 1.3 connection setup negotiating HTTP/3 (ALPN h3). Connects to port 443 and immediately closes after the handshake completes.

Why it matters: QUIC uses UDP, not TCP. This makes it the best probe for detecting TCP Performance Enhancing Proxies (PEPs) — the same proxies that make TCP SYN look artificially fast can't intercept QUIC because it's UDP-based. Compare:

TCP SYN: 5ms (proxy responding locally)
QUIC: 600ms (actual satellite round-trip, proxy can't help)

This gap is the inverse of the ICMP/TCP gap and confirms the proxy hypothesis from a different angle. QUIC is also increasingly the protocol passengers actually use — YouTube, Google services, and Cloudflare all default to QUIC/HTTP/3 when available.
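The cross-protocol comparison can be sketched as a simple ratio test on per-protocol medians. This is a hedged illustration: the function name, the labels, and the 5x ratio are assumptions chosen for the example, not the analyzer's actual thresholds.

```python
from statistics import median


def classify_transport_gap(icmp_ms, tcp_ms, quic_ms, ratio: float = 5.0) -> str:
    """Compare median RTTs per protocol. `ratio` is an assumed threshold."""
    icmp, tcp, quic = median(icmp_ms), median(tcp_ms), median(quic_ms)
    # TCP far faster than both ICMP and QUIC: something answers SYNs locally.
    if icmp > tcp * ratio and quic > tcp * ratio:
        return "tcp_pep"
    # ICMP far faster than both TCP and QUIC: echo replies are being proxied.
    if tcp > icmp * ratio and quic > icmp * ratio:
        return "icmp_proxy"
    return "no_gap"
```

With the figures used above (ICMP about 600ms, TCP about 5ms, QUIC about 600ms) this returns `"tcp_pep"`. The ratio is set above the normal 1.5-3x QUIC-to-TCP overhead so that an ordinary handshake cost is not mistaken for a proxy.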

Why hostnames instead of IPs: QUIC requires TLS SNI (Server Name Indication), so raw IPs won't work. DNS resolution is included in the first probe of each round, which is intentionally realistic — it measures what a real QUIC connection costs.

Expected RTT vs TCP SYN: QUIC handshake is typically 1.5-3x TCP SYN time on the same path because it carries the TLS 1.3 key exchange inside connection setup (cryptographic work plus a multi-packet certificate flight that can cost an additional round-trip). On proxied satellite links, the ratio grows dramatically since the proxy only accelerates TCP.

Networks that block QUIC: Some enterprise/airline networks block UDP on port 443, forcing fallback to TCP/HTTP/2. The QUIC probe will show 100% loss on these networks — this is itself useful data, confirming that the network actively filters QUIC traffic.

Implementation: Uses the aioquic library (optional dependency). If not installed, the probe is silently skipped. A monkey-patch suppresses harmless ValueError noise from aioquic's StreamWriter cleanup of server-initiated push streams.

HTTP HEAD

Property	Value
Frequency	Every 5s
Targets	google.com, cloudflare.com, fast.com, dash.akamaized.net + proxy canaries
Timeout	12s
Logged as	`http`

What it measures: Full-stack latency: DNS resolution + TCP handshake + TLS negotiation + HTTP request/response. This is what users actually experience when loading a webpage.

Proxy canary targets: In addition to CDN targets (Google, Cloudflare, fast.com, Akamai), the probe tests obscure "canary" sites (httpbin.org, icanhazip.com, example.com). On startup, it calibrates which canaries are reachable, then selects one HTTP and one HTTPS candidate.

The idea: transparent proxies often cache/accelerate popular CDN content but pass through traffic to obscure sites unmodified. By comparing CDN latency vs canary latency (cdn_delta_ms in the log), you can detect:

- Positive delta: Canary is slower. This is normal; the canary is further away.
- Negative delta: Canary is faster than CDN. This suggests CDN content is being intercepted and re-served (proxy artifact).
- Large delta on HTTP but not HTTPS: Port-80-only transparent proxy.
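A small sketch of how those readings could be applied. It assumes cdn_delta_ms is computed as canary latency minus CDN latency (consistent with "positive means the canary is slower"); the tolerance and "large" thresholds and the function names are hypothetical.

```python
def interpret_cdn_delta(cdn_ms: float, canary_ms: float,
                        tolerance_ms: float = 20.0) -> str:
    """Classify one CDN-vs-canary comparison. Thresholds are assumptions."""
    delta_ms = canary_ms - cdn_ms  # assumed sign convention for cdn_delta_ms
    if delta_ms < -tolerance_ms:
        # The obscure site answered faster than the CDN: suspicious.
        return "cdn_intercept_suspected"
    return "normal"


def port80_only_proxy(http_delta_ms: float, https_delta_ms: float,
                      large_ms: float = 100.0) -> bool:
    """True when the delta is large on plain HTTP but not on HTTPS."""
    return abs(http_delta_ms) > large_ms and abs(https_delta_ms) <= large_ms
```

A single sample is noisy, so in practice a verdict like this would be drawn from medians over many rounds rather than from one request pair.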

Response header inspection: Logs Via, X-Cache, X-Cache-Hits, X-Forwarded-For, and Server headers — additional proxy fingerprints.

Session reuse: Uses a persistent requests.Session, so the underlying TCP connection and TLS session are reused after the first request. Subsequent measurements therefore reflect steady-state request latency, not cold-start TCP and TLS negotiation.

Concurrency: All HTTP probes in a round fire concurrently via asyncio.gather(). This prevents slow canary sites from delaying CDN measurements.

Throughput

Property	Value
Frequency	Adaptive round schedule (see below) + on-demand `s` key
Backend	3-tier: self-hosted Worker → Cloudflare → Akamai
Streams	4 parallel TCP connections
Logged as	`throughput_down`, `throughput_up`, `calibration`

What it measures: Download and upload speed via multi-stream HTTP transfers. Saturates the link (like Ookla) to measure true capacity.

Backend fallback

Speed tests use a 3-tier backend priority system:

Self-hosted Worker (primary): Download endpoint at howismywifi.com/speed-test/download?bytes=N — our Cloudflare Worker streaming zeros via ReadableStream. Zero egress cost, no rate limiting, no bot detection. Upload goes to our existing upload-signer Worker.
Cloudflare (fallback): speed.cloudflare.com/__down?bytes=N and __up. Free, no API key, reliable. Falls back here if the Worker is unreachable.
Akamai (last resort): dash.akamaized.net test files. Activated after consecutive Cloudflare 429/403 errors (shared NAT IPs on satellite links trigger per-IP rate limits).

Each tier tracks consecutive failures independently. After a threshold (3 failures on terrestrial, 1 on GEO satellite — shared NAT IPs are rate-limited too aggressively to retry), the next tier becomes active. Every 30 minutes, blocked tiers are re-probed for recovery. The active backend is logged with each speed test result (backend=worker|cloudflare|akamai).
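The failover rule can be sketched as a small state holder. The class and field names are hypothetical, and `now` is passed in (in seconds) so the logic can be exercised without a real clock. Falling back to the last tier when every tier is blocked is an assumption of this sketch.

```python
from dataclasses import dataclass, field

TIERS = ("worker", "cloudflare", "akamai")  # priority order
REPROBE_INTERVAL_S = 30 * 60                # blocked tiers are retried after this


@dataclass
class BackendSelector:
    is_geo: bool = False
    failures: dict = field(default_factory=lambda: {t: 0 for t in TIERS})
    blocked_at: dict = field(default_factory=dict)

    @property
    def threshold(self) -> int:
        # 1 failure on GEO satellite, 3 on terrestrial links
        return 1 if self.is_geo else 3

    def active(self, now: float) -> str:
        """Highest-priority tier that is not blocked, or is due for a re-probe."""
        for tier in TIERS:
            since = self.blocked_at.get(tier)
            if since is None or now - since >= REPROBE_INTERVAL_S:
                return tier
        return TIERS[-1]  # assumption: keep using the last resort

    def record(self, tier: str, ok: bool, now: float) -> None:
        """Update the per-tier consecutive-failure counter after a test."""
        if ok:
            self.failures[tier] = 0
            self.blocked_at.pop(tier, None)
            return
        self.failures[tier] += 1
        if self.failures[tier] >= self.threshold:
            self.blocked_at[tier] = now
```

The value returned by `active()` corresponds to what the log records in the `backend=` field.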

Calibration (runs once at startup)

Before the first throughput test, a two-step calibration determines how to size transfers:

Latency probe (1KB download): Measures time-to-first-byte to classify the link. >400ms = GEO satellite, everything else = terrestrial. This sets the test duration: 4s for terrestrial links, 15s for GEO satellite (TCP slow-start takes ~7 RTTs = 4.2s to fill the congestion window at 600ms RTT).

PEP-aware GEO detection: On PEP'd satellite links, the TCP latency probe may return ~80ms (the proxy's local response) instead of the true ~700ms satellite RTT, causing GEO misclassification. To counter this, GEO detection is an OR-latch: if any signal indicates GEO (calibration TTFB, QUIC handshake >400ms, or ICMP median >400ms), the link is classified as GEO and never downgraded. QUIC is the strongest signal since it uses UDP and bypasses the PEP entirely.
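The OR-latch is easy to express directly. A minimal sketch with hypothetical names; the 400ms threshold and the 4s/15s durations come from the text above.

```python
GEO_THRESHOLD_MS = 400.0


class GeoLatch:
    """Once any signal says GEO, the link stays GEO for the capture."""

    def __init__(self) -> None:
        self.is_geo = False

    def update(self, ttfb_ms=None, quic_ms=None, icmp_median_ms=None) -> bool:
        signals = (ttfb_ms, quic_ms, icmp_median_ms)
        if any(v is not None and v > GEO_THRESHOLD_MS for v in signals):
            self.is_geo = True  # latch: never cleared by later fast samples
        return self.is_geo

    def test_duration_s(self) -> int:
        return 15 if self.is_geo else 4
```

A PEP'd link that reports an 80ms TTFB still latches to GEO as soon as one QUIC handshake or the ICMP median crosses the threshold.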

Speed calibration (iterative): Downloads and uploads at increasing sizes (5, 10, 20, 40, 100 MB) until the estimate converges (<20% change between rounds). Slow links converge quickly (2 rounds); fast links iterate further. Calibration is capped at 120s of wall-clock time (≈175 MB of transfers on the terrestrial sizes). Download and upload converge independently — once one direction converges, it stops probing.
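The convergence loop for one direction might look like the following sketch. `measure_mbps` is a hypothetical callback that performs a transfer of the given size and returns the measured speed; the real calibration runs download and upload independently, which this single-direction example leaves out.

```python
import time
from typing import Callable, Tuple

SIZES_MB = (5, 10, 20, 40, 100)


def calibrate(measure_mbps: Callable[[int], float],
              tolerance: float = 0.20,
              max_seconds: float = 120.0,
              clock: Callable[[], float] = time.monotonic) -> Tuple[float, int]:
    """Return (estimate_mbps, rounds_used) for one direction."""
    start = clock()
    previous = None
    estimate = 0.0
    rounds = 0
    for size_mb in SIZES_MB:
        if rounds > 0 and clock() - start >= max_seconds:
            break  # wall-clock cap reached; keep the best estimate so far
        estimate = measure_mbps(size_mb)
        rounds += 1
        # Converged when the estimate moved by less than 20% since last round.
        if previous and abs(estimate - previous) / previous < tolerance:
            break
        previous = estimate
    return estimate, rounds
```

A slow link that reports the same speed at 5 MB and 10 MB stops after two rounds; a fast link whose estimate keeps climbing as TCP ramps up continues through the larger sizes.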

The calibrated speed determines transfer size: speed * duration / streams bytes per stream, clamped to 500KB-50MB per stream.
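As a worked example of that formula, here is a sketch that assumes the calibrated speed is held in Mbps and that KB/MB are decimal units (both are assumptions; the function name is hypothetical).

```python
MIN_BYTES = 500_000      # 500 KB floor per stream (decimal units assumed)
MAX_BYTES = 50_000_000   # 50 MB ceiling per stream


def bytes_per_stream(speed_mbps: float, duration_s: float, streams: int = 4) -> int:
    """speed * duration / streams, clamped to the per-stream limits."""
    bytes_per_second = speed_mbps * 1_000_000 / 8
    raw = bytes_per_second * duration_s / streams
    return int(min(max(raw, MIN_BYTES), MAX_BYTES))
```

For a 100 Mbps terrestrial link with the 4s duration and 4 streams, this gives 12.5 MB per stream; a 1 Mbps link hits the 500 KB floor and a 1 Gbps link hits the 50 MB ceiling.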

Loss-aware early exit: On severely lossy links (sustained ICMP loss above ~8%, or self-observed throughput variance above 40%), calibration exits after round 2 instead of grinding through all five sizes. The extra rounds rarely converge on a noisy link and only waste data budget. The completion record carries an early_exit flag so analysis can distinguish "this is the converged value" from "this is the best estimate we got before bailing."

Adaptive Round Schedule

The scheduled probes are organized into rounds — each round runs a fixed template of work (a speed test anchors most rounds; the heavier streaming probes — Netflix, YouTube segment/ABR, and YouTube Shorts — are spread across the others, with a second speed-test backend, Akamai, for CDN cross-validation). Rounds are front-loaded so the most useful coverage lands early and a short capture still gets a representative set; the number of rounds scales with the capture duration. The schedule below is generated from scheduler.py, so it can’t drift from what the tool runs — see the interactive Probe Schedule for the full timeline and per-probe cadence.

The wave scheduler (Option E, front-loaded rounds) runs the rounds as non-overlapping bursts spaced across the capture. The 30-min continuous cycle is Full · Var+ · Full · Var · Deep (5 rounds); short captures drop rounds from the end.

Capture length	Rounds (GEO satellite)	Rounds (terrestrial / LEO)
5 min	1 — Full	2 — Full, Var+
10 min	3 — Full, Var+, Full	4 — Full, Var+, Full, Var
30 min / continuous	5 — Full, Var+, Full, Var, Deep	5 — Full, Var+, Full, Var, Deep

Per-cycle round composition (GEO durations — the binding link; terrestrial is faster):

Round	Probes	Duration
Full	Speed test (Cloudflare) → Netflix OCA → YouTube segment → YouTube ABR	1m41s
Var+	Speed test (Cloudflare) → Akamai speed test → YouTube Shorts	4m5s
Var	Akamai speed test → YouTube Shorts	3m32s
Deep	Speed test (Cloudflare) → YouTube ABR → YouTube Shorts (deep)	7m23s

Per 30-min cycle (GEO): Speed test (Cloudflare) ×4, Akamai speed test ×2, Netflix OCA ×2, YouTube segment ×2, YouTube ABR ×3, YouTube Shorts ×2, YouTube Shorts (deep) ×1.
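The per-cycle totals follow directly from the round composition table. The sketch below transcribes that table by hand (it is not scheduler.py, and the names are hypothetical) and recomputes the counts and the total busy time on GEO.

```python
from collections import Counter

# Transcribed from the round composition table above.
ROUNDS = {
    "Full": ["Speed test (Cloudflare)", "Netflix OCA", "YouTube segment", "YouTube ABR"],
    "Var+": ["Speed test (Cloudflare)", "Akamai speed test", "YouTube Shorts"],
    "Var": ["Akamai speed test", "YouTube Shorts"],
    "Deep": ["Speed test (Cloudflare)", "YouTube ABR", "YouTube Shorts (deep)"],
}
GEO_DURATION_S = {"Full": 101, "Var+": 245, "Var": 212, "Deep": 443}
CYCLE = ["Full", "Var+", "Full", "Var", "Deep"]


def rounds_for(n_rounds: int) -> list:
    """Short captures drop rounds from the end of the cycle."""
    return CYCLE[:n_rounds]


def probe_counts(rounds: list) -> Counter:
    """How many times each probe runs across the given rounds."""
    return Counter(probe for name in rounds for probe in ROUNDS[name])


def busy_seconds(rounds: list) -> int:
    """Total time spent inside rounds, using the GEO durations."""
    return sum(GEO_DURATION_S[name] for name in rounds)
```

On GEO the five rounds add up to 1102 seconds (18m22s) of scheduled work inside each 30-minute cycle, which leaves the remainder for lightweight probing alone.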

Rounds are front-loaded across the available time rather than evenly spaced. Live displays and JSON events include round_num / total_rounds so the apps can show the current round (e.g. “Round 2 of 5: testing download…”) — the user sees what’s happening and when the capture will finish.

On-demand triggers: The CLI and apps support manual test triggering during a capture without breaking the round schedule:

s — run a speed test immediately
y — run a YouTube test immediately

Useful for "the connection just got worse, test it again" without having to start a new capture.

Passive profile: Long-running passive captures (not the default) keep the heavy probes on a wider cadence — speed tests every 2 hours instead of every round — to limit data usage for unattended monitoring.

Out-of-Family Detection

If a throughput result deviates >50% from the median of the last 5 results, it's flagged as "out of family" and a retest is scheduled 90 seconds later. Max 1 retest per round. Retests never trigger further retests.

This catches transient anomalies — a single bad result gets verified before it skews analysis. The 90-second delay is long enough for transient congestion to clear but short enough to catch real changes.
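A sketch of the out-of-family rule. Several details are assumptions made for the example: the detector waits for a full window of 5 results before flagging anything, every result (including flagged ones and retests) joins the history, and the class and method names are hypothetical.

```python
from collections import deque
from statistics import median

RETEST_DELAY_S = 90


class OutOfFamilyDetector:
    def __init__(self, window: int = 5, threshold: float = 0.5) -> None:
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.retested_rounds = set()

    def observe(self, mbps: float, round_num: int, is_retest: bool = False) -> bool:
        """Record a result; return True if a retest should be scheduled."""
        full = len(self.history) == self.history.maxlen
        baseline = median(self.history) if full else None
        self.history.append(mbps)
        if is_retest or not baseline:
            return False  # retests never trigger further retests
        deviates = abs(mbps - baseline) / baseline > self.threshold
        if deviates and round_num not in self.retested_rounds:
            self.retested_rounds.add(round_num)  # max 1 retest per round
            return True
        return False
```

A caller would schedule the retest `RETEST_DELAY_S` seconds after `observe()` returns True and pass `is_retest=True` when recording its result.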

Data Budget

The public speed.cloudflare.com endpoint enforces a ~1 GB per-session quota (exceeding it triggers 429/403 responses). To stay under it we track a 900 MB data budget per 30-minute window. This budget applies only to that public Cloudflare tier — our primary self-hosted download server and the Akamai fallback have no such quota. When the budget is exhausted we simply stop using the public Cloudflare tier for the rest of the window (download tests continue on the other servers); the counter resets each window.

On rate limit (429 or 403): exponential backoff (30s, doubling up to 120s) AND transfer size is halved. On success, backoff decays gradually (halved each success) rather than resetting instantly — prevents immediately re-triggering the limit.
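The budget window and the backoff can be sketched together. The names are hypothetical, and two behaviors are assumptions of this sketch: windows are fixed 30-minute blocks that restart on the first request after expiry, and a backoff that decays below one second is treated as cleared.

```python
BUDGET_BYTES = 900 * 1_000_000   # 900 MB per window (decimal units assumed)
WINDOW_S = 30 * 60


class CloudflareBudget:
    def __init__(self) -> None:
        self.window_start = 0.0
        self.used = 0

    def allow(self, nbytes: int, now: float) -> bool:
        """Reserve nbytes on the public tier, or refuse if over budget."""
        if now - self.window_start >= WINDOW_S:
            self.window_start, self.used = now, 0  # counter resets each window
        if self.used + nbytes > BUDGET_BYTES:
            return False  # caller uses another backend for this test
        self.used += nbytes
        return True


class RateLimitBackoff:
    def __init__(self, transfer_bytes: int) -> None:
        self.delay_s = 0.0
        self.transfer_bytes = transfer_bytes

    def on_rate_limited(self) -> None:
        """429/403: back off 30s, doubling up to 120s, and halve the transfer."""
        self.delay_s = 30.0 if self.delay_s < 30.0 else min(self.delay_s * 2, 120.0)
        self.transfer_bytes = max(self.transfer_bytes // 2, 1)

    def on_success(self) -> None:
        """Decay gradually instead of resetting to zero."""
        self.delay_s /= 2
        if self.delay_s < 1.0:
            self.delay_s = 0.0
```

Halving on success means a tier that was at the 120s cap needs several clean results before it is probed at full cadence again, which is what keeps it from tripping the limit immediately.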

Measurement Method

Download: Streaming with per-chunk samples. Uses raw throughput (total_bytes / wall_clock_time). The warmup-discard approach (dropping first 30% of samples) was found to overcorrect on satellite links due to bursty beam-scheduled delivery.
Upload: Wall-clock timing only. Sample-based timing measures TCP buffer fill speed (how fast the OS accepts data), not actual network throughput. Wall-clock timing includes the time for the server to acknowledge receipt.
PEP burst guard: On PEP'd satellite links, data arrives in bursts: the proxy prefetches from the server at ground speed and delivers from its buffer over the satellite downlink. Without correction, the steady-state algorithm can compute throughput over a 0.6s burst window and report 445 Mbps on a 24 Mbps link (roughly 18x inflation). The burst guard compares the steady-state estimate to raw throughput (total_bytes / wall_clock); when steady-state exceeds raw by more than 3x, the raw value is used instead.
Untrusted-upload floor (flight-scoped): A very short upload can finish before it ever leaves the local buffer, reporting a meaningless number. The analyzer nulls an upload sample to unknown only when the capture is a flight and the upload's wall-clock window was under 2 seconds — a window that short is buffer-fill regardless of how it was timed. The guard is deliberately flight-scoped: a blanket sub-2s floor over-nulls valid terrestrial uploads (fast CLI/macOS tests legitimately finish a fixed-byte transfer in under 2s).
Upload plausibility cap (flight-scoped): Even with a valid-length window, a PEP buffer that drains slowly can report an uplink rate the link can't physically sustain. On a flight the analyzer rejects any upload window above the tier ceiling — roughly 3 Mbps on GEO, 80 Mbps on LEO/Starlink — and reports upload as unknown rather than an inflated number. Off-flight there is no cap (a fast terrestrial uplink legitimately exceeds these).
Raw samples: Per-chunk timing data is written to a sidecar CSV (*_throughput_raw.csv) for TCP ramp-up analysis. Fields: timestamp_utc, test_id, direction, sample_num, elapsed_ms, bytes.
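The download burst guard and the two flight-scoped upload guards can be sketched as pure functions. The function names and the `tier` labels are hypothetical; the 3x ratio, the 2-second floor, and the approximate 3 Mbps and 80 Mbps ceilings come from the list above.

```python
from typing import Optional

UPLOAD_CEILING_MBPS = {"geo": 3.0, "leo": 80.0}  # approximate tier ceilings


def guarded_download_mbps(total_bytes: int, wall_clock_s: float,
                          steady_state_mbps: float, max_ratio: float = 3.0) -> float:
    """Fall back to raw throughput when steady-state looks burst-inflated."""
    raw_mbps = total_bytes * 8 / wall_clock_s / 1e6
    if raw_mbps > 0 and steady_state_mbps / raw_mbps > max_ratio:
        return raw_mbps
    return steady_state_mbps


def trusted_upload_mbps(mbps: float, window_s: float,
                        is_flight: bool, tier: str) -> Optional[float]:
    """Return the upload rate, or None (unknown) when a flight guard rejects it."""
    if not is_flight:
        return mbps  # both guards are flight-scoped
    if window_s < 2.0:
        return None  # window too short: this measured buffer fill
    ceiling = UPLOAD_CEILING_MBPS.get(tier)
    if ceiling is not None and mbps > ceiling:
        return None  # above what the link tier can physically sustain
    return mbps
```

With 30 MB delivered over 10 seconds (24 Mbps raw) and a steady-state estimate of 445 Mbps, the guard returns 24 Mbps; a steady-state estimate of 26 Mbps on the same transfer is kept as is.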

Netflix Max Bitrate (OCA)

Property	Value
Frequency	Scheduled in measurement rounds (see the Probe Schedule for the live cadence)
Targets	Netflix OCA (via fast.com API), Akamai (dash.akamaized.net)
Timeout	45s wall-clock cap
Logged as	`cdn_throughput`

What it measures: Download throughput from the same Netflix Open Connect Appliance (OCA) servers that serve Netflix video to real users. Measured Mbps is mapped to a streaming resolution tier so the result answers a question users actually have — "would Netflix play in HD on this link?" — instead of a raw number that requires interpretation.

Resolution tiers: Based on Netflix's published encoding bitrates:

Tier	Measured throughput
4K	≥ 8 Mbps
1080p	≥ 4 Mbps
720p	≥ 2 Mbps
SD	< 2 Mbps
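The tier mapping in the table above amounts to a threshold lookup. A minimal sketch with a hypothetical function name:

```python
# (minimum Mbps, label), highest tier first, taken from the table above
NETFLIX_TIERS = ((8.0, "4K"), (4.0, "1080p"), (2.0, "720p"))


def netflix_tier(mbps: float) -> str:
    """Map measured OCA throughput to a streaming resolution tier."""
    for floor_mbps, label in NETFLIX_TIERS:
        if mbps >= floor_mbps:
            return label
    return "SD"
```

For example, an OCA result of 2.3 Mbps maps to 720p even if another CDN measures 10+ Mbps on the same link.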

Why it matters: Different CDNs are treated differently by in-flight networks. Airlines commonly deploy per-CDN rate limiting — Netflix OCA traffic may be capped at 2.3 Mbps on GEO satellite while Cloudflare sees 10+ Mbps on the same link. The Netflix probe surfaces these policies directly: it's not enough to know the link can do 25 Mbps if Netflix itself is shaped to 720p.

Netflix OCA discovery: The probe fetches the fast.com page, extracts the API token from the embedded JS bundle, then calls api.fast.com/netflix/speedtest/v2 to get OCA URLs. If an OCA URL expires during a long capture (returns errors), the probe re-discovers a fresh URL automatically. Akamai (dash.akamaized.net) is tested in parallel as a generic CDN comparison point.

Segment sizing: Download size adapts to link latency (from calibration): 10 MB on GEO satellite (>400ms RTT), 5 MB on medium latency, 2 MB on low latency. Larger segments on high-latency links ensure enough data to get past TCP slow-start. Speed test fallback: after 3 consecutive Cloudflare blocks, the throughput probe also falls back to Akamai for the remainder of the round.

Measurements logged: Throughput (Mbps), resolution tier, time-to-first-byte (TTFB), DNS resolution time, bytes transferred, content type.

YouTube (real-player ABR + segment probe)

Property	Value
Frequency	Scheduled in measurement rounds (see the Probe Schedule for the live cadence); also on demand via `y` key
Target	YouTube (rotating set of long-form public videos)
Logged as	`youtube_segment`, `youtube_abr`

What it measures: Real video streaming performance — not just whether the link could support YouTube, but what YouTube actually does on it. This is the full-stack credibility test: ICMP and throughput can look fine while a real player still struggles with startup or quality selection. The probe runs in two stages: a fast segment-level measurement, and (when Chromium is available) a real headless player.

Segment probe (no browser, ~8s)

Uses yt-dlp to resolve a video's DASH/HLS manifest, then downloads four real media segments over HTTP and times each one. Reports per-segment throughput, time-to-first-byte, CDN host, codec, and container format — plus a recommended streaming resolution:

Recommended tier	Required throughput (incl. 30% headroom)
4K	≥ 9 Mbps
1440p	≥ 4 Mbps
1080p	≥ 1.6 Mbps
720p	≥ 0.8 Mbps
480p / lower	< 0.8 Mbps

The segment probe runs on every YouTube test slot and ships with the Python CLI / macOS / Windows apps with no Chromium install required. It is the data behind the "Recommended quality" chip on the results page.

ABR probe (Playwright + headless Chromium, ~50s)

When Chromium is installed (offered on first run; bundled as an opt-in download in the apps), the ABR probe launches a real headless YouTube player and runs two phases:

AUTO phase: let YouTube's ABR algorithm pick the quality on this link.
FORCED phase: ask the player for the resolution the segment probe said it could support, and see what actually happens.

Per phase, the probe reports cold startup time (page navigation to first frame), the resolution actually rendered, dropped frames, stall count and total stall duration, ABR quality switches, codec, container, and player-reported bandwidth and buffer health. An ad_seen flag records whether YouTube served an ad during startup — important because ads can mask or skew the cold-start measurement, and the flag flows all the way through to the results page so post-hoc analysis can filter.

Video rotation and fallbacks: The probe maintains a list of long-form public videos and rotates through them; if a video is unavailable (regional restriction, takedown, geo-blocked OCA), the next one in the list is tried. The full fallback loop is exercised in tests so a single bad video can't silently kill the probe.

Why two probes: The segment probe is fast, dependency-free, and gives a portable "what should work" number. The ABR probe is slower and heavier but tells you what a real player actually does, including startup latency that no throughput test can capture. Both numbers appear side-by-side: when they agree the link is well-behaved; when they disagree (segment says 1080p, ABR locks to 480p), the network or platform is doing something to the player itself.

Platforms: CLI, macOS, Windows. iOS does not run either probe (Playwright/Chromium and yt-dlp are not available on iOS); the iOS app focuses on link characterization only.

YouTube Shorts (short-form QoE)

Property	Value
Frequency	Scheduled in measurement rounds — a standard session plus a longer “deep” round; default-on when Chromium is installed
Target	YouTube Shorts feed (rotating public clips)
Logged as	`youtube_shorts`, `youtube_shorts_deep`
Platform	CLI, macOS, Windows

What it measures: Short-form vertical video is a different quality-of-experience profile from long-form YouTube — a swipe-linked sequence of autoplaying clips with aggressive cross-clip prefetch — so it gets its own probe. Driving a real Shorts feed in headless Chromium, it measures per clip: the delivered bitrate / resolution (the dominant discriminator, surfaced as “Connection speed” on the results page), the comments-open latency (tap comments → first comment painted), the swipe → first-frame time (cold-start on the first swipe, recorded separately from prefetch-masked steady state), the incoming clip's prefetch buffer, and stalls per clip.

Why it matters: Short-form is what a lot of passengers actually watch, and it stresses the link differently — many small autoplaying starts rather than one long buffered stream. A link that streams a long video fine can still feel sluggish swiping Shorts, and the comment-open latency captures a purely interactive delay no throughput test sees.

Device-representative demand: The probe renders at a phone's pixel density (device scale factor 3), so YouTube's player requests the resolution and bitrate a real phone would — earlier low-density runs under-demanded and pinned an artificially narrow frame. Because portrait Shorts resolution is capped by the viewport, the quality tier is judged off the frame height, not the narrow width, so a normal portrait clip is never mistaken for a degraded link.

Status: Shipped and default-on, but the composite QoE verdict rides on stall thresholds that ground and lab testing never exercised (nothing starved the buffer). Until a real GEO/LEO flight calibrates them, the analyzer marks the verdict uncalibrated and the results page withholds the verdict chip while still showing every raw metric.

DNS

Property	Value
Frequency	Every 30s
Target	www.google.com
Method	System resolver (getaddrinfo)
Logged as	`dns`

What it measures: End-to-end DNS resolution time as applications experience it. Uses the system resolver, which includes OS DNS cache effects.

Why system resolver (not raw DNS): The UDP probe already measures raw DNS transport latency to 8.8.8.8 and 1.1.1.1. This probe captures what applications actually see — including local caching, search domain expansion, and any DNS interception by the network. On networks with captive portals or DNS-based filtering, this probe shows the real behavior.

Why 30s interval: DNS results are heavily cached, so more frequent probing would mostly measure cache hits. A 30-second interval is frequent enough to catch DNS changes without flooding the resolver.

Traceroute

Property	Value
Frequency	Every 30s
Target	8.8.8.8
Max hops	20
Logged as	`traceroute`

What it measures: The network path — every router hop between the device and the target, with per-hop latency, and the network (ASN) that owns each hop.

Why it matters: Path changes correlate with performance changes. On satellite connections, a path change can indicate a beam handover or gateway switch. On terrestrial networks, path changes may indicate routing convergence events. The hop count itself is informative — satellite links typically show fewer hops (device → gateway → satellite → ground station → internet).

Implementation: Uses the system traceroute command on macOS/Linux (-n -q 1 -w 2 -m 20 — numeric, one probe per hop, 2s timeout, max 20 hops). Falls back to icmplib on Windows. Non-responding hops (* * *) are skipped in the output.

First hop tracking: The first responding hop is logged separately (first_hop=) because it identifies the local network gateway and can change if the device roams between access points.

Path classification (on-net vs. transit): Each responding hop is tagged with the network that owns it, using an offline IP→ASN database bundled with the analyzer (no lookups leave the device). Hops belonging to the satellite operator's own network (and its regional partners) are flagged, and the path is summarized as reaching the destination directly (a peering edge) versus crossing one or more third-party transit networks first. On satellite this tells you whether traffic stays on the operator's own backbone or hands off to the public internet early. It is honest topology only — not a claim about where content is served — and the offline database has coverage gaps (some operator hop ranges aren't mapped), so the absence of a flagged hop is not proof of an off-net path.

Egress network (newer captures): Because traceroute alone can't always prove operator membership, newer captures also record the connection's public egress IP and classify it against the operator's address space — a direct on-net signal that doesn't depend on every hop being mapped.

Directional Loss & Latency

Property	Value
Frequency	~1 packet/s baseline (jittered) + periodic micro-bursts
Target	Hosted reflector (regional fixed-IP pool)
Transport	UDP test stream + a short TCP control handshake
Logged as	`directional`
Platform	macOS, iOS, CLI

What it measures: Which direction a problem is on — separating the uplink (device → server) from the downlink (server → device) for both packet loss and latency. Every other probe on this page reports a round-trip number, so a lost reply is ambiguous (was the request lost going up, or the reply coming back?) and an RTT spike never says which leg queued. This probe resolves both.

Why it matters: On GEO/LEO satellite the uplink and downlink are physically separate channels that congest independently — and they need completely different fixes. Aggregate RTT and loss hide which one is hurting; the direction turns “the connection is bad” into “the uplink is dropping 5% while the downlink is clean.”

How it attributes loss to a direction: The probe sends numbered UDP packets to a lightweight reflector we host. The reflector stamps each reply with its own independent sequence number before echoing it back. With two sequence streams — the client's and the reflector's — a missing packet is no longer ambiguous:

Uplink loss — the reflector never saw the packet (no gap in its own sequence numbers), so it was lost on the way up.
Downlink loss — the reflector did reflect it (its sequence advanced) but the reply never arrived, so it was lost on the way down.
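
The two rules above reduce to counting. The sketch below is a simplified illustration, not the probe's implementation: it assumes both sequence counters start at 0 for the session, that replies are not reordered or duplicated, and the function name and return shape are hypothetical. Packets sent after the last reply that arrived cannot be attributed yet, so they are reported separately.

```python
def attribute_loss(sent_count: int, replies: list) -> dict:
    """Split lost packets into uplink and downlink counts.

    replies holds one (client_seq, reflector_seq) pair per reply that
    actually reached the client.
    """
    if not replies:
        # Nothing came back, so the direction cannot be determined.
        return {"uplink_lost": 0, "downlink_lost": 0, "unattributed": sent_count}

    last_client_seq = max(c for c, _ in replies)
    last_reflector_seq = max(r for _, r in replies)

    attributable = last_client_seq + 1  # packets sent up to the last reply seen
    reflected = last_reflector_seq + 1  # packets the reflector echoed by then
    received = len(replies)

    return {
        # Sent, but the reflector's counter never advanced for them.
        "uplink_lost": attributable - reflected,
        # Reflected (counter advanced), but the reply never arrived.
        "downlink_lost": reflected - received,
        # Tail of the stream: no later reply to compare against yet.
        "unattributed": sent_count - attributable,
    }


# Ten packets sent; packet 3 is lost going up, the reply to packet 6 is lost coming down.
example_replies = [(0, 0), (1, 1), (2, 2), (4, 3), (5, 4), (7, 6), (8, 7), (9, 8)]
example_result = attribute_loss(10, example_replies)
```

In the example, the client sequence skips 3 while the reflector sequence stays contiguous (uplink loss), and the reflector sequence skips 5 (downlink loss), so each direction is charged one packet.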

Why UDP, not TCP (the PEP problem): This only works over UDP. As PEP Detection explains, a satellite link's TCP proxy silently retransmits lost uplink packets before your device ever learns they were dropped, so over TCP, uplink loss simply disappears into a latency tail. UDP has no such proxy, so the reflector's server-side view cleanly separates the two directions. This was validated in a lab (netem-injected loss behind a real PEP) and confirmed on a live GEO link: UDP recovers per-direction loss, TCP masks it.

Directional latency & jitter: The same four timestamps (client send, reflector receive, reflector send, client receive) also give a relative per-direction latency — how much extra delay each leg saw versus its own best case — without synchronized clocks, because the unknown clock offset cancels the moment you subtract two measurements on the same link. Absolute one-way delay is deliberately not attempted (clock error would swamp it). We surface per-direction loss %, per-direction delay-variation (jitter), and the relative latency deltas.
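
A minimal sketch of why the clock offset cancels, assuming the offset between the two clocks is constant over the window (drift is ignored) and using a hypothetical function name. Each raw one-way difference contains the true delay plus or minus the unknown offset; subtracting the per-direction minimum removes the offset and leaves only the extra delay above that direction's best case.

```python
def relative_delays(samples):
    """Return (uplink_extra_ms, downlink_extra_ms) lists.

    samples: list of (t1, t2, t3, t4) in milliseconds, where
    t1 = client send and t4 = client receive (client clock),
    t2 = reflector receive and t3 = reflector send (reflector clock).
    """
    # raw_up = true uplink delay + offset; raw_down = true downlink delay - offset
    raw_up = [t2 - t1 for t1, t2, _t3, _t4 in samples]
    raw_down = [t4 - t3 for _t1, _t2, t3, t4 in samples]
    best_up, best_down = min(raw_up), min(raw_down)
    # The offset is the same in every sample, so it drops out of the subtraction.
    return ([u - best_up for u in raw_up], [d - best_down for d in raw_down])


# Synthetic check: the reflector clock runs 1,000,000 ms ahead of the client clock.
OFFSET_MS = 1_000_000
true_up = [300, 300, 350]
true_down = [280, 330, 280]
demo = []
for i, (u, d) in enumerate(zip(true_up, true_down)):
    t1 = i * 1000
    t2 = t1 + u + OFFSET_MS
    t3 = t2 + 1                  # 1 ms of reflector processing
    t4 = t3 - OFFSET_MS + d
    demo.append((t1, t2, t3, t4))
up_extra, down_extra = relative_delays(demo)
```

Despite the large offset, the third packet shows 50 ms of extra uplink delay and the second shows 50 ms of extra downlink delay, which is exactly what was injected.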

Security & hosting: The reflector is a small fixed-IP service (a regional pool; the client pins one endpoint for the whole session). A short TCP handshake issues a per-session token so the reflector only echoes to a verified client — it can't be abused to bounce spoofed traffic at a third party — and replies are padded to the request size, so there is no amplification. No shared secret is required.

Status: Shipped as a first version; the design and thresholds are expected to evolve as we gather in-flight data.

WiFi Signal

Property	Value
Frequency	Every 2s (active; 10s on passive)
Platform	macOS, Windows
Logged as	`wifi`

What it measures: RF and PHY-layer metrics from the WiFi adapter:

RSSI — signal strength in dBm
Noise floor — ambient noise in dBm
SNR — signal-to-noise ratio (RSSI minus noise)
Channel — WiFi channel number
Channel width — 20, 40, 80, or 160 MHz (macOS only)
Channel band — 2.4 GHz, 5 GHz, or 6 GHz (macOS only)
PHY mode — 802.11a/b/g/n/ac/ax, i.e., WiFi 4/5/6 (macOS only)
TX rate — transmit rate in Mbps
MCS index — modulation and coding scheme index (macOS only)
NSS — number of spatial streams, i.e., MIMO configuration (macOS only)
SSID and BSSID — network name and access point MAC address

Why it matters: Correlating signal metrics with performance metrics reveals whether issues are RF-related (weak signal, channel congestion) or network-related (routing, congestion, satellite handover). A latency spike that coincides with an RSSI drop is likely a local issue; one without signal change points to the upstream path.

The extended PHY metrics (channel width, band, PHY mode, MCS, spatial streams) are particularly useful for diagnosing why throughput is lower than expected. For example: connected on WiFi 6 (ax) but only 20 MHz channel width and 1 spatial stream explains a 100 Mbps ceiling. Or: PHY mode dropped from ac to n mid-capture, indicating the adapter fell back to a slower standard (often due to interference or range).

macOS implementation: Primary method is a compiled Swift binary using CoreWLAN framework (works on macOS Sequoia 15+, where the deprecated airport command was removed). The binary is compiled once on first run and cached at a hash-versioned path so source changes trigger recompile. Falls back to the airport -I command on older macOS (which does not provide channel width, band, PHY mode, MCS, or spatial stream data).

MCS index and spatial stream count are read via undocumented KVC (mcsIndex, numberOfSpatialStreams) on the CWInterface object. These keys have been stable across macOS versions but are not part of the public CoreWLAN API.

The macOS app can provide a pre-compiled binary via the WIFI_INFO_PATH environment variable, avoiding the first-run compilation step.

Note: macOS Sequoia redacts SSID and BSSID without Location Services entitlement. The app captures what's available.

Windows implementation: Parses netsh wlan show interfaces output. Signal percentage is converted to approximate dBm: (pct / 2) - 100. Extended PHY metrics (channel width, band, MCS, spatial streams) are not available via netsh.
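
The conversion above, sketched with a small parser. The regular expression assumes the English-locale `Signal : NN%` line in the netsh output, and the function names and the clamping to 0-100 are illustrative additions, not necessarily what the tool does.

```python
import re

# Assumption: English-locale output containing a line like "Signal : 66%".
SIGNAL_RE = re.compile(r"^\s*Signal\s*:\s*(\d+)\s*%", re.MULTILINE)


def signal_pct_to_dbm(pct: float) -> float:
    """Approximate dBm from the Windows signal-quality percentage."""
    pct = max(0.0, min(100.0, pct))  # clamp defensively (illustrative)
    return (pct / 2) - 100


def rssi_from_netsh(output: str):
    """Return approximate RSSI in dBm, or None if no Signal line is found."""
    match = SIGNAL_RE.search(output)
    if match is None:
        return None
    return signal_pct_to_dbm(float(match.group(1)))


sample_output = "    SSID                   : CabinWiFi\n    Signal                 : 66%\n"
sample_rssi = rssi_from_netsh(sample_output)
```

With this formula 100% maps to -50 dBm and 0% maps to -100 dBm, so the result is a coarse approximation rather than a measured RSSI.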

Loss Burst Detection

Runs in the background, scanning new CSV rows every 30 seconds for consecutive ping losses. When 5 or more consecutive ICMP losses are detected, a loss_burst row is logged with:

burst_size: Number of consecutive lost packets
duration_ms: Approximate burst duration (count * 200ms at 5/sec rate)
utc_second: Position within the UTC second when the burst started

The UTC second alignment is designed for satellite handover analysis: LEO constellations such as Starlink reassign beams on a fixed 15-second schedule, so loss bursts that align to 15-second boundaries suggest handover events rather than random packet loss.

Uses incremental file reads — tracks file position between scans to avoid re-reading the entire CSV on long captures.
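
A minimal sketch of the run-length scan described above, operating on an in-memory list rather than incremental file reads. The function name is hypothetical, `utc_second` is taken here to mean the second-of-minute at which the burst started (an assumption about the field), and a run still in progress at the end of the input is flushed for simplicity.

```python
from datetime import datetime, timedelta, timezone

PING_INTERVAL_MS = 200  # 5 pings per second
MIN_BURST = 5           # consecutive losses needed to log a burst


def find_loss_bursts(pings):
    """pings: time-ordered (timestamp_utc, lost) pairs for one ICMP target."""
    bursts = []
    run_start, run_len = None, 0
    # The trailing sentinel closes a run that reaches the end of the input.
    for ts, lost in list(pings) + [(None, False)]:
        if lost:
            if run_len == 0:
                run_start = ts
            run_len += 1
            continue
        if run_len >= MIN_BURST:
            bursts.append({
                "burst_size": run_len,
                "duration_ms": run_len * PING_INTERVAL_MS,
                "utc_second": run_start.second,  # assumed: second within the minute
            })
        run_len = 0
    return bursts


# Two good pings, six losses, one good, three losses, one good.
base = datetime(2024, 1, 1, 0, 0, 14, tzinfo=timezone.utc)
flags = [False] * 2 + [True] * 6 + [False] + [True] * 3 + [False]
pings = [(base + timedelta(milliseconds=200 * i), f) for i, f in enumerate(flags)]
bursts = find_loss_bursts(pings)
```

Only the six-packet run qualifies; the three-packet run is below the threshold and is left as ordinary loss rows.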

Loss Burstiness (Analysis)

Beyond live burst detection, the analyzer characterizes how loss is distributed in time — random dribble and tight bursts read very differently to a user. It computes a per-capture loss_burstiness summary (steady-state baseline loss, % of time loss-free, and what share of all lost packets fall in the worst 5% / 10% of time buckets), plus a full-resolution loss_raster — a per-bucket loss series rendered on the results page as a loss barcode strip and a raster-based CDF. From these it derives a plain-language loss-pattern verdict ("clean", "steady", or "bursty") shown next to the loss chart, so a concentrated 30-second outage isn't reported the same way as the same loss smeared evenly.

Voice MOS and Jitter

Jitter is reported per protocol as the mean absolute difference between consecutive RTTs (RFC 3550 style). Because voice quality is a single-path property, UDP jitter is taken from one representative resolver, not the blended round-robin stream (a fast→fast→slow rotation across resolvers at different distances would otherwise manufacture a large consecutive-diff every Nth sample that is not path jitter).
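
The round-robin artifact is easy to reproduce. The sketch below uses a hypothetical function name and synthetic RTTs: three resolvers that are each perfectly steady still produce a large consecutive-difference jitter when their samples are interleaved.

```python
def mean_abs_jitter(rtts_ms):
    """Mean absolute difference between consecutive RTTs, or None if < 2 samples."""
    if len(rtts_ms) < 2:
        return None
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return sum(diffs) / len(diffs)


# Synthetic example: two near resolvers at 20 ms, one far resolver at 80 ms.
blended = [20, 20, 80, 20, 20, 80]   # round-robin stream as logged
one_resolver = [20, 20]              # the same capture, one resolver only

blended_jitter = mean_abs_jitter(blended)
single_jitter = mean_abs_jitter(one_resolver)
```

The blended stream reports 36 ms of jitter even though no individual path varied at all, which is why a single representative resolver is used.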

Voice MOS (G.107 MOS-CQE). The analyzer derives a predicted conversational voice score from the latency, jitter and loss it already collects, using a closed-form ITU-T G.107 E-model estimate (G.711 codec, fixed jitter buffer) with no media stream. It is a transmission-planning estimate (MOS-CQE), not a measured call score, and GEO/high-latency values are indicative only (the delay impairment is extrapolated beyond the model's terrestrial validation range). Its inputs come from the UDP path specifically, because UDP loss is PEP-unmasked (real RTP is UDP); PEP-repaired TCP/ICMP loss is never used. The score is gated off (null) when the UDP sample count is too low or the link is effectively all-blocked.

Per-target aggregation. The UDP-DNS probe round-robins three resolvers at different network distances, so the MOS inputs are computed per target and then combined by what each metric means:

Loss = cross-target consensus (the minimum per-target loss). Real access-link loss hits every resolver about equally, so the consensus isolates it from one resolver's destination-specific flakiness. PEP doesn't repair UDP loss and a DNS cache doesn't fake loss, so this stays honest on GEO.
Delay & jitter = one representative resolver, validated against a real-path reference (QUIC over UDP/443, then HTTP). On a satellite link the gateway answers common resolvers from a local DNS cache (~30ms) while the real path is ~650ms, so a naive "nearest resolver" would read a falsely-excellent score. Any resolver whose median is far faster than the real-path reference (the same gateway-short-circuit signal the DNS-proxy detector uses) is excluded when that reference is genuinely high (i.e. on GEO); among the survivors the nearest is chosen (an optimistic capability estimate, matching how a voice app picks the nearest media server). ICMP is deliberately not used as the reference — it is frequently gateway/PEP-proxied on GEO, the same disease as a cached resolver.
Fallback: if every UDP resolver is short-circuited, delay/jitter are derived from the QUIC/HTTP real path (UDP kept for loss only) and the results page discloses this in the Voice MOS tooltip.

Sleep / Power-Nap Exclusion

On a lid-closed in-flight capture, macOS Power Nap suspends the laptop and briefly wakes it about every 15 minutes. Each wake burst arrives before the satellite link and its PEP have re-established, so it reads as ~100% loss — enough to turn a real ~5% loss into a reported 34%. The analyzer detects these suspensions as multi-minute gaps in the merged latency stream (a genuine outage never gaps — failed pings keep logging at cadence) and excludes the dead-wake samples, while keeping the sleep markers for the timeline. The exclusion is reported transparently in summary.sleep_exclusion (windows, samples excluded, seconds excluded) so the correction is auditable, not hidden.

Loaded Latency (Bufferbloat Split)

Loaded latency measures how much latency increases when the network is under load — the defining characteristic of bufferbloat. While idle latency tells you how fast a quiet connection is, loaded latency tells you what users actually experience when downloads, uploads, video calls, and cloud sync are all running at once. Ookla's Speedtest now reports this as three separate measurements (idle, download-loaded, upload-loaded), and it's becoming a key metric for operators.

We compute loaded latency by post-hoc analysis of the continuous ping stream against speed test timing windows. Because pings run at 5/sec throughout the capture, we already have latency samples covering every speed test — no dedicated loaded-latency probe is needed.

How it works

During analysis, every ping is classified into one of three buckets:

idle — no throughput test was running
dl_loaded — a download speed test was in progress
ul_loaded — an upload speed test was in progress

The classification uses throughput test timestamps with guard bands:

Phase guard (250ms): Each throughput phase window is extended by 250ms on each end to capture pings that overlap phase boundaries.
Idle guard (5000ms): Pings within 5 seconds of any phase boundary are excluded from the idle bucket. This prevents contamination from TCP ramp-up/cooldown — the network may still be buffered even though the speed test has technically ended.

When download and upload phases overlap (common on asymmetric GEO satellite links where upload finishes during a long download), pings in the overlap region are assigned to the shorter-duration phase. This prevents the longer phase from dominating the classification.
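
The classification rules above can be sketched as follows. This is an illustration under stated assumptions, not the code in the analyzer: the function name, the tuple layout of `phases`, and returning `None` for pings that fall in the idle guard zone are all hypothetical choices.

```python
PHASE_GUARD_MS = 250   # each phase window is widened by this much on both ends
IDLE_GUARD_MS = 5000   # pings this close to a phase boundary are not idle


def classify_ping(ts_ms, phases):
    """Classify one ping timestamp.

    phases: list of (kind, start_ms, end_ms) with kind "dl" or "ul".
    Returns "dl_loaded", "ul_loaded", "idle", or None when the ping is
    excluded (too close to a phase boundary to count as idle).
    """
    hits = [
        (end - start, kind)
        for kind, start, end in phases
        if start - PHASE_GUARD_MS <= ts_ms <= end + PHASE_GUARD_MS
    ]
    if hits:
        # In an overlap, the shorter-duration phase wins.
        _duration, kind = min(hits)
        return f"{kind}_loaded"

    near_boundary = any(
        abs(ts_ms - edge) <= IDLE_GUARD_MS
        for _kind, start, end in phases
        for edge in (start, end)
    )
    return None if near_boundary else "idle"


# A 20 s download with a 3 s upload that runs inside it.
demo_phases = [("dl", 10_000, 30_000), ("ul", 25_000, 28_000)]
```

A ping at 26,000 ms sits inside both windows and is assigned to the upload, because the upload is the shorter phase.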

Gateway vs internet decomposition

Because we ping both the gateway and internet targets at 5/sec, we compute bufferbloat separately for each hop:

Internet bufferbloat (ping to 8.8.8.8 / 1.1.1.1): Shows total path bufferbloat — router + ISP + upstream.
Gateway bufferbloat (ping to local gateway): Isolates the local WiFi/router contribution. Internet pings cross the gateway too, so if gateway bufferbloat is high, the local WiFi/router is queuing and internet bufferbloat will be high as well. If gateway bufferbloat is low but internet is high, the problem is upstream.

We are not aware of another consumer tool that provides this decomposition, since it requires pinging the gateway during throughput tests.

Output

For each hop (gateway and internet), the analysis produces:

Field	Description
idle.p50, idle.p95	Median and 95th percentile RTT when network is quiet
dl_loaded.p50, dl_loaded.p95	RTT during download speed tests
ul_loaded.p50, ul_loaded.p95	RTT during upload speed tests
bufferbloat_dl_ms	dl_loaded.p50 − idle.p50 (the delta)
bufferbloat_ul_ms	ul_loaded.p50 − idle.p50

The p50/p95 above are the headline figures; each bucket also carries the full distribution (p10/p50/p90/p95/p99) so the loaded-latency tail is preserved, not just the median. On PEP'd GEO links the idle baseline falls back to an HTTP/end-to-end RTT — proxied ICMP would otherwise read near-zero and inflate the delta.

The bufferbloat delta is rated on the results page:

Rating	Delta
excellent	< 20 ms
good	< 50 ms
fair	< 100 ms
impaired	< 200 ms
broken	≥ 200 ms
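
Putting the delta formula and the rating table together, as a hedged sketch: the function names are hypothetical, and the sample RTTs are synthetic. The thresholds are copied from the table above.

```python
from statistics import median

# (upper bound in ms, rating) pairs from the table above, best first.
BUFFERBLOAT_RATINGS = [
    (20, "excellent"),
    (50, "good"),
    (100, "fair"),
    (200, "impaired"),
]


def bufferbloat_delta_ms(idle_rtts, loaded_rtts):
    """Loaded p50 minus idle p50, in milliseconds."""
    return median(loaded_rtts) - median(idle_rtts)


def rate_bufferbloat(delta_ms):
    """Map a bufferbloat delta to its results-page rating."""
    for upper_ms, rating in BUFFERBLOAT_RATINGS:
        if delta_ms < upper_ms:
            return rating
    return "broken"


# Synthetic example: idle p50 of 31 ms, download-loaded p50 of 100 ms.
example_delta = bufferbloat_delta_ms([30, 32, 31], [90, 110, 100])
example_rating = rate_bufferbloat(example_delta)
```

The example delta is 69 ms, which falls in the "fair" band.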

Implementation

Implemented in Python (csv_extract.py:compute_bufferbloat_split()). Production analysis runs in Python only — the upload Worker invokes the Python analyzer in a Cloudflare Container, so there is no separate JS implementation to keep in sync. Displayed on results pages as an "Under Load" panel showing idle/DL/UL latency bars with a quality rating. On PEP'd GEO links, where proxied ICMP makes the idle baseline read artificially low, the idle baseline falls back to an HTTP/end-to-end RTT so the under-load delta stays meaningful.

PEP Detection

Property	Value
Metric	`pep.detected` (boolean)
Method	QUIC/TCP RTT ratio at shared target
Target	cloudflare-quic.com:443
Threshold	QUIC p50 / TCP p50 ≥ 5.0×
Min samples	10 per protocol
Constraint	Only flags on GEO satellite links

What it measures: Whether a TCP Performance Enhancing Proxy (PEP) is intercepting TCP connections on the link. PEPs respond to TCP SYN locally, making TCP appear much faster than the actual satellite round-trip. QUIC uses UDP and bypasses the proxy entirely, so the QUIC/TCP ratio reveals the proxy's presence.

Why a shared target matters: Earlier captures compared TCP to one set of targets and QUIC to another. Different targets may have different routing, introducing noise. Both probes now test cloudflare-quic.com:443 (which supports both TCP and QUIC), giving an apples-to-apples comparison. Falls back to overall protocol stats for captures recorded before the shared target was added.

Why GEO-only: A high QUIC/TCP ratio on terrestrial links usually means the QUIC target is simply further away or QUIC is being throttled — not a PEP. PEPs are a satellite-specific technology, so the metric only triggers on GEO links (median RTT > 400ms).
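
The decision rule from the table and the GEO gate can be sketched like this. It is an illustration, not `compute_pep_index()` itself: the function and parameter names are hypothetical, and how the link-level median RTT is obtained is left to the caller because it is not specified here.

```python
from statistics import median


def looks_like_pep(tcp_rtts, quic_rtts, link_median_rtt_ms,
                   ratio_threshold=5.0, min_samples=10, geo_rtt_ms=400):
    """Return True when the QUIC/TCP p50 ratio at the shared target suggests a PEP."""
    if len(tcp_rtts) < min_samples or len(quic_rtts) < min_samples:
        return False  # not enough data for either protocol
    if link_median_rtt_ms <= geo_rtt_ms:
        return False  # only flag on GEO-class links
    tcp_p50 = median(tcp_rtts)
    if tcp_p50 <= 0:
        return False  # guard against a degenerate ratio
    return median(quic_rtts) / tcp_p50 >= ratio_threshold


# Synthetic GEO example: TCP SYN answered locally in 40 ms, QUIC sees the real 650 ms.
geo_flag = looks_like_pep([40] * 10, [650] * 10, link_median_rtt_ms=640)
# The same ratio on a terrestrial link is not flagged.
terrestrial_flag = looks_like_pep([4] * 10, [65] * 10, link_median_rtt_ms=30)
```

The GEO example has a ratio of 16.25×, well above the 5.0× threshold; the terrestrial example has the same ratio but is suppressed by the GEO gate.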

Display: When detected, the results page shows a "TCP Acceleration Detected" banner explaining what the proxy is and how it affects measurements. The protocol comparison bars also get a "PEP" chip, and protocol ordering changes to highlight the TCP/QUIC gap.

Implementation: Python (csv_extract.py:compute_pep_index()), run server-side by the Python analyzer container.

ICMP Proxy Detection

Property	Value
Metric	`icmp_proxied` (boolean)
Method	ICMP vs HTTP/QUIC/TCP RTT ratio
Threshold	Ratio > 50× AND ICMP p50 < 5ms AND other protocol p50 > 50ms
Min samples	10 ICMP, 5 for comparison protocol

What it detects: Some in-flight networks have the onboard gateway answer ICMP echo replies locally instead of forwarding them over the satellite link. This makes ping latency appear sub-millisecond on a 600ms satellite connection — superficially great numbers that are completely misleading.

How it works: If ICMP p50 is below 5ms and a comparison protocol (HTTP, QUIC, or TCP) has p50 above 50ms and the ratio exceeds 50×, ICMP is flagged as proxied. The ratio guard is the primary discriminator — a 0.5ms ICMP with 355ms HTTP gives a ratio of 710×, far above the threshold. The 50ms floor prevents false positives on genuinely fast local networks.
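
The three conditions combine as a simple conjunction. The sketch below is illustrative rather than the code in `detect_icmp_proxy()`: the function name is hypothetical and treating a zero ICMP median as an infinite ratio is an assumption.

```python
from statistics import median


def looks_icmp_proxied(icmp_rtts, other_rtts):
    """other_rtts: RTTs from one comparison protocol (HTTP, QUIC, or TCP)."""
    if len(icmp_rtts) < 10 or len(other_rtts) < 5:
        return False  # minimum sample counts from the table above
    icmp_p50 = median(icmp_rtts)
    other_p50 = median(other_rtts)
    # Assumption: a zero ICMP median counts as an unbounded ratio.
    ratio = other_p50 / icmp_p50 if icmp_p50 > 0 else float("inf")
    return icmp_p50 < 5 and other_p50 > 50 and ratio > 50


# The worked example from the text: 0.5 ms ICMP against 355 ms HTTP (ratio 710x).
worked_example = looks_icmp_proxied([0.5] * 10, [355] * 5)
# A genuinely fast local network: low ICMP, but the comparison protocol is fast too.
fast_lan = looks_icmp_proxied([0.5] * 10, [20] * 5)
```

The second case shows the 50 ms floor at work: the ratio is 40×, and the comparison protocol is under 50 ms, so nothing is flagged.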

What changes when detected: The results page shows an "ICMP Pings Answered Locally" banner. Latency ratings switch from ICMP to HTTP HEAD measurements, which the gateway can't fake. The latency spread panel re-labels as "Latency Spread (HTTP)" so the rating reflects real end-to-end performance, not the local gateway distance.

Coverage: Works on GEO satellite, LEO satellite, and hybrid satellite+cellular networks. Originally calibrated for GEO only; the threshold was lowered from 400ms to 50ms after hybrid-network captures showed the pattern at sub-GEO latencies.

Implementation: Python (io.py:detect_icmp_proxy()), run server-side by the Python analyzer container.

The mirror case — filtered (not proxied) ICMP: Some networks drop ICMP entirely rather than answering it locally. Because the loss headline is ICMP-based, that reads as ~100% packet loss on a link that actually passes traffic fine. On terrestrial networks, when ICMP loss is near-total but a transport path (UDP / TCP / QUIC / HTTP) is healthy, the headline loss is driven from that transport-derived loss instead and the page notes the substitution. This fallback is deliberately terrestrial-only: on a satellite link a TCP PEP repairs TCP/HTTP loss, so their low loss there isn't trustworthy health evidence — flight and GEO captures keep the raw ICMP loss.

Adaptive Probe Backoff

When a probe experiences sustained failures, the monitor adapts its polling frequency rather than hammering an unresponsive target at full speed. This operates at two levels:

Loop-level backoff (AdaptiveInterval)

Each probe loop tracks consecutive losses and doubles its interval after a threshold period of continuous failure. The behavior is network-aware when a shared NetworkHealth signal is available:

Blocked (network is up, this probe is failing): The probe is likely filtered — e.g., QUIC blocked on a corporate network. Backs off to the configured max interval (e.g., 60s for QUIC) after 60 seconds of failures. This reduces noise without losing the data point entirely.
Outage (entire network is down): Caps at 2 seconds regardless of how long the outage lasts. Fast polling during an outage ensures the probe detects recovery immediately — important for capturing satellite handover timing.

On the first success after being throttled, the interval resets immediately to its base rate.
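
A simplified sketch of the state machine described above. This is not the real AdaptiveInterval class: the class and method names are hypothetical, time is passed in explicitly to keep the example deterministic, and the blocked case jumps straight to the max interval once the failure window has elapsed instead of modeling each doubling step.

```python
class AdaptiveIntervalSketch:
    """Illustrative loop-level backoff; not the tool's actual class."""

    def __init__(self, base_s, max_s, fail_window_s=60.0, outage_cap_s=2.0):
        self.base_s = base_s
        self.max_s = max_s
        self.fail_window_s = fail_window_s
        self.outage_cap_s = outage_cap_s
        self.interval_s = base_s
        self._failing_since = None

    def record(self, ok, now_s, network_up=True):
        """Update state after one probe attempt and return the next interval."""
        if ok:
            # First success resets to the base rate immediately.
            self._failing_since = None
            self.interval_s = self.base_s
            return self.interval_s
        if self._failing_since is None:
            self._failing_since = now_s
        if not network_up:
            # Outage: keep polling fast so recovery is detected quickly.
            self.interval_s = max(self.base_s, min(self.interval_s, self.outage_cap_s))
        elif now_s - self._failing_since >= self.fail_window_s:
            # Blocked: the network is up and only this probe is failing.
            self.interval_s = self.max_s
        return self.interval_s


probe = AdaptiveIntervalSketch(base_s=1.0, max_s=60.0)
timeline = [
    probe.record(False, 0.0),                     # first failure
    probe.record(False, 59.0),                    # still inside the window
    probe.record(False, 60.0),                    # blocked for 60 s
    probe.record(False, 61.0, network_up=False),  # whole network drops
    probe.record(True, 62.0),                     # recovery
]
```

The timeline walks through the blocked, outage, and recovery cases in order, ending back at the base interval.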

Per-target backoff (TargetBackoff)

Multi-target probe loops (ping, UDP, TCP) may have one target blocked while others work fine. TargetBackoff tracks failures per target and skips backed-off targets most iterations, retrying only every 60 seconds. When a backed-off target recovers, it returns to normal polling immediately.

This prevents a single blocked target from inflating the CSV with loss rows while keeping the other targets at full frequency.

Local Network Detection

At startup, the tool auto-detects the local network environment. (The link type itself — GEO / LEO / terrestrial — is inferred separately from RTT and PEP behavior, not from anything below.)

Gateway: Default route via netstat -rn (macOS), route print (Windows), or ip route show default (Linux)
VPN: Detects active VPN tunnels by finding utun interfaces with IPv4 addresses assigned (macOS), TAP/TUN/VPN/WireGuard adapters (Windows), or tun interfaces (Linux). Identifies known corporate VPNs via DNS search domains (e.g., corporate domain → "Corporate VPN")
WiFi info: Signal strength, channel, SSID (see WiFi Signal section)

Quality Tiers and What They Mean

Metrics are classified into a 6-tier rating system: superior → excellent → good → fair → impaired → broken (a few, such as bufferbloat and WiFi signal, use a subset of these tiers). The overall capture rating is the worst of any individual metric, so one bad metric drags the overall down. This is intentional: a connection with great download but unusable latency for video calls should not get an "excellent" rating.

The tiers are calibrated to the link type. GEO satellite has a ~600ms physics floor, so latency tiers shift up and cap at "good" — a 700ms GEO RTT is as good as it physically gets and shouldn't be penalized for what physics enforces. Below are the actual thresholds.

Latency (median RTT)

Tier	Non-GEO	GEO satellite
superior	< 20 ms	(n/a — physics floor)
excellent	< 50 ms	(n/a)
good	< 100 ms	≤ 700 ms
fair	< 200 ms	≤ 900 ms
impaired	< 500 ms	≤ 1200 ms
broken	≥ 500 ms	> 1200 ms

GEO detection: median RTT > 400 ms is treated as GEO satellite.
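
The latency table and the worst-of rule can be expressed compactly. This is a sketch with hypothetical names; `is_geo` defaults to the RTT-only rule above, but is exposed as a parameter because the link type may also be determined by other signals.

```python
TIER_ORDER = ["superior", "excellent", "good", "fair", "impaired", "broken"]

# Upper bounds in ms, copied from the table above.
NON_GEO_LATENCY = [(20, "superior"), (50, "excellent"), (100, "good"),
                   (200, "fair"), (500, "impaired")]   # strict "<" bounds
GEO_LATENCY = [(700, "good"), (900, "fair"), (1200, "impaired")]  # "<=" bounds


def rate_latency(median_rtt_ms, is_geo=None):
    """Rate a median RTT using the link-type-aware thresholds."""
    if is_geo is None:
        is_geo = median_rtt_ms > 400  # RTT-only GEO detection
    if is_geo:
        for upper_ms, tier in GEO_LATENCY:
            if median_rtt_ms <= upper_ms:
                return tier
        return "broken"
    for upper_ms, tier in NON_GEO_LATENCY:
        if median_rtt_ms < upper_ms:
            return tier
    return "broken"


def overall_rating(tiers):
    """The capture rating is the worst individual tier."""
    return max(tiers, key=TIER_ORDER.index)
```

For instance, a 650 ms GEO link rates "good" on latency, and a capture with tiers superior, fair and good rates "fair" overall.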

Packet loss

Tier	Threshold
superior	< 0.1 %
excellent	< 0.5 %
good	< 1 %
fair	< 5 %
impaired	< 10 %
broken	≥ 10 %

These loss thresholds line up with common industry loss-rate bands for cross-tool comparability.

Voice MOS (predicted MOS-CQE)

Tier	Threshold (MOS-CQE)
superior	≥ 4.3
excellent	≥ 4.0
good	≥ 3.6
fair	≥ 3.1
impaired	≥ 2.6
broken	< 2.6

G.107 E-model estimate (see Voice MOS and Jitter). On GEO the one-way delay alone caps the achievable score, so a GEO link reads "impaired/fair" for interactive voice even when loss is low — which is the honest expectation for a ~600ms-RTT path.

Download throughput

Tier	Mbps
superior	≥ 200
excellent	≥ 100
good	≥ 25
fair	≥ 12
impaired	≥ 3
broken	< 3

Upload throughput

Tier	Mbps
superior	≥ 100
excellent	≥ 25
good	≥ 5
fair	≥ 1
impaired	≥ 0.5
broken	< 0.5

WiFi signal (RSSI)

Tier	dBm
excellent	≥ -50
good	≥ -67
fair	≥ -70
impaired	< -70

The thresholds derive from the WiFi Alliance signal classification guidelines, where ≥ -67 dBm is the typical threshold for reliable streaming.

CSV Format

All measurements write to a single CSV with columns:

timestamp_utc, measurement_type, target, rtt_ms, loss_bool, seq,
hop_count, throughput_mbps, dns_ms, burst_size, notes

The notes field carries structured key=value pairs specific to each measurement type (protocol details, proxy headers, calibration data, etc.).
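
A hedged sketch of reading this format. The column list is taken from above; everything else is an assumption made for illustration: the `;` separator between key=value pairs, the presence of a header row, the `proto` key, and the sample row itself are hypothetical.

```python
import csv
import io

COLUMNS = [
    "timestamp_utc", "measurement_type", "target", "rtt_ms", "loss_bool", "seq",
    "hop_count", "throughput_mbps", "dns_ms", "burst_size", "notes",
]


def parse_notes(notes, sep=";"):
    """Split 'k1=v1;k2=v2' into a dict. The separator is an assumption."""
    pairs = {}
    for item in notes.split(sep):
        key, eq, value = item.strip().partition("=")
        if eq and key:
            pairs[key] = value
    return pairs


def read_rows(text):
    """Yield one dict per measurement row, with notes parsed into a dict."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    for row in reader:
        if row["timestamp_utc"] == "timestamp_utc":
            continue  # skip a header row if the file has one
        row["notes"] = parse_notes(row["notes"] or "")
        yield row


# Hypothetical sample: a header line plus one traceroute row.
sample_csv = (
    ",".join(COLUMNS) + "\n"
    "2024-01-01T00:00:00Z,traceroute,8.8.8.8,,0,,9,,,,first_hop=192.168.1.1;proto=icmp\n"
)
sample_rows = list(read_rows(sample_csv))
```

Keeping per-probe details in one free-form column lets every measurement type share the same fixed columns, at the cost of a second parsing step like the one shown.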

A separate raw throughput samples CSV (*_throughput_raw.csv) records per-chunk timing data for TCP ramp-up analysis.

Post-Capture: Upload and Server-Side Analysis

After a capture completes, files are uploaded to Cloudflare R2. The upload Worker then generates the results page data automatically — clients upload raw measurements and are done.

Upload pipeline

Client uploads raw files: CSV, meta.json (location, network, privacy settings), and throughput_raw.csv (per-chunk samples). Files are gzip-compressed (~5-6x) before upload.
Worker generates data.json server-side by analyzing the CSV. The upload Worker invokes the canonical Python analyzer in a Cloudflare Container (bufferbloat split, PEP detection, ICMP proxy detection, loss burstiness, quality ratings, timeseries extraction), then writes data.json.gz to R2. Every upload automatically gets a results page — no client-side analysis, and a single Python implementation.
Worker updates captures-index.json with the new capture metadata, making it immediately visible on the browse page.

This replaced the earlier model where each client (Python, Swift, macOS app) generated data.json independently before upload. Server-side generation eliminated reliability gaps (a small number of historical captures had no results page due to client-side generation failures) and the maintenance burden of keeping Python and Swift analysis in sync.

Upload

Captures upload through the Cloudflare Worker, so no storage credentials live on the client. Upload state is tracked locally so a capture is never uploaded twice, and pending uploads (e.g., the device was closed before the upload finished) are caught up on the next run.

Access control

Stored data is served through a Cloudflare Worker at howismywifi.com. The capture listing and index API are access-restricted. Individual results pages are accessible by direct link (capture IDs include a random component to prevent enumeration). Crawlers are blocked from /c/ paths, and brute-force scanning is rate-limited.

macOS App Integration

The macOS app (macos-app/) is a native SwiftUI wrapper that spawns the Python tool as a subprocess with --json mode. Communication is JSON lines on stdout (events) and stderr (diagnostics).

The app provides:

SetupView: Configure location, network type, duration before capture
RunningView: Live stats during capture (latency, loss, throughput, RSSI)
ResultsView: Post-capture summary with quality ratings
CapturePickerView: Browse and load past captures from the captures directory
BootstrapView: First-run setup (installs Python dependencies via uv)

The Python side emits 20 JSON event types: info, status, probe, throughput_status, throughput_result, cdn_throughput_result, traceroute_result, complete, analysis_complete, upload_start, upload_complete, upload_error, upload_skipped, upload_summary, capture_reminder, capture_auto_stop, system_sleep, network_change, note, error.

Auto-update is handled by Sparkle. publish-update.sh signs the DMG, generates an appcast, and uploads to howismywifi.com/download/.

How We Compare to Other Tools

Network-quality tools occupy different points in a design space. Here's where WiFi Monitor sits relative to the major reference points, with sources cited so you can verify each comparison.

Feature	Speedtest (Ookla)	Cloudflare Speed Test	Orb.net	WiFi Monitor
Test duration	~30 sec single-shot	~30 sec single-shot	continuous (24/7)	2 min → 30 min → continuous
Concurrent multi-protocol	❌ TCP only	❌ HTTP only	❌ HTTPS/h3 only	✅ ICMP + UDP + TCP + HTTP + QUIC
Adaptive duration for high-RTT	✅ scales to RTT	⚠️ fixed	⚠️ fixed	✅ 4s default, 15s GEO
Loaded latency / bufferbloat	✅ "working latency"	✅ AIM	⚠️ implicit	✅ idle / DL-loaded / UL-loaded + gateway split
Packet loss measurement	❌ not surfaced	✅ via WebRTC TURN	✅ over time	✅ from ICMP seq + UDP
Traceroute / path	❌	❌	❌	✅ every 30s with hop deltas
TCP PEP / proxy detection	❌	❌	⚠️ implicit (h3 vs HTTPS)	✅ explicit via QUIC handshake
Multi-CDN diversity	❌ single server	❌ Cloudflare only	⚠️ Cloudflare + Fastly	✅ Cloudflare + Akamai + Netflix OCA
Real video-player measurement	❌	❌	❌	✅ headless Chromium YouTube ABR + startup time
Streaming resolution tier (4K/1080p/720p/SD)	❌	❌	❌	✅ Netflix (OCA) + YouTube (segment + ABR)
WiFi signal correlation (RSSI/PHY)	❌	❌	❌	✅ RSSI + channel + MCS + NSS
Composite quality score	❌	✅ AIM (3 scenarios)	✅ Orb Score 0–100	✅ 6-tier per-metric (link-type aware)
Designed for satellite	❌ generic	❌ generic	❌ generic	✅ that is the whole point

The rows where only the WiFi Monitor column carries a ✅ are its distinctive strengths: none of the consumer tools above cover them. Bufferbloat and PEP detection are surfaced as named metrics on every results page, and the full-stack video tier (Netflix max bitrate + real YouTube playback) closes the gap between "this link can theoretically do X Mbps" and "this is what streaming actually looks like on it." Our remaining gaps are in summary scoring: RPM as a single-number responsiveness metric, and per-scenario quality ratings.

The deepest empirical comparison of these tools is MacMillan et al. (2023), A Comparative Analysis of Ookla Speedtest and M-Lab NDT7. Headline finding: on high-latency links (500–600ms RTT), single-stream short-duration tests under-report throughput by 60–70%. This is the empirical justification for our calibrated multi-stream approach with extended duration on GEO links.

What We Don't Measure (and Why)

Honest disclosure of methodology gaps. These are deliberate choices rather than oversights, and we revisit them as priorities shift.

IPv6 path testing. All probes target IPv4 today. Some inflight satellite networks have IPv6 quirks worth surfacing, but the implementation cost is moderate and inflight v6 is rare at the passenger edge. Tracked as a future enhancement.

PMTU / MTU discovery. Path MTU black-holes are a known inflight gotcha (especially in VPN-in-VPN scenarios — corporate VPN inside satellite VPN). We don't probe for them today. No consumer tool does either, but it's a real diagnostic gap. Tracked.

HLS, DASH-outside-YouTube, and WebRTC measurement. The YouTube ABR probe simulates one real video player end-to-end — startup time, selected resolution, stalls, ABR switches, dropped frames — and the Netflix probe measures real OCA throughput mapped to a resolution tier. What we still don't do: generic HLS rebuffering tests against arbitrary streams (Disney+, Hulu, Apple TV), or WebRTC connection setup (Zoom / Teams / Webex call-quality simulation). The YouTube probe is the closest a passenger-side tool can get without operator cooperation, and it captures the failure modes that matter most for in-flight video. WebRTC simulation in particular is tracked as a roadmap item.

WebRTC / RTP media-stream loss. We now collect real UDP packet loss directly (via the UDP-DNS probe, separate from ICMP — see UDP (DNS) and Voice MOS and Jitter), which captures the divergence that matters on ICMP-shaping and PEP-masking networks. What we still don't do is run an actual RTP/WebRTC media stream (Zoom/Teams call simulation) against a cooperating endpoint; the Voice MOS estimate is modelled from the UDP path rather than measured from a live call. Tracked as a roadmap item.

Cross-capture aggregation. howismywifi.com displays individual captures. We don't aggregate across captures to produce comparative context like "median GEO capture by network" or "this route compared to last month." That is tracked as a roadmap item.

Single-CDN comparisons across paths. We test multiple CDNs from one device, but we don't compare the same CDN reached from different satellite ground stations or from a single-passenger device vs. gateway-hosted measurements. A research-grade question, out of scope.

Long-term persistent monitoring of a single endpoint. Orb.net does this; we don't. Different tool, different design center.

Design Considerations

Why Multi-Protocol?

No single measurement tells the whole story:

ICMP can be filtered, deprioritized, or proxied
TCP timing reveals proxy/PEP presence invisible to ICMP
QUIC confirms TCP proxy detection from the opposite direction (UDP-based, bypasses TCP proxies)
HTTP shows real application-layer experience
UDP via DNS provides a control for ICMP-specific behavior
Throughput shows capacity, not just latency
DNS resolution affects every connection but is often overlooked

The power is in the comparison. When ICMP says 600ms and TCP says 5ms, you've found a PEP. When QUIC says 600ms and TCP says 5ms, the PEP is confirmed — QUIC bypasses the proxy entirely. When ICMP is fine but HTTP spikes, the problem is at the application layer.
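
The comparison logic above can be sketched as a small classifier over per-protocol latency to the same target. The 5x ratio threshold, the label strings, and the function name `classify_path` are assumptions chosen for illustration, not the thresholds the monitor actually uses.

```python
def classify_path(icmp_ms: float, tcp_ms: float, quic_ms: float,
                  ratio: float = 5.0) -> str:
    """Label a path from ICMP, TCP SYN, and QUIC handshake latencies."""
    tcp_beats_icmp = icmp_ms >= ratio * tcp_ms
    tcp_beats_quic = quic_ms >= ratio * tcp_ms
    if tcp_beats_icmp and tcp_beats_quic:
        return "pep_confirmed"   # TCP answered locally; ICMP and QUIC show the real path
    if tcp_beats_icmp or tcp_beats_quic:
        return "pep_suspected"   # only one of the two cross-checks disagrees with TCP
    if tcp_ms >= ratio * icmp_ms and quic_ms >= ratio * icmp_ms:
        return "icmp_proxied"    # ICMP fast, everything else slow
    return "consistent"
```

For the figures in the text, `classify_path(600, 5, 600)` returns `"pep_confirmed"`, while a link where all three protocols see roughly 600 ms returns `"consistent"`.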

Why High Frequency?

Ping and UDP probes run at 5/sec (not 1/sec) because:

Satellite beam handovers last 0.5-2 seconds — 1/sec sampling aliases these events. 5/sec captures the actual loss duration.
Latency spikes on congested WiFi are sub-second. 1/sec misses them.
Statistical reliability: 5x the samples in the same time window.
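
The aliasing argument can be checked with a toy model: count how many probes fall inside a short outage at each sampling rate. The 700 ms outage and its start times are made-up inputs, and times are kept in integer milliseconds to avoid floating-point edge effects.

```python
def lost_probes(outage_start_ms: int, outage_len_ms: int, rate_hz: int,
                window_ms: int = 10_000) -> int:
    """Count probes whose send time falls inside the outage window."""
    interval_ms = 1000 // rate_hz
    outage_end_ms = outage_start_ms + outage_len_ms
    return sum(1 for t in range(0, window_ms, interval_ms)
               if outage_start_ms <= t < outage_end_ms)


def estimated_outage_ms(lost: int, rate_hz: int) -> int:
    """Outage length implied by the number of consecutive lost probes."""
    return lost * (1000 // rate_hz)
```

A 700 ms outage starting at 3100 ms loses zero probes at 1/sec (the event is invisible) and three probes at 5/sec, giving an estimate of 600 ms. Shift the same outage to start at 2900 ms and 1/sec reports one lost probe, an implied 1000 ms, so the low-rate estimate swings between 0 and 1000 ms depending only on phase.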

TCP and HTTP probes run less often (every 2s and 5s, respectively) because they're heavier: each probe opens a connection and does real work. The trade-off: lightweight probes give finer time resolution, heavyweight probes give broader coverage.

Why Subprocess Ping on macOS?

macOS's ICMP implementation via SOCK_DGRAM (non-privileged sockets, used by icmplib) introduces measurement artifacts: periodic ~1-second latency spikes that appear real but are OS-level queuing artifacts. The system ping binary uses SOCK_RAW (setuid) and doesn't have this issue.

This was a hard-won finding after investigating mysterious periodic spikes that only appeared on macOS with a VPN active.
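
A minimal sketch of the subprocess approach: spawn the system ping binary and parse each reply line. The regular expression assumes the usual `icmp_seq=N ... time=X ms` reply format, and the `-i 0.2` interval flag is an assumption about how a 5/sec rate could be requested; neither is taken from the tool's source.

```python
import re
import subprocess
from typing import Iterator, Optional, Tuple

# Matches reply lines such as "64 bytes from 1.1.1.1: icmp_seq=3 ttl=57 time=12.345 ms"
_REPLY = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+)\s*ms")


def parse_reply(line: str) -> Optional[Tuple[int, float]]:
    """Return (sequence number, RTT in ms) for a reply line, else None."""
    m = _REPLY.search(line)
    return (int(m.group(1)), float(m.group(2))) if m else None


def stream_ping(host: str, interval_s: float = 0.2) -> Iterator[Tuple[int, float]]:
    """Yield replies from the system ping binary as they arrive."""
    proc = subprocess.Popen(["ping", "-i", str(interval_s), host],
                            stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            reply = parse_reply(line)
            if reply is not None:
                yield reply
    finally:
        proc.terminate()  # stop ping when the consumer stops iterating
        proc.wait()
```

Gaps in the yielded sequence numbers are what loss accounting would be built on: a timeout line carries no `time=` field, so it is skipped rather than recorded as a latency sample.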

Why Cloudflare for Throughput?

Free, no registration or API key required
Precise byte-level control of download size (__down?bytes=N)
Global CDN means the server is close to the user (low latency overhead)
High capacity — won't bottleneck even on fast connections
Rate limits are generous enough for our round schedule (~900 MB/30 min)
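
The byte-level control comes down to a single query parameter. A sketch of building such a request URL follows; the host `speed.cloudflare.com` is an assumption (only the `__down?bytes=N` path is given above), and `download_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed host; the text above only names the __down?bytes=N path.
BASE = "https://speed.cloudflare.com/__down"


def download_url(megabytes: int) -> str:
    """URL that asks the server for exactly `megabytes` (decimal MB) of data."""
    return f"{BASE}?{urlencode({'bytes': megabytes * 1_000_000})}"
```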

Why Iterative Calibration?

A fixed transfer size either wastes time on slow links (downloading 100 MB on a 3 Mbps hotel connection takes 4.5 minutes) or undersamples fast links (5 MB on a 500 Mbps fiber completes in 80ms, mostly TCP ramp-up).

The iterative approach (5→10→20→40→100 MB until convergence) adapts to the actual link speed, typically in 2-3 rounds, using minimal data on slow links and scaling up only as needed. Total calibration overhead ranges from 15 MB (convergence after two rounds) to 175 MB (all five rounds), depending on speed.
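
The arithmetic and the size ladder can be sketched together. The convergence rule used here (two consecutive results within 10% of each other) is an assumption for illustration, as are the names `transfer_seconds` and `calibrate`; the actual convergence criterion may differ.

```python
SIZES_MB = [5, 10, 20, 40, 100]


def transfer_seconds(size_mb: float, link_mbps: float) -> float:
    """Ideal transfer time: megabytes to megabits, divided by link rate."""
    return size_mb * 8 / link_mbps


def calibrate(measure, tolerance: float = 0.10):
    """Step through SIZES_MB until two consecutive results agree.

    `measure(size_mb)` is a caller-supplied function returning measured Mbps.
    Returns (last measured Mbps, total MB transferred).
    """
    used = 0
    previous = None
    result = 0.0
    for size in SIZES_MB:
        result = measure(size)
        used += size
        if previous is not None and abs(result - previous) <= tolerance * previous:
            break  # converged: larger transfers would only cost more data
        previous = result
    return result, used
```

`transfer_seconds(100, 3)` is about 267 seconds (roughly 4.5 minutes) and `transfer_seconds(5, 500)` is 0.08 seconds, matching the two failure cases of a fixed size. A link that reads a steady 3 Mbps converges after the 5 MB and 10 MB rounds (15 MB total); a link whose readings keep climbing runs the whole ladder (175 MB).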

Why Wall-Clock Upload Timing?

When uploading, sample-based timing (recording timestamps as data is generated) measures how fast the OS accepts data into the TCP send buffer, not how fast it actually leaves the machine. On fast local networks, the buffer fills almost instantly while the actual upload takes much longer.

Wall-clock timing (total_bytes / elapsed_time) is correct because requests.post() only returns after the server acknowledges receipt. The full round-trip is included in the measurement.

Download doesn't have this problem because iter_content() blocks until data actually arrives from the network.
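
Wall-clock timing reduces to timing one blocking call end to end. In this sketch, `send` stands in for whatever performs the upload, and the injectable `clock` parameter exists only to make the function testable; both names are hypothetical.

```python
import time
from typing import Callable


def wall_clock_mbps(total_bytes: int, send: Callable[[], None],
                    clock: Callable[[], float] = time.perf_counter) -> float:
    """Time a blocking upload call end to end and convert to Mbps."""
    start = clock()
    send()  # must return only after the server has responded
    elapsed = clock() - start
    return total_bytes * 8 / elapsed / 1_000_000
```

With the requests library, `send` could be something like `lambda: requests.post(url, data=payload)`, where `url` and `payload` are supplied by the caller. Timestamps taken while the payload is being generated would instead measure how fast the send buffer fills.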

References & Further Reading

The methodology choices above draw on a body of network-measurement research and standards. The most useful sources, organized by topic.

Standards & specifications

IETF draft-ietf-ippm-responsiveness — Responsiveness under Working Conditions. Defines RPM (Round-trips Per Minute), the loaded-latency measurement standard. Apple's networkQuality tool implements it; Cloudflare's mach implements it.
RFC 9318 — IAB Workshop on Measuring Network Quality for End-Users — Consensus view from the internet measurement community on what metrics matter and why working latency > idle latency.
Cheshire IETF 121 slides — Apple's framing of why RPM uses "higher is better" rather than milliseconds.

Comparative analysis

MacMillan, Mangla, Saxon, Marwell, Feamster (2023) — A Comparative Analysis of Ookla Speedtest and Measurement Lab's NDT7. The single best academic source on how Ookla and NDT7 actually behave at high latency. Quantitative data on adaptive multi-stream behavior.
CAIDA: Empirical Characterization of Ookla's Speed Test Platform (2024) — Server distribution analysis, server selection behavior, ISP proximity effects.

Bufferbloat

bufferbloat.net: Tests for Bufferbloat — Catalog of every bufferbloat test tool with methodology notes.
Waveform Bufferbloat Test — Reference UX for "letter grade for bufferbloat" approach.
LibreQoS Bufferbloat & QoO Test — Implementation of IETF QoO (Quality of Outcome) per application class.
Ookla: Introducing a Better Measure of Latency — Explains Speedtest's idle / download-loaded / upload-loaded latency split — the same decomposition we implement.
Ookla: Loaded Latency and L4S — Why loaded latency is becoming the defining metric for modern networks; L4S/ECN as the next-generation mechanism to keep it low.

Tool methodologies

Cloudflare: How does Cloudflare's Speed Test really work? (2025) — Best technical explanation of a modern consumer speed test. Documents the block-based "no saturation" methodology.
Cloudflare AIM scoring — Scenario-specific scoring (Streaming / Gaming / RTC) framework.
@cloudflare/speedtest source — npm package, MIT license. The default measurement sequence is in the README.
Cloudflare mach (Rust RPM client) — Open-source CLI for RPM measurement.
FCC Measuring Broadband America — Open Methodology — The SamKnows + FCC methodology. Closest analog to wifi-monitor's design philosophy. Annual technical appendices document the full test suite.
Orb.net documentation — Most complete public description of Orb's metric definitions and continuous measurement approach.

Inflight-specific

paxex.aero: Measuring more than Megabits — Inflight internet monitors take the spotlight (2024) — Coverage of NetForecast QMap and the Seamless Air Alliance QoE spec.