How It Works

How WiFi Monitor measures network performance, and why each measurement works the way it does. This is a living document — update it when probe behavior changes or when questions come up about what the monitor captures.

We publish this in full because the credibility of any network-quality tool comes from its methodology being inspectable. If you can't see what we're measuring and why, you have no basis for trusting the numbers. Cloudflare, the IETF responsiveness draft, and the FCC Measuring Broadband America program all publish their methodologies for the same reason. Our framework is closer to FCC MBA / SamKnows than to Speedtest: characterize a link over minutes-to-hours of multi-protocol probing rather than a 30-second snapshot.

Overview

WiFi Monitor measures a network connection at every layer of the stack, all writing to a single timestamped CSV. The goal isn't "how many probes" — it's coverage: each layer is tested with more than one source so the layers cross-check each other. That's what makes the data tell you things a single-protocol speed test never could (TCP PEPs, ICMP proxies, per-CDN throttling, the gap between "this link could stream 4K" and "this player actually does").

The design center is in-flight WiFi — satellite handover detection, TCP PEP detection, per-CDN rate-limiting policy — but the same probes work on any network.

Coverage by layer

| Layer | What we probe | Cross-check / why more than one |
| --- | --- | --- |
| WiFi radio (L1/L2) | RSSI, noise floor, channel, channel width, band, MCS index, NSS (spatial streams), PHY mode, TX rate, SSID/BSSID | Correlate signal drops with throughput drops; distinguishes RF problems from network problems. |
| IP / path (L3) | ICMP ping to default gateway + two internet targets (5/sec); traceroute every 30s | Gateway vs internet split distinguishes local from upstream issues; two targets catch asymmetric routing and per-target rate-limiting. |
| Transport (L4) | TCP SYN, UDP (DNS-port), QUIC handshake, all to the same target where possible | Three transports against one target reveal PEPs (TCP fast + QUIC slow = TCP proxy), ICMP proxying (ICMP fast + everything else slow), and UDP filtering. |
| Application protocols (L7) | HTTP HEAD against CDN + obscure "canary" targets; system-resolver DNS | CDN-vs-canary HTTP delta reveals transparent proxies that cache popular sites; system DNS catches captive portals and DNS interception. |
| CDN diversity | Self-hosted Cloudflare Worker, Cloudflare public speed test, Akamai (dash.akamaized.net), Netflix Open Connect via fast.com | Per-CDN rate-limiting is real and invisible to single-source tests: Netflix may be capped at 2.3 Mbps on a link where Cloudflare sees 10+ Mbps. Four sources surface the policy. |
| Real video player | YouTube segment probe (yt-dlp, no browser) + YouTube ABR probe (Playwright + headless Chromium, real player); Netflix max-bitrate tier from OCA throughput | Segment probe says what should work; ABR probe says what the player actually does. When they disagree, the link or platform is doing something to the player. |

The next two tables show how that coverage maps to each shipping platform — which probes run on which OS, and which features are implementation-complete vs platform-specific.

We deliberately operate at the invasive end of the measurement spectrum: 5/sec ICMP ping to the gateway, multi-protocol probes running concurrently the entire capture, round-scheduled throughput tests that consume up to 900 MB / 30 min. We accept this link impact in exchange for measurement fidelity. A passive home monitor would be a different tool with a different design center.

WiFi Monitor runs on four platforms. The CLI is the reference implementation; macOS and Windows apps wrap it via subprocess (uv run --json), inheriting all CLI capabilities. The iOS app is a native Swift reimplementation — it covers most probes but not all. On iPad, the app uses an adaptive layout with live latency charts, multi-column stat panels, and wider post-capture result views — closer to the macOS experience than the phone layout.

Probes by Platform

Probe CLI macOS App Windows App iOS App
ICMP Ping (internet)
ICMP Ping (gateway)
UDP DNS
TCP Connect
QUIC Handshake
HTTP HEAD + canary
Speed Test (Worker/CF/Akamai)
Netflix Max Bitrate (OCA)
YouTube Segment (no browser)
YouTube ABR (real player, Chromium)
DNS (system resolver)
Traceroute
WiFi Signal (basic)
WiFi Signal (PHY: MCS, width, band)
Loss Burst Detection

iOS uses Akamai-only speed testing. Netflix OCA discovery (via the fast.com API) is implemented on all platforms including iOS. The YouTube probes (which depend on either Playwright + Chromium or yt-dlp) are not implemented on iOS — YouTube streaming measurement is available on CLI, macOS, and Windows.

iOS probe frequencies differ from CLI. The CLI runs TCP at 2s, QUIC and HTTP at 5s (active profile). iOS runs all three at 15s intervals (~4×/min) as a battery/radio compromise — sufficient for trend lines without hammering the cellular radio on GEO satellite. DNS is 30s on both platforms. ICMP ping is 1/sec on iOS (vs 5/sec on CLI).

Features by Platform

Feature CLI macOS App Windows App iOS App
Adaptive probe backoff
Adaptive round scheduler (5/10/30-min) mode-based
On-demand speed test (s key)
On-demand YouTube test (y key)
Observed throughput
Upload to R2
Privacy transform
PNG analysis charts
Capture history
Live latency charts iPad only
iPad adaptive layout

Probe Types

ICMP Ping

Property Value
Frequency 5/sec (0.2s interval)
Targets 8.8.8.8, 1.1.1.1
+ Gateway Auto-detected default gateway
Timeout interval + 2s
Logged as ping

What it measures: Base round-trip time and packet loss. The most fundamental network measurement — ICMP echo request/reply with no application-layer overhead.

Implementation: On macOS/Linux, uses the system ping binary (setuid, SOCK_RAW). This is deliberate: macOS's SOCK_DGRAM ICMP handling (used by icmplib's non-privileged mode) causes ~1-second quantized latency spikes under concurrent load. The system ping avoids this artifact entirely. See docs/vpn-periodic-latency-spikes.md for the investigation.

Falls back to icmplib on Windows or when the system ping binary is unavailable. The icmplib path uses a fire-and-forget pattern: pings are sent at strict intervals regardless of RTT, allowing multiple pings in-flight simultaneously. This matters on GEO satellite links (~600ms RTT) where a synchronous ping-wait-ping approach would reduce effective frequency.

Gateway ping: Pings the default gateway (usually 192.168.1.1) to distinguish local WiFi issues from internet path issues. If a spike appears in gateway ping simultaneously with internet ping, the problem is local. Auto-disables after 20 consecutive losses (some networks block ICMP to the gateway).

Why two internet targets: If one target has issues (Google's 8.8.8.8 occasionally rate-limits ICMP), the other provides a control. Also detects asymmetric routing — different targets may take different paths.

UDP (DNS)

Property Value
Frequency 5/sec (0.2s interval)
Targets 8.8.8.8, 1.1.1.1
Timeout 2s
Logged as udp

What it measures: UDP round-trip time, independent of ICMP. Sends a minimal DNS A-record query and times the response. This is a real UDP round-trip, not just an ICMP echo.

Why it matters: Some networks treat ICMP differently from real traffic — lower priority, rate limiting, or filtering. UDP probes via DNS provide a second latency measurement using actual data-plane traffic. On networks with ICMP deprioritization, you'll see UDP RTT match TCP/HTTP while ICMP shows higher latency.

Implementation: Builds a raw DNS query packet (A record for www.google.com), sends it over SOCK_DGRAM to port 53, and verifies the transaction ID in the response. No DNS library needed — the query is ~30 bytes.
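As a rough illustration of the packet described above, the following sketch builds a minimal A-record query and checks the transaction ID of a reply. The function names (build_dns_query, response_matches, udp_dns_rtt_ms) are hypothetical and not taken from the WiFi Monitor source; only the wire format (RFC 1035) and the 2s timeout come from this document.

```python
import os
import socket
import struct
import time


def build_dns_query(hostname: str, txid: int) -> bytes:
    """Build a minimal DNS A-record query in RFC 1035 wire format."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts zero.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=A (1), QCLASS=IN (1).
    return header + qname + struct.pack("!HH", 1, 1)


def response_matches(response: bytes, txid: int) -> bool:
    """True if the packet is a DNS response carrying our transaction ID."""
    if len(response) < 12:
        return False
    resp_id, flags = struct.unpack("!HH", response[:4])
    return resp_id == txid and bool(flags & 0x8000)  # 0x8000 = QR (response) bit


def udp_dns_rtt_ms(server: str, hostname: str = "www.google.com", timeout: float = 2.0):
    """Send one query over UDP port 53 and return the RTT in ms, or None on loss."""
    txid = int.from_bytes(os.urandom(2), "big")
    query = build_dns_query(hostname, txid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(query, (server, 53))
            while True:
                data, _ = sock.recvfrom(512)
                if response_matches(data, txid):
                    return (time.perf_counter() - start) * 1000.0
        except OSError:  # includes socket timeouts and unreachable networks
            return None
```

For www.google.com the query comes to 32 bytes (12-byte header, 16-byte name, 4 bytes of type and class), in line with the ~30 bytes quoted above.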

TCP Connect

Property Value
Frequency Every 2s
Targets 8.8.8.8:443, 1.1.1.1:443
Timeout 5s
Logged as tcp

What it measures: TCP SYN-ACK time — the time for a TCP three-way handshake to complete. Connects to port 443 and immediately closes.

Why it matters: This is the key probe for detecting TCP Performance Enhancing Proxies (PEPs). On satellite connections, airlines often deploy transparent TCP proxies that intercept TCP connections and respond with a local SYN-ACK, making TCP appear much faster than the actual satellite round-trip. Compare:

This gap is the signature of a TCP PEP. Without the multi-protocol comparison, you'd never know the proxy exists.

TCP retransmit behavior: When a SYN is lost, TCP retransmits it after 1 second (the initial RTO). TCP loss events therefore show up as 1000ms+ spikes rather than the gaps you see in ICMP. This difference is itself informative: it shows how TCP's loss recovery affects application performance.

QUIC Handshake

Property Value
Frequency Every 5s
Targets google.com:443, cloudflare.com:443
Timeout 10s
Logged as quic

What it measures: QUIC handshake time — a full UDP + TLS 1.3 connection setup negotiating HTTP/3 (ALPN h3). Connects to port 443 and immediately closes after the handshake completes.

Why it matters: QUIC uses UDP, not TCP. This makes it the best probe for detecting TCP Performance Enhancing Proxies (PEPs) — the same proxies that make TCP SYN look artificially fast can't intercept QUIC because it's UDP-based. Compare:

This gap is the inverse of the ICMP/TCP gap and confirms the proxy hypothesis from a different angle. QUIC is also increasingly the protocol passengers actually use — YouTube, Google services, and Cloudflare all default to QUIC/HTTP/3 when available.

Why hostnames instead of IPs: The QUIC handshake sends the hostname via TLS SNI (Server Name Indication) and validates the server certificate against it, so raw IPs won't work. DNS resolution is included in the first probe of each round, which is intentionally realistic: it measures what a real QUIC connection costs.

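Expected RTT vs TCP SYN: QUIC handshake is typically 1.5-3x TCP SYN time on the same path because it carries the TLS 1.3 key exchange on top of connection setup (a multi-packet certificate flight, cryptographic processing, and sometimes an extra round trip for address validation), whereas the TCP probe times only the SYN/SYN-ACK exchange. On proxied satellite links, the ratio widens dramatically since the proxy only accelerates TCP.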

Networks that block QUIC: Some enterprise/airline networks block UDP on port 443, forcing fallback to TCP/HTTP/2. The QUIC probe will show 100% loss on these networks — this is itself useful data, confirming that the network actively filters QUIC traffic.

Implementation: Uses the aioquic library (optional dependency). If not installed, the probe is silently skipped. A monkey-patch suppresses harmless ValueError noise from aioquic's StreamWriter cleanup of server-initiated push streams.

HTTP HEAD

Property Value
Frequency Every 5s
Targets google.com, cloudflare.com + proxy canaries
Timeout 12s
Logged as http

What it measures: Full-stack latency: DNS resolution + TCP handshake + TLS negotiation + HTTP request/response. This is what users actually experience when loading a webpage.

Proxy canary targets: In addition to CDN targets (Google, Cloudflare), the probe tests obscure "canary" sites (httpbin.org, icanhazip.com, example.com). On startup, it calibrates which canaries are reachable, then selects one HTTP and one HTTPS candidate.

The idea: transparent proxies often cache/accelerate popular CDN content but pass through traffic to obscure sites unmodified. By comparing CDN latency vs canary latency (cdn_delta_ms in the log), you can detect:

- Positive delta: Canary is slower. This is normal, since the canary is further away.
- Negative delta: Canary is faster than CDN, which suggests CDN content is being intercepted and re-served (proxy artifact).
- Large delta on HTTP but not HTTPS: Port-80-only transparent proxy.

Response header inspection: Logs Via, X-Cache, X-Cache-Hits, X-Forwarded-For, and Server headers — additional proxy fingerprints.

Session reuse: Uses a persistent requests.Session so TLS sessions are reused after the first request. This means subsequent measurements reflect steady-state latency, not cold-start TLS negotiation.

Concurrency: All HTTP probes in a round fire concurrently via asyncio.gather(). This prevents slow canary sites from delaying CDN measurements.

Throughput

Property Value
Frequency Adaptive round schedule (see below) + on-demand s key
Backend 3-tier: self-hosted Worker → Cloudflare → Akamai
Streams 4 parallel TCP connections
Logged as throughput_down, throughput_up, calibration

What it measures: Download and upload speed via multi-stream HTTP transfers. Saturates the link (like Ookla) to measure true capacity.

Backend fallback

Speed tests use a 3-tier backend priority system:

  1. Self-hosted Worker (primary): Download endpoint at howismywifi.com/speed-test/download?bytes=N — our Cloudflare Worker streaming zeros via ReadableStream. Zero egress cost, no rate limiting, no bot detection. Upload goes to our existing upload-signer Worker.

  2. Cloudflare (fallback): speed.cloudflare.com/__down?bytes=N and __up. Free, no API key, reliable. Falls back here if the Worker is unreachable.

  3. Akamai (last resort): dash.akamaized.net test files. Activated after consecutive Cloudflare 429/403 errors (shared NAT IPs on satellite links trigger per-IP rate limits).

Each tier tracks consecutive failures independently. After a threshold (3 failures, or 5 on GEO satellite), the next tier becomes active. Every 30 minutes, blocked tiers are re-probed for recovery. The active backend is logged with each speed test result (backend=worker|cloudflare|akamai).
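The failover rule above can be sketched as a small state tracker. This is only an illustration under assumptions: the class name BackendSelector and its methods are hypothetical, and the sketch treats "re-probed for recovery" as making a blocked tier eligible again once 30 minutes have passed.

```python
from dataclasses import dataclass, field

TIERS = ("worker", "cloudflare", "akamai")  # priority order from the text
REPROBE_INTERVAL_S = 30 * 60                # blocked tiers are retried after 30 min


@dataclass
class BackendSelector:
    is_geo: bool = False
    failures: dict = field(default_factory=lambda: {t: 0 for t in TIERS})
    blocked_at: dict = field(default_factory=dict)  # tier -> time it was blocked

    @property
    def threshold(self) -> int:
        # 3 consecutive failures, or 5 on GEO satellite.
        return 5 if self.is_geo else 3

    def active(self, now: float) -> str:
        """Return the highest-priority tier that is unblocked or due for a re-probe."""
        for tier in TIERS:
            blocked = self.blocked_at.get(tier)
            if blocked is None or now - blocked >= REPROBE_INTERVAL_S:
                return tier
        return TIERS[-1]  # everything blocked: stay on the last resort

    def record(self, tier: str, ok: bool, now: float) -> None:
        """Update the per-tier failure counter after a speed test attempt."""
        if ok:
            self.failures[tier] = 0
            self.blocked_at.pop(tier, None)
            return
        self.failures[tier] += 1
        if self.failures[tier] >= self.threshold:
            self.blocked_at[tier] = now
```

Because a failed re-probe leaves the counter at or above the threshold, the tier is blocked again immediately with a fresh timestamp rather than needing another full run of failures.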

Calibration (runs once at startup)

Before the first throughput test, a two-step calibration determines how to size transfers:

  1. Latency probe (1KB download): Measures time-to-first-byte to classify the link. >400ms = GEO satellite, everything else = terrestrial. This sets the test duration: 4s for terrestrial links, 15s for GEO satellite (TCP slow-start takes ~7 RTTs = 4.2s to fill the congestion window at 600ms RTT).

PEP-aware GEO detection: On PEP'd satellite links, the TCP latency probe may return ~80ms (the proxy's local response) instead of the true ~700ms satellite RTT, causing GEO misclassification. To counter this, GEO detection is an OR-latch: if any signal indicates GEO (calibration TTFB, QUIC handshake >400ms, or ICMP median >400ms), the link is classified as GEO and never downgraded. QUIC is the strongest signal since it uses UDP and bypasses the PEP entirely.

  2. Speed calibration (iterative): Downloads and uploads at increasing sizes (5, 10, 20, 40, 100 MB) until the estimate converges (<20% change between rounds). Slow links converge quickly (2 rounds); fast links iterate further. Max calibration data: 175 MB. Download and upload converge independently: once one direction converges, it stops probing.

The calibrated speed determines transfer size: speed * duration / streams bytes per stream, clamped to 500KB-50MB per stream.
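A worked version of the sizing formula, as a sketch. It assumes the calibrated speed is expressed in Mbps and that KB/MB mean decimal units (1 KB = 1000 bytes); neither is stated above, and the function names are hypothetical.

```python
MIN_BYTES = 500 * 1000          # 500 KB floor per stream (decimal units assumed)
MAX_BYTES = 50 * 1000 * 1000    # 50 MB ceiling per stream


def duration_for_link(is_geo: bool) -> float:
    """Test duration from the latency probe: 15s on GEO satellite, 4s otherwise."""
    return 15.0 if is_geo else 4.0


def per_stream_bytes(speed_mbps: float, is_geo: bool, streams: int = 4) -> int:
    """speed * duration / streams, clamped to the 500KB-50MB range."""
    total_bytes = speed_mbps * 1_000_000 / 8 * duration_for_link(is_geo)
    return int(min(MAX_BYTES, max(MIN_BYTES, total_bytes / streams)))
```

For example, a 100 Mbps terrestrial link yields 12.5 MB per stream (50 MB across 4 streams in 4 seconds), while a 0.5 Mbps link is raised to the 500 KB floor.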

Loss-aware early exit: On severely lossy links (sustained ICMP loss above ~8%, or self-observed throughput variance above 40%), calibration exits after round 2 instead of grinding through all five sizes. The extra rounds rarely converge on a noisy link and only waste data budget. The completion record carries an early_exit flag so analysis can distinguish "this is the converged value" from "this is the best estimate we got before bailing."

Adaptive Round Schedule

Captures are organized into rounds. Each round is a fixed bundle of work (speed test, Netflix, ICMP, etc.) sized to take a known amount of time. The number of rounds in a capture scales with the capture duration so that short captures still produce meaningful coverage:

Capture duration Rounds Notes
5 min (Quick) 3 Two work rounds + one final round
10 min (Standard) 5 Evenly spaced across the window
30 min (Extended) 8 Full bursty + spread coverage
Continuous 8 / 30 min, repeats 30-min cycle, no end

Rounds are scheduled with even spacing across the available time (rather than the older 30-min wave's "burst then spread" cadence). Live displays and JSON events include round_num / total_rounds so the apps can show "Round 3 of 8: testing download…" — the user sees what's happening and when the capture will finish.

On-demand triggers: The CLI and apps support manual test triggering during a capture without breaking the round schedule:

Useful for "the connection just got worse, test it again" without having to start a new capture.

Passive profile: Long-running passive captures (not the default) keep the heavy probes on a wider cadence — speed tests every 2 hours instead of every round — to limit data usage for unattended monitoring.

Out-of-Family Detection

If a throughput result deviates >50% from the median of the last 5 results, it's flagged as "out of family" and a retest is scheduled 90 seconds later. Max 1 retest per round. Retests never trigger further retests.

This catches transient anomalies — a single bad result gets verified before it skews analysis. The 90-second delay is long enough for transient congestion to clear but short enough to catch real changes.
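The rule can be sketched as follows. The class and method names are hypothetical, and two details are assumptions not stated above: the detector waits for a full window of 5 results before flagging anything, and flagged results still enter the history.

```python
from collections import deque
from statistics import median


class OutOfFamilyDetector:
    def __init__(self, window: int = 5, threshold: float = 0.5):
        self.history = deque(maxlen=window)   # last N throughput results (Mbps)
        self.threshold = threshold            # 0.5 = deviates more than 50% from median
        self.retested_rounds = set()          # rounds that already used their retest

    def check(self, mbps: float, round_num: int, is_retest: bool = False) -> bool:
        """Record a result; return True if a retest should be scheduled."""
        flagged = False
        if len(self.history) == self.history.maxlen:
            baseline = median(self.history)
            flagged = baseline > 0 and abs(mbps - baseline) / baseline > self.threshold
        self.history.append(mbps)
        # Retests never trigger further retests, and each round gets at most one.
        if not flagged or is_retest or round_num in self.retested_rounds:
            return False
        self.retested_rounds.add(round_num)
        return True
```

A caller would schedule the retest 90 seconds after check() returns True and pass is_retest=True when recording that follow-up result.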

Data Budget

Each 30-minute window has a 900 MB data budget. When cumulative download bytes exceed this, remaining tests are skipped (logged as skipped=data_budget). The budget keeps the capture under the roughly 1 GB level at which Cloudflare starts returning 429/403 responses.

On rate limit (429 or 403): exponential backoff (30s, doubling up to 120s) AND transfer size is halved. On success, backoff decays gradually (halved each success) rather than resetting instantly — prevents immediately re-triggering the limit.
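A minimal sketch of this backoff behavior, with hypothetical names. The 500 KB floor on transfer size and the cutoff that snaps sub-second delays to zero are assumptions added to keep the example well-behaved; they are not described above.

```python
class RateLimitBackoff:
    BASE_S = 30.0               # first backoff step
    MAX_S = 120.0               # cap on the doubled delay
    MIN_TRANSFER_BYTES = 500 * 1000  # assumed floor when halving transfer size

    def __init__(self, transfer_bytes: int):
        self.delay_s = 0.0
        self.transfer_bytes = transfer_bytes

    def on_response(self, status: int) -> float:
        """Update state from an HTTP status code; return the delay before the next test."""
        if status in (429, 403):
            # Start at 30s, then double up to 120s, and halve the transfer size.
            if self.delay_s < self.BASE_S:
                self.delay_s = self.BASE_S
            else:
                self.delay_s = min(self.delay_s * 2, self.MAX_S)
            self.transfer_bytes = max(self.transfer_bytes // 2, self.MIN_TRANSFER_BYTES)
        else:
            # Gradual decay: halve on each success instead of resetting to zero.
            self.delay_s /= 2
            if self.delay_s < 1.0:
                self.delay_s = 0.0
        return self.delay_s
```

After three rate-limit responses the delay sits at 120s; two successes bring it back to 30s rather than zero, which is the gradual decay the text describes.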

Measurement Method

Netflix Max Bitrate (OCA)

Property Value
Frequency Twice per round (paired with speed test bursts)
Targets Netflix OCA (via fast.com API), Akamai (dash.akamaized.net)
Timeout 45s wall-clock cap
Logged as cdn_throughput

What it measures: Download throughput from the same Netflix Open Connect Appliance (OCA) servers that serve Netflix video to real users. Measured Mbps is mapped to a streaming resolution tier so the result answers a question users actually have — "would Netflix play in HD on this link?" — instead of a raw number that requires interpretation.

Resolution tiers: Based on Netflix's published encoding bitrates:

Tier Measured throughput
4K ≥ 8 Mbps
1080p ≥ 2.5 Mbps
720p ≥ 1.5 Mbps
SD < 1.5 Mbps
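The tier table translates directly into a lookup. The function name netflix_tier is hypothetical; the thresholds are the ones listed above.

```python
# (minimum Mbps, tier), checked from highest to lowest.
NETFLIX_TIERS = [(8.0, "4K"), (2.5, "1080p"), (1.5, "720p")]


def netflix_tier(mbps: float) -> str:
    """Map measured OCA throughput to the highest tier it can sustain."""
    for floor, name in NETFLIX_TIERS:
        if mbps >= floor:
            return name
    return "SD"
```

A link shaped to 2.3 Mbps for Netflix traffic lands in the 720p tier, regardless of what other CDNs achieve on the same link.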

Why it matters: Different CDNs are treated differently by in-flight networks. Airlines commonly deploy per-CDN rate limiting — Netflix OCA traffic may be capped at 2.3 Mbps on GEO satellite while Cloudflare sees 10+ Mbps on the same link. The Netflix probe surfaces these policies directly: it's not enough to know the link can do 25 Mbps if Netflix itself is shaped to 720p.

Netflix OCA discovery: The probe fetches the fast.com page, extracts the API token from the embedded JS bundle, then calls api.fast.com/netflix/speedtest/v2 to get OCA URLs. If an OCA URL expires during a long capture (returns errors), the probe re-discovers a fresh URL automatically. Akamai (dash.akamaized.net) is tested in parallel as a generic CDN comparison point.

Segment sizing: Download size adapts to link latency (from calibration): 10 MB on GEO satellite (>400ms RTT), 5 MB on medium latency, 2 MB on low latency. Larger segments on high-latency links ensure enough data to get past TCP slow-start. Speed test fallback: after 3 consecutive Cloudflare blocks, the throughput probe also falls back to Akamai for the remainder of the round.

Measurements logged: Throughput (Mbps), resolution tier, time-to-first-byte (TTFB), DNS resolution time, bytes transferred, content type.

YouTube (real-player ABR + segment probe)

Property Value
Frequency Once per round (when scheduled); also on demand via y key
Target YouTube (rotating set of long-form public videos)
Logged as youtube_segment, youtube_abr

What it measures: Real video streaming performance — not just whether the link could support YouTube, but what YouTube actually does on it. This is the full-stack credibility test: ICMP and throughput can look fine while a real player still struggles with startup or quality selection. The probe runs in two stages: a fast segment-level measurement, and (when Chromium is available) a real headless player.

Segment probe (no browser, ~8s)

Uses yt-dlp to resolve a video's DASH/HLS manifest, then downloads four real media segments over HTTP and times each one. Reports per-segment throughput, time-to-first-byte, CDN host, codec, and container format — plus a recommended streaming resolution:

Recommended tier Required throughput (incl. 30% headroom)
4K ≥ 9 Mbps
1080p ≥ 1.6 Mbps
720p ≥ 0.8 Mbps
480p / lower < 0.8 Mbps

The segment probe runs on every YouTube test slot and ships with the Python CLI / macOS / Windows apps with no Chromium install required. It is the data behind the "Recommended quality" chip on the results page.

ABR probe (Playwright + headless Chromium, ~50s)

When Chromium is installed (offered on first run; bundled as an opt-in download in the apps), the ABR probe launches a real headless YouTube player and runs two phases:

  1. AUTO phase: let YouTube's ABR algorithm pick the quality on this link.
  2. FORCED phase: ask the player for the resolution the segment probe said it could support, and see what actually happens.

Per phase, the probe reports cold startup time (page navigation to first frame), the resolution actually rendered, dropped frames, stall count and total stall duration, ABR quality switches, codec, container, and player-reported bandwidth and buffer health. An ad_seen flag records whether YouTube served an ad during startup — important because ads can mask or skew the cold-start measurement, and the flag flows all the way through to the results page so post-hoc analysis can filter.

Video rotation and fallbacks: The probe maintains a list of long-form public videos and rotates through them; if a video is unavailable (regional restriction, takedown, geo-blocked OCA), the next one in the list is tried. The full fallback loop is exercised in tests so a single bad video can't silently kill the probe.

Why two probes: The segment probe is fast, dependency-free, and gives a portable "what should work" number. The ABR probe is slower and heavier but tells you what a real player actually does, including startup latency that no throughput test can capture. Both numbers appear side-by-side: when they agree the link is well-behaved; when they disagree (segment says 1080p, ABR locks to 480p), the network or platform is doing something to the player itself.

Platforms: CLI, macOS, Windows. iOS does not run either probe (Playwright/Chromium and yt-dlp are not available on iOS); the iOS app focuses on link characterization only.

DNS

Property Value
Frequency Every 30s
Target www.google.com
Method System resolver (getaddrinfo)
Logged as dns

What it measures: End-to-end DNS resolution time as applications experience it. Uses the system resolver, which includes OS DNS cache effects.

Why system resolver (not raw DNS): The UDP probe already measures raw DNS transport latency to 8.8.8.8 and 1.1.1.1. This probe captures what applications actually see — including local caching, search domain expansion, and any DNS interception by the network. On networks with captive portals or DNS-based filtering, this probe shows the real behavior.

Why 30s interval: DNS results are heavily cached. More frequent probing would just measure cache hits. 30 seconds balances between capturing DNS changes and not flooding the resolver.

Traceroute

Property Value
Frequency Every 30s
Target 8.8.8.8
Max hops 20
Logged as traceroute

What it measures: The network path — every router hop between the device and the target, with per-hop latency.

Why it matters: Path changes correlate with performance changes. On satellite connections, a path change can indicate a beam handover or gateway switch. On terrestrial networks, path changes may indicate routing convergence events. The hop count itself is informative — satellite links typically show fewer hops (device → gateway → satellite → ground station → internet).

Implementation: Uses the system traceroute command on macOS/Linux (-n -q 1 -w 2 -m 20 — numeric, one probe per hop, 2s timeout, max 20 hops). Falls back to icmplib on Windows. Non-responding hops (* * *) are skipped in the output.

First hop tracking: The first responding hop is logged separately (first_hop=) because it identifies the local network gateway and can change if the device roams between access points.

WiFi Signal

Property Value
Frequency Every 10s
Platform macOS, Windows
Logged as wifi

What it measures: RF and PHY-layer metrics from the WiFi adapter:

Why it matters: Correlating signal metrics with performance metrics reveals whether issues are RF-related (weak signal, channel congestion) or network-related (routing, congestion, satellite handover). A latency spike that coincides with an RSSI drop is likely a local issue; one without signal change points to the upstream path.

The extended PHY metrics (channel width, band, PHY mode, MCS, spatial streams) are particularly useful for diagnosing why throughput is lower than expected. For example: connected on WiFi 6 (ax) but only 20 MHz channel width and 1 spatial stream explains a 100 Mbps ceiling. Or: PHY mode dropped from ac to n mid-capture, indicating the adapter fell back to a slower standard (often due to interference or range).

macOS implementation: Primary method is a compiled Swift binary using CoreWLAN framework (works on macOS Sequoia 15+, where the deprecated airport command was removed). The binary is compiled once on first run and cached at a hash-versioned path so source changes trigger recompile. Falls back to the airport -I command on older macOS (which does not provide channel width, band, PHY mode, MCS, or spatial stream data).

MCS index and spatial stream count are read via undocumented KVC (mcsIndex, numberOfSpatialStreams) on the CWInterface object. These keys have been stable across macOS versions but are not part of the public CoreWLAN API.

The macOS app can provide a pre-compiled binary via the WIFI_INFO_PATH environment variable, avoiding the first-run compilation step.

Note: macOS Sequoia redacts SSID and BSSID without Location Services entitlement. The app captures what's available.

Windows implementation: Parses netsh wlan show interfaces output. Signal percentage is converted to approximate dBm: (pct / 2) - 100. Extended PHY metrics (channel width, band, MCS, spatial streams) are not available via netsh.
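A sketch of the Windows conversion. It assumes English-locale netsh output with a line of the form "Signal : 86%"; the helper names are hypothetical.

```python
import re


def parse_signal_pct(netsh_output: str):
    """Extract the signal percentage from `netsh wlan show interfaces` output."""
    match = re.search(r"^\s*Signal\s*:\s*(\d+)%", netsh_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def signal_pct_to_dbm(pct: float) -> float:
    """Approximate dBm from signal quality: (pct / 2) - 100."""
    return pct / 2 - 100
```

The mapping is linear, so 100% corresponds to -50 dBm and 0% to -100 dBm; values outside that range cannot be represented.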

Loss Burst Detection

Runs in the background, scanning new CSV rows every 30 seconds for consecutive ping losses. When 5 or more consecutive ICMP losses are detected, a loss_burst row is logged with:

The UTC second alignment is designed for satellite handover analysis: some satellite systems reschedule links on fixed 15-second cycles (Starlink's LEO constellation is the best-known example), so loss bursts that align to 15-second boundaries suggest handover events rather than random packet loss.

Uses incremental file reads — tracks file position between scans to avoid re-reading the entire CSV on long captures.
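The run detection and 15-second alignment check could look roughly like this. The sketch works on (epoch_seconds, lost) pairs instead of CSV rows, and the function names and returned tuple layout are hypothetical rather than the actual loss_burst row format.

```python
MIN_BURST = 5   # consecutive ICMP losses needed to count as a burst
CYCLE_S = 15    # alignment window in seconds


def find_loss_bursts(samples):
    """samples: iterable of (epoch_seconds, lost) in time order.

    Returns a list of (first_loss_ts, last_loss_ts, loss_count) tuples.
    """
    bursts = []
    run = []
    for ts, lost in samples:
        if lost:
            run.append(ts)
            continue
        if len(run) >= MIN_BURST:
            bursts.append((run[0], run[-1], len(run)))
        run = []
    # A run still open at the end of the scan is reported as well.
    if len(run) >= MIN_BURST:
        bursts.append((run[0], run[-1], len(run)))
    return bursts


def cycle_offset_s(epoch_s: float) -> int:
    """Offset of a timestamp within its 15-second UTC window (0-14)."""
    return int(epoch_s) % CYCLE_S
```

Because 60 is a multiple of 15, taking the Unix timestamp modulo 15 gives the same answer as taking the UTC second-of-minute modulo 15. Burst start offsets that cluster on a few values point to periodic events.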

Loaded Latency (Bufferbloat Split)

Loaded latency measures how much latency increases when the network is under load — the defining characteristic of bufferbloat. While idle latency tells you how fast a quiet connection is, loaded latency tells you what users actually experience when downloads, uploads, video calls, and cloud sync are all running at once. Ookla's Speedtest now reports this as three separate measurements (idle, download-loaded, upload-loaded), and it's becoming a key metric for operators.

We compute loaded latency by post-hoc analysis of the continuous ping stream against speed test timing windows. Because pings run at 5/sec throughout the capture, we already have latency samples covering every speed test — no dedicated loaded-latency probe is needed.

How it works

During analysis, every ping is classified into one of three buckets:

  1. idle — no throughput test was running
  2. dl_loaded — a download speed test was in progress
  3. ul_loaded — an upload speed test was in progress

The classification uses throughput test timestamps with guard bands:

When download and upload phases overlap (common on asymmetric GEO satellite links where upload finishes during a long download), pings in the overlap region are assigned to the shorter-duration phase. This prevents the longer phase from dominating the classification.

Gateway vs internet decomposition

Because we ping both the gateway and internet targets at 5/sec, we compute bufferbloat separately for each hop:

We are not aware of another consumer tool that provides this decomposition, since it requires pinging the gateway during throughput tests.

Output

For each hop (gateway and internet), the analysis produces:

Field Description
idle.p50, idle.p95 Median and 95th percentile RTT when network is quiet
dl_loaded.p50, dl_loaded.p95 RTT during download speed tests
ul_loaded.p50, ul_loaded.p95 RTT during upload speed tests
bufferbloat_dl_ms dl_loaded.p50 − idle.p50 (the delta)
bufferbloat_ul_ms ul_loaded.p50 − idle.p50

The bufferbloat delta is rated on the results page:

Rating Delta
excellent < 20 ms
good < 50 ms
fair < 100 ms
impaired < 200 ms
broken ≥ 200 ms
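The delta and its rating reduce to a few lines. This is a sketch with hypothetical function names, not the code in csv_extract.py; it takes RTT samples that have already been classified into idle and loaded buckets.

```python
from statistics import median

# (upper bound in ms, rating), checked from best to worst.
BUFFERBLOAT_RATINGS = [(20, "excellent"), (50, "good"), (100, "fair"), (200, "impaired")]


def bufferbloat_ms(idle_rtts, loaded_rtts) -> float:
    """Loaded p50 minus idle p50, in milliseconds."""
    return median(loaded_rtts) - median(idle_rtts)


def rate_bufferbloat(delta_ms: float) -> str:
    """Map a bufferbloat delta onto the rating table."""
    for limit, name in BUFFERBLOAT_RATINGS:
        if delta_ms < limit:
            return name
    return "broken"
```

With an idle median of 22 ms and a download-loaded median of 120 ms the delta is 98 ms, which rates as fair. A negative delta (loaded faster than idle) falls into excellent under this sketch.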

Implementation

Implemented in Python (csv_extract.py:compute_bufferbloat_split()) and a parallel JavaScript port (bufferbloat.js) in the upload-signer Worker, so both local analysis and server-side data.json generation produce identical results. Displayed on results pages as an "Under Load" panel showing idle/DL/UL latency bars with a quality rating.

PEP Detection

Property Value
Metric pep.detected (boolean)
Method QUIC/TCP RTT ratio at shared target
Target cloudflare-quic.com:443
Threshold QUIC p50 / TCP p50 ≥ 5.0×
Min samples 10 per protocol
Constraint Only flags on GEO satellite links

What it measures: Whether a TCP Performance Enhancing Proxy (PEP) is intercepting TCP connections on the link. PEPs respond to TCP SYN locally, making TCP appear much faster than the actual satellite round-trip. QUIC uses UDP and bypasses the proxy entirely, so the QUIC/TCP ratio reveals the proxy's presence.

Why a shared target matters: Earlier captures compared TCP to one set of targets and QUIC to another. Different targets may have different routing, introducing noise. Both probes now test cloudflare-quic.com:443 (which supports both TCP and QUIC), giving an apples-to-apples comparison. Falls back to overall protocol stats for captures recorded before the shared target was added.

Why GEO-only: A high QUIC/TCP ratio on terrestrial links usually means the QUIC target is simply further away or QUIC is being throttled — not a PEP. PEPs are a satellite-specific technology, so the metric only triggers on GEO links (median RTT > 400ms).
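Putting the threshold, sample minimum, and GEO constraint together gives a check along these lines. The function name and signature are hypothetical and this is not the body of compute_pep_index(); inputs are RTT samples in ms measured against the shared target.

```python
from statistics import median


def pep_detected(tcp_rtts, quic_rtts, is_geo: bool,
                 ratio_threshold: float = 5.0, min_samples: int = 10) -> bool:
    """Flag a TCP PEP when QUIC p50 / TCP p50 >= 5.0 on a GEO link."""
    if not is_geo:
        return False  # only flags on GEO satellite links
    if len(tcp_rtts) < min_samples or len(quic_rtts) < min_samples:
        return False  # not enough samples per protocol
    tcp_p50 = median(tcp_rtts)
    if tcp_p50 <= 0:
        return False
    return median(quic_rtts) / tcp_p50 >= ratio_threshold
```

An 80 ms TCP median against a 700 ms QUIC median gives a ratio of 8.75 and is flagged; the same numbers on a terrestrial link are not.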

Display: When detected, the results page shows a "TCP Acceleration Detected" banner explaining what the proxy is and how it affects measurements. The protocol comparison bars also get a "PEP" chip, and protocol ordering changes to highlight the TCP/QUIC gap.

Implementation: Python (csv_extract.py:compute_pep_index()) and JavaScript (pep.js in the upload-signer Worker).

ICMP Proxy Detection

Property Value
Metric icmp_proxied (boolean)
Method ICMP vs HTTP/QUIC/TCP RTT ratio
Threshold Ratio > 50× AND ICMP p50 < 5ms AND other protocol p50 > 50ms
Min samples 10 ICMP, 5 for comparison protocol

What it detects: Some in-flight networks have the onboard gateway answer ICMP echo replies locally instead of forwarding them over the satellite link. This makes ping latency appear sub-millisecond on a 600ms satellite connection — superficially great numbers that are completely misleading.

How it works: If ICMP p50 is below 5ms and a comparison protocol (HTTP, QUIC, or TCP) has p50 above 50ms and the ratio exceeds 50×, ICMP is flagged as proxied. The ratio guard is the primary discriminator — a 0.5ms ICMP with 355ms HTTP gives a ratio of 710×, far above the threshold. The 50ms floor prevents false positives on genuinely fast local networks.
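The three conditions combine as shown in this sketch. The function name and signature are hypothetical and do not mirror detect_icmp_proxy(); inputs are RTT samples in ms for ICMP and one comparison protocol.

```python
from statistics import median


def icmp_proxied(icmp_rtts, other_rtts) -> bool:
    """Ratio > 50x AND ICMP p50 < 5ms AND comparison-protocol p50 > 50ms."""
    if len(icmp_rtts) < 10 or len(other_rtts) < 5:
        return False  # minimum samples: 10 ICMP, 5 for the comparison protocol
    icmp_p50 = median(icmp_rtts)
    other_p50 = median(other_rtts)
    if icmp_p50 <= 0:
        return False  # avoid dividing by zero on degenerate data
    return icmp_p50 < 5 and other_p50 > 50 and other_p50 / icmp_p50 > 50
```

The example from the text (0.5 ms ICMP, 355 ms HTTP) passes all three tests. A 2 ms ICMP with 60 ms HTTP does not, because the ratio is only 30.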

What changes when detected: The results page shows an "ICMP Pings Answered Locally" banner. Latency ratings switch from ICMP to HTTP HEAD measurements, which the gateway can't fake. The latency spread panel re-labels as "Latency Spread (HTTP)" so the rating reflects real end-to-end performance, not the local gateway distance.

Coverage: Works on GEO satellite, LEO satellite, and hybrid networks (like EAN — European Aviation Network, a hybrid S-band + LTE system). Originally calibrated for GEO only; threshold was lowered from 400ms to 50ms after EAN captures showed the pattern at sub-GEO latencies.

Implementation: Python (io.py:detect_icmp_proxy()) and JavaScript (icmpProxy.js in the upload-signer Worker).

Adaptive Probe Backoff

When a probe experiences sustained failures, the monitor adapts its polling frequency rather than hammering an unresponsive target at full speed. This operates at two levels:

Loop-level backoff (AdaptiveInterval)

Each probe loop tracks consecutive losses and doubles its interval after a threshold period of continuous failure. The behavior is network-aware when a shared NetworkHealth signal is available:

On the first success after being throttled, the interval resets immediately to its base rate.
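A rough sketch of the doubling-and-reset behavior, leaving out the NetworkHealth integration. The class name, the 10-second failure window, and the 30-second cap are assumptions made for illustration; the source only states that the interval doubles after a threshold period and resets on the first success.

```python
class AdaptiveIntervalSketch:
    """Doubles the probe interval after sustained failure; resets on success."""

    def __init__(self, base_s, max_s=30.0, fail_window_s=10.0):
        self.base_s = base_s
        self.max_s = max_s                  # assumed cap, not from the source
        self.fail_window_s = fail_window_s  # assumed threshold period
        self.interval_s = base_s
        self._failing_since = None

    def record(self, success, now_s):
        """Record one probe outcome and return the interval to use next."""
        if success:
            # First success resets straight back to the base rate.
            self.interval_s = self.base_s
            self._failing_since = None
        elif self._failing_since is None:
            self._failing_since = now_s  # start of a failure streak
        elif now_s - self._failing_since >= self.fail_window_s:
            self.interval_s = min(self.interval_s * 2, self.max_s)
            self._failing_since = now_s  # wait a full window before doubling again
        return self.interval_s
```

Passing the clock in as now_s keeps the sketch deterministic; a real loop would more likely read a monotonic clock itself.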

Per-target backoff (TargetBackoff)

Multi-target probe loops (ping, UDP, TCP) may have one target blocked while others work fine. TargetBackoff tracks failures per target and skips backed-off targets on most iterations, retrying each one only every 60 seconds. When a backed-off target recovers, it returns to normal polling immediately.

This prevents a single blocked target from inflating the CSV with loss rows while keeping the other targets at full frequency.
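One way the per-target bookkeeping could look. The class and method names are hypothetical, and the failure count that triggers backoff (5 here) is an assumption; only the 60-second retry cadence and the immediate recovery come from the text.

```python
class TargetBackoffSketch:
    """Skip targets that keep failing, retrying each one on a slow cadence."""

    def __init__(self, fail_threshold=5, retry_every_s=60.0):
        self.fail_threshold = fail_threshold  # assumed; the source gives no number
        self.retry_every_s = retry_every_s    # 60 s retry cadence from the text
        self._failures = {}
        self._last_try = {}

    def should_probe(self, target, now_s):
        if self._failures.get(target, 0) < self.fail_threshold:
            return True  # healthy target: probe on every iteration
        # Backed off: let one retry through per retry window.
        last = self._last_try.get(target, float("-inf"))
        return now_s - last >= self.retry_every_s

    def record(self, target, success, now_s):
        self._last_try[target] = now_s
        if success:
            self._failures[target] = 0  # recovery returns to normal polling at once
        else:
            self._failures[target] = self._failures.get(target, 0) + 1
```

Because state is keyed by target, a blocked target never changes the answer for its neighbors in the same loop.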

Network Detection

At startup, the tool auto-detects the network environment:

Quality Tiers and What They Mean

Every metric is classified into a 6-tier rating system: superior → excellent → good → fair → impaired → broken. The overall capture rating is the worst of any individual metric — one bad metric drags the overall down. This is intentional: a connection with great download but unusable latency for video calls should not get an "excellent" rating.
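The "worst metric wins" rule is small enough to show directly. This is a sketch with hypothetical names; the tier order is the one listed above.

```python
# Ordered best to worst, as listed in the text.
TIERS = ["superior", "excellent", "good", "fair", "impaired", "broken"]


def overall_rating(metric_tiers):
    """Return the worst tier among the per-metric ratings (dict of name -> tier)."""
    return max(metric_tiers.values(), key=TIERS.index)


print(overall_rating({"download": "superior", "latency": "impaired", "loss": "good"}))
```

A capture with superior download and impaired latency is rated impaired overall, which is the video-call case described above.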

The tiers are calibrated to the link type. GEO satellite has a ~600ms physics floor, so latency tiers shift up and cap at "good" — a 700ms GEO RTT is as good as it physically gets and shouldn't be penalized for what physics enforces. Below are the actual thresholds.

Latency (median RTT)

| Tier | Non-GEO | GEO satellite |
| --- | --- | --- |
| superior | < 20 ms | n/a (physics floor) |
| excellent | < 50 ms | n/a |
| good | < 100 ms | ≤ 700 ms |
| fair | < 200 ms | ≤ 900 ms |
| impaired | < 500 ms | ≤ 1200 ms |
| broken | ≥ 500 ms | > 1200 ms |

GEO detection: median RTT > 400 ms is treated as GEO satellite.
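The latency table and the GEO rule can be combined into one lookup. This is a sketch, not the shipped classifier: the function name and the optional is_geo override are assumptions, added so that a link already known to be terrestrial can be rated against the non-GEO column.

```python
# Upper bounds from the table above; anything past the last bound is "broken".
NON_GEO = [(20, "superior"), (50, "excellent"), (100, "good"), (200, "fair"), (500, "impaired")]
GEO = [(700, "good"), (900, "fair"), (1200, "impaired")]
GEO_RTT_THRESHOLD_MS = 400


def latency_tier(median_rtt_ms, is_geo=None):
    """Classify a median RTT; is_geo=None applies the > 400 ms detection rule."""
    if is_geo is None:
        is_geo = median_rtt_ms > GEO_RTT_THRESHOLD_MS
    if is_geo:
        for limit, tier in GEO:
            if median_rtt_ms <= limit:  # GEO bounds are inclusive
                return tier
    else:
        for limit, tier in NON_GEO:
            if median_rtt_ms < limit:   # non-GEO bounds are exclusive
                return tier
    return "broken"
```

Note that with detection based on RTT alone, a 450 ms median is treated as GEO and rated "good"; the same value rated as non-GEO would be "impaired".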

Packet loss

| Tier | Threshold |
| --- | --- |
| superior | < 0.1 % |
| excellent | < 0.5 % |
| good | < 1 % |
| fair | < 5 % |
| impaired | < 10 % |
| broken | ≥ 10 % |

These thresholds are harmonized with the fleet-wide dashboard's connectivityLoss thresholds for cross-tool comparability.

Download throughput

| Tier | Mbps |
| --- | --- |
| superior | ≥ 200 |
| excellent | ≥ 100 |
| good | ≥ 25 |
| fair | ≥ 12 |
| impaired | ≥ 3 |
| broken | < 3 |

Upload throughput

| Tier | Mbps |
| --- | --- |
| superior | ≥ 100 |
| excellent | ≥ 25 |
| good | ≥ 5 |
| fair | ≥ 1 |
| impaired | ≥ 0.5 |
| broken | < 0.5 |

WiFi signal (RSSI)

| Tier | dBm |
| --- | --- |
| excellent | ≥ -50 |
| good | ≥ -67 |
| fair | ≥ -70 |
| impaired | < -70 |

The thresholds derive from the WiFi Alliance signal classification guidelines, where ≥ -67 dBm is the typical threshold for reliable streaming.

CSV Format

All measurements write to a single CSV with columns:

timestamp_utc, measurement_type, target, rtt_ms, loss_bool, seq,
hop_count, throughput_mbps, dns_ms, burst_size, notes

The notes field carries structured key=value pairs specific to each measurement type (protocol details, proxy headers, calibration data, etc.).
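Reading the file back might look like the sketch below. The column names are the ones listed above; the ";" separator between key=value pairs and the sample row values are assumptions, since the source does not specify how pairs are delimited inside notes.

```python
import csv
import io

COLUMNS = ["timestamp_utc", "measurement_type", "target", "rtt_ms", "loss_bool", "seq",
           "hop_count", "throughput_mbps", "dns_ms", "burst_size", "notes"]


def parse_notes(notes, sep=";"):
    """Split 'k1=v1;k2=v2' into a dict. The ';' separator is an assumption."""
    pairs = (item.split("=", 1) for item in notes.split(sep) if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def read_rows(text):
    """Yield one dict per CSV row, with notes expanded into a nested dict."""
    for row in csv.DictReader(io.StringIO(text)):
        row["notes"] = parse_notes(row.get("notes") or "")
        yield row


# Made-up sample row for illustration only.
sample = ",".join(COLUMNS) + "\n" \
    "2024-05-01T12:00:00Z,tcp,cloudflare-quic.com:443,612.4,0,17,,,,,proto=tcp;port=443\n"
for row in read_rows(sample):
    print(row["measurement_type"], row["rtt_ms"], row["notes"])
```

Columns that do not apply to a measurement type come back as empty strings, so callers need to convert numeric fields explicitly.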

A separate raw throughput samples CSV (*_throughput_raw.csv) records per-chunk timing data for TCP ramp-up analysis.

Post-Capture: Upload and Server-Side Analysis

After a capture completes, files are uploaded to Cloudflare R2. The upload Worker then generates the results page data automatically — clients upload raw measurements and are done.

Upload pipeline

  1. Client uploads raw files: CSV, meta.json (location, network, privacy settings), and throughput_raw.csv (per-chunk samples). Files are gzip-compressed (~5-6x) before upload.

  2. Worker generates data.json server-side by analyzing the CSV. The Worker runs the same analysis code (bufferbloat split, PEP detection, ICMP proxy detection, quality ratings, timeseries extraction) as a JavaScript port. Every upload automatically gets a results page — no client-side analysis needed.

  3. Worker updates captures-index.json with the new capture metadata, making it immediately visible on the browse page.
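The compression step in item 1 can be illustrated with the standard library. This is only a sketch of gzip-before-upload with a hypothetical helper name; it does not reproduce the client's actual upload code, and the ratio reported depends entirely on the input.

```python
import gzip


def gzip_payload(raw: bytes):
    """Compress a file's bytes and report the compression ratio achieved."""
    compressed = gzip.compress(raw)
    return compressed, len(raw) / len(compressed)


# Made-up, highly repetitive rows; real captures will compress less than this.
raw = ("2024-05-01T12:00:00Z,ping,1.1.1.1,612.4,0,1,,,,,\n" * 200).encode()
compressed, ratio = gzip_payload(raw)
print(f"{len(raw)} bytes -> {len(compressed)} bytes ({ratio:.1f}x)")
```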

This replaced the earlier model where each client (Python, Swift, macOS app) generated data.json independently before upload. Server-side generation eliminated reliability gaps (5/358 historical captures had no results page due to client-side generation failures) and the maintenance burden of keeping Python and Swift analysis in sync.

Upload modes

  1. Worker proxy (default): Uploads via the Cloudflare Worker — no R2 credentials needed on the client. Config: upload_url + upload_token in ~/.wifi-monitor/r2.json. Used by all platforms.

  2. Direct S3 (admin): Uploads via boto3 with R2 credentials. Used by report.py --pull to download captures for report generation.

Config is auto-provisioned on first use from a bundled default_r2.json, which generates a unique uploader_id per machine. Upload state is tracked locally in ~/.wifi-monitor/uploaded.json to avoid re-uploading. Pending uploads (e.g., laptop closed before upload finished) are caught up on the next run.
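Catching up on pending uploads amounts to a set difference between what is on disk and what the state file records. The sketch below assumes a state file shaped like {"uploaded": ["name.csv", ...]}; the real schema of uploaded.json is not described in this document, and the function name is hypothetical.

```python
import json
from pathlib import Path


def pending_uploads(capture_dir: Path, state_file: Path):
    """Return capture CSVs that are not yet recorded in the state file."""
    uploaded = set()
    if state_file.exists():
        # Assumed schema: {"uploaded": [file names]}.
        uploaded = set(json.loads(state_file.read_text()).get("uploaded", []))
    return sorted(
        path for path in capture_dir.glob("*.csv")
        if not path.name.endswith("_throughput_raw.csv")  # raw samples ride along with their capture
        and path.name not in uploaded
    )
```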

Access control

The R2 bucket is served through a Cloudflare Worker at howismywifi.com. The capture listing and index API are restricted to VPN users. Individual results pages are accessible by direct link (capture IDs include a random component to prevent enumeration). Crawlers are blocked from /c/ paths, and brute-force scanning is rate-limited. See docs/r2-data-protection.md for full details.

Capture Management

The wifi-manage CLI provides post-capture operations on R2-hosted data.

Link related captures

wifi-manage link <id1> <id2> [...] ties multiple captures together (e.g., several short sessions from the same flight). Patches meta.json and data.json.gz on R2 to add bidirectional related_captures entries. Results pages render a "Related" banner with links to sibling captures.
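The bidirectional patch can be sketched on in-memory metadata. The function name is hypothetical and the R2 download and re-upload steps are omitted; related_captures is the field named above.

```python
def link_captures(metas, capture_ids):
    """Add bidirectional related_captures entries to each capture's metadata dict."""
    for cid in capture_ids:
        related = set(metas[cid].get("related_captures", []))
        related.update(other for other in capture_ids if other != cid)
        metas[cid]["related_captures"] = sorted(related)  # stable order, no duplicates
    return metas


metas = {"a": {}, "b": {"related_captures": ["z"]}, "c": {}}
link_captures(metas, ["a", "b", "c"])
print(metas["b"]["related_captures"])
```

Merging into any existing list rather than overwriting it keeps earlier links intact when the command is run more than once.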

Trim mixed captures

wifi-manage trim <id> splits a capture that spans two environments (e.g., inflight GEO satellite + post-landing terrestrial WiFi). Downloads the CSV from R2, filters to a specified time window, re-uploads as a new capture with correct aggregate stats.
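The filtering step could be sketched as below. It assumes timestamp_utc holds ISO 8601 strings with an explicit UTC offset, which this document does not confirm, and it leaves out the R2 transfer and the aggregate-stat recomputation. The function name is hypothetical.

```python
import csv
import io
from datetime import datetime


def trim_csv(text, start, end):
    """Keep only rows whose timestamp_utc falls inside [start, end]."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Assumed format: ISO 8601 with offset, e.g. 2024-05-01T12:00:00+00:00.
        ts = datetime.fromisoformat(row["timestamp_utc"])
        if start <= ts <= end:
            writer.writerow(row)
    return out.getvalue()
```

Both start and end need to be timezone-aware datetimes so they compare cleanly against the parsed timestamps.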

Combined analysis

When captures are linked, a combined analysis script merges their data into a unified view — aggregated stats, merged timelines, and an overall quality assessment across all sessions. This gives a single-flight summary instead of forcing users to mentally piece together 5 separate results pages.

Hide / unhide captures

wifi-manage hide <id> marks a capture as hidden — excluded from the browse page and listings but still accessible by direct link. Data stays on R2 (nothing is deleted). wifi-manage unhide <id> reverses it. The results page also has a toggle button (VPN-only). Captures shorter than 1 minute are auto-hidden on upload.

macOS App Integration

The macOS app (macos-app/) is a native SwiftUI wrapper that spawns the Python tool as a subprocess with --json mode. Communication is JSON lines on stdout (events) and stderr (diagnostics).

The app provides:

- SetupView: Configure location, network type, duration before capture
- RunningView: Live stats during capture (latency, loss, throughput, RSSI)
- ResultsView: Post-capture summary with quality ratings
- CapturePickerView: Browse and load past captures from the captures directory
- BootstrapView: First-run setup (installs Python dependencies via uv)

The Python side emits 20 JSON event types: info, status, probe, throughput_status, throughput_result, cdn_throughput_result, traceroute_result, complete, analysis_complete, upload_start, upload_complete, upload_error, upload_skipped, upload_summary, capture_reminder, capture_auto_stop, system_sleep, network_change, note, error.
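A JSON-lines channel of this kind can be sketched in a few lines. The "type" key and the payload fields below are assumptions; the document lists the event names but not the shape of each event.

```python
import json
import sys


def emit_event(event_type, stream=sys.stdout, **fields):
    """Write one event as a single JSON line and flush so the reader sees it promptly."""
    stream.write(json.dumps({"type": event_type, **fields}) + "\n")
    stream.flush()


def parse_event(line):
    """Decode one line; return None for blank lines or non-JSON noise."""
    line = line.strip()
    if not line:
        return None
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


emit_event("probe", rtt_ms=612.4)  # hypothetical payload
```

Flushing after every line matters when stdout is a pipe, because Python otherwise block-buffers and the parent process would see events in delayed batches.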

Auto-update is handled by Sparkle. publish-update.sh signs the DMG, generates an appcast, and uploads to howismywifi.com/download/.

How We Compare to Other Tools

Network-quality tools occupy different points in a design space. Here's where WiFi Monitor sits relative to the major reference points, with sources cited so you can verify each comparison.

| Feature | Speedtest (Ookla) | Cloudflare Speed Test | Orb.net | WiFi Monitor |
| --- | --- | --- | --- | --- |
| Test duration | ~30 sec single-shot | ~30 sec single-shot | continuous (24/7) | 2 min → 30 min → continuous |
| Concurrent multi-protocol | ❌ TCP only | ❌ HTTP only | ❌ HTTPS/h3 only | ✅ ICMP + UDP + TCP + HTTP + QUIC |
| Adaptive duration for high-RTT | ✅ scales to RTT | ⚠️ fixed | ⚠️ fixed | ✅ 4s default, 15s GEO |
| Loaded latency / bufferbloat | ✅ "working latency" | ✅ AIM | ⚠️ implicit | ✅ idle / DL-loaded / UL-loaded + gateway split |
| Packet loss measurement | ❌ not surfaced | ✅ via WebRTC TURN | ✅ over time | ✅ from ICMP seq + UDP |
| Traceroute / path | | | | ✅ every 30s with hop deltas |
| TCP PEP / proxy detection | | | ⚠️ implicit (h3 vs HTTPS) | ✅ explicit via QUIC handshake |
| Multi-CDN diversity | ❌ single server | ❌ Cloudflare only | ⚠️ Cloudflare + Fastly | ✅ Cloudflare + Akamai + Netflix OCA |
| Real video-player measurement | | | | ✅ headless Chromium YouTube ABR + startup time |
| Streaming resolution tier (4K/1080p/720p/SD) | | | | ✅ Netflix (OCA) + YouTube (segment + ABR) |
| WiFi signal correlation (RSSI/PHY) | | | | ✅ RSSI + channel + MCS + NSS |
| Composite quality score | | ✅ AIM (3 scenarios) | ✅ Orb Score 0–100 | ✅ 6-tier per-metric (link-type aware) |
| Designed for satellite | ❌ generic | ❌ generic | ❌ generic | ✅ that is the whole point |

The bold cells are wifi-monitor's distinctive strengths — none of the consumer tools above do them. Bufferbloat and PEP detection are surfaced as named metrics on every results page, and the full-stack video tier (Netflix max bitrate + real YouTube playback) closes the gap between "this link can theoretically do X Mbps" and "this is what streaming actually looks like on it." Our remaining gaps are in summary scoring — RPM as a single-number responsiveness metric, and per-scenario quality ratings (see "Coming Soon" below).

The deepest empirical comparison of these tools is MacMillan et al. (2023), A Comparative Analysis of Ookla Speedtest and M-Lab NDT7. Headline finding: on high-latency links (500–600ms RTT), single-stream short-duration tests under-report throughput by 60–70%. This is the empirical justification for our calibrated multi-stream approach with extended duration on GEO links.

What We Don't Measure (and Why)

Honest disclosure of methodology gaps. These are deliberate choices rather than oversights, and we revisit them as priorities shift.

IPv6 path testing. All probes target IPv4 today. Some inflight satellite networks have IPv6 quirks worth surfacing, but the implementation cost is moderate and inflight v6 is rare at the passenger edge. Tracked as a future enhancement.

PMTU / MTU discovery. Path MTU black-holes are a known inflight gotcha (especially in VPN-in-VPN scenarios — corporate VPN inside satellite VPN). We don't probe for them today. No consumer tool does either, but it's a real diagnostic gap. Tracked.

HLS, DASH-outside-YouTube, and WebRTC measurement. The YouTube ABR probe simulates one real video player end-to-end — startup time, selected resolution, stalls, ABR switches, dropped frames — and the Netflix probe measures real OCA throughput mapped to a resolution tier. What we still don't do: generic HLS rebuffering tests against arbitrary streams (Disney+, Hulu, Apple TV), or WebRTC connection setup (Zoom / Teams / Webex call-quality simulation). The YouTube probe is the closest a passenger-side tool can get without operator cooperation, and it captures the failure modes that matter most for in-flight video. WebRTC simulation in particular is tracked as a roadmap item.

Real-time UDP packet loss (separate from ICMP). Packet loss today is inferred from ICMP sequence gaps. Real-time apps care about UDP loss, which can diverge from ICMP loss on networks that shape ICMP. We could add a UDP echo probe but it requires hosting a small echo server. Deferred until ICMP-shaping is shown to mislead our numbers.

Per-passenger fleet aggregation. howismywifi.com displays individual captures. We don't aggregate across captures to produce "median GEO Viasat capture by airline" or "this route compared to last month." That comparative context is a credibility flywheel and is tracked as a roadmap item.

Single-CDN comparisons across paths. We test multiple CDNs from one device, but we don't compare the same CDN reached from different satellite ground stations or from a single-passenger device vs. gateway-hosted measurements. A research-grade question, out of scope.

Long-term persistent monitoring of a single endpoint. Orb.net does this; we don't. Different tool, different design center.

Coming Soon

Active development, ordered by priority. See docs/plan-research-improvements-priorities.md for the full roadmap.

RPM (Round-trips Per Minute) as a responsiveness metric. RPM = 60,000 / loaded_round_trip_ms. Apple ships it in the OS, Cloudflare ships it via the mach CLI, and the IETF has a draft specification for it. "Higher is better" framing translates across link types in a way that raw milliseconds doesn't: a great GEO satellite link still has a poor RPM, which correctly forecasts that interactive use will struggle. We'll integrate via mach (with AIM upload disabled) and surface RPM alongside Mbps.
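As a worked example of the formula, with a hypothetical helper name and illustrative RTT values:

```python
def rpm(loaded_rtt_ms):
    """Round-trips per minute for a given loaded round-trip time in milliseconds."""
    return 60_000 / loaded_rtt_ms


print(rpm(50))             # 1200.0 for a terrestrial link at 50 ms under load
print(round(rpm(700), 1))  # 85.7 for a GEO link at 700 ms under load
```

The 700 ms GEO case lands below 100 RPM even though 700 ms rates "good" for that link type, which is the mismatch the metric is meant to expose.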

Per-scenario quality scores. Instead of a single overall rating, these answer "what can this link actually do right now?" with entries such as Voice call: ★★★, HD streaming: ★★★★, Gaming: ★★. Calibrated for the link type (a great-for-aviation 200ms RTT satellite link shouldn't always score badly). Scenario list tuned for the actual audience: video conferencing on satellite (Zoom/Teams/Webex), VPN-tunneled work, cloud sync (Slack/Dropbox/iCloud), SSH/remote shell. Reference: Cloudflare AIM scoring and LibreQoS QoO.

Design Considerations

Why Multi-Protocol?

No single measurement tells the whole story:

The power is in the comparison. When ICMP says 600ms and TCP says 5ms, you've found a PEP. When QUIC says 600ms and TCP says 5ms, the PEP is confirmed — QUIC bypasses the proxy entirely. When ICMP is fine but HTTP spikes, the problem is at the application layer.

Why High Frequency?

Ping and UDP at 5/sec (not 1/sec) because:

TCP and HTTP probes run less often (every 2s and 5s, respectively) because they're heavier: each probe opens a connection and does real work. The trade-off is more detail from lightweight probes, broader coverage from heavyweight probes.

Why Subprocess Ping on macOS?

macOS's ICMP implementation via SOCK_DGRAM (non-privileged sockets, used by icmplib) introduces measurement artifacts: periodic ~1-second latency spikes that appear real but are OS-level queuing artifacts. The system ping binary uses SOCK_RAW (setuid) and doesn't have this issue.

This was a hard-won finding after investigating mysterious periodic spikes that only appeared on macOS with VPN active. Full investigation in docs/vpn-periodic-latency-spikes.md.
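Shelling out to the system binary and parsing its output might look like this. It is a sketch under assumptions: the tool's real invocation and flags are not shown in this document, the function names are hypothetical, and only the commonly available "-c 1" (send one packet) option is used.

```python
import re
import subprocess

# Matches the "time=12.345 ms" field that ping prints for each reply.
_TIME_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_ping_rtt_ms(output):
    """Extract the RTT in milliseconds from ping output, or None if no reply."""
    match = _TIME_RE.search(output)
    return float(match.group(1)) if match else None


def ping_once(host, timeout_s=2.0):
    """Run the system ping binary once; None means loss or timeout."""
    try:
        proc = subprocess.run(
            ["ping", "-c", "1", host],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_ping_rtt_ms(proc.stdout)
```

Keeping the parser separate from the subprocess call makes it possible to test the parsing against captured output without touching the network.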

Why Cloudflare for Throughput?

Why Iterative Calibration?

A fixed transfer size either wastes time on slow links (downloading 100 MB on a 3 Mbps hotel connection takes 4.5 minutes) or undersamples fast links (5 MB on a 500 Mbps fiber completes in 80ms, mostly TCP ramp-up).

The iterative approach (5→10→20→40→100 MB until convergence) adapts to the actual link speed in 2-3 rounds, using minimal data on slow links and scaling up only as needed. Total calibration overhead: 15-175 MB depending on speed.
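The size ladder can be sketched as a loop that stops once two consecutive rounds agree. The convergence rule used here (within 10% of the previous round) is an assumption, as is the measure_mbps callback; the source states the sizes but not the stopping criterion.

```python
SIZES_MB = [5, 10, 20, 40, 100]


def calibrate(measure_mbps, tolerance=0.10):
    """Grow the transfer size until two consecutive rounds agree within tolerance.

    measure_mbps(size_mb) is a caller-supplied function that performs one
    transfer of the given size and returns the measured throughput.
    """
    previous = None
    used_mb = 0
    for size_mb in SIZES_MB:
        current = measure_mbps(size_mb)
        used_mb += size_mb
        if previous is not None and abs(current - previous) <= tolerance * previous:
            break  # readings have stabilized; no need for a larger transfer
        previous = current
    return current, used_mb
```

A slow link that reads 3 Mbps at every size converges after the 5 MB and 10 MB rounds (15 MB total), while a fast link whose readings keep climbing runs the whole ladder (175 MB total), matching the overhead range quoted above.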

Why Wall-Clock Upload Timing?

When uploading, sample-based timing (recording timestamps as data is generated) measures how fast the OS accepts data into the TCP send buffer, not how fast it actually leaves the machine. On fast local networks, the buffer fills almost instantly while the actual upload takes much longer.

Wall-clock timing (total_bytes / elapsed_time) is correct because requests.post() only returns after the server acknowledges receipt. The full round-trip is included in the measurement.

Download doesn't have this problem because iter_content() blocks until data actually arrives from the network.
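The wall-clock calculation can be sketched independently of the HTTP library by passing the blocking send call in as a function. The helper name is hypothetical; in the tool's case the callable would wrap requests.post().

```python
import time


def timed_upload_mbps(send, payload: bytes):
    """Time a blocking send(payload) call and return throughput in Mbps."""
    start = time.perf_counter()
    send(payload)  # must not return until the server has responded
    elapsed_s = time.perf_counter() - start
    if elapsed_s <= 0:
        raise ValueError("elapsed time too small to measure")
    return (len(payload) * 8) / (elapsed_s * 1_000_000)
```

Everything hinges on send blocking until the response arrives. If it returned as soon as the bytes were handed to the socket, this would measure the send buffer again rather than the link.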

References & Further Reading

The methodology choices above draw on a body of network-measurement research and standards. The most useful sources, organized by topic.

Standards & specifications

Comparative analysis

Bufferbloat

Tool methodologies

Inflight-specific

Internal documents