perf(tuic-server): fix browser-traffic latency in the relay data plane #17

Merged
Itsusinn merged 1 commit into main from perf/browser-latency-relay on Jun 7, 2026

@Itsusinn Itsusinn commented Jun 7, 2026

Member

Summary

Investigated and fixed three issues in the tuic-server data plane that made proxied browser traffic feel slow. All three are on the per-connection hot path.

Changes

1. Enable TCP_NODELAY on outbound TCP (highest-impact)

Outbound sockets never disabled Nagle's algorithm. Proxied browser traffic is dominated by small writes (TLS records, HTTP request headers). With Nagle on, a small segment is held back until the previous one has been ACKed, and the peer's delayed-ACK timer can postpone that ACK, adding up to ~40 ms of latency per round trip. This is the classic "browsing feels laggy through the proxy" symptom.

Now set on:

  • connect_direct_tcp (wind-base/src/direct.rs)
  • SOCKS5 action egress via get_socket_ref().set_nodelay(true) (wind-socks/src/action.rs)
  • SOCKS5 chained outbound direct socket (wind-socks/src/outbound.rs)

The QUIC inbound has no such buffering, so the outbound hop is the only one that needs it.
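For readers unfamiliar with the option, here is a minimal sketch of the pattern using the blocking `std::net` API. The helper name `connect_nodelay` is hypothetical and is not the code in this PR, which touches async sockets; tokio's `TcpStream` exposes a `set_nodelay` method with the same meaning.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};

/// Hypothetical helper (not from this PR): open an outbound TCP connection
/// and disable Nagle's algorithm on it before handing it to the relay.
pub fn connect_nodelay(addr: SocketAddr) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    // TCP_NODELAY: send small writes immediately instead of holding them
    // back until earlier data has been acknowledged.
    stream.set_nodelay(true)?;
    Ok(stream)
}
```

The setting can be read back with `stream.nodelay()`, which is a cheap way to assert in a test that an egress path really applied it.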

2. Wire configured congestion control into the quinn server backend

The quinn server always used quinn's defaults (CUBIC with a small initial window). The operator's backend.quinn.congestion_control setting (controller + initial_window) was silently ignored; only the quiche backend honored it.

TuicInboundOpts now carries congestion_control + initial_window, and create_server_config builds a ControllerFactory (BBR via quinn-congestions, or Cubic / NewReno via quinn::congestion) with the configured initial window. Short-lived browser connections now leave slow-start faster instead of trickling the first few round trips.
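To get a rough feel for why the initial window matters on short connections, the sketch below counts round trips in an idealized slow start where the window doubles every RTT with no loss and no pacing. The function `rtts_to_deliver` and the window sizes used with it are illustrative assumptions, not values or code from this PR.

```rust
/// Round trips needed to deliver `total_bytes` in an idealized slow start:
/// the sender transmits one full congestion window per round trip and the
/// window doubles after each one. Real controllers are more nuanced.
pub fn rtts_to_deliver(total_bytes: u64, initial_window: u64) -> u32 {
    assert!(initial_window > 0, "initial window must be non-zero");
    let mut cwnd = initial_window;
    let mut sent = 0u64;
    let mut rtts = 0u32;
    while sent < total_bytes {
        // One window's worth of data goes out in this round trip.
        sent = sent.saturating_add(cwnd);
        // Slow start: double the window for the next round trip.
        cwnd = cwnd.saturating_mul(2);
        rtts += 1;
    }
    rtts
}
```

Under this model a 1 MiB response needs 7 round trips with a 14,720-byte initial window and 4 with a 128 KiB one, so on a high-latency tunnel a larger configured window can remove several RTTs from a page load.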

3. Bump relay buffer 8 KiB → 64 KiB

The relay used copy_bidirectional's default 8 KiB buffer, capping single-stream throughput over the high bandwidth-delay-product QUIC tunnel. Switched wind_core::io::copy_io to copy_bidirectional_with_sizes with a 64 KiB RELAY_BUF_SIZE, and routed every relay path (direct + both SOCKS5 paths) through copy_io so the larger buffer applies uniformly and the duplicated raw copy_bidirectional calls are gone.
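The effect of the buffer size can be illustrated with a synchronous, one-directional analogue of the relay loop. This is only a sketch under stated assumptions: `relay_one_way` is a hypothetical name, and the actual change uses tokio's `copy_bidirectional_with_sizes` rather than blocking I/O.

```rust
use std::io::{self, Read, Write};

/// Hypothetical one-way relay: copy `src` to `dst` through a buffer of
/// `buf_size` bytes. Returns (bytes copied, number of non-empty reads).
pub fn relay_one_way<R: Read, W: Write>(
    src: &mut R,
    dst: &mut W,
    buf_size: usize,
) -> io::Result<(u64, u64)> {
    let mut buf = vec![0u8; buf_size];
    let mut total = 0u64;
    let mut reads = 0u64;
    loop {
        // Each iteration moves at most `buf_size` bytes.
        let n = src.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        dst.write_all(&buf[..n])?;
        total += n as u64;
        reads += 1;
    }
    Ok((total, reads))
}
```

Copying 1 MiB from an in-memory source takes 128 iterations with an 8 KiB buffer and 16 with a 64 KiB one. On a real socket each iteration is a read/write pair, so fewer, larger transfers mean less per-chunk overhead on a fast tunnel.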

Notes for reviewers

  • CongestionController in tuic-server is a type alias for wind_tuic::quinn::CongestionControl, so the config value flows straight through without a mapping layer (unlike the quiche path, which maps to a different enum).
  • bbr and bbr3 config aliases both map to the single quinn-congestions BBR implementation (matching the existing client-side outbound behavior).
  • A fourth finding from the investigation, that the default System DNS resolver has no caching, was intentionally left out of scope for this PR.
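The alias behavior in the second note could look roughly like the following. The `Controller` enum is a hypothetical stand-in for the real `CongestionControl` type, and the `"cubic"` and `"new_reno"` spellings are assumptions; only the `bbr` / `bbr3` aliasing comes from the description above.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the configured controller type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Controller {
    Bbr,
    Cubic,
    NewReno,
}

impl FromStr for Controller {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            // Both aliases select the single BBR implementation.
            "bbr" | "bbr3" => Ok(Controller::Bbr),
            // Assumed spellings; the real config keys may differ.
            "cubic" => Ok(Controller::Cubic),
            "new_reno" => Ok(Controller::NewReno),
            other => Err(format!("unknown congestion controller: {other}")),
        }
    }
}
```

With this shape, `"bbr3".parse::<Controller>()` and `"bbr".parse::<Controller>()` yield the same variant, so existing configs using either alias keep working.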

Testing

  • cargo check across all touched crates: clean.
  • cargo test -p wind-core -p wind-base -p wind-socks -p wind-tuic: 114 passed, 0 failed.
  • Formatted with nightly rustfmt (repo uses unstable rustfmt options).

🤖 Generated with Claude Code

Three issues in the server data plane that made proxied browsing feel
slow:

1. Outbound TCP never set TCP_NODELAY. Proxied browser traffic is
   dominated by small writes (TLS records, HTTP request headers); with
   Nagle on, each small segment waits for the previous ACK and interacts
   with the peer's delayed-ACK timer to add up to ~40ms per round trip.
   Now set on the direct outbound and both SOCKS5 egress paths (the QUIC
   inbound has no such buffering, so this is the only hop that needs it).

2. The quinn server backend ignored the configured congestion controller
   and initial window — it always used quinn's default (CUBIC, small
   initial window), so backend.quinn.congestion_control was silently a
   no-op. Wire it through TuicInboundOpts into the TransportConfig,
   selecting BBR / Cubic / NewReno and applying the initial window. Short
   browser connections now leave slow-start faster.

3. The relay used copy_bidirectional's default 8 KiB buffer, capping
   single-stream throughput over the high-BDP QUIC tunnel. Bump to 64 KiB
   via copy_bidirectional_with_sizes, centralized in wind_core::io::copy_io
   and now used by every relay path (direct + both SOCKS5 paths).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Itsusinn Itsusinn closed this Jun 7, 2026
@Itsusinn Itsusinn deleted the perf/browser-latency-relay branch June 7, 2026 14:23
@Itsusinn Itsusinn restored the perf/browser-latency-relay branch June 7, 2026 14:28
@Itsusinn Itsusinn reopened this Jun 7, 2026
@Itsusinn Itsusinn merged commit 20b602c into main Jun 7, 2026
34 checks passed
@Itsusinn Itsusinn deleted the perf/browser-latency-relay branch June 7, 2026 15:51