perf(tuic-server): fix browser-traffic latency in the relay data plane #17
Merged
Three issues in the server data plane that made proxied browsing feel slow:

1. Outbound TCP never set TCP_NODELAY. Proxied browser traffic is dominated by small writes (TLS records, HTTP request headers); with Nagle on, each small segment waits for the previous ACK and interacts with the peer's delayed-ACK timer to add up to ~40ms per round trip. Now set on the direct outbound and both SOCKS5 egress paths (the QUIC inbound has no such buffering, so this is the only hop that needs it).

2. The quinn server backend ignored the configured congestion controller and initial window: it always used quinn's default (CUBIC, small initial window), so backend.quinn.congestion_control was silently a no-op. Wire it through TuicInboundOpts into the TransportConfig, selecting BBR / Cubic / NewReno and applying the initial window. Short browser connections now leave slow-start faster.

3. The relay used copy_bidirectional's default 8 KiB buffer, capping single-stream throughput over the high-BDP QUIC tunnel. Bump to 64 KiB via copy_bidirectional_with_sizes, centralized in wind_core::io::copy_io and now used by every relay path (direct + both SOCKS5 paths).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Investigated and fixed three issues in the tuic-server data plane that made proxied browser traffic feel slow. All three are on the per-connection hot path.
Changes
1. Enable `TCP_NODELAY` on outbound TCP (highest-impact)
Outbound sockets never disabled Nagle's algorithm. Proxied browser traffic is dominated by small writes (TLS records, HTTP request headers); with Nagle on, each small segment waits for the previous ACK and interacts with the peer's delayed-ACK timer to add up to ~40 ms of latency per round trip. This is the classic "browsing feels laggy through the proxy" symptom.
Now set on:
- `connect_direct_tcp` (`wind-base/src/direct.rs`)
- `get_socket_ref().set_nodelay(true)` (`wind-socks/src/action.rs`)
- `wind-socks/src/outbound.rs`

The QUIC inbound has no such buffering, so the outbound hop is the only one that needs it.
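As a rough illustration of what disabling Nagle on an outbound socket looks like, here is a minimal sketch using only the standard library. The helper name `connect_nodelay` is hypothetical and is not the PR's `connect_direct_tcp`; the real code paths presumably operate on async sockets, where tokio's `TcpStream` exposes a `set_nodelay` method of the same name.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};

/// Hypothetical helper (not the PR's `connect_direct_tcp`): open an outbound
/// TCP connection and disable Nagle's algorithm before any data is written.
fn connect_nodelay(addr: SocketAddr) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    // TCP_NODELAY: send small writes immediately instead of coalescing them
    // while waiting for the previous segment's ACK.
    stream.set_nodelay(true)?;
    Ok(stream)
}
```

Setting the option immediately after connect, before the first write, means the very first TLS record or request header already goes out without waiting on an ACK.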
2. Wire configured congestion control into the quinn server backend
The quinn server always used quinn's default (CUBIC + small initial window). The operator's `backend.quinn.congestion_control` (controller + `initial_window`) was silently ignored; only the quiche backend honored it.
`TuicInboundOpts` now carries `congestion_control` + `initial_window`, and `create_server_config` builds a `ControllerFactory` (BBR via `quinn-congestions`, or Cubic / NewReno via `quinn::congestion`) with the configured initial window. Short-lived browser connections now leave slow-start faster instead of trickling the first few round trips.
3. Bump relay buffer 8 KiB → 64 KiB
The relay used `copy_bidirectional`'s default 8 KiB buffer, capping single-stream throughput over the high bandwidth-delay-product QUIC tunnel. Switched `wind_core::io::copy_io` to `copy_bidirectional_with_sizes` with a 64 KiB `RELAY_BUF_SIZE`, and routed every relay path (direct + both SOCKS5 paths) through `copy_io` so the larger buffer applies uniformly and the duplicated raw `copy_bidirectional` calls are gone.
Notes for reviewers
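For reviewers who want a feel for the buffer-size change in item 3, the sketch below is a synchronous, one-direction stand-in for a relay copy loop. It is not the PR's `copy_io` (which is async and bidirectional); the function name `copy_with_buf` and its return shape are assumptions made for illustration. It only shows how the scratch-buffer size bounds how much data each loop iteration can move.

```rust
use std::io::{self, Read, Write};

/// Hypothetical stand-in for a one-direction relay loop (not the PR's
/// `copy_io`): copy everything from `src` to `dst` through a scratch buffer
/// of `buf_size` bytes, returning (bytes copied, number of non-empty reads).
fn copy_with_buf<R: Read, W: Write>(
    src: &mut R,
    dst: &mut W,
    buf_size: usize,
) -> io::Result<(u64, u64)> {
    let mut buf = vec![0u8; buf_size];
    let mut total = 0u64;
    let mut reads = 0u64;
    loop {
        // Each iteration can move at most `buf_size` bytes.
        let n = src.read(&mut buf)?;
        if n == 0 {
            return Ok((total, reads)); // EOF
        }
        dst.write_all(&buf[..n])?;
        total += n as u64;
        reads += 1;
    }
}
```

With an in-memory source that always fills the buffer, copying 1 MiB takes 128 iterations at 8 KiB and 16 at 64 KiB. Real sockets can return short reads, so those counts are lower bounds rather than a throughput prediction.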
- `CongestionController` in tuic-server is a type alias for `wind_tuic::quinn::CongestionControl`, so the config value flows straight through without a mapping layer (unlike the quiche path, which maps to a different enum).
- The `bbr` and `bbr3` config aliases both map to the single `quinn-congestions` BBR implementation (matching the existing client-side outbound behavior).

Testing
- `cargo check` across all touched crates: clean.
- `cargo test -p wind-core -p wind-base -p wind-socks -p wind-tuic`: 114 passed, 0 failed.

🤖 Generated with Claude Code