Fix kernel interrupt on client disconnect with FastAPI 0.136.3#278
Fix kernel interrupt on client disconnect with FastAPI 0.136.3#278mishushakov wants to merge 1 commit into
Conversation
Bumping FastAPI 0.111.0 -> 0.136.3 pulls in Starlette 1.2.1, which broke the #213 disconnect->interrupt behavior. Starlette >= 1.0 takes a new StreamingResponse path for ASGI spec_version >= 2.4 (advertised by uvicorn 0.30.1): it no longer runs listen_for_disconnect concurrently and no longer cancels the response body iterator on http.disconnect. The interrupt relied on that cancellation, so an abandoned execution was never interrupted and the next execution blocked behind it and timed out. This was the only failing test in both SDKs on the Renovate bump (#207): - js: tests/interrupt.test.ts > subsequent execution works after client timeout - python: test_async_interrupt.py::test_subsequent_execution_works_after_client_timeout Detect the disconnect explicitly: thread the Request into execute(), and on each keepalive tick poll request.is_disconnected(). When it flips, raise an internal _ClientDisconnected, interrupt the kernel, and stop streaming. The old cancellation path is still handled for older Starlette. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PR SummaryMedium Risk Overview Starlette no longer cancels the streaming body on disconnect under ASGI 2.4+, so disconnect handling no longer relies only on A changeset documents the template patch for release notes. Reviewed by Cursor Bugbot for commit cf6255b. Bugbot is set up for automated code reviews on this repo. Configure here. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cf6255b891
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| # after polling request.is_disconnected() (Starlette >= 1.0). | ||
| try: | ||
| async for item in self._wait_for_result(message_id): | ||
| async for item in self._wait_for_result(message_id, request=request): |
There was a problem hiding this comment.
Preserve the FastAPI request for disconnect polling
For any execution that produces no output for KEEPALIVE_INTERVAL (for example time.sleep(10)), the local request has already been rebound above to the JSON websocket payload string, so this call passes a str into _wait_for_result. On the first timeout _wait_for_result calls await request.is_disconnected(), which raises AttributeError and aborts the stream instead of continuing or interrupting on an actual disconnect. Keep the FastAPI Request in a separate variable from the Jupyter execute payload.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit cf6255b. Configure here.
| # after polling request.is_disconnected() (Starlette >= 1.0). | ||
| try: | ||
| async for item in self._wait_for_result(message_id): | ||
| async for item in self._wait_for_result(message_id, request=request): |
There was a problem hiding this comment.
HTTP request shadowed in execute
High Severity
The Starlette Request passed into execute() is overwritten when the send loop assigns request to the Jupyter execute payload string. _wait_for_result then receives that payload instead of the HTTP request, so request.is_disconnected() is never called and disconnect-driven kernel interrupts on Starlette 1.x do not run.
Reviewed by Cursor Bugbot for commit cf6255b. Configure here.
| raise _ClientDisconnected() | ||
| # Yield a keepalive so Starlette writes to the socket. This | ||
| # also forces uvicorn to notice a dead connection on the next | ||
| # tick, so is_disconnected() flips to True. |
There was a problem hiding this comment.
Disconnect not polled during output
Medium Severity
On Starlette 1.x the stream body is no longer cancelled on disconnect, and this change only calls request.is_disconnected() in the queue.get() timeout branch. While the kernel keeps emitting stdout/stderr more often than every KEEPALIVE_INTERVAL, disconnect is never checked, so the kernel may not be interrupted and a later /execute can stay blocked.
Reviewed by Cursor Bugbot for commit cf6255b. Configure here.
| # When we detected the disconnect ourselves, stop streaming | ||
| # gracefully. Cancellation/GeneratorExit must propagate. | ||
| if isinstance(e, _ClientDisconnected): | ||
| return |
There was a problem hiding this comment.
Write disconnect not handled
Medium Severity
The streaming disconnect handler only catches CancelledError, GeneratorExit, and internal _ClientDisconnected. On Starlette ≥ 1.0 a dead client can also surface as connection errors (including Starlette’s ClientDisconnect) while flushing a yielded chunk, so those paths skip interrupt() even though the finally block drops the execution entry.
Reviewed by Cursor Bugbot for commit cf6255b. Configure here.
| await execution.queue.put(UnexpectedEndOfExecution()) | ||
|
|
||
| # Stream the results. | ||
| # If the client disconnects (Starlette cancels the task), we | ||
| # interrupt the kernel so the next execution isn't blocked (#213). | ||
| # If the client disconnects, we interrupt the kernel so the next | ||
| # execution isn't blocked (#213). Depending on the Starlette | ||
| # version this surfaces either as cancellation of this generator | ||
| # (older Starlette) or as a _ClientDisconnected we raise ourselves | ||
| # after polling request.is_disconnected() (Starlette >= 1.0). | ||
| try: | ||
| async for item in self._wait_for_result(message_id): | ||
| async for item in self._wait_for_result(message_id, request=request): |
There was a problem hiding this comment.
🔴 Critical: The new request: Optional[Request] = None parameter on execute() is shadowed by the local assignment at line 415 — request = self._get_execute_request(message_id, complete_code, False) — which returns a JSON str. When self._wait_for_result(message_id, request=request) then runs, request.is_disconnected() on the string raises AttributeError on the first 5s keepalive tick. AttributeError isn't in execute()'s except (CancelledError, GeneratorExit, _ClientDisconnected) tuple, so it propagates, the kernel is never interrupted, and the very test_subsequent_execution_works_after_client_timeout test this PR exists to fix will still fail. Fix: rename the local payload variable (e.g. exec_payload).
Extended reasoning...
What the bug is
The PR adds a new request: Optional[Request] = None parameter to ContextWebSocket.execute() (messaging.py:341) so that _wait_for_result can poll request.is_disconnected() and raise _ClientDisconnected to interrupt the kernel. The problem is that inside execute()'s retry loop at lines 411-415, the local variable name request is reused for the JSON payload to send over the WebSocket:
request = self._get_execute_request(message_id, complete_code, False)
await self._ws.send(request)_get_execute_request is annotated -> str and returns json.dumps(...). This rebinds the local name request to a string, shadowing the Request parameter for the remainder of the function.
How it manifests
At line 421, async for item in self._wait_for_result(message_id, request=request): then passes the JSON string (not the original Starlette Request) into _wait_for_result. Inside _wait_for_result (lines 295-296):
if request is not None and await request.is_disconnected():
raise _ClientDisconnected()A non-empty string is truthy, so Python evaluates request.is_disconnected — strings have no such attribute, so Python raises AttributeError: 'str' object has no attribute 'is_disconnected' before await even runs.
Why existing code doesn't prevent it
execute()'s exception handler only catches (asyncio.CancelledError, GeneratorExit, _ClientDisconnected) (line 423). AttributeError is not in that tuple, so it escapes the generator and propagates up through StreamingListJsonResponse, breaking the streaming response entirely. The kernel-interrupt path on line 427 never runs.
Impact
The KEEPALIVE_INTERVAL is 5 seconds, so any execution that takes longer than 5 seconds will trigger the first timeout and immediately crash with AttributeError. This means:
- The disconnect-detection mechanism this PR adds is dead on arrival —
_ClientDisconnectedis never raised in normal flow. - The keepalive mechanism itself is now broken for any execution > 5s.
- The kernel is never interrupted on client disconnect, leaving the failing-test scenario (
test_subsequent_execution_works_after_client_timeout, which sleeps 300s) unfixed.
Step-by-step proof
- Client
POST /executewith code that runs long enough to require keepalives (e.g.time.sleep(300)). post_execute(main.py:126) callsws.execute(..., request=request), whererequestis the StarletteRequest.- Inside
execute(), the for-loop body executesrequest = self._get_execute_request(message_id, complete_code, False)at line 415.requestis now'{"header": {...}, ...}'(a JSON string). await self._ws.send(request)at line 416 sends that string, thenbreakexits the loop withrequeststill bound to the string.- Line 421 calls
self._wait_for_result(message_id, request=request)— passing the string. - After 5 seconds with no kernel output,
asyncio.wait_for(queue.get(), timeout=5)raisesTimeoutError. - The handler evaluates
if request is not None and await request.is_disconnected():. The string is non-None, so it accessesrequest.is_disconnected→AttributeError. - The
AttributeErrorpropagates out of_wait_for_resultintoexecute()'stryblock. - The
except (CancelledError, GeneratorExit, _ClientDisconnected)clause does not match, so it propagates out ofexecute(), breaking the streaming response.self.interrupt()is never called. - Result: client gets a broken stream, kernel keeps running, and the next
POST /executeblocks waiting for the lock (the same regressionasyncio.Lockinmessaging.pynot released on client disconnect → cascading timeouts #213 was meant to fix).
How to fix
Rename the local payload variable so the Request parameter is preserved. Smallest patch is at lines 411-416:
exec_payload = self._get_execute_request(message_id, complete_code, False)
await self._ws.send(exec_payload)(_cleanup_env_vars and change_current_directory also use a local named request for the JSON payload, but they don't take a Request parameter, so the shadowing only matters in execute().)


What
Supersedes the Renovate bump #207 (
fastapi 0.111.0 → 0.136.3), which fails CI, by including the dependency bump plus the fix it requires.The problem
PR #207 only bumps
fastapi==0.111.0→fastapi==0.136.3intemplate/server/requirements.txt, yet both SDK jobs fail — on the same test:tests/interrupt.test.ts > subsequent execution works after client timeouttest_async_interrupt.py::test_subsequent_execution_works_after_client_timeoutBoth fail with a
TimeoutError/TimeoutExceptionon the second execution.Root cause
This is the #213 behavior: when a client disconnects mid-execution, the server interrupts the kernel so the next execution isn't blocked. The interrupt relied on Starlette cancelling the streaming response body iterator on
http.disconnect.Bumping FastAPI
0.111.0→0.136.3pulls Starlette0.37.2→1.2.1. Starlette ≥ 1.0 added a newStreamingResponse.__call__path for ASGIspec_version >= (2, 4)(which uvicorn 0.30.1 advertises):It no longer runs
listen_for_disconnectconcurrently and no longer cancels the body iterator. So theexecute()generator never receivesCancelledError/GeneratorExit, the kernel is never interrupted, and the abandonedtime.sleep(300)blocks the next execution until it times out.Verified empirically against
starlette==1.2.1+uvicorn==0.30.1: on client disconnect the body generator is not cancelled, butrequest.is_disconnected()does flip toTrue.The fix
Detect the disconnect explicitly instead of relying on Starlette cancellation:
RequestintoContextWebSocket.execute()._wait_for_result, pollrequest.is_disconnected(). When it flips, raise an internal_ClientDisconnected.execute()catches it (alongside the existingCancelledError/GeneratorExitpath for older Starlette), interrupts the kernel, and stops streaming gracefully.End-to-end reproduction (uvicorn 0.30.1 + starlette 1.2.1, mirroring the real queue/keepalive/
StreamingListJsonResponsestructure) confirms the kernel is interrupted within ~1–2s of disconnect.Notes
X-Access-Token(hyphenated).ruff check/ruff formatpass.🤖 Generated with Claude Code