Fix kernel interrupt on client disconnect with FastAPI 0.136.3 by mishushakov · Pull Request #278 · e2b-dev/code-interpreter

mishushakov · 2026-06-01T19:35:59Z

What

Supersedes the Renovate bump #207 (fastapi 0.111.0 → 0.136.3), which fails CI, by including the dependency bump plus the fix it requires.

The problem

PR #207 only bumps fastapi==0.111.0 → fastapi==0.136.3 in template/server/requirements.txt, yet both SDK jobs fail — on the same test:

js: tests/interrupt.test.ts > subsequent execution works after client timeout
python: test_async_interrupt.py::test_subsequent_execution_works_after_client_timeout

Both fail with a TimeoutError/TimeoutException on the second execution.

Root cause

This is the #213 behavior: when a client disconnects mid-execution, the server interrupts the kernel so the next execution isn't blocked. The interrupt relied on Starlette cancelling the streaming response body iterator on http.disconnect.

Bumping FastAPI 0.111.0 → 0.136.3 pulls Starlette 0.37.2 → 1.2.1. Starlette ≥ 1.0 added a new StreamingResponse.__call__ path for ASGI spec_version >= (2, 4) (which uvicorn 0.30.1 advertises):

if spec_version >= (2, 4):
    try:
        await self.stream_response(send)   # no task group, no listen_for_disconnect
    except OSError:
        raise ClientDisconnect()

It no longer runs listen_for_disconnect concurrently and no longer cancels the body iterator. So the execute() generator never receives CancelledError/GeneratorExit, the kernel is never interrupted, and the abandoned time.sleep(300) blocks the next execution until it times out.

Verified empirically against starlette==1.2.1 + uvicorn==0.30.1: on client disconnect the body generator is not cancelled, but request.is_disconnected() does flip to True.

The fix

Detect the disconnect explicitly instead of relying on Starlette cancellation:

Thread the Request into ContextWebSocket.execute().
On each keepalive tick in _wait_for_result, poll request.is_disconnected(). When it flips, raise an internal _ClientDisconnected.
execute() catches it (alongside the existing CancelledError/GeneratorExit path for older Starlette), interrupts the kernel, and stops streaming gracefully.

End-to-end reproduction (uvicorn 0.30.1 + starlette 1.2.1, mirroring the real queue/keepalive/StreamingListJsonResponse structure) confirms the kernel is interrupted within ~1–2s of disconnect.
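The three steps above can be sketched in isolation. This is a minimal illustration, not the PR's actual code: `wait_for_result`, `execute`, the `None` end-of-stream sentinel, the shortened `KEEPALIVE_INTERVAL`, and passing `is_disconnected` and `interrupt` as plain callables (instead of a Starlette Request and the kernel websocket) are all assumptions made to keep the example self-contained.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional

KEEPALIVE_INTERVAL = 0.05  # seconds; shortened for the sketch


class ClientDisconnected(Exception):
    """Stand-in for the PR's internal _ClientDisconnected."""


async def wait_for_result(
    queue: asyncio.Queue,
    is_disconnected: Optional[Callable[[], Awaitable[bool]]] = None,
) -> AsyncIterator[object]:
    """Yield queued items; on each idle tick, poll for a client disconnect."""
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=KEEPALIVE_INTERVAL)
        except asyncio.TimeoutError:
            # Idle tick: check the connection before emitting a keepalive.
            if is_disconnected is not None and await is_disconnected():
                raise ClientDisconnected()
            yield {"type": "keepalive"}
            continue
        if item is None:  # hypothetical end-of-execution sentinel
            return
        yield item


async def execute(
    queue: asyncio.Queue,
    is_disconnected: Optional[Callable[[], Awaitable[bool]]],
    interrupt: Callable[[], Awaitable[None]],
) -> AsyncIterator[object]:
    """Stream results and interrupt the kernel if the client goes away."""
    try:
        async for item in wait_for_result(queue, is_disconnected):
            yield item
    except (asyncio.CancelledError, GeneratorExit, ClientDisconnected) as e:
        await interrupt()
        if isinstance(e, ClientDisconnected):
            return  # we detected the disconnect ourselves: stop gracefully
        raise  # cancellation and GeneratorExit must propagate
```

In a real handler the callable would presumably be the bound `request.is_disconnected` method, and `interrupt` the shielded kernel-interrupt path the PR description mentions.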

Notes

The fastapi 0.136 change "do not accept underscore headers" does not affect us — the only header read is X-Access-Token (hyphenated).
ruff check / ruff format pass.

🤖 Generated with Claude Code

Bumping FastAPI 0.111.0 -> 0.136.3 pulls in Starlette 1.2.1, which broke the #213 disconnect->interrupt behavior. Starlette >= 1.0 takes a new StreamingResponse path for ASGI spec_version >= 2.4 (advertised by uvicorn 0.30.1): it no longer runs listen_for_disconnect concurrently and no longer cancels the response body iterator on http.disconnect. The interrupt relied on that cancellation, so an abandoned execution was never interrupted and the next execution blocked behind it and timed out.

This was the only failing test in both SDKs on the Renovate bump (#207):
- js: tests/interrupt.test.ts > subsequent execution works after client timeout
- python: test_async_interrupt.py::test_subsequent_execution_works_after_client_timeout

Detect the disconnect explicitly: thread the Request into execute(), and on each keepalive tick poll request.is_disconnected(). When it flips, raise an internal _ClientDisconnected, interrupt the kernel, and stop streaming. The old cancellation path is still handled for older Starlette.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cursor · 2026-06-01T19:36:07Z

PR Summary

Medium Risk
Touches core execute streaming and kernel interrupt semantics; reviewers should confirm the Starlette Request is not overwritten by the Jupyter execute_request JSON variable inside execute() before _wait_for_result runs disconnect polling.

Overview
Bumps FastAPI to 0.136.3 (Starlette ≥ 1.0) and restores #213 behavior: when a client abandons a streaming /execute response, the Jupyter kernel is interrupted so the next run is not blocked.

Starlette no longer cancels the streaming body on disconnect under ASGI 2.4+, so disconnect handling no longer relies only on CancelledError/GeneratorExit. The HTTP Request is threaded from post_execute into ContextWebSocket.execute, and during each keepalive tick in _wait_for_result the server polls request.is_disconnected(), raises _ClientDisconnected, interrupts the kernel (same shielded path as before), and stops streaming gracefully. Older Starlette still uses cancellation.

A changeset documents the template patch for release notes.

Reviewed by Cursor Bugbot for commit cf6255b.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf6255b891

chatgpt-codex-connector · 2026-06-01T19:37:25Z

+            # after polling request.is_disconnected() (Starlette >= 1.0).
            try:
-                async for item in self._wait_for_result(message_id):
+                async for item in self._wait_for_result(message_id, request=request):


Preserve the FastAPI request for disconnect polling

For any execution that produces no output for KEEPALIVE_INTERVAL (for example time.sleep(10)), the local request has already been rebound above to the JSON websocket payload string, so this call passes a str into _wait_for_result. On the first timeout _wait_for_result calls await request.is_disconnected(), which raises AttributeError and aborts the stream instead of continuing or interrupting on an actual disconnect. Keep the FastAPI Request in a separate variable from the Jupyter execute payload.


cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

cursor · 2026-06-01T19:39:26Z

+            # after polling request.is_disconnected() (Starlette >= 1.0).
            try:
-                async for item in self._wait_for_result(message_id):
+                async for item in self._wait_for_result(message_id, request=request):


HTTP request shadowed in execute

High Severity

The Starlette Request passed into execute() is overwritten when the send loop assigns request to the Jupyter execute payload string. _wait_for_result then receives that payload instead of the HTTP request, so request.is_disconnected() is never called and disconnect-driven kernel interrupts on Starlette 1.x do not run.


cursor · 2026-06-01T19:39:26Z

+                    raise _ClientDisconnected()
+                # Yield a keepalive so Starlette writes to the socket. This
+                # also forces uvicorn to notice a dead connection on the next
+                # tick, so is_disconnected() flips to True.


Disconnect not polled during output

Medium Severity

On Starlette 1.x the stream body is no longer cancelled on disconnect, and this change only calls request.is_disconnected() in the queue.get() timeout branch. While the kernel keeps emitting stdout/stderr more often than every KEEPALIVE_INTERVAL, disconnect is never checked, so the kernel may not be interrupted and a later /execute can stay blocked.
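One way to close this gap is to poll on a clock instead of only in the idle branch. The sketch below is an assumption about how that could look, not code from the PR: `stream_with_polling`, `POLL_INTERVAL`, the `_IDLE` marker, and the `None` end-of-stream sentinel are hypothetical names, and `is_disconnected` stands in for the bound `request.is_disconnected` method.

```python
import asyncio
import time
from typing import AsyncIterator, Awaitable, Callable

POLL_INTERVAL = 0.05  # seconds; hypothetical value, shortened for the sketch
_IDLE = object()  # marks a tick on which the queue produced nothing


class ClientDisconnected(Exception):
    """Stand-in for the PR's internal _ClientDisconnected."""


async def stream_with_polling(
    queue: asyncio.Queue,
    is_disconnected: Callable[[], Awaitable[bool]],
) -> AsyncIterator[object]:
    """Yield items, polling for disconnect on a clock rather than only when idle."""
    next_poll = time.monotonic() + POLL_INTERVAL
    while True:
        remaining = max(0.0, next_poll - time.monotonic())
        try:
            item = await asyncio.wait_for(queue.get(), timeout=remaining)
        except asyncio.TimeoutError:
            item = _IDLE
        # Poll when idle OR when the deadline passed while output kept flowing.
        if item is _IDLE or time.monotonic() >= next_poll:
            if await is_disconnected():
                raise ClientDisconnected()
            next_poll = time.monotonic() + POLL_INTERVAL
        if item is _IDLE:
            yield {"type": "keepalive"}
        elif item is None:  # hypothetical end-of-execution sentinel
            return
        else:
            yield item
```

With this shape, a kernel that prints more often than the poll interval still has its client connection checked roughly once per interval. Whether `is_disconnected()` reports a dead connection promptly while writes are still succeeding is a separate question that this sketch does not answer.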


cursor · 2026-06-01T19:39:26Z

+                # When we detected the disconnect ourselves, stop streaming
+                # gracefully. Cancellation/GeneratorExit must propagate.
+                if isinstance(e, _ClientDisconnected):
+                    return


Write disconnect not handled

Medium Severity

The streaming disconnect handler only catches CancelledError, GeneratorExit, and internal _ClientDisconnected. On Starlette ≥ 1.0 a dead client can also surface as connection errors (including Starlette’s ClientDisconnect) while flushing a yielded chunk, so those paths skip interrupt() even though the finally block drops the execution entry.
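A possible way to make the interrupt independent of the exception type is to track whether streaming finished normally and interrupt in a finally block otherwise. The following is a sketch under assumptions, not the PR's implementation: `stream_and_interrupt` and the `finished` flag are hypothetical, and whether a write-side error is actually thrown into the body generator (rather than only raised in Starlette's own send loop) was not verified here.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


class ClientDisconnected(Exception):
    """Stand-in for the PR's internal _ClientDisconnected."""


async def stream_and_interrupt(
    results: AsyncIterator[object],
    interrupt: Callable[[], Awaitable[None]],
) -> AsyncIterator[object]:
    """Yield results; interrupt the kernel on any exit that is not normal completion."""
    finished = False
    try:
        async for item in results:
            yield item
        finished = True
    except ClientDisconnected:
        return  # our own detection: stop streaming gracefully
    finally:
        if not finished:
            # Covers cancellation, GeneratorExit, OSError, and anything else
            # that reaches this generator. shield() keeps the interrupt from
            # being abandoned if the surrounding task is cancelled again.
            await asyncio.shield(interrupt())
```

The design choice is to key the interrupt on "did not finish" instead of on a list of exception types, so a new failure mode in a future Starlette release would not silently skip `interrupt()` as long as the generator is closed or receives the error.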


claude · 2026-06-01T19:41:33Z

                await execution.queue.put(UnexpectedEndOfExecution())

            # Stream the results.
-            # If the client disconnects (Starlette cancels the task), we
-            # interrupt the kernel so the next execution isn't blocked (#213).
+            # If the client disconnects, we interrupt the kernel so the next
+            # execution isn't blocked (#213). Depending on the Starlette
+            # version this surfaces either as cancellation of this generator
+            # (older Starlette) or as a _ClientDisconnected we raise ourselves
+            # after polling request.is_disconnected() (Starlette >= 1.0).
            try:
-                async for item in self._wait_for_result(message_id):
+                async for item in self._wait_for_result(message_id, request=request):


🔴 Critical: The new request: Optional[Request] = None parameter on execute() is shadowed by the local assignment at line 415 — request = self._get_execute_request(message_id, complete_code, False) — which returns a JSON str. When self._wait_for_result(message_id, request=request) then runs, request.is_disconnected() on the string raises AttributeError on the first 5s keepalive tick. AttributeError isn't in execute()'s except (CancelledError, GeneratorExit, _ClientDisconnected) tuple, so it propagates, the kernel is never interrupted, and the very test_subsequent_execution_works_after_client_timeout test this PR exists to fix will still fail. Fix: rename the local payload variable (e.g. exec_payload).

Extended reasoning...

What the bug is

The PR adds a new request: Optional[Request] = None parameter to ContextWebSocket.execute() (messaging.py:341) so that _wait_for_result can poll request.is_disconnected() and raise _ClientDisconnected to interrupt the kernel. The problem is that inside execute()'s retry loop at lines 411-415, the local variable name request is reused for the JSON payload to send over the WebSocket:

request = self._get_execute_request(message_id, complete_code, False)
await self._ws.send(request)

_get_execute_request is annotated -> str and returns json.dumps(...). This rebinds the local name request to a string, shadowing the Request parameter for the remainder of the function.

How it manifests

At line 421, async for item in self._wait_for_result(message_id, request=request): then passes the JSON string (not the original Starlette Request) into _wait_for_result. Inside _wait_for_result (lines 295-296):

if request is not None and await request.is_disconnected():
    raise _ClientDisconnected()

A non-empty string is truthy, so Python evaluates request.is_disconnected — strings have no such attribute, so Python raises AttributeError: 'str' object has no attribute 'is_disconnected' before await even runs.

Why existing code doesn't prevent it

execute()'s exception handler only catches (asyncio.CancelledError, GeneratorExit, _ClientDisconnected) (line 423). AttributeError is not in that tuple, so it escapes the generator and propagates up through StreamingListJsonResponse, breaking the streaming response entirely. The kernel-interrupt path on line 427 never runs.

Impact

The KEEPALIVE_INTERVAL is 5 seconds, so any execution that takes longer than 5 seconds will trigger the first timeout and immediately crash with AttributeError. This means:

The disconnect-detection mechanism this PR adds is dead on arrival — _ClientDisconnected is never raised in normal flow.

The keepalive mechanism itself is now broken for any execution > 5s.

The kernel is never interrupted on client disconnect, leaving the failing-test scenario (test_subsequent_execution_works_after_client_timeout, which sleeps 300s) unfixed.

Step-by-step proof

1. Client POSTs /execute with code that runs long enough to require keepalives (e.g. time.sleep(300)).
2. post_execute (main.py:126) calls ws.execute(..., request=request), where request is the Starlette Request.
3. Inside execute(), the for-loop body executes request = self._get_execute_request(message_id, complete_code, False) at line 415. request is now '{"header": {...}, ...}' (a JSON string).
4. await self._ws.send(request) at line 416 sends that string, then break exits the loop with request still bound to the string.
5. Line 421 calls self._wait_for_result(message_id, request=request), passing the string.
6. After 5 seconds with no kernel output, asyncio.wait_for(queue.get(), timeout=5) raises TimeoutError.
7. The handler evaluates if request is not None and await request.is_disconnected():. The string is non-None, so it accesses request.is_disconnected → AttributeError.
8. The AttributeError propagates out of _wait_for_result into execute()'s try block.
9. The except (CancelledError, GeneratorExit, _ClientDisconnected) clause does not match, so it propagates out of execute(), breaking the streaming response. self.interrupt() is never called.
10. Result: the client gets a broken stream, the kernel keeps running, and the next POST /execute blocks waiting for the lock (the same regression that #213, "asyncio.Lock in messaging.py not released on client disconnect → cascading timeouts", was meant to fix).

How to fix

Rename the local payload variable so the Request parameter is preserved. Smallest patch is at lines 411-416:

exec_payload = self._get_execute_request(message_id, complete_code, False)
await self._ws.send(exec_payload)

(_cleanup_env_vars and change_current_directory also use a local named request for the JSON payload, but they don't take a Request parameter, so the shadowing only matters in execute().)
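The shadowing and the rename can be reproduced without Starlette or a kernel. In this minimal sketch, `FakeRequest`, `get_execute_request`, `poll`, `execute_shadowed`, and `execute_fixed` are hypothetical stand-ins for the Starlette Request, `_get_execute_request`, the check inside `_wait_for_result`, and the before/after versions of `execute()`.

```python
import asyncio
import json
from typing import Optional


class FakeRequest:
    """Hypothetical stand-in for starlette.requests.Request."""

    async def is_disconnected(self) -> bool:
        return True


def get_execute_request(code: str) -> str:
    """Stand-in for _get_execute_request: returns a JSON string."""
    return json.dumps({"content": {"code": code}})


async def poll(request) -> bool:
    """Mirrors the disconnect check quoted from _wait_for_result."""
    return request is not None and await request.is_disconnected()


async def execute_shadowed(code: str, request: Optional[FakeRequest] = None) -> bool:
    # Bug: rebinding `request` replaces the HTTP request with a str payload.
    request = get_execute_request(code)
    return await poll(request)  # raises AttributeError on the str


async def execute_fixed(code: str, request: Optional[FakeRequest] = None) -> bool:
    # Fix: a distinct name keeps the `request` parameter intact.
    exec_payload = get_execute_request(code)
    _ = exec_payload  # would be sent over the kernel websocket here
    return await poll(request)
```

Calling `asyncio.run(execute_shadowed("1+1", FakeRequest()))` raises AttributeError, while `execute_fixed` with the same arguments reaches `is_disconnected()` on the real request object.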

mishushakov requested review from ValentaTomas and jakubno as code owners June 1, 2026 19:36

cla-bot Bot added the cla-signed label Jun 1, 2026

chatgpt-codex-connector Bot reviewed Jun 1, 2026

cursor Bot reviewed Jun 1, 2026

claude Bot reviewed Jun 1, 2026

mishushakov marked this pull request as draft June 1, 2026 19:43

mishushakov wants to merge 1 commit into main from fix/interrupt-on-disconnect-fastapi-0.136
