
Implement Lambda runtime init error reporting #103

Draft: joe4dev wants to merge 5 commits into localstack-api-compat-test from devx-1-implement-lambda-runtime-init-error-reporting

joe4dev commented Jun 3, 2026

Summary

When a Lambda runtime exits unexpectedly or throws an error during initialization, LocalStack previously received no callback and waited until the environment timeout expired. This PR adds two complementary error-reporting paths so that LocalStack immediately receives a structured ErrorResponse instead of timing out. It also adds RIE-side support for the init-phase timeout retry protocol.

Changes

cmd/localstack/supervisor.go — new

LocalStackSupervisor wraps the sandbox's ProcessSupervisor and intercepts process termination events on a background goroutine. When a runtime-* process exits and the runtime is not already shutting down intentionally, it constructs a FaultData with fatalerror.RuntimeExit and calls eventsAPI.SendFault(). An atomic isShuttingDown flag (set on Terminate/Kill, cleared on Exec) prevents duplicate fault events during graceful restarts.
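
The shutdown-flag logic described above can be sketched roughly as follows. This is a minimal illustration, not the code in supervisor.go: ProcessExit, FaultSink, and the method bodies are hypothetical stand-ins for the sandbox's ProcessSupervisor events and eventsAPI.SendFault(), whose real signatures are not shown in this PR description.

```go
package main

import (
	"fmt"
	"strings"
	"sync/atomic"
)

// ProcessExit is a hypothetical stand-in for a process termination event.
type ProcessExit struct {
	Name     string
	ExitCode int
}

// FaultSink is a hypothetical stand-in for eventsAPI.SendFault.
type FaultSink func(processName string, exitCode int)

// Supervisor is an illustrative reduction of the wrapper described above.
type Supervisor struct {
	isShuttingDown atomic.Bool // set on Terminate/Kill, cleared on Exec
	sendFault      FaultSink
}

func (s *Supervisor) Exec()      { s.isShuttingDown.Store(false) }
func (s *Supervisor) Terminate() { s.isShuttingDown.Store(true) }

// HandleExit emits a fault only for runtime-* processes that exit while no
// intentional shutdown is in progress. It reports whether a fault was sent.
func (s *Supervisor) HandleExit(ev ProcessExit) bool {
	if !strings.HasPrefix(ev.Name, "runtime-") || s.isShuttingDown.Load() {
		return false
	}
	s.sendFault(ev.Name, ev.ExitCode)
	return true
}

// Watch drains termination events on a background goroutine and closes the
// returned channel once the event channel has been closed and drained.
func (s *Supervisor) Watch(events <-chan ProcessExit) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for ev := range events {
			s.HandleExit(ev)
		}
	}()
	return done
}

func main() {
	s := &Supervisor{sendFault: func(name string, code int) {
		fmt.Printf("fault: %s exited with code %d\n", name, code)
	}}

	// Unexpected exit while the runtime is running: a fault is reported.
	s.Exec()
	events := make(chan ProcessExit, 1)
	events <- ProcessExit{Name: "runtime-1", ExitCode: 1}
	close(events)
	<-s.Watch(events)

	// Exit after an intentional Terminate: no fault is reported.
	s.Terminate()
	fmt.Println("fault after Terminate:", s.HandleExit(ProcessExit{Name: "runtime-1"}))
}
```

An atomic flag is enough in this sketch because the only shared state is a single boolean that is written by Exec/Terminate and read by the watcher goroutine.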

cmd/localstack/events.go — new

LocalStackEventsAPI wraps telemetry.StandaloneEventsAPI and overrides SendFault to forward runtime faults to LocalStack as error status callbacks. It embeds the requestId in the error message and tracks the current invoke ID (thread-safe via sync.RWMutex) for faults that originate outside of an active invocation.
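
As a rough sketch of the invoke ID tracking, combined with the resolution order from the last commit on this branch (explicit ID, then current invoke ID, then a synthesized UUID), something like the following could work. InvokeTracker, ResolveRequestID, and newUUID are hypothetical names chosen for illustration. The UUID is built with crypto/rand only to keep the example dependency-free, so the actual code may use a UUID library instead.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"sync"
)

// InvokeTracker is a hypothetical holder for the current invoke ID.
type InvokeTracker struct {
	mu       sync.RWMutex
	invokeID string
}

// SetCurrent records the ID of the invocation being dispatched.
func (t *InvokeTracker) SetCurrent(id string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.invokeID = id
}

// Current returns the tracked invoke ID, or "" if none was dispatched yet.
func (t *InvokeTracker) Current() string {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.invokeID
}

// newUUID builds a random (version 4) UUID string from crypto/rand.
func newUUID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// ResolveRequestID prefers an explicit ID, then the current invoke ID, and
// only synthesizes a UUID when no invocation has been dispatched yet.
func (t *InvokeTracker) ResolveRequestID(explicit string) string {
	if explicit != "" {
		return explicit
	}
	if id := t.Current(); id != "" {
		return id
	}
	return newUUID()
}

func main() {
	tracker := &InvokeTracker{}
	fmt.Println("init-phase fault:", tracker.ResolveRequestID(""))

	tracker.SetCurrent("invoke-1")
	fmt.Println("mid-invocation fault:", tracker.ResolveRequestID(""))
	fmt.Println("explicit ID:", tracker.ResolveRequestID("explicit-1"))
}
```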

cmd/localstack/custom_interop.go — modified

  • ErrorResponse.RequestId changed from string to *string with omitempty: a pointer-to-empty-string serializes as "" (required by LocalStack's init error contract), while nil (used in fault events) is omitted.
  • NewCustomInteropServer now accepts a pre-created *LocalStackAdapter instead of constructing one internally, so the adapter is shared with the events API.
  • SendInitErrorResponse now parses the raw error payload from the runtime, enriches it with the current invoke ID, and asynchronously POSTs the structured response to LocalStack's /status/{runtime_id}/error endpoint before propagating to the delegate. It falls back to the raw payload if parsing fails.
  • InvokeRequest gains an IsInitRetry bool field (is-init-retry): when true, the RIE suppresses the Init Duration line from the REPORT log (LocalStack prepends a pre-computed INIT_REPORT line for the timed-out first attempt instead).
  • CustomInteropServer tracks initStart time.Time and warmStart bool to emit Init Duration on the first invocation only.
  • On ErrInvokeTimeout, the REPORT line appends Status: timeout and the error response includes ErrorType: "Sandbox.Timedout".
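
To illustrate the *string plus omitempty behaviour and the parse-then-fall-back step from the list above, here is a small sketch. The JSON field names errorMessage and errorType, and the helper enrichInitError, are assumptions made for the example. Only requestId and the RequestId field type come from this PR. The asynchronous POST to LocalStack is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ErrorResponse mirrors the shape described above; the JSON names other
// than requestId are assumptions made for this example.
type ErrorResponse struct {
	ErrorMessage string  `json:"errorMessage"`
	ErrorType    string  `json:"errorType,omitempty"`
	RequestId    *string `json:"requestId,omitempty"`
}

// encode marshals a response and panics on failure (fine for a demo).
func encode(r ErrorResponse) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// enrichInitError is a hypothetical helper: it parses the raw runtime
// payload, attaches the invoke ID (possibly empty during init), and falls
// back to the raw payload if parsing or re-encoding fails.
func enrichInitError(raw []byte, invokeID string) []byte {
	var resp ErrorResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return raw
	}
	resp.RequestId = &invokeID
	out, err := json.Marshal(resp)
	if err != nil {
		return raw
	}
	return out
}

func main() {
	empty := ""
	// A pointer to an empty string is not nil, so omitempty keeps the field.
	fmt.Println(encode(ErrorResponse{ErrorMessage: "boom", RequestId: &empty}))
	// A nil pointer is dropped by omitempty.
	fmt.Println(encode(ErrorResponse{ErrorMessage: "boom"}))

	raw := []byte(`{"errorMessage":"boom","errorType":"ImportError"}`)
	fmt.Println(string(enrichInitError(raw, "")))
	fmt.Println(string(enrichInitError([]byte("not json"), "")))
}
```

With a plain string field, omitempty would drop the empty value, which is why the pointer type is needed to distinguish "empty" from "absent".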

cmd/localstack/awsutil.go — modified

  • PrintEndReports signature gains a status string parameter.
  • Init Duration is placed after Max Memory Used in the REPORT line to match the AWS field order.
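
The resulting REPORT field order might be produced along these lines. This is a sketch only: Report, formatReport, and shouldEmitInitDuration are hypothetical names and do not reflect the actual PrintEndReports signature. The tab separators and units are assumed from the usual AWS REPORT line layout.

```go
package main

import (
	"fmt"
	"strings"
)

// Report is a hypothetical bundle of the values printed in a REPORT line.
type Report struct {
	RequestID       string
	DurationMs      float64
	BilledMs        int
	MemorySizeMB    int
	MaxMemoryUsedMB int
	InitDurationMs  float64 // values <= 0 mean "omit Init Duration"
	Status          string  // e.g. "timeout"; empty means "omit Status"
}

// shouldEmitInitDuration reflects the rule described in this PR: only the
// first (cold) invocation that is not an init retry reports Init Duration.
func shouldEmitInitDuration(warmStart, isInitRetry bool) bool {
	return !warmStart && !isInitRetry
}

// formatReport places Init Duration after Max Memory Used and appends
// Status last, matching the ordering described above.
func formatReport(r Report) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "REPORT RequestId: %s\tDuration: %.2f ms\tBilled Duration: %d ms\tMemory Size: %d MB\tMax Memory Used: %d MB",
		r.RequestID, r.DurationMs, r.BilledMs, r.MemorySizeMB, r.MaxMemoryUsedMB)
	if r.InitDurationMs > 0 {
		fmt.Fprintf(&sb, "\tInit Duration: %.2f ms", r.InitDurationMs)
	}
	if r.Status != "" {
		fmt.Fprintf(&sb, "\tStatus: %s", r.Status)
	}
	return sb.String()
}

func main() {
	cold := Report{RequestID: "abc", DurationMs: 1.5, BilledMs: 2, MemorySizeMB: 128, MaxMemoryUsedMB: 30}
	if shouldEmitInitDuration(false, false) {
		cold.InitDurationMs = 120.5
	}
	fmt.Println(formatReport(cold))

	timedOut := Report{RequestID: "abc", DurationMs: 3000, BilledMs: 3000, MemorySizeMB: 128, MaxMemoryUsedMB: 30, Status: "timeout"}
	fmt.Println(formatReport(timedOut))
}
```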

cmd/localstack/main.go — modified

LocalStackAdapter is created once upfront and passed to both NewLocalStackEventsAPI and NewCustomInteropServer. The sandbox is configured with SetEventsAPI(lsEventsAPI) and SetSupervisor(localStackSupv). The supervisor's context is cancelled in the existing shutdown func alongside the file watcher.

Tests

Covered by the integration tests in localstack/localstack-pro#7293:

| Scenario | Test |
| --- | --- |
| Exception raised during module import | test_lambda_runtime_error |
| sys.exit() called during init | test_lambda_runtime_exit |
| Missing AWS_LAMBDA_EXEC_WRAPPER script | test_lambda_runtime_wrapper_not_found |
| Init phase exceeds 10 s → transparent retry with function timeout | test_lambda_timeout_init_phase |

Related

Depends on #101
Closes DEVX-1

joe4dev and others added 5 commits June 3, 2026 18:14
…s API

Ports the supervisor and events API from PR #41 to enable proper error
reporting when a Lambda runtime process exits unexpectedly (e.g. sys.exit()
or missing wrapper script), instead of LocalStack timing out with a generic
error.

- Add LocalStackSupervisor: wraps ProcessSupervisor, detects unexpected
  runtime-* process exits and emits SendFault(RuntimeExit) events
- Add LocalStackEventsAPI: wraps StandaloneEventsAPI, overrides SendFault
  to forward errors to LocalStack via SendStatus(error, ...)
- Wire both into SandboxBuilder via SetEventsAPI / SetSupervisor
- Refactor NewCustomInteropServer to accept a pre-created *LocalStackAdapter
  shared with the events API
- Improve SendInitErrorResponse: properly deserialises the payload, includes
  RequestId, and sends asynchronously (non-blocking)

Enables test_lambda_runtime_exit and test_lambda_runtime_wrapper_not_found.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Use *string for the RequestId field in ErrorResponse so that an empty
string is serialized (not omitted by omitempty), while nil — used for
fault events — stays omitted. Fixes test_lambda_runtime_error snapshot
mismatch where requestId: "" was expected but absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Move Init Duration after Max Memory Used in REPORT line (matches AWS)
- Add Status: timeout to REPORT line on invoke timeout
- Fix timeout error message format to "RequestId: <id> Error: Task timed out after N.00 seconds"
- Add ErrorType: "Sandbox.Timedout" to timeout error response
- Track init start time and emit Init Duration on first non-retry invocation
- Add is-init-retry field to InvokeRequest to suppress Init Duration on retry invokes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Close response bodies in SendStatus/SendLogs/SendResult so idle
  connections are released instead of leaked.
- Use errors.New instead of fmt.Errorf with no format arguments.
- Document the single-invoke assumption behind the unsynchronized
  initStart/warmStart fields.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Resolve the fault request ID in the events API: prefer an explicit ID,
then the current invoke ID so a mid-invocation runtime crash reports the
actual request, and only synthesize a UUID as a fallback for init-phase
faults where no invocation has been dispatched yet. Previously the
supervisor always passed a random UUID, masking the real invoke ID.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>