feat(supervisor): add opt-in dequeue backpressure by nicktrn · Pull Request #3836 · triggerdotdev/trigger.dev

nicktrn · 2026-06-04T17:45:41Z

The supervisor can now pause dequeuing - and freeze consumer-pool scale-up - when a backpressure signal says the cluster can't place more work, then ramp dequeuing back up gradually once it clears. The signal is a verdict published to a Redis key by a cluster-side component; the supervisor reads it on a short refresh and gates preDequeue on it.

Off by default (TRIGGER_DEQUEUE_BACKPRESSURE_ENABLED). Everything fails open: a missing, stale, or unreadable verdict never pins the brake, and the hot-path read is a synchronous cached lookup with no I/O. The scale-up freeze leaves scale-down untouched, and on release the resume is ramped so a deep queue isn't hammered all at once.

Dry-run is on by default (TRIGGER_DEQUEUE_BACKPRESSURE_DRY_RUN): even once enabled it only logs what it would have done, and surfaces the computed state through metrics, until explicitly set to act. Prometheus: supervisor_backpressure_engaged, _dry_run, _skipped_dequeues_total.

Refs TRI-5354

Cached, fail-open monitor that decides whether to skip dequeues based on a pluggable signal source. Disabled is a total no-op (no refresh, no reads). The hot-path read is synchronous and never performs I/O; every failure mode (source throws, returns null, or verdict goes stale) fails open.

Reads the backpressure verdict from a Redis key (written by the cluster-side aggregator). Malformed or wrong-shaped values are treated as unknown so the monitor fails open. Adds @internal/redis + @internal/testcontainers deps.

Gate dequeues on the backpressure verdict via the existing preDequeue hook, on all paths including k8s (where the resource monitor is a no-op). Construct the Redis-backed monitor only when TRIGGER_DEQUEUE_BACKPRESSURE_ENABLED is set; require a Redis host when enabled. Off by default - no Redis client, no effect.

isEngaged() exposes the hard backpressure state (drives scale-up freeze), while shouldSkipDequeue() additionally ramps after release - skipping a linearly-decaying fraction of attempts over rampMs so the aggregate dequeue rate climbs back to full instead of snapping and re-flooding the cluster.

Add optional shouldPauseScaling to ScalingOptions; when it returns true the pool stops scaling up (scale-down still allowed), so it won't add consumers to drain a queue backpressure is deliberately holding.

Pass shouldPauseScaling (monitor.isEngaged) into the consumer pool so scale-up freezes while hard-engaged, and feed TRIGGER_DEQUEUE_BACKPRESSURE_RAMP_MS into the monitor's post-release ramp. Off by default.

Dry-run (default on via env) keeps the gates inert while computeEngaged still reflects the real signal and verdict transitions are logged. Adds BackpressureMetrics (engaged/dry_run gauges, skipped-dequeues counter).

changeset-bot · 2026-06-04T17:45:47Z

🦋 Changeset detected

Latest commit: 44842f9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 25 packages

Name	Type
@trigger.dev/core	Patch
@trigger.dev/build	Patch
trigger.dev	Patch
@trigger.dev/plugins	Patch
@trigger.dev/python	Patch
@trigger.dev/redis-worker	Patch
@trigger.dev/schema-to-json	Patch
@trigger.dev/sdk	Patch
@internal/cache	Patch
@internal/clickhouse	Patch
@internal/llm-model-catalog	Patch
@trigger.dev/rbac	Patch
@internal/redis	Patch
@internal/replication	Patch
@internal/run-engine	Patch
@internal/schedule-engine	Patch
@internal/testcontainers	Patch
@internal/tracing	Patch
@internal/tsql	Patch
@internal/zod-worker	Patch
@internal/sdk-compat-tests	Patch
@trigger.dev/react-hooks	Patch
@trigger.dev/rsc	Patch
@trigger.dev/database	Patch
@trigger.dev/otlp-importer	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

coderabbitai · 2026-06-04T17:45:59Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e543f166-0716-4fa8-ab25-1eeacd0d75b4

📥 Commits

Reviewing files that changed from the base of the PR and between e6b5a67 and 44842f9.

📒 Files selected for processing (1)

apps/supervisor/src/index.ts

🚧 Files skipped from review as they are similar to previous changes (1)

apps/supervisor/src/index.ts

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (49)

GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
GitHub Check: units / e2e-webapp / 🧪 E2E Tests: Webapp
GitHub Check: typecheck / typecheck
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 8)
GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
GitHub Check: internal / 🧪 Unit Tests: Internal (3, 8)
GitHub Check: internal / 🧪 Unit Tests: Internal (2, 8)
GitHub Check: internal / 🧪 Unit Tests: Internal (1, 8)
GitHub Check: internal / 🧪 Unit Tests: Internal (6, 8)
GitHub Check: internal / 🧪 Unit Tests: Internal (7, 8)
GitHub Check: internal / 🧪 Unit Tests: Internal (8, 8)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
GitHub Check: internal / 🧪 Unit Tests: Internal (4, 8)
GitHub Check: sdk-compat / Deno Runtime
GitHub Check: internal / 🧪 Unit Tests: Internal (5, 8)
GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 8)
GitHub Check: sdk-compat / Cloudflare Workers
GitHub Check: sdk-compat / Bun Runtime
GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 8)
GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 8)
GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 8)
GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 8)
GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 8)
GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 8)
GitHub Check: typecheck / typecheck
GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
GitHub Check: packages / 🧪 Unit Tests: Packages (1, 1)
GitHub Check: audit
GitHub Check: Analyze (javascript-typescript)

Walkthrough

This pull request implements a Redis-backed dequeue backpressure system for the supervisor. When enabled, the supervisor periodically reads a backpressure verdict from Redis and uses it to pause consumer scale-up and probabilistically skip dequeue attempts. The system includes verdicts with optional timestamps to handle staleness, a post-release ramp that gradually resumes dequeueing after backpressure clears, and a dry-run mode for testing. Configuration is managed through environment variables, metrics are exported via Prometheus, and the consumer pool receives a new shouldPauseScaling callback to freeze scale-up while allowing scale-down. The feature is opt-in and disabled by default.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description is substantially complete, explaining the feature, configuration, defaults, and failure modes. However, it lacks required template sections including Testing and Changelog sections, and does not include the issue reference in the Closes format.	Add 'Closes #' at the top, include Testing section describing test steps, and add a Changelog section summarizing the change.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding an opt-in dequeue backpressure feature to the supervisor. It is concise and clearly conveys the primary purpose of the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/supervisor-dequeue-backpressure-tri-5354

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Guard against overlapping refresh ticks when a read hangs; use the staleness-aware computeEngaged() for transition/ramp/gauge bookkeeping; close the backpressure Redis client on supervisor shutdown.

Add TRIGGER_DEQUEUE_BACKPRESSURE_REDIS_PASSWORD to the secret strip-list so it never lands in the DEBUG startup log, with a comment to keep new secrets out.

devin-ai-integration

Devin Review found 1 new potential issue.

View 8 additional findings in Devin Review.

devin-ai-integration · 2026-06-04T18:39:40Z

+  computeEngaged(): boolean {
+    const verdict = this.verdict;
+    if (verdict?.engaged !== true) {
+      return false;
+    }
+
+    const maxAge = this.opts.maxVerdictAgeMs;
+    if (maxAge !== undefined && verdict.ts !== undefined && Date.now() - verdict.ts > maxAge) {
+      return false;
+    }
+
+    return true;
+  }


🚩 Staleness check bypassed when verdict lacks a ts field

The staleness fail-open in computeEngaged() at apps/supervisor/src/backpressure/backpressureMonitor.ts:106 requires both maxVerdictAgeMs and verdict.ts to be defined. If the cluster-side aggregator writes a verdict like {engaged: true} without a ts field and then dies, the supervisor will never age out that verdict — it stays engaged until the next successful refresh (which might read the same stale key forever if it has no Redis TTL). This is by design (ts is optional in the schema), but operators should be aware: if maxVerdictAgeMs is relied upon for safety, the aggregator must include ts in every verdict it writes. Worth documenting this operational requirement.

Was this helpful? React with 👍 or 👎 to provide feedback.

nicktrn added 7 commits June 4, 2026 17:35

feat(supervisor): add RedisBackpressureSignalSource

7245038

Reads the backpressure verdict from a Redis key (written by the cluster-side aggregator). Malformed or wrong-shaped values are treated as unknown so the monitor fails open. Adds @internal/redis + @internal/testcontainers deps.

feat(core): freeze consumer-pool scale-up under backpressure

8190215

Add optional shouldPauseScaling to ScalingOptions; when it returns true the pool stops scaling up (scale-down still allowed), so it won't add consumers to drain a queue backpressure is deliberately holding.

feat(supervisor): wire scale-up freeze and resume ramp

30e476b

Pass shouldPauseScaling (monitor.isEngaged) into the consumer pool so scale-up freezes while hard-engaged, and feed TRIGGER_DEQUEUE_BACKPRESSURE_RAMP_MS into the monitor's post-release ramp. Off by default.

feat(supervisor): backpressure dry-run mode and prometheus metrics

e8cfce9

Dry-run (default on via env) keeps the gates inert while computeEngaged still reflects the real signal and verdict transitions are logged. Adds BackpressureMetrics (engaged/dry_run gauges, skipped-dequeues counter).

nicktrn self-assigned this Jun 4, 2026

This comment was marked as resolved.

Sign in to view

fix(supervisor): address review feedback on backpressure monitor

e6b5a67

Guard against overlapping refresh ticks when a read hangs; use the staleness-aware computeEngaged() for transition/ramp/gauge bookkeeping; close the backpressure Redis client on supervisor shutdown.

nicktrn marked this pull request as ready for review June 4, 2026 18:20

This comment was marked as resolved.

Sign in to view

fix(supervisor): strip backpressure redis password from debug env log

44842f9

Add TRIGGER_DEQUEUE_BACKPRESSURE_REDIS_PASSWORD to the secret strip-list so it never lands in the DEBUG startup log, with a comment to keep new secrets out.

devin-ai-integration Bot reviewed Jun 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(supervisor): add opt-in dequeue backpressure#3836

feat(supervisor): add opt-in dequeue backpressure#3836
nicktrn wants to merge 9 commits into
mainfrom
feat/supervisor-dequeue-backpressure-tri-5354

nicktrn commented Jun 4, 2026

Uh oh!

changeset-bot Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

❌ Failed checks (2 warnings)

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

nicktrn commented Jun 4, 2026

Uh oh!

changeset-bot Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

❌ Failed checks (2 warnings)

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

changeset-bot Bot commented Jun 4, 2026 •

edited

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading