feat(authz): register agentex resources and grant agent/task ownership by asherfink · Pull Request #270 · scaleapi/scale-agentex

asherfink · 2026-06-03T15:16:52Z

Related work

Parent ticket: AGX1-242 - AgentEx authorization dual-write.

This change is part of a 4-PR stack across 3 repos.

Repo	PR	Purpose
scaleapi/scaleapi	scaleapi/scaleapi#146335	merged flag registry update for the shared AgentEx resources rollout flags
scaleapi/agentex	scaleapi/agentex#364	agentex-auth routes each auth verb to one backend target
scaleapi/scale-agentex	this PR	registers agentex resources in the authorization graph and grants agent/task ownership
scaleapi/scale-agentex	scaleapi/scale-agentex#271	denied direct agent access collapses to 404

Merge/deploy order: scaleapi/scaleapi#146335 is merged; deploy scaleapi/agentex#364 before this PR, then merge/deploy scaleapi/scale-agentex#271. Rollout per account is: enable fgac-agentex-resources-dual-write, backfill existing resources into Spark, then enable fgac-agentex-resources reads.

What

Registers agentex resources in the authorization graph on create and removes them on delete, separating resource lifecycle from explicit ownership:

Agents and tasks register in the authorization graph on create and record an explicit owner; on delete they deregister and the owner record is revoked.
Schedules and agent API keys register under their parent agent on create and deregister on delete. Permissions cascade from the parent agent, so no separate owner record is written.

Registration runs before the resource is persisted, so an authorization failure fails closed with no orphaned row. If the persist (or the Temporal create, for schedules) later fails, a compensating deregister runs.

Why

Giving each verb a single backend keeps the existing ownership writes (grant/revoke) unchanged and confines the new behavior to the new lifecycle interface (register/deregister). Explicit ownership is recorded only for agents and tasks; schedules and agent API keys inherit access from their parent agent, so they need no owner record of their own. Issuing the two write kinds independently keeps both backends current, so reads can cut over per account with a clean rollback.

Testing

Unit and integration tests covering the agent, task, schedule, and api_key authorization writes (register/deregister ordering, fail-closed-before-persist, and compensating cleanup).
ruff check / ruff format.

Greptile Summary

This PR adds register_resource / deregister_resource lifecycle calls to agents, tasks, schedules, and agent API keys so that each resource is tracked in the authorization graph from the moment it is created. register_resource fires before the Postgres (or Temporal) write so an auth failure fails closed with no orphaned row; a compensating deregister_resource cleans up if the downstream write later fails. Ownership writes (grant / revoke) remain at the route layer, keeping both backends current during the dual-write rollout.

AgentsUseCase gains _register_in_auth (called only on genuine first creates, not on updates/idempotent re-registers) and _safe_deregister (best-effort, called after soft-delete and as compensation for race conditions or persist failures).
ScheduleService and AgentAPIKeysUseCase already had this pattern; their docstrings and comments are updated to reflect the unified terminology.
principal_context is removed from agent schemas because the authorization service now derives the principal from its own injected context.

Confidence Score: 5/5

Safe to merge; register/deregister ordering is correct across all resource types and all failure paths are covered with compensating actions or best-effort swallowing.

The core invariant — register before persist so an auth failure aborts the create cleanly — holds in both AgentsUseCase create paths. Compensation on DuplicateItemError and generic Exception correctly deregisters only when a register call was actually made. The delete path deregisters after the soft-delete/Postgres delete, consistent with existing task and schedule patterns. Tests cover the full lifecycle including race conditions and the no-creator skip path.

No files require special attention.

Important Files Changed

Filename	Overview
agentex/src/domain/use_cases/agents_use_case.py	Adds DAuthorizationService dependency, _register_in_auth (skips when no principal, fail-closed before persist), and _safe_deregister (swallows errors). Both genuine-create paths call _register_in_auth with correct compensation on DuplicateItemError and generic Exception; update paths are correctly excluded.
agentex/src/api/routes/agents.py	Removes per-resource post-fetch auth check from get_agent_by_name (pre-check dependency remains); removes principal_context forwarding; switches register_build from register_resource to grant at route level.
agentex/tests/integration/use_cases/test_agent_authz_dual_write.py	New integration test covering all key scenarios: register before persist, fail-closed on register failure, compensation on persist failure, duplicate-race compensation, update path exclusion, no-creator skip, and delete deregistration.
agentex/tests/unit/api/test_tasks_authz.py	Adds TestTaskDeleteAuthWrites covering delete_task and delete_task_by_name routes, verifying revoke is called with the correctly resolved task ID.
agentex/src/api/schemas/agents.py	Removes principal_context field from RegisterAgentRequest and RegisterBuildRequest; reflected in openapi.yaml.

_{Reviews (9): Last reviewed commit: "feat(authz): register agentex resources ..." | Re-trigger Greptile}

github-actions · 2026-06-03T15:17:41Z

✱ Stainless preview builds

This PR will update the agentex-sdk SDKs with the following commit messages.

openapi

feat(api): remove principal_context field from agent models

python

feat(api): remove principal_context parameter from agents.register_build

typescript

fix(api): remove principal_context parameter from agent register build

Edit this comment to update them. They will appear in their respective SDK's changelogs.

✅ agentex-sdk-typescript studio · code · diff

Your SDK build had at least one "warning" diagnostic, but this did not represent a regression.
generate ⚠️ → build ✅ (prev: build ⏭️) → lint ✅ (prev: lint ⏭️) → test ✅
npm install https://pkg.stainless.com/s/agentex-sdk-typescript/9af35c30b77687c7bb9994da378706b7856ffa79/dist.tar.gz

✅ agentex-sdk-openapi studio · code · diff

Your SDK build had at least one "note" diagnostic, but this did not represent a regression.
generate ✅

⚠️

agentex-sdk-python studio · code · diff

Your SDK build had a failure in the lint CI job, which is a regression from the base state.
generate ⚠️ → build ✅ (prev: build ⏭️) → lint ❗ (prev: lint ⏭️) → test ✅
pip install https://pkg.stainless.com/s/agentex-sdk-python/16a061a6c1ca165d5dfdffecd3ad030f78d98b98/agentex_sdk-0.12.0-py3-none-any.whl

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-06-09 00:24:32 UTC

harvhan · 2026-06-04T07:19:02Z

    """Get an agent by its unique name."""
    agent_entity = await agents_use_case.get(name=agent_name)
-
-    await authorization.check(


I think we should keep the authorization.check/revoke/grant. Otherwise, how can we maintain backward compatibility?

1, removing authorization.check will remove authz altogether.
2, removing grant/revoke will break the old authz even if the user opts out of the new FGAC

Even if a user turns FGAC on, we may still want to keep grant/revoke for permission data dual-write, so that the old authz still works in case the user turns FGAC off.

went and traced this through agentex-auth to make sure we're solid, and I don't think we need to keep grant/revoke here

tl;dr: on the legacy (SGP) backend, register_resource is grant and deregister_resource is revoke. so dropping the explicit grant/revoke and going through register/deregister is behavior-preserving for old authz, not a regression.

1. removing authorization.check removes authz altogether

it doesn't. the only check I removed is the explicit one in get_agent_by_name, but that route still has agent_name: DAuthorizedName(agent, read) in its signature, and that dependency calls authorization.check(agent, read) itself before the handler body runs (authorization_shortcuts.py). so the line I deleted was a redundant second check of the same tuple. read authz is still enforced. on create I kept the check(agent("*"), create) too, just dropped the unused principal_context= kwarg.

2 + 3. removing grant/revoke breaks old authz when FGAC is off

two things:

agentex-auth routes every call (grant/revoke/check/register/deregister) to exactly one provider per account based on FGAC_AGENTEX_AUTH_SPARK (_resolve_provider). flag off → legacy SGP, flag on → Spark. there's no write-to-both happening, it's a routing switch.

on the SGP gateway, register_resource delegates straight to grant and deregister_resource delegates to revoke

so for an FGAC-off account, register_resource(agent) → SGP → grant(owner), which is exactly what the old route did. deregister_resource → revoke. legacy authz is fully preserved. for FGAC-on it does the proper Spark resource+owner write (tenant scoping comes from account_id automatically). either way it's correct, and keeping grant/revoke on top would just be a redundant duplicate write.

asherfink · 2026-06-08T23:02:08Z

didn't mean to delete the greptile comments from^^, claude did that as they were from the old impl

rpatel-scale · 2026-06-08T23:10:13Z

-    await authorization.check(
-        resource=AgentexResource.agent(agent_entity.id),
-        operation=AuthorizedOperationType.read,
-    )
-


Why are we removing this? Does it now happen at a higher layer?

it's moved up into the parameter dependency, so the body call was a duplicate. agent_name is typed DAuthorizedName(AgentexResourceType.agent, read), and that dependency (_ensure_authorized_name in src/utils/authorization_shortcuts.py) resolves the agent by name + runs authorization.check(agent(resolved_id), read) before the handler body executes.

get_agent_by_id directly above already works this way

this was duplicative even before this PR.

rpatel-scale · 2026-06-08T23:19:15Z

    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e)) from e
-    await authorization_service.register_resource(
+    await authorization_service.grant(


So we do a register resource in agentex/src/domain/use_cases/agents_use_case.py and then a grant out here. Do we need to do both? Can we keep the authz checks/grants at the router layer ideally instead of nested in the business logic? Thoughts?

we need both, they're the two backends of the dual-write, not redundant. register_resource records the agent in the new authz graph; grant records the legacy owner edge.

sure makes sense but still can we move it up from the business layer? or move this call down. Ideally all authz checks and writes happen at the highest layer possible IMO

agree on the principle for the checks. gating belongs at the edge and it already is there: the check runs in the route and in the DAuthorizedId / DAuthorizedName param dependencies (which resolve before the handler body executes), so access control happens at the highest layer.

register_resource / deregister_resource are not authz gating. they're resource-graph writes that have to be transactionally consistent with the DB persist:

run before the row is persisted (fail-closed → no orphaned row),

compensate with a deregister if the persist loses a duplicate-create race or fails,

fire only on a genuine create (the get-or-create logic in the use case is what knows create-vs-update; re-registering on an update would rewrite ownership).

The route only gets control after register_agent returns (post-persist), and the agent id is generated inside it. so it can't do register-before-persist or know create-vs-update without exposing the transaction internals upward. Moving it to the router would pull persistence/race logic into the route, which is worse

asherfink · 2026-06-09T00:23:05Z

rebased onto latest main

#271) ## Related work Parent ticket: AGX1-242 - AgentEx authorization dual-write. This change is part of a 4-PR stack across 3 repos. | Repo | PR | Purpose | |---|---|---| | scaleapi/scaleapi | [scaleapi/scaleapi#146335](scaleapi/scaleapi#146335) | merged flag registry update for the shared AgentEx resources rollout flags | | scaleapi/agentex | [scaleapi/agentex#364](scaleapi/agentex#364) | agentex-auth routes each auth verb to one backend target | | scaleapi/scale-agentex | [#270](#270) | registers agentex resources in the authorization graph and grants agent/task ownership | | **scaleapi/scale-agentex** | **this PR** | **denied direct agent access collapses to 404** | **Merge/deploy order:** [scaleapi/scaleapi#146335](scaleapi/scaleapi#146335) is merged; deploy [scaleapi/agentex#364](scaleapi/agentex#364), then merge/deploy [#270](#270), then this PR. Rollout per account is: enable `fgac-agentex-resources-dual-write`, backfill existing resources into Spark, then enable `fgac-agentex-resources` reads. ## What Collapses denied direct agent access from 403 to 404 for agent lookups by id, name, and query. Allowed checks pass through unchanged, and list behavior is unchanged (the list route already filters to authorized ids). ## Why Returning 403 for a specific agent id or name reveals that the agent exists; returning 404 makes "not found" and "exists but not visible to this caller" indistinguishable. This is read-side behavior only. ## Testing - Unit tests for the agent and task authorization route behavior.  <h3>Greptile Summary</h3> This PR extends the authorization shortcut helpers to collapse denied agent checks from 403 to 404, preventing cross-tenant existence probing via error-code differentiation. - Adds `_check_agent_or_collapse_to_404` and wires it into all three shortcuts (`DAuthorizedId`, `DAuthorizedQuery`, `DAuthorizedName`) for `AgentexResourceType.agent`; the earlier review concern about `DAuthorizedQuery` (`GET /events` list endpoint) is addressed in this diff. - Introduces 263-line unit test suite (`test_agents_authz.py`) covering the helper directly plus each shortcut's agent path, and updates `test_schedules_authz.py` / `test_tasks_authz.py` to reflect the new expected behavior. <details><summary><h3>Confidence Score: 5/5</h3></summary> Safe to merge; all three agent-lookup shortcuts collapse denied checks to 404 and the earlier concern about DAuthorizedQuery on the events list endpoint is resolved in this diff. The change is narrow and consistent: one new helper function wired into three existing shortcuts, each path exercised by dedicated unit tests. DAuthorizedBodyId is only used for task resources so it requires no changes. The updated schedule and task tests correctly reflect the new collapse behavior. No files require special attention. </details> <h3>Important Files Changed</h3> | Filename | Overview | |----------|----------| | agentex/src/utils/authorization_shortcuts.py | Adds _check_agent_or_collapse_to_404 and applies it in DAuthorizedId, DAuthorizedQuery, and DAuthorizedName; all three agent-lookup paths now collapse AuthorizationError to 404. DAuthorizedBodyId is unaffected and only used with task resources. | | agentex/tests/unit/api/test_agents_authz.py | New test file covering the helper, DAuthorizedId, DAuthorizedQuery, DAuthorizedName, and list-filtering for agent resources; denied/absent/allowed cases are all represented. | | agentex/tests/unit/api/test_schedules_authz.py | Updates TestCreateParentAgentCheck to expect ItemDoesNotExist (404) instead of AuthorizationError (403) on a denied parent-agent check, consistent with the new behavior in DAuthorizedId for agents. | | agentex/tests/unit/api/test_tasks_authz.py | Renames test_non_task_name_skips_wrap to test_agent_name_collapses_to_404 and flips the expected exception from AuthorizationError to ItemDoesNotExist, aligning with the agent-name collapse now in DAuthorizedName. | </details> <details><summary><h3>Flowchart</h3></summary> ```mermaid %%{init: {'theme': 'neutral'}}%% flowchart TD A[Caller Request] --> B{Authorization Shortcut} B -->|DAuthorizedId agent| C[_check_agent_or_collapse_to_404] B -->|DAuthorizedName agent| D[repo.get name to id then _check_agent_or_collapse_to_404] B -->|DAuthorizedQuery agent| E[_check_agent_or_collapse_to_404] C --> F{authorization.check} D --> F E --> F F -->|Allowed| G[Pass through - return resource id] F -->|AuthorizationError denied| H[raise ItemDoesNotExist - HTTP 404] style H fill:#f96,color:#000 style G fill:#6f6,color:#000 ``` </details> <sub>Reviews (10): Last reviewed commit: ["feat(agents): collapse denied agent rout..."](99106f8) | [Re-trigger Greptile](https://app.greptile.com/api/retrigger?id=35397402)</sub>

Deployed agent pods were CrashLoopBackOff-ing -- 422 on POST /agents/register. The route is whitelisted (pods self-register on startup with no user login), so #270's added create-check/grant ran with a None principal, which agentex-auth rejects with 422. Changes: - Skip the create check and ownership grant in register_agent when no creator is resolvable from the principal context; an authenticated caller is still enforced. The agent is already owned from build time (register-build runs before deploy-time register). - Align the legacy-authz integration test (which mocked agentex-auth and masked the crashloop) to assert no check/grant on the self-reg path, and add unit coverage for the resolvable-creator gate.

asherfink requested a review from a team as a code owner June 3, 2026 15:16

asherfink mentioned this pull request Jun 3, 2026

feat(agents): collapse denied agent route access to 404 instead of 403 #271

Merged

asherfink force-pushed the asher.fink/agx1-242-agent-dual-write branch from 685126a to 0d99f31 Compare June 3, 2026 15:31

asherfink changed the title ~~feat(agents): register agents in the authorization graph on create/delete~~ feat(agents): register/deregister agents in the authorization graph on create/delete Jun 3, 2026

asherfink force-pushed the asher.fink/agx1-242-agent-dual-write branch from 0d99f31 to 14796e9 Compare June 3, 2026 15:35

harvhan reviewed Jun 4, 2026

View reviewed changes

asherfink force-pushed the asher.fink/agx1-242-agent-dual-write branch from 14796e9 to 883921e Compare June 5, 2026 17:07

asherfink changed the title ~~feat(agents): register/deregister agents in the authorization graph on create/delete~~ feat(agents): write agent ownership to the authorization graph on create/delete Jun 5, 2026

asherfink force-pushed the asher.fink/agx1-242-agent-dual-write branch 3 times, most recently from f1f7a42 to 7bc61d6 Compare June 8, 2026 18:30

asherfink mentioned this pull request Jun 8, 2026

fix(authz): register build agents as resources #283

Merged

asherfink force-pushed the asher.fink/agx1-242-agent-dual-write branch from 7bc61d6 to 1dd4143 Compare June 8, 2026 20:28

asherfink changed the title ~~feat(agents): write agent ownership to the authorization graph on create/delete~~ feat(agents): write legacy grants and Spark lifecycle auth Jun 8, 2026

asherfink force-pushed the asher.fink/agx1-242-agent-dual-write branch 3 times, most recently from c00f3e9 to 4c4e73a Compare June 8, 2026 22:52

asherfink changed the title ~~feat(agents): write legacy grants and Spark lifecycle auth~~ feat(authz): register agentex resources and grant agent/task ownership Jun 8, 2026

scaleapi deleted a comment from greptile-apps Bot Jun 8, 2026

rpatel-scale reviewed Jun 8, 2026

View reviewed changes

rpatel-scale approved these changes Jun 9, 2026

View reviewed changes

asherfink enabled auto-merge (squash) June 9, 2026 00:20

rpatel-scale approved these changes Jun 9, 2026

View reviewed changes

feat(authz): register agentex resources and grant agent/task ownership

73a5056

asherfink force-pushed the asher.fink/agx1-242-agent-dual-write branch from 4c4e73a to 73a5056 Compare June 9, 2026 00:21

asherfink merged commit 76353de into main Jun 9, 2026
30 checks passed

asherfink deleted the asher.fink/agx1-242-agent-dual-write branch June 9, 2026 00:27

harvhan mentioned this pull request Jun 9, 2026

fix(authz): skip create-check/grant on unauthenticated agent self-register #292

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(authz): register agentex resources and grant agent/task ownership#270

feat(authz): register agentex resources and grant agent/task ownership#270
asherfink merged 1 commit into
mainfrom
asher.fink/agx1-242-agent-dual-write

asherfink commented Jun 3, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

github-actions Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

harvhan Jun 4, 2026

Uh oh!

asherfink Jun 4, 2026

Uh oh!

asherfink commented Jun 8, 2026

Uh oh!

rpatel-scale Jun 8, 2026

Uh oh!

asherfink Jun 9, 2026

Uh oh!

harvhan Jun 9, 2026

Uh oh!

rpatel-scale Jun 8, 2026

Uh oh!

asherfink Jun 8, 2026

Uh oh!

rpatel-scale Jun 9, 2026

Uh oh!

asherfink Jun 9, 2026

Uh oh!

asherfink commented Jun 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

asherfink commented Jun 3, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Related work

What

Why

Testing

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

github-actions Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✱ Stainless preview builds

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

asherfink commented Jun 8, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

asherfink commented Jun 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

asherfink commented Jun 3, 2026 •

edited by greptile-apps Bot

Loading

github-actions Bot commented Jun 3, 2026 •

edited

Loading