Skip to content

chore(proto): add buf-based proto skeleton + tooling (Phase 0)#614

Closed
lakhansamani wants to merge 1 commit into
mainfrom
feat/proto-skeleton
Closed

chore(proto): add buf-based proto skeleton + tooling (Phase 0)#614
lakhansamani wants to merge 1 commit into
mainfrom
feat/proto-skeleton

Conversation

@lakhansamani
Copy link
Copy Markdown
Contributor

Summary

  • Foundation for the upcoming multi-protocol public API surface (GraphQL + gRPC + REST). Public ops only — admin ops (_/_authz_ prefixed) stay GraphQL.
  • Adds proto/ rooted at buf.build/authorizerdev/authorizer (buf v2 layout) with common/v1:
    • annotations.proto — custom method options: required_permissions, mcp_tool, audit_log, public
    • pagination.protoPaginationRequest/Pagination accepting both legacy page/limit and AIP-158 page_token
    • errors.protoErrorReason enum surfaced via google.rpc.Status ErrorInfo
    • types.proto — shared AppData (mirrors the GraphQL Map scalar)
  • Generated Go committed under gen/go/ so go build works without protoc.
  • Makefile targets: proto-tools (auto-installs buf v1.47.2), proto-lint, proto-breaking, proto-gen.
  • New CI job proto using bufbuild/buf-action@v1: lint always, breaking-check on PRs vs main.
  • No behaviour change — no existing Go code paths are altered.

Test plan

  • make proto-tools && make proto-lint exits 0
  • make proto-gen produces 4 .pb.go files cleanly
  • go build ./... succeeds (including gen/)
  • CI proto job green on this PR
  • CI test job still green

Follow-ups (separate PRs per the approved plan)

  • Phase 1 — extract transport-agnostic internal/service layer
  • Phase 2 — proto for 9 public services + gRPC server + grpc-gateway REST
  • Phase 4 — MCP server reading proto annotations at runtime

🤖 Generated with Claude Code

Sets up the foundation for the upcoming multi-protocol public API surface
(GraphQL + gRPC + REST). No behaviour change: only the proto module, its
generated Go bindings, Makefile helpers, and a CI lint/breaking-check job.

- proto/ rooted at buf.build/authorizerdev/authorizer with v2 layout
- common/v1: annotations (required_permissions, mcp_tool, audit_log,
  public), pagination, errors (ErrorReason enum), shared AppData
- Generated Go committed under gen/go so go build works without protoc
- Makefile: proto-tools (auto-installs buf), proto-lint, proto-breaking,
  proto-gen
- CI: new "proto" job runs buf lint always and buf breaking on PRs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor Author

@lakhansamani lakhansamani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Principal-engineer review — Phase 0 proto skeleton

Overall this is a well-scoped Phase-0 PR: it lands the foundations (buf v2 module, custom annotations, pagination, errors, AppData) without touching any runtime code, the lint/breaking CI gate is wired up correctly, and the generated Go is committed so consumers don't need protoc. I have no blocking objections, but there are a handful of things I'd tighten before more services pile on top of this surface — they get progressively harder to change once Phase 2/4 services depend on them.

Strengths

  • buf v2 module layout with a single named module (buf.build/authorizerdev/authorizer) is the right shape for a monorepo. Pinning bufbuild/protovalidate via buf.lock and deferring googleapis to Phase 2 (to avoid the "unused dep" warning) is exactly the discipline I'd want.
  • Managed mode with a disable for protovalidate is the correct workaround for BSR-published Go SDKs — easy to miss, glad it's there with a comment explaining why.
  • go_package_prefix override giving a stable github.com/authorizerdev/authorizer/gen/go path means consumers can go get the module without a separate publish step. Good.
  • Annotation design is sensible: required_permissions modelled as repeated PermissionRequirement (AND semantics) cleanly mirrors the existing authz resource/scope model. mcp_tool.destructive is forward-thinking — many MCP hosts already key off this for confirmation prompts.
  • Error taxonomy maps 1:1 to gRPC canonical codes and documents the HTTP mapping inline. Using google.rpc.ErrorInfo.reason is the right vehicle (clients branch on the enum, not the string).
  • Pagination dual-shape (page/limit for back-compat, page_token for AIP-158) is pragmatic and the doc comment makes the precedence explicit.
  • CI: lint on every push, breaking only on PRs against main. Correct — running buf breaking on push-to-main would always be a no-op or false positive.
  • Makefile auto-installs buf with a pinned version; commits generated Go so go build ./... works without protoc — both reduce contributor friction.

Issues / suggestions

Important (worth fixing before Phase 2 lands)

  1. No git diff --exit-code check after buf generate in CI. The proto skeleton commits generated Go but nothing verifies it stays in sync with .proto. A reviewer editing a proto can forget to regenerate and the PR will still be green (no Go code imports gen/ yet, so even go build won't catch it). Add a third step to the proto job:

    - run: make proto-gen
    - run: git diff --exit-code -- gen/ || (echo "::error::run make proto-gen and commit gen/"; exit 1)

    This becomes load-bearing the moment Phase 2 lands, so easier to add now.

  2. ERROR_REASON_VERIFICATION_REQUIRED and TOKEN_EXPIRED and ACCOUNT_DEACTIVATED and OPERATION_DISABLED all map to FAILED_PRECONDITION / HTTP 412. That's four distinct user-actionable failure modes collapsing into one transport status. The reason enum disambiguates, which is the whole point — but worth confirming: (a) FAILED_PRECONDITION is canonically HTTP 400, not 412 per google.rpc.Code and grpc-gateway's default mapping; 412 is precondition_failed which gateway doesn't emit by default. The doc comments say 412 for all of these — that won't match what grpc-gateway actually returns unless you customise the error handler. Suggest either correcting the comments to 400, or noting the planned gateway override.

  3. Pagination semantics for limit=0 and page=0. (buf.validate.field).int64 = {gte: 0} allows zero for both. What does limit=0 mean — "default" (10), "none" (0 results), or "all"? Same for page=0 vs. the "1-based" doc comment. A oneof { offset_paging, page_token } would express the precedence more honestly than the "server picks page_token when set, else falls back" prose, and (buf.validate.field).int64 = {gt: 0} plus an optional wrapper would let "unset" mean "default" without conflating with zero. As-is, every server implementation will independently re-invent the "is this 0 or unset" logic.

  4. required_permissions AND-only semantics will hurt. The comment acknowledges this: "to express OR, list alternatives in a single PermissionRequirement with a wildcard scope and let the policy engine evaluate." That pushes OR semantics into the resource/scope strings — i.e. you encode policy in identifier strings instead of the annotation. Two cleaner options to consider before Phase 2: (a) make the option a PermissionRequirementSet message with repeated PermissionRequirement all_of and repeated PermissionRequirement any_of; or (b) keep the field repeated but document that the interceptor evaluates a boolean expression ((resource, scope) OR (resource, scope2)) — but then you need a grammar. Option (a) is simpler and forward-compatible. Doing this after services already use the simple form is a breaking proto change.

  5. Pagination.offset is redundant with page * limit and not present in the GraphQL Pagination type's intent (it's there but always derivable). For a new API surface I'd drop it — clients should compute it or use page_token. Keeping it perpetuates a small piece of the legacy shape into a v1 you're committing to.

Nit

  1. Extension field numbers 50001-50004 are in the "internal use" range (1000-99999 per protobuf docs). Fine for a server-internal options module, but if you ever publish authorizer.common.v1.annotations for third parties to consume (which the BSR module name suggests is plausible), consider requesting a range in the protobuf-global-extension-registry. Not blocking; worth a comment in the .proto noting the rationale.

  2. Lint excepts PACKAGE_VERSION_SUFFIX but every current package (authorizer.common.v1) already has the .v1 suffix, so the rule would pass. The exception signals "future packages may not have the suffix" which is the opposite of what we want for an evolving public API. Drop the exception or pin a comment explaining why it's there.

  3. Makefile BUF ?= $(shell command -v buf …) is evaluated at parse time. Inside proto-tools the if [ -z "$(BUF)" ] works the first time, but downstream targets invoke bare buf (not $(BUF)), which requires $GOBIN to already be on PATH. CI is unaffected (uses bufbuild/buf-action), but a fresh local dev without GOBIN in PATH will see proto-lint fail right after install. Either point BUF at $(shell go env GOBIN)/buf after install, or document the PATH requirement.

  4. proto-gen runs buf dep update unconditionally. Fine for the common case but flakes offline. Consider buf dep prune/skip when buf.lock is fresh, or split into proto-deps + proto-gen.

  5. buf_action breaking_against URL uses https://github.com/${{ github.repository }}.git#branch=main. Works for public repos. If the repo ever goes private, this needs a token. Worth a comment in the workflow as a future-self reminder.

  6. AppData wraps a single google.protobuf.Struct. Defensible (lets you add schema_version etc. later without breaking) but if no further fields are anticipated, naked Struct removes a layer. Either way, document that JSON-int64 → float64 fidelity loss is expected (this bites people every time).

  7. Field ordering disconnect: GraphQL PaginationRequest is {limit, page}; proto is {page=1, limit=2, page_token=3}. Cosmetic; doesn't affect wire compatibility (different schemas) but a teammate trained on one will reach for the other's order. Worth aligning if cheap.

  8. go.mod adds google.golang.org/protobuf as indirect. Once the gRPC server lands in Phase 2 it'll become direct. No action needed now, just flagging that go mod tidy will reshuffle it.

Test coverage gaps

This PR adds zero test files, which is reasonable for a code-generation PR — but a couple of cheap tests would lock in the contract:

  • Generated-code drift test. As noted above: a CI step that runs make proto-gen and git diff --exit-code -- gen/. Catches stale generated files.
  • Annotation round-trip test (defer to Phase 1/2). A tiny Go test that constructs a descriptorpb.MethodOptions, attaches each of the four extensions, marshals + unmarshals, and asserts you can read them back via proto.GetExtension. Cheap insurance that the option numbers don't drift and that consumers (interceptors, MCP scanner) can rely on the contract.
  • buf-format check. Consider adding format: true to the buf-action step (or a make proto-fmt-check running buf format -d --exit-code) so style drift doesn't accumulate.

Recommendation

Approve with comments. None of the issues block merging Phase 0 — but items 1 (gen-drift CI), 2 (FAILED_PRECONDITION docs), 3 (pagination zero semantics), and 4 (AND-only permissions) all become breaking-to-fix the moment Phase 2 services start using these types, so I'd prefer to see them resolved before or in the same window as Phase 2.

@lakhansamani
Copy link
Copy Markdown
Contributor Author

Superseded by #620, which consolidates this stack into a single PR against main. All blocking review findings from this PR were addressed in #620; see its body for the per-finding traceback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant