Skip to content

[SEA-NodeJS] SEA async execute, TLS & connection/statement options#416

Open
msrathore-db wants to merge 7 commits into
mainfrom
msrathore/sea-async-tls-options-main
Open

[SEA-NodeJS] SEA async execute, TLS & connection/statement options#416
msrathore-db wants to merge 7 commits into
mainfrom
msrathore/sea-async-tls-options-main

Conversation

@msrathore-db
Copy link
Copy Markdown
Contributor

Recreation of #413 pointing at main (the original #413 was based on the #412 branch and got merged there instead of into main, so its changes never landed on main). Same content, replayed cleanly on top of the merged #412 (94d63bd) — zero conflicts (the #413 fork point is byte-identical to main).

What's here (the 6 #413 commits)

  • SEA connection & statement options — TLS (checkServerCertificate secure-by-default + customCaCert), maxConnections, rowLimit, statementConf, queryTags.
  • SEA async executesubmitStatement + poll-to-terminal (waitUntilReady) + awaitResult; cancel stays responsive.
  • Review fixes (from [SEA-NodeJS] SEA async execute, TLS & connection/statement options #413's reviews, all carried over):
    • Server-driven Cancelled/Closed/Unknown throw OperationStateError (not bare HiveDriverError) so the facade mirrors cancelled/closed.
    • queryTimeout enforced client-side (poll deadline → OperationStateError(Timeout) + best-effort cancel); not forwarded to the kernel-ignored submit field.
    • customCaCert PEM check requires BEGIN+END; SeaAuth reads options via InternalConnectionOptions; corrected poll-loop comment.
    • TIMESTAMP_NTZ bound natively, TIMESTAMP_LTZ mapped to TIMESTAMP (Spark has no distinct LTZ type) — valid on both backends.
    • queryTimeout coerced via numberToInt64(...).toNumber() (handles Int64; Number(int64) was NaN).

Verification

Note: the kernel-rev pin (#414) and the SEA→kernel rename (#415) are stacked after this; once this merges to main I'll re-point them at main too.

This pull request and its description were written by Isaac.

Wire the SEA connection-level and per-statement option surfaces onto the
merged-kernel napi binding (thin forwarding — the kernel owns the behaviour):

Connection options (SeaAuth.buildSeaConnectionOptions):
- `maxConnections` → kernel pool size, validated as a positive integer within
  the napi u32 range.
- TLS: `checkServerCertificate` (secure-by-default — omit to keep the kernel's
  verify-on default; `false` opts into insecure) and `customCaCert` (PEM string
  or Buffer; strings are PEM-sanity-checked and normalised to a Buffer before
  the FFI boundary), via the new `buildSeaTlsOptions`.
- `intervalsAsString: true` is always set so SEA interval/duration columns
  render as strings — a byte-compatible drop-in for the Thrift backend.
  `complexTypesAsJson` is intentionally left at the kernel default (native
  Arrow), which already decodes identically to Thrift via the shared converter.

Statement options (SeaSessionBackend.executeStatement, via buildExecuteOptions):
- `queryTimeout` → `queryTimeoutSecs`; `rowLimit` → `rowLimit` (SEA-only cap).
- `queryTags` serialised JS-side (reusing Thrift's `serializeQueryTags`) into
  the reserved `query_tags` conf key, merged with any explicit `statementConf` —
  the napi `queryTags` field can't carry null-valued tags, and the kernel
  rejects setting both. `queryTags` / `queryTimeout` are no longer rejected.
- Still rejected (genuinely unsupported on SEA): `useCloudFetch`,
  `useLZ4Compression`, `stagingAllowedLocalPath`.

`rowLimit` / `statementConf` added to the public `ExecuteStatementOptions`;
SEA-only knobs (`maxConnections` / `checkServerCertificate` / `customCaCert`)
added to the internal `InternalConnectionOptions`.

Validated against a live warehouse: secure-by-default connect, maxConnections,
checkServerCertificate, rowLimit (caps rows), queryTimeout, queryTags,
statementConf, and non-PEM customCaCert rejection.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Switch the SEA query path from the blocking `executeStatement` to the kernel's
async `submitStatement`, matching the Thrift backend's always-async
(`runAsync: true`) model. `submitStatement` returns immediately with a pending
`AsyncStatement` (kernel `wait_timeout=0s`) while the query runs server-side.

SeaOperationBackend becomes dual-mode (exactly one of):
- `asyncStatement` (query path): `waitUntilReady()` polls `status()` to a
  terminal state on a 100ms cadence (matching Thrift), firing the progress
  callback each tick. Polling `status()` rather than blocking on `awaitResult()`
  keeps `cancel()` responsive — a blocking awaitResult would hold the kernel
  statement mutex for the whole query and queue cancel behind it. On Succeeded
  it materialises the result handle (first fetch is free); on Failed it drives
  `awaitResult()` to surface the kernel's typed SQL-error envelope; on a
  server-side Cancelled/Closed/Unknown it throws a clear error. `status()`
  reports the real Pending/Running/Succeeded state.
- `statement` (metadata path): the kernel `list*`/`get*` statement is already
  terminal, so `waitUntilReady()` stays the one-shot completion tick.

The fetch pipeline is shared: `awaitResult()`'s `AsyncResultHandle` and the
metadata `Statement` expose the same `fetchNextBatch()` / `schema()` surface, so
`SeaResultsProvider` → `ArrowResultConverter` → `ResultSlicer` consume either
interchangeably via a single memoised fetch handle. cancel()/close() route
through a `lifecycleHandle` abstraction over whichever handle backs the op.

Re-exports the kernel `AsyncStatement` / `AsyncResultHandle` types from
`SeaNativeLoader`.

Validated against a live warehouse: async fetchAll correctness, multi-row drain
(5000 rows), long-running aggregate (count over 20M), kernel SQL-error
surfacing, and cancellation mid-execution. PR1's params/metadata/getInfo all
still pass through the new async path.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
… code-style)

The CI "Check code style" step runs `prettier . --check` (whole repo);
these files were committed without prettier formatting. Formatting-only.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Code-review #413 (81/100). Validated each against the code + a live warehouse:

- F1 (HIGH): the async poll loop threw plain HiveDriverError for server-driven
  Cancelled/Closed/Unknown. The DBSQLOperation facade only mirrors its
  cancelled/closed flags when `err instanceof OperationStateError` (and
  OperationStateError extends HiveDriverError, not the reverse), so a
  server-side cancel/close/admin-kill left the facade desynced. Now throws
  OperationStateError(Canceled/Closed/Unknown) — matching the Thrift backend.
  The Failed branch still surfaces the kernel SQL-error envelope via awaitResult.

- F2 (MED): the server-Cancelled test asserted only instanceOf(HiveDriverError),
  which passes for both the correct and incorrect type — it couldn't catch F1.
  Now asserts instanceOf(OperationStateError) + errorCode, plus a new Closed test.

- F3 (MED): queryTimeout was forwarded to submitStatement but the kernel ignores
  queryTimeoutSecs on submit (always wait_timeout=0s), so the documented public
  option was a silent no-op, and the poll loop had no client-side deadline (a
  stalled Running statement polled forever). Now enforced client-side: the poll
  loop tracks a deadline, best-effort cancels the statement on expiry, and
  throws OperationStateError(Timeout) — matching Thrift's server TIMEDOUT
  outcome. Stopped forwarding the ignored queryTimeoutSecs to the napi options.
  Validated live: a 2s timeout interrupts a slow cross-join with TIMEOUT.

- F4 (LOW): customCaCert PEM string check now requires the END marker too (a
  truncated/headerless cert no longer passes), consistent with the Buffer path.

- F5 (LOW): SeaAuth reads the SEA-only fields (checkServerCertificate /
  customCaCert / maxConnections) through `InternalConnectionOptions` instead of
  ad-hoc inline casts, so a typo'd key fails to compile.

- F6 (LOW): corrected the poll-loop comment — the prior text justified polling
  by an incorrect "blocking awaitResult holds the mutex and queues cancel"
  claim; cancel() is documented lock-free. The real rationale (real-time
  status to the progress callback + cancel observed between ticks) is now stated.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Consolidates the last net-new bit of the superseded #408: two SEA-path
DBSQLParameterType variants for binding timezone-explicit timestamps. The type
name flows through the existing param codec (toSparkParameter → sqlType), which
the kernel accepts — validated live (SELECT ? with TIMESTAMP_NTZ/LTZ returns the
bound values). On the Thrift backend they degrade to a plain TIMESTAMP bind.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
…ryTimeout coercion

Code-review #413 (gopalldb). Two P1s:

- TIMESTAMP_LTZ was sent verbatim on the wire, but Spark has no distinct
  TIMESTAMP_LTZ type (TIMESTAMP already carries LTZ semantics) — so a Thrift
  caller got an opaque server bind error, and the enum comment falsely claimed
  NTZ/LTZ "degrade to a plain TIMESTAMP bind" (there was no such logic).
  `toSparkParameter` now maps TIMESTAMP_LTZ → `TIMESTAMP` (valid on both Thrift
  and kernel); TIMESTAMP_NTZ stays native (a real Spark type). Comment corrected.
  Added DBSQLParameter tests for both wire types (the Thrift behaviour the
  review flagged as untested) and updated the kernel positional-params test.

- queryTimeout (`number | bigint | Int64`) was coerced with `Number(...)`, which
  yields NaN for an Int64 (node-int64 has no valueOf) → the client-side deadline
  was silently disabled for Int64 inputs. Now uses
  `numberToInt64(...).toNumber()`, matching the Thrift backend. Added a
  regression test that an `Int64(1)` queryTimeout actually fires the deadline
  (OperationStateError(Timeout)) rather than polling forever.

(P1 "queryTimeout silently dropped on submit" and the unbounded-poll note were
already resolved earlier by the client-side deadline fix; doc comment updated to
match. P2 polarity/Date-NTZ items noted for the public-surface follow-up.)

Validated live: NTZ binds natively and LTZ binds as TIMESTAMP on the kernel path.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@gopalldb
Copy link
Copy Markdown
Collaborator

gopalldb commented Jun 4, 2026

🟡 P1

  1. seaCancel resets isCancelled to false on kernel-cancel failure — loses user's cancel intent (SeaOperationLifecycle.ts:134-141). Brief true window can be observed by the poll loop and throw
    OperationStateError(Canceled), then the rollback hides the intent from any other consumer. Worst case: kernel-side statement keeps running while the user believes it was canceled. Fix: don't reset
    isCancelled on cancel-RPC failure.
  2. Cancel during in-flight status() poll is unbounded (SeaOperationBackend.ts:372-419). Cancel arrives → flag flipped → but observation deferred until current status() resolves. No AbortSignal threaded to
    napi. Hung warehouse = unbounded latency to observe cancel. Fix: race in-flight status() against a lifecycle promise on cancel. Add a test where cancel fires while status() is mid-flight (existing test only
    covers cancel-before-poll-starts).
  3. PEM marker check is order-insensitive substring match (SeaAuth.ts:201-209). "...-----END...-----BEGIN..." (e.g. proxy intercept page with both literals) passes. Fix: ordered regex /-----BEGIN
    CERTIFICATE-----[\s\S]+?-----END CERTIFICATE-----/.
  4. customCaCert honored regardless of checkServerCertificate=false — silent insecure combo (InternalConnectionOptions.ts:40, SeaAuth.ts:184-223). User providing both gets verification fully off but with a
    wasted custom CA install — the secure-looking customCaCert masks the disabled verification. Fix: error or warn when both set; today's doc explicitly endorses the ambiguous combo.
  5. No close() cleanup on poll-loop terminal error — kernel statement leaks (SeaOperationBackend.ts:394-415). Failed/Cancelled/Closed/Unknown/Timeout all throw without closing the kernel-side statement.
    fetchChunk does best-effort close on error; the poll loop doesn't. Fix: best-effort seaClose on all terminal-error branches.
  6. rowLimit/statementConf added to cross-backend ExecuteStatementOptions but silently dropped by Thrift (IDBSQLSession.ts:30-42). Exactly the anti-pattern the PR's own code comments warn against ("silent
    drop is the worst failure mode"). User sets rowLimit: 100 on Thrift, gets full result set, no warning. Fix: warn at the Thrift session when these are passed, or move to an SEA-specific options type.

🟢 P2

  • OperationStateError messages lack statement_id and state name (matches Thrift behavior, just an enhancement).
  • "Secure-by-default" should be owned by the JS layer, not inherited from the kernel default. Add explicit checkServerCertificate: true when undefined for defense in depth.
  • Missing test: cancel-mid-status() (the actual race the P1 Setup E2E tests with GitHub Actions #2 fix needs).
  • Missing test: server-Cancelled + client-Timeout race on same tick.
  • Missing test: PEM with reversed marker order, BEGIN-only, END-only.
  • hasResultSet() hardcoded true even for DDL/DML — leave for now; revisit once napi exposes the kernel's has_result_set bit.

Make the SEA execution path user-configurable between sync and async,
toggled by the EXISTING `runAsync` option — no new public field, exactly
mirroring the Thrift backend's `runAsync` distinction. Default is SYNC
(`runAsync: false`): faster, and with the kernel sync canceller fully
cancellable mid-compute.

SeaSessionBackend.executeStatement now branches on `options.runAsync`:
  - false/undefined (DEFAULT) -> Connection.executeStatementCancellable:
    the kernel blocks on execute() (poll-to-terminal server-side), driven
    lazily in the operation backend's result(). `queryTimeoutSecs` IS
    forwarded (the kernel execute() honours it).
  - true -> Connection.submitStatement (submit + poll), unchanged.
    `queryTimeoutSecs` stays client-side (kernel ignores it on submit).

SeaOperationBackend gains a third dual-mode handle kind,
`cancellableExecution`, alongside `asyncStatement` and `statement`:
  - waitUntilReadyCancellable drives result() to the terminal Statement
    (memoised as the fetch handle + close target);
  - the lifecycle handle is a composite: cancel() routes to the detached
    canceller (lock-free, interrupts a running result() mid-COMPUTE and
    is a no-op once terminal); close() routes to the resolved statement;
  - a cancel-induced result() rejection maps to OperationStateError(
    Canceled) so the DBSQLOperation facade mirrors its cancelled flag,
    matching the Thrift path.

Public API, result shape, schema (TTableSchema), and error classes are
identical across both modes and to Thrift — the only observable
difference is lifecycle timing (when executeStatement resolves).

Built against databricks-sql-kernel napi
639e19ef97decc1c5aa2365c0b3a229c1ccd5b58 (executeStatementCancellable /
CancellableExecution); KERNEL_REV bumped to match. Refreshed the
committed binding surface (native/sea/index.{js,d.ts}).

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db msrathore-db deployed to azure-prod June 4, 2026 11:30 — with GitHub Actions Active
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants