Skip to content

feat(telemetry): add OTLP export writer#961

Merged
EhabY merged 9 commits into
mainfrom
feat/issue-903-export-telemetry-otlp-writer
Jun 2, 2026
Merged

feat(telemetry): add OTLP export writer#961
EhabY merged 9 commits into
mainfrom
feat/issue-903-export-telemetry-otlp-writer

Conversation

@EhabY
Copy link
Copy Markdown
Collaborator

@EhabY EhabY commented May 17, 2026

Summary

  • add OTLP/JSON mapping for telemetry logs, spans, and metric-shaped events
  • add .otlp.zip export writing with POST-ready logs.json, traces.json, and metrics.json
  • cover signal routing and the full OTLP wire output with golden-file snapshots

Most of the diff is tests, not production. Production is 6 files / ~890 lines; test code adds ~1,120 lines and generated golden snapshots (__golden__/*.json) ~1,390 more, so about 73% of the diff is test and golden files. The golden JSON is committed on purpose so OTLP schema changes show up as a reviewable diff.

Refs #903.

Stack: 3 / 4. Base: #960. Next: #953.

Review follow-up

  • exports spans as internal spans instead of unspecified spans
  • includes window start timestamps for HTTP percentile gauges
  • closes partially opened OTLP writers when setup fails
  • drains queued zip writes before closing file handles on error paths
  • adds golden-file snapshots (__golden__/*.json) for record shapes and envelope output so schema changes surface as a JSON diff
  • documents the read high-water mark, stream.destroy() safety, and why the OTLP types stay SDK-free

@EhabY EhabY self-assigned this May 17, 2026
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 199d2e7 to 3a29ac9 Compare May 17, 2026 19:42
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from ab7dcf4 to acadd7e Compare May 17, 2026 19:42
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 3a29ac9 to eb776a7 Compare May 18, 2026 17:03
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from acadd7e to 29c269d Compare May 18, 2026 17:03
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch 2 times, most recently from ba095a9 to 8066586 Compare May 19, 2026 10:00
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch 2 times, most recently from 5a186aa to 5ee4862 Compare May 19, 2026 10:44
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 3d77d32 to c9c3b74 Compare May 19, 2026 11:02
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from 5ee4862 to 539131f Compare May 19, 2026 11:02
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from c9c3b74 to 9c5a19d Compare May 20, 2026 10:28
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from 539131f to fe6675e Compare May 20, 2026 10:29
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 9c5a19d to 914025b Compare May 20, 2026 13:49
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from fe6675e to 94ac895 Compare May 20, 2026 13:50
EhabY added a commit that referenced this pull request May 21, 2026
- Move src/telemetry/export/writers.ts to writers/json.ts and the
  matching test to writers/json.test.ts. The writer keeps the same
  shape; the doc comment is tightened and the test drops a
  redundant atomic-failure assertion already covered by
  writeAtomically's own tests.
- Extract parseTelemetryTimestampMs from range.ts's
  isTimestampInRange so other export writers can reuse it.
- Add a trivial asyncIterable test helper to test/mocks/.
- Inline the duplicated `vi.mock("node:fs", ...)` /
  `vi.mock("node:fs/promises", ...)` factory pair (9 lines per file)
  as a 2-line dynamic import in the affected test files.

Upstream-compatible groundwork for the OTLP writer branch (#961);
isolates move + helper-extraction churn so that PR's diff stays
focused on OTLP-specific work.
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from 55cccf2 to 71bcce7 Compare May 21, 2026 09:55
EhabY added a commit that referenced this pull request May 21, 2026
- Move src/telemetry/export/writers.ts to writers/json.ts and the
  matching test to writers/json.test.ts. The writer keeps the same
  shape; the doc comment is tightened and the test drops a
  redundant atomic-failure assertion already covered by
  writeAtomically's own tests.
- Extract parseTelemetryTimestampMs from range.ts's
  isTimestampInRange so other export writers can reuse it.
- Add a trivial asyncIterable test helper to test/mocks/.
- Inline the duplicated `vi.mock("node:fs", ...)` /
  `vi.mock("node:fs/promises", ...)` factory pair (9 lines per file)
  as a 2-line dynamic import in the affected test files.

Upstream-compatible groundwork for the OTLP writer branch (#961);
isolates move + helper-extraction churn so that PR's diff stays
focused on OTLP-specific work.
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 034f5bc to 5a9edee Compare May 21, 2026 15:13
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from 7ce5f22 to d9f955a Compare May 21, 2026 15:14
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch 3 times, most recently from 1671ea2 to 5cbd8ba Compare May 21, 2026 15:51
EhabY added a commit that referenced this pull request May 21, 2026
- Move src/telemetry/export/writers.ts to writers/json.ts and the
  matching test to writers/json.test.ts. The writer keeps the same
  shape; the doc comment is tightened and the test drops a
  redundant atomic-failure assertion already covered by
  writeAtomically's own tests.
- Extract parseTelemetryTimestampMs from range.ts's
  isTimestampInRange so other export writers can reuse it.
- Add a trivial asyncIterable test helper to test/mocks/.
- Inline the duplicated `vi.mock("node:fs", ...)` /
  `vi.mock("node:fs/promises", ...)` factory pair (9 lines per file)
  as a 2-line dynamic import in the affected test files.

Upstream-compatible groundwork for the OTLP writer branch (#961);
isolates move + helper-extraction churn so that PR's diff stays
focused on OTLP-specific work.
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 5cbd8ba to d8a9b52 Compare May 21, 2026 15:54
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch 3 times, most recently from 883e121 to d7db126 Compare May 26, 2026 13:57
@EhabY
Copy link
Copy Markdown
Collaborator Author

EhabY commented May 26, 2026

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 17 R1 findings addressed in b90a626. Five reviewers verified the fixes; Mafuuu, Meruem, Melody, and Kite found no new issues. Meruem traced error paths, concurrency interleaving, and test layer boundaries with no findings. Melody walked every pairing chain (dispatch, wire format, transport lifecycle, enumeration) and confirmed all sides echo.

Two P3 test gaps and two Nits remain, all from Bisky. The production code is correct; these are test-quality improvements.

As Meruem put it: "The structural design is clean: error paths release resources, cumulative state is bounded and clamped, the streaming protocol is correct, and test coverage matches the abstraction layers."

🤖 This review was automatically generated with Coder Agents.

Comment thread test/unit/telemetry/export/writers/otlp/records.test.ts
Comment thread test/unit/telemetry/export/writers/otlp/records.test.ts
Comment thread test/unit/telemetry/export/metrics.test.ts Outdated
Comment thread test/unit/telemetry/export/writers/otlp/envelope.test.ts Outdated
EhabY added a commit that referenced this pull request May 26, 2026
Addresses R2 review feedback on PR #961:
- records: assert the Math.max(0, ...) negative-counter clamp directly
  (existing Infinity case only exercised the isFinite guard).
- records: cover the result="error" + error object combination so a
  regression on the spread of error.message is caught.
- envelope: also assert stream.destroyed on the stream.end failure path
  (matches the sibling suffix-write test).
- metrics: swap %p (not a Vitest format specifier) for %s in it.each
  titles so boolean rows render.
EhabY added a commit that referenced this pull request May 26, 2026
Addresses R2 review feedback on PR #961:
- records: assert the Math.max(0, ...) negative-counter clamp directly
  (existing Infinity case only exercised the isFinite guard).
- records: cover the result="error" + error object combination so a
  regression on the spread of error.message is caught.
- envelope: also assert stream.destroyed on the stream.end failure path
  (matches the sibling suffix-write test).
- metrics: swap %p (not a Vitest format specifier) for %s in it.each
  titles so boolean rows render.
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch 6 times, most recently from 4929d92 to 22d73a9 Compare May 26, 2026 22:27
@mtojek mtojek requested review from mafredri and mtojek May 27, 2026 11:32
Copy link
Copy Markdown
Member

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good! Left minor comments, but would like to have another pair of eyes to review the contribution.

Comment thread src/telemetry/export/writers/otlp/envelope.ts
Comment thread src/telemetry/export/writers/otlp/envelope.ts
Comment thread src/telemetry/export/writers/otlp/types.ts Outdated
Comment thread test/unit/telemetry/export/writers/otlp/envelope.test.ts
Comment thread src/telemetry/export/writers/otlp/writer.ts
EhabY added a commit that referenced this pull request May 31, 2026
Addresses R2 review feedback on PR #961:
- records: assert the Math.max(0, ...) negative-counter clamp directly
  (existing Infinity case only exercised the isFinite guard).
- records: cover the result="error" + error object combination so a
  regression on the spread of error.message is caught.
- envelope: also assert stream.destroyed on the stream.end failure path
  (matches the sibling suffix-write test).
- metrics: swap %p (not a Vitest format specifier) for %s in it.each
  titles so boolean rows render.
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from 22d73a9 to addc6b9 Compare May 31, 2026 11:51
@EhabY EhabY requested a review from mtojek May 31, 2026 12:26
EhabY added 9 commits June 2, 2026 20:12
- Restructure: writers.ts → writers/json.ts; combine OTLP mapping and
  writer into writers/otlp.ts; lift signal classification into
  export/signal.ts.
- Group all records of each signal under a single Resource block per
  file (was one Resource per event), matching the OTLP spec's
  recommended shape and shrinking exports at scale.
- Adopt OTLP enums from @opentelemetry/api + api-logs; document the
  +1 wire offset on SpanKind.
- Tighten the public surface: only writeOtlpZipExport and
  OtlpExportCounts are exported; record mappers are module-private.
- Hoist per-event metric attributes once; single-pass partition of
  http.requests measurements into counts and gauges.
- Share parseTelemetryTimestampMs between range.ts and the OTLP
  writer's nanos conversion (was duplicated with identical error).
- Extract asyncIterable test helper to test/mocks/; inline the memfs
  vi.mock pair across affected test files.
- Tests now exercise the public API only, dropping side-effect
  assertions about staging cleanup and atomic semantics already
  covered by writeAtomically's own tests.
Wire-format fixes:
- Switch http.requests count_* sums from DELTA to CUMULATIVE temporality
  (Prom/Mimir/Grafana Cloud reject the delta+sum combination).
- Maintain per-series running totals across the export, anchored at the
  first observed window's startTimeUnixNano.
- Suppress zero-cumulative counters so routes that never errored don't
  ship empty data points.

Semantic-convention compliance:
- Use OTel-standard `service.instance.id` (was `coder.session.id`) and
  `host.id` (was `coder.machine.id`) for portability with OTel-aware
  backends. The previous vendor keys were pure aliases.
- Add `schemaUrl` to each ResourceLogs/ResourceSpans/ResourceMetrics
  container and each scope container.
- Stamp scope.version with the extension version.
Group OTLP code into src/telemetry/export/writers/otlp/ with one file per
concern, and hoist metric classification to src/telemetry/export/metrics.ts
so it can be shared by future writers.

Layout:
- metrics.ts:     domain (isMetricEvent, describeMetricEvent, units)
- otlp/writer.ts: public API + orchestration (signal routing, packZip)
- otlp/envelope.ts: streaming JSON envelope writer + error wrapping
- otlp/records.ts:  log/span/metric record builders + helpers
- otlp/types.ts:    OTLP wire-format interfaces (pinned to proto v1.10.0)

Safety fixes surfaced by the layer split:
- envelope: listen for stream 'error' events so failed opens reject pending
  writes instead of hanging; close() is idempotent; suffix-write failures
  during close are labeled as close failures.
- writer:   on loop failure, use Promise.allSettled to close streams without
  masking the original error.
- records:  coerce NaN/Infinity numeric inputs to 0n instead of throwing
  from BigInt(); clamp startTimeUnixNano <= timeUnixNano for out-of-order
  events; include eventName in the cumulative-series key.

Each layer has its own test file; total grows from 99 to 131 cases.
Apply review findings to the OTLP writer modules and their tests.

writer.ts
- Collapse the triplicate `{logs, traces, metrics}` shape into one
  `Record<Signal, Channel>` driven from `ENVELOPES`.
- Extract `appendRecords(channel, records)`; three route branches
  collapse from four lines each to one and count mutation lives in
  one place.
- Replace try/catch + duplicated close calls with a `succeeded` flag
  + try/finally; the close list builds once.
- Collapse `packZip`'s two try blocks into one.
- Type `zipAsync` properly (was `any` from `promisify(zip)`).
- Drop `Object.fromEntries(map(async))` + `as Record<Signal, Channel>`
  cast in `openChannels` in favour of a direct destructured
  `Promise.all`.

records.ts
- Move OTLP envelope metadata (`ENVELOPES`, `envelopePrefix`, schema
  URL, suffix) here so wire-format knowledge stops leaking into
  writer.ts.
- Carry nanosecond timestamps as `bigint` internally; `String()` only
  at the JSON boundary. Eliminates repeated `BigInt(string)` reparses.
- Rename `ExportState`/`newExportState`/`cumulativeStart` to
  `CumulativeState`/`newCumulativeState`/`anchor`.
- Extract a `MetricContext` so `gaugeMetric(ctx, m)` and
  `sumMetric(ctx, m, state)` replace the five-arg `gaugeRecord` and
  `cumulativeSum`.
- Convert `spanStatus` if-chain to a switch on
  `event.properties.result`.
- Convert `metricRecords` for-loop to `flatMap` with a normalized
  0-or-1 record return.
- Have `toNonNegativeBigInt` delegate to `toIntegerBigInt` so the
  rounding rule lives in one place.

envelope.ts
- Drop the `.cause` archaeology in `close()`. Collapse `stream.write`
  and `stream.end` plumbing into one `awaitOp` helper that handles
  the errRef pre-check and post-callback merge once.
- Move error labelling to the three API boundaries (prefix, append,
  close) so the verb stays at the call site rather than threading
  through parameters. Drop the trailing-underscore `reject_`.

metrics.ts
- Replace the `measurementUnit` if-chain with a lookup table.

errorUtils.ts
- Add a shared `wrapError(verb, target, cause)` helper to replace
  six near-identical `Failed to ... ${path}` throws across
  envelope.ts and writer.ts.

tests
- Extract a shared `helpers.ts` with `TRACE_ID`, `attrs()`, and a
  signal-driven `parseEnvelope(files, signal)` that reads keys from
  the production `ENVELOPES` table.
- Drop the duplicate `TRACE_ID`/`attrs` definitions from
  writer.test.ts and records.test.ts.
- Move `gauge`/`counter` measurement builders to module scope and
  type them as `MetricMeasurement`.
- writer.test.ts derives the file-list assertion from `ENVELOPES`;
  resource-consistency test now uses cross-envelope equality + a
  `toMatchObject` spot-check, deferring full shape to records.test.ts.
- Align `makeEvent`/`context` in writer.test.ts with records.test.ts
  (module-level constants instead of let + beforeEach).

Delete comments that restate code; keep only ones recording a
non-obvious WHY.
- match production "count." prefix in the metric classifier so HTTP request
  counts export as cumulative sums instead of dimensionless gauges
- close opened envelope fds on prefix/suffix write failure and on partial
  openChannels rejection
- collision-proof the cumulative counter series key with JSON.stringify
- clamp negative counter values to preserve the monotonic-sum invariant
- count records actually written, not events routed
- narrow spanRecord to require a traceId at the type level
- drop the stale ssh.network.info entry and tighten doc comments
- use promisify(zip) to match supportBundleLogs.ts
Addresses R2 review feedback on PR #961:
- records: assert the Math.max(0, ...) negative-counter clamp directly
  (existing Infinity case only exercised the isFinite guard).
- records: cover the result="error" + error object combination so a
  regression on the spread of error.message is caught.
- envelope: also assert stream.destroyed on the stream.end failure path
  (matches the sibling suffix-write test).
- metrics: swap %p (not a Vitest format specifier) for %s in it.each
  titles so boolean rows render.
- Stamp coder.event.{extension_version,session_id,deployment_url} on
  every log/span/metric record so multi-session and multi-deployment
  exports preserve the original producer's identity. The resource block
  remains an export-tool snapshot.
- Stage envelopes in os.tmpdir() instead of next to the user's save
  path so cloud-sync agents do not ingest the intermediate uncompressed
  logs.json/traces.json/metrics.json.
- Replace the in-memory fflate.zip with streaming Zip + ZipDeflate so
  multi-GB exports do not hold every envelope plus the zipped output in
  the V8 heap at once.
- Forward staging-cleanup failures via
  OtlpWriteOptions.onStagingCleanupError instead of letting a Windows
  EBUSY in finally mask an otherwise-successful export (which
  writeAtomically would then erase via temp-file cleanup).
- Increment metric channel counts per event rather than per non-empty
  record batch so metric events with all-zero counters stay in the
  user-visible total and JSON/OTLP exports of the same range agree.
- Accept an optional AbortSignal so callers can cancel a long-running
  export between events and between files.
- add golden-file snapshots for OTLP record shapes and envelope wire output
- share one representative export scenario via a helpers fixture
- document the HWM constant, stream.destroy() safety, and SDK-free types
- ignore golden snapshots in prettier so they stay byte-identical
@EhabY EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from aa47ba1 to 3afed75 Compare June 2, 2026 17:13
@EhabY EhabY merged commit f8c10f6 into main Jun 2, 2026
13 checks passed
@EhabY EhabY deleted the feat/issue-903-export-telemetry-otlp-writer branch June 2, 2026 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants