Skip to content

Keep stream_link_mfd and stream_order_mfd dask assembly lazy (#2885)#3039

Merged
brendancol merged 2 commits into
mainfrom
issue-2885
Jun 8, 2026
Merged

Keep stream_link_mfd and stream_order_mfd dask assembly lazy (#2885)#3039
brendancol merged 2 commits into
mainfrom
issue-2885

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Follow-up to #2865, which made the dask output assembly lazy in flow_accumulation_mfd, flow_length_mfd, and watershed_mfd. This does the same for the two eager paths #2885 lists:

  • _stream_link_mfd_dask
  • the strahler and shreve paths in stream_order_mfd

(hand_mfd was already lazified separately, so it is left alone.)

Both modules used to call .compute() per tile, run the per-tile kernel eagerly, and stitch the numpy results with da.from_array / da.block, which materialized the whole output raster during the API call. They now build the result with da.map_blocks over the flow_accum array. The flow_accum array is realigned onto the fractions' spatial tile grid (the same rechunk handling watershed_mfd uses for pour points), and the matching 3D fractions strip is sliced inside the block function, so the per-tile kernel runs at compute time.

The boundary-convergence sweep that runs before assembly still reads tile data eagerly. That is inherent to the cross-tile convergence algorithm and stays tracked as the second half of #2885.

Backend coverage

numpy and cupy paths are unchanged. dask+numpy now assembles lazily; dask+cupy goes through the same CPU dask path via the existing map_blocks get()/asarray() wrappers, so it inherits the change.

Test plan

  • existing dask-vs-numpy parity tests still pass
  • new: flow_accum chunked differently from fractions still matches numpy (exercises the rechunk realignment)
  • new: assembly is deferred to compute time (spy on the tile kernel; assert it is invoked during .compute(), not at API-call time)

Closes #2885

Follow-up to #2865. The dask output assembly in _stream_link_mfd_dask
and the strahler/shreve paths of stream_order_mfd computed each tile and
wrapped it with da.from_array / da.block, materializing the full output
raster during the API call. Rebuild the result with da.map_blocks over
the flow_accum array (realigned to the fractions' spatial tile grid) and
slice the matching fractions strip inside the block function so the
per-tile kernel runs at compute time.

The boundary-convergence sweep still reads tile data eagerly by design;
lazifying that phase is tracked separately in #2885.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 8, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Keep stream_link_mfd and stream_order_mfd dask assembly lazy (#2885)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None. The change follows the merged #2865 pattern (and the already-lazy hand_mfd), so the established approach carries over directly.

Nits (optional improvements)

  • The _tile closures in both files drop the defensive if block_info is None guard that _hand_mfd_dask keeps. That is safe here: all three map_blocks calls pass an explicit meta=, so dask never does a block_info=None meta dry-run (verified the closure is not called during graph build). Worth a one-line comment so a later edit that removes meta does not silently reintroduce a None-deref.

What looks good

  • accum_da.rechunk((chunks_y, chunks_x)) matches the validated spatial shape and reuses the same realignment watershed_mfd uses for pour points, so a flow_accum array chunked differently from the fractions gets correct tile indices instead of wrong ones.
  • Output dtype (float64) and tiling match the previous da.block assembly, so the downstream coord/attr wrapping is unchanged.
  • Tests cover the rechunk-realignment path (flow_accum 3x3 vs fractions 2x2) and the laziness contract (spy on the tile kernel, assert assembly runs during .compute(), not at API-call time), parametrized across strahler and shreve.

Checklist

  • Algorithm unchanged (assembly-only refactor); per-tile kernels untouched
  • dask+numpy and dask+cupy stay consistent with numpy (existing + new parity tests)
  • NaN handling unchanged (same stream-mask construction inside the closure)
  • Edge cases: chunk mismatch covered; single-cell covered by the existing suite
  • Dask chunk boundaries handled; convergence sweep still eager by design, tracked in #2885
  • No premature materialization: full-raster da.block / per-tile .compute() removed from the assembly path
  • Benchmark: none for hydro MFD; #2865 added none either
  • README / docs: no API or backend-support change
  • Docstrings: internal functions; comments updated to describe the lazy assembly

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after 9e66632)

The single nit from the first pass is addressed: both _tile closures now carry a one-line comment noting block_info is always populated because meta is passed to map_blocks, so the dropped block_info is None guard is intentional.

No new findings. The 31 stream_link_mfd / stream_order_mfd tests pass locally, including the new rechunk-realignment and laziness tests. Nothing else changed between the two pushes.

@brendancol brendancol merged commit 59c43e8 into main Jun 8, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extend lazy MFD dask assembly to stream/hand modules and study the convergence phase

1 participant