Keep stream_link_mfd and stream_order_mfd dask assembly lazy (#2885) by brendancol · Pull Request #3039 · xarray-contrib/xarray-spatial

brendancol · 2026-06-08T10:55:39Z

Follow-up to #2865, which made the dask output assembly lazy in flow_accumulation_mfd, flow_length_mfd, and watershed_mfd. This does the same for the two eager paths #2885 lists:

_stream_link_mfd_dask
the strahler and shreve paths in stream_order_mfd

(hand_mfd was already lazified separately, so it is left alone.)

Both modules used to call .compute() per tile, run the per-tile kernel eagerly, and stitch the numpy results with da.from_array / da.block, which materialized the whole output raster during the API call. They now build the result with da.map_blocks over the flow_accum array. The flow_accum array is realigned onto the fractions' spatial tile grid (the same rechunk handling watershed_mfd uses for pour points), and the matching 3D fractions strip is sliced inside the block function, so the per-tile kernel runs at compute time.

The boundary-convergence sweep that runs before assembly still reads tile data eagerly. That is inherent to the cross-tile convergence algorithm and stays tracked as the second half of #2885.

Backend coverage

numpy and cupy paths are unchanged. dask+numpy now assembles lazily; dask+cupy goes through the same CPU dask path via the existing map_blocks get()/asarray() wrappers, so it inherits the change.

Test plan

existing dask-vs-numpy parity tests still pass
new: flow_accum chunked differently from fractions still matches numpy (exercises the rechunk realignment)
new: assembly is deferred to compute time (spy on the tile kernel; assert it is invoked during .compute(), not at API-call time)

Closes #2885

Follow-up to #2865. The dask output assembly in _stream_link_mfd_dask and the strahler/shreve paths of stream_order_mfd computed each tile and wrapped it with da.from_array / da.block, materializing the full output raster during the API call. Rebuild the result with da.map_blocks over the flow_accum array (realigned to the fractions' spatial tile grid) and slice the matching fractions strip inside the block function so the per-tile kernel runs at compute time. The boundary-convergence sweep still reads tile data eagerly by design; lazifying that phase is tracked separately in #2885.

brendancol

PR Review: Keep stream_link_mfd and stream_order_mfd dask assembly lazy (#2885)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None. The change follows the merged #2865 pattern (and the already-lazy hand_mfd), so the established approach carries over directly.

Nits (optional improvements)

The _tile closures in both files drop the defensive if block_info is None guard that _hand_mfd_dask keeps. That is safe here: all three map_blocks calls pass an explicit meta=, so dask never does a block_info=None meta dry-run (verified the closure is not called during graph build). Worth a one-line comment so a later edit that removes meta does not silently reintroduce a None-deref.

What looks good

accum_da.rechunk((chunks_y, chunks_x)) matches the validated spatial shape and reuses the same realignment watershed_mfd uses for pour points, so a flow_accum array chunked differently from the fractions gets correct tile indices instead of wrong ones.
Output dtype (float64) and tiling match the previous da.block assembly, so the downstream coord/attr wrapping is unchanged.
Tests cover the rechunk-realignment path (flow_accum 3x3 vs fractions 2x2) and the laziness contract (spy on the tile kernel, assert assembly runs during .compute(), not at API-call time), parametrized across strahler and shreve.

Checklist

Algorithm unchanged (assembly-only refactor); per-tile kernels untouched
dask+numpy and dask+cupy stay consistent with numpy (existing + new parity tests)
NaN handling unchanged (same stream-mask construction inside the closure)
Edge cases: chunk mismatch covered; single-cell covered by the existing suite
Dask chunk boundaries handled; convergence sweep still eager by design, tracked in #2885
No premature materialization: full-raster da.block / per-tile .compute() removed from the assembly path
Benchmark: none for hydro MFD; #2865 added none either
README / docs: no API or backend-support change
Docstrings: internal functions; comments updated to describe the lazy assembly

)

brendancol

Follow-up review (after `9e66632`)

The single nit from the first pass is addressed: both _tile closures now carry a one-line comment noting block_info is always populated because meta is passed to map_blocks, so the dropped block_info is None guard is intentional.

No new findings. The 31 stream_link_mfd / stream_order_mfd tests pass locally, including the new rechunk-realignment and laziness tests. Nothing else changed between the two pushes.

github-actions Bot added the performance PR touches performance-sensitive code label Jun 8, 2026

brendancol commented Jun 8, 2026

View reviewed changes

Note block_info is never None because map_blocks meta is explicit (#2885

9e66632

)

brendancol commented Jun 8, 2026

View reviewed changes

brendancol merged commit 59c43e8 into main Jun 8, 2026
6 of 7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keep stream_link_mfd and stream_order_mfd dask assembly lazy (#2885)#3039

Keep stream_link_mfd and stream_order_mfd dask assembly lazy (#2885)#3039
brendancol merged 2 commits into
mainfrom
issue-2885

brendancol commented Jun 8, 2026

Uh oh!

brendancol left a comment

Uh oh!

brendancol left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented Jun 8, 2026

Backend coverage

Test plan

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

PR Review: Keep stream_link_mfd and stream_order_mfd dask assembly lazy (#2885)

Blockers (must fix before merge)

Suggestions (should fix, not blocking)

Nits (optional improvements)

What looks good

Checklist

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

Follow-up review (after 9e66632)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Follow-up review (after `9e66632`)