merge() output-size guard blocks legitimate lazy dask merges #3048

@brendancol

Describe the bug

merge() (xrspatial/reproject/__init__.py) computes the output grid with _compute_output_grid before it decides whether the result will be dask-backed, so the 1-billion-pixel memory guard runs against every merge. Only afterwards does the function detect dask inputs and auto-promote large in-memory merges to the lazy _merge_dask path. As a result, a merge whose output exceeds 1e9 pixels is rejected by the guard even though it would run lazily through _merge_dask.
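The ordering problem can be illustrated with a small standalone sketch. This is not the xrspatial code: `compute_output_grid`, `merge_current_order`, and the `inputs_are_dask` flag are hypothetical stand-ins, and the `ValueError` exception type is an assumption. It only shows how a guard that fires during grid computation rejects a merge before the backend decision is reached.

```python
MAX_OUTPUT_PIXELS = 1_000_000_000  # the 1e9-pixel guard described above


def compute_output_grid(height, width):
    """Hypothetical stand-in for _compute_output_grid: the guard always runs."""
    if height * width > MAX_OUTPUT_PIXELS:
        # Exception type is an assumption made for this sketch.
        raise ValueError(f"output of {height}x{width} pixels exceeds the guard")
    return (height, width)


def merge_current_order(height, width, inputs_are_dask):
    """Mimics the problematic ordering: grid (and guard) first, backend second."""
    # Step 1: the grid is computed, and the guard fires, before laziness is known.
    out_shape = compute_output_grid(height, width)
    # Step 2: only now is the backend chosen, too late to skip the guard.
    backend = "lazy" if inputs_are_dask else "eager"
    return backend, out_shape
```

With this ordering, `merge_current_order(40_000, 40_000, inputs_are_dask=True)` raises even though the 1.6e9-pixel result would never be materialized at once.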

This is the same pattern fixed for reproject() in #3046 / #3047.

Why it wasn't fixed alongside #3046

The reproject fix detects the backend before computing the grid and passes lazy_output into _compute_output_grid. For merge() the ordering is trickier: the auto-promote-to-dask decision (around line 2328) depends on out_shape, which only exists after the grid is computed. Real dask inputs can be detected up front, but the "large in-memory output auto-promotes to dask" case needs a small restructuring to know the output will be lazy before the guard fires. Kept separate to avoid expanding the scope of #3046.
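One possible shape for that restructuring, as a rough sketch only: derive the pixel count first, decide laziness from it, and then pass the decision into the guarded grid computation. Everything here is hypothetical apart from the `lazy_output` idea taken from the reproject fix: the function names, the `AUTO_PROMOTE_PIXELS` threshold, the `can_promote` flag (standing in for whatever condition keeps a merge on the in-memory path), and the `ValueError` exception type are assumptions, not the xrspatial implementation.

```python
MAX_OUTPUT_PIXELS = 1_000_000_000   # the 1e9-pixel guard
AUTO_PROMOTE_PIXELS = 250_000_000   # hypothetical auto-promote threshold


def compute_output_grid(height, width, lazy_output=False):
    """Hypothetical stand-in: the guard is skipped when the output is lazy."""
    if not lazy_output and height * width > MAX_OUTPUT_PIXELS:
        # Exception type is an assumption made for this sketch.
        raise ValueError(f"output of {height}x{width} pixels exceeds the guard")
    return (height, width)


def merge_reordered(height, width, inputs_are_dask, can_promote=True):
    """Decide laziness before the guard runs."""
    # The pixel count is cheap to derive and does not need the guard.
    n_pixels = height * width
    # Lazy if the inputs are dask, or if a large in-memory merge would be promoted.
    will_be_lazy = inputs_are_dask or (
        can_promote and n_pixels > AUTO_PROMOTE_PIXELS
    )
    # The guard now only applies to the in-memory path.
    out_shape = compute_output_grid(height, width, lazy_output=will_be_lazy)
    return ("lazy" if will_be_lazy else "eager"), out_shape
```

In this sketch a 40,000 x 40,000 merge passes when it is dask-backed or auto-promoted, and is still rejected when it would have to be built in memory (`can_promote=False`).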

Expected behavior

Lazy dask merges (dask inputs, or large in-memory merges that auto-promote to _merge_dask) should not be blocked by the 1e9 guard. The guard should still protect the in-memory merge path.

Labels: bug, dask, oom
