feat: expose arrow_field, arrow_try_cast, cast_to_type, with_metadata #1568

Open
timsaucer wants to merge 8 commits into apache:main from timsaucer:feat/df54-arrow-cast-fns

@timsaucer commented Jun 4, 2026

Which issue does this PR close?

No tracking issue; gap surfaced during the v54 upstream coverage audit.

Rationale for this change

Five scalar functions from datafusion::functions::expr_fn (DataFusion 54) were not exposed through the Python bindings. They round out the Arrow type-introspection and casting surface alongside the existing arrow_typeof, arrow_cast, and arrow_metadata wrappers.

What changes are included in this PR?

  • Expose the new functions through datafusion.functions.
  • Add doctests and unit tests for the new functions.
  • skills/datafusion_python/SKILL.md: list the new functions and document the cast_to_type kwarg behavior so users understand the single-entry-point design.

Are there any user-facing changes?

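Yes. Four new public functions in datafusion.functions, covering the five upstream functions (try_cast_to_type is reached through cast_to_type with try_cast=True):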

  • arrow_field(expr)
  • arrow_try_cast(expr, data_type)
  • cast_to_type(value, type_ref, *, try_cast=False)
  • with_metadata(expr, metadata)

No breaking changes.

timsaucer and others added 8 commits June 4, 2026 12:56

Adds Python bindings for five scalar functions from
datafusion::functions::expr_fn that were not previously surfaced:

- arrow_field: returns a struct describing an expression's Arrow field
  (name, data_type, nullable, metadata).
- arrow_try_cast: like arrow_cast but yields NULL on cast failure.
- cast_to_type / try_cast_to_type: casts a value to the type of a
  reference expression. These are exposed as a single Python entry
  point cast_to_type(value, type_ref, *, try_cast=False); the kwarg
  switches between the strict and try variants.
- with_metadata: attach Arrow field metadata; the inverse of
  arrow_metadata. Accepts a dict[str, str] for ergonomics.
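
The single-entry-point design for `cast_to_type` can be sketched in plain Python. This is only an illustration of the kwarg dispatch: the two helpers below are hypothetical stand-ins for the strict and try variants and operate on ordinary Python values, not on DataFusion expressions. Only the `cast_to_type(value, type_ref, *, try_cast=False)` signature is taken from this PR.

```python
from typing import Any, Optional


def _strict_cast(value: Any, target: type) -> Any:
    """Hypothetical stand-in for the strict variant: raises on failure."""
    return target(value)


def _try_cast(value: Any, target: type) -> Optional[Any]:
    """Hypothetical stand-in for the try variant: returns None on failure."""
    try:
        return target(value)
    except (TypeError, ValueError):
        return None


def cast_to_type(value: Any, type_ref: Any, *, try_cast: bool = False) -> Any:
    """Cast ``value`` to the type of ``type_ref`` under one public name."""
    # The target type is taken from the reference value, not named directly.
    target = type(type_ref)
    # The keyword-only flag selects which variant runs.
    caster = _try_cast if try_cast else _strict_cast
    return caster(value, target)
```

Making `try_cast` keyword-only keeps call sites explicit about which failure behavior they want.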

Updates skills/datafusion_python/SKILL.md to list the new functions
and documents the cast_to_type kwarg behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous commit exposed cast_to_type and try_cast_to_type as two
separate pyo3 bindings and unified them in the Python wrapper via a
try_cast kwarg. That left try_cast_to_type in datafusion._internal
without a matching public Python name, breaking
test_datafusion_missing_exports.

Move the dispatch into the rust binding: cast_to_type now takes a
try_cast kwarg and selects between functions::expr_fn::cast_to_type
and try_cast_to_type internally. Only one pyo3 binding is registered,
so the wrapper-coverage check passes and the Python entrypoint is
unchanged.
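
To show why a second binding without a public wrapper trips an export check, here is a minimal sketch. It assumes the check compares public-looking names in the internal module against names in the Python wrapper module; the real `test_datafusion_missing_exports` may work differently. `missing_exports` and the namespace objects are hypothetical.

```python
from types import SimpleNamespace


def missing_exports(internal, public) -> list[str]:
    """Return names present in ``internal`` that have no match in ``public``."""
    internal_names = {n for n in vars(internal) if not n.startswith("_")}
    public_names = {n for n in vars(public) if not n.startswith("_")}
    return sorted(internal_names - public_names)


# Hypothetical layout after the first commit: two bindings, one wrapper.
internal_before = SimpleNamespace(cast_to_type=object(), try_cast_to_type=object())

# Hypothetical layout after this commit: one binding, one wrapper.
internal_after = SimpleNamespace(cast_to_type=object())

# The public wrapper module exposes a single name in both cases.
public = SimpleNamespace(cast_to_type=object())
```

Under these assumptions, `missing_exports(internal_before, public)` reports `try_cast_to_type`, and the list is empty once only one binding is registered.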

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirrors arrow_cast: arrow_try_cast now accepts `pa.DataType` in addition
to `str` and `Expr`. Adds `Expr.try_cast(pa.DataType)` PyO3 binding for
the pyarrow-type routing path.
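
One plausible shape for routing on the three accepted argument kinds is sketched below. The `DataType` and `Expr` classes are hypothetical stand-ins for `pa.DataType` and `datafusion.Expr` so the example runs without either library, and the returned labels are illustrative rather than the names used in the bindings.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DataType:
    """Hypothetical stand-in for pa.DataType."""

    name: str


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for datafusion.Expr."""

    text: str


def route_target_type(data_type: Union[str, DataType, Expr]) -> str:
    """Report which path a target-type argument would take."""
    if isinstance(data_type, DataType):
        # A type object goes through an Expr.try_cast-style call.
        return "expr_try_cast"
    if isinstance(data_type, str):
        # String syntax is wrapped as a string literal expression.
        return "string_literal"
    if isinstance(data_type, Expr):
        # An expression is passed through unchanged.
        return "passthrough"
    raise TypeError(f"unsupported target type: {type(data_type).__name__}")
```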

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Empty `metadata` dict now returns the input expression unchanged
(previously bubbled an opaque DataFusion error about minimum arg
count). Empty keys raise `ValueError` to match the docstring contract.
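
The two edge cases can be sketched as a small validation step. `prepare_metadata_call` is a hypothetical helper, and flattening the dict into alternating key/value arguments is an assumption about how the call is assembled; only the empty-dict no-op and the empty-key `ValueError` come from this commit.

```python
from typing import Any


def prepare_metadata_call(expr: Any, metadata: dict[str, str]) -> Any:
    """Validate ``metadata`` and describe the resulting call."""
    if not metadata:
        # Nothing to attach: hand the input back unchanged.
        return expr
    if any(key == "" for key in metadata):
        raise ValueError("metadata keys must be non-empty strings")
    # Assumed layout: alternating key/value arguments after the expression.
    flat_args = [item for pair in metadata.items() for item in pair]
    return ("with_metadata", expr, *flat_args)
```

Checking for the empty dict first means the caller never sees an error about argument counts from a lower layer.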

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previous doctest set metadata on the input field but only checked the
name — the metadata setup was dead. Now the example asserts the full
returned struct (name, data_type, nullable, metadata) so the demo
shows what the function actually produces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ith_metadata

Mirrors the existing test_arrow_cast pattern. Covers:
- arrow_try_cast: string-syntax, pa.DataType, and null-on-failure paths
- arrow_field: full returned struct shape (name, data_type, nullable, metadata)
- cast_to_type: type-from-expr happy path and try_cast=True null behavior
- with_metadata: round-trip through arrow_metadata, empty-dict no-op, and
  empty-key ValueError

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Folds the previous four cast tests (arrow_cast + arrow_try_cast × str
+ pyarrow target type) into a single parameterized test that runs both
functions across all five target-type variants. Collapses the two
cast_to_type tests (happy path + try_cast=True) into one parameterized
test, and parameterizes arrow_try_cast null-on-failure over both
target-type syntaxes. 7 test functions, 19 cases — net less code, same
coverage.
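
The cross-product behind the first parameterized test can be sketched without a test framework. The variant labels are placeholders, since the five target-type variants are not named here, and `build_cases` is a hypothetical helper.

```python
from itertools import product

# The two functions exercised by the combined cast test.
CAST_FUNCTIONS = ("arrow_cast", "arrow_try_cast")

# Placeholder labels for the five target-type variants.
TARGET_VARIANTS = tuple(f"target_variant_{i}" for i in range(1, 6))


def build_cases() -> list[tuple[str, str]]:
    """Pair every function with every target-type variant."""
    return list(product(CAST_FUNCTIONS, TARGET_VARIANTS))
```

Two functions across five variants gives ten of the cases from one test function.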

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a one-line cross-reference so users with a known target type
reach for arrow_cast / arrow_try_cast instead of building a sentinel
expression to feed cast_to_type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>