feat: expose arrow_field, arrow_try_cast, cast_to_type, with_metadata #1568
Open
timsaucer wants to merge 8 commits into
Adds Python bindings for five scalar functions from `datafusion::functions::expr_fn` that were not previously surfaced:
- `arrow_field`: returns a struct describing an expression's Arrow field (name, data_type, nullable, metadata).
- `arrow_try_cast`: like `arrow_cast`, but yields NULL on cast failure.
- `cast_to_type` / `try_cast_to_type`: cast a value to the type of a reference expression. These are exposed as a single Python entry point, `cast_to_type(value, type_ref, *, try_cast=False)`; the kwarg switches between the strict and try variants.
- `with_metadata`: attaches Arrow field metadata; the inverse of `arrow_metadata`. Accepts a `dict[str, str]` for ergonomics.

Updates `skills/datafusion_python/SKILL.md` to list the new functions and documents the `cast_to_type` kwarg behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit exposed `cast_to_type` and `try_cast_to_type` as two separate PyO3 bindings and unified them in the Python wrapper via a `try_cast` kwarg. That left `try_cast_to_type` in `datafusion._internal` without a matching public Python name, which broke `test_datafusion_missing_exports`.

Move the dispatch into the Rust binding: `cast_to_type` now takes a `try_cast` kwarg and selects between `functions::expr_fn::cast_to_type` and `try_cast_to_type` internally. Only one PyO3 binding is registered, so the wrapper-coverage check passes and the Python entry point is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
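The single-entry-point design can be illustrated with a pure-Python analogy. This is not the actual Rust binding: the two underscore-prefixed helpers are hypothetical stand-ins for the upstream functions, and the Python type of a reference value plays the role of the reference expression's Arrow type.

```python
from typing import Any, Callable


def _strict_cast_to_type(value: Any, type_ref: Any) -> Any:
    # Hypothetical stand-in for functions::expr_fn::cast_to_type:
    # convert `value` to the Python type of `type_ref`, raising on failure.
    return type(type_ref)(value)


def _try_cast_to_type(value: Any, type_ref: Any) -> Any:
    # Hypothetical stand-in for try_cast_to_type: same conversion, but
    # return None (the analogue of SQL NULL) instead of raising.
    try:
        return type(type_ref)(value)
    except (TypeError, ValueError):
        return None


def cast_to_type(value: Any, type_ref: Any, *, try_cast: bool = False) -> Any:
    # One public entry point; the keyword-only flag selects the variant,
    # so only one callable has to be registered and exported.
    impl: Callable[[Any, Any], Any] = (
        _try_cast_to_type if try_cast else _strict_cast_to_type
    )
    return impl(value, type_ref)
```

In this analogy, `cast_to_type("42", 0)` returns `42`, `cast_to_type("abc", 0)` raises `ValueError`, and `cast_to_type("abc", 0, try_cast=True)` returns `None`.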
Mirrors `arrow_cast`: `arrow_try_cast` now accepts `pa.DataType` in addition to `str` and `Expr`. Adds an `Expr.try_cast(pa.DataType)` PyO3 binding for the pyarrow-type routing path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
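A minimal sketch of that routing, assuming the wrapper dispatches on the argument type the way `arrow_cast` does. `DataType` and `Expr` below are hypothetical stand-ins (not the pyarrow or DataFusion classes) that render SQL-like text so the chosen path is visible; the real wrapper builds DataFusion expressions instead.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DataType:
    """Hypothetical stand-in for pa.DataType."""
    name: str


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for datafusion.Expr; renders to SQL-like text."""
    sql: str

    def try_cast(self, to: DataType) -> Expr:
        # Stand-in for the new Expr.try_cast(pa.DataType) binding.
        return Expr(f"TRY_CAST({self.sql} AS {to.name})")


def arrow_try_cast(expr: Expr, data_type: Union[Expr, str, DataType]) -> Expr:
    if isinstance(data_type, DataType):
        # pyarrow types are routed through Expr.try_cast.
        return expr.try_cast(data_type)
    if isinstance(data_type, str):
        # Strings are Arrow type names, wrapped as a string literal.
        data_type = Expr(f"'{data_type}'")
    # str and Expr targets both reach the SQL-level arrow_try_cast function.
    return Expr(f"arrow_try_cast({expr.sql}, {data_type.sql})")
```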
An empty `metadata` dict now returns the input expression unchanged (previously this surfaced an opaque DataFusion error about the minimum argument count). Empty keys raise `ValueError`, matching the docstring contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
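A pure-Python sketch of the argument handling this commit describes. The native call `_native_with_metadata` is hypothetical, and its alternating key/value calling convention is an assumption suggested by the "minimum argument count" error, not a confirmed upstream signature.

```python
from typing import Any


def _native_with_metadata(expr: Any, *key_values: str) -> tuple:
    # Hypothetical stand-in for the native function, assumed here to take
    # alternating key/value arguments and to reject a call with none.
    if not key_values or len(key_values) % 2:
        raise RuntimeError("with_metadata expects at least one key/value pair")
    return (expr, dict(zip(key_values[::2], key_values[1::2])))


def with_metadata(expr: Any, metadata: dict[str, str]) -> Any:
    if not metadata:
        # Nothing to attach: return the input unchanged rather than
        # surfacing the native minimum-argument-count error.
        return expr
    if any(key == "" for key in metadata):
        raise ValueError("metadata keys must be non-empty strings")
    # Flatten {"k1": "v1", "k2": "v2"} into "k1", "v1", "k2", "v2".
    flat = [item for pair in metadata.items() for item in pair]
    return _native_with_metadata(expr, *flat)
```

Validating in the wrapper keeps both edge cases out of the native call, so users see a `ValueError` or a no-op instead of an engine error.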
The previous doctest set metadata on the input field but only checked the name, so the metadata setup was never exercised. The example now asserts the full returned struct (name, data_type, nullable, metadata), so the demo shows what the function actually produces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ith_metadata

Mirrors the existing `test_arrow_cast` pattern. Covers:
- `arrow_try_cast`: string-syntax, `pa.DataType`, and null-on-failure paths
- `arrow_field`: full returned struct shape (name, data_type, nullable, metadata)
- `cast_to_type`: type-from-expr happy path and `try_cast=True` null behavior
- `with_metadata`: round-trip through `arrow_metadata`, empty-dict no-op, and empty-key `ValueError`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Folds the previous four cast tests (`arrow_cast` and `arrow_try_cast`, each with a `str` and a pyarrow target type) into a single parameterized test that runs both functions across all five target-type variants. Collapses the two `cast_to_type` tests (happy path and `try_cast=True`) into one parameterized test, and parameterizes the `arrow_try_cast` null-on-failure test over both target-type syntaxes. Result: 7 test functions and 19 cases; less code, same coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a one-line cross-reference so users with a known target type reach for `arrow_cast` / `arrow_try_cast` instead of building a sentinel expression to feed `cast_to_type`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Which issue does this PR close?
No tracking issue; gap surfaced during the v54 upstream coverage audit.
Rationale for this change
Five scalar functions from `datafusion::functions::expr_fn` (DataFusion 54) were not exposed through the Python bindings. They round out the Arrow type-introspection and casting surface alongside the existing `arrow_typeof`, `arrow_cast`, and `arrow_metadata` wrappers.

What changes are included in this PR?
- `skills/datafusion_python/SKILL.md`: list the new functions and document the `cast_to_type` kwarg behavior so users understand the single-entry-point design.

Are there any user-facing changes?
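Yes. Four new public functions in `datafusion.functions`, covering the five upstream functions (`cast_to_type` also dispatches to `try_cast_to_type`):
- `arrow_field(expr)`
- `arrow_try_cast(expr, data_type)`
- `cast_to_type(value, type_ref, *, try_cast=False)`
- `with_metadata(expr, metadata)`

No breaking changes.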