[C++][Gandiva] castVARCHAR(decimal128) can corrupt native memory and return invalid buffers. #50140

@lriggs

Describe the bug

The Gandiva castVARCHAR_decimal128_int64 function path can corrupt native
memory and crash the host process (SIGSEGV) when the arena allocation for the
output string fails — for example when a CAST(decimal AS VARCHAR) runs under
memory pressure.

There are three independent problems that combine to produce the crash:

1. castVARCHAR decimal128 entry is missing kCanReturnErrors

In function_registry_string.cc, the castVARCHAR registry entry for
decimal128 is registered with only NativeFunction::kNeedsContext. Unlike the
other error-producing cast/string functions, it does not set
NativeFunction::kCanReturnErrors.

Because of this, generated LLVM code assumes the function can never fail and
skips the post-call error check. Any error the function reports via the context
is silently ignored, and execution continues with whatever (invalid) buffer and
length the function returned.
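
The effect of the missing flag can be illustrated with a small self-contained model. This is only a sketch: `ToyFlags`, `ToyContext`, `ToyResult`, `failing_cast`, and `caller_consumes_result` are hypothetical names invented for illustration, not Gandiva's types or its generated code. It assumes the caller inspects the context for an error only when the function is flagged as able to return one.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical flag bits, loosely modelled on the flag names in the issue.
enum ToyFlags : int32_t { kToyNeedsContext = 1, kToyCanReturnErrors = 2 };

// Hypothetical stand-in for an execution context that carries an error message.
struct ToyContext {
  std::string error;
  bool has_error() const { return !error.empty(); }
};

// Buffer pointer plus length, as a cast-to-string function would hand back.
struct ToyResult {
  const char* data;
  int32_t len;
};

// Toy function that always fails: it records an error, but still returns
// a null buffer together with a stale positive length.
inline ToyResult failing_cast(ToyContext* ctx) {
  ctx->error = "allocation failed";
  return {nullptr, 12};
}

// Models the caller: the context is checked only if the flags say errors
// are possible. Returns true when the (invalid) result would be consumed.
inline bool caller_consumes_result(int32_t flags, ToyContext* ctx) {
  ToyResult r = failing_cast(ctx);
  if ((flags & kToyCanReturnErrors) != 0 && ctx->has_error()) {
    return false;  // error surfaced; the result is discarded
  }
  // Without the flag, execution reaches here even though r.data is null.
  return r.len > 0;
}
```

In this model, the same failing function is harmless when registered with both flags and dangerous when registered with only the context flag, which is the asymmetry described above.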

2. gdv_fn_dec_to_string reports a positive length on allocation failure

In gdv_function_stubs.cc, gdv_fn_dec_to_string writes the output length
before it checks whether the allocation succeeded:

*dec_str_len = static_cast<int32_t>(dec_str.length());   // positive length
char* ret = reinterpret_cast<char*>(gdv_fn_context_arena_malloc(context, *dec_str_len));
if (ret == nullptr) {
  // error is set, but *dec_str_len is still positive
  return nullptr;
}

When the allocation fails, the function returns nullptr while *dec_str_len
still holds a positive value. The caller then copies from a null/invalid buffer
using that positive length, i.e. effectively memcpy(dst, nullptr, positive_len),
which is undefined behavior and crashes.
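
One possible shape for a fix is to publish the length only after the allocation is known to have succeeded. The following is a minimal sketch under stated assumptions, not the actual patch: `toy_arena_malloc` and `toy_dec_to_string` are hypothetical stand-ins, and plain `malloc` replaces the arena so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical allocator standing in for the arena; returns nullptr when
// `fail` is set, to simulate memory pressure.
inline char* toy_arena_malloc(bool fail, int32_t size) {
  if (fail || size < 0) return nullptr;
  return static_cast<char*>(std::malloc(size > 0 ? static_cast<std::size_t>(size) : 1));
}

// Sketch: the output length starts at 0 and is set to the real length only
// once the buffer is valid, so a failed call never reports a positive length.
inline const char* toy_dec_to_string(const std::string& dec_str, bool fail_alloc,
                                     int32_t* out_len) {
  *out_len = 0;
  const int32_t len = static_cast<int32_t>(dec_str.length());
  char* ret = toy_arena_malloc(fail_alloc, len);
  if (ret == nullptr) {
    return nullptr;  // *out_len remains 0
  }
  std::memcpy(ret, dec_str.data(), static_cast<std::size_t>(len));
  *out_len = len;
  return ret;
}
```

A caller should still check the returned pointer before copying; zeroing the length only removes the stale positive value, it does not make a null source pointer valid.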

3. castVARCHAR_decimal128_int64 does not validate its output length

In precompiled/decimal_wrapper.cc, castVARCHAR_decimal128_int64 computes the
truncated length and dereferences/returns the buffer from gdv_fn_dec_to_string
without:

  • validating that the requested output length (out_len_param) is non-negative, or
  • handling the case where the upstream allocation failed.

A negative out_len_param flows straight through into the length used by the
copy. When the memory copy routine interprets that negative value as an
unsigned size, it becomes a huge byte count.
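
The signed-to-unsigned conversion, and one way to guard against it, can be sketched as follows. The helper names `as_copy_size` and `toy_truncated_len` are hypothetical, and returning 0 for unusable inputs is an assumption made for this example; a real fix would more likely report an error through the context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Shows what a negative 32-bit length becomes when converted to the
// unsigned size type that memory copy routines take.
inline std::size_t as_copy_size(int32_t len) {
  return static_cast<std::size_t>(len);
}

// Hypothetical guard: returns the number of bytes that are safe to copy,
// or 0 when the source buffer is null or either length is not positive.
inline int32_t toy_truncated_len(const char* src, int32_t src_len,
                                 int64_t out_len_param) {
  if (src == nullptr || src_len <= 0 || out_len_param <= 0) return 0;
  return out_len_param < src_len ? static_cast<int32_t>(out_len_param) : src_len;
}
```

With this guard, `toy_truncated_len("12.5", 4, -3)` yields 0 instead of passing -3 on to the copy, and a null source from a failed upstream allocation is handled the same way.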

Component(s)

C++, Gandiva
