feat: support multimodal (image) tool results (#363) #666
Detect when a tool returns an Array<ContentPart> and pass it through to the adapter instead of JSON.stringify-ing it, so multimodal tool outputs (e.g. an image) can reach the model. Detection is structural (isContentPartArray) and opt-in by shape; strings and all other return values serialize exactly as before. AG-UI stream events stay string-only per spec — the array travels on the tool ModelMessage.
ai-client redeclares ToolResultPart; widen its content to string | Array<ContentPart> to stay compatible with @tanstack/ai, and handle array content in the devtools fixture-hydration path.
…uts (#363)
Convert an Array<ContentPart> tool result into each provider's native multimodal tool-output format: OpenAI Responses function_call_output.output, Anthropic tool_result content blocks, and Gemini functionResponse.parts. The Chat Completions path keeps the documented stringify fallback (its API has no multimodal tool message).
Add a deterministic wire-assertion spec proving the OpenAI Responses adapter sends a structured input_image in function_call_output (Anthropic/Gemini end-to-end + unit-covered). Add the /image-tool-repro example: a server tool returns an image of a secret number; the model now reads it back.
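The structural detection described in the commit messages above can be sketched roughly as follows. This is a minimal illustration, not the code in packages/ai/src/utilities/tool-result.ts: the ContentPart shapes are simplified stand-ins, and the handling of strings and of values that JSON.stringify cannot serialize is an assumption.

```typescript
// Simplified stand-ins for the library's ContentPart types (assumed shapes).
type ContentPart =
  | { type: 'text'; content: string }
  | { type: 'image'; source: { type: 'url' | 'data'; value: string } }

// Structural guard: true only for objects that look like a known part.
function isContentPart(value: unknown): value is ContentPart {
  if (typeof value !== 'object' || value === null) return false
  const part = value as Record<string, unknown>
  if (part.type === 'text') return typeof part.content === 'string'
  if (part.type === 'image') {
    const source = part.source as Record<string, unknown> | null | undefined
    return (
      typeof source === 'object' &&
      source !== null &&
      (source.type === 'url' || source.type === 'data') &&
      typeof source.value === 'string'
    )
  }
  return false
}

// Opt-in by shape: a non-empty array whose every element passes the guard.
function isContentPartArray(value: unknown): value is Array<ContentPart> {
  return Array.isArray(value) && value.length > 0 && value.every(isContentPart)
}

// Multimodal arrays pass through; everything else becomes a string.
function normalizeToolResult(result: unknown): string | Array<ContentPart> {
  if (isContentPartArray(result)) return result
  if (typeof result === 'string') return result // assumption: strings are not re-quoted
  // JSON.stringify returns undefined at runtime for e.g. undefined or functions.
  const json = JSON.stringify(result) as string | undefined
  return json ?? 'null' // assumption: fallback chosen for this sketch only
}
```

Because an empty array or an array with one malformed element fails the guard, such values fall back to the string path rather than reaching an adapter as a partial multimodal payload.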
No actionable comments were generated in the recent review.
📝 Walkthrough
Enables tools to return multimodal results (Array<ContentPart>), adds runtime guards and normalizeToolResult, updates tool execution and stream processing to preserve structured arrays, adapts the OpenAI, Anthropic, and Gemini adapters to emit provider-native multimodal payloads, updates tests and an example repro, and stringifies non-string content for AG-UI events.
Sequence Diagram
sequenceDiagram
participant User
participant Chat as chat()
participant Tool as ToolExecution
participant Adapter as ProviderAdapter
participant Model as LLM
User->>Chat: request that triggers a tool
Chat->>Tool: execute tool()
Tool->>Tool: returns string or Array<ContentPart>
Tool->>Chat: ToolCallManager receives result
Chat->>Chat: normalizeToolResult(result)
Chat->>Chat: emit TOOL_CALL_RESULT (wireContent string)
Chat->>Adapter: emit tool-role ModelMessage (structured array preserved)
Adapter->>Adapter: convert ContentPart[] to provider-specific format
Adapter->>Model: send request with multimodal output
Model->>Chat: respond using tool output
🚀 Changeset Version Preview
5 package(s) bumped directly, 25 bumped as dependents.
View your CI Pipeline Execution ↗ for commit 430fca2
@tanstack/ai
@tanstack/ai-anthropic
@tanstack/ai-client
@tanstack/ai-code-mode
@tanstack/ai-code-mode-skills
@tanstack/ai-devtools-core
@tanstack/ai-elevenlabs
@tanstack/ai-event-client
@tanstack/ai-fal
@tanstack/ai-gemini
@tanstack/ai-grok
@tanstack/ai-groq
@tanstack/ai-isolate-cloudflare
@tanstack/ai-isolate-node
@tanstack/ai-isolate-quickjs
@tanstack/ai-ollama
@tanstack/ai-openai
@tanstack/ai-openrouter
@tanstack/ai-preact
@tanstack/ai-react
@tanstack/ai-react-ui
@tanstack/ai-solid
@tanstack/ai-solid-ui
@tanstack/ai-svelte
@tanstack/ai-utils
@tanstack/ai-vue
@tanstack/ai-vue-ui
@tanstack/openai-base
@tanstack/preact-ai-devtools
@tanstack/react-ai-devtools
@tanstack/solid-ai-devtools
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/ai/tests/tool-result.test.ts (1)
1-68: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Move this unit test alongside the source module.
This is a unit test for ../src/utilities/tool-result, but it is placed in packages/ai/tests/ instead of next to the source file. Please colocate it with the source module per repo rule.
As per coding guidelines, "Place unit tests in *.test.ts files alongside source files".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/ai/tests/tool-result.test.ts` around lines 1 - 68, Tests for utilities/tool-result (testing isContentPart, isContentPartArray, normalizeToolResult) are misplaced in the tests directory; move the tool-result.test.ts file to be colocated with the source module file that exports those functions (the utilities/tool-result module), update its imports to use the local relative path (e.g., import from './tool-result' instead of the cross-package path), and run the test suite to ensure imports still resolve and no path references remain to the old tests location.
packages/ai/src/activities/chat/tools/tool-calls.ts (1)
226-247: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Keep TOOL_CALL_END.result stringified here.
normalizeToolResult(result) can return ContentPart[], but executeTools() now yields that value directly as ToolCallEndEvent.result. This class is public and documented as emitting TOOL_CALL_END for client visibility, so direct ToolCallManager consumers would now see spec-invalid non-string wire payloads. Preserve the array on the returned role: 'tool' message, but serialize the event field before yielding it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/ai/src/activities/chat/tools/tool-calls.ts` around lines 226 - 247, The TOOL_CALL_END event is yielding a non-string result when normalizeToolResult(result) returns ContentPart[]; update executeTools() so that before emitting the TOOL_CALL_END (the object cast as ToolCallEndEvent) you ensure toolResultContent is a string, e.g. if normalizeToolResult(...) returns an array or non-string, JSON.stringify it (preserve the original array on the role: 'tool' message only) so ToolCallEndEvent.result is always a serialized string; adjust the catch branch and the branch for missing execute similarly to guarantee TOOL_CALL_END.result is a string.
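One way to picture the suggested split between the string-only event and the structured tool message is the sketch below. The event and message interfaces, the toWireString helper, and finishToolCall are hypothetical names for illustration; they are not the actual ToolCallManager API.

```typescript
// Simplified stand-in for the library's ContentPart type (assumed shape).
type ContentPart =
  | { type: 'text'; content: string }
  | { type: 'image'; source: { type: 'url' | 'data'; value: string } }

type ToolResult = string | Array<ContentPart>

// Hypothetical shapes: the stream event carries a string, the message may carry an array.
interface ToolCallEndEvent {
  type: 'TOOL_CALL_END'
  toolCallId: string
  result: string
}

interface ToolMessage {
  role: 'tool'
  toolCallId: string
  content: ToolResult
}

// Serialize structured content only for the wire event.
function toWireString(content: ToolResult): string {
  return typeof content === 'string' ? content : JSON.stringify(content)
}

// Produce both artifacts from one normalized result.
function finishToolCall(
  toolCallId: string,
  content: ToolResult,
): { event: ToolCallEndEvent; message: ToolMessage } {
  return {
    event: { type: 'TOOL_CALL_END', toolCallId, result: toWireString(content) },
    message: { role: 'tool', toolCallId, content },
  }
}
```

The point of the split is that stream consumers always see a string, while the adapter still receives the original array by reference.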
🧹 Nitpick comments (5)
packages/ai-anthropic/tests/tool-result-multimodal.test.ts (1)
1-4: ⚡ Quick win
Move this unit test alongside the source adapter file.
This new unit test is under packages/ai-anthropic/tests/, but the repository rule requires colocated *.test.ts files next to source.
As per coding guidelines **/*.test.ts: Place unit tests in *.test.ts files alongside source files.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/ai-anthropic/tests/tool-result-multimodal.test.ts` around lines 1 - 4, Move the test file from packages/ai-anthropic/tests/tool-result-multimodal.test.ts to be colocated with the source adapter (next to ../src/adapters/text), so it lives alongside AnthropicTextAdapter; update the import paths in the test to use relative paths from the new location (ensure imports like AnthropicTextAdapter and any test helpers or '`@tanstack/ai`' remain correct) and remove the separate tests/ directory usage to comply with the repository rule that *.test.ts files sit next to their source.
packages/ai-gemini/tests/tool-result-multimodal.test.ts (1)
1-4: ⚡ Quick win
Place this unit test next to the Gemini adapter source file.
The new test is in packages/ai-gemini/tests/, but the project guideline requires *.test.ts unit tests to live alongside the implementation.
As per coding guidelines **/*.test.ts: Place unit tests in *.test.ts files alongside source files.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/ai-gemini/tests/tool-result-multimodal.test.ts` around lines 1 - 4, The unit test file tool-result-multimodal.test.ts must be moved from the tests folder to live alongside the Gemini adapter implementation and its imports updated accordingly; move the test next to the GeminiTextAdapter source file, update the import of GeminiTextAdapter (and any other local imports) to use the correct relative path from the adapter directory (e.g., change '../src/adapters/text' to the local './text' or appropriate relative import), and run the test suite to ensure imports resolve and module scope matches project test-location guidelines.
examples/ts-react-chat/src/routes/api.image-tool-repro.ts (1)
32-34: 💤 Low value
HTTP 499 is a non-standard status code.
Status code 499 is nginx-specific for "Client Closed Request" and not part of the HTTP standard. While it conveys intent, a standard code such as 408 Request Timeout would be more portable across different HTTP clients and proxies.
Alternative: use standard 408
-return new Response(null, { status: 499 })
+return new Response(null, { status: 408 })
Also applies to: 61-63
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/ts-react-chat/src/routes/api.image-tool-repro.ts` around lines 32 - 34, The code returns a non-standard HTTP status 499 when request.signal.aborted (see the check using request.signal.aborted and the Response construction new Response(null, { status: 499 })); replace this with a standard status such as 408 (Request Timeout) for portability and update all other occurrences of the same pattern (the second instance around the other Response(null, { status: 499 }) use) so both places return 408 (or another chosen standard status) instead of 499.
testing/e2e/src/routes/api.multimodal-tool-result-wire.ts (1)
70-78: 💤 Low value
Unconventional: HTTP 200 returned on error.
The error handler returns status 200 with {ok: false, error: "..."} rather than a 4xx/5xx status. While this is non-standard REST practice, it's acceptable for a test-only endpoint where the goal is to distinguish adapter serialization crashes (caught here) from network/HTTP errors (which would return non-200 status codes naturally).
For production endpoints, prefer standard HTTP status codes (e.g., 500 for server errors, 400 for bad requests).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@testing/e2e/src/routes/api.multimodal-tool-result-wire.ts` around lines 70 - 78, The catch block in the route handler in api.multimodal-tool-result-wire.ts currently returns HTTP 200 on error; change the response to an appropriate error status (e.g., 500 for server errors) so the Response created in the catch uses { status: 500, headers: { 'Content-Type': 'application/json' } } instead of 200 and still returns the JSON payload with ok: false and the error message; locate the catch handling code in this file (the try/catch surrounding the handler logic) and update the status value accordingly.
examples/ts-react-chat/scripts/make-repro-image.mjs (1)
12-13: ⚡ Quick win
Manual synchronization risk between SECRET constants.
The comment requires manual sync between this script's SECRET and REPRO_SECRET in src/lib/image-tool-repro.ts. If they drift, the test verdict logic will break but the error won't be immediately obvious.
Consider one of these approaches:
- Export REPRO_SECRET from a shared .mjs module that both files import
- Add a build-time verification script that parses both files and asserts equality
- Move the secret to an environment variable or config file
Example: shared constant module
Create src/lib/repro-secret.mjs:
export const REPRO_SECRET = '473'
Then import in both places:
+import { REPRO_SECRET } from '../src/lib/repro-secret.mjs'
-const SECRET = '473'
+const SECRET = REPRO_SECRET
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/ts-react-chat/scripts/make-repro-image.mjs` around lines 12 - 13, The SECRET value in make-repro-image.mjs is manually duplicated from REPRO_SECRET in src/lib/image-tool-repro.ts; extract the secret into a single shared source (e.g., create a small module that exports REPRO_SECRET or use an environment/config variable) and update both places to import/read that single source; specifically, replace the inline SECRET constant in make-repro-image.mjs and the inline REPRO_SECRET usage in image-tool-repro.ts to reference the shared export (or process.env variable) so they cannot drift, and add a small runtime/build assertion that the imported value is present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4086152f-7a09-429e-8af0-82d6619e137a
⛔ Files ignored due to path filters (1)
- examples/ts-react-chat/public/repro-secret.png is excluded by !**/*.png
📒 Files selected for processing (29)
- .changeset/multimodal-tool-results.md
- examples/ts-react-chat/scripts/make-repro-image.mjs
- examples/ts-react-chat/src/lib/image-tool-repro.ts
- examples/ts-react-chat/src/routeTree.gen.ts
- examples/ts-react-chat/src/routes/api.image-tool-repro.ts
- examples/ts-react-chat/src/routes/image-tool-repro.tsx
- packages/ai-anthropic/src/adapters/text.ts
- packages/ai-anthropic/tests/tool-result-multimodal.test.ts
- packages/ai-client/src/devtools.ts
- packages/ai-client/src/types.ts
- packages/ai-gemini/src/adapters/text.ts
- packages/ai-gemini/tests/tool-result-multimodal.test.ts
- packages/ai/src/activities/chat/index.ts
- packages/ai/src/activities/chat/messages.ts
- packages/ai/src/activities/chat/stream/message-updaters.ts
- packages/ai/src/activities/chat/stream/processor.ts
- packages/ai/src/activities/chat/tools/tool-calls.ts
- packages/ai/src/index.ts
- packages/ai/src/types.ts
- packages/ai/src/utilities/ag-ui-wire.ts
- packages/ai/src/utilities/tool-result.ts
- packages/ai/tests/multimodal-tool-result.test.ts
- packages/ai/tests/tool-result.test.ts
- packages/openai-base/src/adapters/chat-completions-text.ts
- packages/openai-base/src/adapters/responses-text.ts
- packages/openai-base/tests/responses-text.test.ts
- testing/e2e/src/routeTree.gen.ts
- testing/e2e/src/routes/api.multimodal-tool-result-wire.ts
- testing/e2e/tests/multimodal-tool-result-wire.spec.ts
 messageList.push({
   role: 'tool',
-  content: JSON.stringify(part.output),
+  content: normalizeToolResult(part.output),
   toolCallId: part.id,
Reverse conversion still strips multimodal tool outputs.
This path can now emit role: 'tool' messages with ContentPart[], but modelMessageToUIMessage() and modelMessagesToUIMessages() below still run tool-message content through getTextContent(...). Any persisted, replayed, or rehydrated ModelMessage[] will therefore lose image/audio/document tool results on the way back to UIMessage.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/ai/src/activities/chat/messages.ts` around lines 341 - 344, The
conversion path is stripping multimodal tool outputs by always running tool
message content through getTextContent; update modelMessageToUIMessage (and
modelMessagesToUIMessages which delegates to it) to detect when a ModelMessage
has role === 'tool' and content is already a ContentPart[] (or
Array.isArray(msg.content)) and in that case preserve and pass the ContentPart[]
through unchanged instead of calling getTextContent; ensure normalizeToolResult
still produces ContentPart[] where used and that the UIMessage produced for tool
role carries the original array so images/audio/docs are not lost on
persist/rehydrate.
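The shape of the fix requested in this comment might look like the sketch below. It is only an illustration under stated assumptions: the ToolModelMessage and UIToolResultPart interfaces and the toolMessageToUIPart function are hypothetical, and getTextContent here is a simplified stand-in for the library helper of the same name.

```typescript
// Simplified stand-in for the library's ContentPart type (assumed shape).
type ContentPart =
  | { type: 'text'; content: string }
  | { type: 'image'; source: { type: 'url' | 'data'; value: string } }

// Hypothetical tool-role model message.
interface ToolModelMessage {
  role: 'tool'
  toolCallId: string
  content: string | Array<ContentPart>
}

// Hypothetical UI-side part that carries a tool result.
interface UIToolResultPart {
  type: 'tool-result'
  toolCallId: string
  content: string | Array<ContentPart>
}

// Lossy text extraction: keeps only text parts and drops images.
function getTextContent(content: string | Array<ContentPart>): string {
  if (typeof content === 'string') return content
  return content
    .filter((p): p is Extract<ContentPart, { type: 'text' }> => p.type === 'text')
    .map((p) => p.content)
    .join('')
}

// Preserve structured arrays; only plain strings go through text extraction.
function toolMessageToUIPart(msg: ToolModelMessage): UIToolResultPart {
  return {
    type: 'tool-result',
    toolCallId: msg.toolCallId,
    content: Array.isArray(msg.content) ? msg.content : getTextContent(msg.content),
  }
}
```

Running an array through getTextContent alone would return only the text parts, which is the data loss the comment describes; branching on Array.isArray first avoids it.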
…in metrics
ToolResultPart.content was widened to string | Array<ContentPart> for multimodal tool results, breaking the string-typed lookup in metrics.ts. Coerce non-string content to a serialized form so the type check passes while preserving the existing execute_typescript JSON-parse behavior.
Problem
Tool results were always coerced to a string via JSON.stringify() before reaching a provider adapter, making it impossible to return multimodal content (e.g. an image) from a tool, even though the OpenAI Responses, Anthropic, and Gemini APIs all support multimodal tool/function outputs. This blocks use cases like returning a screenshot from a tool so the model can see it.
Closes #363.
Solution
- Structural detection (isContentPartArray / normalizeToolResult in @tanstack/ai): a tool that returns a non-empty array whose every element is a valid ContentPart is passed through unchanged; strings and all other return values serialize exactly as before, so there are no breaking changes.
- ToolResultPart.content widened to string | Array<ContentPart> (in both @tanstack/ai and @tanstack/ai-client).
- Provider-native tool-output formats:
  - OpenAI Responses: function_call_output.output (input_image / input_text)
  - Anthropic: tool_result content blocks (also fixes a latent bug where non-string tool content became '')
  - Gemini: functionResponse.parts (inlineData / fileData), text → response.content
- TOOL_CALL_RESULT.content / TOOL_CALL_END.result stream events remain string-only per spec; the multimodal array travels on the tool message itself.
No SDK upgrades required: the installed openai / @anthropic-ai/sdk / @google/genai versions already type multimodal tool outputs.
Testing
- Tests (… buildToolResultChunks) asserting that ContentPart[] is preserved as an array to the adapter while plain objects still stringify.
- Deterministic wire-assertion spec: the OpenAI Responses adapter sends a structured input_image in function_call_output; Anthropic/Gemini are covered end-to-end and by unit tests (aimock's journal normalizes away multimodal tool content).
- examples/ts-react-chat /image-tool-repro route: a server tool returns an image of a secret number; the model now reads it back correctly (it could not before).
🤖 Generated with Claude Code
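To make the provider mappings in this description more concrete, here is a rough sketch of converting a ContentPart[] into an OpenAI Responses function_call_output item and an Anthropic tool_result block. The ContentPart stand-in (including its mediaType field), the local wire types, and both converter functions are assumptions for illustration; the adapters in this PR may be structured differently, and the Gemini mapping is omitted.

```typescript
// Simplified stand-in; mediaType on data sources is an assumption of this sketch.
type ImageSource =
  | { type: 'url'; value: string }
  | { type: 'data'; value: string; mediaType: string }

type ContentPart =
  | { type: 'text'; content: string }
  | { type: 'image'; source: ImageSource }

// Local approximations of the provider wire shapes.
type OpenAIOutputItem =
  | { type: 'input_text'; text: string }
  | { type: 'input_image'; image_url: string; detail: 'auto' }

type AnthropicBlock =
  | { type: 'text'; text: string }
  | {
      type: 'image'
      source:
        | { type: 'url'; url: string }
        | { type: 'base64'; media_type: string; data: string }
    }

function toOpenAIItem(part: ContentPart): OpenAIOutputItem {
  if (part.type === 'text') return { type: 'input_text', text: part.content }
  const source = part.source
  // Base64 data is sent as a data URL; remote images keep their URL.
  const url =
    source.type === 'url'
      ? source.value
      : `data:${source.mediaType};base64,${source.value}`
  return { type: 'input_image', image_url: url, detail: 'auto' }
}

function toAnthropicBlock(part: ContentPart): AnthropicBlock {
  if (part.type === 'text') return { type: 'text', text: part.content }
  const source = part.source
  if (source.type === 'url') {
    return { type: 'image', source: { type: 'url', url: source.value } }
  }
  return {
    type: 'image',
    source: { type: 'base64', media_type: source.mediaType, data: source.value },
  }
}

// Strings are forwarded as-is; arrays become provider-native content lists.
function toOpenAIFunctionCallOutput(callId: string, content: string | Array<ContentPart>) {
  return {
    type: 'function_call_output' as const,
    call_id: callId,
    output: typeof content === 'string' ? content : content.map(toOpenAIItem),
  }
}

function toAnthropicToolResult(toolUseId: string, content: string | Array<ContentPart>) {
  return {
    type: 'tool_result' as const,
    tool_use_id: toolUseId,
    content: typeof content === 'string' ? content : content.map(toAnthropicBlock),
  }
}
```

Keeping the string branch in each converter mirrors the stated goal that existing string tool results behave as before.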