Skip to content

Code generation for ThingClient subclasses#89

Draft
rwb27 wants to merge 8 commits into
mainfrom
client-code-generation
Draft

Code generation for ThingClient subclasses#89
rwb27 wants to merge 8 commits into
mainfrom
client-code-generation

Conversation

@rwb27

@rwb27 rwb27 commented Dec 4, 2024

Copy link
Copy Markdown
Collaborator

Currently, ThingClient subclasses are generated on-the-fly from a Thing Description. This "gets the job done" and means it's always possible to have a client that's up to date with the server. However, it would be nice to support static analysis, e.g. type checking, autocompletion, etc. with importable client classes for specific instruments.

Functionality

This PR implements client code generation, specifically:

  • A client class is created, subclassing ThingClient and named as per the title of the Thing Description.
  • Properties are added using their names from the TD
    • Property types are converted to Python type hints (with very basic conversion)
    • Read-only properties are made read-only. Write-only properties still have a read method.
    • Implementations use get_property, set_property as provided by ThingClient
  • Actions are added using names from the TD
    • The input schema is broken down into its properties (i.e. arguments), and each property's schema is converted to an argument with a type hint and default.
    • Required properties of the input schema become required arguments
    • Additional keyword arguments are always accepted and passed to the endpoint - TD DataSchema objects don't distinguish whether extra properties are allowed or not.
    • The output schema is converted to a return type hint
    • The function body calls invoke_action.
    • ... is used to distinguish between explicitly set None values, and unspecified values (i.e. not included in the JSON). This allows the server to fill in default values, which is safer than trying to generate default values in the Python code.
  • ast is used to generate a syntax tree that's then turned into code. This has the significant advantage that:
    • the generated code should always be valid Python
    • it should be very robust against attack.

Schema conversion

Conversion from DataSchema to Python types is implemented, and can generate client types for everything in the current OFM simulation. Julian's more recent PR (#300) independently implemented very similar logic, which I take as a good sign.

Types are converted recursively, with support for:

  • integer -> int
  • number -> float
  • boolean -> bool
  • string -> str
  • anyOf -> Union
  • null -> None
  • array -> list (if an item type is specified, it's converted recursively) or tuple (if several item types are specified)
  • object -> dict[str, Any] or a generated pydantic.BaseModel subclass (if properties are specified)
  • Anything else -> Any

object schemas will be recursively converted to pydantic models. This can be turned off by disabling recursion, and more fine-grained control may be helpful.

Key differences from #300 are:

  • array schemas are converted to tuples if items is an array of Data Schema objects, as per JSONSchema 2019 (there's a link in the relevant comment in DataSchemaConverter.convert().
  • object schemas are converted to dynamically-generated pydantic.BaseModel subclasses, if they specify properties. If properties are not specified, object is still rendered as dict.
  • By default, this implementation is recursive, and will nest up to ten levels of type conversion. This may be increased or decreased by passing an argument to the constructor of DataSchemaConverter (though that's not yet exposed in e.g. ThingClient.from_url().

Still to do

Before this can be merged, we need to add a few things:

  • More testing, including all the data types
  • Actual use of the created class. This is done, because ThingClient.from_url now uses this module.
  • Better deduplication of models.
  • Consider how to expose (or not generate) models when returning only a client object, e.g. in ThingClient.from_url.
  • Recognition of Blob in the output (rather than autogenerating a model for it).
  • Expose controls for recursion and/or model generation in ThingClient.from_url so we can revert to something closer to the current client if that's desired.
  • It would be nice to check that the methods/properties of the generated client class match the Thing Description of the thing we're connecting to. This may be left for the future.

Future work

In a subsequent PR, it would be very good to:

  • Allow import of some input/output types, so we can re-use existing models rather than autogenerate copies.
  • Process values that come from the server, as a means of "reinflating" them to Python objects (particularly useful for np.array). This probably uses pydantic validation, probably with an option to skip it if needed.
  • Make it possible to wrap a Thing (analogous to DirectThingClient) to ease the transition between client-side and server-side code.

@VigneshVSV

VigneshVSV commented Dec 4, 2024

Copy link
Copy Markdown

I remember us talking about this. Now that I am somewhat finished with my obligations, I will have deeper look at the latest changes in labthings.

FYI, I already released pydantic based property types from this code base as part of my repository.

@rwb27 rwb27 added this to the v0.0.12 milestone Jul 31, 2025
@rwb27 rwb27 changed the title Draft: Code generation for ThingClient subclasses Code generation for ThingClient subclasses Jul 31, 2025
@rwb27 rwb27 marked this pull request as draft July 31, 2025 08:37
@rwb27 rwb27 modified the milestones: v0.0.12, v0.0.13 Nov 14, 2025
@rwb27

rwb27 commented Nov 14, 2025

Copy link
Copy Markdown
Collaborator Author

I'm going to bump this to 0.0.13 as it's not yet critical. I am keen to get in in though, as it will really improve the experience of using lab things from e.g. a python notebook.

@rwb27 rwb27 modified the milestones: v0.1.0, v0.1.1, v0.2.0 Mar 3, 2026
@rwb27 rwb27 modified the milestones: v0.2.0, v0.3.0 Apr 2, 2026
rwb27 added 2 commits June 4, 2026 12:45
This generates Python code for a module containing a client.
The Python code generated for the test thing executes successfully, though
the name of the class is not quite right yet.
Dataschemas of type Object that have defined properties are now
converted to Pydantic models.

This should allow much better autocompletion.

The inconsistency of using `dict[str, any]` as a fallback rankles, but I think it's the best option.
@rwb27 rwb27 force-pushed the client-code-generation branch from 2c68771 to bff6809 Compare June 4, 2026 16:20
This should be significantly more secure: I believe it should now be
impossible for a malcious Thing Description to create dangerous Python code.

In doing this, I've simplified the way the client creates properties - I need
to check if it type hints them correctly.

There are still a few things to fix:
* Default values probably want to be `...` rather than constants, so they're
  filled in server-side.
* This doesn't yet validate/convert return values.
* This isn't tested enough and probably doesn't actually work!
* We need configurability: as a minimum, recursion limits and fallback to
  `Any` so it fails gracefully if given a bad TD.
@rwb27 rwb27 force-pushed the client-code-generation branch from bff6809 to 4a46348 Compare June 9, 2026 14:11
rwb27 added 2 commits June 10, 2026 23:32
This commit adds:
* a limit on recursion
* a bunch of tidying up
* many more comments
* cleaner generated code
* generated docstrings for actions, properties, and Things.
* thing client properties now derive from BaseDescriptor - this required a few changes, but makes the generated code neater.
* BaseDescriptor may now be defined on non-Thing classes.
This moves the single file out of the folder, keeping the module name the same.

I've also provided the `type_params` argument on Python > 3.11.
@barecheck

barecheck Bot commented Jun 10, 2026

Copy link
Copy Markdown

Barecheck - Code coverage report

Total: 95.9%

Your code coverage diff: -1.07% ▾

Uncovered files and lines
FileLines
src/labthings_fastapi/base_descriptor.py181, 195-196
src/labthings_fastapi/client/__init__.py41, 77, 80-81, 237, 286-289, 403
src/labthings_fastapi/code_generation.py41-42, 138, 151, 182, 299, 303, 323-324, 331, 333, 368-369, 388-390, 463, 468-469, 484-485, 487, 598, 602, 697, 700, 725-726, 758-766, 769-770, 774, 779, 784-785

This compares unparsed strings rather than AST nodes, as the latter
don't properly show as equal.
`ast.compare` will help here, but not
until we drop a few Python versions.
@rwb27

rwb27 commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator Author

I think this is now sufficiently close to complete that it's better than the existing ThingClient implementation.

The unpinned tests are failing because of a new Ruff rule, this should be fixed separately (and I'm trying not to establish a precedent that upstream changes hold up merge requests if the pinned tests are passing).

The OFM test failure is, I believe, a bug in the OFM test that's uncovered by these changes, it should be a trivial fix in a feature branch.

I'm very aware this needs a lot more test code, and I'd be very grateful for advice from @bprobert97 on how to do that. I think what I really want is a way to randomly construct challenging type hints, but I think that would be a pig to implement, unless hypothesis is very much cleverer than I guessed.

For a first pass of review, I think I'm mostly hoping for feedback on:

  1. Is code generation using ast from the Thing Description safe? I think it's OK; unless there's a bug in ast.unparse there's no way to achieve code execution from any string passed in from the Thing Description.
  2. Is writing to a temporary file and loading it as a module a reasonable way to dynamically generate a subclass? It would also be possible to use eval, I don't think there's a huge change in safety but it would break the current implementation of properties, and potentially stop some documentation tools from picking up their docstrings (as this is commonly done by inspecting the source, which you can't do for modules created with eval).

I can't pretend I don't have an opinion here - I like this implementation. However, I'm also aware this sort of code generation needs a lot of care, and I do want to get it right.

@rwb27

rwb27 commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator Author

For the record, I've also run the generated code for the microscope client through pdoc and at first glance the docs look really pretty nice.

rwb27 added 2 commits June 11, 2026 10:46
One-line docstrings end up in both the `title` and `description` fields.
We now deduplicate this, so the generated docstrings match the
originals.
This executes `ruff` in a subprocess, which `ruff` flags as potentially insecure.
I've thought through this carefully and I'm happy we can ignore the
warning, as I don't see any opportunity
to inject malicious input.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants