Code generation for ThingClient subclasses#89
Conversation
|
I remember us talking about this. Now that I am somewhat finished with my obligations, I will have deeper look at the latest changes in labthings. FYI, I already released pydantic based property types from this code base as part of my repository. |
|
I'm going to bump this to 0.0.13 as it's not yet critical. I am keen to get in in though, as it will really improve the experience of using lab things from e.g. a python notebook. |
This generates Python code for a module containing a client. The Python code generated for the test thing executes successfully, though the name of the class is not quite right yet.
Dataschemas of type Object that have defined properties are now converted to Pydantic models. This should allow much better autocompletion. The inconsistency of using `dict[str, any]` as a fallback rankles, but I think it's the best option.
2c68771 to
bff6809
Compare
This should be significantly more secure: I believe it should now be impossible for a malcious Thing Description to create dangerous Python code. In doing this, I've simplified the way the client creates properties - I need to check if it type hints them correctly. There are still a few things to fix: * Default values probably want to be `...` rather than constants, so they're filled in server-side. * This doesn't yet validate/convert return values. * This isn't tested enough and probably doesn't actually work! * We need configurability: as a minimum, recursion limits and fallback to `Any` so it fails gracefully if given a bad TD.
bff6809 to
4a46348
Compare
This commit adds: * a limit on recursion * a bunch of tidying up * many more comments * cleaner generated code * generated docstrings for actions, properties, and Things. * thing client properties now derive from BaseDescriptor - this required a few changes, but makes the generated code neater. * BaseDescriptor may now be defined on non-Thing classes.
This moves the single file out of the folder, keeping the module name the same. I've also provided the `type_params` argument on Python > 3.11.
Barecheck - Code coverage reportTotal: 95.9%Your code coverage diff: -1.07% ▾ Uncovered files and lines
|
This compares unparsed strings rather than AST nodes, as the latter don't properly show as equal. `ast.compare` will help here, but not until we drop a few Python versions.
|
I think this is now sufficiently close to complete that it's better than the existing The unpinned tests are failing because of a new Ruff rule, this should be fixed separately (and I'm trying not to establish a precedent that upstream changes hold up merge requests if the pinned tests are passing). The OFM test failure is, I believe, a bug in the OFM test that's uncovered by these changes, it should be a trivial fix in a feature branch. I'm very aware this needs a lot more test code, and I'd be very grateful for advice from @bprobert97 on how to do that. I think what I really want is a way to randomly construct challenging type hints, but I think that would be a pig to implement, unless hypothesis is very much cleverer than I guessed. For a first pass of review, I think I'm mostly hoping for feedback on:
I can't pretend I don't have an opinion here - I like this implementation. However, I'm also aware this sort of code generation needs a lot of care, and I do want to get it right. |
|
For the record, I've also run the generated code for the microscope client through |
One-line docstrings end up in both the `title` and `description` fields. We now deduplicate this, so the generated docstrings match the originals.
This executes `ruff` in a subprocess, which `ruff` flags as potentially insecure. I've thought through this carefully and I'm happy we can ignore the warning, as I don't see any opportunity to inject malicious input.
Currently,
ThingClientsubclasses are generated on-the-fly from a Thing Description. This "gets the job done" and means it's always possible to have a client that's up to date with the server. However, it would be nice to support static analysis, e.g. type checking, autocompletion, etc. with importable client classes for specific instruments.Functionality
This PR implements client code generation, specifically:
ThingClientand named as per thetitleof the Thing Description.get_property,set_propertyas provided byThingClientDataSchemaobjects don't distinguish whether extra properties are allowed or not.invoke_action....is used to distinguish between explicitly setNonevalues, and unspecified values (i.e. not included in the JSON). This allows the server to fill in default values, which is safer than trying to generate default values in the Python code.astis used to generate a syntax tree that's then turned into code. This has the significant advantage that:Schema conversion
Conversion from
DataSchemato Python types is implemented, and can generate client types for everything in the current OFM simulation. Julian's more recent PR (#300) independently implemented very similar logic, which I take as a good sign.Types are converted recursively, with support for:
integer->intnumber->floatboolean->boolstring->stranyOf->Unionnull->Nonearray->list(if an item type is specified, it's converted recursively) ortuple(if several item types are specified)object->dict[str, Any]or a generatedpydantic.BaseModelsubclass (ifpropertiesare specified)Anyobjectschemas will be recursively converted topydanticmodels. This can be turned off by disabling recursion, and more fine-grained control may be helpful.Key differences from #300 are:
arrayschemas are converted to tuples ifitemsis an array of Data Schema objects, as per JSONSchema 2019 (there's a link in the relevant comment inDataSchemaConverter.convert().objectschemas are converted to dynamically-generatedpydantic.BaseModelsubclasses, if they specify properties. If properties are not specified,objectis still rendered asdict.DataSchemaConverter(though that's not yet exposed in e.g.ThingClient.from_url().Still to do
Before this can be merged, we need to add a few things:
ThingClient.from_urlnow uses this module.ThingClient.from_url.Blobin the output (rather than autogenerating a model for it).ThingClient.from_urlso we can revert to something closer to the current client if that's desired.Future work
In a subsequent PR, it would be very good to:
np.array). This probably usespydanticvalidation, probably with an option to skip it if needed.Thing(analogous toDirectThingClient) to ease the transition between client-side and server-side code.