Request for Library: Pydantic

As per the documented process:

Name of the library

A Python library: Pydantic

Purpose of the library

This library brings us:

  • Strongly-typed Python classes,
  • with serialization to & from JSON,
  • including validation of the JSON data.

which is useful for the code dealing with Blender extensions, as well as the Remote Asset Libraries project. Pydantic is very commonly used in the Python world, and so I also expect add-ons to start using this library once it’s bundled with Blender.

Expected benefits this library would bring to Blender

Currently the code dealing with extensions JSON data is scattered throughout the code base. Much of it is passed as dict[str, Any], which can be pretty much anything (could be the JSON data as a whole, or almost any subset of it).

Pydantic parses JSON directly to Python classes. Some advantages of that:

  • IDEs understand the type annotations and provide code completion.
  • Mypy can be used for static type checking.
  • It’s possible to find all references to these classes and their attributes, whereas a dict can hold anything and its uses cannot be tracked by tooling.
  • The Pydantic JSON parser can optionally handle partial JSON. This could be useful for the Remote Asset Libraries project, making it possible to start populating the asset browser while the index file is still downloading.
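As a sketch of that last point (this assumes Pydantic v2.7+, where `pydantic_core` exposes `from_json` with an `allow_partial` flag):

```python
from pydantic_core import from_json

# A truncated JSON document, as it might look mid-download:
partial_json = b'{"name": "Jaap", "items": [{"id": 3}'

# With allow_partial=True the parser returns everything that is
# complete so far, instead of raising an error on the truncated tail:
data = from_json(partial_json, allow_partial=True)
print(data)  # {'name': 'Jaap', 'items': [{'id': 3}]}
```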

Here is a code comparison:

Similar to what’s currently used

import json

# Somehow obtain JSON as string, for example via an
# HTTP request:
json_string = """
{"name": "Jaap", "items": [
    {"id": 3, "descr": "number three"},
    {"id": 5}
]}"""

# Parse via Python's standard library:
index = json.loads(json_string)

# Use the data from the JSON:
print(f"Loaded extension index {index['name']!r}")
for extension in index["items"]:
    # Default values are handled when the data is used.
    descr = extension.get("descr", "default value")
    print(f"  - ID {extension['id']}: {descr!r}")

print(json.dumps(index))

# Prints:
# Loaded extension index 'Jaap'
#   - ID 3: 'number three'
#   - ID 5: 'default value'
# {"name": "Jaap", "items": [{"id": 3, "descr": "number three"}, {"id": 5}]}

With Pydantic

from pydantic import BaseModel, Field, ValidationError

# Declare the data model:
class Extension(BaseModel):
    id: int
    descr: str = Field("default value")

class ExtensionIndex(BaseModel):
    name: str = Field(description="Name of the repository")
    items: list[Extension]


# Somehow obtain JSON as bytes, for example via an
# HTTP request:
json_bytes = b"""
{"name": "Jaap", "items": [
    {"id": 3, "descr": "number three"},
    {"id": 5}
]}"""

# Parse and validate the JSON, and convert to Python classes.
try:
    index = ExtensionIndex.model_validate_json(json_bytes)
except ValidationError as ex:
    raise SystemExit(f"Validation failed: {ex}")


# `index` is now an ExtensionIndex instance:
print(f"Loaded extension index {index.name!r}")
for extension in index.items:
    print(f"  - ID {extension.id}: {extension.descr!r}")
    
print(index.model_dump_json())

# Prints:
# Loaded extension index 'Jaap'
#   - ID 3: 'number three'
#   - ID 5: 'default value'
# {"name":"Jaap","items":[{"id":3,"descr":"number three"},{"id":5,"descr":"default value"}]}
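To also show the validation side, here is a small sketch of what happens when the JSON does not match the model (redeclaring the Extension model from above; the error details follow Pydantic v2’s ValidationError API):

```python
from pydantic import BaseModel, ValidationError

class Extension(BaseModel):
    id: int
    descr: str = "default value"

# "id" cannot be coerced to an int, so validation fails:
try:
    Extension.model_validate_json(b'{"id": "not a number"}')
except ValidationError as ex:
    # Each error reports its location and a machine-readable type:
    print(ex.errors()[0]["loc"], ex.errors()[0]["type"])
    # ('id',) int_parsing
```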

Alternative Approach

An alternative could be to use dataclasses to create the Python classes. The dataclasses.asdict() function can convert these classes to a dictionary, which in turn can be serialized to JSON.
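The serialization direction indeed works with just the standard library; a minimal sketch:

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class Extension:
    id: int
    descr: str = field(default="default value")

ext = Extension(id=3, descr="number three")

# asdict() recurses into nested dataclasses, lists, and dicts,
# so the result is directly serializable with json.dumps():
print(json.dumps(asdict(ext)))
# {"id": 3, "descr": "number three"}
```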

The opposite, reliably converting JSON to dataclasses, is more cumbersome. Even though a dict can be unpacked into the constructor, this does not handle nested classes, so the code below will not work:

from dataclasses import dataclass, field
import json

# Declare the data model:
@dataclass
class Extension:
    id: int
    descr: str = field(default="default value")

@dataclass
class ExtensionIndex:
    name: str
    items: list[Extension]


# Somehow obtain JSON as string, for example via an
# HTTP request:
json_string = """
{"name": "Jaap", "items": [
    {"id": 3, "descr": "number three"},
    {"id": 5}
]}"""

# Parse the JSON:
as_dict = json.loads(json_string)

# Skipped here: validate the JSON before
# feeding it to the constructors.

# Although this does work for the innermost types...
extension0 = Extension(**as_dict["items"][0])
extension1 = Extension(**as_dict["items"][1])
print(extension0)
print(extension1)

# ... on the outer type it does NOT work, but does not
# raise any exception either. 
index = ExtensionIndex(**as_dict)
print(index)
print(type(index.items[0]))

# Prints:
# Extension(id=3, descr='number three')
# Extension(id=5, descr='default value')
# ExtensionIndex(name='Jaap', items=[{'id': 3, 'descr': 'number three'}, {'id': 5}])
# <class 'dict'>    <- this should be <class 'Extension'>

Not only does the conversion not work recursively, this approach also doesn’t
gracefully handle unknown attributes, whereas Pydantic has specific support for
this (it can be turned on/off per class).
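As a sketch of that per-class switch (this uses Pydantic v2’s ConfigDict(extra=...) setting):

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class LenientExtension(BaseModel):
    # Unknown JSON keys are silently dropped (also Pydantic's default):
    model_config = ConfigDict(extra="ignore")
    id: int

class StrictExtension(BaseModel):
    # Unknown JSON keys make validation fail:
    model_config = ConfigDict(extra="forbid")
    id: int

print(LenientExtension.model_validate_json(b'{"id": 3, "unknown": true}'))
# id=3

try:
    StrictExtension.model_validate_json(b'{"id": 3, "unknown": true}')
except ValidationError as ex:
    print(ex.errors()[0]["type"])  # extra_forbidden
```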

Amount of integration work required

This is expected to be used in two places:

  1. Refactor of the existing Extensions handling code in scripts/addons_core/bl_pkg.
  2. New code for Remote Online Libraries.

The refactor is expected to take a week or so. The biggest hurdles will be:

  • Constructing a more formal definition of the data model (see blender_extensions_openapi.yaml for an initial attempt). Given that the JSON served by extensions.blender.org is now fairly well-populated, it shouldn’t be too hard to validate the spec against it.
  • Refactoring the existing code to use the new Pydantic model. Hopefully this is also relatively simple: start at the uses of item: dict[str, Any] and resolve pylint/mypy errors until done.

Writing new code for the Remote Online Libraries system will of course also take time, but with the help of Pydantic it will be faster than without.

Pre-built binaries vs. self-compiled

Pydantic’s core is a binary package, written in Rust. Fortunately the binary packages available on PyPI are compatible with Blender and the VFX Reference Platform.

  • Linux: the .so file targets glibc 2.14 (vfx reference: 2.28)
  • macOS: the .so file targets minos: 11.0 (vfx reference: 11.0)
  • Windows: Pydantic was built with MSVS 2022 (17.13), which is problematic if Pydantic uses a mutex (reason). Fortunately, the symptoms are clear (Blender crashes), and testing the above code on Windows showed no problems. This problem should resolve itself when Blender moves to a newer version of MSVS.

With the above, the pypi-provided binary packages seem usable for Blender. We could of course build Pydantic-core ourselves, but that would require adding the Rust compiler to the necessary tools.

Expected change in Blender on disk footprint and distribution

Measuring the size of blender-git/build_linux/bin/4.5:

Without Pydantic: 548 MB

Then install Pydantic:

$ ./bin/4.5/python/bin/python3.11 -m pip install pydantic --no-compile
... snipped ...
Using cached pydantic-2.10.6-py3-none-any.whl (431 kB)
Using cached pydantic_core-2.27.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB)
Using cached typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Installing collected packages: typing-extensions, annotated-types, pydantic-core, pydantic
Successfully installed annotated-types-0.7.0 pydantic-2.10.6 pydantic-core-2.27.2 typing-extensions-4.12.2

Then the directory becomes: 566 MB

So, once installed, Pydantic adds 18 MB.

I added the --no-compile option because, without it, the installer also generates .pyc files for already-installed packages, making it appear that Pydantic takes more space than it actually does.

License the library is under

MIT License: pydantic/LICENSE at main · pydantic/pydantic · GitHub

Copyrights of the library

Copyright (c) 2017 to present Pydantic Services Inc. and individual contributors.

8 Likes

Thanks for the comprehensive explanation.

To me it seems Pydantic is the best alternative to go with. It does add some megabytes to the release, but I don’t think it would be practical to invest in re-inventing similar functionality with dataclasses.

It may diverge from the VFX platform at some point in the future, but that would be a solvable problem.

The only concern is that we currently don’t have a set-in-stone timeline for switching Blender to MSVS 2022. It is something in the talks, and we are making progress towards it, but currently there is no guarantee Blender 4.5 will be using MSVS 2022. Also, it seems compelling to use LLVM rather than MSVC for compilation; I’m not sure whether that has any effect on the issue you’ve mentioned.

1 Like

For what it’s worth, this would also be very valuable for addon developers.

I’ve looked into using Pydantic in the past, but the added complexity of using external modules put me off, so this would be a welcome change for me.

3 Likes

I second this!
My studio uses a mix of Python dataclasses and Pydantic (we really should go all in on Pydantic).

I like dataclasses, but every time I use them, I inevitably end up having to manually implement the json serialization for nested objects. I don’t have a ton of experience with Pydantic, but every time I hear about my coworkers using it, it seems to solve all of their pipeline problems.

I personally shy away from installing packages since it complicates delivering add-ons to the end user. But if Pydantic is included in Blender, I would definitely be excited to use it for my add-ons!

2 Likes

Since the Rust compiler is LLVM-based, it might not even be affected by this mutex muckyness. I’m far from an expert in this, though. All I can do is run tests on my Windows machine at home, and see if I can trigger this crash somehow. Maybe @Cessen (lots of Rust experience) or @LazyDodo (lots of Windows experience) can help shed some light on this?

I don’t know enough about Rust to answer that, sorry. My uneducated guess is that, since the mutex issue is in the C++ standard library, it likely isn’t gonna matter for Rust. But like I said, that’s a guess, and an uneducated one at that.

My impression was that Rust implements its own synchronization primitives. But just to be sure, I took a look at the code in Rust’s standard library.

Long story short: Rust’s standard library does more-or-less implement its own primitives, but as far as I can tell does rely on Win32 APIs such as WaitOnAddress and Slim Reader/Writer (SRW) Locks. I’m not familiar enough with the Windows/Microsoft ecosystem to know if those have anything to do with MSVS, but my guess is no…?

Anyway, here are the details:

Rust implements its own Mutex type, which is in turn built on top of Futex. The Rust Futex code for Windows (other than Windows 7) can be found here:

The actual futex types appear to be built from atomic primitives (so no connection to MSVS). There are some function calls (and corresponding error return types) that appear to be from Win32, however, such as WaitOnAddress.

Windows 7 has its own Mutex implementation in Rust for some reason:

This uses Slim Reader/Writer (SRW) Locks, which I’m not familiar with, but they also appear to be a Win32 thing.

So basically, Rust uses Win32 APIs on Windows. As long as those have nothing to do with MSVS versions (which I imagine is the case?), then the MSVS version shouldn’t matter.

As far as I’m aware, the only relationship that Rust has to MSVS is that it can (and maybe does by default?) use its linker for final linking on Windows. Actual building of Rust code is always done via LLVM.

2 Likes

Thank you for your insight Nathan!

And with that it seems that the last hurdle to bundle the precompiled Pydantic has evaporated :partying_face:

4 Likes

@sybren Hey, the Pydantic lib was green lighted by the admins.

Whether we can / want to use pip or rather self compile is a topic that I will bring up in the Platforms & Builds module.

Thanks @ThomasDinges , that’s great news :partying_face:

1 Like