Request for Library: Pydantic

As per the documented process:

Name of the library

A Python library: Pydantic

Purpose of the library

This library brings us:

  • Strongly-typed Python classes,
  • with serialization to & from JSON,
  • including validation of the JSON data.

which is useful for the code dealing with Blender extensions, as well as the Remote Asset Libraries project. Pydantic is very commonly used in the Python world, and so I also expect add-ons to start using this library once it’s bundled with Blender.

Expected benefits this library would bring to Blender

Currently the code dealing with extensions JSON data is scattered throughout the code base. Much of it is passed as dict[str, Any], which can be pretty much anything (could be the JSON data as a whole, or almost any subset of it).

Pydantic parses JSON directly to Python classes. Some advantages of that:

  • IDEs understand the type annotations and provide code completion.
  • Mypy can be used for static type checking.
  • It’s possible to find references to these classes and attribute access, whereas a dict can be anything.
  • The Pydantic JSON parser can optionally handle partial JSON. This could be useful for the Remote Asset Libraries project, making it possible to start populating the asset browser while the index file is still downloading.
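
As a sketch of that partial-parsing capability, pydantic_core exposes a from_json() function with an allow_partial option; the truncated payload below is made up for illustration (as if a download was cut off mid-stream):

```python
from pydantic_core import from_json

# A truncated JSON document, as if the connection dropped mid-download:
incomplete = b'{"name": "Jaap", "items": [{"id": 3, "descr": "number three"}, {"id"'

# allow_partial=True returns whatever could be parsed so far,
# silently dropping the incomplete trailing value instead of raising:
partial = from_json(incomplete, allow_partial=True)
print(partial["name"])          # the complete fields are available
print(partial["items"][0])      # the first, fully-received item
```

For the asset browser use case, the same call could be repeated as more bytes arrive, each time yielding the complete prefix of the index.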

Here is a code comparison:

Similar to what’s currently used

import json

# Somehow obtain JSON as string, for example via an
# HTTP request:
json_string = """
{"name": "Jaap", "items": [
    {"id": 3, "descr": "number three"},
    {"id": 5}
]}"""

# Parse via Python's standard library:
index = json.loads(json_string)

# Use the data from the JSON:
print(f"Loaded extension index {index['name']!r}")
for extension in index["items"]:
    # Default values are handled when the data is used.
    descr = extension.get("descr", "default value")
    print(f"  - ID {extension['id']}: {descr!r}")

print(json.dumps(index))

# Prints:
# Loaded extension index 'Jaap'
#   - ID 3: 'number three'
#   - ID 5: 'default value'
# {"name": "Jaap", "items": [{"id": 3, "descr": "number three"}, {"id": 5}]}

With Pydantic

from pydantic import BaseModel, Field, ValidationError

# Declare the data model:
class Extension(BaseModel):
    id: int
    descr: str = Field("default value")

class ExtensionIndex(BaseModel):
    name: str = Field(description="Name of the repository")
    items: list[Extension]


# Somehow obtain JSON as bytes, for example via an
# HTTP request:
json_bytes = b"""
{"name": "Jaap", "items": [
    {"id": 3, "descr": "number three"},
    {"id": 5}
]}"""

# Parse and validate the JSON, and convert to Python classes.
try:
    index = ExtensionIndex.model_validate_json(json_bytes)
except ValidationError as ex:
    raise SystemExit(f"Validation failed: {ex}")


# `index` is now an ExtensionIndex instance:
print(f"Loaded extension index {index.name!r}")
for extension in index.items:
    print(f"  - ID {extension.id}: {extension.descr!r}")
    
print(index.model_dump_json())

# Prints:
# Loaded extension index 'Jaap'
#   - ID 3: 'number three'
#   - ID 5: 'default value'
# {"name":"Jaap","items":[{"id":3,"descr":"number three"},{"id":5,"descr":"default value"}]}
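
To illustrate the validation part, here is a minimal sketch of what happens when the data does not match the model (bad_json is a made-up malformed payload):

```python
from pydantic import BaseModel, ValidationError

# Same shape as the Extension model above:
class Extension(BaseModel):
    id: int
    descr: str = "default value"

# Malformed payload: "id" cannot be coerced to an int.
bad_json = b'{"id": "not-a-number"}'

try:
    Extension.model_validate_json(bad_json)
    errors = []
except ValidationError as ex:
    # Each error records the offending field and the failure type:
    errors = ex.errors()

for err in errors:
    print(err["loc"], err["type"])
```

So instead of a KeyError or TypeError surfacing somewhere deep in the code that uses the data, the mismatch is reported up front, with the field path included.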

Alternative Approach

An alternative could be to use dataclasses to create the Python classes. The dataclasses.asdict() function can convert instances of these classes to a dictionary, which in turn can be serialized to JSON.

The opposite, reliably converting JSON to dataclasses, is more cumbersome. Even though a dict can be unpacked into the constructor, this does not handle nested classes, so the code below will not work as intended:

from dataclasses import dataclass, field
import json

# Declare the data model:
@dataclass
class Extension:
    id: int
    descr: str = field(default="default value")

@dataclass
class ExtensionIndex:
    name: str
    items: list[Extension]


# Somehow obtain JSON as string, for example via an
# HTTP request:
json_string = """
{"name": "Jaap", "items": [
    {"id": 3, "descr": "number three"},
    {"id": 5}
]}"""

# Parse the JSON:
as_dict = json.loads(json_string)

# Skipped here: validate the JSON before
# feeding it to the constructors.

# Although this does work for the innermost types...
extension0 = Extension(**as_dict["items"][0])
extension1 = Extension(**as_dict["items"][1])
print(extension0)
print(extension1)

# ... on the outer type it does NOT work, but does not
# raise any exception either. 
index = ExtensionIndex(**as_dict)
print(index)
print(type(index.items[0]))

# Prints:
# Extension(id=3, descr='number three')
# Extension(id=5, descr='default value')
# ExtensionIndex(name='Jaap', items=[{'id': 3, 'descr': 'number three'}, {'id': 5}])
# <class 'dict'>    <- this should be <class 'Extension'>

Not only does the conversion not work recursively, this approach also doesn’t
gracefully handle unknown attributes, while Pydantic has specific support for
this (can be turned on/off per class).
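
That per-class switch is the `extra` model setting; a minimal sketch (the model and payload are made up for illustration):

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class StrictExtension(BaseModel):
    # Reject any attribute that is not declared on the model:
    model_config = ConfigDict(extra="forbid")
    id: int

class LenientExtension(BaseModel):
    # Silently drop unknown attributes (Pydantic's default):
    model_config = ConfigDict(extra="ignore")
    id: int

# Payload containing an attribute neither model declares:
data = b'{"id": 3, "unknown_field": true}'

lenient = LenientExtension.model_validate_json(data)  # parses fine
try:
    StrictExtension.model_validate_json(data)
    rejected = False
except ValidationError:
    rejected = True
```

The lenient mode is useful for forward compatibility (older Blender reading a newer index file), while the strict mode helps catch typos in the data early.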

Amount of integration work required

This is expected to be used in two places:

  1. Refactor of the existing Extensions handling code in scripts/addons_core/bl_pkg.
  2. New code for Remote Online Libraries.

The refactor is expected to take a week or so. The biggest hurdles will be:

  • Constructing a more formal definition of the data model (see blender_extensions_openapi.yaml for an initial attempt). Given that the JSON served by extensions.blender.org is now fairly well filled, it shouldn’t be too hard to validate the spec against it.
  • Refactoring the existing code to use the new Pydantic model. Hopefully this is also relatively simple by now: start at the uses of item: dict[str, Any] and resolve pylint/mypy errors until done.

Writing new code for the Remote Online Libraries system will of course also take time, but with the help of Pydantic it will be faster than without.

Pre-built binaries vs. self-compiled

Pydantic’s core is a binary package, written in Rust. Fortunately the binary packages available on PyPI are compatible with Blender and the VFX Reference Platform.

  • Linux: the .so file targets glibc 2.14 (vfx reference: 2.28)
  • macOS: the .so file targets minos: 11.0 (vfx reference: 11.0)
  • Windows: Pydantic was built with MSVS 2022 (17.13), which is problematic if Pydantic uses a mutex (reason). Fortunately, the symptoms are clear (Blender crashes), and testing the above code on Windows showed no problems. This problem should resolve itself when Blender moves to a newer version of MSVS.

With the above, the PyPI-provided binary packages seem usable for Blender. We could of course build pydantic-core ourselves, but that would require adding the Rust compiler to Blender’s build toolchain.

Expected change in Blender on disk footprint and distribution

Measuring the size of blender-git/build_linux/bin/4.5:

Without Pydantic: 548 MB

Then install Pydantic:

$ ./bin/4.5/python/bin/python3.11 -m pip install pydantic --no-compile
... snipped ...
Using cached pydantic-2.10.6-py3-none-any.whl (431 kB)
Using cached pydantic_core-2.27.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB)
Using cached typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Installing collected packages: typing-extensions, annotated-types, pydantic-core, pydantic
Successfully installed annotated-types-0.7.0 pydantic-2.10.6 pydantic-core-2.27.2 typing-extensions-4.12.2

Then the directory becomes: 566 MB

So once installed, Pydantic adds 18 MB.

I added the --no-compile option, because without that, the installer will generate .pyc files for already-installed packages as well, making it appear that Pydantic takes more space than it does.

License the library is under

MIT License: pydantic/LICENSE at main · pydantic/pydantic · GitHub

Copyrights of the library

Copyright (c) 2017 to present Pydantic Services Inc. and individual contributors.


Thanks for the comprehensive explanation.

To me, going with Pydantic seems like the best alternative. It does add some megabytes to the release, but I don’t think it would be practical to invest in re-inventing similar functionality with dataclasses.

Surely, it may diverge from the VFX platform somewhere in the future, but that would be a solvable problem.

The only concern is that we currently don’t have a set-in-stone timeline for switching Blender to MSVS 2022. It is in the talks, and we are making progress towards it, but currently there is no guarantee Blender 4.5 will be using MSVS 2022. Also, it seems compelling to use LLVM rather than MSVC for compilation; I’m not sure whether that has any effect on the issue you’ve mentioned.

For what it’s worth, this would also be very valuable for add-on developers.

I’ve looked into using Pydantic in the past, but the added complexity of using external modules put me off, so this would be a welcome change for me.


I second this!
My studio uses a mix of Python dataclasses and Pydantic (we really should go all in on Pydantic).

I like dataclasses, but every time I use them, I inevitably end up having to manually implement the json serialization for nested objects. I don’t have a ton of experience with Pydantic, but every time I hear about my coworkers using it, it seems to solve all of their pipeline problems.

I personally shy away from installing packages since it complicates delivering add-ons to the end user. But if Pydantic is included in Blender, I would definitely be excited to use it for my add-ons!