String pattern matching language in Blender and Geometry Nodes

The goal of this thread is to figure out if we want some string pattern matching language in Blender and if yes, which one and which implementation we should use.

I’m currently mainly thinking about the use case of specifying sets of named attributes in Geometry Nodes. For example, one may want to remove or propagate a subset of named attributes. In many cases, using some specialized syntax is the most convenient way to achieve this. There are possible alternatives (like processing string lists in geometry nodes), but those are typically much less straightforward and more cumbersome to use.

We do use fnmatch in a few places for filtering like in the outliner but I’m opening up this thread anyway because:

  • fnmatch seems to be specialized for Unix filename pattern matching, which is not really my use case.
  • It is more limited than regular expressions and users would likely ask for more features if we used fnmatch in geometry nodes without an obvious way to provide these features.
  • The decision for the use in geometry nodes is more important than the use in some temporary filter. That’s because geometry nodes requires backward compatibility, so we can’t change the decision easily later on (without providing legacy options). Users may build node trees which build these string patterns dynamically, and that’s not versionable in general.

As mentioned, we could also decide not to use any such language in geometry nodes and provide some nodes/options for special cases (like an enum dropdown offering Starts With among other options in all relevant nodes). For the remainder of this post I’ll assume that we do want some pattern matching language.


There are a couple of possible languages we could use:

  • fnmatch: Already used in Blender, but somewhat limited and designed for file paths. The syntax seems quite straightforward.
  • Regular expressions: Very powerful, but the syntax is slightly less obvious than fnmatch for simple cases (e.g. .* instead of just * to match everything). There is also the problem that multiple dialects of regular expressions exist, so we’d have to pick one and document that decision. C++ comes with a regular expression library that also supports different grammars.
  • Some simple custom language: We would have to design and implement it ourselves. It might be compatible with other software, but probably not many.
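To make the comparison concrete, here is a minimal sketch of the same selection written in both syntaxes. It uses Python's fnmatch and re modules as stand-ins (not the C++ standard library implementation), and the attribute names are made up for illustration.

```python
import fnmatch
import re

# Hypothetical attribute names, made up for illustration.
attribute_names = ["uv_map", "uv_map.001", "color", "weight_arm", "weight_leg"]

# fnmatch: "*" matches any run of characters and the whole name must match.
glob_hits = [n for n in attribute_names if fnmatch.fnmatchcase(n, "weight_*")]

# Regular expression: ".*" plays the role of "*"; fullmatch anchors both ends.
regex_hits = [n for n in attribute_names if re.fullmatch(r"weight_.*", n)]

# Alternation is something the wildcard syntax has no direct equivalent for.
alt_hits = [n for n in attribute_names if re.fullmatch(r"uv_map|color", n)]

print(glob_hits)
print(regex_hits)
print(alt_hits)
```

For the simple prefix case both approaches select the same names; the difference only shows up once something like alternation is needed.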

I currently lean towards regular expressions, using the modified ECMAScript regular expression grammar that the C++ standard library uses by default. The main reason is that it is very similar to the regular expressions used in JavaScript, which is probably the most widely used syntax. Furthermore, we have easy access to an implementation. If we find that the standard library implementations aren’t good enough, we can likely find third-party implementations that work as well.

Feedback is welcome!

11 Likes

I don’t really have an opinion but just wanted to point out that there’s already a user facing regex implementation in the batch rename (ctrl+f2) popup (the asterisk option in find/replace)

2 Likes

I might be misunderstanding exactly what you are asking for, but I think that any time we are matching a string we need to also include the method we are using to do so, whether “contains”, “starts with”, “ends with”, or “regex”. I don’t think we can avoid asking that and passing that around by picking one method.

1 Like

I strongly support some standard regexp language as the language for your use case. While artist users might find the full syntax and possibilities a bit arcane, I’m sure tutorials would quickly show up showing how to do the common cases. Learning to do ".*" instead of "*" seems not too hard to get used to, I think, and covers at least 90% of what anyone would want to do.

If the modified ECMAScript regular expression grammar is already well implemented in C++ libraries we are already using, that seems a strong reason to just go with that.

I will put in a plug for a library that a friend of mine originated, which is in widespread use inside Google: re2 (Syntax defined here). While not important for your use case, it has a really fast and great implementation that might prove useful in other parts of Blender if there is ever a need to run complicated expressions over large amounts of text, fast.

2 Likes

Personally I think something like fnmatch is about the right level for Blender. Full regexp seems too arcane to me. Where it gets tricky is not just .* but also escaping characters. In a specific regexp node it would be ok, but for pattern matches throughout Blender it seems too much?

Looking at other 3D apps, it seems mixed: Houdini and MaterialX seem to have something similar to fnmatch. Some nodes in Maya and Katana use full regexp, though they are specifically nodes for matching and renaming. RenderMan seems to have some POSIX regexp filtering syntax. In USD I couldn’t find anything user facing.

I also don’t know what the UI for this would look like, which also affects how much compatibility matters and if even supporting both can be considered. For example:

  • Regular string socket or property that you can type straight regex in
  • Regular string socket or property where you can type in a prefix like regex:
  • New regex string socket or property
  • Some button next to a text field that changes the behavior
2 Likes

If we are going for something more advanced than fnmatch, then I think that it would be logical to stick to the syntax used in the Python regexp module: re — Regular expression operations — Python 3.11.1 documentation
(This is the “perl” syntax)
This way users can easily leverage the same regexp patterns in their Python scripts and vice versa.

For me personally, I think perhaps being able to choose between the fnmatch or re syntax would be nice. I’m usually happy with the simplicity of fnmatch, but sometimes I have had to bring out the big guns with more advanced patterns. Having something like re available in the UI would be a life saver in those cases, so I think it should at least be an option.

By default, perhaps the fnmatch engine is used, but users who want more can simply switch it over to re mode.

4 Likes

I don’t know if this would be interesting or influence anyone’s opinions, but I just thought I’d say it anyway. ChatGPT is REALLY good at generating big regex patterns. I’d imagine, in the near future, the average user will just use similar AI to generate regex. Basically, if the user knows how to describe what they’re looking for, ChatGPT can translate it in seconds.

1 Like

If Blender uses RE ( modified ECMAScript regular expression grammar ),
would it mean that the end user has to be familiar with RE?

Python does have fnmatch.translate.
It turns a wildcard expression into a RE. One could use wildcards without knowing that the matching is done with RE behind the scenes.
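As a small illustration of that idea, the sketch below converts a wildcard pattern with fnmatch.translate and then matches with the re module only. The names are made up, and the exact text of the generated expression varies between Python versions, so it is printed rather than assumed.

```python
import fnmatch
import re

wildcard = "weight_*"

# fnmatch.translate returns a regular expression string equivalent to the
# wildcard pattern; its exact form depends on the Python version.
translated = fnmatch.translate(wildcard)
matcher = re.compile(translated)

# Hypothetical attribute names, made up for illustration.
names = ["weight_arm", "weight_leg", "color", "my_weight_arm"]
hits = [n for n in names if matcher.match(n)]

print(translated)
print(hits)
```

The user only ever types the wildcard; the regular expression is an internal detail.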

If it is necessary to have something more artist friendly, like wildcards or others (‘starts with’, ‘ends with’, …), it might be done on top of RE. A translator could turn something more artist friendly into RE. Matching is still done with RE.
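A rough sketch of such a translator, assuming Python's re module as the matching backend. The function names and mode strings (mode_to_regex, matches, "starts_with", and so on) are hypothetical and only illustrate the idea.

```python
import re


def mode_to_regex(mode: str, text: str) -> str:
    """Translate an artist-friendly mode plus literal text into a regex.

    Hypothetical helper: the mode names are invented for this sketch.
    """
    literal = re.escape(text)  # the user's text is always taken literally
    if mode == "exact":
        return literal
    if mode == "starts_with":
        return literal + ".*"
    if mode == "ends_with":
        return ".*" + literal
    if mode == "contains":
        return ".*" + literal + ".*"
    raise ValueError(f"unknown mode: {mode}")


def matches(mode: str, text: str, name: str) -> bool:
    # Matching itself is always done by the regex engine.
    return re.fullmatch(mode_to_regex(mode, text), name, flags=re.DOTALL) is not None


print(matches("starts_with", "uv", "uv_map"))
print(matches("ends_with", ".001", "uv_map.001"))
print(matches("ends_with", ".001", "uv_mapX001"))
```

Because the text is escaped before being embedded, the user never has to think about special characters in the friendly modes.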

Would it be reasonable to support both globbing and regex?

Personally I like it when a search has an option for regular expressions so it’s available for more advanced pattern matching when I need it. As Brecht mentions, character escaping can be a hassle (even when you’re familiar with regex). The exact behavior with brackets isn’t always obvious and differs between implementations.

Regex can be overkill. Sometimes you copy part of a string and want to paste it and use it as-is (match a fragment of a literal string). If some of those characters need to be escaped, it can lead to wasting time trying to escape different characters or searching online for a list of escape characters. While globbing isn’t immune to this problem, it’s so much simpler that it tends to be less of an issue.

5 Likes

Just thought to mention ⚙ D12721 Geometry Nodes: "Replace in String" node. This patch used to include regex before it was removed in the final diffs.

1 Like

Thanks for the feedback so far.

Generally, I think it would be nice to just have to differentiate between two matching functions. Mainly because it simplifies the UI. It allows switching between both modes with a single click and there can be a standardized icon for it, like what is used in the ctrl+F2 rename menu. The two modes would then be exact match, and some more sophisticated matching language. For my use case (removing named attributes) the default should always be exact matching. In that mode one also does not have to care about escaping anything.

Having only two matching mechanisms also allows using a boolean socket to switch between both. With more than two modes, it would be better to use enum sockets. Those are planned but don’t have very high priority currently. It should be possible to version from a boolean to an enum socket though.
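The two-mode idea can be sketched as a single function with a boolean switch. This assumes Python's re module for the pattern mode and whole-name matching semantics; name_matches and use_pattern are hypothetical names, not an existing Blender API.

```python
import re


def name_matches(name: str, pattern: str, use_pattern: bool = False) -> bool:
    """Hypothetical two-mode matcher; exact matching is the default."""
    if not use_pattern:
        # Exact mode: no character has a special meaning, nothing to escape.
        return name == pattern
    # Pattern mode: the whole name has to match the regular expression.
    return re.fullmatch(pattern, name) is not None


print(name_matches("uv_map.001", "uv_map.001"))
print(name_matches("uv_mapX001", "uv_map.001"))
print(name_matches("uv_mapX001", "uv_map.001", use_pattern=True))
```

In exact mode the dot is just a dot; only after flipping the switch does it start to mean "any character".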

I might be misunderstanding exactly what you are asking for, but I think that any time we are matching a string we need to also include the method we are using to do so, whether “contains”, “starts with”, “ends with”, or “regex”. I don’t think we can avoid asking that and passing that around by picking one method.

The thing is, regex and also fnmatch implicitly support “contains”, “starts with” and “ends with”. So by just picking one of those, we can avoid a dropdown to choose between these more specialized modes.
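For illustration, here is how the three specialized modes fall out of either syntax, using Python's fnmatch and re modules as stand-ins and a made-up attribute name.

```python
import fnmatch
import re

name = "weight_arm_left"  # made-up attribute name

# Wildcard patterns: the position of "*" decides the mode.
starts_glob = fnmatch.fnmatchcase(name, "weight_*")
ends_glob = fnmatch.fnmatchcase(name, "*_left")
contains_glob = fnmatch.fnmatchcase(name, "*arm*")

# Regular expressions with search semantics: anchors decide the mode.
starts_re = re.search(r"^weight_", name) is not None
ends_re = re.search(r"_left$", name) is not None
contains_re = re.search(r"arm", name) is not None

print(starts_glob, ends_glob, contains_glob)
print(starts_re, ends_re, contains_re)
```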

Using the regex syntax that Python uses sounds reasonable as well. I’m not sure that syntax is supported by the C++ standard library though.

The issue with escaping could potentially be worked around by adding a “paste escaped” operator to the context menu of text boxes. This would just do all the escaping automatically. The user would still have to be aware that this is necessary though. In VS Code for example, when I activate regex search, then select some code, and then hit ctrl+F, the selected code segment is automatically escaped and put into the search box.
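What a “paste escaped” operator would do can be sketched with Python's re.escape. This is only an illustration of the escaping step, not a proposal for the actual operator, and the pasted string is made up.

```python
import re

pasted = "uv_map.001 (copy)"  # made-up text copied from somewhere

# Used as-is, "." matches any character and "(copy)" becomes a group,
# so the pattern no longer matches the very text it was copied from.
raw_matches_itself = re.fullmatch(pasted, pasted) is not None
raw_matches_other = re.fullmatch(pasted, "uv_mapX001 copy") is not None

# Escaped, every special character is taken literally.
escaped = re.escape(pasted)
escaped_matches_itself = re.fullmatch(escaped, pasted) is not None
escaped_matches_other = re.fullmatch(escaped, "uv_mapX001 copy") is not None

print(raw_matches_itself, raw_matches_other)
print(escaped_matches_itself, escaped_matches_other)
```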

7 Likes

I have yet to see a single non-programmer Blender user who knows how to use regexp. Even experienced programmers who don’t use it on a daily basis usually have to resort to a Google search after more than a couple of weeks of not using it.

AFAIK there are many Blender users who don’t find the Batch Rename operator usable simply because it uses regexp. It seems like a utility for developers rather than an operator meant for end users, because it can’t do almost any bread-and-butter batch renaming without regexp, which is so difficult to learn that pretty much everyone gives up.

4 Likes

Makes sense. My initial concern (or confusion) was just that you were proposing to settle on only a single matching function.

1 Like

That’s why I couldn’t make it do almost anything beyond the simplest task ?! I’m supposed to use regular expressions ? I admit I painstakingly learned the syntax years ago and then promptly forgot everything about them.

This is closer to what I expect of a user-friendly renaming utility :

This is one of many freeware tools on Windows, called Bulk Rename Utility.

So if in the future geonodes allows handling multiple attributes at once and I have to find-and-replace or strip numbers or something to that effect on hundreds of vertex groups (thinking of character rigging), I hope there are dedicated nodes (the most common string methods found in Python or something) that I can link together in the order I want, in addition to regular expressions.
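As a rough picture of what chaining such dedicated operations could look like, here is a sketch using only plain Python string methods, with no regular expressions. The vertex group names and the helper names are made up; each step stands in for one hypothetical node.

```python
# Made-up vertex group names for illustration.
groups = ["DEF-arm.L.001", "DEF-arm.R.002", "DEF-leg.L"]


def strip_prefix(name: str, prefix: str) -> str:
    # "Strip prefix" step: remove the prefix only if it is present.
    return name[len(prefix):] if name.startswith(prefix) else name


def strip_trailing_number(name: str) -> str:
    # "Strip numbers" step: drop a trailing ".NNN" suffix if there is one.
    head, sep, tail = name.rpartition(".")
    return head if sep and tail.isdigit() else name


def rename(name: str) -> str:
    # Steps applied in a fixed order, like nodes linked one after another.
    name = strip_prefix(name, "DEF-")
    name = strip_trailing_number(name)
    return name.replace(".L", "_left").replace(".R", "_right")


renamed = [rename(g) for g in groups]
print(renamed)
```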

Or I can learn regular expressions… :neutral_face:

4 Likes

Python does have fnmatch.translate, which converts a globbing (wildcard) expression to RE. You use globbing (wildcards), but RE matching does the job.

I very much like the idea of having something that translates into RE. People could use a language more suitable for artists, and a translator turns it into RE. I mean supporting something like globbing (via translation) on top of RE.

1 Like

There are multiple regex generator websites out there, and the regex Stack Exchange is full of recipes. Some programs I have used include a legend in the interface with the basic building blocks. Perhaps that can be done here: an expandable subregion with examples.
If they could be clicked and typed automatically into the input box, that would be great…

.* anything (any run of characters)
\d a digit
\w a word character (letter, digit, or underscore)
() capture group
^ line start
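A quick way to try out these building blocks, assuming Python's re module (other regex dialects can differ in details) and a made-up name:

```python
import re

name = "bone_12"  # made-up name for illustration

# ".*" matches any run of characters, so it matches the whole name.
matches_all = re.fullmatch(r".*", name) is not None

# "\d" matches a single digit; search returns the first one found.
first_digit = re.search(r"\d", name).group()

# "\w" matches a single word character (letter, digit, or underscore).
first_word_char = re.search(r"\w", name).group()

# "()" captures parts of the match so they can be reused, e.g. in a rename.
parts = re.fullmatch(r"(\w+)_(\d+)", name)
base, number = parts.group(1), parts.group(2)

# "^" anchors the pattern at the start.
starts_with_bone = re.search(r"^bone", name) is not None
starts_with_12 = re.search(r"^12", name) is not None

print(matches_all, first_digit, first_word_char, base, number)
print(starts_with_bone, starts_with_12)
```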

For batch rename, regex is disabled by default (because it’s intended for more technical users and is not a requirement for renaming). What did you need to do with batch rename that would require using regex?


NOTE: this might be best split off into a separate thread.

2 Likes
  • Some simple custom language: We would have to design and implement it ourselves. It might be compatible with other software, but probably not many.

This. The problem with regexes is that they are cool (after googling) but almost never enough… a simple language would cover more needs, and regex can easily be part of its syntax.

I’m also dreaming of “string to int” and “string to float” methods being available in Geometry Nodes. That should reduce setups with endless, rarely needed parameters to a simple “copy-paste string” step and save kilometres of scroll time in the modifier panel :slight_smile:

1 Like

I’d think the idea of a self-designed ‘simple language’ would be to be simpler than regex. Not a superset of regex.

If a regex is not enough for string matching nothing will be.

1 Like

My understanding of simple language was a generic domain-specific language for function nodes, but I may be off.