Python String Formatting

This post was triggered by a discussion between @ideasman42 and myself in D8072: API docs: intro overhaul.

To provide some context to this post: Python has three different approaches for string formatting:

  • printf-style percent formatting
    formatted = "ssh %s@%s" % (username, hostname)
  • str.format() calls:
    formatted = "ssh {0}@{1}".format(username, hostname)
  • f-strings:
    formatted = f"ssh {username}@{hostname}"
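
As a quick sanity check, here is a minimal sketch showing that the three styles produce the same string. The `username` and `hostname` values are made up for illustration.

```python
# Hypothetical values, chosen only for illustration.
username = "sybren"
hostname = "example.org"

percent_style = "ssh %s@%s" % (username, hostname)
format_style = "ssh {0}@{1}".format(username, hostname)
fstring_style = f"ssh {username}@{hostname}"

print(percent_style)
print(percent_style == format_style == fstring_style)
```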

Part of the changes in D8072 is moving from the first to the second style of formatting.

The first two formatting options have the problem that the placeholder in the string and the value that will be placed there are disconnected from each other. If you want to do the string formatting in your head, you constantly have to go back & forth between the string and its arguments.

I'm a big fan of the third formatting option, f-strings. They're not brand-spanking new any more; they were introduced 3½ years ago in Python 3.6 (current release is 3.8), so editors/IDEs have proper support for them nowadays.

Personally I don't see the point of going through the effort of replacing the printf-style formatting with calls to str.format(). On one hand there is no hurry to do this; the printf-style is not deprecated by any means. On the other hand, if we want to modernise I think it makes way more sense to use f-strings.

So what do other developers think about this?

6 Likes

I agree with you @sybren.

1 Like

I agree about using f-strings if you are going to update the formatting style.
Personally, I've used the %-style since forever, so it feels familiar.

1 Like

I've historically been against f-strings in all the languages I've worked in for the following 2 reasons:

  • Having expressions in the middle of strings is downright annoying when trying to get a sense of what the string is trying to say at a glance, especially if that triggers the string to need wrapping due to exceeding line-length limits. This is often exacerbated by full-on expressions instead of just variable names, e.g. when wanting to write out a percentage and needing to use {my_percentage * 100} or similar in the middle of text.
  • A small number of instances required repeating args and using e.g. {1} in several places in the string. For that people would go back to using .format, and then you have the problem that there are 2 styles again.

I leave it up to you guys to decide if those are minor reasons or otherwise :slight_smile:

I feel the same way with the other formats, because I have to keep track of which parameter goes where, and keep going back & forth between the format string and the parameters.

I don't see why that would require using .format(). You can just as well have {name} in several places, but then you have the added advantage of immediately seeing which variable is put there. Of course you don't want to repeat function calls or other expressions, but in that case the solution is to put that into a well-named variable instead of the format string.
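
A minimal sketch of that suggestion, with made-up values: the expression is computed once under a descriptive name, and the names are then repeated freely inside the f-string.

```python
# Hypothetical values for illustration.
name = "Suzanne"
progress = 0.4567

# Compute the expression once, under a descriptive name...
percent_done = progress * 100

# ...then refer to the variables as often as needed.
message = f"{name}: {percent_done:.1f}% done ({name} is still baking)"
print(message)
```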

1 Like

Personally I've switched to f-string formatting syntax a year ago for all Python-related tasks I came across. It's way more readable, and many IDEs, like Eclipse, provide nice color coding for the parameters as well. Haven't had a single occasion where I had to use the other formatting types. Just go for it.

Having a single string of text look like a Christmas tree is one other reason I don't like it :slight_smile:

To be honest… I still format my strings by doing this:

print (someString + ' ' + str(aNumber) + ' ' + " is a string and a number.")

or even

print ("This is a string", string, "This is a number", number)

I'm very ashamed of myself.

2 Likes

F-strings all the way.

I am a bit stymied as I have a plugin that works with 2.78, which ships Python 3.5 and so doesn't support them. I am planning to drop 2.78 support mainly for f-strings.

No issues with moving to str.format; f-strings have some pros and cons. For formatting values already assigned to variables, they read nicely,
as with the example f"ssh {username}@{hostname}".

The down-sides for Blender's Python code are…

  • This doesn't work so well when the contents of f-strings contain function calls, operators, etc. Take this example:

    return """<input type="checkbox" title="{:s}" {:s} {:s}>""".format(
        title,
        "checked" if value else "",
        ("onclick=\"{:s}\"".format(script)) if script else "",
    )
    

    Of course we can refactor code to read better, assign variables before using them in some cases, but this makes the project a bigger task.

    The case above is a little extreme (especially since it includes a format within a format), but even in simpler cases f-strings don't necessarily read very well. We often use unpacking from vector types, which you don't get with f-strings.


    I find this:

    file.write("{:4f}, {:4f}, {:4f}".format(*(matrix @ vertex.co)))
    

    More readable than this:

    vertex_co = matrix @ vertex.co
    file.write(f"{vertex_co[0]:4f}, {vertex_co[1]:4f}, {vertex_co[2]:4f}")
    

    This function in release/scripts/modules/rna_prop_ui.py doesn't convert easily.

    def rna_idprop_quote_path(prop):
        return "[\"%s\"]" % prop.replace("\"", "\\\"")
    

    … I ran into the error: SyntaxError: f-string expression part cannot include a backslash.

    Again, assigning variables could work here; it's just an example where f-strings can't be used as drop-in replacements for existing code.


    Less extreme cases are always disputable. Having already looked into moving to f-strings, I found enough cases in Blender's code that don't convert cleanly, making them less useful as a general replacement for existing formatting.

  • Many editors still don't properly syntax highlight code in f-strings (including Blender's).

  • Tooling around Python doesn't fully support f-strings. Autopep8, for example, won't auto-format code inside f-strings.

  • For UI labels f-strings will extract code in the translation string, so translation strings will be more verbose & likely to change more often, although I'd want to double check with @mont29 about this.

  • Moving between f-strings and str.format adds some overhead. We may start out with formatting that works well as an f-string; later on, more complex logic could be added into the existing f-string (which I'd rather avoid), or the f-string needs to be converted into a str.format call, which takes time and has potential for human error.

  • If there are cases where we do/don't use f-strings, we need to document & communicate this, explain in patch review, request edits that add/remove it, explain that UI translations are an exception, discuss when it is/isn't readable … which takes time and is subjective.
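
For the backslash error mentioned in the list above, the "assign a variable" workaround could look roughly like the sketch below. The name `rna_idprop_quote_path_fstring` is hypothetical and only exists for comparison; the first function is the printf-style original quoted in the thread.

```python
def rna_idprop_quote_path(prop):
    # Original printf-style version, as quoted in the thread.
    return "[\"%s\"]" % prop.replace("\"", "\\\"")


def rna_idprop_quote_path_fstring(prop):
    # Hypothetical f-string variant: do the escaping outside the f-string,
    # so no backslash appears inside the braces.
    escaped = prop.replace('"', '\\"')
    return f'["{escaped}"]'


print(rna_idprop_quote_path_fstring('my "prop"'))
```

This keeps the behaviour identical, at the cost of one extra local variable, which is exactly the trade-off being debated here.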

You can use str.format with named arguments, if argument order makes the formatting less readable.

   "{foo:s} {bar:d} {baz:f}".format(
        foo=some(b, 'c'),
        bar=non(d, e),
        baz=trivial('code', ":)"),
    )

This avoids filling the local name-space with single use variables that can be used by accident or show up un-helpfully in auto-complete.
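
A runnable variant of the same idea, as a sketch only: `object_label` and the `vertices` data are hypothetical stand-ins for the non-trivial expressions.

```python
# Hypothetical stand-in for a "non-trivial" helper.
def object_label(name, index):
    return f"{name}.{index:03d}"


# Hypothetical sample data.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

# The expressions are named at the call site, without creating locals.
report = "{label:s}: {count:d} vertices, max x = {max_x:.2f}".format(
    label=object_label("Cube", 1),
    count=len(vertices),
    max_x=max(v[0] for v in vertices),
)
print(report)
```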


So moving to str.format seems fine.

As for f-strings, I'd rather only use them in cases where they use basic variables & attributes, where it's unlikely we introduce more complex logic. Although having two ways of doing something that's subjective makes me less inclined to want to use them.

Hey, just a quick reminder: F-strings are fully forbidden for anything that needs to be translated (so all UI messages, labels, tooltips, reports…).

Don't remember the exact details, but it boils down to the fact that they are evaluated as some kind of sub-language, before we can ever get the chance to substitute them with their translated variant.
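
A rough sketch of the general problem, not of Blender's actual i18n code: `TRANSLATIONS_NL` and `translate()` are hypothetical stand-ins for a message catalog and its lookup function.

```python
# Hypothetical translation table standing in for a real message catalog.
TRANSLATIONS_NL = {
    "Saved {count:d} files": "{count:d} bestanden opgeslagen",
}


def translate(msgid):
    # Fall back to the untranslated text when there is no entry.
    return TRANSLATIONS_NL.get(msgid, msgid)


count = 3

# str.format: the constant template is looked up first, then filled in.
translated = translate("Saved {count:d} files").format(count=count)

# f-string: the value is substituted before translate() is called, so the
# lookup key is "Saved 3 files", which is not in the catalog.
not_translated = translate(f"Saved {count:d} files")

print(translated)
print(not_translated)
```

The same reasoning applies to printf-style templates: anything where the template is still a constant string at lookup time can be translated.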

Me too! I know it's not 'right' but it's very easy to read.

1 Like

There are two issues I've found since my previous post:

  • Bytes don't support f-strings or str.format style formatting (pointed out by @mont29).
  • f-strings and str.format are about twice as slow as old-style % formatting.

I thought the performance issues had been resolved, since IIRC there were some release notes about this some time back (edit - found the issue); however, I've tested with Python 3.8 and 3.9b3 and there is a noticeable difference.
It may be that the difference isn't much in practice; we'd need to check, especially for exporters.
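
For anyone who wants to check this on their own Python version, a minimal timing sketch follows. The sample values and the iteration count are arbitrary assumptions, and the numbers will vary per machine and interpreter, so no expected timings are given.

```python
import timeit

x, y, z = 1.0, 2.5, -3.25  # Hypothetical sample values.


def percent_style():
    return "%f, %f, %f" % (x, y, z)


def format_style():
    return "{:f}, {:f}, {:f}".format(x, y, z)


def fstring_style():
    return f"{x:f}, {y:f}, {z:f}"


# All three must produce identical text before timing is meaningful.
assert percent_style() == format_style() == fstring_style()

for func in (percent_style, format_style, fstring_style):
    seconds = timeit.timeit(func, number=200_000)
    print(f"{func.__name__:>14}: {seconds:.3f} s")

# bytes support %-formatting (since Python 3.5) but have no .format() method.
print(b"v %d" % 3)
print(hasattr(b"", "format"))
```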

1 Like

This to me is a perfect example that we should actually move to f-strings. This code is IMO unacceptable regardless of formatting method used. Also, for constructing HTML there are way better template engines than just using string formatting. Using plain string formatting for HTML is notorious for introducing nasty bugs (not to mention security issues, depending on the context).

But I prefer this:

projected_co = camera_matrix @ object_location_world.co
file.write(f"{projected_co.x:4f}, {projected_co.y:4f}, {projected_co.z:4f}")

My point is that using extra variables gives you the opportunity to add more semantics to the code, making things easier to understand. It also separates transforming the data from the actual string formatting. I disagree with the argument "I can more easily do multiple things in one big expression". I think we should make it harder to do that, not easier.
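
To compare the two styles side by side without Blender, here is a small sketch. `Vec3` is a hypothetical stand-in for a mathutils vector and the coordinates are made up.

```python
from typing import NamedTuple


class Vec3(NamedTuple):
    # Hypothetical stand-in for a mathutils.Vector.
    x: float
    y: float
    z: float


projected_co = Vec3(1.0, 2.5, -0.125)

# Unpacking style: positions in the template map to x, y, z implicitly.
unpacked = "{:4f}, {:4f}, {:4f}".format(*projected_co)

# f-string style: each placeholder names the component it prints.
named = f"{projected_co.x:4f}, {projected_co.y:4f}, {projected_co.z:4f}"

print(named)
print(unpacked == named)
```

Note that `:4f` sets a minimum field width of 4 with the default six decimals; `:.4f` would be needed for four decimals.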

When it comes to strings for translations, yes then f-strings are not suitable. In those cases I don't have a strong preference. Printf-style formatting is faster, which is important for the UI performance. On the other hand, format() calls use the same formatting sub-language as f-strings, and having those unified also gives developers an advantage.

If your editor doesn't support f-strings yet, I would suggest either not caring about it, or moving to a different editor. There are plenty of Python editors out there that do support f-strings, and the fact that there are also those that still lag behind (YEARS after the feature was introduced) doesn't mean we should not move forward. By that analogy Blender should still be using Python 2.7.

I don't know what you mean. Moving away from printf-style is what takes time and has potential for human error, but this is exactly what is happening in D8072. And IMO this is happening for no good reason. str.format() and f-strings use the same formatting language, so moving back & forth between those should be relatively painless.
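
A small sketch of that shared formatting language, using made-up values: the same format specs work in str.format(), in an f-string, and in the built-in format().

```python
# Hypothetical values for illustration.
value = 3.14159
label = "pi"

via_format = "{name:<6}|{number:>8.3f}|".format(name=label, number=value)
via_fstring = f"{label:<6}|{value:>8.3f}|"
# The built-in format() takes the same spec strings directly.
via_builtin = format(label, "<6") + "|" + format(value, ">8.3f") + "|"

print(via_fstring)
print(via_format == via_fstring == via_builtin)
```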

2 Likes

Yes, the code isn't nice (not part of Blender's official code), it's just showing that moving to f-strings requires opinionated refactoring.


While this can be OK, it forces you into defining single-use variables. There are cases where I'd rather not do this, since it can lead to accidental variable re-use, which can be tricky to track down.


This is increasing the scope of the proposal, I'm not convinced making this harder would be a net gain for us.


This is a fairly big down-side in my opinion. I don't want to have to get into discussions about which string formatting to use… just as with pep8 and clang-format, it's nice that we don't have to discuss entire aspects of code formatting.


Moving between any of these requires enough manual editing that human error could introduce issues.

My point is that even ignoring the possible mistakes in an initial formatting change, moving between str.format and f-strings isn't error free.

In that case I would argue that the function containing the code is already too large and should be split up. I agree that this is a broader change than just talking about string formatting. However, we are talking about how string formatting would influence the broader scope of the code, so I think these are valid points to discuss here.

I agree. This IMO doesn't stand in the way of saying "use f-strings unless the format string needs translating". A rule like that would also get rid of such discussions.

I'm a big supporter of 'in place' variable formatting in strings such as f-strings in Python.

It makes the string read like a sentence. No need to jump around back and forth to construct the string.

If you've got complex expressions in your string formatting, they should be moved to a local variable. This self-documents the intent of the expression and makes the string more readable.

I don't think variable reuse should really be a major concern here. Properly named variables are very unlikely to collide, and if they do, it is a hint that either the scope containing those variables is doing too much or that the names are not descriptive enough - after all, if they have the same name they should describe the same thing.

I favor using f-strings by default and, only in exceptional cases (whether it be translation, HTML generation, etc), other string formatting utilities that better suit the use-case. For the vast majority of cases, f-strings lead to better readability.

While the arguments made here for f-strings have merit, there are enough down-sides that I'd rather Blender's internal scripts stick to percentage style formatting.

  • Doing this means we need to use 2x kinds of string formatting (3x if we count the handful of cases where f-strings & str.format aren't supported). This is unavoidable and complicates review/maintenance, as well as requiring familiarity with the quirks of all of them and rationalizing when each should be used.
  • They're around twice as slow to evaluate, meaning we need to be conscious of when performance might be a factor. This isn't always obvious, nor something developers will agree on.
  • Finally, the strongest arguments to use f-strings hardly apply to Blender's internal Python usage. As a 3D application, we're very rarely doing anything besides very basic string manipulation, UI-labels and constructing reports. Since these are cases where we often want to use translations, this leaves even fewer cases where f-strings would end up being used.
2 Likes

I agree. Printf-style formatting is also likely to be easier for C/C++ developers to follow.

While I agree with the technical evaluation of it taking twice as long to evaluate, ruling it out because of that goes a bit too far. In 99% of the cases it's still so fast as to be not noticeable, and in the extreme corner cases where it may be an issue, let those plugins be written that way; don't hobble all code because of it.

I think the translation issue might be excuse enough though to pick the %-format.

Python is a dynamic language; I vaguely remember that .format() was meant to replace and deprecate %. Some Python coders, new ones especially, might only know the new way.