UTF8 character issues

I have some text and variable output that goes into Python’s logger and is written to a log file (as recommended everywhere).
For coolness I added some UTF-8 characters. Long story short, in 2.80 everything runs like a charm. In 2.81 I keep getting this error:

Traceback (most recent call last):
  File "E:\Backup\Blender\Builds\Main_Build\2.81\python\lib\logging\__init__.py", line 1028, in emit
    stream.write(msg + self.terminator)
  File "E:\Backup\Blender\Builds\Main_Build\2.81\python\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25e6' in position 218: character maps to <undefined>
Call stack:

I tried it on two computers, one laptop and one desktop. One has Python 3.5, the other has Python 3.7.4. Running the same code using the local Python (not the Blender one), it prints as it should in PyCharm.
On the Python 3.5 machine I was using 2.80 and it was writing the file fine, as I said… then I changed to 2.81 and it started giving me that error.

On the 3.7.4 machine I updated from 2.5 and at the same time updated to 2.81, and started getting that error. So I thought it had something to do with Python, but I guess not.

Is this a bug?

Did the version of the logging package change? What does
blender -b --python-expr "import logging; print(logging.__version__)"
output versus running the same expression from within Python?

The little script below works for me, both in python and blender 2.81:

paulm@cmstorm 11:51:/tmp$ cat l.py 
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)-15s %(levelname)8s %(name)s %(message)s')
log = logging.getLogger(__name__)
log.info('doh! \u25e6')

paulm@cmstorm 11:51:/tmp$ blender -b -P l.py 
Blender 2.80 (sub 75) (hash f6cb5f54494e built 2019-07-30 16:57:38)
Read prefs: /home/paulm/.config/blender/2.80/config/userpref.blend
2019-09-11 11:51:46,531     INFO __main__ doh! ◦

Blender quit

paulm@cmstorm 11:51:/tmp$ python l.py 
2019-09-11 11:54:02,852     INFO __main__ doh! ◦

Well, I just tested it now. The version is 0.5.1.2 both in my local environment and in Blender.
Have you tried actually writing to a file?
Because UTF-8 works in Blender… I mean, I can print in the Python console, which works fine. I can print in the System Console… although it does not support UTF-8 and I just get those weird shapes. But the error appears when it’s writing to the file. From the error, it’s trying to encode the character… but I don’t know why, or why it can’t.

Something like this? Logging to file of that character seems to work here.

paulm@cmstorm 12:58:/tmp$ cat l2.py 
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)-15s %(levelname)8s %(name)s %(message)s')
fh = logging.FileHandler('doh.log')
fh.setLevel(logging.DEBUG)
log = logging.getLogger(__name__)
log.addHandler(fh)
log.info('doh! \u25e6')

paulm@cmstorm 12:58:/tmp$ ~/software/blender-git/blender -b -P l2.py 
Blender 2.81 (sub 8) (hash f2400c1bb5e2 built 2019-09-05 11:38:06)
Read prefs: /home/paulm/.config/blender/2.81/config/userpref.blend
found bundled python: /home/paulm/software/blender-git/2.81/python
2019-09-11 12:58:51,488     INFO __main__ doh! ◦

Blender quit

paulm@cmstorm 12:58:/tmp$ cat doh.log 
doh! ◦

paulm@cmstorm 12:58:/tmp$ ~/software/blender-git/blender -v
Blender 2.81 (sub 8)
	build date: 2019-09-05
	build time: 11:38:06
	build commit date: 2019-09-05
	build commit time: 08:51
	build hash: f2400c1bb5e2
	build platform: Linux
	build type: Release
	build c flags:  -Wall -Wcast-align -Werror=implicit-function-declaration -Werror=return-type -Werror=vla -Wstrict-prototypes -Wmissing-prototypes -Wno-char-subscripts -Wno-unknown-pragmas -Wpointer-arith -Wunused-parameter -Wwrite-strings -Wlogical-op -Wundef -Winit-self -Wnonnull -Wmissing-include-dirs -Wno-div-by-zero -Wtype-limits -Wformat-signedness -Wrestrict -Wuninitialized -Wredundant-decls -Wshadow -Wno-error=unused-but-set-variable -Wimplicit-fallthrough=5  -fuse-ld=gold -fopenmp -std=gnu11   -msse -pipe -fPIC -funsigned-char -fno-strict-aliasing -msse2
	build c++ flags:  -Wredundant-decls -Wall -Wno-invalid-offsetof -Wno-sign-compare -Wlogical-op -Winit-self -Wmissing-include-dirs -Wno-div-by-zero -Wtype-limits -Werror=return-type -Wno-char-subscripts -Wno-unknown-pragmas -Wpointer-arith -Wunused-parameter -Wwrite-strings -Wundef -Wformat-signedness -Wrestrict -Wuninitialized -Wundef -Wmissing-declarations -Wimplicit-fallthrough=5  -fuse-ld=gold -fopenmp -std=c++11   -msse -pipe -fPIC -funsigned-char -fno-strict-aliasing -msse2
	build link flags:  -Wl,--version-script='/data/c/blender-git/source/creator/blender.map'
	build system: CMake

The encoding step is needed because Python strings are sequences of Unicode code points, which have to be converted to bytes before they can be written to a file. Usually UTF-8 is used for that conversion. In this case it seems code page 1251 (cp1251) is used instead of UTF-8, and cp1251 has no mapping for that character, so the write fails. Do you enforce a certain encoding explicitly?
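
To illustrate the point about cp1251, here is a minimal sketch. The helper name can_encode is made up for this example; the character is the one from the traceback. It only shows that UTF-8 can represent the character while cp1251 cannot:

```python
# U+25E6 (WHITE BULLET) is the character reported in the traceback.
ch = '\u25e6'


def can_encode(text, encoding):
    """Hypothetical helper: True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# UTF-8 stores this character as three bytes.
utf8_bytes = ch.encode('utf-8')
print(utf8_bytes)

# cp1251 has no slot for it, which is what triggers the UnicodeEncodeError.
print(can_encode(ch, 'cp1251'))
print(can_encode(ch, 'utf-8'))
```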

Hmmm, no, not really. I don’t think so.
Here is my logger class that handles everything. I don’t see anything that could be related to a specific encoding. Before, I had u’’ strings, but I removed the “u” and still got the same thing.

import logging

class Log:
    def __init__(self,file):
        self.file = file
        self.logger = logging.getLogger(__file__)
        self.logger.setLevel(logging.INFO)
        self.file_handler = logging.FileHandler(self.file)
        self.formatter = logging.Formatter('%(levelname)s: %(name)s:\n\t%(message)s')
        self.file_handler.setFormatter(self.formatter)
        self.logger.addHandler(self.file_handler)
        self.handlers = []

    def setLevel(self,level=int):
        if level   == 0:
            self.formatter = logging.Formatter('%(levelname)s: %(name)s:\n%(message)s')
        elif level == 1:
            self.formatter = logging.Formatter('\tl1 %(message)s')
        elif level == 2:
            self.formatter = logging.Formatter('\t\tl2 %(message)s')
        elif level == 3:
            self.formatter = logging.Formatter('\t\t\tl3 %(message)s')
        elif level == 4:
            self.formatter = logging.Formatter('\t\t\tl4 %(message)s')

        self.file_handler.setFormatter(self.formatter)
        self.logger.addHandler(self.file_handler)

    def print(self,message):
        self.logger.info(message)

    def debug(self):
        self.logger.debug('debug')

    def clearFile(self):
        # Use a context manager so the file handle is closed after truncating.
        with open(self.file, 'r+') as f:
            f.truncate(0)

    def addHandler(self,file):
        self.file_handler = logging.FileHandler(file)
        self.logger.addHandler(self.file_handler)
        self.handlers.append(self.file_handler)
        self.file = file

    def removeHandler(self):
        self.logger.removeHandler(self.file_handler)

    def removeAllHandlers(self):
        for handler in self.handlers:
            self.logger.removeHandler(handler)

It’s odd… I just copied your code, pointed it at my log file, and got the same error.

In regular Python or Blender, or both?

Huh…it looks like both. Hmm…

So it appears to be an issue at the Python level?

I see… so it’s some change made in the latest version. I guess I have to go look that up.
I did look into it before… but the only thing I found was that 3.7.4 now uses UTF-8 as the default rather than ASCII.

OK, found the issue.
The FileHandler in this case should be:

self.file_handler = logging.FileHandler(self.file, encoding='utf-8')

I’d tried this before, but only in __init__(). I totally forgot that addHandler() also attaches a file handler, and that one was not created with encoding='utf-8'.
Adding it there fixed the thing!
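
For anyone landing here later, a small self-contained sketch of the fix. The logger name 'utf8_demo' and the temporary log path are placeholders for this example, not part of the class above. It writes the problem character through a FileHandler created with encoding='utf-8' and reads it back:

```python
import logging
import os
import tempfile

# Placeholder log location for the demo; any writable path works.
log_path = os.path.join(tempfile.mkdtemp(), 'demo.log')

logger = logging.getLogger('utf8_demo')  # placeholder logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo from also writing to the console

# Passing encoding explicitly avoids falling back to the platform default
# (cp1251 in the traceback above).
handler = logging.FileHandler(log_path, encoding='utf-8')
handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
logger.addHandler(handler)

logger.info('doh! \u25e6')

# Detach and close the handler so the file is flushed before reading it back.
logger.removeHandler(handler)
handler.close()

with open(log_path, encoding='utf-8') as f:
    content = f.read()

# ascii() keeps the print safe even on a console that cannot show the character.
print(ascii(content))
```

The same encoding argument has to be passed everywhere a FileHandler is created, which in the class above means both __init__() and addHandler().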

Makes sense. Still, it never asked me to do this when I was using previous Python versions (not 3.7.4).
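
One possible explanation, offered as a guess rather than a confirmed cause: when no encoding is passed, FileHandler opens the file with Python’s default text encoding, which in the Python versions discussed here is taken from the operating system locale rather than fixed by Python itself. A quick way to check what a given interpreter (local Python or the one bundled with Blender) would use:

```python
import locale
import sys

# Encoding that open(), and therefore FileHandler, falls back to
# when no encoding argument is given.
default_file_encoding = locale.getpreferredencoding(False)
print('default file encoding:', default_file_encoding)

# The console can use a different encoding again, which is why printing
# and writing to a file may behave differently.
print('stdout encoding:', sys.stdout.encoding)
```

If this prints cp1251 on the affected machine, that would match the codec named in the traceback.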