subprocess stdin encoding

Discussion:

Jean-Paul Calderone

2007-02-05 14:06:22 UTC

[snip]
> in site.py, and change "if 0:" to "if 1:" to enable locale-aware string encoding.
> Now you can run the Python interpreter with LC_CTYPE='UTF-8'.

While this is sort of a correct answer to the question asked, it
isn't really a correct answer overall. I hope no one actually
goes off and does this: doing so will result in completely
unportable code with bugs that are very difficult to track down.

Instead, convert explicitly with the "encode" and "decode" methods of
str and unicode objects.

Jean-Paul
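A minimal sketch of the explicit approach, written for Python 3, where Python 2's unicode and str types are called str and bytes. Text is encoded to bytes when it leaves the program and decoded when it comes back, with the codec named at each step instead of relying on an interpreter-wide default. The sample string is made up for illustration.

```python
# Python 3 sketch: str plays the role of Python 2's unicode, bytes of Python 2's str.
text = "\u00a9 2007"                 # contains U+00A9, the character from the traceback

# Encode explicitly when the text has to become bytes (a pipe, file or socket).
data = text.encode("utf-8")
print(repr(data))                    # b'\xc2\xa9 2007'

# Decode explicitly, with the same codec, when bytes come back in.
round_trip = data.decode("utf-8")
print(round_trip == text)            # True
```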

Jean-Paul Calderone

2007-02-05 14:07:25 UTC

> I have an encoding problem when using subprocess. The input is a
> string with UTF-8 encoding.
> tokenize = subprocess.Popen(tok_command, stdin=subprocess.PIPE,
>                             stdout=subprocess.PIPE, close_fds=True,
>                             shell=True)
> (tokenized_text, errs) = tokenize.communicate(t)
>   File "/usr/local/python/lib/python2.5/subprocess.py", line 651, in communicate
>     return self._communicate(input)
>   File "/usr/local/python/lib/python2.5/subprocess.py", line 1115, in _communicate
>     bytes_written = os.write(self.stdin.fileno(), input[:512])
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 204: ordinal not in range(128)
> How do I change the default encoding from "ascii" to "utf-8"?

You don't need to change the default encoding. You just need to encode
the unicode string you are sending to the child process. Try:

tokenized_text, errs = tokenize.communicate(t.encode('utf-8'))

Jean-Paul
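A minimal Python 3 sketch of that advice, applied in both directions: encode the text before it goes down the pipe and decode whatever comes back. The helper name communicate_text and the echo command standing in for tok_command are hypothetical, UTF-8 is assumed to be what the child expects, and the command is given as an argument list instead of through shell=True only to keep the example self-contained.

```python
import subprocess
import sys


def communicate_text(cmd, text, encoding="utf-8"):
    """Send text to cmd's stdin; return its stdout and stderr decoded as text.

    Pipes carry bytes, so the text is encoded before it is written and the
    output is decoded after it is read, naming the codec explicitly both times.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate(text.encode(encoding))
    # errors="replace" on stderr only, so an oddly encoded diagnostic cannot mask the result.
    return out.decode(encoding), err.decode(encoding, errors="replace")


# Hypothetical stand-in for tok_command: a child Python process that echoes stdin to stdout.
ECHO_CMD = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]

sample = "caf\u00e9 \u00a9 2007"
tokenized_text, errs = communicate_text(ECHO_CMD, sample)
print(tokenized_text == sample)      # True
```

On Python 3.6 and later, passing encoding="utf-8" to subprocess.Popen puts the pipes in text mode and performs the same conversion internally; the explicit version only makes the two conversions visible.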

Ying Chen

2007-02-05 07:10:29 UTC

I have an encoding problem when using subprocess. The input is a
string with UTF-8 encoding.

The code is:

tokenize = subprocess.Popen(tok_command, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, close_fds=True,
                            shell=True)

(tokenized_text, errs) = tokenize.communicate(t)

The error is:

  File "/usr/local/python/lib/python2.5/subprocess.py", line 651, in communicate
    return self._communicate(input)
  File "/usr/local/python/lib/python2.5/subprocess.py", line 1115, in _communicate
    bytes_written = os.write(self.stdin.fileno(), input[:512])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 204: ordinal not in range(128)

How do I change the default encoding from "ascii" to "utf-8"?

Ying Chen
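The traceback shows where the conversion happens: os.write needs a byte string, so Python 2 implicitly encoded the unicode input with the default "ascii" codec, which has no byte for u'\xa9'. Python 3 no longer converts implicitly, but calling encode("ascii") by hand reproduces the same UnicodeEncodeError. A minimal sketch; the sample text is made up for illustration.

```python
# Made-up sample text; U+00A9 (the copyright sign) sits at index 4.
text = "Foo \u00a9 Bar"

try:
    text.encode("ascii")             # what Python 2 did implicitly with its default codec
    failure = None
except UnicodeEncodeError as exc:
    failure = exc                    # keep a reference; exc is cleared after the block
    print(exc.encoding, exc.start)   # ascii 4

# UTF-8 can represent U+00A9, as the two bytes C2 A9.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                    # b'Foo \xc2\xa9 Bar'
```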

Thinker

2007-02-05 12:54:48 UTC

> I have an encoding problem when using subprocess. The input is
> a string with UTF-8 encoding.
> tokenize = subprocess.Popen(tok_command, stdin=subprocess.PIPE,
>                             stdout=subprocess.PIPE, close_fds=True,
>                             shell=True)
> (tokenized_text, errs) = tokenize.communicate(t)
> The error is:
>   File "/usr/local/python/lib/python2.5/subprocess.py", line 651, in communicate
>     return self._communicate(input)
>   File "/usr/local/python/lib/python2.5/subprocess.py", line 1115, in _communicate
>     bytes_written = os.write(self.stdin.fileno(), input[:512])
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 204: ordinal not in range(128)
> How do I change the default encoding from "ascii" to "utf-8"?
> Ying Chen

Find code like this

def setencoding():
    """Set the string encoding used by the Unicode implementation. The
    default is 'ascii', but if you're willing to experiment, you can
    change this."""
    encoding = "ascii" # Default value set by _PyUnicode_Init()
    if 0:
        # Enable to support locale aware default string encodings.
        import locale
        loc = locale.getdefaultlocale()
        if loc[1]:
            encoding = loc[1]

in site.py, and change "if 0:" to "if 1:" to enable locale-aware string encoding.
Now you can run the Python interpreter with LC_CTYPE='UTF-8'.

-- 
Thinker Li - thinker at branda.to thinker.li at gmail.com
http://heaven.branda.to/~thinker/GinGin_CGI.py
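If the goal is to use the locale's encoding, a less invasive alternative to editing site.py is to look that encoding up and pass it to encode explicitly, so only the data crossing the boundary is affected and the interpreter-wide default stays untouched. A minimal Python 3 sketch; the helper name encode_for_locale is hypothetical, and the result still depends on the environment the program runs in, so the portability caveat raised earlier in the thread applies.

```python
import locale


def encode_for_locale(text):
    """Hypothetical helper: encode text with the locale's preferred encoding."""
    # getpreferredencoding(False) reports the locale's encoding without calling setlocale().
    encoding = locale.getpreferredencoding(False)
    return text.encode(encoding), encoding


data, used = encode_for_locale("plain ASCII text")
print(used, data)
```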