ncurses getch & unicode (was: decoding keyboard input when using curses)

Discussion:

Iñigo Serna

2009-08-20 16:36:05 UTC

Hello,

I have the same problem mentioned in
http://groups.google.com/group/comp.lang.python/browse_thread/thread/c70c80cd9bc7bac6?pli=1
some months ago.

My program is written in Python 2.6 and uses the curses module (ncurses) in a
terminal configured for UTF-8 encoding.

When reading keyboard input, a non-ASCII character (like ç) is returned as
2 byte values (0-255), so 2 calls to the getch method are needed to get both.
These two integers, \xc3 \xa7, form the UTF-8 encoded representation of the ç
character.
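For reference, the arithmetic that turns these two bytes into one character can be sketched as follows. A 2-byte UTF-8 sequence has the bit layout 110xxxxx 10yyyyyy, and the code point is the 11 payload bits xxxxxyyyyyy. This is Python 3 (unlike the thread's Python 2.6 code), and utf8_two_byte_code_point is a hypothetical helper name used only for illustration.

```python
def utf8_two_byte_code_point(lead, cont):
    """Combine a 2-byte UTF-8 sequence (two getch()-style byte values) into a code point."""
    # Lead byte of a valid 2-byte sequence: 110xxxxx, i.e. 0xC2..0xDF.
    if not 0xC2 <= lead <= 0xDF:
        raise ValueError("not a 2-byte UTF-8 lead byte: %#x" % lead)
    # Continuation byte: 10yyyyyy, i.e. 0x80..0xBF.
    if not 0x80 <= cont <= 0xBF:
        raise ValueError("not a UTF-8 continuation byte: %#x" % cont)
    # Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


cp = utf8_two_byte_code_point(0xC3, 0xA7)
print(hex(cp))  # 0xe7, i.e. U+00E7 (c with cedilla)
# Cross-check against Python's own UTF-8 decoder.
assert chr(cp) == bytes([0xC3, 0xA7]).decode("utf-8")
```

So the two getch() results 0xC3 and 0xA7 carry the single code point U+00E7; a wide-character API would hand that value back in one call.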

According to the ncurses get_wch documentation, that function should instead
return the whole character as a single wide-character value (its code point,
U+00E7 for ç) rather than as separate UTF-8 bytes.

[Please read the link above; it explains the issue much better than I
could.]

Any idea or update on this?

Thanks,
Iñigo Serna

PS: My system is Fedora 11 Linux (x86_64). The same happens on the console,
in gnome-terminal and in xterm.

Iñigo Serna

2009-08-20 22:12:07 UTC

Hi again,

Answering myself: I've copied a working solution at the bottom of this
email, but the question remains: why doesn't win.getch() return the
correct value?

Kind regards,
Iñigo Serna

######################################################################
# test.py
import curses

import locale
locale.setlocale(locale.LC_ALL, '')
print locale.getpreferredencoding()

def get_char(win):
    def get_check_next_byte():
        # UTF-8 continuation bytes are in the range 0x80..0xBF
        c = win.getch()
        if 128 <= c <= 191:
            return c
        else:
            raise UnicodeError

    seq = []
    c = win.getch()
    if 0 <= c <= 127:
        # 1 byte (ASCII)
        seq.append(c)
    elif 194 <= c <= 223:
        # 2 bytes
        seq.append(c)
        seq.append(get_check_next_byte())
    elif 224 <= c <= 239:
        # 3 bytes
        seq.append(c)
        seq.append(get_check_next_byte())
        seq.append(get_check_next_byte())
    elif 240 <= c <= 244:
        # 4 bytes
        seq.append(c)
        seq.append(get_check_next_byte())
        seq.append(get_check_next_byte())
        seq.append(get_check_next_byte())
    else:
        # not a valid UTF-8 lead byte, ERR (-1), or a KEY_* code above 255
        raise UnicodeError
    buf = ''.join([chr(b) for b in seq])
    buf = buf.decode('utf-8')
    return buf

def getcodes(win):
    codes = []
    while True:
        try:
            ch = get_char(win)
        except KeyboardInterrupt:
            return codes
        except UnicodeError:
            # skip input that is not a UTF-8 encoded character
            continue
        codes.append(ch)

lst = curses.wrapper(getcodes)
print lst
for c in lst:
    print c.encode('utf-8'),
print
######################################################################
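The hand-written lead-byte table above can also be replaced by Python's incremental UTF-8 decoder, which keeps track of incomplete sequences itself. This is a minimal sketch in Python 3 (not the thread's Python 2.6), assuming a getch()-style callable that returns one byte value per call and a UTF-8 terminal; read_char is a rewrite rather than the code above, and make_fake_getch is a hypothetical helper that replays canned values so the logic can be tried without a terminal.

```python
import codecs


def read_char(getch):
    """Read one character from a getch()-style callable that returns byte values.

    Assumes the terminal sends UTF-8. Values above 255 (curses KEY_* codes
    when keypad mode is on) are returned unchanged as ints.
    """
    # Strict incremental decoder: buffers an incomplete sequence between calls
    # and raises UnicodeDecodeError on invalid input.
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        c = getch()
        if c > 255:
            # A special key, not a byte; any partial sequence is discarded.
            return c
        if c < 0:
            raise ValueError("getch() returned ERR (-1): no input available")
        text = decoder.decode(bytes([c]))
        if text:
            # The decoder emits text only once a complete character has arrived.
            return text


def make_fake_getch(values):
    """Hypothetical test helper: replays a fixed list of getch() results."""
    it = iter(values)
    return lambda: next(it)


# c with cedilla (2 bytes), 'a' (1 byte), euro sign (3 bytes), then a non-byte key code.
fake = make_fake_getch([0xC3, 0xA7, 0x61, 0xE2, 0x82, 0xAC, 260])
keys = [read_char(fake) for _ in range(4)]
# keys == ["\u00e7", "a", "\u20ac", 260]
```

In a curses program this would be called as read_char(stdscr.getch). Letting the codec track sequence length avoids hard-coding the byte ranges, at the cost of assuming UTF-8 rather than whatever locale.getpreferredencoding() reports.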

Thomas Dickey

2009-08-21 08:47:42 UTC

Post by Iñigo Serna
Answering myself: I've copied a working solution at the bottom of this
email, but the question remains: why doesn't win.getch() return the
correct value?

The code looks consistent with the curses functions...

Post by Iñigo Serna
        c = win.getch()

You're using "getch", not "get_wch" (Python's ncurses binding may or may
not have the latter).
curses getch returns 8-bit values; get_wch would return wider values.
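Python's binding did not expose get_wch at the time of this thread; Python 3.3 and later provide window.get_wch(), which returns a str for ordinary characters and an int for function keys, keypad keys and other special keys. A minimal sketch of dispatching on that return type, assuming Python 3.3+; read_key and FakeWin are hypothetical names, and FakeWin only stands in for a real curses window so the logic can be tried without a terminal.

```python
def read_key(win):
    """Read one key via window.get_wch() (available in Python 3.3+ curses).

    get_wch() returns a str for ordinary characters and an int for
    function keys, keypad keys and other special keys.
    """
    key = win.get_wch()
    if isinstance(key, str):
        return ("char", key)
    return ("key", key)


class FakeWin:
    """Hypothetical stand-in for a curses window, replaying get_wch() results."""

    def __init__(self, results):
        self._results = list(results)

    def get_wch(self):
        return self._results.pop(0)


win = FakeWin(["\u00e7", 260])
first = read_key(win)   # ("char", "\u00e7"): one call, one whole character
second = read_key(win)  # ("key", 260): a special-key code stays an int
```

With a real terminal, call locale.setlocale(locale.LC_ALL, '') before starting curses (as test.py does) and pass the window that curses.wrapper hands to your function. In no-delay mode get_wch() raises curses.error when no input is waiting.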

Iñigo Serna

2009-08-21 11:58:38 UTC

Post by Thomas Dickey

Post by Iñigo Serna
        c = win.getch()

You're using "getch", not "get_wch" (Python's ncurses binding may or may
not have the latter).
curses getch returns 8-bit values; get_wch would return wider values.

You are right: the ncurses binding does not have get_wch, only getch, and
the latter is the only one it calls in the ncurses library.

Anyway, I've written a patch to include the get_wch method in the bindings.
See http://bugs.python.org/issue6755

Thanks for the clarification,
Iñigo

Thomas Dickey

2009-08-21 19:53:17 UTC

Post by Iñigo Serna

Anyway, I've written a patch to include the get_wch method in the bindings.
See http://bugs.python.org/issue6755
Thanks for the clarification,

no problem (report bugs)

Post by Iñigo Serna
Iñigo

--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net