[ast-developers] Problems with "read -N5" when reading multibyte characters from tty...
Roland Mainz
2012-07-20 10:40:43 UTC
Hi!

----

Attached (as "builtin_read.sh.gz") is a _prototype_ test module for
the "read" builtin. While writing the tests I found an issue with $
read -N5 # when the input consists of five multibyte characters in the
zh_CN.UTF-8 locale. The tests then fail like this (on Solaris
11/64bit/AMD64, with David's patch for $ read -N1 # applied):
-- snip --
(export SHELL=$PWD/../../../build_i386_64bit_opt_extrabuiltins_allpatches/arch/sol11.i386-64/bin/ksh
; LC_ALL=zh_CN.UTF-8 $SHELL
../../../build_i386_64bit_opt_extrabuiltins_allpatches/src/cmd/ksh93/tests/shtests
-l builtin_read.sh )
test builtin_read(zh_CN.UTF-8) begins at 2012-07-20+12:23:53
builtin_read.sh[147]:
test_read_one_character1/"\xe2\x82\xac"/Unicode EURO character
U+20AC/-N5: Expected "?????OK", got
$'\u[20ac]\u[20ac]\u[20ac]\u[20ac]\u[20ac]'
builtin_read.sh[147]:
test_read_one_character1/"\xf0\x9d\x8d\xa1"/Unicode CJKV Counting Rod
Numeral character U+1D361/-N5: Expected "OK", got
builtin_read.sh[147]:
test_read_one_character1/"\xf0\x9f\x80\x80"/Unicode Mahjong Tile
U+1F000 (character outside BMP!!)/-N5: Expected "OK", got
$'\u[1f000]\u[1f000]\u[1f000]\u[1f000]\u[1f000]'
test builtin_read(zh_CN.UTF-8) failed at 2012-07-20+12:25:57 with exit
code 1 [ 9 tests 1 error ]
-- snip --

Note that this output is itself unreliable because xterm, GNOME's
gnome-terminal and KDE's konsole have issues with characters outside the
Unicode Basic Multilingual Plane... but as you can see the problem with
$ read -N5 # happens for the Unicode Euro character, too.
Please do not trust the things you see on the terminal when characters
outside the BMP are involved... many terminal emulators are notoriously
bad (well.. xterm gets at least the ordering of characters and number
of terminal cells right... but selecting text and putting it into
another application causes the characters outside the BMP to "lose
bits").

Questions:
1. Is this a bug with $ read -N5 #, an issue with "pty" or my script ?
Note that I do not write the characters as one block... instead I let
"pty" issue them character-by-character, e.g. "c <character>c
<character>c <character>c <character>c <character>" (technically I
should add a small delay there to simulate a user banging on the
keyboard).
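
The character-by-character feeding described in question 1 can be sketched roughly as follows. This is only an illustration: slow_feed is a hypothetical helper, not part of "pty" or of the attached test module, and because it writes to a pipe rather than a tty it would not by itself trigger the behaviour reported here. It only shows that writing the characters one at a time yields the same byte stream as writing them as one block, so any difference seen by the reader would come from chunking and timing.

```shell
# Hypothetical helper: write each argument as its own chunk, pausing
# between chunks to imitate a user typing one character at a time.
# $1 is the delay in seconds; the remaining arguments are the characters.
slow_feed() {
    delay=$1
    shift
    for ch in "$@"; do
        printf '%s' "$ch"
        sleep "$delay"
    done
}

# U+20AC EURO SIGN as UTF-8, written with octal escapes
# (\342\202\254 is the same byte sequence as "\xe2\x82\xac").
euro=$(printf '\342\202\254')

# Five characters written one at a time versus written as one block.
chunked=$(slow_feed 0 "$euro" "$euro" "$euro" "$euro" "$euro")
block=$(printf '%s%s%s%s%s' "$euro" "$euro" "$euro" "$euro" "$euro")

if [ "$chunked" = "$block" ]; then
    echo "same bytes, different chunking"
fi
```

A non-zero delay (for example 1) would come closer to simulating a user at the keyboard.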

----

Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: builtin_read.sh.gz
Type: application/x-gzip
Size: 2243 bytes
Desc: not available
URL: <https://mailman.research.att.com/pipermail/ast-developers/attachments/20120720/9d67407e/attachment.gz>
Roland Mainz
2012-07-20 10:55:21 UTC

Grumpf... I should sometimes proofread the emails I write... ;-/
... the issue is that $ read -N5 x # returns after reading five ASCII
characters but does not return after reading five multibyte
characters. This seems to happen only if the input is a tty (e.g. from
"pty" ) ...
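
One concrete difference between the two cases is the byte count: five ASCII characters are five bytes, while five U+20AC characters are fifteen bytes in UTF-8. The sketch below only illustrates that difference. Whether a byte-versus-character mix-up is what actually goes wrong in "read" on a tty is an assumption, not something established here, and byte_count is a hypothetical helper.

```shell
# Hypothetical helper: count the bytes in a string.
# wc -c counts bytes, independent of the current locale.
byte_count() {
    n=$(printf '%s' "$1" | wc -c)
    echo "$((n))"
}

ascii='abcde'
# Five U+20AC EURO SIGN characters as UTF-8 (octal for "\xe2\x82\xac").
euros=$(printf '\342\202\254\342\202\254\342\202\254\342\202\254\342\202\254')

echo "ascii: $(byte_count "$ascii") bytes"
echo "euros: $(byte_count "$euros") bytes"

# A reader that stopped after 5 *bytes* of the euro input would hold one
# complete character (3 bytes) plus 2 bytes of an incomplete second one.
partial=$(printf '%s' "$euros" | dd bs=1 count=5 2>/dev/null)
echo "first 5 bytes: $(byte_count "$partial") bytes"
```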

----

Bye,
Roland
Roland Mainz
2013-04-20 00:57:15 UTC
Erm... ping! ... the issue still occurs in ast-ksh.2013-04-09 (mostly
affecting Asian users in CJKV locales (Olga hit it today again with
Ukrainian/Cyrillic characters...) ... ;-(( ) ...

----

Bye,
Roland
Roland Mainz
2013-04-23 09:24:00 UTC
The bug is still present in ast-ksh.2013-04-22...

----

Bye,
Roland
Roland Mainz
2013-07-07 05:00:51 UTC
I'm still seeing this bug on all platforms (e.g. SuSE Linux 12.3,
Solaris 11/B145, Solaris 11.1 etc.) ... ;-(

----

Bye,
Roland