Roland Mainz
2012-07-20 10:40:43 UTC
Hi!
----
Attached (as "builtin_read.sh.gz") is a _prototype_ test module for
the "read" builtin. While writing the tests I found an issue with $
read -N5 # when the input consists of five multibyte characters in the
zh_CN.UTF-8 locale. The tests then fail like this (on Solaris
11/64bit/AMD64, with David's patch for $ read -N1 # applied):
-- snip --
(export SHELL=$PWD/../../../build_i386_64bit_opt_extrabuiltins_allpatches/arch/sol11.i386-64/bin/ksh
; LC_ALL=zh_CN.UTF-8 $SHELL
../../../build_i386_64bit_opt_extrabuiltins_allpatches/src/cmd/ksh93/tests/shtests
-l builtin_read.sh )
test builtin_read(zh_CN.UTF-8) begins at 2012-07-20+12:23:53
builtin_read.sh[147]:
test_read_one_character1/"\xe2\x82\xac"/Unicode EURO character
U+20AC/-N5: Expected "?????OK", got
$'\u[20ac]\u[20ac]\u[20ac]\u[20ac]\u[20ac]'
builtin_read.sh[147]:
test_read_one_character1/"\xf0\x9d\x8d\xa1"/Unicode CJKV Counting Rod
Numeral character U+1D361/-N5: Expected "OK", got
builtin_read.sh[147]:
test_read_one_character1/"\xf0\x9f\x80\x80"/Unicode Mahjong Tile
U+1F000 (character outside BMP!!)/-N5: Expected "OK", got
$'\u[1f000]\u[1f000]\u[1f000]\u[1f000]\u[1f000]'
test builtin_read(zh_CN.UTF-8) failed at 2012-07-20+12:25:57 with exit
code 1 [ 9 tests 1 error ]
-- snip --
Note that this output looks garbled because xterm, GNOME's
gnome-terminal and KDE's Konsole have trouble with characters outside
the Unicode Basic Multilingual Plane (BMP)... but as you can see, the
problem with $ read -N5 # happens for the Unicode Euro character
(which is inside the BMP), too.
Please do not trust the things you see on the terminal when characters
outside the BMP are involved... many terminal emulators are notoriously
bad (well... xterm at least gets the ordering of characters and the
number of terminal cells right... but selecting text and pasting it
into another application causes the characters outside the BMP to
"lose bits").
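Because the terminal output cannot be trusted here, counting raw bytes
is a more reliable sanity check. The following is only a minimal
sketch and is not part of the attached test module; the helper name
"repeat5" is made up for illustration. It rebuilds two of the byte
sequences from the log above using octal escapes and shows how many
bytes five such characters occupy, which is the amount of input that
$ read -N5 # has to consume if it counts characters rather than bytes:

```shell
# repeat5 is a hypothetical helper: it prints the given printf format
# string (UTF-8 bytes written as octal escapes) five times.
repeat5() {
    i=0
    while [ "$i" -lt 5 ]; do
        printf "$1"
        i=$((i + 1))
    done
}

# U+20AC (EURO) is \xe2\x82\xac, i.e. \342\202\254 in octal.
# The arithmetic expansion strips any padding that wc may emit.
euro_bytes=$(( $(repeat5 '\342\202\254' | wc -c) ))

# U+1F000 (Mahjong Tile, outside the BMP) is \xf0\x9f\x80\x80.
tile_bytes=$(( $(repeat5 '\360\237\200\200' | wc -c) ))

echo "five U+20AC characters:  $euro_bytes bytes"
echo "five U+1F000 characters: $tile_bytes bytes"
```

Piping the same output through od or a hex dump instead of wc shows
the individual bytes without involving the terminal's rendering.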
Questions:
1. Is this a bug in $ read -N5 #, an issue with "pty", or a problem in
my script? Note that I do not write the characters as one block...
instead I let "pty" issue them character-by-character, e.g.
"c <character>c <character>c <character>c <character>c <character>"
(technically I should add a small delay there to simulate a user
banging on the keyboard).
----
Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: builtin_read.sh.gz
Type: application/x-gzip
Size: 2243 bytes
Desc: not available
URL: <https://mailman.research.att.com/pipermail/ast-developers/attachments/20120720/9d67407e/attachment.gz>