[ast-developers] Crash in libast regex engine when parsing large string via "${x//~(E)([[:digit:]])|([[:alpha:]])/D}" ...
Roland Mainz
2013-05-29 12:15:48 UTC
Hi!

----

(Unpatched) ast-ksh.2013-05-24 on SuSE 12.3/AMD64/64bit crashes with
the following stack trace when it processes a large text file
("/usr/share/doc/packages/docbook5-xsl-stylesheets/RELEASE-NOTES.txt"
on SuSE 12.3 ... I've attached it as "RELEASE-NOTES.txt.bz2" ... note
that the text file seems to contain Unicode characters):
-- snip --
$ export LC_ALL=en_US.UTF-8
$ gdb --args ./build_i386_64bit_debug/arch/linux.i386-64/bin/ksh -c
'typeset dummy x="$(
</usr/share/doc/packages/docbook5-xsl-stylesheets/RELEASE-NOTES.txt )"
; dummy="${x//~(E)([[:digit:]])|([[:alpha:]])/D}" ; :'
[snip]
Reading symbols from
/home/test001/work/ast_ksh_20130524/build_i386_64bit_debug/arch/linux.i386-64/bin/ksh...done.
(gdb) run
Starting program:
/home/test001/work/ast_ksh_20130524/build_i386_64bit_debug/arch/linux.i386-64/bin/ksh
-c typeset\ dummy\ x=\"\$\(\
\</usr/share/doc/packages/docbook5-xsl-stylesheets/RELEASE-NOTES.txt\
\)\"\ \;\ dummy=\"\$\{x//\~\(E\)\(\[\[:digit:\]\]\)\|\(\[\[:alpha:\]\]\)/D\}\"\
\;\ :

Program received signal SIGSEGV, Segmentation fault.
0x00000000004ef1bf in parse (env=0x2bfffbe25bd0, rex=0x2bfffbe38180,
cont=0x2bfffbe25c50,
s=0x2bfffbed3600 "rofile\" instead of xsl:copy-of for attributes
so\n they can be more easily customized.\n\nTools\n\nThe following
changes have been made to the tools code since the
1.73.2\nrelease.\n\n ? Michael(tm) Smi"...) at
/home/test001/work/ast_ksh_20130524/build_i386_64bit_debug/src/lib/libast/regex/regnexec.c:836
836 matchcopy(env, rex);
(gdb) print env
$1 = (Env_t *) 0x2bfffbe25bd0
(gdb) print rex
$2 = (Rex_t *) 0x2bfffbe38180
(gdb) print *env
$3 = {rex = 0x2bfffbe38180, disc = 0x7f6d10 <_reg_state+3728>,
_ast_regex = 0x2bfffbf01e08,
beg = 0x2bfffbed3600 "rofile\" instead of xsl:copy-of for attributes
so\n they can be more easily customized.\n\nTools\n\nThe following
changes have been made to the tools code since the
1.73.2\nrelease.\n\n ? Michael(tm) Smi"..., end = 0x2bfffbf01de7 "",
pos = 0x2bfffbf04280, bestpos = 0x2bfffbf03c50, match =
0x2bfffbe635d0,
best = 0x2bfffbe63600, stk = {offset = 175315,
base = 0x2bfffbe388e0 "dummy=DDDDDDD DDDDD DDD DDD DDDDDDD DDD ",
'D' <repeats 11 times>, "\n\n$DDDDDDDD: DDDD $ $DDDD: DDDD-DD-DD
DD:DD:DD +DDDD (DDD, DD DDD DDDD) $\n\nDDDD-DD-DD\n\nDDDD
DDDDDDD-DDDDD DDDDDDDD DD DDDDDDDDD DD DDD DDDDDDDDD DDDD"...}, min =
1, nsub = 2, flags = 2684364802, error = 0, explicit = -1, leading =
-1, refs = 1, done = {
type = 13 '\r', marked = 0 '\000', serial = 0, flags = 2684364802,
explicit = 0, next = 0x0, lo = 0, hi = 0, map = 0x0, re = {alt_catch =
{cont = 0x0}, bm = {mask = 0x0,
skip = 0x0, fail = 0x0, size = 0, back = 0, left = 0, right =
0, complete = 0}, behind_catch = {cont = 0x0, beg = 0x0, end = 0x0},
charclass = 0x0, collate = {
invert = 0, elements = 0x0}, cond_catch = {beg = 0x0, next =
{0x0, 0x0}, cont = 0x0, yes = 0}, conj_left = {beg = 0x0, right = 0x0,
cont = 0x0}, conj_right = {end = 0x0,
cont = 0x0}, data = 0x0, exec = {data = 0x0, text = 0x0, size
= 0}, group = {number = 0, last = 0, size = 0, back = 0, flags = 0,
expr = {binary = {left = 0x0,
right = 0x0, serial = 0}, rex = 0x0}}, group_catch = {cont
= 0x0, eo = 0x0}, neg_catch = {beg = 0x0, index = 0x0}, nest =
{primary = 0, none = 0, type = {0}},
onechar = 0 '\000', rep_catch = {cont = 0x0, ref = 0x0, beg =
0x0, n = 0}, string = {fail = 0x0, base = 0x0, size = 0}, trie = {root
= 0x0, min = 0, max = 0}}}, stats = {
re_flags = 0, re_min = -1, re_max = -1, re_record = 0, env = 0},
fold = '\000' <repeats 255 times>, hard = 1 '\001', once = 0 '\000',
separate = 0 '\000', stack = 1 '\001',
sub = 0 '\000', test = 0 '\000'}
(gdb) print *rex
$4 = {type = 1 '\001', marked = 0 '\000', serial = 1, flags =
2684364802, explicit = -1, next = 0x0, lo = 0, hi = 0, map = 0x0, re =
{alt_catch = {cont = 0x200000001}, bm = {
mask = 0x200000001, skip = 0x0, fail = 0x0, size =
48378442645632, back = 48378442645760, left = 4, right = 0, complete =
0}, behind_catch = {cont = 0x200000001,
beg = 0x0, end = 0x0}, charclass = 0x200000001, collate =
{invert = 1, elements = 0x0}, cond_catch = {beg = 0x200000001 <Address
0x200000001 out of bounds>, next = {0x0,
0x0}, cont = 0x2bfffbe38080, yes = -68976384}, conj_left =
{beg = 0x200000001 <Address 0x200000001 out of bounds>, right = 0x0,
cont = 0x0}, conj_right = {
end = 0x200000001 <Address 0x200000001 out of bounds>, cont =
0x0}, data = 0x200000001, exec = {data = 0x200000001, text = 0x0, size
= 0}, group = {number = 1, last = 2,
size = 0, back = 0, flags = 0, expr = {binary = {left =
0x2bfffbe38080, right = 0x2bfffbe38100, serial = 4}, rex =
0x2bfffbe38080}}, group_catch = {cont = 0x200000001,
eo = 0x0}, neg_catch = {beg = 0x200000001 <Address 0x200000001
out of bounds>, index = 0x0}, nest = {primary = 1, none = 2, type =
{0}}, onechar = 1 '\001', rep_catch = {
cont = 0x200000001, ref = 0x0, beg = 0x0, n = -68976512}, string
= {fail = 0x200000001, base = 0x0, size = 0}, trie = {root =
0x200000001, min = 0, max = 0}}}
-- snip --

A separate run with valgrind (modified to recognise the libast allocators)
showed these hits:
-- snip --
==59064== Invalid read of size 8
==59064== at 0x4EF1BF: parse (regnexec.c:836)
==59064== by 0x4F4606: _ast_regnexec_20120528 (regnexec.c:1970)
==59064== by 0x4ACD54: _ast_strngrpmatch (strmatch.c:141)
==59064== by 0x44D6B7: varsub (macro.c:1822)
==59064== by 0x44928B: copyto (macro.c:634)
==59064== by 0x44763D: sh_mactrim (macro.c:181)
==59064== by 0x451CA1: sh_setlist (name.c:332)
==59064== by 0x479AEF: sh_exec (xec.c:1168)
==59064== by 0x47D967: sh_exec (xec.c:2218)
==59064== by 0x40F2FB: exfile (main.c:599)
==59064== by 0x40E4C6: sh_main (main.c:371)
==59064== by 0x40D620: main (pmain.c:45)
==59064== Address 0x2c is not stack'd, malloc'd or (recently) free'd
==59064==
==59064==
==59064== Process terminating with default action of signal 11 (SIGSEGV)
==59064== Access not within mapped region at address 0x2C
==59064== at 0x4EF1BF: parse (regnexec.c:836)
==59064== by 0x4F4606: _ast_regnexec_20120528 (regnexec.c:1970)
==59064== by 0x4ACD54: _ast_strngrpmatch (strmatch.c:141)
==59064== by 0x44D6B7: varsub (macro.c:1822)
==59064== by 0x44928B: copyto (macro.c:634)
==59064== by 0x44763D: sh_mactrim (macro.c:181)
==59064== by 0x451CA1: sh_setlist (name.c:332)
==59064== by 0x479AEF: sh_exec (xec.c:1168)
==59064== by 0x47D967: sh_exec (xec.c:2218)
==59064== by 0x40F2FB: exfile (main.c:599)
==59064== by 0x40E4C6: sh_main (main.c:371)
==59064== by 0x40D620: main (pmain.c:45)
-- snip --
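
For reference, the pattern substitution in the failing command is meant to replace every digit or letter in the string with "D", which matches the partially built result ("dummy=DDDDDDD DDDDD ...") visible in the stk.base field of the gdb dump. The sketch below shows that expected result on a small input. It is only an illustration under assumptions: the sample string is made up, and "sed -E" stands in for ksh93's ~(E) extended-regex matching so that it runs in any POSIX shell. It does not exercise the libast regex engine and will not reproduce the crash.

```shell
# Hypothetical small input; the original report used a large UTF-8 text file.
x='ab12 cd-34'

# sed -E is used here as a stand-in for the ksh93 form
#   dummy="${x//~(E)([[:digit:]])|([[:alpha:]])/D}"
# Both replace each single digit or letter with the literal character D.
dummy=$(printf '%s\n' "$x" | sed -E 's/([[:digit:]])|([[:alpha:]])/D/g')

printf '%s\n' "$dummy"
```

Characters that are neither digits nor letters (the space and the hyphen in the sample) pass through unchanged, the same way punctuation and newlines survive in the stk.base contents shown in the dump.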

----

Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RELEASE-NOTES.txt.bz2
Type: application/x-bzip2
Size: 76521 bytes
Desc: not available
URL: <http://lists.research.att.com/pipermail/ast-developers/attachments/20130529/e2622faf/attachment-0001.bz2>