[ast-developers] Severe performance regression between ksh 2010-03-05 and 2013-10-08

Discussion:

Lionel Cons

2013-11-27 22:38:42 UTC

We've observing a *severe* performance regression between ksh
2010-03-05 and 2013-10-08 on Solaris 11, AMD64, LANG is en_US.UTF-8:

# prepare
$ timex seq 1400000 >xxx

# run new ksh
$ timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
"%s\n" "${a[@]}"
; } ; print "${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08

real 32.59
user 32.19
sys 0.30

# run old ksh - much faster
$ timex /bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
"%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort <xxx >yyy'
Version JM 93t+ 2010-03-05

real 14.59
user 13.92
sys 0.56

Can anyone explain this? IO-wise the new ksh is better but consumes
much more CPU time, while the old ksh issues more IO requests but
consumes only half as much CPU time.

Lionel

Glenn Fowler

2013-11-28 07:58:55 UTC

Permalink

here are some points of reference showing real user sys times
these were manually sampled to pinpoint jumps in performance from 206 ksh
binaries from 2006-11-22 through 2013-09-10

ksh-2009-11-17 0m12.06s 0m11.82s 0m0.14s
ksh-2009-12-04 0m13.84s 0m13.58s 0m0.16s
ksh-2010-06-16 0m15.15s 0m14.92s 0m0.17s
ksh-2010-11-16 0m14.06s 0m13.82s 0m0.16s
ksh-2011-04-11 0m12.72s 0m12.44s 0m0.17s
ksh-2011-06-21 0m12.58s 0m12.35s 0m0.15s
ksh-2011-09-21 0m20.58s 0m20.27s 0m0.19s
ksh-2013-04-11 0m23.40s 0m23.15s 0m0.17s
ksh-2013-05-13 0m13.83s 0m13.61s 0m0.12s
ksh-2013-05-31 0m14.15s 0m13.93s 0m0.11s
ksh-2013-06-06 0m27.15s 0m26.87s 0m0.14s

Post by Lionel Cons
We've observing a *severe* performance regression between ksh
# prepare
$ timex seq 1400000 >xxx
# run new ksh
$ timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
; } ; print "${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08
real 32.59
user 32.19
sys 0.30
# run old ksh - much faster
$ timex /bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
Version JM 93t+ 2010-03-05
real 14.59
user 13.92
sys 0.56
Can anyone explain this? IO-wise the new ksh is better but consumes
much more CPU time, while the old ksh issues more IO requests but
consumes only half as much CPU time.
Lionel
_______________________________________________
ast-users mailing list
ast-users at lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.research.att.com/pipermail/ast-developers/attachments/20131128/d68ab222/attachment.html>

Lionel Cons

2013-11-28 18:28:23 UTC

Permalink

Post by Glenn Fowler
here are some points of reference showing real user sys times
these were manually sampled to pinpoint jumps in performance from 206 ksh
binaries from 2006-11-22 through 2013-09-10
ksh-2009-11-17 0m12.06s 0m11.82s 0m0.14s
ksh-2009-12-04 0m13.84s 0m13.58s 0m0.16s
ksh-2010-06-16 0m15.15s 0m14.92s 0m0.17s
ksh-2010-11-16 0m14.06s 0m13.82s 0m0.16s
ksh-2011-04-11 0m12.72s 0m12.44s 0m0.17s
ksh-2011-06-21 0m12.58s 0m12.35s 0m0.15s
ksh-2011-09-21 0m20.58s 0m20.27s 0m0.19s
ksh-2013-04-11 0m23.40s 0m23.15s 0m0.17s
ksh-2013-05-13 0m13.83s 0m13.61s 0m0.12s
ksh-2013-05-31 0m14.15s 0m13.93s 0m0.11s
ksh-2013-06-06 0m27.15s 0m26.87s 0m0.14s
On Wed, Nov 27, 2013 at 5:38 PM, Lionel Cons <lionelcons1972 at gmail.com>

I looks that the problem is related to the function scope, without a
function scope the loop takes 24 seconds, and with function scope it
takes 32 seconds:

timex ~/bin/ksh -c 'nanosort() { typeset -A a ; integer k=0; while
read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf "%s\n"
"${a[@]}" ; } ; print "${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08

real 24.98
user 24.57
sys 0.32

timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
"%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08

real 32.79
user 32.39
sys 0.31

Lionel

Glenn Fowler

2013-12-01 16:26:06 UTC

Permalink

I believe this is related to vmalloc changes between 2013-05-31 and
2013-06-09
re-run the tests with
export VMALLOC_OPTIONS=getmem=safe
if that's the problem then it gives a clue on a general solution
details after confirmation

Post by Lionel Cons

I looks that the problem is related to the function scope, without a
function scope the loop takes 24 seconds, and with function scope it
timex ~/bin/ksh -c 'nanosort() { typeset -A a ; integer k=0; while
read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf "%s\n"
Version AIJMP 93v- 2013-10-08
real 24.98
user 24.57
sys 0.32
timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
Version AIJMP 93v- 2013-10-08
real 32.79
user 32.39
sys 0.31
Lionel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.research.att.com/pipermail/ast-developers/attachments/20131201/0d7f30e1/attachment.html>

Lionel Cons

2013-12-01 21:58:14 UTC

Permalink

Post by Glenn Fowler
I believe this is related to vmalloc changes between 2013-05-31 and
2013-06-09
re-run the tests with
export VMALLOC_OPTIONS=getmem=safe
if that's the problem then it gives a clue on a general solution
details after confirmation

timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
"%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08

real 34.60
user 33.27
sys 1.19

VMALLOC_OPTIONS=getmem=safe timex ~/bin/ksh -c 'function nanosort {
typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
"${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08
real 15.34
user 14.67
sys 0.52

So your hunch that VMALLOC_OPTIONS=getmem=safe fixes the problem is correct.

What does VMALLOC_OPTIONS=getmem=safe do?

Lionel

Post by Glenn Fowler
On Thu, Nov 28, 2013 at 1:28 PM, Lionel Cons <lionelcons1972 at gmail.com>

Post by Lionel Cons

--
Lionel

Glenn Fowler

2013-12-04 14:02:01 UTC

Permalink

Post by Lionel Cons

timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
Version AIJMP 93v- 2013-10-08
real 34.60
user 33.27
sys 1.19
VMALLOC_OPTIONS=getmem=safe timex ~/bin/ksh -c 'function nanosort {
typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
"${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08
real 15.34
user 14.67
sys 0.52
So your hunch that VMALLOC_OPTIONS=getmem=safe fixes the problem is correct.
What does VMALLOC_OPTIONS=getmem=safe do?

vmalloc has an internal discipline/method for getting memory from the system
several methods are available with varying degrees of thread safety etc.
see src/lib/libast/vmalloc/vmdcsystem.c for the code
and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
description (vmalloc.3 update shortly)

** getmemory=f enable f[,g] getmemory() functions if supported,
all by default
** anon: mmap(MAP_ANON)
** break|sbrk: sbrk()
** native: native malloc()
** safe: safe sbrk() emulation via mmap(MAP_ANON)
** zero: mmap(/dev/zero)

i believe the performance regression with "anon" is that on linux
mmap(0....MAP_ANON|MAP_PRIVATE...),
which lets the system decide the address, returns adjacent (when possible)
region addresses from highest to lowest order
and the reverse order at minimum tends to fragment more memory
"zero" has the same hi=>lo characteristic
i suspect it adversely affects the vmalloc coalescing algorithm but have
not dug deeper
for now the probe order in vmalloc/vmdcsystem.c was simply changed to favor
"safe"
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.research.att.com/pipermail/ast-developers/attachments/20131204/a6ef3020/attachment.html>

Irek Szczesniak

2013-12-05 21:50:06 UTC

Permalink

On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons <lionelcons1972 at gmail.com>

Post by Lionel Cons

vmalloc has an internal discipline/method for getting memory from the system
several methods are available with varying degrees of thread safety etc.
see src/lib/libast/vmalloc/vmdcsystem.c for the code
and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
description (vmalloc.3 update shortly)
** getmemory=f enable f[,g] getmemory() functions if supported, all
by default
** anon: mmap(MAP_ANON)
** break|sbrk: sbrk()
** native: native malloc()
** safe: safe sbrk() emulation via mmap(MAP_ANON)
** zero: mmap(/dev/zero)
i believe the performance regression with "anon" is that on linux
mmap(0....MAP_ANON|MAP_PRIVATE...),
which lets the system decide the address, returns adjacent (when possible)
region addresses from highest to lowest order
and the reverse order at minimum tends to fragment more memory
"zero" has the same hi=>lo characteristic
i suspect it adversely affects the vmalloc coalescing algorithm but have not
dug deeper
for now the probe order in vmalloc/vmdcsystem.c was simply changed to favor
"safe"

MAP_FIXED should be avoided because its only there for special
purposes like the runtime linker ld.so.1 or debuggers.

Using this for a general-purpose memory allocator causes serious problems:
1. On some systems this is a privileged operation and only available
for users with root privileges

2. SPARC T4 with 256GB and Solaris 11.1 the use of 'safe' degraded the
performance from 9 seconds to almost 15 minutes because it utterly
destroys the systems concept of large pages. If two MAP_FIXED mappings
follow directly each other the system downgrades the page size to the
smallest possible size, even trying to break up larger pages, which in
turn must be done by a special deamon (vmtasks)

3. MAP_PRIVATE|MAP_FIXED|MAP_ANON may no longer be available in future
versions of Solaris

4. Using the 'safe' allocator on SmartOS (solaris 11 clone) triggers a SEGV:
map(0xFFFFCD800B482000, 1048576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFCD800B482000
sigaction(SIGSEGV, 0xFFFFFD7FFFDFDE50, 0xFFFFFD7FFFDFDED0) = 0
Incurred fault #6, FLTBOUNDS %pc = 0x0052FE06
siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFCD800B582000
Received signal #11, SIGSEGV [caught]
siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFCD800B582000
lwp_sigmask(SIG_SETMASK, 0x00000400, 0x00000000, 0x00000000,
0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]

Irek

Glenn Fowler

2013-12-06 04:40:40 UTC

Permalink

edit src/lib/libast/vmalloc/vmmaddress.c and change
#define VMCHKMEM 0
this affects vmalloc detecting overbooked memory but will disable the
MAP_FIXED codepath

Post by Glenn Fowler

On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons <lionelcons1972 at gmail.com>

On 1 December 2013 17:26, Glenn Fowler <glenn.s.fowler at gmail.com>

vmalloc has an internal discipline/method for getting memory from the

system

several methods are available with varying degrees of thread safety etc.
see src/lib/libast/vmalloc/vmdcsystem.c for the code
and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
description (vmalloc.3 update shortly)
** getmemory=f enable f[,g] getmemory() functions if supported,

all

by default
** anon: mmap(MAP_ANON)
** break|sbrk: sbrk()
** native: native malloc()
** safe: safe sbrk() emulation via

mmap(MAP_ANON)

** zero: mmap(/dev/zero)
i believe the performance regression with "anon" is that on linux
mmap(0....MAP_ANON|MAP_PRIVATE...),
which lets the system decide the address, returns adjacent (when

possible)

region addresses from highest to lowest order
and the reverse order at minimum tends to fragment more memory
"zero" has the same hi=>lo characteristic
i suspect it adversely affects the vmalloc coalescing algorithm but have

not

dug deeper
for now the probe order in vmalloc/vmdcsystem.c was simply changed to

favor

"safe"

MAP_FIXED should be avoided because its only there for special
purposes like the runtime linker ld.so.1 or debuggers.
1. On some systems this is a privileged operation and only available
for users with root privileges
2. SPARC T4 with 256GB and Solaris 11.1 the use of 'safe' degraded the
performance from 9 seconds to almost 15 minutes because it utterly
destroys the systems concept of large pages. If two MAP_FIXED mappings
follow directly each other the system downgrades the page size to the
smallest possible size, even trying to break up larger pages, which in
turn must be done by a special deamon (vmtasks)
3. MAP_PRIVATE|MAP_FIXED|MAP_ANON may no longer be available in future
versions of Solaris
map(0xFFFFCD800B482000, 1048576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFCD800B482000
sigaction(SIGSEGV, 0xFFFFFD7FFFDFDE50, 0xFFFFFD7FFFDFDED0) = 0
Incurred fault #6, FLTBOUNDS %pc = 0x0052FE06
siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFCD800B582000
Received signal #11, SIGSEGV [caught]
siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFCD800B582000
lwp_sigmask(SIG_SETMASK, 0x00000400, 0x00000000, 0x00000000,
0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]
Irek

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.research.att.com/pipermail/ast-developers/attachments/20131205/3021f07f/attachment-0001.html>