Roland Mainz
2013-12-05 23:41:59 UTC
On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons <lionelcons1972 at gmail.com> wrote:
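timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
"%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort <xxx >yyy'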
Version AIJMP 93v- 2013-10-08
real 34.60
user 33.27
sys 1.19
VMALLOC_OPTIONS=getmem=safe timex ~/bin/ksh -c 'function nanosort {
typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
"${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08
real 15.34
user 14.67
sys 0.52
So your hunch that VMALLOC_OPTIONS=getmem=safe fixes the problem is
correct.
What does VMALLOC_OPTIONS=getmem=safe do?
vmalloc has an internal discipline/method for getting memory from the system
several methods are available with varying degrees of thread safety etc.
see src/lib/libast/vmalloc/vmdcsystem.c for the code
and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
description (vmalloc.3 update shortly)
** getmemory=f enable f[,g] getmemory() functions if supported, all
by default
** anon: mmap(MAP_ANON)
** break|sbrk: sbrk()
** native: native malloc()
** safe: safe sbrk() emulation via mmap(MAP_ANON)
** zero: mmap(/dev/zero)
i believe the performance regression with "anon" is that on linux
mmap(0....MAP_ANON|MAP_PRIVATE...),
which lets the system decide the address, returns adjacent (when possible)
region addresses from highest to lowest order
and the reverse order at minimum tends to fragment more memory
"zero" has the same hi=>lo characteristic
i suspect it adversely affects the vmalloc coalescing algorithm but have not
dug deeper
for now the probe order in vmalloc/vmdcsystem.c was simply changed to favor
"safe"
I believe this is related to vmalloc changes between 2013-05-31 and
2013-06-09
re-run the tests with
export VMALLOC_OPTIONS=getmem=safe
if that's the problem then it gives a clue on a general solution
details after confirmation
Erm... since Irek prodded me by phone I looked at the issue...
... some observations first (on Solaris 11/Illumos):
1. /dev/zero allocator vs. |sbrk()| allocator on Solaris:
-- snip --
$ VMALLOC_OPTIONS=getmem=zero timex ~/bin/ksh -c 'function nanosort {
typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
"${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08
real 32.98
user 32.55
sys 0.32
$ VMALLOC_OPTIONS=getmem=break timex ~/bin/ksh -c 'function nanosort {
typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
"${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08
real 1:08.41
user 1:07.87
sys 0.38
-- snip --
... which means the |sbrk()| allocator is about twice as slow as the
/dev/zero allocator.
2. The default block size used by the normal |mmap(MAP_ANON)| allocator is
1MB. This is IMHO far too small because there is IMO not enough space
for the coalescing algorithm to operate and a *lot* of fragmentation
occurs.
IMHO a _minimum_ page size of 4MB should be picked (as a side-effect
the shell would get 4MB or 2MB largepages on platforms like Solaris
automagically).
3. After each |mmap(MAP_ANON)| allocation the libast allocator
"manually" clears the obtained memory chunk with zero bytes. This is
IMO a *major* waste of CPU time (>= ~30%-38% of a
|_ast_malloc(1024*1024)|) because each memory page is instantiated by
writing zeros to it. If the clearing could be avoided (it is
unnecessary anyway) we'd easily win ~30%-38% and would *not* instantiate
pages which we do not use yet.
Just to make it clear: Allocating a 1MB chunk of memory via
|mmap(MAP_ANON)| and a 128MB chunk of memory via |mmap(MAP_ANON)| has
*no* (visible) difference in performance until we touch the pages via
either read/execute or write accesses.
Currently the libast allocator code writes zeros into the whole chunk
of memory obtained via |mmap(MAP_ANON)| which pretty much ruins
performance because *all* pages are created physically instead of just
being some memory marked as "reserved". If libast stopped writing
into memory chunks directly after the |mmap(MAP_ANON)| we could easily
bump the allocation size up to 32MB or better without any performance
penalty...
----
Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)