Roland Mainz
2013-12-09 21:53:05 UTC
On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak <iszczesniak at gmail.com>
On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler <glenn.s.fowler at gmail.com>
On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons <lionelcons1972 at gmail.com>
On 1 December 2013 17:26, Glenn Fowler <glenn.s.fowler at gmail.com>

I believe this is related to vmalloc changes between 2013-05-31 and
2013-06-09
re-run the tests with
export VMALLOC_OPTIONS=getmem=safe
if that's the problem then it gives a clue on a general solution
details after confirmation

timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
Version AIJMP 93v- 2013-10-08
real 34.60
user 33.27
sys 1.19
VMALLOC_OPTIONS=getmem=safe timex ~/bin/ksh -c 'function nanosort {
typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
"${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08
real 15.34
user 14.67
sys 0.52
So your hunch that VMALLOC_OPTIONS=getmem=safe fixes the problem is
correct.
What does VMALLOC_OPTIONS=getmem=safe do?

vmalloc has an internal discipline/method for getting memory from the
system
several methods are available with varying degrees of thread safety etc.
see src/lib/libast/vmalloc/vmdcsystem.c for the code
and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
description (vmalloc.3 update shortly)
** getmemory=f    enable f[,g] getmemory() functions if supported,
**                all by default
**   anon:        mmap(MAP_ANON)
**   break|sbrk:  sbrk()
**   native:      native malloc()
**   safe:        safe sbrk() emulation via mmap(MAP_ANON)
**   zero:        mmap(/dev/zero)
i believe the performance regression with "anon" is that on linux
mmap(0....MAP_ANON|MAP_PRIVATE...), which lets the system decide the
address, returns adjacent (when possible) region addresses from highest
to lowest order
and the reverse order at minimum tends to fragment more memory
"zero" has the same hi=>lo characteristic
i suspect it adversely affects the vmalloc coalescing algorithm but have
not dug deeper
for now the probe order in vmalloc/vmdcsystem.c was simply changed to
favor "safe"

MAP_FIXED should be avoided because it's only there for special
purposes like the runtime linker ld.so.1 or debuggers.
1. On some systems this is a privileged operation and only available
for users with root privileges
2. On a SPARC T4 with 256GB and Solaris 11.1 the use of 'safe' degraded
the performance from 9 seconds to almost 15 minutes because it utterly
destroys the system's concept of large pages. If two MAP_FIXED mappings
directly follow each other the system downgrades the page size to the
smallest possible size, even trying to break up larger pages, which in
turn must be done by a special daemon (vmtasks)
3. MAP_PRIVATE|MAP_FIXED|MAP_ANON may no longer be available in future
versions of Solaris
4. Using the 'safe' allocator on SmartOS (solaris 11 clone) triggers a
map(0xFFFFCD800B482000, 1048576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFCD800B482000
sigaction(SIGSEGV, 0xFFFFFD7FFFDFDE50, 0xFFFFFD7FFFDFDED0) = 0
Incurred fault #6, FLTBOUNDS %pc = 0x0052FE06
siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFCD800B582000
Received signal #11, SIGSEGV [caught]
siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFCD800B582000
lwp_sigmask(SIG_SETMASK, 0x00000400, 0x00000000, 0x00000000,
0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]

edit src/lib/libast/vmalloc/vmmaddress.c and change
#define VMCHKMEM 0
this affects vmalloc detecting overbooked memory but will disable the
MAP_FIXED codepath

Erm... Solaris (|__SunOS|) was once (pre-vmalloc-rewrite) "exempt"
from this functionality since it cannot overcommit memory (except if
someone uses |MAP_NORESERVE| or uses kernel debugging options in
/etc/system) ...
... attached (as
"astksh20131010_vmalloc_sunos_fragmentation_fix001.diff.txt") is a
patch which...
1. ... restores this exception for Solaris
2. ... bumps the |mmap()| size to 4MB for 32bit processes and 16MB for
64bit processes since both values are more or less the points where
the fragmentation stops. Note that this does *not* mean it will use so
much memory... it only means that it reserves this amount of memory
and the real allocation happens on the first read, write or execute
access of the matching MMU page. This also means there is no
performance difference between a 1MB |mmap(MAP_ANON)| and a 128MB
|mmap(MAP_ANON)| since it only reserves memory but does not
initialise/allocate it yet... this happens the first time it's
accessed. The other reasons for the 4MB/16MB size were: x86 has 2MB
large pages, allowing a ksh process to benefit from such pages;
additionally most AST (including ksh93) applications consume a few MB
of memory... so there is a good chance that the "typical"
application/shell memory consumption completely fits into that 4MB
chunk. 64bit processes get four times as much memory since it's
expected that they may operate on much larger datasets (and see the
comment about fragmentation above)
Just to demonstrate "reservation" vs. "real usage" via Solaris pmap:
-- snip --
$ ksh -c 'print hello ; pmap -x $$ ; true' | egrep '16384.*anon'
FFFFFD7FFDA00000 16384 148 20 - rw--- [ anon ]
-- snip --
The test shows that of 16384k only 148k have really been touched...
the difference (16384-148) is reserved by the shell process but not
used.
3. Linux has /proc/sys/vm/overcommit_memory which is 0 (heuristic
overcommit), 1 (always overcommit) or 2 (no overcommit) and so
describes whether the kernel permits overcommitment of memory or not.
AFAIK a simple function could be written which returns |-1| (does not
permit overcommitment), |0| (don't know) or |1| (does permit
overcommitment) ... and if the function returns |-1| vmalloc should do
the same as on Solaris
4. The patch removes one unnecessary |memset(p, 0, size)| which was
touching pages and therefore allocating them
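The "reserved vs. really touched" point from items 2 and 4 can also be checked from C. The following is only a sketch and not part of the patch: the helper names |resident_pages()| and |reserve_and_touch()| are made up for illustration, and it assumes the Linux/glibc prototype of |mincore()| (Solaris declares it with |caddr_t| and |char *|).

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON and mincore() on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: count the pages of [base, base+len) which
 * mincore() reports as resident. Returns (size_t)-1 on error. */
static size_t resident_pages(void *base, size_t len, size_t pagesize)
{
    size_t npages = (len + pagesize - 1) / pagesize;
    unsigned char *vec = malloc(npages);
    size_t i, n = 0;

    if (!vec)
        return (size_t)-1;
    if (mincore(base, len, vec) != 0) {
        free(vec);
        return (size_t)-1;
    }
    for (i = 0; i < npages; i++)
        n += vec[i] & 1; /* bit 0 set means the page is resident */
    free(vec);
    return n;
}

/* Hypothetical helper: reserve len bytes via mmap(MAP_ANON), write to
 * the first `touch` pages and report the resident page counts before
 * and after the writes. Returns 0 on success, -1 on failure. */
static int reserve_and_touch(size_t len, size_t touch,
                             size_t *before, size_t *after, size_t *total)
{
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    size_t i;
    char *p;

    *total = len / pagesize;
    assert(touch <= *total);

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = resident_pages(p, len, pagesize);
    for (i = 0; i < touch; i++)
        p[i * pagesize] = 1; /* the first write faults the page in */
    *after = resident_pages(p, len, pagesize);

    munmap(p, len);
    return (*before == (size_t)-1 || *after == (size_t)-1) ? -1 : 0;
}
```

Calling |reserve_and_touch(16*1024*1024, 4, &before, &after, &total)| should normally report far fewer resident pages than |total|. The exact numbers depend on the kernel (transparent large pages may fault in more than the pages written), so treat the counts as an indication only.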
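The function suggested in item 3 could look roughly like the sketch below. It is not part of the attached patch. The names |classify_overcommit()| and |overcommit_policy()| are hypothetical, and the mapping of the Linux modes onto -1/0/1 is my assumption: mode 2 is treated as "does not permit", modes 0 and 1 as "permits", anything unreadable or unknown as "don't know".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: map the text of overcommit_memory to
 * -1 (no overcommit), 0 (don't know) or 1 (overcommit permitted). */
static int classify_overcommit(const char *text)
{
    char *end;
    long mode;

    if (!text)
        return 0;
    mode = strtol(text, &end, 10);
    if (end == text) /* no digits found */
        return 0;
    switch (mode) {
    case 0: /* heuristic overcommit */
    case 1: /* always overcommit */
        return 1;
    case 2: /* strict accounting, no overcommit */
        return -1;
    default:
        return 0;
    }
}

/* Hypothetical helper: read the policy from `path`, normally
 * "/proc/sys/vm/overcommit_memory". Returns 0 if it cannot be read. */
static int overcommit_policy(const char *path)
{
    char buf[32];
    FILE *fp;
    int result = 0;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (!fp)
        return 0;
    if (fgets(buf, sizeof buf, fp))
        result = classify_overcommit(buf);
    fclose(fp);
    return result;
}
```

On systems without that /proc file (Solaris, for example) |overcommit_policy()| simply returns 0, so a caller would still need the compile-time |__SunOS| exception from the patch.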
----
Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
-------------- next part --------------
diff -r -u build_i386_64bit_opt/src/lib/libast/vmalloc/vmhdr.h build_i386_64bit_debug/src/lib/libast/vmalloc/vmhdr.h
--- src/lib/libast/vmalloc/vmhdr.h 2013-08-27 18:44:46.000000000 +0200
+++ src/lib/libast/vmalloc/vmhdr.h 2013-12-09 22:14:12.731227511 +0100
@@ -182,9 +182,9 @@
/* hint to regulate memory requests to discipline functions */
#if _ast_sizeof_size_t > 4 /* the address space is greater than 32-bit */
-#define VM_INCREMENT (1024*1024) /* lots of memory available here */
+#define VM_INCREMENT (16*1024*1024) /* lots of memory available here */
#else
-#define VM_INCREMENT (64*1024) /* perhaps more limited memory */
+#define VM_INCREMENT (4*1024*1024) /* perhaps more limited memory */
#endif
#define VM_PAGESIZE 8192 /* default assumed page size */
diff -r -u build_i386_64bit_opt/src/lib/libast/vmalloc/vmmaddress.c build_i386_64bit_debug/src/lib/libast/vmalloc/vmmaddress.c
--- src/lib/libast/vmalloc/vmmaddress.c 2013-06-09 06:13:49.000000000 +0200
+++ src/lib/libast/vmalloc/vmmaddress.c 2013-12-09 22:19:47.122281075 +0100
@@ -42,8 +42,16 @@
** Written by Kiem-Phong Vo, phongvo at gmail.com, 07/07/2012
*/
-/* see if a given range of address is available for mapping */
+/*
+ * see if a given range of address is available for mapping
+ * This is used for overcommit detection.
+ *
+ * Solaris (__SunOS) is explicitly excluded since it does
+ * not allow overcommitment of memory by default
+ */
+#ifndef __SunOS
#define VMCHKMEM 1 /* set this to zero if signal&sigsetjmp don't work */
+#endif
#if VMCHKMEM
diff -r -u build_i386_64bit_opt/src/lib/libast/vmalloc/vmopen.c build_i386_64bit_debug/src/lib/libast/vmalloc/vmopen.c
--- src/lib/libast/vmalloc/vmopen.c 2013-09-04 07:15:04.000000000 +0200
+++ src/lib/libast/vmalloc/vmopen.c 2013-12-06 09:40:41.344273508 +0100
@@ -130,7 +130,9 @@
write(9, "vmalloc: panic: heap initialization error #4\n", 45);
return NIL(Vmalloc_t*);
}
+#if 0
memset(base, 0, size);
+#endif
/* make sure memory is properly aligned */
if((algn = (ssize_t)(VMLONG(base)%ALIGN)) == 0 )