Roland Mainz
2013-12-09 22:17:55 UTC
On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak <iszczesniak at gmail.com>
Erm... Solaris (|__SunOS|) was once (pre-vmalloc-rewrite) "exempt"
from this functionality since it cannot overcommit memory (except if
someone uses |MAP_NORESERVE| or uses kernel debugging options in
/etc/system) ...
... attached (as
"astksh20131010_vmalloc_sunos_fragmentation_fix001.diff.txt") is a
patch which...
1. ... restores this exception for Solaris
2. ... bumps the |mmap()| size to 4MB for 32bit processes and 16MB for
64bit processes since both values are more or less the points where
the fragmentation stops. Note that this does *not* mean it will use
that much memory... it only means that it reserves this amount of
memory and the real allocation happens on the first read, write or
execute access of the matching MMU page. This also means there is no
performance difference between a 1MB |mmap(MAP_ANON)| and a 128MB
|mmap(MAP_ANON)| since it only reserves memory but does not
initialise/allocate it yet... this happens the first time it is
accessed. The other reasons for the 4MB/16MB size were: x86 has 2MB
largepages, allowing a ksh process to benefit from such pages;
additionally most AST (including ksh93) applications consume a few MB
of memory... so there is a good chance that the "typical"
application/shell memory consumption completely fits into that 4MB
chunk. 64bit processes get four times as much memory since it's
expected that they may operate on much larger datasets (and see the
comment about fragmentation above).
-- snip --
$ ksh -c 'print hello ; pmap -x $$ ; true' | egrep '16384.*anon'
FFFFFD7FFDA00000 16384 148 20 - rw--- [ anon ]
-- snip --
The test shows that of 16384k only 148k have really been touched...
the difference (16384-148) is reserved by the shell process but not
used.
3. Linux has /proc/sys/vm/overcommit_memory which is 0 (heuristic
overcommit), 1 (always overcommit) or 2 (strict accounting, no
overcommit) to describe whether the kernel permits overcommitment of
memory or not. AFAIK a simple function could be written which returns
|-1| (does not permit overcommitment), |0| (don't know) or |1| (does
permit overcommitment) ... and if the function returns |-1| vmalloc
should do the same as on Solaris
4. The patch removes one unnecessary |memset(p, 0, size)| which was
touching pages and therefore allocating them
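
To illustrate the "reserve now, allocate on first access" behaviour from item 2, here is a minimal sketch. It is not code from the patch: |reserve_and_touch()| is a hypothetical helper, and the 16MB/37-page numbers in the usage note are only chosen to resemble the pmap output above. It assumes a platform which provides |MAP_ANON| (or |MAP_ANONYMOUS|).

```c
#define _DEFAULT_SOURCE 1   /* ask glibc to expose MAP_ANON/MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/*
 * Hypothetical helper (not part of vmalloc or the patch):
 * reserve |reserve| bytes of anonymous memory, then write to at most
 * |npages| pages of it. Only the pages written to need real memory
 * behind them; the rest of the mapping stays reserved but untouched.
 * Returns the number of pages touched, or -1 on failure.
 */
static long reserve_and_touch(size_t reserve, size_t npages)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    unsigned char *p;
    long touched = 0;
    size_t off;

    if (pagesize <= 0)
        return -1;

    /* The mapping itself does not touch any page */
    p = mmap(NULL, reserve, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* One write per page is enough to make the kernel back that page */
    for (off = 0; (size_t)touched < npages && off < reserve;
         off += (size_t)pagesize) {
        p[off] = 1;
        touched++;
    }

    munmap(p, reserve);
    return touched;
}
```

Calling |reserve_and_touch(16u * 1024 * 1024, 37)| reserves 16MB but writes to only 37 pages, which is roughly the 16384k/148k picture above if pages are 4k.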
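
The function suggested in item 3 could look roughly like the sketch below. The names |vm_overcommit_from_mode()| and |vm_overcommit_state()| are made up for illustration and are not part of vmalloc or the patch. The sketch assumes the values documented for current Linux kernels: 0 (heuristic overcommit), 1 (always overcommit) and 2 (strict accounting), and treats only 2 as "does not permit overcommitment". Anything unreadable or unexpected maps to |0| (don't know).

```c
#include <stdio.h>

/*
 * Hypothetical helper: translate the Linux overcommit mode into
 *   -1  kernel does not permit overcommitment
 *    0  don't know
 *    1  kernel does permit overcommitment
 */
static int vm_overcommit_from_mode(int mode)
{
    switch (mode) {
    case 0:         /* heuristic overcommit */
    case 1:         /* always overcommit */
        return 1;
    case 2:         /* strict accounting, no overcommit */
        return -1;
    default:        /* value we do not understand */
        return 0;
    }
}

/*
 * Hypothetical helper: read the mode from |path| (on Linux this would be
 * "/proc/sys/vm/overcommit_memory"). A missing or unparsable file, e.g.
 * on a platform without this interface, yields 0 (don't know).
 */
static int vm_overcommit_state(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode;

    if (f == NULL)
        return 0;
    if (fscanf(f, "%d", &mode) != 1) {
        fclose(f);
        return 0;
    }
    fclose(f);
    return vm_overcommit_from_mode(mode);
}
```

A caller would use |vm_overcommit_state("/proc/sys/vm/overcommit_memory")| and take the Solaris code path when the result is |-1|.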
Note that if I use VMALLOC_OPTIONS=getmem=safe with the patch above
vmalloc seems to resort to try shared memory:
-- snip --
shmget(IPC_PRIVATE, 67108864, 0600|IPC_CREAT) = 8
brk(0x00603480) = 0
shmat(8, 0, 0600) = 0xFFFFFD7FFAA00000
shmdt(0xFFFFFD7FFAA00000) = 0
shmat(8, 0xDFFFFFAFFFA83000, 0600) Err#22 EINVAL
shmat(8, 0xEFFFFE97FD241000, 0600) Err#22 EINVAL
shmat(8, 0xF7FFFE0BFBE20000, 0600) Err#22 EINVAL
shmat(8, 0xFBFFFDC5FB410000, 0600) Err#22 EINVAL
shmat(8, 0xFDFFFDA2FAF08000, 0600) Err#22 EINVAL
shmat(8, 0xFEFFFD917AC84000, 0600) Err#22 EINVAL
shmat(8, 0xFF7FFD88BAB42000, 0600) Err#22 EINVAL
shmat(8, 0xFFBFFD845AAA1000, 0600) Err#22 EINVAL
shmat(8, 0xFFDFFD822AA50000, 0600) Err#22 EINVAL
shmat(8, 0xFFEFFD8112A28000, 0600) Err#22 EINVAL
shmat(8, 0xFFF7FD8086A14000, 0600) Err#22 EINVAL
shmat(8, 0xFFFBFD8040A0A000, 0600) Err#22 EINVAL
shmat(8, 0xFFFDFD801DA05000, 0600) Err#22 EINVAL
shmat(8, 0xFFFEFD800C202000, 0600) Err#22 EINVAL
shmat(8, 0xFFFF7D8003601000, 0600) Err#22 EINVAL
shmat(8, 0xFFFFBD7FFF000000, 0600) = 0xFFFFBD7FFF000000
shmdt(0xFFFFBD7FFF000000) = 0
shmat(8, 0xFFFFBD7FFB000000, 0600) = 0xFFFFBD7FFB000000
shmdt(0xFFFFBD7FFB000000) = 0
shmat(8, 0xFFFFBD7FF7000000, 0600) = 0xFFFFBD7FF7000000
shmdt(0xFFFFBD7FF7000000) = 0
-- snip --
... note that such an allocation is... erm... not wise... because
shared memory is usually a resource which is limited system-wide...
which means if many shell processes use shared memory it won't be
available for other processes (like databases) anymore.
Are there any platforms which really have to resort to use shared memory ?
----
Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)