Discussion:
[ast-developers] vmalloc memory allocations via shared memory?
Roland Mainz
2013-12-09 22:17:55 UTC
On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak <iszczesniak at gmail.com>
[snip]
Erm... Solaris (|__SunOS|) was once (pre-vmalloc-rewrite) "exempt"
from this functionality since it cannot overcommit memory (except if
someone uses |MAP_NORESERVE| or uses kernel debugging options in
/etc/system) ...
... attached (as
"astksh20131010_vmalloc_sunos_fragmentation_fix001.diff.txt") is a
patch which...
1. ... restores this exception for Solaris
2. ... bumps the |mmap()| size to 4MB for 32bit processes and 16MB for
64bit processes, since both values are more or less the points where
the fragmentation stops. Note that this does *not* mean it will use that
much memory... it only means that it reserves this amount of memory,
and the real allocation happens on the first read, write or execute
access of the matching MMU page. This also means there is no real
performance difference between a 1MB |mmap(MAP_ANON)| and a 128MB
|mmap(MAP_ANON)|, since it only reserves memory but does not
initialise/allocate it yet... this happens the first time it's
accessed. The other reasons for the 4MB/16MB size were: x86 has 2MB
large pages, allowing a ksh process to benefit from such pages;
additionally, most AST applications (including ksh93) consume a few MB
of memory... so there is a good chance that the "typical"
application/shell memory consumption completely fits into that 4MB
chunk. 64bit processes get four times as much memory since it's
expected that they may operate on much larger datasets (and see the
comment about fragmentation above).
-- snip --
$ ksh -c 'print hello ; pmap -x $$ ; true' | egrep '16384.*anon'
FFFFFD7FFDA00000 16384 148 20 - rw--- [ anon ]
-- snip --
The test shows that of the 16384k only 148k have really been touched...
the difference (16384k - 148k = 16236k) is reserved by the shell
process but not used.
3. Linux has /proc/sys/vm/overcommit_memory, which is 0 (heuristic
overcommit), 1 (always overcommit) or 2 (strict accounting, i.e. no
overcommit) and so describes whether the kernel permits overcommitment
of memory or not. AFAIK a simple function could be written which
returns |-1| (does not permit overcommitment), |0| (don't know) or |1|
(does permit overcommitment) ... and if the function returns |-1|,
vmalloc should do the same as on Solaris
4. The patch removes one unnecessary |memset(p, 0, size)| which was
touching pages and therefore allocating them (freshly mapped anonymous
memory is already zero-filled)
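The numbered points above can be sketched in C. This is a minimal illustration under stated assumptions, not the code from the attached patch: the helper names chunk_size(), overcommit_policy() and reserve_chunk() are hypothetical, and overcommit_policy() follows the -1/0/1 convention proposed in point 3 by treating Linux modes 0 and 1 as "permits overcommitment" and mode 2 as "does not".

```c
#define _DEFAULT_SOURCE /* expose MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Point 2 (hypothetical helper): 4MB chunks for 32bit processes,
 * 16MB chunks for 64bit processes. */
static size_t chunk_size(void)
{
    return sizeof(void *) >= 8 ? (size_t)16 << 20 : (size_t)4 << 20;
}

/* Point 3 (hypothetical helper): -1 = overcommit not permitted,
 * 0 = don't know, 1 = overcommit permitted. */
static int overcommit_policy(void)
{
    FILE *fp = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;

    if (fp == NULL)
        return 0;               /* not Linux, or /proc unavailable */
    if (fscanf(fp, "%d", &mode) != 1)
        mode = -1;              /* unparsable value: treat as unknown */
    fclose(fp);

    switch (mode) {
    case 0:                     /* heuristic overcommit */
    case 1:                     /* always overcommit */
        return 1;
    case 2:                     /* strict accounting, no overcommit */
        return -1;
    default:
        return 0;
    }
}

/* Points 2 and 4 (hypothetical helper): reserve address space
 * without touching it. */
static void *reserve_chunk(size_t size)
{
    void *p;

    assert(size > 0);           /* mmap() rejects zero-length mappings */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* No memset(p, 0, size): anonymous mappings start out zero-filled,
     * and writing every byte would force every page to be allocated. */
    return p;
}
```

A caller could, for example, fall back to the Solaris behaviour whenever overcommit_policy() returns -1, and keep the large lazy reservation otherwise.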
Note that if I use VMALLOC_OPTIONS=getmem=safe with the patch above,
vmalloc seems to resort to trying shared memory:
-- snip --
shmget(IPC_PRIVATE, 67108864, 0600|IPC_CREAT) = 8
brk(0x00603480) = 0
shmat(8, 0, 0600) = 0xFFFFFD7FFAA00000
shmdt(0xFFFFFD7FFAA00000) = 0
shmat(8, 0xDFFFFFAFFFA83000, 0600) Err#22 EINVAL
shmat(8, 0xEFFFFE97FD241000, 0600) Err#22 EINVAL
shmat(8, 0xF7FFFE0BFBE20000, 0600) Err#22 EINVAL
shmat(8, 0xFBFFFDC5FB410000, 0600) Err#22 EINVAL
shmat(8, 0xFDFFFDA2FAF08000, 0600) Err#22 EINVAL
shmat(8, 0xFEFFFD917AC84000, 0600) Err#22 EINVAL
shmat(8, 0xFF7FFD88BAB42000, 0600) Err#22 EINVAL
shmat(8, 0xFFBFFD845AAA1000, 0600) Err#22 EINVAL
shmat(8, 0xFFDFFD822AA50000, 0600) Err#22 EINVAL
shmat(8, 0xFFEFFD8112A28000, 0600) Err#22 EINVAL
shmat(8, 0xFFF7FD8086A14000, 0600) Err#22 EINVAL
shmat(8, 0xFFFBFD8040A0A000, 0600) Err#22 EINVAL
shmat(8, 0xFFFDFD801DA05000, 0600) Err#22 EINVAL
shmat(8, 0xFFFEFD800C202000, 0600) Err#22 EINVAL
shmat(8, 0xFFFF7D8003601000, 0600) Err#22 EINVAL
shmat(8, 0xFFFFBD7FFF000000, 0600) = 0xFFFFBD7FFF000000
shmdt(0xFFFFBD7FFF000000) = 0
shmat(8, 0xFFFFBD7FFB000000, 0600) = 0xFFFFBD7FFB000000
shmdt(0xFFFFBD7FFB000000) = 0
shmat(8, 0xFFFFBD7FF7000000, 0600) = 0xFFFFBD7FF7000000
shmdt(0xFFFFBD7FF7000000) = 0
-- snip --
... note that such an allocation is... erm... not wise... because
shared memory is usually a system-wide resource... which means that
if many shell processes use shared memory, it won't be available for
other processes (like databases) anymore.

Are there any platforms which really have to resort to using shared memory?

----

Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
Glenn Fowler
2013-12-10 08:13:37 UTC
Post by Roland Mainz
On Fri, Dec 6, 2013 at 5:40 AM, Glenn Fowler <glenn.s.fowler at gmail.com>
On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak <iszczesniak at gmail.com>
[snip]
if you look at that particular code, it's probing process address boundaries
and then releasing the segment after each probe (shmdt())
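The attach-then-detach probing described here can be illustrated with a minimal C sketch, assuming SysV shared memory. shm_probe() is a hypothetical helper, not vmalloc's code, and it does not reproduce the sequence of candidate addresses visible in the trace; it only shows a single probe that leaves no mapping behind.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: returns 1 if shared memory segment `id` can be
 * attached at `addr` (NULL lets the kernel choose), 0 otherwise.
 * A successful attachment is detached again immediately. */
static int shm_probe(int id, const void *addr)
{
    void *p;

    assert(id >= 0);            /* expects an id returned by shmget() */
    p = shmat(id, addr, 0);
    if (p == (void *)-1)
        return 0;               /* e.g. EINVAL: address not usable */
    shmdt(p);                   /* release the probe mapping */
    return 1;
}
```

Because the segment itself outlives the probes, marking it with shmctl(id, IPC_RMID, NULL) once probing is finished lets the system destroy it after the last detach, so it does not keep holding system-wide shared memory.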
Lionel Cons
2013-12-10 09:08:01 UTC
On Mon, Dec 9, 2013 at 5:17 PM, Roland Mainz <roland.mainz at nrubsig.org>
Post by Roland Mainz
On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak <iszczesniak at gmail.com>
[snip]
How useful is this kind of probing? Most operating systems restrict
shared memory to a specific virtual address range which is defined at
boot time. Probing outside that range will always return a failure
because it's not in the 'address window' defined by the system.
Such a probe strikes me as pretty useless because it depends on
behaviour which is not portable across platforms, or even across
different hardware configurations running the same operating system
version.

Lionel
Glenn Fowler
2013-12-10 09:27:05 UTC
i'll defer to kpv (Kiem-Phong Vo, the vmalloc author) on this
On Mon, Dec 9, 2013 at 5:17 PM, Roland Mainz <roland.mainz at nrubsig.org>
On Mon, Dec 9, 2013 at 10:53 PM, Roland Mainz <roland.mainz at nrubsig.org>
On Fri, Dec 6, 2013 at 5:40 AM, Glenn Fowler <glenn.s.fowler at gmail.com>
On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak <iszczesniak at gmail.com>
[snip]