Discussion:
[ast-developers] Using |SF_SEQUENTIAL| in AST grep(1) builtin...
Roland Mainz
2013-07-14 16:16:03 UTC
Hi!

----

During benchmarking I noticed an issue with AST grep(1): it uses
|mmap()| but doesn't use |madvise(..., MADV_SEQUENTIAL, ...)| ... I
dug around in the code a little and noticed that while sfio has
|SF_SEQUENTIAL|, there is no way to set it at |sfopen()| time...

... what would be the best place to fix it? Putting it into
src/lib/libcmd/grep.c doesn't help other cases where huge amounts of
regex data are processed, and there are cases where |mmap()| may not
work (e.g. the filesystem doesn't support |mmap()| or the chunk size is
too small) but we could still use |posix_fadvise(...,
POSIX_FADV_SEQUENTIAL)| ... would a new |sfioadvise()| call be a good
idea?
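
To make the two paths concrete, here is a minimal sketch of the pattern described above: map the file and pass |MADV_SEQUENTIAL|, and if |mmap()| is not usable, fall back to |posix_fadvise(..., POSIX_FADV_SEQUENTIAL)| plus plain |read()|. This is not the AST grep(1) or sfio code; the helper name |count_lines_sequential()| and the 64 KiB buffer size are assumptions made for illustration, and it assumes a platform that provides both calls (e.g. Linux with glibc).

```c
#define _DEFAULT_SOURCE /* glibc: expose madvise() and MADV_SEQUENTIAL */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in a file, reading front to back.
 * Returns the count, or -1 on error. */
long count_lines_sequential(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    long lines = 0;
    void *map = MAP_FAILED;
    if (S_ISREG(st.st_mode) && st.st_size > 0)
        map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    if (map != MAP_FAILED) {
        /* Advice only: a failure here is not an error for the caller. */
        (void)madvise(map, (size_t)st.st_size, MADV_SEQUENTIAL);
        const char *p = map;
        for (off_t i = 0; i < st.st_size; i++)
            if (p[i] == '\n')
                lines++;
        munmap(map, (size_t)st.st_size);
    } else {
        /* mmap() not usable: hint on the descriptor, then read().
         * offset 0, len 0 means "the whole file". */
        (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
        char buf[65536];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            for (ssize_t i = 0; i < n; i++)
                if (buf[i] == '\n')
                    lines++;
        if (n < 0)
            lines = -1;
    }
    close(fd);
    return lines;
}
```

Both hints are advisory, so the sketch ignores their return values; the result is the same whether or not the kernel honours them.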

** Notes:
- The performance improvement measured via "time"/"timex" may be
minor on idle systems because |madvise(..., MADV_SEQUENTIAL, ...)|
(and to a lesser degree |posix_fadvise(..., POSIX_FADV_SEQUENTIAL)|)
affects the time until an I/O page gets re-used for something else.
The trouble is that there are multiple ways to get pages re-used...
and in some cases (like Solaris >= 11.1) it may even be a separate
"CPU strand" (on the same CPU, sharing the same MMU) which does the
re-using (Solaris >= 11.1 offloaded some VM tasks to different strands
to make applications faster by parallelising the VM work).
In short: the performance improvement is for the system as a whole
(e.g. being able to process more data) but may have little effect on
an individual process run (except when the VM system is already under
pressure... then the performance benefit can be huge).

----

Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
Roland Mainz
2013-07-27 03:33:03 UTC
Erm... ping!

... can we please discuss |sfioadvise()| before ast-ksh enters the
beta phase ? IMO sfio should stop guessing about the access patterns
and accept hints via |sfioadvise()| ...
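
For discussion, one possible shape for such a hint call is sketched below. |sfioadvise()| does not exist in sfio; the names |io_hint| and |io_advise()| are hypothetical stand-ins, and the sketch works on a raw file descriptor rather than an |Sfio_t| stream. It only shows the idea of the caller stating the access pattern and the library forwarding it to |posix_fadvise()|.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical hint values; not part of sfio. */
enum io_hint {
    IO_HINT_NORMAL,
    IO_HINT_SEQUENTIAL,
    IO_HINT_RANDOM,
    IO_HINT_COUNT
};

/* Hypothetical stand-in for the proposed call, on a raw descriptor.
 * Returns 0 on success or an errno value, like posix_fadvise() itself. */
int io_advise(int fd, enum io_hint hint)
{
    static const int advice_for_hint[] = {
        [IO_HINT_NORMAL]     = POSIX_FADV_NORMAL,
        [IO_HINT_SEQUENTIAL] = POSIX_FADV_SEQUENTIAL,
        [IO_HINT_RANDOM]     = POSIX_FADV_RANDOM,
    };
    /* Keep the table in sync with the enum. */
    assert(sizeof advice_for_hint / sizeof advice_for_hint[0] == IO_HINT_COUNT);

    if ((int)hint < 0 || hint >= IO_HINT_COUNT)
        return EINVAL;

    /* offset 0, len 0: the advice covers the whole file. */
    return posix_fadvise(fd, 0, 0, advice_for_hint[hint]);
}
```

A real sfio version would presumably also have to cover the |mmap()| path (via |madvise()|) and streams without a usable descriptor; that part is left open here.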

----

Bye,
Roland
Roland Mainz
2013-09-01 03:33:42 UTC
Can we please get this issue fixed _before_ ksh93v- goes into the
"beta" phase? At least both grep(1) and join(1) could benefit from a
way to explicitly pick |mmap()| I/O support (join(1) does... but it
doesn't look "pretty") ...

----

Bye,
Roland