[ast-developers] Sparse file support in sfio ...
Roland Mainz
2013-10-08 23:50:46 UTC
On Thu, Oct 3, 2013 at 11:17 AM, Roland Mainz <roland.mainz at nrubsig.org> wrote:
[CC:'ing ast-developers again to get feedback from the people there...]
On Tue, Oct 1, 2013 at 5:52 PM, Roland Mainz <roland.mainz at nrubsig.org>
[snip]
I really think that hole processing should be in sfio.
Agreed.
Maybe it needs an
sfopen(), and sfset() option to enable so that it doesn't affect other
commands, but the implementation should be in one spot and sfio is the
logical choice for the spot.
Erm... I'll add a new function to handle hole/data copying called
|sfcopydata()| for now. Putting it into |sfmove()| turned out to be
too risky and IMHO we need a different API.
ping! ... any feedback on this part? I went through all the
|sfmove()|&&co. consumers... and based on that survey I can say that
no consumer other than cp(1) will benefit from using
|SEEK_HOLE|/|SEEK_DATA| to copy data for now... therefore I'll just add
|sfcopydata()| as a function which enumerates the file's hole/data
layout and then uses |sfmove()| (with |SF_WHOLE| set) to copy the data
and |sfseek()| to skip over the holes...

... is that OK so far ?
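
For readers who have not used |SEEK_HOLE|/|SEEK_DATA|, the enumerate-and-skip loop described above can be sketched with plain POSIX calls. This is only an illustration under assumptions, not the proposed |sfcopydata()|: the names copy_extent(), copy_sparse() and copy_sparse_path() are hypothetical, |pread()|/|pwrite()| stand in for |sfmove()|/|sfseek()|, and where the headers do not define SEEK_DATA the sketch falls back to treating the whole file as data.

```c
#define _GNU_SOURCE          /* exposes SEEK_DATA/SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy len bytes starting at offset off from in_fd to the same offset in out_fd. */
static int copy_extent(int in_fd, int out_fd, off_t off, off_t len)
{
    char buf[65536];
    while (len > 0) {
        size_t want = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t n = pread(in_fd, buf, want, off);
        if (n < 0) { if (errno == EINTR) continue; return -1; }
        if (n == 0) break;                    /* source shrank while copying */
        ssize_t done = 0;
        while (done < n) {
            ssize_t w = pwrite(out_fd, buf + done, (size_t)(n - done), off + done);
            if (w < 0) { if (errno == EINTR) continue; return -1; }
            done += w;
        }
        off += n;
        len -= n;
    }
    return 0;
}

/* Hypothetical helper: copy only the data extents, never write the holes. */
int copy_sparse(int in_fd, int out_fd)
{
    struct stat st;
    assert(in_fd >= 0 && out_fd >= 0);
    if (fstat(in_fd, &st) < 0) return -1;
    off_t end = st.st_size, pos = 0;
#ifdef SEEK_DATA
    while (pos < end) {
        off_t data = lseek(in_fd, pos, SEEK_DATA);   /* next data extent */
        if (data < 0) {
            if (errno == ENXIO) break;               /* only a trailing hole is left */
            return -1;
        }
        off_t hole = lseek(in_fd, data, SEEK_HOLE);  /* end of that extent */
        if (hole < 0) return -1;
        if (hole > end) hole = end;                  /* source grew: stay within size */
        if (copy_extent(in_fd, out_fd, data, hole - data) < 0) return -1;
        pos = hole;
    }
#else
    /* No SEEK_DATA in the headers: treat the whole file as one data extent. */
    if (copy_extent(in_fd, out_fd, 0, end) < 0) return -1;
#endif
    /* Set the final size so that a trailing hole is preserved as well. */
    return ftruncate(out_fd, end);
}

/* Hypothetical convenience wrapper working on path names. */
int copy_sparse_path(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) { close(in); return -1; }
    int rc = copy_sparse(in, out);
    if (close(out) < 0) rc = -1;
    close(in);
    return rc;
}
```

Note that the sketch only skips ranges which the filesystem itself reports as holes; it never scans the data for runs of zero bytes.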
The other thing I'm trying to figure out is whether we can somehow move
holes via pipes using |putmsg()| (the difference between |write()| and
|putmsg()| is that |putmsg()| can preserve the boundaries of a data
block), maybe with a special flag which defines whether a block is
data or a hole.
Erm... any comments on that idea ?
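
To make the "flag plus block boundaries" idea concrete, here is a rough sketch of how hole and data records could be framed over an ordinary pipe. It deliberately does NOT use |putmsg()| (which needs STREAMS); explicit length-prefixed records over |write()| are swapped in instead. The record layout (struct rec_hdr, REC_DATA, REC_HOLE) and the helpers send_record()/recv_record() are hypothetical and assume both ends of the pipe run on the same machine.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

enum { REC_DATA = 0, REC_HOLE = 1 };   /* hypothetical record types */

struct rec_hdr {                        /* hypothetical header, same-host layout */
    uint8_t  type;                      /* REC_DATA or REC_HOLE */
    uint64_t len;                       /* payload bytes, or size of the hole */
};

static int write_all(int fd, const void *p, size_t n)
{
    const char *c = p;
    while (n > 0) {
        ssize_t w = write(fd, c, n);
        if (w < 0) { if (errno == EINTR) continue; return -1; }
        c += w;
        n -= (size_t)w;
    }
    return 0;
}

static int read_all(int fd, void *p, size_t n)
{
    char *c = p;
    while (n > 0) {
        ssize_t r = read(fd, c, n);
        if (r < 0) { if (errno == EINTR) continue; return -1; }
        if (r == 0) return -1;          /* unexpected end of stream */
        c += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Send one record; a hole record carries its length but no payload bytes. */
int send_record(int fd, uint8_t type, const void *payload, uint64_t len)
{
    struct rec_hdr h;
    assert(type == REC_HOLE || payload != NULL || len == 0);
    memset(&h, 0, sizeof h);            /* also clears the struct padding */
    h.type = type;
    h.len = len;
    if (write_all(fd, &h, sizeof h) < 0) return -1;
    if (type == REC_DATA) return write_all(fd, payload, (size_t)len);
    return 0;
}

/* Receive one record; the payload is read only for data records. */
int recv_record(int fd, struct rec_hdr *h, void *payload, size_t cap)
{
    if (read_all(fd, h, sizeof *h) < 0) return -1;
    if (h->type == REC_DATA) {
        if (h->len > cap) return -1;    /* caller's buffer is too small */
        return read_all(fd, payload, (size_t)h->len);
    }
    return 0;
}
```

A receiver that understands this framing could turn each hole record into a seek on its output instead of writing zero bytes; a plain consumer reading the pipe would of course not understand the headers, so both ends would have to agree on the framing first.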

----

Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
Irek Szczesniak
2013-10-09 10:21:34 UTC
Yes, I have a few comments:
1. If the platform has SEEK_HOLE, cp(1) and cohorts must NEVER "invent"
holes out of zero bytes. That causes database corruption for Oracle's
DB and a lot of the bioinformatics products from NIH. They all provide
their own custom scripts (usually C or perl) to copy data or convert it
to a more portable (hole-less) format.
2. cp(1) and cohorts should include an option in --help to detect
whether they support SEEK_HOLE.
3. lsholes name. I saw the proposal with lssparsemap and I think this
is the best naming proposal so far.
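
As a rough illustration of point 2, support could be probed at both compile time and run time along these lines. The helper name have_seek_hole() is hypothetical, and a positive result only means that the lseek() call is accepted; some filesystems accept it and simply report the whole file as data.

```c
#define _GNU_SOURCE          /* exposes SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe: 1 if lseek(SEEK_HOLE) is accepted on fd, else 0. */
int have_seek_hole(int fd)
{
    assert(fd >= 0);
#ifdef SEEK_HOLE
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved < 0) return 0;                /* not seekable, e.g. a pipe */
    off_t r = lseek(fd, 0, SEEK_HOLE);
    /* ENXIO means "offset at or past EOF", so the request was understood. */
    int ok = (r >= 0) || (errno == ENXIO);
    (void)lseek(fd, saved, SEEK_SET);       /* restore the file position */
    return ok;
#else
    (void)fd;                               /* headers lack SEEK_HOLE */
    return 0;
#endif
}
```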

Irek
