Bob Krzaczek
2014-04-18 23:38:08 UTC
Hi,
I apologize in advance for the length of the message.
I recently ran into a problem where pax deltas appeared corrupted.
This was a bit of a chore to diagnose at first, because it seemed like
it was only happening on larger archives over 100 GB, where file data
tended to range between 100 MB and 1 GB. Smaller archives do not
manifest the problem. I'm uncertain if the bug is actually in the
writing or the reading of the delta archive, but it seems like it's in
the writing (or perhaps both). This bug also appears in pax deltas
created with the FreeBSD 6 binaries formerly available at the AT&T
release site, and not just with binaries recently built from source.
In addition to simply being unable to process the entire delta
archive, another symptom is that lines like "delete 0" are printed
over and over when pax is run with -v (one per file deleted, it seems). I
think this is pointing to a problem in the creation of the delta
archive, and not necessarily in its reading, because I've found that I
get the same results from older tar. Likewise, when a delta archive is
successful, I can process it with tar as well (of course, I can't *do*
anything with the contents because they're vdelta, but still, the
archive itself appears intact). When the "delete 0" issue hits, both
pax and tar fail to unpack the file. Once the glitch hits, the rest of
the archive cannot be unpacked with or without the --base option. I
also believe, but can't confirm, that I've seen "create 0" lines as
well from pax when the bug hits.
: rtfm; pax -rv --base=../foo.base <../foo.delta
...
delete 0
delete 0
delete 0
delete 0
...
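For anyone trying to confirm the same symptom on their own archives, here
is a minimal sketch in Python that counts the suspicious lines in captured
verbose output. It assumes the output has already been saved as text; the
function name count_symptoms is made up for illustration, and the
"create 0" pattern is included only because I believe, but can't confirm,
that I saw it.

```python
import re

# Symptom lines as quoted in this report. "create 0" is unconfirmed.
SYMPTOM = re.compile(r"^(delete|create) 0$")


def count_symptoms(verbose_output: str) -> dict:
    """Count 'delete 0' and 'create 0' lines in captured pax -v output."""
    counts = {"delete": 0, "create": 0}
    for line in verbose_output.splitlines():
        match = SYMPTOM.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = "...\ndelete 0\ndelete 0\nsome/file\ncreate 0\n"
    print(count_symptoms(sample))
```

A nonzero "delete" count on a delta that later fails to unpack would match
what I'm describing here.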
The first thing I did was disable compression, but that made no
difference. Straight pax-format files manifested the problem.
Creating large archives, well over 100 GB in size, of nothing but
random data didn't manifest the problem at first, either. Random
binary data, in files randomly sized up to 1 GB, spread in trees with
ten files to a directory could not reproduce the problem. This still
has me worried, because it implies the content of the data triggers
the bug.
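The random-data test trees described above can be approximated with a
short Python sketch like the following. This is not the script I actually
used; the function name make_random_tree, the directory and file naming,
and the parameters are all assumptions for illustration. Keep in mind that
purely random data did not trigger the bug for me, so this is only
scaffolding for building a test set.

```python
import os
import random
import tempfile
from pathlib import Path


def make_random_tree(root, n_files, max_size, files_per_dir=10, seed=None):
    """Create n_files files of random bytes, files_per_dir to a directory.

    Each file gets a random size between 1 and max_size bytes.
    Returns the list of created paths.
    """
    rng = random.Random(seed)  # seeds the sizes only, not the contents
    root = Path(root)
    paths = []
    for i in range(n_files):
        directory = root / f"dir{i // files_per_dir:04d}"
        directory.mkdir(parents=True, exist_ok=True)
        path = directory / f"file{i:06d}.bin"
        remaining = rng.randint(1, max_size)
        with open(path, "wb") as out:
            # Write in 1 MiB chunks so large files don't sit in memory.
            while remaining:
                chunk = min(remaining, 1 << 20)
                out.write(os.urandom(chunk))
                remaining -= chunk
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Tiny demonstration; raise max_size toward 1 GB for a real test.
    with tempfile.TemporaryDirectory() as tmp:
        created = make_random_tree(tmp, n_files=25, max_size=4096, seed=1)
        print(len(created), "files created")
```

With max_size near 1 GB and a few hundred files, this gets into the size
range where I first noticed the problem.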
I've finally been able to recreate it with about 10-20 GB of data.
What's very interesting is that the bug only hits when the changes
since the base archive include both deletions as well as new files.
Simply adding new files didn't trigger the bug at first.
And, the final oddity: exactly one set of data produced the following
error. The share3.base.000 file was just created minutes earlier, so
it should match. I only saw this condition hit once; no other data
triggered it as I was trying to recreate this bug.
: rtfm; pax -rv --base=../share3.base.000 <../share3.delta.001
/dev/stdin base ../share3.base.000 in pax format
/dev/stdin in delta pax format
...[2 directories and 12 new files created]...
pax: 0: base archive mismatch [/usr/local/ast.working/src/cmd/pax/copy.c#278]
What confuses me is that pax should have read the base archive during
its verification step, before it even touches the delta archive. It
got past that, and didn't report the mismatch message until after it
had processed all of the new files in the *delta* archive, and had
just seen its first deleted file in the delta.
So, the criteria _seem_ to be:
- Large archives (>10 GB)
- Specific data? (100+ GB of /dev/urandom won't trigger it)
- Both creations and deletions in the delta archive
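The third criterion can be set up mechanically. Below is a minimal Python
sketch, under the assumption that a base archive of the tree has already
been taken, which deletes some existing files and adds some new ones so
that the next delta contains both kinds of change. The function name
mutate_tree and the "added" directory are hypothetical, and the pax
invocations themselves are deliberately left out.

```python
import os
import random
import tempfile
from pathlib import Path


def mutate_tree(root, n_delete, n_add, max_size, seed=None):
    """Delete n_delete existing files and add n_add new random files.

    New files go into a hypothetical 'added' subdirectory of root.
    Returns (deleted_paths, added_paths).
    """
    rng = random.Random(seed)
    root = Path(root)
    existing = sorted(p for p in root.rglob("*") if p.is_file())
    doomed = rng.sample(existing, min(n_delete, len(existing)))
    for path in doomed:
        path.unlink()
    new_dir = root / "added"
    new_dir.mkdir(exist_ok=True)
    added = []
    for i in range(n_add):
        path = new_dir / f"new{i:06d}.bin"
        remaining = rng.randint(1, max_size)
        with open(path, "wb") as out:
            # Write in 1 MiB chunks so large files don't sit in memory.
            while remaining:
                chunk = min(remaining, 1 << 20)
                out.write(os.urandom(chunk))
                remaining -= chunk
        added.append(path)
    return doomed, added


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(5):
            (Path(tmp) / f"f{i}.bin").write_bytes(b"x")
        gone, new = mutate_tree(tmp, n_delete=2, n_add=3, max_size=64, seed=0)
        print(len(gone), "deleted,", len(new), "added")
```

Running this between the base archive and the delta gives a change set
with both creations and deletions; whether it triggers the bug still
seems to depend on the data.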
Next week, I'll try to whittle this down further, from the 10-20 GB
range to something more manageable that still generates the problem.
But in the meantime, I wanted to report this before too much more time
passes.
Certainly, it could be that this issue is a symptom of the platform in
some way (though not this specific build) and that there's more work
to be done in the FreeBSD port of libast.
Also, just to be clear: I have never had a problem in traditional pax
archives; I only see this in ones where deltas are being read or
written.
Best regards,
Bob
--
Bob Krzaczek, Chester F. Carlson Center for Imaging Science, RIT
phone +1-585-4757196, email krz at cis.rit.edu, icbm 43.08586N 77.67744W