Discussion:
[ast-developers] AST md5sum(1)&&co. vs. GNU coreutils md5sum(1)&&co. fixes...
Roland Mainz
2013-09-24 22:39:18 UTC
Hi!

----

Attached (as "astksh20130913_md5sum_compat1.diff.txt") is a patch
which fixes an incompatibility between AST md5sum(1)&&co. and GNU
coreutils md5sum(1)&&co.

There are three major differences which caused hiccups for 3rd-party scripts:
- GNU coreutils md5sum/sha1sum/sha224sum/sha256sum default to text mode
- GNU coreutils uses " *" before the file name to indicate binary
mode and "  " (two blanks) to indicate text mode... the AST hash utilities used
only a single blank " " instead.
- "-t" means "text mode" for GNU coreutils while AST used it for "total"
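The output format the patch adopts for md5sum-like names can be sketched in plain C. This is a minimal illustration, not AST or GNU code: `format_sum_line()` is a hypothetical helper that uses stdio `snprintf()` in place of sfio. It writes the digest, one blank, the read-mode indicator (' ' for text, '*' for binary), and the file name.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: format one GNU-style checksum line
 *   <hex><blank><indicator><name>\n
 * where <indicator> is ' ' for text mode and '*' for binary mode.
 * Returns what snprintf() returns: the length of the full line.
 */
int
format_sum_line(char* buf, size_t size, const char* hex, bool text,
	const char* name)
{
	return snprintf(buf, size, "%s %c%s\n", hex, text ? ' ' : '*', name);
}
```

For example, the empty-file MD5 `d41d8cd98f00b204e9800998ecf8427e` with `text == false` comes out as `d41d8cd98f00b204e9800998ecf8427e *empty`, which matches the `" %c%s"` format in the patch's `sfprintf()` call.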

* Notes:
- GNU and AST *sum(1) utilities now have identical output and seem to
be 100% compatible with each other
- On platforms which do not implement |O_BINARY| and |O_TEXT| the
change only affects the separator ("  "/" *" (new) vs. " " (old)).
Portable applications can use [[:space:]]+ in egrep(1) to match
the hashes against both the old and new versions of AST
*sum(1)
- The output *intentionally* changes only for utilities matching the
shell pattern "*@(md5|sha@(1|224|256|384|512))sum". This is done to
maintain compatibility for cksum(1) and sum(1).
- AST does not have a sha224sum(1) utility (yet) ... need to talk to
Glenn about this
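A reader that accepts both the old and new forms, in the spirit of the [[:space:]]+ egrep(1) suggestion, might look like the sketch below. It assumes POSIX `<regex.h>` and a line whose trailing newline has already been stripped; `split_sum_line()` is a hypothetical name. The optional `\*` swallows the GNU binary indicator, and requiring the name to start with something other than a blank or '*' keeps the split unambiguous, at the cost of rejecting such names.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "<hex> <name>" (old AST), "<hex>  <name>" (GNU text) or
 * "<hex> *<name>" (GNU binary) into digest and name.
 * Returns 0 on success, -1 if the line does not match.
 */
int
split_sum_line(const char* line, char* hex, size_t hexsize, const char** name)
{
	regex_t re;
	regmatch_t m[3];
	size_t len;
	int rc;

	if (regcomp(&re, "^([0-9a-fA-F]+)[[:space:]]+\\*?([^*[:space:]].*)$",
		REG_EXTENDED) != 0)
		return -1;
	rc = regexec(&re, line, 3, m, 0);
	regfree(&re);
	if (rc != 0)
		return -1;
	len = (size_t)(m[1].rm_eo - m[1].rm_so);
	if (len + 1 > hexsize)
		return -1;
	memcpy(hex, line + m[1].rm_so, len);	/* copy the digest */
	hex[len] = '\0';
	*name = line + m[2].rm_so;		/* name runs to end of line */
	return 0;
}
```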

----

Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
-------------- next part --------------
diff -r -u original/src/lib/libcmd/cksum.c build_md5sumfix/src/lib/libcmd/cksum.c
--- src/lib/libcmd/cksum.c 2012-04-20 08:06:55.000000000 +0200
+++ src/lib/libcmd/cksum.c 2013-09-24 23:20:16.311386131 +0200
@@ -26,8 +26,10 @@
* sum -- list file checksum and size
*/

+#define MD5SUMLIKEPATTERN "*@(md5|sha@(1|224|256|384|512))sum"
+
static const char usage[] =
-"[-?\n@(#)$Id: sum (AT&T Research) 2012-04-20 $\n]"
+"[-?\n@(#)$Id: sum (AT&T Research) 2013-09-23 $\n]"
USAGE_LICENSE
"[+NAME?cksum,md5sum,sum - print file checksum and block count]"
"[+DESCRIPTION?\bsum\b lists the checksum, and for most methods the block"
@@ -49,7 +51,9 @@

"[a:all?List the checksum for all files. Use with \b--total\b to list both"
" individual and total checksums and block counts.]"
-"[b:binary?Read files in binary mode. This is the default.]"
+"[b:binary?Read files in binary mode. This is the default for all utilities "
+ "whose name does not match " MD5SUMLIKEPATTERN ". See "
+ "option \b--text\b.]"
"[B:scale?Block count scale (bytes per block) override for methods that"
" include size in the output. The default is method specific.]#[scale]"
"[c:check?Each \afile\a is interpreted as the output from a previous \bsum\b."
@@ -80,12 +84,14 @@
"[S:silent|status?No output for \b--check\b; 0 exit status means all sums"
" matched, non-0 means at least one sum failed to match. Ignored for"
" \b--permissions\b.]"
-"[t:total?List only the total checksum and block count of all files."
+"[T:total?List only the total checksum and block count of all files."
" \b--all\b \b--total\b lists each checksum and the total. The"
" total checksum and block count may be different from the checksum"
" and block count of the catenation of all files due to partial"
" blocks that may occur when the files are treated separately.]"
-"[T:text?Read files in text mode (i.e., treat \b\\r\\n\b as \b\\n\b).]"
+"[t:text?Read files in text mode (i.e., treat \b\\r\\n\b as \b\\n\b). "
+ "This is the default for all utilities whose name matches "
+ MD5SUMLIKEPATTERN ".]"
"[w!:warn?Warn about invalid \b--check\b lines.]"
"[x:method|algorithm?Specifies the checksum \amethod\a to"
" apply. Parenthesized method options are readonly implementation"
@@ -118,7 +124,7 @@

typedef struct State_s /* program state */
{
- int all; /* list all items */
+ bool all; /* list all items */
Sfio_t* check; /* check previous output */
int flags; /* sumprint() SUM_* flags */
gid_t gid; /* caller gid */
@@ -133,10 +139,11 @@
int silent; /* silent check, 0 exit if ok */
int (*sort)(FTSENT* const*, FTSENT* const*);
Sum_t* sum; /* sum method */
- int text; /* \r\n == \n */
- int total; /* list totals only */
+ bool text; /* \r\n == \n */
+ bool total; /* list totals only */
uid_t uid; /* caller uid */
int warn; /* invalid check line warnings */
+ bool md5sumlike; /* md5sum-like output */
} State_t;

static void verify(State_t*, char*, char*, Sfio_t*);
@@ -244,9 +251,18 @@
(st->st_gid != state->gid && ((st->st_mode & S_ISGID) || (st->st_mode & S_IRGRP) && !(st->st_mode & S_IROTH) || (st->st_mode & S_IXGRP) && !(st->st_mode & S_IXOTH))) ? fmtgid(st->st_gid) : "-");
}
if (ip != sfstdin)
- sfprintf(op, " %s", file);
+ {
+ if (state->md5sumlike)
+ sfprintf(op, " %c%s", (state->text?' ':'*'), file);
+ else
+ sfprintf(op, " %s", file);
+ }
+else
+ sfprintf(op, "#{stdin}");
sfputc(op, '\n');
}
+else
+ sfprintf(op, "#{noperm}\n");
}
}

@@ -447,7 +463,7 @@
Sfio_t* sp;
FTS* fts;
FTSENT* ent;
- int logical;
+ bool logical;
Optdisc_t optdisc;
State_t state;

@@ -456,7 +472,9 @@
flags = fts_flags() | FTS_META | FTS_TOP | FTS_NOPOSTORDER;
state.flags = SUM_SIZE;
state.warn = 1;
- logical = 1;
+ state.md5sumlike = strmatch(argv[0], MD5SUMLIKEPATTERN)?true:false;
+ state.text = (state.md5sumlike)?true:false;
+ logical = true;
method = 0;
optinit(&optdisc, optinfo);
for (;;)
@@ -464,10 +482,10 @@
switch (optget(argv, usage))
{
case 'a':
- state.all = 1;
+ state.all = true;
continue;
case 'b':
- state.text = 0;
+ state.text = false;
continue;
case 'B':
state.scale = opt_info.num;
@@ -494,7 +512,7 @@
flags &= ~FTS_TOP;
state.recursive = 1;
state.sort = order;
- logical = 0;
+ logical = false;
continue;
case 's':
method = "sys5";
@@ -503,7 +521,7 @@
state.silent = opt_info.num;
continue;
case 't':
- state.total = 1;
+ state.text = true;
continue;
case 'w':
state.warn = opt_info.num;
@@ -513,19 +531,19 @@
continue;
case 'H':
flags |= FTS_META|FTS_PHYSICAL;
- logical = 0;
+ logical = false;
continue;
case 'L':
flags &= ~(FTS_META|FTS_PHYSICAL);
- logical = 0;
+ logical = false;
continue;
case 'P':
flags &= ~FTS_META;
flags |= FTS_PHYSICAL;
- logical = 0;
+ logical = false;
continue;
case 'T':
- state.text = 1;
+ state.total = true;
continue;
case '?':
error(ERROR_USAGE|4, "%s", opt_info.arg);
diff -r -u original/src/lib/libcmd/md5sum.c build_md5sumfix/src/lib/libcmd/md5sum.c
--- src/lib/libcmd/md5sum.c 2012-01-10 19:54:41.000000000 +0100
+++ src/lib/libcmd/md5sum.c 2013-09-24 23:45:24.470027070 +0200
@@ -33,3 +33,27 @@
{
return b_cksum(argc, argv, context);
}
+
+int
+b_sha1sum(int argc, register char** argv, Shbltin_t* context)
+{
+ return b_cksum(argc, argv, context);
+}
+
+int
+b_sha256sum(int argc, register char** argv, Shbltin_t* context)
+{
+ return b_cksum(argc, argv, context);
+}
+
+int
+b_sha384sum(int argc, register char** argv, Shbltin_t* context)
+{
+ return b_cksum(argc, argv, context);
+}
+
+int
+b_sha512sum(int argc, register char** argv, Shbltin_t* context)
+{
+ return b_cksum(argc, argv, context);
+}
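For readers without AST's `strmatch()`, the patch's `strmatch(argv[0], MD5SUMLIKEPATTERN)` test can be approximated by a suffix comparison, because the leading `*` in the ksh pattern accepts any prefix (such as a directory path). A minimal sketch; `md5sum_like()` is a hypothetical helper, not part of libcmd.

```c
#include <stdbool.h>
#include <string.h>

/* Plain-C stand-in for "*@(md5|sha@(1|224|256|384|512))sum". */
bool
md5sum_like(const char* argv0)
{
	static const char* const tails[] = {
		"md5sum", "sha1sum", "sha224sum", "sha256sum",
		"sha384sum", "sha512sum",
	};
	size_t len = strlen(argv0);
	size_t i;

	for (i = 0; i < sizeof tails / sizeof tails[0]; i++)
	{
		size_t tlen = strlen(tails[i]);

		/* any prefix is allowed, so only the tail must match */
		if (len >= tlen && strcmp(argv0 + len - tlen, tails[i]) == 0)
			return true;
	}
	return false;
}
```

`/usr/bin/sha256sum` and `md5sum` match, while `cksum` and `sum` do not, which is what keeps the output of cksum(1) and sum(1) unchanged.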
Glenn Fowler
2013-09-25 05:03:33 UTC
I'm sorry but making --text the default on windows systems simply does not make sense
it renders tgz md5sum verification useless
where do you see anywhere "the md5sum --binary value for foo.tgz is hexhexhexhex"

my guess is that because of this weaseling
Note: There is no difference between binary and text mode option on GNU system.
most gnu-weaned users call md5sum with neither --text nor --binary

and this note lies anyway -- it *does* make a difference: ' ' is printed for text,
'*' is printed for binary

and on cygwin guess what -- md5sum defaults to binary

if there's any change it will be for the md5sum-specific output to do the ' ' vs '*'
based on text vs binary so on all implementations '*' will be printed by default

how many scripts will break with that default?
Roland Mainz
2013-09-25 05:21:24 UTC
Post by Glenn Fowler
I'm sorry but making --text the default on a windows systems simply does not make sense
Well... blame Cygwin and "Windows Services For Unix" for that crazy
idea. But I was looking at an older version of "md5sum" on Linux...
but it turns out the situation is a bit more complex:
-- snip --
void
usage (int status)
{
  if (status != EXIT_SUCCESS)
    emit_try_help ();
  else
    {
      printf (_("\
Usage: %s [OPTION]... [FILE]...\n\
Print or check %s (%d-bit) checksums.\n\
With no FILE, or when FILE is -, read standard input.\n\
\n\
"),
              program_name,
              DIGEST_TYPE_STRING,
              DIGEST_BITS);
      if (O_BINARY)
        fputs (_("\
  -b, --binary         read in binary mode (default unless reading tty stdin)\n\
"), stdout);
      else
        fputs (_("\
  -b, --binary         read in binary mode\n\
"), stdout);
      printf (_("\
  -c, --check          read %s sums from the FILEs and check them\n"),
              DIGEST_TYPE_STRING);
      fputs (_("\
      --tag            create a BSD-style checksum\n\
"), stdout);
      if (O_BINARY)
        fputs (_("\
  -t, --text           read in text mode (default if reading tty stdin)\n\
"), stdout);
      else
        fputs (_("\
  -t, --text           read in text mode (default)\n\
"), stdout);
      fputs (_("\
\n\
The following three options are useful only when verifying checksums:\n\
      --quiet          don't print OK for each successfully verified file\n\
      --status         don't output anything, status code shows success\n\
  -w, --warn           warn about improperly formatted checksum lines\n\
\n\
"), stdout);
      fputs (_("\
      --strict         with --check, exit non-zero for any invalid input\n\
"), stdout);
      fputs (HELP_OPTION_DESCRIPTION, stdout);
      fputs (VERSION_OPTION_DESCRIPTION, stdout);
      printf (_("\
\n\
The sums are computed as described in %s. When checking, the input\n\
should be a former output of this program. The default mode is to print\n\
a line with checksum, a character indicating input mode ('*' for binary,\n\
space for text), and name for each FILE.\n"),
              DIGEST_REFERENCE);
      emit_ancillary_info ();
    }

  exit (status);
}
-- snip --
So basically the per-platform defaults are governed via the
availability of |O_BINARY| at build time and whether you're reading
from a tty stdin.
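The defaults described by that usage() text reduce to a small decision, sketched here. This restates the documented behaviour and is not the coreutils implementation; `gnu_default_binary()` is hypothetical, and a real caller would pass `isatty(STDIN_FILENO)` for `stdin_is_tty`.

```c
#include <stdbool.h>

/*
 * Default read mode when neither -b nor -t is given, per the usage text:
 *   - builds without O_BINARY: text mode;
 *   - builds with O_BINARY: binary, unless reading a tty on stdin.
 * Returns true for binary mode, false for text mode.
 */
bool
gnu_default_binary(bool have_o_binary, bool reading_stdin, bool stdin_is_tty)
{
	if (!have_o_binary)
		return false;
	return !(reading_stdin && stdin_is_tty);
}
```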
Post by Glenn Fowler
it renders tgz md5sum verification useless
Yes and no. Yes, it's not a good idea... but what should we do for
compatibility on Windows (Cygwin&&SFU)? On Unix/Linux the
--text/--binary options are no-ops but we still need to be able to produce
compatible output (e.g. the "  "/" *") and read it back (I forgot
about that part in my patch).
Post by Glenn Fowler
where do you see anywhere "the md5sum --binary value for foo.tgz is hexhhexhexhex"
my guess is that because of this weasling
Note: There is no difference between binary and text mode option on GNU system.
most gnu weaned users call md2sum with neither --text nor --binary
and this note lies anyway -- it *does* make a difference ' ' is printed for text,
'*' is printed for binary
and on cygwin guess what -- md5sum defaults to binary
Erm... see |usage()| function above... are you sure this is correct?
Post by Glenn Fowler
if there's any change it will be for the md5sum-specific output to do the ' ' vs '*'
based on text vs binary so on all implementations '*' will be printed by default
AFAIK that's not necessary - see |usage()| above... there are limits
to the insanity, governed by whether the platform has |O_BINARY| and
whether the input is a tty or not.

... and please only change the output for utilities which match
"*@(md5|sha@(1|224|256|384|512))sum" ... otherwise we end up with a
lot of trouble for scripts which depend on specific output for
cksum(1) and sum(1) etc.
Post by Glenn Fowler
how many scripts will break with that default?
A lot of scripts which do md5sum and sha256sum verification choke on
the "  "/" *" vs. " " difference... we have had that issue since at least
2007, when someone from Sun reported in the Sun bugster bug
database that libcmd "md5sum" can't replace GNU coreutils
"md5sum"&&co. until this issue has been fixed.

----

Bye,
Roland
Glenn Fowler
2013-09-25 12:35:46 UTC
I slept on this and here is what I think

*they should have observed a unix-on-windows user even just for a few minutes*
to see that the --text default is wrong wrong wrong
*they should have made binary the default and '*' mark the text mode exception case*
and then the minimal fraction of unix users on windows that generate \r\n
*and* don't want to sum the \r would have to explicitly demand --text

here is what I think should happen

(0) ast defaults to --binary for all methods and does O_TEXT only with
explicit --text

(1) anyone who has a file with ' ' or '*' as the first character
*and* calls ast md5sum will be sol

(2) petition gnu coreutils to accept
<checksum><one-ascii-space-char><name-not-starting-with-space-or-asterisk>
as being generated in --binary mode

(3) anyone who uses gnu md5sum to generate a checklist and uses something
other than ast or gnu md5sum --check will be sol

(4) change ast -t, --total => -T, --total and -T, --text => -t, --text
for gnu compatibility, and retain the ast --binary default *in all cases,
no isatty crap*

(5) change ast --header to include --text (but never --binary)

(6) change ast cksum --check to recognize either
<checksum><space><name>
<checksum><space><gnu-text-or-binary-indicator><name>
in _WINIX (uwin cygwin) make the distinction between --text and --binary
based on <gnu-text-or-binary-indicator>, otherwise ignore <gnu-text-or-binary-indicator>

if you notice, --method=md5 and --method=sha* are the only ones where ast prints *exactly*
<checksum><space><name>
so it will be able to faithfully distinguish the ast vs gnu case for --check
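Proposal (6) could be implemented along these lines. This is a minimal sketch under stated assumptions: `parse_check_line()` and `sum_mode_t` are hypothetical, and `hexlen` is the method's digest length in hex digits (32 for md5). Because md5 and sha* lines are exactly `<checksum><space><name>`, any ' ' or '*' right after the first blank is read as the GNU indicator, which is also why point (1) leaves names starting with ' ' or '*' out of luck.

```c
#include <ctype.h>
#include <stddef.h>

/* Read mode recorded in a checksum line (hypothetical). */
typedef enum { SUM_MODE_UNKNOWN, SUM_MODE_TEXT, SUM_MODE_BINARY } sum_mode_t;

/*
 * Accept "<hex> <name>" (old ast) and "<hex> <indicator><name>" (gnu).
 * Returns a pointer to the name, or NULL for a malformed line.
 */
const char*
parse_check_line(const char* line, size_t hexlen, sum_mode_t* mode)
{
	size_t i;

	/* fixed-length run of hex digits; stops early at '\0' */
	for (i = 0; i < hexlen; i++)
		if (!isxdigit((unsigned char)line[i]))
			return NULL;
	if (line[hexlen] != ' ')
		return NULL;
	line += hexlen + 1;

	/* optional gnu read-mode indicator */
	if (*line == ' ')
	{
		*mode = SUM_MODE_TEXT;
		line++;
	}
	else if (*line == '*')
	{
		*mode = SUM_MODE_BINARY;
		line++;
	}
	else
		*mode = SUM_MODE_UNKNOWN;
	return *line ? line : NULL;
}
```

Under _WINIX the caller would map `SUM_MODE_TEXT`/`SUM_MODE_BINARY` to the read mode; elsewhere it would simply ignore `mode`.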

I will consider this concession:

(0)(1)(3)(4)(5)(6)

(7) ast methods that currently list
<checksum><space><name>
will change to
<checksum><space><gnu-text-or-binary-indicator><name>

this would result in the '*' almost always being printed
ast will then handle old-ast and new-ast (gnu) formats seamlessly

can the unix user who never touches dos handle seeing the '*' indicator in md5sum output?

there are 2 comments below

this is another example where patches don't just exist in a vacuum
the universe of unintended consequences has to be extended to include unix on dos and unix on ebcdic
Post by Roland Mainz
Post by Glenn Fowler
I'm sorry but making --text the default on a windows systems simply does not make sense
Well... blame Cygwin and "Windows Services For Unix" for that crazy
idea. But I was looking at an older version of "md5sum" on Linux...
[...]
So basically the per-platform defaults are governed via the
availability of |O_BINARY| at build time and whether you're reading
from a tty stdin.
Post by Glenn Fowler
it renders tgz md5sum verification useless
Yes and no. Yes, it's not a good idea... but what should we do for
compatibility on Windows (Cygwin&&SFU) ? On Unix/Linux the
--text/--binary options are no-ops but we need to be able to produce
compatible output (e.g. the " "/" *") and read it back (I forgot
about that part in my patch).
Post by Glenn Fowler
where do you see anywhere "the md5sum --binary value for foo.tgz is hexhhexhexhex"
my guess is that because of this weasling
Note: There is no difference between binary and text mode option on GNU system.
most gnu weaned users call md2sum with neither --text nor --binary
and this note lies anyway -- it *does* make a difference ' ' is printed for text,
'*' is printed for binary
and on cygwin guess what -- md5sum defaults to binary
Erm... see |usage()| function above... are you sure this is correct ?
as opposed to _UWIN, there is little gnu code untouched by _CYGWIN
my guess is there are a few of those in the code used to build on cygwin
Post by Roland Mainz
Post by Glenn Fowler
if there's any change it will be for the md5sum-specific output to do the ' ' vs '*'
based on text vs binary so on all implementations '*' will be printed by default
AFAIK that's not neccesary - see |usage()| above... there are limits
to the insanity, governed by whether the platform has |O_BINARY| and
whether the input is a tty or not.
... and please only change the output for utilities which match
"*@(md5|sha@(1|224|256|384|512))sum" ... otherwise we end-up with a
lot of trouble for scripts which depend on specific output for
cksum(1) and sum(1) etc.
Post by Glenn Fowler
how many scripts will break with that default?
A lot of scripts which do md5sum and sha256sum verification choke on
the " "/" *" vs. " " difference... we have that issue at least since
2007 when someone from Sun reported the issue in the Sun bugster bug
database that libcmd "md5sum" can't replace GNU coreutils
"md5sum"&&co. until this issue has been fixed.
do those scripts use --check to verify the sum?
Glenn Fowler
2013-09-25 12:49:40 UTC
this is just a general comment for those contemplating adding
#ifdef to select option visibility based on the underlying system

please don't if at all possible

it's much preferable to let the option and documentation stand
with a warning that it may not be available on all systems,
along with a stub for the possibly missing feature in scope,
e.g., using iffe-style guard macros:

#if !_lib_syscall_foo
int syscall_foo(/* syscall_foo formals */) { errno = ENOSYS; return -1; }
#endif

this simplifies the code using the feature in scope by localizing
#ifdefs to the feature itself
and greatly simplifies documentation management
by allowing one translation per target language
which then allows one reference manual for all implementations
e.g.,
ast unix and mainframe and windows users can and do refer to the same documentation

the whole idea of ast is to smooth over such difference to provide
as much as possible the same api on all systems
of course this puts (a small) burden on the programmer/scripter/user
to check for error conditions on every library/system call
and sanely handle $? in scripts
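A slightly fuller version of the stub pattern, showing the caller side, might look like this. `syscall_foo()`, its `const char*` argument, `_lib_syscall_foo`, and `option_foo()` are all hypothetical; an iffe probe is assumed to define `_lib_syscall_foo` to 1 when the real function exists. Left undefined here, the macro evaluates to 0 in `#if`, so the ENOSYS stub is compiled.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#if !_lib_syscall_foo
/* stub: the option stays documented and callable, but reports ENOSYS */
int
syscall_foo(const char* path)
{
	(void)path;
	errno = ENOSYS;
	return -1;
}
#endif

/* the option code needs no #ifdef: it just checks for errors */
int
option_foo(const char* path)
{
	int err;

	if (syscall_foo(path) == 0)
		return 0;
	err = errno;	/* save before stdio can change errno */
	fprintf(stderr, "foo: %s: %s\n", path, strerror(err));
	return err;
}
```

On a system lacking the feature, `option_foo()` prints the platform's message for ENOSYS and returns a non-zero value that a script can test via $?.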
Irek Szczesniak
2013-09-25 15:01:06 UTC
Post by Glenn Fowler
this is just a general comment on those contemplating adding
#ifdef to select option visibility based on the underlying system
Is this about md5sum(1)? AFAIK the *defaults* are platform-specific,
based on what Windows does, but not the availability of the options.
Or is this something different?

Irek
Glenn Fowler
2013-09-25 15:59:23 UTC
Post by Irek Szczesniak
Post by Glenn Fowler
this is just a general comment on those contemplating adding
#ifdef to select option visibility based on the underlying system
Is this about md5sum(1)? AFAIK the *defaults* are platform-specific,
based on what Windows does, but not the availability of the options.
Or is this something different?
md5sum triggered the comment
but there are / have been other patches in the queue that do similar things
libcmd::sync(1) and ksh93::cd(1) -@ come to mind
I wanted to nip it in the bud so it doesn't become a habit

now re md5sum

the gnu docs are fouled up because they don't match gnu-ish reality on cygwin
cygwin actually gets it right by making --binary the default
but that leads to
gnu-md5sum *.tgz > gnu-output
run on linux and cygwin, for the same set of files, producing different results
the md5sums (should) match but the cygwin output will have the '*' indicator

suppose, for the same set of *.tgz you do this on cygwin
gnu-md5sum *.tgz > cygwin-output
and this on linux
gnu-md5sum *.tgz > linux-output
on cygwin this will fail
gnu-md5sum --check linux-output
at least on linux this will work
gnu-md5sum --check cygwin-output
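Presumably the failure comes from the read mode, not the digest: on a \r\n platform, the ' ' indicator in linux-output selects text mode, so any \r\n byte pair that happens to occur in the compressed .tgz data is read as \n before hashing. The sketch below illustrates that with FNV-1a (32-bit) swapped in for MD5 purely to keep it short; `hash_bytes()` and `hash_text()` are hypothetical helpers modelling --binary and --text.

```c
#include <stddef.h>
#include <stdint.h>

/* one FNV-1a step (stand-in for a real digest update) */
static uint32_t
fnv1a_step(uint32_t h, unsigned char c)
{
	return (h ^ c) * 16777619u;
}

/* --binary: hash the bytes exactly as stored */
uint32_t
hash_bytes(const unsigned char* p, size_t n)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < n; i++)
		h = fnv1a_step(h, p[i]);
	return h;
}

/* --text: each "\r\n" is read as "\n"; a lone '\r' is kept */
uint32_t
hash_text(const unsigned char* p, size_t n)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < n; i++)
	{
		if (p[i] == '\r' && i + 1 < n && p[i + 1] == '\n')
			continue;	/* drop the \r of a \r\n pair */
		h = fnv1a_step(h, p[i]);
	}
	return h;
}
```

`hash_text()` of `a\r\nb` equals `hash_bytes()` of `a\nb`, so a file that really contains \r\n is hashed as different bytes in text mode than in binary mode.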

the *ast* --binary default is not platform specific because it is consistent
across unix and _WINIX and honors the principle of least surprise
so this
ast-md5sum *.tgz > ast-output
and this
ast-md5sum --check ast-output
will work no matter what platform either command is executed on

the next alpha, probably later today to fix the ASTAPI() build problem, will
have libcmd::cksum(1) and libsum fixed to generate and consume the ' ' and '*'
read mode indicators -- the only thing will be that on all systems by default
ast md5sum will print the '*' indicator -- let's see how many gnu-reliant scripts
blow up due to that -- this is one of those things that if you demand gnu
semantics on one side prepare to handle them on the other

the good thing about this approach is that ast-md5sum generated output
will be compatible with gnu-md5sum --check no matter what platform either
ast-md5sum or gnu-md5sum are run on
*and*
ast-md5sum will also handle gnu-md5sum output with the proviso that if the
intention on the gnu side was to sum binary files then they better well have
used --binary to do it or it will fail on _WINIX -- but in reality no
gnu user will do that because the docs say it doesn't really matter -- in
this case there aren't enough ast patches to fix gnu
Irek Szczesniak
2013-09-25 16:07:17 UTC
Post by Glenn Fowler
[...]
the next alpha, probably later today to fix the ASTAPI() build problem, will
have libcmd::cksum(1) and libsum fixed to generate and consume the ' ' and '*'
read mode indicators -- the only thing will be that on all systems by default
ast md5sum will print the '*' indicator
Why do you want to do that?

On *LINUX* and Solaris 10 with /usr/gnu/bin/md5sum I get this:
/usr/bin/md5sum /usr/bin/md5sum
85e85dcf910f4c5d1dd1729b2c81e584 /usr/bin/md5sum
/usr/bin/md5sum --binary /usr/bin/md5sum
85e85dcf910f4c5d1dd1729b2c81e584 */usr/bin/md5sum

This happens because --text and --binary are identical on Linux and Solaris.
Post by Glenn Fowler
-- lets see how many gnu-reliant scripts
blow up due to that -- this is one of those things that if you demand gnu
semantics on one side prepare to handle them on the other
... nothing to say here. I am not sure whether it is a good idea to
serve us that mess GNU coreutils caused and force us to eat it, too.

There may be a 3rd option:
Use three states:
1. If --binary is given print " *"
2. If --text is given print "  "
3. On UNIX/Linux default to " " if no option is given, and default to
--text on Windows
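Irek's three states map onto the separator roughly as follows. A minimal sketch: `read_mode_t` and `sum_separator()` are hypothetical, item 2 is read as two blanks (the GNU text-mode form), and `on_windows` stands in for however the build would detect a Windows host.

```c
#include <stdbool.h>

/* hypothetical tri-state read mode */
typedef enum { MODE_DEFAULT, MODE_TEXT, MODE_BINARY } read_mode_t;

/* separator written between checksum and file name */
const char*
sum_separator(read_mode_t mode, bool on_windows)
{
	if (mode == MODE_DEFAULT && on_windows)
		mode = MODE_TEXT;	/* item 3: default to --text on Windows */
	switch (mode)
	{
	case MODE_BINARY:
		return " *";	/* item 1: --binary */
	case MODE_TEXT:
		return "  ";	/* item 2: --text */
	default:
		return " ";	/* item 3: old AST format on UNIX/Linux */
	}
}
```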

Irek
Irek Szczesniak
2013-09-25 16:15:51 UTC
Post by Irek Szczesniak
[...]
1. If --binary is given print " *"
2. If --text is given print "  "
3. On UNIX/Linux default to " " if no option is given, and default to
--text on Windows
A coworker just reminded me why md5sum defaults to --text: As
originally conceived, md5 and md5sum were created to create hashes over
mime headers, which use '\n' on UNIX/Linux but '\r\n' on Windows/DOS,
MIME, HTTP etc. So it's not Windows which mandates this behaviour, it's
the original design. --binary is optional for parsing MIME or HTTP
headers.
This is consistent with other md5 hash utility implementations like md5(1).

Irek
Glenn Fowler
2013-09-25 16:33:03 UTC
Post by Irek Szczesniak
[...]
Post by Glenn Fowler
-- lets see how many gnu-reliant scripts
blow up due to that -- this is one of those things that if you demand gnu
semantics on one side prepare to handle them on the other
... nothing to say here. I am not sure whether it is a good idea to
serve us that mess GNU coreutils caused and force us to eat it, too.
1. If --binary is given print " *"
3. On UNIX/Linux default to " " if no option is given, and default to
--text on Windows
A coworker just reminded me why md5sum defaults to --text: as
originally conceived, md5 and md5sum were created to compute hashes over
MIME headers, which use '\n' on UNIX/Linux but '\r\n' on Windows/DOS,
MIME, HTTP etc. So it's not Windows which mandates this behaviour, it's
the original design. --binary is optional for parsing MIME or HTTP
headers.
This is consistent with other md5 hash utility implementations like md5(1).
thanks, that's good to know

but what possessed the gnu md5sum *file* utility developers to force mime header semantics on files
when all other unix sum *file* utilities do not make that distinction

is there an option to gnu-md5sum to just process a MIME/HTTP header?
nope
so any code that wants to use the md5sum file utility to sum a mime header already
has to do special processing to extract the header from the rest of the stream
they might as well add the --text option at that point

AHA (it really was an aha just at this moment)
thanks to you and your coworker I realize why I'm being so insistent on this

in ast md5sum --text means \r\n => \n *on all platforms*
in ast this just works in unix and _WINIX

$ printf $'aha\r\n' | md5sum --text
c6409a1abd80c82df3c71bb00f2d1b64
$ printf $'aha\n' | md5sum --text
c6409a1abd80c82df3c71bb00f2d1b64
$ printf $'aha\n' | md5sum
c6409a1abd80c82df3c71bb00f2d1b64
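A minimal C sketch of the --text translation described above (\r\n => \n before the bytes reach the hash), written as a chunked filter so that a \r\n pair split across two read buffers is still folded. The names crlf_state, crlf_to_lf() and crlf_finish() are hypothetical; whether libsum holds back a trailing '\r' this way is an assumption, not something the thread states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct crlf_state {
    int pending_cr;   /* previous chunk ended in an undecided '\r' */
};

/* Copy in[0..n) to out, turning each "\r\n" into "\n".  A lone '\r' is
 * kept.  A '\r' at the end of a chunk is held back until the next byte
 * shows whether a '\n' follows.  out needs room for n + 1 bytes. */
static size_t crlf_to_lf(struct crlf_state *st, const char *in, size_t n, char *out)
{
    size_t o = 0;

    assert(st != NULL && (n == 0 || (in != NULL && out != NULL)));
    for (size_t i = 0; i < n; i++) {
        char c = in[i];

        if (st->pending_cr) {
            st->pending_cr = 0;
            if (c != '\n')
                out[o++] = '\r';      /* lone '\r': keep it */
        }
        if (c == '\r')
            st->pending_cr = 1;       /* decide when the next byte arrives */
        else
            out[o++] = c;
    }
    return o;
}

/* Call once at end of input: emits a held-back lone '\r', if any. */
static size_t crlf_finish(struct crlf_state *st, char *out)
{
    size_t o = 0;

    assert(st != NULL);
    if (st->pending_cr) {
        out[o++] = '\r';
        st->pending_cr = 0;
    }
    return o;
}
```

Each filtered chunk would then be fed to the digest update; since "aha\r\n" and "aha\n" both come out as "aha\n", the two inputs produce the same md5, matching the transcript above.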

now try to do that with gnu md5sum
or for a real world example
collect MIME HTTP etc. data with \r\n and try to process it as text on linux with gnu-md5sum

and here's *another* case where it would have been much much better to discuss a problem
on the list before posting a patch; in this case defaulting ast cksum to --text would
have been wrong for unix and _WINIX
Glenn Fowler
2013-09-25 16:16:07 UTC
Permalink
Post by Irek Szczesniak
Why do you want to do that?
I just explained why -- consistency between unix and _WINIX

cygwin already learned this but apparently just patched for _CYGWIN
instead of pushing the correct behavior upstream
Post by Irek Szczesniak
$ /usr/bin/md5sum /usr/bin/md5sum
85e85dcf910f4c5d1dd1729b2c81e584  /usr/bin/md5sum
$ /usr/bin/md5sum --binary /usr/bin/md5sum
85e85dcf910f4c5d1dd1729b2c81e584 */usr/bin/md5sum
This happens because --text and --binary are identical on Linux and Solaris.
aha you drank the gnu man page koolaid
they are not identical: --binary outputs the '*' indicator

I bet a large %-age of gnu linux/solaris scripts relying on this bogus behaviour
would fail running on _WINIX -- you may not care but folks who deal with
moving data between unix <=> _WINIX do
Irek Szczesniak
2013-09-25 16:31:43 UTC
Permalink
Post by Roland Mainz
Hi!
----
Attached (as "astksh20130913_md5sum_compat1.diff.txt") is a patch
which fixes incompatibilities between AST md5sum(1)&&co. and GNU
coreutils md5sum(1)&&co.
- GNU coreutils md5sum/sha1sum/sha224sum/sha256sum default to text mode
- GNU coreutils use a " *" before the file name to indicate binary
mode and "  " (two blanks) to indicate text mode... the AST hash utilities used
only a single blank " " instead.
- "-t" means "text mode" for GNU coreutils while AST used this for "total"
- GNU and AST *sum(1) utilities now have identical output and seem to
be 100% compatible with each other
- On platforms which do not implement |O_BINARY| and |O_TEXT| the
change only affects the separator ("  "/" *" (=new) vs. " " (=old)).
Portable applications can use [[:space:]]+ in egrep(1) to make sure
they can match the hashes against both the old and new versions of AST
*sum(1)
- The output *intentionally* changes only for utilities matching the
shell pattern "*@(md5|sha@(1|224|256|384|512))sum". This is done to
maintain compatibility for cksum(1) and sum(1)
- AST does not have a sha224sum(1) utility (yet) ... need to talk to
Glenn about this
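The ksh extended pattern *@(md5|sha@(1|224|256|384|512))sum reduces to "the command name ends in one of six suffixes". A minimal C sketch of that test, assuming the leading `*` may match any prefix (including a directory part). The function md5sumlike() is hypothetical; the patch only defines the pattern as MD5SUMLIKEPATTERN, and how it is matched there is not shown in this thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: nonzero if name matches *@(md5|sha@(1|224|256|384|512))sum,
 * i.e. it ends in md5sum, sha1sum, sha224sum, sha256sum, sha384sum or
 * sha512sum.  cksum and sum do not match, so their output stays as is. */
static int md5sumlike(const char *name)
{
    static const char *const suffix[] = {
        "md5sum", "sha1sum", "sha224sum", "sha256sum", "sha384sum", "sha512sum",
    };
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    for (size_t i = 0; i < sizeof suffix / sizeof suffix[0]; i++) {
        size_t slen = strlen(suffix[i]);

        if (len >= slen && strcmp(name + len - slen, suffix[i]) == 0)
            return 1;
    }
    return 0;
}
```

A plain suffix table avoids depending on extended-glob support in the C library's fnmatch(), which POSIX does not require.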
Roland and Glenn: Roland added entry points for sha1sum(1),
sha2sum(1), sha256sum(1), sha384sum(1) and sha512sum(1) but they do
not appear in /opt/ast/bin. Could you check why this happens, please?

Irek
Glenn Fowler
2013-09-25 16:40:34 UTC
Permalink
Post by Irek Szczesniak
Roland and Glenn: Roland added entry points for sha1sum(1),
sha2sum(1), sha256sum(1), sha384sum(1) and sha512sum(1) but they do
not appear in /opt/ast/bin. Could you check why this happens, please?
do you have $INSTALLROOT/bin/nmake in your build tree?
Irek Szczesniak
2013-09-25 17:26:27 UTC
Permalink
Post by Glenn Fowler
Post by Irek Szczesniak
Roland and Glenn: Roland added entry points for sha1sum(1),
sha2sum(1), sha256sum(1), sha384sum(1) and sha512sum(1) but they do
not appear in /opt/ast/bin. Could you check why this happens, please?
do you have $INSTALLROOT/bin/nmake in your build tree?
No, I only imported ast-ksh into my git (laziness). However I think
the root cause is that the script which creates the entries for
/opt/ast assumes one builtin == one source file, right?

Irek
Glenn Fowler
2013-09-25 17:46:21 UTC
Permalink
Post by Irek Szczesniak
Post by Glenn Fowler
Post by Irek Szczesniak
Roland and Glenn: Roland added entry points for sha1sum(1),
sha2sum(1), sha256sum(1), sha384sum(1) and sha512sum(1) but they do
not appear in /opt/ast/bin. Could you check why this happens, please?
do you have $INSTALLROOT/bin/nmake in your build tree?
No, I only imported ast-ksh into my git (laziness). However I think
the root cause is that the script which creates the entries for
/opt/ast assumes one builtin == one source file, right?
no, that may have been the case a while ago, but now all libcmd source
is grep'd for b_*() prototypes and that is used to generate cmdlist.h
which is a file that contains a
CMDLIST(foo)
line for each b_foo() defined in the libcmd source -- as source files are
added or modified in src/lib/libcmd, cmdlist.h will be updated
users of <cmdlist.h> must first define the CMDLIST(name) macro
this is how ksh93 sets up its builtin table
the nmake Makefile glues this together
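The CMDLIST(name) convention is the classic C "X-macro" pattern: the generated header is only a list of macro calls, and each consumer decides what a CMDLIST entry expands to. A minimal, self-contained sketch follows. CMDLIST_ENTRIES stands in for the generated <cmdlist.h>, the three names are illustrative, and the builtins are simplified to take a `void *` context (the real libcmd prototypes use a shell-specific context type). builtin_lookup() is hypothetical and is not ksh93's actual table code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the generated <cmdlist.h>: one CMDLIST(name) per b_name(). */
#define CMDLIST_ENTRIES \
    CMDLIST(cat)        \
    CMDLIST(cksum)      \
    CMDLIST(md5sum)

typedef int (*builtin_f)(int argc, char **argv, void *context);

/* Pass 1: CMDLIST(n) expands to a prototype for b_n(). */
#define CMDLIST(n) int b_##n(int argc, char **argv, void *context);
CMDLIST_ENTRIES
#undef CMDLIST

/* Pass 2: CMDLIST(n) expands to a { "n", b_n } table entry. */
struct builtin {
    const char *name;
    builtin_f   fn;
};

#define CMDLIST(n) { #n, b_##n },
static const struct builtin builtins[] = {
    CMDLIST_ENTRIES
};
#undef CMDLIST

/* Dummy bodies so the sketch links; the real ones live in libcmd. */
int b_cat(int argc, char **argv, void *context)    { (void)argc; (void)argv; (void)context; return 1; }
int b_cksum(int argc, char **argv, void *context)  { (void)argc; (void)argv; (void)context; return 2; }
int b_md5sum(int argc, char **argv, void *context) { (void)argc; (void)argv; (void)context; return 3; }

/* Linear lookup by name; returns NULL for names not in the list. */
static builtin_f builtin_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++)
        if (strcmp(builtins[i].name, name) == 0)
            return builtins[i].fn;
    return NULL;
}
```

Because both passes expand the same list, adding a b_foo() and regenerating the list updates the prototypes and the table together, which is the "ripples through the rest of the build" property described below.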

in src/cmd/builtin <cmdlist.h> is itself sed'd to generate the list of
builtins that must be installed in $INSTALLROOT/bin
again the nmake Makefile glues this together

it was a bit complicated to set up
but now it just does the right thing
as b_*() symbols are added (or deleted) <cmdlist.h> is updated
and ripples through the rest of the build

ast-base is the smallest package with nmake
install that and "bin/package make" will bootstrap build it
and then use it for the rest of the build
you will also need nmake to build any ast plugins (runtime loadable shared libs)
pax, sort and ksh use these

also, using the mail I sent tina, you can cd into any subdir in $INSTALLROOT/src
and run "nmake install" and it will update everything in that dir and below
bin/package make does that from $INSTALLROOT/src
nmake checks the combination of makefiles to determine the build order
add new dirs with Makefiles under $PACKAGEROOT/src and the $INSTALLROOT/src
build will pick them up and build them in the proper order

David Korn
2013-09-25 17:22:34 UTC
Permalink
Subject: Re: [ast-developers] general comment on ast utility options vs #ifdef
--------
Post by Glenn Fowler
this is just a general comment on those contemplating adding
#ifdef to select option visibility based on the underlying system
please don't if at all possible
There is another advantage to doing this. It takes away the option of using
it in an incompatible way.

David Korn
dgk at research.att.com