Discussion:
[ast-developers] [urgent] realpath(1)+readlink(1) in libcmd?
Cedric Blancher
2013-09-14 18:14:23 UTC
`readlink -f' is the most common invocation I see in scripts. I'd recommend making compatibility with at least coreutils' readlink a goal. See path_resolution(7) from the Linux man-pages for details on that part.
Agreed. But as Roland Mainz wrote "the work for the { -e, -f, -m
}-options has a dependency on |fgetcwd()|.". So we have to hold off on
-f until fgetcwd() appears in libast, and until then take readlink(1)
as is. Good enough for me, and good enough for users coming from
FreeBSD or busybox.
Glenn, is there *now* hope to get readlink(1) integrated into libcmd?
I know ls -l could work too but there are existing consumers (mostly
FreeBSD and busybox) relying on readlink(1) and realpath(1) who could
benefit from this work.
yes
I extended the src/lib/libast/path/pathcanon.c flags to handle resolvepath(2)
and readlink(1) options -- do you have a url for the realpath(1) man page?
I'm guessing readlink(1) would be sufficient
the ast readlink(1) will probably add a few more options to cover the underlying
ast pathcanon() and pathdev() PATH_* flags -- then it could be used as a
regression test harness
just looked at the posix realpath(3) api -- can't believe it doesn't have
a buffer size arg, and the linux vs bsd vs solaris behavior is all over the place
some preserve relative paths, some always return absolute paths
Glenn, Roland: Could the next ast-ksh update *PLEASE* focus on libcmd
updates? The availability of readlink(1) and realpath(1) has become
*VERY* high priority for us, even more important than bugfixes for
signal handling.

Ced
--
Cedric Blancher <cedric.blancher at gmail.com>
Institute Pasteur
Lionel Cons
2013-09-14 22:10:58 UTC
Post by Cedric Blancher
Glenn, Roland: Could the next ast-ksh update *PLEASE* focus on libcmd
updates? The availability of readlink(1) and realpath(1) has become
*VERY* high priority for us, even more important than bugfixes for
signal handling.
Hey, it's the weekend. Give Glenn and Roland some rest :)
I've looked at Roland's patch (our scripts are heavy users of
readlink(1) and resolvepath(1) in some places, and there's lots of
third-party code which uses both commands, too) and it's pretty
straightforward for readlink(1). My trouble is with resolvepath(1) -
Glenn may want to use his pathdev() function and I have no idea how to
do that and make it work with shp->pwdfd without having a pathdevat()
or resolvepathat() function.

Lionel

P.S. pathdev() has no man page
Glenn Fowler
2013-09-15 04:18:20 UTC
Post by Lionel Cons
I've looked at Roland's patch (our scripts are heavy users of
readlink(1) and resolvepath(1) in some places, and there's lots of
third-party code which uses both commands, too) and it's pretty
straightforward for readlink(1). My trouble is with resolvepath(1) -
Glenn may want to use his pathdev() function and I have no idea how to
do that and make it work with shp->pwdfd without having a pathdevat()
or resolvepathat() function.
Lionel
P.S. pathdev() has no man page
pathdev() was rounded out with resolvepath(1) in mind
and the first arg to pathdev() is a *at()-style directory fd
Lionel Cons
2013-09-15 07:09:55 UTC
Post by Glenn Fowler
pathdev() was rounded out with resolvepath(1) in mind
and the first arg to pathdev() is a *at()-style directory fd
Thanks.

Any suggestions on how we can implement all of the following (crazy?) extra
options of GNU readlink(1):

readlink [OPTION]... FILE

DESCRIPTION
Display value of a symbolic link on standard output.

-f, --canonicalize
canonicalize by following every symlink in every
component of the given name recursively; all but the last component
must exist

-e, --canonicalize-existing
canonicalize by following every symlink in every
component of the given name recursively, all components must exist

-m, --canonicalize-missing
canonicalize by following every symlink in every
component of the given name recursively, without requirements on
components existence


Lionel
Glenn Fowler
2013-09-19 16:08:16 UTC
can someone post urls for the man pages for
readlink(1)
realpath(1)
resolvepath(1)
Roland Mainz
2013-09-19 16:53:24 UTC
Post by Glenn Fowler
can someone post urls for the man pages for
readlink(1)
realpath(1)
1. The relevant references:
The all-in-one manpage for busybox can be found here:
http://busybox.net/downloads/BusyBox.html

FreeBSD readlink(1):
http://www.freebsd.org/cgi/man.cgi?query=readlink&apropos=0&sektion=1&manpath=FreeBSD+9.1-RELEASE&arch=default&format=html
FreeBSD realpath(1):
http://www.freebsd.org/cgi/man.cgi?query=realpath&apropos=0&sektion=1&manpath=FreeBSD+9.1-RELEASE&arch=default&format=html

2. Only for reference - GNU coreutils:
http://www.gnu.org/software/coreutils/manual/html_node/readlink-invocation.html
http://www.gnu.org/software/coreutils/manual/html_node/realpath-invocation.html
Post by Glenn Fowler
resolvepath(1)
AFAIK that's ENOSUCHMANPAGE... out of confusion between
|resolvepath(2)| and the |*(1)| consumers of that API...
... Lionel... can you verify this, please ?

----

Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
Lionel Cons
2013-09-21 06:51:38 UTC
Post by Roland Mainz
Post by Glenn Fowler
resolvepath(1)
AFAIK that's ENOSUCHMANPAGE... out of confusion between
|resolvepath(2)| and the |*(1)| consumers of that API...
... Lionel... can you verify this, please ?
Yes, that's correct. resolvepath(1) turned out to be a crazy script of
our own making which gets deployed to deal with the lack of a path
resolving facility - a standardisation on ksh93/busybox readlink(1)
and realpath(1) would eliminate the need.

Please don't forget option --fd $dirfd for readlink(1) and realpath(1)
so we can have a virtual root (for chroot environments without doing
an actual (expensive) chroot() call each time we want to get the paths
right).

Lionel
Glenn Fowler
2013-09-21 13:33:57 UTC
Post by Roland Mainz
Post by Glenn Fowler
can someone post urls for the man pages for
readlink(1)
realpath(1)
http://busybox.net/downloads/BusyBox.html
http://www.freebsd.org/cgi/man.cgi?query=readlink&apropos=0&sektion=1&manpath=FreeBSD+9.1-RELEASE&arch=default&format=html
http://www.freebsd.org/cgi/man.cgi?query=realpath&apropos=0&sektion=1&manpath=FreeBSD+9.1-RELEASE&arch=default&format=html
both man pages describe the operands as
[ path ... ]
[ file ... ]
but only the realpath page says what happens if file is omitted (. is used)
neither says what happens if more than one path/file operand is specified
readlink annoyingly inverts -q/-v -- diagnostics are off by default -- *what other posix utility does that*

so what does readlink do in silent mode when it is invoked with
readlink is-a-symlink is-not-a-symlink is-another-symlink

from the refs roland supplied, the gnu readlink and realpath are very close modulo defaults
mainly readlink with no --canonicalize* options is in "readlink" mode, otherwise "realpath" mode
and the annoying one: readlink by default suppresses diagnostics

so I need to know what happens on bsd and gnu for readlink and realpath for various combinations of
0,1,2,3 path/file operands and within that various combinations of
is-a-symlink is-not-a-symlink is-an-existing-path is-not-an-existing-path
in particular when there are multiple operands and an error occurs does the output have a blank line to mark the errors?

my intention is to provide one implementation of realpath with a union of the
ast --dirfd=fd + gnu readlink/realpath + bsd readlink/realpath options
and an additional --readlink option that puts it into "readlink" mode (operand is a symlink, diagnostics suppressed),
and a note that argv[0]=="*readlink" => --readlink
this will give us one implementation and one document to manage
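A minimal sketch of the proposed mode selection, with hypothetical names (enum path_mode, mode_from_argv0(), select_mode()). It uses POSIX fnmatch(3) with the pattern *readlink, so both readlink and /usr/bin/readlink select "readlink" mode; an actual libcmd builtin would presumably use ast's own option parsing rather than this hand-rolled loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>
#include <string.h>

/* Hypothetical mode switch for a single realpath/readlink builtin. */
enum path_mode { MODE_REALPATH, MODE_READLINK };

/* argv[0] matching *readlink (e.g. "readlink", "/bin/readlink") selects
 * "readlink" mode.  Without FNM_PATHNAME, '*' also matches '/'. */
static enum path_mode mode_from_argv0(const char *argv0)
{
    return fnmatch("*readlink", argv0, 0) == 0 ? MODE_READLINK : MODE_REALPATH;
}

/* An explicit --readlink option selects "readlink" mode too; option
 * scanning stops at "--". */
static enum path_mode select_mode(int argc, char **argv)
{
    enum path_mode mode = mode_from_argv0(argv[0]);
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0)
            break;
        if (strcmp(argv[i], "--readlink") == 0)
            mode = MODE_READLINK;
    }
    return mode;
}
```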
Irek Szczesniak
2013-09-21 19:05:54 UTC
Post by Glenn Fowler
my intention is to provide one implementation of realpath with a union of the
ast --dirfd=fd + gnu readlink/realpath + bsd readlink/realpath options
and an additional --readlink option that puts it into "readlink" mode (operand is a symlink, diagnostics suppressed),
and a note that argv[0]=="*readlink" => --readlink
this will give us one implementation and one document to manage
Oh, for god's sake. Please don't do that. Please keep the readlink(1)
and realpath(1) utilities separate entities, to keep the code simple,
maintainable and instructive. One tool per job, and not a Swiss army
knife catch-all, sink-all and defeat-communism in one go. Think about
later generations of students being forced to study your work and be
happy with it[1] :)

[1] There have been rumors of compulsory hospitalization after people
spend too much time digging for bugs in the GNU coreutils source, so
please ;)

Irek
Glenn Fowler
2013-09-22 03:39:05 UTC
Post by Irek Szczesniak
Post by Glenn Fowler
my intention is to provide one implementation of realpath with a union of the
ast --dirfd=fd + gnu readlink/realpath + bsd readlink/realpath options
and an additional --readlink option that puts it into "readlink" mode (operand is a symlink, diagnostics suppressed),
and a note that argv[0]=="*readlink" => --readlink
this will give us one implementation and one document to manage
Oh, for god's sake. Please don't do that. Please keep the readlink(1)
and realpath(1) utilities separate entities, to keep the code simple,
maintainable and instructive. One tool per job, and not a Swiss army
knife catch-all, sink-all and defeat-communism in one go. Think about
later generations of students being forced to study your work and be
happy with it[1] :)
[1] There have been rumors of compulsory hospitalization after people
spend too much time digging for bugs in the GNU coreutils source, so
please ;)
I thought even the casual reader would have figured out that the ast
implementation will be based on an ast function that directly handles
the options of readlink(1) and realpath(1)

and how ironic that you use the combination of readlink(1) and realpath(1)
to teach a lesson on kiss and one function per utility -- readlink(1)
already does 2 completely separate operations "readlink" and "canonicalize"
and is itself a hardlink to, dare I say, the catch-all stat(1)

the next ast alpha will provide readlink(1) and realpath(1) libcmd builtins
I'm done explaining myself on this issue

still awaiting reports on how the reference implementations handle
0 and 2 or more path operands as the reference implementation man pages suggest
Glenn Fowler
2013-09-21 14:31:00 UTC
Post by Lionel Cons
Please don't forget option --fd $dirfd for readlink(1) and realpath(1)
so we can have a virtual root (for chroot environments without doing
an actual (expensive) chroot() call each time we want to get the paths
right).
can you provide some more info on dirfd as chroot() surrogate
I'm a bit confused because the *at() api dirfd has no effect on absolute paths
and the whole idea of chroot() is to grab all paths, relative and absolute

would the caller using --fd=$dirfd with the intention of modelling chroot
also have to ensure that all paths were relative (really the leading / chopped off)?
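The behavior described here is what POSIX specifies for the *at() functions: an absolute path ignores the directory fd entirely. A small sketch, assuming POSIX.1-2008, that shows this with fstatat(2), and also why chopping off the leading / is not enough on its own: ".." still walks out of the directory. same_file() and strip_root() are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Does fstatat(dirfd, path) reach the same inode as stat(other)?
 * Returns 1 or 0, or -1 if either lookup fails. */
static int same_file(int dirfd, const char *path, const char *other)
{
    struct stat a, b;
    if (fstatat(dirfd, path, &a, 0) != 0 || stat(other, &b) != 0)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Naive "virtual root" trick: drop leading slashes so that the *at()
 * call resolves the name relative to dirfd.  This does NOT confine
 * "..", nor symlinks whose targets are absolute. */
static const char *strip_root(const char *path)
{
    while (*path == '/')
        path++;
    return *path != '\0' ? path : ".";
}
```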
Joshuah Hurst
2013-09-21 15:26:48 UTC
Post by Glenn Fowler
can you provide some more info on dirfd as chroot() surrogate
I'm a bit confused because the *at() api dirfd has no effect on absolute paths
and the whole idea of chroot() is to grab all paths, relative and absolute
I think the idea is to NOT have to use chroot() each time they
set up a chroot environment. That doesn't include a full emulation of
chroot behaviour, but requires doing some things relative to the
intended root of the environment.
I've seen such hacks before, using perl and custom wrappers for
openat(2), fchownat(2), fstatat(2), futimesat(2), renameat(2),
unlinkat(2) ...

Josh
Glenn Fowler
2013-09-22 02:58:47 UTC
Post by Joshuah Hurst
I think the idea is to NOT have to use chroot() each time they
set up a chroot environment. That doesn't include a full emulation of
chroot behaviour, but requires doing some things relative to the
intended root of the environment.
I've seen such hacks before, using perl and custom wrappers for
openat(2), fchownat(2), fstatat(2), futimesat(2), renameat(2),
unlinkat(2) ...
perhaps I didn't phrase the question correctly
I realize that the idea is to use the chroot() fd and not chroot(2) itself

for
openat(fd, "/absolute-path", flags)
fd has no effect on the resolution of "/absolute-path" because it starts with "/"

so

how does a chroot() emulation using *at() functions work?
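One possible answer, sketched under stated assumptions: since the kernel will not confine absolute paths, the emulation has to do the path resolution itself, one component at a time with readlinkat(2) relative to the root fd. ".." is clamped at the virtual root, and a symlink with an absolute target restarts the walk at the root fd instead of the real /. vroot_resolve() and MAX_LINKS are hypothetical; this is not meant to describe ast's pathdev(), and the walk is not safe against concurrent changes to the tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_LINKS 40   /* assumed symlink-expansion limit */

/* Resolve `path` as if rootfd were "/": leading "/", "..", and absolute
 * symlink targets are all clamped to rootfd.  Every component must exist
 * (realpath(3)-like).  On success stores the canonical path as seen inside
 * the virtual root in out[] and returns 0; else returns -1 with errno set. */
static int vroot_resolve(int rootfd, const char *path, char out[PATH_MAX])
{
    char rest[PATH_MAX], link[PATH_MAX], cand[PATH_MAX], tmp[PATH_MAX];
    int links = 0;

    if (*path == '\0') { errno = ENOENT; return -1; }
    if (strlen(path) >= sizeof rest) { errno = ENAMETOOLONG; return -1; }
    strcpy(rest, path);
    strcpy(out, "/");

    for (char *p = rest; *p != '\0';) {
        while (*p == '/')                   /* skip separators */
            p++;
        if (*p == '\0')
            break;
        size_t len = strcspn(p, "/");       /* length of this component */

        if (len == 1 && p[0] == '.') {
            p += len;
            continue;
        }
        if (len == 2 && p[0] == '.' && p[1] == '.') {
            char *last = strrchr(out, '/'); /* pop, but never above "/" */
            if (last == out)
                out[1] = '\0';
            else
                *last = '\0';
            p += len;
            continue;
        }

        size_t olen = strlen(out), sep = olen > 1;
        if (olen + sep + len >= sizeof cand) { errno = ENAMETOOLONG; return -1; }
        memcpy(cand, out, olen);
        if (sep)
            cand[olen] = '/';
        memcpy(cand + olen + sep, p, len);
        cand[olen + sep + len] = '\0';
        p += len;

        /* cand + 1 drops the leading '/', so the lookup is relative to
         * rootfd; earlier components are verified non-symlinks, so barring
         * concurrent changes the lookup stays below rootfd. */
        ssize_t n = readlinkat(rootfd, cand + 1, link, sizeof link);
        if (n < 0) {
            if (errno != EINVAL)            /* ENOENT, ENOTDIR, EACCES... */
                return -1;
            strcpy(out, cand);              /* not a symlink: keep it */
            continue;
        }
        if (++links > MAX_LINKS) { errno = ELOOP; return -1; }
        if ((size_t)n >= sizeof link) { errno = ENAMETOOLONG; return -1; }
        link[n] = '\0';

        /* splice the link target in front of what is left to resolve */
        if ((size_t)n + 1 + strlen(p) >= sizeof tmp) { errno = ENAMETOOLONG; return -1; }
        snprintf(tmp, sizeof tmp, "%s/%s", link, p);
        strcpy(rest, tmp);
        p = rest;
        if (link[0] == '/')                 /* absolute target: restart at root */
            strcpy(out, "/");
    }
    return 0;
}
```

With a tree containing etc/passwd and a symlink abs -> /etc/passwd, resolving /abs yields /etc/passwd inside the virtual root, while /tmp fails with ENOENT unless the virtual root itself contains tmp.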