Discussion:
[ast-developers] bug: kill(1) in ksh93 doesn't check for kill(3p) and killpg(3p) errors
Cedric Blancher
2013-06-17 12:59:38 UTC
Permalink
I found a bug in ksh93's kill(1) implementation: It doesn't check for
errors when kill(3p) and killpg(3p) are used.
However posix says for kill(3p) and killpg(3p):
RETURN VALUE
Upon successful completion, 0 shall be returned. Otherwise, -1
shall be returned and errno set to indicate the error.

ERRORS
The kill() function shall fail if:

EINVAL The value of the sig argument is an invalid or
unsupported signal number.

EPERM The process does not have permission to send the signal
to any receiving process.

ESRCH No process or process group can be found corresponding
to that specified by pid.

So the two syscalls can fail, and ksh93 kill(1) should check this.

Ced
--
Cedric Blancher <cedric.blancher at googlemail.com>
Institute Pasteur
Glenn Fowler
2013-06-17 13:50:06 UTC
Permalink
can you point to the specific lines in the code that led you to this conclusion
Post by Cedric Blancher
I found a bug in ksh93's kill(1) implementation: It doesn't check for
errors when kill(3p) and killpg(3p) are used.
RETURN VALUE
Upon successful completion, 0 shall be returned. Otherwise, -1
shall be returned and errno set to indicate the error.
ERRORS
EINVAL The value of the sig argument is an invalid or
unsupported signal number.
EPERM The process does not have permission to send the signal
to any receiving process.
ESRCH No process or process group can be found corresponding
to that specified by pid.
So the two syscalls can fail, and ksh93 kill(1) should check this.
Ced
--
Cedric Blancher <cedric.blancher at googlemail.com>
Institute Pasteur
Cedric Blancher
2013-06-17 14:00:21 UTC
Permalink
Post by Glenn Fowler
Post by Cedric Blancher
I found a bug in ksh93's kill(1) implementation: It doesn't check for
errors when kill(3p) and killpg(3p) are used.
RETURN VALUE
Upon successful completion, 0 shall be returned. Otherwise, -1
shall be returned and errno set to indicate the error.
ERRORS
EINVAL The value of the sig argument is an invalid or
unsupported signal number.
EPERM The process does not have permission to send the signal
to any receiving process.
ESRCH No process or process group can be found corresponding
to that specified by pid.
So the two syscalls can fail, and ksh93 kill(1) should check this.
can you point to the specific lines in the code that led you to this conclusion
function job_kill():
1214 }
1215 else
1216 r = kill(pid,sig);
1217 if(r>=0 && !stopsig)
1218 {
1219 if(pw->p_flag&P_STOPPED)
1220 pw->p_flag &=
~(P_STOPPED|P_SIGNALLED);
1221 if(sig)
1222 kill(pid,SIGCONT);
1223 }
1224 }
1225 else
1226 {
1227 if(qflag)
1228 goto no_sigqueue;
1229 if((r = killpg(-pid,sig))>=0
&& !stopsig)
1230 {
1231
job_unstop(job_bypid(pw->p_pid));
1232 if(sig)

None of them checks for errors from kill(pid,sig) or killpg() to print an error.

I noticed this when David reported that on Linux 3% of the USR1 and
USR2 signals get lost, which indicates a kernel bug. This is the only
place I could find which may be a culprit for 'silently' loosing
signals.

If this isn't the culprit then Linux has a bug.

Ced
--
Cedric Blancher <cedric.blancher at googlemail.com>
Institute Pasteur
Glenn Fowler
2013-06-17 14:37:08 UTC
Permalink
Post by Cedric Blancher
Post by Glenn Fowler
Post by Cedric Blancher
I found a bug in ksh93's kill(1) implementation: It doesn't check for
errors when kill(3p) and killpg(3p) are used.
RETURN VALUE
Upon successful completion, 0 shall be returned. Otherwise, -1
shall be returned and errno set to indicate the error.
ERRORS
EINVAL The value of the sig argument is an invalid or
unsupported signal number.
EPERM The process does not have permission to send the signal
to any receiving process.
ESRCH No process or process group can be found corresponding
to that specified by pid.
So the two syscalls can fail, and ksh93 kill(1) should check this.
can you point to the specific lines in the code that led you to this conclusion
1214 }
1215 else
1216 r = kill(pid,sig);
1217 if(r>=0 && !stopsig)
1218 {
1219 if(pw->p_flag&P_STOPPED)
1220 pw->p_flag &=
~(P_STOPPED|P_SIGNALLED);
1221 if(sig)
1222 kill(pid,SIGCONT);
1223 }
1224 }
1225 else
1226 {
1227 if(qflag)
1228 goto no_sigqueue;
1229 if((r = killpg(-pid,sig))>=0
&& !stopsig)
1230 {
1231
job_unstop(job_bypid(pw->p_pid));
1232 if(sig)
None of them checks for errors from kill(pid,sig) or killpg() to print an error.
"none" in the section of code you mention (job_kill() which is called by
b_kill() via job_walk()) translates to 3 job control specific instances:

kill(pid,SIGCONT);
killpg(-pid,SIGCONT);
kill(pw->p_pid,SIGCONT);

and if you look close they are encased in logic that already has done a kill/killpg
that does check

the other 4 instances are checked:

r = kill(pid,sig);
if((r = killpg(-pid,sig))>=0 && !stopsig)
r = kill(pid,sig);
r = killpg(-pid,sig);
r = killpg(pid,sig);
Post by Cedric Blancher
I noticed this when David reported that on Linux 3% of the USR1 and
USR2 signals get lost, which indicates a kernel bug. This is the only
place I could find which may be a culprit for 'silently' loosing
signals.
If this isn't the culprit then Linux has a bug.
Ced
--
Cedric Blancher <cedric.blancher at googlemail.com>
Institute Pasteur
Cedric Blancher
2013-06-17 14:53:40 UTC
Permalink
Post by Glenn Fowler
Post by Cedric Blancher
Post by Glenn Fowler
Post by Cedric Blancher
I found a bug in ksh93's kill(1) implementation: It doesn't check for
errors when kill(3p) and killpg(3p) are used.
RETURN VALUE
Upon successful completion, 0 shall be returned. Otherwise, -1
shall be returned and errno set to indicate the error.
ERRORS
EINVAL The value of the sig argument is an invalid or
unsupported signal number.
EPERM The process does not have permission to send the signal
to any receiving process.
ESRCH No process or process group can be found corresponding
to that specified by pid.
So the two syscalls can fail, and ksh93 kill(1) should check this.
can you point to the specific lines in the code that led you to this conclusion
1214 }
1215 else
1216 r = kill(pid,sig);
1217 if(r>=0 && !stopsig)
1218 {
1219 if(pw->p_flag&P_STOPPED)
1220 pw->p_flag &=
~(P_STOPPED|P_SIGNALLED);
1221 if(sig)
1222 kill(pid,SIGCONT);
1223 }
1224 }
1225 else
1226 {
1227 if(qflag)
1228 goto no_sigqueue;
1229 if((r = killpg(-pid,sig))>=0
&& !stopsig)
1230 {
1231
job_unstop(job_bypid(pw->p_pid));
1232 if(sig)
None of them checks for errors from kill(pid,sig) or killpg() to print an error.
"none" in the section of code you mention (job_kill() which is called by
kill(pid,SIGCONT);
killpg(-pid,SIGCONT);
kill(pw->p_pid,SIGCONT);
and if you look close they are encased in logic that already has done a kill/killpg
that does check
r = kill(pid,sig);
if((r = killpg(-pid,sig))>=0 && !stopsig)
r = kill(pid,sig);
r = killpg(-pid,sig);
r = killpg(pid,sig);
Oh. My bad. I didn't notice that much further down r is checked and a
message is printed for errors. Sorry for the trouble then

Ced
--
Cedric Blancher <cedric.blancher at googlemail.com>
Institute Pasteur
Loading...