Discussion:
[ast-developers] |sigqueue()| fixes+|EAGAIN| handling etc. ...
Roland Mainz
2013-08-18 23:07:49 UTC
Permalink
Hi!

----

Attached (as "astksh20130814_sigqueuerepeat001.diff.txt") is a patch
which fixes a couple of problems with kill(1)'s |sigqueue()| support.

** Changes:
- kill(1) now has a --repeat option to repeat |sigqueue()| if it
returns with |EAGAIN|. It turns out this can happen *very* often if
the |SIGQUEUE_MAX| limit is reached. Since this happens very very
*OFTEN* (and the detection and re-try on shell level a bit cumbersome)
this option is "on" by default for -q/-Q but can be turned off via
--norepeat. If |sigqueue()| is repeated kill(1) calles |sched_yield()|
(see "Notes" below) to prevent the receiving process from being
"starved" away from the CPU. Note that this can never result in an
endless loop as signals are queued in-order by the sending process as
long as there is space in the queue. Priority inversion and other
issues are avoided by using |sched_yield()|.

- If --norepeat is active kill(1) will properly report |EAGAIN| as
error message. Previously it reported "No such process" in such cases
which was *infuriatingly* misleading (and send a couple of
Solaris+Linux people to a "wild goose chase" to check for kernel
bugs).

- kill(1) now has a -Q option to pass an address to |sigqueue()|

- kill(1) now has a --sendcont to control the sending of SIGCONT after
a signal was send (requested by Cedric... as it turns out the extra
SIGCONT may fill-up the signal queue twice as fast up to
|SIGQUEUE_MAX| limit if -q/-Q aren't used). The other issues were that
a user may wish to send stopped processes signals (to queue them) and
then resume the whole process group in one step. The old behaviour of
unconditionally resuming processes prevented that... --nosendcont now
allows that

- The logic for sending SIGCONT after sending a signal has been moved
to |b_kill()|

- -q/-Q are now only available if |sigqueue()| is supported. It
doesn't make much sense to provide -q/-Q if there is no way to send
the requested value. This has been confusing for at least one script
author who wasted a day of debugging with that mess


** Notes:
- |sched_yield()| is part of the same "POSIX realtime extenions"
package as |sigqueue()| and therefore should always be there when
|sigqueue()| is available


** Testing:
- Cedric, Olga and I have this patch in our trees since a ~two weeks
without problems

Comments/rants/feedback welcome... and please notify me if you change
the patch somehow...

----

Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
-------------- next part --------------
diff -r -u original/src/cmd/ksh93/bltins/trap.c build_clang_dgkfix/src/cmd/ksh93/bltins/trap.c
--- src/cmd/ksh93/bltins/trap.c 2013-08-09 21:59:35.000000000 +0200
+++ src/cmd/ksh93/bltins/trap.c 2013-08-18 23:57:46.696491954 +0200
@@ -35,7 +35,9 @@

#define L_FLAG 1
#define S_FLAG 2
+#define C_FLAG JOB_CFLAG
#define Q_FLAG JOB_QFLAG
+#define R_FLAG JOB_RFLAG

static int sig_number(Shell_t*,const char*);

@@ -194,7 +196,7 @@
int b_kill(int argc,char *argv[],Shbltin_t *context)
{
register char *signame;
- register int sig=SIGTERM, flag=0, n;
+ register int sig=SIGTERM, flag=C_FLAG|R_FLAG, n;
register Shell_t *shp = context->shp;
int usemenu = 0;
NOT_USED(argc);
@@ -218,12 +220,27 @@
case 'l':
flag |= L_FLAG;
break;
+#if _lib_sigqueue
case 'q':
flag |= Q_FLAG;
- shp->sigval = opt_info.num;
+ flag &= ~C_FLAG;
+ memset(&shp->sigval, 0, sizeof(shp->sigval));
+ shp->sigval.sival_int = opt_info.num;
+ break;
+ case 'Q':
+ flag |= Q_FLAG;
+ flag &= ~C_FLAG;
+ memset(&shp->sigval, 0, sizeof(shp->sigval));
+ shp->sigval.sival_ptr = (void *)(((char *)0)+((ptrdiff_t)opt_info.num));
+ break;
+ case 'R':
+ opt_info.num?(flag |= R_FLAG):(flag &= ~R_FLAG);
+ break;
+#endif
+ case 'C':
+ opt_info.num?(flag |= C_FLAG):(flag &= ~C_FLAG);
break;
case '?':
- shp->sigval = 0;
errormsg(SH_DICT,ERROR_usage(2), "%s", opt_info.arg);
break;
}
@@ -233,7 +250,6 @@
argv++;
if(error_info.errors || flag==(L_FLAG|S_FLAG) || (!(*argv) && !(flag&L_FLAG)))
{
- shp->sigval = 0;
errormsg(SH_DICT,ERROR_usage(2),"%s", optusage((char*)0));
}
/* just in case we send a kill -9 $$ */
@@ -251,7 +267,6 @@
if((sig=sig_number(shp,signame))<0)
{
shp->exitval = 2;
- shp->sigval = 0;
errormsg(SH_DICT,ERROR_exit(1),e_nosignal,signame);
}
sfprintf(sfstdout,"%d\n",sig);
@@ -267,9 +282,8 @@
errormsg(SH_DICT,ERROR_exit(1),e_nosignal,signame);
}
}
- if(job_walk(shp,sfstdout,job_kill,sig|(flag&Q_FLAG),argv))
+ if(job_walk(shp,sfstdout,job_kill,sig|(flag&(C_FLAG|Q_FLAG|R_FLAG)),argv))
shp->exitval = 1;
- shp->sigval = 0;
return(shp->exitval);
}

diff -r -u original/src/cmd/ksh93/data/builtins.c build_clang_dgkfix/src/cmd/ksh93/data/builtins.c
--- src/cmd/ksh93/data/builtins.c 2013-06-26 20:29:58.000000000 +0200
+++ src/cmd/ksh93/data/builtins.c 2013-08-19 00:18:07.910431667 +0200
@@ -1010,7 +1010,7 @@
;

const char sh_optkill[] =
-"[-1c?\n@(#)$Id: kill (AT&T Research) 2012-07-05 $\n]"
+"[-1c?\n@(#)$Id: kill (AT&T Research) 2013-08-14 $\n]"
USAGE_LICENSE
"[+NAME?kill - terminate or signal process]"
"[+DESCRIPTION?With the first form in which \b-l\b is not specified, "
@@ -1027,13 +1027,22 @@
"due to a signal. If a name is given the corresponding signal "
"number will be written to standard output. If a number is given "
"the corresponding signal name will be written to standard output.]"
+"[C!:sendcont?Send SIGCONT to resume target process after sending the signal.]"
"[l?List signal names or signal numbers rather than sending signals as "
"described above. "
"The \b-n\b and \b-s\b options cannot be specified.]"
-"[q]#[n?On systems that support \asigqueue\a(2), send a queued signal with "
- "message number \an\a. The specified \ajob\as must be a positive "
- "number. On systems that do not support \asigqueue\a(2), a signal "
- "is sent without the message number \an\a and will not be queued.]"
+#if _lib_sigqueue
+"[q]#[n?Queue a signal using \asigqueue\a(2) with the integer value \an\a."
+ "The specified \ajob\as must be a positive number. Implies "
+ "--nosendcont.]"
+"[Q]#[addr?Queue a signal using \asigqueue\a(2) with the address pointer "
+ "value \aaddr\a. Note that the value is only portable between "
+ "processes which have the same pointer size and endianess, "
+ "otherwise the result may be undefined."
+ "The specified \ajob\as must be a positive number. Implies "
+ "--nosendcont.]"
+"[R!:repeat?Repeat attempt to queue a signal if sigqueue() returns EAGAIN.]"
+#endif
"[L?Same as \b-l\b except that of no argument is specified the signals will "
"be listed in menu format as with select compound command.]"
"[n]#[signum?Specify a signal number to send. Signal numbers are not "
diff -r -u original/src/cmd/ksh93/data/msg.c build_clang_dgkfix/src/cmd/ksh93/data/msg.c
--- src/cmd/ksh93/data/msg.c 2013-05-23 17:25:11.000000000 +0200
+++ src/cmd/ksh93/data/msg.c 2013-08-18 21:29:01.630335807 +0200
@@ -83,6 +83,7 @@
const char e_subscript[] = "%s: subscript out of range";
const char e_toodeep[] = "%s: recursion too deep";
const char e_access[] = "permission denied";
+const char e_again[] = "resource temporarily unavailable";
#ifdef _cmd_universe
const char e_nouniverse[] = "universe not accessible";
#endif /* _cmd_universe */
diff -r -u original/src/cmd/ksh93/features/sigfeatures build_clang_dgkfix/src/cmd/ksh93/features/sigfeatures
--- src/cmd/ksh93/features/sigfeatures 2013-04-06 00:58:27.000000000 +0200
+++ src/cmd/ksh93/features/sigfeatures 2013-08-19 00:15:42.440051841 +0200
@@ -1,11 +1,8 @@
-lib sigblock,sigrelse,sigsetmask,sigprocmask,sigvec,sigqueue,sigaction
+lib sigblock,sigrelse,sigsetmask,sigprocmask,sigvec,sigqueue,sigaction,sched_yield
typ sigset_t ast.h signal.h
mem sigvec.sv_mask signal.h
mem siginfo_t.si_value.sigval_int signal.h
cat{
- #ifndef _lib_sigqueue
- # define sigqueue(sig,action,val) kill(sig,action)
- #endif
#ifndef _mem_sigvec_sv_mask
# undef _lib_sigvec
#endif
diff -r -u original/src/cmd/ksh93/include/defs.h build_clang_dgkfix/src/cmd/ksh93/include/defs.h
--- src/cmd/ksh93/include/defs.h 2013-08-12 16:31:56.000000000 +0200
+++ src/cmd/ksh93/include/defs.h 2013-08-19 00:10:36.730731842 +0200
@@ -143,6 +143,12 @@
Shwait_f waitevent;
};

+#if _lib_sigqueue
+typedef sigval_t sh_sigval_t;
+#else
+typedef int sh_sigval_t;
+#endif
+
#define _SH_PRIVATE \
struct shared *gd; /* global data */ \
struct sh_scoped st; /* scoped information */ \
@@ -230,7 +236,7 @@
int xargexit; \
int nenv; \
int lexsize; \
- int sigval; \
+ sh_sigval_t sigval; \
mode_t mask; \
Env_t *env; \
void *init_context; \
diff -r -u original/src/cmd/ksh93/include/jobs.h build_clang_dgkfix/src/cmd/ksh93/include/jobs.h
--- src/cmd/ksh93/include/jobs.h 2013-07-08 20:43:18.000000000 +0200
+++ src/cmd/ksh93/include/jobs.h 2013-08-18 23:52:31.728297588 +0200
@@ -140,7 +140,9 @@
#define JOB_NFLAG 2
#define JOB_PFLAG 4
#define JOB_NLFLAG 8
-#define JOB_QFLAG 0x100
+#define JOB_QFLAG 0x100 /* use |sigqueue()| */
+#define JOB_RFLAG 0x200 /* retry for |EAGAIN| */
+#define JOB_CFLAG 0x400 /* send SIGCONT */

extern struct jobs job;

@@ -186,6 +188,7 @@
extern const char e_jobsrunning[];
extern const char e_nlspace[];
extern const char e_access[];
+extern const char e_again[];
extern const char e_terminate[];
extern const char e_no_jctl[];
extern const char e_signo[];
diff -r -u original/src/cmd/ksh93/sh/jobs.c build_clang_dgkfix/src/cmd/ksh93/sh/jobs.c
--- src/cmd/ksh93/sh/jobs.c 2013-08-13 16:41:50.000000000 +0200
+++ src/cmd/ksh93/sh/jobs.c 2013-08-18 23:55:56.584886159 +0200
@@ -31,6 +31,10 @@

#include "defs.h"
#include <wait.h>
+#if _lib_sched_yield
+#include <sched.h>
+#endif
+
#include "io.h"
#include "jobs.h"
#include "history.h"
@@ -1169,22 +1173,20 @@
register pid_t pid;
register int r;
const char *msg;
- int qflag = sig&JOB_QFLAG;
+#if _lib_sigqueue
+ bool qflag = (sig&JOB_QFLAG)?true:false;
+ bool rflag = (sig&JOB_RFLAG)?true:false;
+#endif
+ bool cflag = (sig&JOB_CFLAG)?true:false;
#ifdef SIGTSTP
bool stopsig;
#endif
-#if _lib_sigqueue
- union sigval sig_val;
-#else
- int sig_val = 0;
-#endif
+
if(pw==0)
goto error;
shp = pw->p_shp;
-#if _lib_sigqueue
- sig_val.sival_int = shp->sigval;
-#endif
- sig &= ~JOB_QFLAG;
+
+ sig &= ~(JOB_CFLAG|JOB_QFLAG|JOB_RFLAG);
#ifdef SIGTSTP
stopsig = (sig==SIGSTOP||sig==SIGTSTP||sig==SIGTTIN||sig==SIGTTOU);
#else
@@ -1213,54 +1215,78 @@
{
if(pid>=0)
{
+#if _lib_sigqueue
if(qflag)
{
if(pid==0)
goto no_sigqueue;
- r = sigqueue(pid,sig,sig_val);
+ while (r = sigqueue(pid, sig, shp->sigval), rflag && (r < 0) && (errno == EAGAIN))
+ {
+#if _lib_sched_yield
+ /* Give other processes some CPU time */
+ (void)sched_yield();
+#endif
+ errno = 0;
+ }
}
else
+#endif
r = kill(pid,sig);
if(r>=0 && !stopsig)
{
if(pw->p_flag&P_STOPPED)
pw->p_flag &= ~(P_STOPPED|P_SIGNALLED);
- if(sig && !qflag)
- kill(pid,SIGCONT);
+ if(sig && cflag)
+ (void)kill(pid,SIGCONT);
}
}
else
{
+#if _lib_sigqueue
if(qflag)
goto no_sigqueue;
+#endif
if((r = killpg(-pid,sig))>=0 && !stopsig)
{
job_unstop(job_bypid(pw->p_pid));
- if(sig)
- killpg(-pid,SIGCONT);
+ if(sig && cflag)
+ (void)killpg(-pid,SIGCONT);
}
}
}
#else
if(pid>=0)
{
+#if _lib_sigqueue
if(qflag)
- r = sigqueue(pid,sig,sig_val);
+ while (r = sigqueue(pid, sig, shp->sigval), rflag && (r < 0) && (errno == EAGAIN))
+ {
+#if _lib_sched_yield
+ /* Give other processes some CPU time */
+ (void)sched_yield();
+#endif
+ errno = 0;
+ }
else
+#endif
r = kill(pid,sig);
}
else
{
+#if _lib_sigqueue
if(qflag)
goto no_sigqueue;
+#endif
r = killpg(-pid,sig);
}
#endif /* SIGTSTP */
}
else
{
+#if _lib_sigqueue
if(qflag)
goto no_sigqueue;
+#endif
if(pid = pw->p_pgrp)
{
r = killpg(pid,sig);
@@ -1285,22 +1311,34 @@
if(r<0 && job_string)
{
error:
- if(pw && by_number)
- msg = sh_translate(e_no_proc);
- else
- msg = sh_translate(e_no_job);
- if(errno == EPERM)
- msg = sh_translate(e_access);
- sfprintf(sfstderr,"kill: %s: %s\n",job_string, msg);
+ switch(errno)
+ {
+ case EPERM:
+ msg = sh_translate(e_access);
+ break;
+ case EAGAIN:
+ msg = sh_translate(e_again);
+ break;
+ default:
+ if(pw && by_number)
+ msg = sh_translate(e_no_proc);
+ else
+ msg = sh_translate(e_no_job);
+ break;
+ }
+
+ sfprintf(sfstderr, "kill: %s: %s\n", job_string, msg);
r = 2;
}
sh_delay(.001);
job_unlock();
return(r);
+#if _lib_sigqueue
no_sigqueue:
msg = sh_translate("queued signals require positive pids");
sfprintf(sfstderr,"kill: %s: %s\n",job_string, msg);
return(2);
+#endif
}

/*
Irek Szczesniak
2013-08-19 07:25:49 UTC
Permalink
Post by Roland Mainz
Hi!
----
Attached (as "astksh20130814_sigqueuerepeat001.diff.txt") is a patch
which fixes a couple of problems with kill(1)'s |sigqueue()| support.
- kill(1) now has a --repeat option to repeat |sigqueue()| if it
returns with |EAGAIN|. It turns out this can happen *very* often if
the |SIGQUEUE_MAX| limit is reached. Since this happens very very
*OFTEN* (and the detection and re-try on shell level a bit cumbersome)
this option is "on" by default for -q/-Q but can be turned off via
--norepeat. If |sigqueue()| is repeated kill(1) calles |sched_yield()|
(see "Notes" below) to prevent the receiving process from being
"starved" away from the CPU. Note that this can never result in an
endless loop as signals are queued in-order by the sending process as
long as there is space in the queue. Priority inversion and other
issues are avoided by using |sched_yield()|.
I'm sometimes *AMAZED* which things get overlooked. Yes, EAGAIN is an
issue. We've seen that on Solaris and AIX quite often and cursed a lot
because realtime signals were unreliable prior to this patch.
Post by Roland Mainz
- If --norepeat is active kill(1) will properly report |EAGAIN| as
error message. Previously it reported "No such process" in such cases
which was *infuriatingly* misleading (and send a couple of
Solaris+Linux people to a "wild goose chase" to check for kernel
bugs).
Nit: Is there *ever* a reason to use --norepeat?
Post by Roland Mainz
- kill(1) now has a -Q option to pass an address to |sigqueue()|
Good

The patch looks good in general except these nits:
- kill --help misses some spaces
- A NOTES section explaining that the number of realtime signals is
flexible might be good
- I dislike the abuse of Shell_t as dumping ground for everything.
This includes sigval_t. Could you find a better way, e.g. pass it as
parameter?
- IMO Shell_t needs to be cleaned up a lot. Its an unstructured,
unclean mess, IMO only second to the jobs variable in terms of being a
source of possible bugs and confusion.

Irek

Loading...