Discussion:
[ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
Roland Mainz
2013-08-19 09:29:03 UTC
Permalink
Attached (as "astksh20130814_sigqueuerepeat001.diff.txt") is a patch
which fixes a couple of problems with kill(1)'s |sigqueue()| support.
[snip]
Nit: Is there *ever* a reason to use --norepeat?
Yes... if the script wishes to do it's own handling of |EAGAIN| for
some reason...

[snip]
- kill --help misses some spaces
Fixed.
- A NOTES section explaining that the number of realtime signals is
flexible might be good
I'll defer that to a late patch. The primary issue is that $ getconf
RTSIG_MAX # returns the number of signals supported by the kernel...
but glibc and stuff like valgrind may dynamically use one or more
realtime signals for their own usage... reducing the number of
available realtime signals reported by $ kill -l # ... it may require
some thinking before writing such a NOTES section...
- I dislike the abuse of Shell_t as dumping ground for everything.
This includes sigval_t. Could you find a better way, e.g. pass it as
parameter?
I agree... but the issue is that I need to update the coshell support
and I'm not sure about the consequences yet. I have to ask Glenn for
suggestions/help...
- IMO Shell_t needs to be cleaned up a lot. Its an unstructured,
unclean mess, IMO only second to the jobs variable in terms of being a
source of possible bugs and confusion.
Grumpf... I partially agree... I wish |Shell_t| would be more
structured, e.g. that the consumers (lexer/parser) all have their
struct ShLex/ShParser/etc. to store their variables there so it
becomes easier to figure out which subsystem in ksh93 uses which
variable... it would also help instrumentation utilities to keep track
of uninitalised variables etc.

Attached (as "astksh20130814_sigqueuerepeat002.diff.txt") is a patch
which fixes (some of) the issues listed above...

----

Bye,
Roland
--
__ . . __
(o.\ \/ /.o) roland.mainz at nrubsig.org
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
-------------- next part --------------
diff -r -u original/src/cmd/ksh93/bltins/trap.c build_clang_dgkfix/src/cmd/ksh93/bltins/trap.c
--- src/cmd/ksh93/bltins/trap.c 2013-08-09 21:59:35.000000000 +0200
+++ src/cmd/ksh93/bltins/trap.c 2013-08-19 09:41:38.397146210 +0200
@@ -35,7 +35,9 @@

#define L_FLAG 1
#define S_FLAG 2
+#define C_FLAG JOB_CFLAG
#define Q_FLAG JOB_QFLAG
+#define R_FLAG JOB_RFLAG

static int sig_number(Shell_t*,const char*);

@@ -194,7 +196,7 @@
int b_kill(int argc,char *argv[],Shbltin_t *context)
{
register char *signame;
- register int sig=SIGTERM, flag=0, n;
+ register int sig=SIGTERM, flag=C_FLAG|R_FLAG, n;
register Shell_t *shp = context->shp;
int usemenu = 0;
NOT_USED(argc);
@@ -218,12 +220,31 @@
case 'l':
flag |= L_FLAG;
break;
+#if _lib_sigqueue
case 'q':
flag |= Q_FLAG;
- shp->sigval = opt_info.num;
+ flag &= ~C_FLAG;
+ memset(&shp->sigval, 0, sizeof(shp->sigval));
+ shp->sigval.sival_int = opt_info.num;
+ break;
+ case 'Q':
+ flag |= Q_FLAG;
+ flag &= ~C_FLAG;
+ memset(&shp->sigval, 0, sizeof(shp->sigval));
+#if __STDC_VERSION__ >= 199901L
+ shp->sigval.sival_ptr = (void *)(((char *)0)+((ptrdiff_t)opt_info.num));
+#else
+ shp->sigval.sival_ptr = (void *)(((char *)0)+((unsigned int)opt_info.num));
+#endif
+ break;
+ case 'R':
+ opt_info.num?(flag |= R_FLAG):(flag &= ~R_FLAG);
+ break;
+#endif
+ case 'C':
+ opt_info.num?(flag |= C_FLAG):(flag &= ~C_FLAG);
break;
case '?':
- shp->sigval = 0;
errormsg(SH_DICT,ERROR_usage(2), "%s", opt_info.arg);
break;
}
@@ -233,7 +254,6 @@
argv++;
if(error_info.errors || flag==(L_FLAG|S_FLAG) || (!(*argv) && !(flag&L_FLAG)))
{
- shp->sigval = 0;
errormsg(SH_DICT,ERROR_usage(2),"%s", optusage((char*)0));
}
/* just in case we send a kill -9 $$ */
@@ -251,7 +271,6 @@
if((sig=sig_number(shp,signame))<0)
{
shp->exitval = 2;
- shp->sigval = 0;
errormsg(SH_DICT,ERROR_exit(1),e_nosignal,signame);
}
sfprintf(sfstdout,"%d\n",sig);
@@ -267,9 +286,8 @@
errormsg(SH_DICT,ERROR_exit(1),e_nosignal,signame);
}
}
- if(job_walk(shp,sfstdout,job_kill,sig|(flag&Q_FLAG),argv))
+ if(job_walk(shp,sfstdout,job_kill,sig|(flag&(C_FLAG|Q_FLAG|R_FLAG)),argv))
shp->exitval = 1;
- shp->sigval = 0;
return(shp->exitval);
}

diff -r -u original/src/cmd/ksh93/data/builtins.c build_clang_dgkfix/src/cmd/ksh93/data/builtins.c
--- src/cmd/ksh93/data/builtins.c 2013-06-26 20:29:58.000000000 +0200
+++ src/cmd/ksh93/data/builtins.c 2013-08-19 09:38:42.139734115 +0200
@@ -1010,7 +1010,7 @@
;

const char sh_optkill[] =
-"[-1c?\n@(#)$Id: kill (AT&T Research) 2012-07-05 $\n]"
+"[-1c?\n@(#)$Id: kill (AT&T Research) 2013-08-15 $\n]"
USAGE_LICENSE
"[+NAME?kill - terminate or signal process]"
"[+DESCRIPTION?With the first form in which \b-l\b is not specified, "
@@ -1027,13 +1027,22 @@
"due to a signal. If a name is given the corresponding signal "
"number will be written to standard output. If a number is given "
"the corresponding signal name will be written to standard output.]"
+"[C!:sendcont?Send SIGCONT to resume target process after sending the signal.]"
"[l?List signal names or signal numbers rather than sending signals as "
- "described above. "
+ "described above. "
"The \b-n\b and \b-s\b options cannot be specified.]"
-"[q]#[n?On systems that support \asigqueue\a(2), send a queued signal with "
- "message number \an\a. The specified \ajob\as must be a positive "
- "number. On systems that do not support \asigqueue\a(2), a signal "
- "is sent without the message number \an\a and will not be queued.]"
+#if _lib_sigqueue
+"[q]#[n?Queue a signal using \asigqueue\a(2) with the integer value \an\a. "
+ "The specified \ajob\as must be a positive number. Implies "
+ "--nosendcont.]"
+"[Q]#[addr?Queue a signal using \asigqueue\a(2) with the address pointer "
+ "value \aaddr\a. Note that the value is only portable between "
+ "processes which have the same pointer size and endianess, "
+ "otherwise the result may be undefined. "
+ "The specified \ajob\as must be a positive number. Implies "
+ "--nosendcont.]"
+"[R!:repeat?Repeat attempt to queue a signal if sigqueue() returns EAGAIN.]"
+#endif
"[L?Same as \b-l\b except that of no argument is specified the signals will "
"be listed in menu format as with select compound command.]"
"[n]#[signum?Specify a signal number to send. Signal numbers are not "
diff -r -u original/src/cmd/ksh93/data/msg.c build_clang_dgkfix/src/cmd/ksh93/data/msg.c
--- src/cmd/ksh93/data/msg.c 2013-05-23 17:25:11.000000000 +0200
+++ src/cmd/ksh93/data/msg.c 2013-08-18 21:29:01.630335807 +0200
@@ -83,6 +83,7 @@
const char e_subscript[] = "%s: subscript out of range";
const char e_toodeep[] = "%s: recursion too deep";
const char e_access[] = "permission denied";
+const char e_again[] = "resource temporarily unavailable";
#ifdef _cmd_universe
const char e_nouniverse[] = "universe not accessible";
#endif /* _cmd_universe */
diff -r -u original/src/cmd/ksh93/features/sigfeatures build_clang_dgkfix/src/cmd/ksh93/features/sigfeatures
--- src/cmd/ksh93/features/sigfeatures 2013-04-06 00:58:27.000000000 +0200
+++ src/cmd/ksh93/features/sigfeatures 2013-08-19 00:15:42.440051841 +0200
@@ -1,11 +1,8 @@
-lib sigblock,sigrelse,sigsetmask,sigprocmask,sigvec,sigqueue,sigaction
+lib sigblock,sigrelse,sigsetmask,sigprocmask,sigvec,sigqueue,sigaction,sched_yield
typ sigset_t ast.h signal.h
mem sigvec.sv_mask signal.h
mem siginfo_t.si_value.sigval_int signal.h
cat{
- #ifndef _lib_sigqueue
- # define sigqueue(sig,action,val) kill(sig,action)
- #endif
#ifndef _mem_sigvec_sv_mask
# undef _lib_sigvec
#endif
diff -r -u original/src/cmd/ksh93/include/defs.h build_clang_dgkfix/src/cmd/ksh93/include/defs.h
--- src/cmd/ksh93/include/defs.h 2013-08-12 16:31:56.000000000 +0200
+++ src/cmd/ksh93/include/defs.h 2013-08-19 00:10:36.730731842 +0200
@@ -143,6 +143,12 @@
Shwait_f waitevent;
};

+#if _lib_sigqueue
+typedef sigval_t sh_sigval_t;
+#else
+typedef int sh_sigval_t;
+#endif
+
#define _SH_PRIVATE \
struct shared *gd; /* global data */ \
struct sh_scoped st; /* scoped information */ \
@@ -230,7 +236,7 @@
int xargexit; \
int nenv; \
int lexsize; \
- int sigval; \
+ sh_sigval_t sigval; \
mode_t mask; \
Env_t *env; \
void *init_context; \
diff -r -u original/src/cmd/ksh93/include/jobs.h build_clang_dgkfix/src/cmd/ksh93/include/jobs.h
--- src/cmd/ksh93/include/jobs.h 2013-07-08 20:43:18.000000000 +0200
+++ src/cmd/ksh93/include/jobs.h 2013-08-19 09:36:22.633161420 +0200
@@ -140,7 +140,9 @@
#define JOB_NFLAG 2
#define JOB_PFLAG 4
#define JOB_NLFLAG 8
-#define JOB_QFLAG 0x100
+#define JOB_QFLAG 0x100 /* use |sigqueue()| */
+#define JOB_RFLAG 0x200 /* retry for |EAGAIN| */
+#define JOB_CFLAG 0x400 /* send SIGCONT */

extern struct jobs job;

@@ -186,6 +188,7 @@
extern const char e_jobsrunning[];
extern const char e_nlspace[];
extern const char e_access[];
+extern const char e_again[];
extern const char e_terminate[];
extern const char e_no_jctl[];
extern const char e_signo[];
diff -r -u original/src/cmd/ksh93/sh/jobs.c build_clang_dgkfix/src/cmd/ksh93/sh/jobs.c
--- src/cmd/ksh93/sh/jobs.c 2013-08-13 16:41:50.000000000 +0200
+++ src/cmd/ksh93/sh/jobs.c 2013-08-19 10:59:07.248414622 +0200
@@ -31,8 +31,13 @@

#include "defs.h"
#include <wait.h>
+#if _lib_sched_yield
+#include <sched.h>
+#endif
+
#include "io.h"
#include "jobs.h"
+#include "builtins.h"
#include "history.h"

#if !defined(WCONTINUED) || !defined(WIFCONTINUED)
@@ -1165,26 +1170,32 @@

int job_kill(register struct process *pw,register int sig)
{
- Shell_t *shp;
- register pid_t pid;
- register int r;
- const char *msg;
- int qflag = sig&JOB_QFLAG;
+ Shell_t * shp;
+ register pid_t pid;
+ register int r;
+ const char * msg;
+ Shbltin_t * context;
+
+#if _lib_sigqueue
+ bool qflag = (sig&JOB_QFLAG)?true:false;
+ bool rflag = (sig&JOB_RFLAG)?true:false;
+#endif
+ bool cflag = (sig&JOB_CFLAG)?true:false;
#ifdef SIGTSTP
bool stopsig;
#endif
-#if _lib_sigqueue
- union sigval sig_val;
-#else
- int sig_val = 0;
-#endif
+
if(pw==0)
goto error;
shp = pw->p_shp;
-#if _lib_sigqueue
- sig_val.sival_int = shp->sigval;
-#endif
- sig &= ~JOB_QFLAG;
+
+ /*
+ * In the future |job_walk()| should take an extra argument
+ * to pass the 3rd argument to |b_kill()| down to this function
+ */
+ context = (shp->bltinfun == b_kill)?(&shp->bltindata):(NULL);
+
+ sig &= ~(JOB_CFLAG|JOB_QFLAG|JOB_RFLAG);
#ifdef SIGTSTP
stopsig = (sig==SIGSTOP||sig==SIGTSTP||sig==SIGTTIN||sig==SIGTTOU);
#else
@@ -1213,54 +1224,84 @@
{
if(pid>=0)
{
+#if _lib_sigqueue
if(qflag)
{
if(pid==0)
goto no_sigqueue;
- r = sigqueue(pid,sig,sig_val);
+ while (r = sigqueue(pid, sig, shp->sigval),
+ rflag && (r < 0) && (errno == EAGAIN) &&
+ (context?(!context->sigset):true))
+ {
+#if _lib_sched_yield
+ /* Give other processes some CPU time */
+ (void)sched_yield();
+#endif
+ errno = 0;
+ }
}
else
+#endif
r = kill(pid,sig);
if(r>=0 && !stopsig)
{
if(pw->p_flag&P_STOPPED)
pw->p_flag &= ~(P_STOPPED|P_SIGNALLED);
- if(sig && !qflag)
- kill(pid,SIGCONT);
+ if(sig && cflag)
+ (void)kill(pid,SIGCONT);
}
}
else
{
+#if _lib_sigqueue
if(qflag)
goto no_sigqueue;
+#endif
if((r = killpg(-pid,sig))>=0 && !stopsig)
{
job_unstop(job_bypid(pw->p_pid));
- if(sig)
- killpg(-pid,SIGCONT);
+ if(sig && cflag)
+ (void)killpg(-pid,SIGCONT);
}
}
}
#else
if(pid>=0)
{
+#if _lib_sigqueue
if(qflag)
- r = sigqueue(pid,sig,sig_val);
+ {
+ while (r = sigqueue(pid, sig, shp->sigval),
+ rflag && (r < 0) && (errno == EAGAIN) &&
+ (context?(!context->sigset):true))
+ {
+#if _lib_sched_yield
+ /* Give other processes some CPU time */
+ (void)sched_yield();
+#endif
+ errno = 0;
+ }
+ }
else
+#endif
r = kill(pid,sig);
}
else
{
+#if _lib_sigqueue
if(qflag)
goto no_sigqueue;
+#endif
r = killpg(-pid,sig);
}
#endif /* SIGTSTP */
}
else
{
+#if _lib_sigqueue
if(qflag)
goto no_sigqueue;
+#endif
if(pid = pw->p_pgrp)
{
r = killpg(pid,sig);
@@ -1268,10 +1309,6 @@
if(r>=0 && (sig==SIGHUP||sig==SIGTERM || sig==SIGCONT))
job_unstop(pw);
#endif /* SIGTSTP */
-#if 0
- if(r>=0)
- sh_delay(.05);
-#endif
}
while(pw && pw->p_pgrp==0 && (r=kill(pw->p_pid,sig))>=0)
{
@@ -1284,23 +1321,47 @@
}
if(r<0 && job_string)
{
- error:
- if(pw && by_number)
- msg = sh_translate(e_no_proc);
- else
- msg = sh_translate(e_no_job);
- if(errno == EPERM)
- msg = sh_translate(e_access);
- sfprintf(sfstderr,"kill: %s: %s\n",job_string, msg);
+error:
+ /*
+ * |sigqueue()| can (at least) generate |EPERM|,
+ * |EAGAIN|, |EINVAL| and |ESRCH|
+ */
+ switch(errno)
+ {
+ case EPERM:
+ msg = sh_translate(e_access);
+ break;
+ case EAGAIN:
+ msg = sh_translate(e_again);
+ break;
+ case EINVAL:
+ msg = sh_translate(e_nosignal);
+ break;
+ case ESRCH:
+ default:
+ if(pw && by_number)
+ msg = sh_translate(e_no_proc);
+ else
+ msg = sh_translate(e_no_job);
+ break;
+ }
+
+ sfprintf(sfstderr, "kill: %s: %s\n", job_string, msg);
r = 2;
}
+#if _lib_sched_yield
+ (void)sched_yield();
+#else
sh_delay(.001);
+#endif
job_unlock();
return(r);
+#if _lib_sigqueue
no_sigqueue:
msg = sh_translate("queued signals require positive pids");
sfprintf(sfstderr,"kill: %s: %s\n",job_string, msg);
return(2);
+#endif
}

/*
Cedric Blancher
2013-08-20 16:42:43 UTC
Permalink
Post by Roland Mainz
Attached (as "astksh20130814_sigqueuerepeat001.diff.txt") is a patch
which fixes a couple of problems with kill(1)'s |sigqueue()| support.
[snip]
Nit: Is there *ever* a reason to use --norepeat?
Yes... if the script wishes to do it's own handling of |EAGAIN| for
some reason...
[snip]
- kill --help misses some spaces
Fixed.
- A NOTES section explaining that the number of realtime signals is
flexible might be good
I'll defer that to a late patch. The primary issue is that $ getconf
RTSIG_MAX # returns the number of signals supported by the kernel...
but glibc and stuff like valgrind may dynamically use one or more
realtime signals for their own usage... reducing the number of
available realtime signals reported by $ kill -l # ... it may require
some thinking before writing such a NOTES section...
- I dislike the abuse of Shell_t as dumping ground for everything.
This includes sigval_t. Could you find a better way, e.g. pass it as
parameter?
I agree... but the issue is that I need to update the coshell support
and I'm not sure about the consequences yet. I have to ask Glenn for
suggestions/help...
- IMO Shell_t needs to be cleaned up a lot. Its an unstructured,
unclean mess, IMO only second to the jobs variable in terms of being a
source of possible bugs and confusion.
Grumpf... I partially agree... I wish |Shell_t| would be more
structured, e.g. that the consumers (lexer/parser) all have their
struct ShLex/ShParser/etc. to store their variables there so it
becomes easier to figure out which subsystem in ksh93 uses which
variable... it would also help instrumentation utilities to keep track
of uninitalised variables etc.
Attached (as "astksh20130814_sigqueuerepeat002.diff.txt") is a patch
which fixes (some of) the issues listed above...
Roland, thank you. The patch cures the signal problems we found. kill
-q now works even with massive signal storms on 64 core machines,
which didn't do it previously (at least not in a reliable way).

Now, what's the status of the patch? is it going to be part of the next alpha?

Ced
--
Cedric Blancher <cedric.blancher at gmail.com>
Institute Pasteur
David Korn
2013-08-21 14:09:25 UTC
Permalink
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by Roland Mainz
Attached (as "astksh20130814_sigqueuerepeat002.diff.txt") is a patch
which fixes (some of) the issues listed above...
I read over Rolands patch and while it contains useful ideas, I have
some problems with it.

Let me go over them piece by piece.

1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.

2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.

3. Add -C to not send SIGCONT when sending a signal. I don't
see why you would want to send a signal to a stopped process
and not have it react. I believe that C-shell (the originator
of job control) always sent SIGCONT. If there is a need for
this, I could be convinced.

Let me know what you think.

David Korn
dgk at research.att.com
ольга крыжановская
2013-08-21 14:58:40 UTC
Permalink
David, about 2., the code should *NOT* use sleep() or nanosleep() to
wait after EAGAIN or you risk *DRAMATIC* priority inversion problems.
I have experimented with this and EAGAIN can happen 2000, 200, 20, or
just 2 times, or 0 times.
The best solution we found is just to call sched_yield() and let the
process spin. This will not affect system performance as the code
walked in userland is very very small, and sched_yield() will
guarantee that the kernel will only give CPU time if the scheduler
queue cycles. At the same time, if sched_yield() is used (not
nanosleep()!!!!!), the scheduler will give less and less priority to
the spinning process. I tested this on Solaris, Linux, Illumos,
FreeBSD and OSX.

Olga
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by Roland Mainz
Attached (as "astksh20130814_sigqueuerepeat002.diff.txt") is a patch
which fixes (some of) the issues listed above...
I read over Rolands patch and while it contains useful ideas, I have
some problems with it.
Let me go over them piece by piece.
1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
3. Add -C to not send SIGCONT when sending a signal. I don't
see why you would want to send a signal to a stopped process
and not have it react. I believe that C-shell (the originator
of job control) always sent SIGCONT. If there is a need for
this, I could be convinced.
Let me know what you think.
David Korn
dgk at research.att.com
_______________________________________________
ast-developers mailing list
ast-developers at lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-developers
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska at gmail.com \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
Glenn Fowler
2013-08-21 15:12:27 UTC
Permalink
in ast call asoyield() for sched_yield()
Post by ольга крыжановская
David, about 2., the code should *NOT* use sleep() or nanosleep() to
wait after EAGAIN or you risk *DRAMATIC* priority inversion problems.
I have experimented with this and EAGAIN can happen 2000, 200, 20, or
just 2 times, or 0 times.
The best solution we found is just to call sched_yield() and let the
process spin. This will not affect system performance as the code
walked in userland is very very small, and sched_yield() will
guarantee that the kernel will only give CPU time if the scheduler
queue cycles. At the same time, if sched_yield() is used (not
nanosleep()!!!!!), the scheduler will give less and less priority to
the spinning process. I tested this on Solaris, Linux, Illumos,
FreeBSD and OSX.
Olga
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by Roland Mainz
Attached (as "astksh20130814_sigqueuerepeat002.diff.txt") is a patch
which fixes (some of) the issues listed above...
I read over Rolands patch and while it contains useful ideas, I have
some problems with it.
Let me go over them piece by piece.
1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
3. Add -C to not send SIGCONT when sending a signal. I don't
see why you would want to send a signal to a stopped process
and not have it react. I believe that C-shell (the originator
of job control) always sent SIGCONT. If there is a need for
this, I could be convinced.
Let me know what you think.
David Korn
dgk at research.att.com
_______________________________________________
ast-developers mailing list
ast-developers at lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-developers
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska at gmail.com \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
ольга крыжановская
2013-08-21 15:21:29 UTC
Permalink
Does asoyield() call sched_yield()? The reason for sched_yield() is
that the kernel scheduler is aware that the process is spinning, at
the first use of scheld_yield(). For most kernels this sets a flag
internally, so priority adjustments for fast, repeated sched_yield()
usage happens immediately - which avoids that the kernel has to use
statistics for cpu hogging processes instead - which are wrong in this
case. Use of nanosleep() or similar apis MUST be avoided, at all cost,
because the kernel will see this as sign, that the process is doing
important stuff and has to be put on top of the scheduler queue each
time the time out expires. This makes spin locks implemented via
nanosleep() so bad, they just hog the cpu, at all costs, to the
*severe* (unfair!!) disadvantage of all other processes.

Olga
Post by Glenn Fowler
in ast call asoyield() for sched_yield()
Post by ольга крыжановская
David, about 2., the code should *NOT* use sleep() or nanosleep() to
wait after EAGAIN or you risk *DRAMATIC* priority inversion problems.
I have experimented with this and EAGAIN can happen 2000, 200, 20, or
just 2 times, or 0 times.
The best solution we found is just to call sched_yield() and let the
process spin. This will not affect system performance as the code
walked in userland is very very small, and sched_yield() will
guarantee that the kernel will only give CPU time if the scheduler
queue cycles. At the same time, if sched_yield() is used (not
nanosleep()!!!!!), the scheduler will give less and less priority to
the spinning process. I tested this on Solaris, Linux, Illumos,
FreeBSD and OSX.
Olga
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by Roland Mainz
Attached (as "astksh20130814_sigqueuerepeat002.diff.txt") is a patch
which fixes (some of) the issues listed above...
I read over Rolands patch and while it contains useful ideas, I have
some problems with it.
Let me go over them piece by piece.
1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
3. Add -C to not send SIGCONT when sending a signal. I don't
see why you would want to send a signal to a stopped process
and not have it react. I believe that C-shell (the originator
of job control) always sent SIGCONT. If there is a need for
this, I could be convinced.
Let me know what you think.
David Korn
dgk at research.att.com
_______________________________________________
ast-developers mailing list
ast-developers at lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-developers
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska at gmail.com \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska at gmail.com \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
Glenn Fowler
2013-08-21 15:36:21 UTC
Permalink
sched_yield() is iffe'd
on systems that don't have it calls ast::tvsleep() => nanosleep()
Post by ольга крыжановская
Does asoyield() call sched_yield()? The reason for sched_yield() is
that the kernel scheduler is aware that the process is spinning, at
the first use of scheld_yield(). For most kernels this sets a flag
internally, so priority adjustments for fast, repeated sched_yield()
usage happens immediately - which avoids that the kernel has to use
statistics for cpu hogging processes instead - which are wrong in this
case. Use of nanosleep() or similar apis MUST be avoided, at all cost,
because the kernel will see this as sign, that the process is doing
important stuff and has to be put on top of the scheduler queue each
time the time out expires. This makes spin locks implemented via
nanosleep() so bad, they just hog the cpu, at all costs, to the
*severe* (unfair!!) disadvantage of all other processes.
Olga
Post by Glenn Fowler
in ast call asoyield() for sched_yield()
Post by ольга крыжановская
David, about 2., the code should *NOT* use sleep() or nanosleep() to
wait after EAGAIN or you risk *DRAMATIC* priority inversion problems.
I have experimented with this and EAGAIN can happen 2000, 200, 20, or
just 2 times, or 0 times.
The best solution we found is just to call sched_yield() and let the
process spin. This will not affect system performance as the code
walked in userland is very very small, and sched_yield() will
guarantee that the kernel will only give CPU time if the scheduler
queue cycles. At the same time, if sched_yield() is used (not
nanosleep()!!!!!), the scheduler will give less and less priority to
the spinning process. I tested this on Solaris, Linux, Illumos,
FreeBSD and OSX.
Olga
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by Roland Mainz
Attached (as "astksh20130814_sigqueuerepeat002.diff.txt") is a patch
which fixes (some of) the issues listed above...
I read over Rolands patch and while it contains useful ideas, I have
some problems with it.
Let me go over them piece by piece.
1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
3. Add -C to not send SIGCONT when sending a signal. I don't
see why you would want to send a signal to a stopped process
and not have it react. I believe that C-shell (the originator
of job control) always sent SIGCONT. If there is a need for
this, I could be convinced.
Let me know what you think.
David Korn
dgk at research.att.com
_______________________________________________
ast-developers mailing list
ast-developers at lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-developers
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska at gmail.com \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska at gmail.com \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
ольга крыжановская
2013-08-21 16:02:14 UTC
Permalink
I think this will not matter. Systems which have sigqueue() must have
sched_yield(), or they do not conform to POSIX, in any reasonable way.

Olga
Post by Glenn Fowler
sched_yield() is iffe'd
on systems that don't have it calls ast::tvsleep() => nanosleep()
Post by ольга крыжановская
Does asoyield() call sched_yield()? The reason for sched_yield() is
that the kernel scheduler is aware that the process is spinning, at
the first use of scheld_yield(). For most kernels this sets a flag
internally, so priority adjustments for fast, repeated sched_yield()
usage happens immediately - which avoids that the kernel has to use
statistics for cpu hogging processes instead - which are wrong in this
case. Use of nanosleep() or similar apis MUST be avoided, at all cost,
because the kernel will see this as sign, that the process is doing
important stuff and has to be put on top of the scheduler queue each
time the time out expires. This makes spin locks implemented via
nanosleep() so bad, they just hog the cpu, at all costs, to the
*severe* (unfair!!) disadvantage of all other processes.
Olga
Post by Glenn Fowler
in ast call asoyield() for sched_yield()
Post by ольга крыжановская
David, about 2., the code should *NOT* use sleep() or nanosleep() to
wait after EAGAIN or you risk *DRAMATIC* priority inversion problems.
I have experimented with this and EAGAIN can happen 2000, 200, 20, or
just 2 times, or 0 times.
The best solution we found is just to call sched_yield() and let the
process spin. This will not affect system performance as the code
walked in userland is very very small, and sched_yield() will
guarantee that the kernel will only give CPU time if the scheduler
queue cycles. At the same time, if sched_yield() is used (not
nanosleep()!!!!!), the scheduler will give less and less priority to
the spinning process. I tested this on Solaris, Linux, Illumos,
FreeBSD and OSX.
Olga
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by Roland Mainz
Attached (as "astksh20130814_sigqueuerepeat002.diff.txt") is a patch
which fixes (some of) the issues listed above...
I read over Rolands patch and while it contains useful ideas, I have
some problems with it.
Let me go over them piece by piece.
1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
3. Add -C to not send SIGCONT when sending a signal. I don't
see why you would want to send a signal to a stopped process
and not have it react. I believe that C-shell (the originator
of job control) always sent SIGCONT. If there is a need for
this, I could be convinced.
Let me know what you think.
David Korn
dgk at research.att.com
_______________________________________________
ast-developers mailing list
ast-developers at lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-developers
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska at gmail.com \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska at gmail.com \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska at gmail.com \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
Cedric Blancher
2013-08-21 18:33:28 UTC
Permalink
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by Roland Mainz
Attached (as "astksh20130814_sigqueuerepeat002.diff.txt") is a patch
which fixes (some of) the issues listed above...
I read over Rolands patch and while it contains useful ideas, I have
some problems with it.
Let me go over them piece by piece.
1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
3. Add -C to not send SIGCONT when sending a signal. I don't
see why you would want to send a signal to a stopped process
and not have it react. I believe that C-shell (the originator
of job control) always sent SIGCONT. If there is a need for
this, I could be convinced.
We do have the need. The point is to queue signals in any state, even
the stopped one. The scripts we use have many worker jobs (5000+ for
small cases) which work on tasks submitted via SIGRT to it and return
to the stopped state after sending a SIGRT signal to return the result
(or better: A number defining the result directory). Once stopped they
await new instructions, again send via SIGRT. Once all tasks have been
submitted to the process we want to send a SIGCONT to kick off the
process once again to do it's stuff until the task queue has been
drained, at which the process sends itself a SIGSTOP. The parent
process receives a SIGCHLD with CHLD_STOP as notification that the
process awaits new instructions.

Ced
--
Cedric Blancher <cedric.blancher at gmail.com>
Institute Pasteur
Irek Szczesniak
2013-08-21 22:35:23 UTC
Permalink
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
I think I know why Roland added this option:
1. If the target process or thread is stopped it cannot consume
signals. They just queue up. If the queue is full an attempt to
sigqueue() one returns with EAGAIN and you have a variation of the
Livelock [http://en.wikipedia.org/wiki/Deadlock#Livelock]. You can try
to experiment with that by using ulimit -i 5.
2. Spinning with EAGAIN may not always be desirable for a realtime
application. Realtime != fast, it must be able to guarantee to act in
a determinable amount of time. The default case should be to spin with
EAGAIN (after all, ksh93 is a high level language), but give the
programmer the ability to do the EGAIN loop themselves. That's what we
did until Roland invented the -R option.

Irek
Cedric Blancher
2013-08-30 02:04:34 UTC
Permalink
Post by Irek Szczesniak
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
1. If the target process or thread is stopped it cannot consume
signals. They just queue up. If the queue is full an attempt to
sigqueue() one returns with EAGAIN and you have a variation of the
Livelock [http://en.wikipedia.org/wiki/Deadlock#Livelock]. You can try
to experiment with that by using ulimit -i 5.
2. Spinning with EAGAIN may not always be desirable for a realtime
application. Realtime != fast, it must be able to guarantee to act in
a determinable amount of time. The default case should be to spin with
EAGAIN (after all, ksh93 is a high level language), but give the
programmer the ability to do the EGAIN loop themselves. That's what we
did until Roland invented the -R option.
That was the reason. So -C has a use and -R has a rightful use, too

but the discussion is moot as the patch wasn't taken for
ast-ksh.2013-08-29. So again a shell where kill -q doesn't work in a
production environment. Which drives me seriously crazy.
I start to understand the rise of perl (which ksh93 could've easily
crushed): they just take patches in time while ksh93 delays them over
and over and over again

Ced
--
Cedric Blancher <cedric.blancher at gmail.com>
Institute Pasteur
Glenn Fowler
2013-08-30 04:01:52 UTC
Permalink
Post by Cedric Blancher
Post by Irek Szczesniak
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
1. If the target process or thread is stopped it cannot consume
signals. They just queue up. If the queue is full an attempt to
sigqueue() one returns with EAGAIN and you have a variation of the
Livelock [http://en.wikipedia.org/wiki/Deadlock#Livelock]. You can try
to experiment with that by using ulimit -i 5.
2. Spinning with EAGAIN may not always be desirable for a realtime
application. Realtime != fast, it must be able to guarantee to act in
a determinable amount of time. The default case should be to spin with
EAGAIN (after all, ksh93 is a high level language), but give the
programmer the ability to do the EGAIN loop themselves. That's what we
did until Roland invented the -R option.
That was the reason. So -C has a use and -R has a rightful use, too
but the discussion is moot as the patch wasn't taken for
ast-ksh.2013-08-29. So again a shell where kill -q doesn't work in a
production environment. Which drives me seriously crazy.
I start to understand the rise of perl (which ksh93 could've easily
crushed): they just take patches in time while ksh93 delays them over
and over and over again
some patch proposals require thought before making it into a release
discussion is part of that process
Cedric Blancher
2013-08-30 04:07:14 UTC
Permalink
Post by Glenn Fowler
Post by Cedric Blancher
Post by Irek Szczesniak
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
1. If the target process or thread is stopped it cannot consume
signals. They just queue up. If the queue is full an attempt to
sigqueue() one returns with EAGAIN and you have a variation of the
Livelock [http://en.wikipedia.org/wiki/Deadlock#Livelock]. You can try
to experiment with that by using ulimit -i 5.
2. Spinning with EAGAIN may not always be desirable for a realtime
application. Realtime != fast, it must be able to guarantee to act in
a determinable amount of time. The default case should be to spin with
EAGAIN (after all, ksh93 is a high level language), but give the
programmer the ability to do the EGAIN loop themselves. That's what we
did until Roland invented the -R option.
That was the reason. So -C has a use and -R has a rightful use, too
but the discussion is moot as the patch wasn't taken for
ast-ksh.2013-08-29. So again a shell where kill -q doesn't work in a
production environment. Which drives me seriously crazy.
I start to understand the rise of perl (which ksh93 could've easily
crushed): they just take patches in time while ksh93 delays them over
and over and over again
some patch proposals require thought before making it into a release
discussion is part of that process
But it has been discussed and went through many discussions before
Roland send it to the list. There has been a lot of thought behind
that patch, more than I liked.

Ced
--
Cedric Blancher <cedric.blancher at gmail.com>
Institute Pasteur
Glenn Fowler
2013-08-30 05:37:47 UTC
Permalink
Post by Cedric Blancher
Post by Glenn Fowler
Post by Cedric Blancher
Post by Irek Szczesniak
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
1. If the target process or thread is stopped it cannot consume
signals. They just queue up. If the queue is full an attempt to
sigqueue() one returns with EAGAIN and you have a variation of the
Livelock [http://en.wikipedia.org/wiki/Deadlock#Livelock]. You can try
to experiment with that by using ulimit -i 5.
2. Spinning with EAGAIN may not always be desirable for a realtime
application. Realtime != fast, it must be able to guarantee to act in
a determinable amount of time. The default case should be to spin with
EAGAIN (after all, ksh93 is a high level language), but give the
programmer the ability to do the EGAIN loop themselves. That's what we
did until Roland invented the -R option.
That was the reason. So -C has a use and -R has a rightful use, too
but the discussion is moot as the patch wasn't taken for
ast-ksh.2013-08-29. So again a shell where kill -q doesn't work in a
production environment. Which drives me seriously crazy.
I start to understand the rise of perl (which ksh93 could've easily
crushed): they just take patches in time while ksh93 delays them over
and over and over again
some patch proposals require thought before making it into a release
discussion is part of that process
But it has been discussed and went through many discussions before
Roland send it to the list. There has been a lot of thought behind
that patch, more than I liked.
some problems require more thought/discussion than others
part of the off list discussion between dgk and me is how to
abstract C features at the interpreter level
some interpreted languages give up on abstraction and expose way too much of the C apis

look how far mcilroy's elegant "foo | bar" has taken us
there's plenty more gems like that waiting for some thought to bring them to light

I know "(memory) address" in that patch raised a red flag
Cedric Blancher
2013-08-30 07:10:39 UTC
Permalink
Post by Glenn Fowler
Post by Cedric Blancher
Post by Glenn Fowler
Post by Cedric Blancher
Post by Irek Szczesniak
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds. EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
1. If the target process or thread is stopped it cannot consume
signals. They just queue up. If the queue is full an attempt to
sigqueue() one returns with EAGAIN and you have a variation of the
Livelock [http://en.wikipedia.org/wiki/Deadlock#Livelock]. You can try
to experiment with that by using ulimit -i 5.
2. Spinning with EAGAIN may not always be desirable for a realtime
application. Realtime != fast, it must be able to guarantee to act in
a determinable amount of time. The default case should be to spin with
EAGAIN (after all, ksh93 is a high level language), but give the
programmer the ability to do the EGAIN loop themselves. That's what we
did until Roland invented the -R option.
That was the reason. So -C has a use and -R has a rightful use, too
but the discussion is moot as the patch wasn't taken for
ast-ksh.2013-08-29. So again a shell where kill -q doesn't work in a
production environment. Which drives me seriously crazy.
I start to understand the rise of perl (which ksh93 could've easily
crushed): they just take patches in time while ksh93 delays them over
and over and over again
some patch proposals require thought before making it into a release
discussion is part of that process
But it has been discussed and went through many discussions before
Roland send it to the list. There has been a lot of thought behind
that patch, more than I liked.
some problems require more thought/discussion than others
part of the off list discussion between dgk and me is how to
abstract C features at the interpreter level
some interpreted languages give up on abstraction and expose way too much of the C apis
look how far mcilroy's elegant "foo | bar" has taken us
there's plenty more gems like that waiting for some thought to bring them to light
I know "(memory) address" in that patch raised a red flag
Would a patch without kill -Q acceptable?

Ced
--
Cedric Blancher <cedric.blancher at gmail.com>
Institute Pasteur
Cedric Blancher
2013-08-30 02:04:57 UTC
Permalink
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by Roland Mainz
Attached (as "astksh20130814_sigqueuerepeat002.diff.txt") is a patch
which fixes (some of) the issues listed above...
I read over Rolands patch and while it contains useful ideas, I have
some problems with it.
Let me go over them piece by piece.
1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.
I say its necessary. I asked for this and spend quite some time and
effort (11 emails) convincing Roland about having it.
The arguments are:
1. Not all targets which receive signals are ksh93
2. The targets may provide a set of valid addresses to ksh93 scripts
which are used to trigger actions or pass object references disguised
as kill -Q $address -s RTMIN $pid
3. We are talking about "BIG DATA" (all uppercase). >= 1TB of main
memory. kill -q runs out of breath at 2**31, which are 2**31
objects/actions you can pick from. -Q runs out of breath at 2**63,
which is a lot more and unlikely to happen anytime soon

Roland's counterarguments are:
- WTF a new option?
- Can't you fit this into a mapping list which maps objects to 2*31
choices? My answer: No, we can't because its cumbersome and we already
have more than 2**31 objects on production systems which need to be
processed in one step.
- Addresses are not portable. My answer: yes, they are, but
application and scripts are running on the same machine anyway.
Post by David Korn
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds.
What would not improve the situation as there is still the risk that
this blows up in the face of the user. Either we have -R or have it
not, but please don't do a stupid half fix which still keeps the risk
of blowing up in the face of the user at the worst possible time.
Post by David Korn
EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
This isn't EINTR, this is EAGAIN
Post by David Korn
3. Add -C to not send SIGCONT when sending a signal. I don't
see why you would want to send a signal to a stopped process
and not have it react. I believe that C-shell (the originator
of job control) always sent SIGCONT. If there is a need for
this, I could be convinced.
I think this has been answered. Processes should be able to get
signals queued and not change their state from STOPPED to RUNNING.
Post by David Korn
Let me know what you think.
I'll be nice if Roland's patch could be accepted without functional
changes. If there are objections please let me KNOW that I can
respond.

Ced
--
Cedric Blancher <cedric.blancher at gmail.com>
Institute Pasteur
Cedric Blancher
2013-08-30 02:20:57 UTC
Permalink
Post by Cedric Blancher
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by Roland Mainz
Attached (as "astksh20130814_sigqueuerepeat002.diff.txt") is a patch
which fixes (some of) the issues listed above...
I read over Rolands patch and while it contains useful ideas, I have
some problems with it.
Let me go over them piece by piece.
1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.
I say its necessary. I asked for this and spend quite some time and
effort (11 emails) convincing Roland about having it.
1. Not all targets which receive signals are ksh93
2. The targets may provide a set of valid addresses to ksh93 scripts
which are used to trigger actions or pass object references disguised
as kill -Q $address -s RTMIN $pid
3. We are talking about "BIG DATA" (all uppercase). >= 1TB of main
memory. kill -q runs out of breath at 2**31, which are 2**31
objects/actions you can pick from. -Q runs out of breath at 2**63,
which is a lot more and unlikely to happen anytime soon
- WTF a new option?
- Can't you fit this into a mapping list which maps objects to 2*31
choices? My answer: No, we can't because its cumbersome and we already
have more than 2**31 objects on production systems which need to be
processed in one step.
- Addresses are not portable. My answer: yes, they are, but
application and scripts are running on the same machine anyway.
Post by David Korn
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds.
What would not improve the situation as there is still the risk that
this blows up in the face of the user. Either we have -R or have it
not, but please don't do a stupid half fix which still keeps the risk
of blowing up in the face of the user at the worst possible time.
Post by David Korn
EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
This isn't EINTR, this is EAGAIN
Post by David Korn
3. Add -C to not send SIGCONT when sending a signal. I don't
see why you would want to send a signal to a stopped process
and not have it react. I believe that C-shell (the originator
of job control) always sent SIGCONT. If there is a need for
this, I could be convinced.
I think this has been answered. Processes should be able to get
signals queued and not change their state from STOPPED to RUNNING.
Post by David Korn
Let me know what you think.
I'll be nice if Roland's patch could be accepted without functional
changes. If there are objections please let me KNOW that I can
respond.
Roland, could you update your patch for ast-ksh.20130829, with #ifdef
for kill -Q in case David doesn't wish to integrate it, please?

Ced
--
Cedric Blancher <cedric.blancher at gmail.com>
Institute Pasteur
David Korn
2013-08-21 19:44:00 UTC
Permalink
cc: ast-developers at research.att.com
Subject: Re: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by Cedric Blancher
We do have the need. The point is to queue signals in any state, even
the stopped one. The scripts we use have many worker jobs (5000+ for
small cases) which work on tasks submitted via SIGRT to it and return
to the stopped state after sending a SIGRT signal to return the result
(or better: A number defining the result directory). Once stopped they
await new instructions, again send via SIGRT. Once all tasks have been
submitted to the process we want to send a SIGCONT to kick off the
process once again to do it's stuff until the task queue has been
drained, at which the process sends itself a SIGSTOP. The parent
process receives a SIGCHLD with CHLD_STOP as notification that the
process awaits new instructions.
Ced
OK, if there is a need for it, then I can add an option to not
send SIGCONT which will remain the default.

David Korn
dgk at research.att.com
David Korn
2013-09-03 20:07:17 UTC
Permalink
cc: ast-developers at research.att.com
Subject: Re: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by David Korn
Let me go over them piece by piece.
1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.
The -q option will be able to take integers as large as ptrdiff_t as options
so that addresses can be passed, for example -q 0xffff1234.
This will show up as .sh.value=4294906420.
Post by David Korn
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds.
EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
sigqueue will yield and return -2 for EAGAIN. Users can issue retries.
Post by David Korn
3. Add -C to not send SIGCONT when sending a signal. I don't
see why you would want to send a signal to a stopped process
and not have it react. I believe that C-shell (the originator
of job control) always sent SIGCONT. If there is a need for
this, I could be convinced.
kill -q will not send SIGCONT. kill without -q will.

Let me know if this is sufficient or why it is not.

David Korn
dgk at research.att.com
Irek Szczesniak
2013-09-07 12:04:28 UTC
Permalink
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by David Korn
Let me go over them piece by piece.
1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.
The -q option will be able to take integers as large as ptrdiff_t as options
so that addresses can be passed, for example -q 0xffff1234.
This will show up as .sh.value=4294906420.
How should this possibly work?
1. sigval_t is an *union* and not a struct. How do you decide whether
the value goes into the sival_int or the sival_ptr member? You can't
use them as "equal" because they have usually different sizes, i.e.
sizeof(pointer) > sizeof(int) and there is no way to tell whether the
si_int data occupy the low or high 32bit word of a 64bit pointer.
2. The sender and receiver must know whether they with to use si_int
or si_ptr. Using one at the sender side and using the other on the
receiver side is a good way to get random values.
3. You invoke the wrath of big and little endian platforms. One of
them will break with your solution.
4. sival_int is a signed property, si_ptr is an unsigned property. How
would you handle that?
Post by David Korn
Post by David Korn
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds.
EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
sigqueue will yield and return -2 for EAGAIN. Users can issue retries.
How can a builtin return a negative return value?
Post by David Korn
Post by David Korn
3. Add -C to not send SIGCONT when sending a signal. I don't
see why you would want to send a signal to a stopped process
and not have it react. I believe that C-shell (the originator
of job control) always sent SIGCONT. If there is a need for
this, I could be convinced.
kill -q will not send SIGCONT. kill without -q will.
Sounds good.
Post by David Korn
Let me know if this is sufficient or why it is not.
Obviously using kill -q to fill both si_int and si_ptr at the same
time doesn't fly. Not with the POSIX apis I know about at least.

I would still favor strongly the kill -q/kill -Q approach Roland
invented to choose either a) sival_int (kill -q signed_integer_value)
or b) sival_ptr (kill -Q hexadecimal_value), and do the same with
.sh.sig.value.int and .sh.sig.value.ptr.
I know that ksh93 doesn't allow pointers, but I accept Cedric's
explanation that passing an unsigned long hexadecimal value around
that way is a good way to pass an "object handle" around.

Also, keep in mind that not all senders and not all receivers are
shell scripts. Typically realtime applications use SIGRT signals
because its a reliable, low latency and allocation-free way to
communicate between processes.

Irek
Lionel Cons
2013-09-09 08:47:23 UTC
Permalink
Post by Irek Szczesniak
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by David Korn
Let me go over them piece by piece.
1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.
The -q option will be able to take integers as large as ptrdiff_t as options
so that addresses can be passed, for example -q 0xffff1234.
This will show up as .sh.value=4294906420.
How should this possibly work?
1. sigval_t is an *union* and not a struct. How do you decide whether
the value goes into the sival_int or the sival_ptr member? You can't
use them as "equal" because they have usually different sizes, i.e.
sizeof(pointer) > sizeof(int) and there is no way to tell whether the
si_int data occupy the low or high 32bit word of a 64bit pointer.
2. The sender and receiver must know whether they with to use si_int
or si_ptr. Using one at the sender side and using the other on the
receiver side is a good way to get random values.
3. You invoke the wrath of big and little endian platforms. One of
them will break with your solution.
4. sival_int is a signed property, si_ptr is an unsigned property. How
would you handle that?
We talked this through on today's engineering meeting.
We agree with Irek, it is technically not possible to fill both the
sival_int and sival_ptr fields of the sigval_t UNION using one option
taking a single integer argument.

IMHO both kill -q and kill -Q are required to independently fill
either sival_int and sival_ptr, and have the .sh.sig.value.int and
.sh.sig.value.ptr fields as counterparts to obtain the value.

I've been myself doubtful of the usefulness of kill -Q, but my staff
pointed out that in any case the shell needs .sh.sig.value.ptr because
a sender - which might necessarily not be a ksh93 script - can use
sival_ptr instead of sival_int to pass data to the shell and sival_int
cannot be used to substitute sival_ptr, neither can sival_ptr be used
to substitute sival_int.

kill -Q comes handy to pass larger data or object references around,
or (important!) forward received information (signal proxy), so it
might be good to have this option.

Lionel
Cedric Blancher
2013-09-09 16:43:35 UTC
Permalink
Post by David Korn
cc: ast-developers at research.att.com
Subject: Re: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
Post by David Korn
Let me go over them piece by piece.
1. Add -Q to pass addresses.
Currently the shell has no way of utilizing an address so there
is no present need. Currently, the value field models the
C standard and treats value as a union. However, it could treat
the field as an integer choosing the large of void* and int as
the size and pass the value that way. I would add a typedef
for this type. It would be an integral type rather than a union.
A user could do kill -q $((0x4000abc)) or just kill -q 0x4000abc to
send a pointer since optget will convert to an integer.
Programs could format the value as an address or as an int.
I think that this needs more discussion before adding -Q.
Commands already have too many options so I am reluctant to
add an option unless necessary.
The -q option will be able to take integers as large as ptrdiff_t as options
so that addresses can be passed, for example -q 0xffff1234.
This will show up as .sh.value=4294906420.
Post by David Korn
2. Add -R to handle EAGAIN
I don't think that this is needed. EAGAIN should be handled
as it is with fork with an exponential back-off algorithm that
times out after around 30 seconds.
EINTR will cause a retry
unless trapnote has pending trap or signal to process in which
case kill will fail.
sigqueue will yield and return -2 for EAGAIN. Users can issue retries.
Post by David Korn
3. Add -C to not send SIGCONT when sending a signal. I don't
see why you would want to send a signal to a stopped process
and not have it react. I believe that C-shell (the originator
of job control) always sent SIGCONT. If there is a need for
this, I could be convinced.
kill -q will not send SIGCONT. kill without -q will.
Let me know if this is sufficient or why it is not.
The changes sound good. I say "sound" since I can't find any newer
ast-ksh version than ast-ksh.2013-08-29 and therefore can't test it.
What worries me is the change for -q since it doesn't fit into what
the POSIX siginfo api does. Do you use a different api or did you try
to abolish the .sh.sig.value.ptr part? As I said - multiple times - we
need that one to pass object reference between a C++ application to a
shell script and from the shell script to another C++ application. And
we need .sh.sig.value.int, too. Please don't remove it.

Ced
--
Cedric Blancher <cedric.blancher at gmail.com>
Institute Pasteur
David Korn
2013-09-11 21:59:25 UTC
Permalink
cc: ast-developers at research.att.com cedric.blancher at gmail.com roland.mainz at nrubsig.org
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------

I have added -Q in the next alpha and value is a compound variable
of in and ptr. However, I don't like ptr since the shell doesn't
provide a ptr interface. I am thinking of changing the
name to lint or long int.

It will not affect the functionality as far as I know but give
a better description since -Q passes a long int and -q passes int
and give an error for overflow.

Any comments?

David Korn
dgk at research.att.com
ольга крыжановская
2013-09-11 22:21:27 UTC
Permalink
What about naming it obj (object), ref (reference) or objref (object
reference)? long int is misleading, because it is not always a long
int. The best next data type in C, which should always fit, is
ptrdiff_t - but that has the dreaded term ptr in its own name again.

The other counter argument is: Why bother changing the name? The ksh93
api for .sh.sig.value is modeled after sigval_t, and that uses
sival_ptr. Renaming it may cause more confusion than necessary.
Closest competitor perl - which does not have a real concept of
pointer either - just uses si_ptr()
[http://sssg1.whoi.edu/swap2/sources/build/lib/perl/5.8.4/bits/siginfo.ph].
I am not good in python, but AFAIK it does not have native pointers
either, and still defines sigval as this:
sigval._fields_ = [
('sival_int', c_int),
('sival_ptr', c_void_p),
]

Olga
Post by David Korn
cc: ast-developers at research.att.com cedric.blancher at gmail.com roland.mainz at nrubsig.org
Subject: Re: Re: [ast-developers] [patch] kill(1) |sigqueue()| fixes+|EAGAIN| handling etc. ... / was: Re: |sigqueue()| fixes+|EAGAIN| handling etc. ...
--------
I have added -Q in the next alpha and value is a compound variable
of in and ptr. However, I don't like ptr since the shell doesn't
provide a ptr interface. I am thinking of changing the
name to lint or long int.
It will not affect the functionality as far as I know but give
a better description since -Q passes a long int and -q passes int
and give an error for overflow.
Any comments?
David Korn
dgk at research.att.com
_______________________________________________
ast-developers mailing list
ast-developers at lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-developers
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska at gmail.com \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
Loading...