Discussion:
[developer] RT signal priority inversion
Ryan Zezeski via illumos-developer
2014-10-10 02:53:28 UTC
Permalink
For those that don't know, POSIX1.b added a new form of signals called
realtime signals. There are two major differences between RT signals and
regular signals:

1. RT signals can be queued and should be delivered in FIFO order for
the same signal number.

2. RT signals have an ordering (priority), smaller signal numbers have
greater priority and shall be delivered first if multiple different
types of RT signals are queued. [1]

If you who subscribe to illumos-discuss you may have noticed I brought
this up two months ago when running an example program from Steven's
Unix Network Programming Vol 2 [2, 3].

I returned to this problem a few days ago and have done a lot of
digging only to end up a bit miffed. I think there is a legitimate
bug, but as well know signals are batshit crazy and I wanted to check
with the experts before going any further.

The first thing I did was check the man pages for Illumos, FreeBSD,
and Linux and they all agree with the POSIX standard, lower RT number
means higher priority. Then I ran the Steven's program against all
three kernels and _every single one_ exhibited priority inversion. I
then did a lot of dtrace'ing to figure out that the reversal was being
caused by what I can only call "accidental recursion" between the
kernel and libc signal delivery mechanism (more on that soon).

After all that I just couldn't accept the fact that Stevens and I are
the only damn people on Earth to notice this bug. I went googling for
other reports in the field and sure enough I came across a post on
LKML from 2007 describing the exact same issue [4]! The solution
there? To make sure you set the sigaction.sa_mask to block all other
RT signals so that the handler is not interrupted. Sure enough, this
"fix" works. But I wonder if this should be implicit behavior?

Today I wrote up a new test program (rtsignal.c) which is a mix of the
Steven's example and the LKML example program. It takes one argument
of either "mask" or "dont" to determine whether you want to mask
signals while handling or not. If you run with "dont" you will notice
that priority is inverted. If you run with "mask" priority is
correct. Why is this happening? To find out I recommend using my
dtrace script fsig6.d to trace rtsignal while executing it. Both the
source files are attached to this email and can be found at these
URLs:

http://zinascii.com/annex/rtsignal.c
http://zinascii.com/annex/fsig6.d

If you are too lazy to run my code then you can look at my output when
I run with "dont" and "mask":

http://zinascii.com/annex/fsig6.out
http://zinascii.com/annex/fsig6-mask.out

Notice in fsig6.out that fsig (the function responsible for pulling
next signal off the queue) does indeed start in the correct order: 63,
64, then 65. But wait, notice also that the user stack keeps growing
with what looks like a recursive call of the libc signal delivery.
Also notice there is a syscall between each recursion which happens to
be lwp_sigmask. If you look at the code for that syscall you'll
notice it sets t_hold and then calls sigcheck to determine if
t_sig_check should be called.

The problem is that libc is calling a syscall to set a mask but this
syscall also has the side effect of turning on the sigcheck and then
when the syscall exits it notices a signal is pending and repeats the
entire process from the beginning. So basically the kernel and
userland and bouncing back and forth until all unique RT signals have
been seen, added to t_hold, and then at that point the callstack has
effectively LIFO'd the signal priority.

That is, the kernel gets it right but then libc messes it all up.

If you compare this to fsig6-mask.out you'll see that the user stack
doesn't grow and t_sig_check is only set when it should be.

So, at this point I'm now looking for feedback on whether this is a
bug or not. I think it's a bug because POSIX says nothing about
having to set sigaction.sa_mask to obtain the correct priority. Also,
at this point in the code you are in the middle of delivering an RT
signal, a lesser priority RT signal shouldn't be allowed to come in a
usurp it just because of an implementation detail (i.e. the fact that
a syscall was made and there is a stack discipline at play). Finally,
for Stevens and I at least, it violates principal of least surprise.

That said, I'm new to systems programming and I'm not going to lose
sleep if the majority declares "functions as designed". I did this
mostly out of fun and to learn something new. It's my gift to you, do
with this information what you wish. Besides, FreeBSD and Linux
kernels have the same problem.

Finally, if others agree this is a bug, I have an idea for a solution.
Before the handler is invoked call_user_handler() masks off the
current signal being delivered. Add a clause that checks if the
current signal is RT and if it is not only mask off itself but also
every RT signal _larger_ than it. That way it can still be usurped by
lower priority RT signals and non-RT signals (POSIX is clear that
there is no defined order between non-RT and RT signals) but not RT
signals of lesser priority than it.

*wipes brow*

*lets out a sigh*

Okay well that's about it. I'm happy to answer questions or run other
tests.

-Z (rzezeski in IRC)


[1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_02

[2]: http://www.listbox.com/member/archive/182180/2014/08/search/cmVhbHRpbWU/sort/time_rev/page/1/entry/1:2/20140812194405:89641C78-227A-11E4-8B66-E15F1438A39E/

[3]: http://www.listbox.com/member/archive/182180/2014/08/search/cmVhbHRpbWU/sort/time_rev/page/1/entry/0:2/20140813104524:69E08002-22F8-11E4-AA5C-D62A1474A68B/

[4]: https://lkml.org/lkml/2007/7/11/100



-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Robert Mustacchi via illumos-developer
2014-10-11 00:55:17 UTC
Permalink
Post by Ryan Zezeski via illumos-developer
For those that don't know, POSIX1.b added a new form of signals called
realtime signals. There are two major differences between RT signals and
1. RT signals can be queued and should be delivered in FIFO order for
the same signal number.
2. RT signals have an ordering (priority), smaller signal numbers have
greater priority and shall be delivered first if multiple different
types of RT signals are queued. [1]
If you who subscribe to illumos-discuss you may have noticed I brought
this up two months ago when running an example program from Steven's
Unix Network Programming Vol 2 [2, 3].
I returned to this problem a few days ago and have done a lot of
digging only to end up a bit miffed. I think there is a legitimate
bug, but as well know signals are batshit crazy and I wanted to check
with the experts before going any further.
The first thing I did was check the man pages for Illumos, FreeBSD,
and Linux and they all agree with the POSIX standard, lower RT number
means higher priority. Then I ran the Steven's program against all
three kernels and _every single one_ exhibited priority inversion. I
then did a lot of dtrace'ing to figure out that the reversal was being
caused by what I can only call "accidental recursion" between the
kernel and libc signal delivery mechanism (more on that soon).
After all that I just couldn't accept the fact that Stevens and I are
the only damn people on Earth to notice this bug. I went googling for
other reports in the field and sure enough I came across a post on
LKML from 2007 describing the exact same issue [4]! The solution
there? To make sure you set the sigaction.sa_mask to block all other
RT signals so that the handler is not interrupted. Sure enough, this
"fix" works. But I wonder if this should be implicit behavior?
Today I wrote up a new test program (rtsignal.c) which is a mix of the
Steven's example and the LKML example program. It takes one argument
of either "mask" or "dont" to determine whether you want to mask
signals while handling or not. If you run with "dont" you will notice
that priority is inverted. If you run with "mask" priority is
correct. Why is this happening? To find out I recommend using my
dtrace script fsig6.d to trace rtsignal while executing it. Both the
source files are attached to this email and can be found at these
http://zinascii.com/annex/rtsignal.c
http://zinascii.com/annex/fsig6.d
If you are too lazy to run my code then you can look at my output when
http://zinascii.com/annex/fsig6.out
http://zinascii.com/annex/fsig6-mask.out
Notice in fsig6.out that fsig (the function responsible for pulling
next signal off the queue) does indeed start in the correct order: 63,
64, then 65. But wait, notice also that the user stack keeps growing
with what looks like a recursive call of the libc signal delivery.
Also notice there is a syscall between each recursion which happens to
be lwp_sigmask. If you look at the code for that syscall you'll
notice it sets t_hold and then calls sigcheck to determine if
t_sig_check should be called.
The problem is that libc is calling a syscall to set a mask but this
syscall also has the side effect of turning on the sigcheck and then
when the syscall exits it notices a signal is pending and repeats the
entire process from the beginning. So basically the kernel and
userland and bouncing back and forth until all unique RT signals have
been seen, added to t_hold, and then at that point the callstack has
effectively LIFO'd the signal priority.
That is, the kernel gets it right but then libc messes it all up.
If you compare this to fsig6-mask.out you'll see that the user stack
doesn't grow and t_sig_check is only set when it should be.
So, at this point I'm now looking for feedback on whether this is a
bug or not. I think it's a bug because POSIX says nothing about
having to set sigaction.sa_mask to obtain the correct priority. Also,
at this point in the code you are in the middle of delivering an RT
signal, a lesser priority RT signal shouldn't be allowed to come in a
usurp it just because of an implementation detail (i.e. the fact that
a syscall was made and there is a stack discipline at play). Finally,
for Stevens and I at least, it violates principal of least surprise.
That said, I'm new to systems programming and I'm not going to lose
sleep if the majority declares "functions as designed". I did this
mostly out of fun and to learn something new. It's my gift to you, do
with this information what you wish. Besides, FreeBSD and Linux
kernels have the same problem.
Finally, if others agree this is a bug, I have an idea for a solution.
Before the handler is invoked call_user_handler() masks off the
current signal being delivered. Add a clause that checks if the
current signal is RT and if it is not only mask off itself but also
every RT signal _larger_ than it. That way it can still be usurped by
lower priority RT signals and non-RT signals (POSIX is clear that
there is no defined order between non-RT and RT signals) but not RT
signals of lesser priority than it.
*wipes brow*
*lets out a sigh*
Okay well that's about it. I'm happy to answer questions or run other
tests.
This seems like a good analysis in an area that's really annoying and
gritty. This sounds like a bug, and what you've root caused this as
sounds pretty close to the mark. I'd say you should take a shot at
coding up the fix to this that you described and let's see what this
turns out to look like.

Thanks for digging in,
Robert


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Andrew Gabriel via illumos-developer
2014-10-11 08:04:24 UTC
Permalink
Post by Robert Mustacchi via illumos-developer
This seems like a good analysis in an area that's really annoying and
gritty. This sounds like a bug, and what you've root caused this as
sounds pretty close to the mark. I'd say you should take a shot at
coding up the fix to this that you described and let's see what this
turns out to look like. Thanks for digging in, Robert
So this gives 4 possibilities:
1. Pretty much no apps are using this interface
2. All the apps are coded to work with all the wrong implementations
3. All the apps are not working correctly on any platforms
4. POSIX spec is wrong and the current implementations are actually what
was intended.

Do you have any feeling for which of these applies?
It's not clear to me that changing the current implementation will help
anything. Unless you can show that #3 applies, it may do more harm than
good by making Solaris different from everything else, if I understand
the analysis correctly.
--
Andrew


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Bob Friesenhahn via illumos-developer
2014-10-11 21:10:39 UTC
Permalink
Post by Andrew Gabriel via illumos-developer
Post by Robert Mustacchi via illumos-developer
This seems like a good analysis in an area that's really annoying and
gritty. This sounds like a bug, and what you've root caused this as sounds
pretty close to the mark. I'd say you should take a shot at coding up the
fix to this that you described and let's see what this turns out to look
like. Thanks for digging in, Robert
1. Pretty much no apps are using this interface
2. All the apps are coded to work with all the wrong implementations
3. All the apps are not working correctly on any platforms
4. POSIX spec is wrong and the current implementations are actually what was
intended.
Do you have any feeling for which of these applies?
It's not clear to me that changing the current implementation will help
anything. Unless you can show that #3 applies, it may do more harm than good
by making Solaris different from everything else, if I understand the
analysis correctly.
It is clear that real time signals are supposed to be queued and
"delivered" in both FIFO and priority order to three argument
(specified via SA_SIGINFO) signal handlers and
sigwaitinfo()/sigtimedwait(). Bill O. Gallmeister's "Programming For
The Real World" places clear focus on this so it has been a given
since the beginning. There should be no question regarding intention
about the order of when "delivery" occurs.

I am not seeing a definition for what signal "delivery" means for a
signal handler call-back. I see nothing which says if "delivery"
means that the signal handler has completed its task and returned, or
if it only means the signal handler has been started (is able to
execute instructions), or if invocation of additional signal handler
delivery is forbidden if the signal handler for a higher priority
signal is still executing. Likewise, if delivery is via a new thread,
I see nothing which prevents multiple delivery threads from running at
one time.

It is clear to me what "delivery" means when
sigwaitinfo()/sigtimedwait() are used since then delivery is
synchronous with the call. When I require assured handling of signal
delivery, I block asynchronous signal delivery and have a thread wait
for signal delivery with sigwaitinfo().

I have observed simultaneous delivery threads being created under
Linux when threaded delivery is requested. Nothing checks to see if a
previously started delivery thread has completed, and there is not
even an expectation that the delivery thread will terminate any time
soon.

Regardless, it seems that a delivery system which delivers a
lower/same-priority signal while a higher/same-priority signal handler
function is already executing in the receiving thread is not providing
the behavior anticipated by the application programmer. Indeed, it
might not be possible to block delivery of additional signals if the
next signal starts being delivered before the currently-executed
signal handler can adjust signal masking. This does smell like a bug
to me.

Options #2 and #3 likely both apply.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Ryan Zezeski via illumos-developer
2014-10-16 18:06:43 UTC
Permalink
Post by Andrew Gabriel via illumos-developer
This seems like a good analysis in an area that's really annoying and gritty. This sounds like a bug, and what you've root caused this as sounds pretty close to the mark. I'd say you should take a shot at coding up the fix to this that you described and let's see what this turns out to look like. Thanks for digging in, Robert
1. Pretty much no apps are using this interface
I cannot say, but this point is moot as the specification is out there
and its available to be used by any future applications that choose to
do so. I did learn that Linux/glibc uses 2-3 RT signals in its
threading library.

http://man7.org/linux/man-pages/man7/signal.7.html
Post by Andrew Gabriel via illumos-developer
2. All the apps are coded to work with all the wrong implementations
A fair assessment if we only consider the Illumos, FreeBSD, and Linux
kernels, but there are many other kernels out there.
Post by Andrew Gabriel via illumos-developer
3. All the apps are not working correctly on any platforms
My guess is some apps may be broken and the users/authors don't
realize it. But they may rarely hit this bug since it requires
multiple RT signals to be pending.
Post by Andrew Gabriel via illumos-developer
4. POSIX spec is wrong and the current implementations are actually what was intended.
To me the POSIX spec cannot be wrong in the sense that it is what
dictates the user semantics and API. If a user is trying to write
portable code then they should be able to rely on the POSIX spec.
Post by Andrew Gabriel via illumos-developer
Do you have any feeling for which of these applies?
It's not clear to me that changing the current implementation will help anything. Unless you can show that #3 applies, it may do more harm than good by making Solaris different from everything else, if I understand the analysis correctly.
In Stevens' original test Digital Unix 4.0B delivered signals in the
correct order. I had my friend run the test on a Dec Alpha running
IRIX 6.5.29 and c8000 pa-risc workstation running HP-UX64 11.11i:
HP-UX passes both tests (I found what I believe to be a second bug,
but let's table that for now), IRIX is completely busted, delivering
results in strange order.

So there are kernels out there that deliver in the correct order
without requiring the user to manually mask. It all comes down to
this for me: what the hell does POSIX actually say?

* RT signals of the same priority (number) _must_ be delivered in FIFO
order.

* RT signals of lower number are higher in priority and must be
delivered first if multiple RT signals are pending.

* A signal has three stages: generated, pending, and delivered (there
is also "accepted" if you include the sigwait functions).

* Generated is when you create the signal, e.g. sigqueue.

* Pending is when the signal is being held for delivery because it is
currently being blocked by the process/thread it is destined for.

* Delivered is when the process (which is taking delivery of the
signal) takes the appropriate action for that signal (POSIX.1-2008
section 2.4.1).

Okay, so all these are pretty clear except for the delivery part.
"Appropriate action" can be interpreted in many different ways. But
take a step back, what is the point of POSIX in the first place? It's
a specification to define a portable user level API and semantics.
And when I say "user level" I mean application code, not libc. In
that case "taking the appropriate action", when referring to a signal
handler, should mean the handler was at least called. Right?

If you can accept appropriate action as meaning the signal handler was
called (i.e. put on user stack and call made), then this is a bug
because my test shows higher priority signals being preempted by lower
priority ones _before_ the handler is called (right before actually,
the last thing done before invoking the handler is making a
sigprocmask syscall to enforce deferred signals). It would be an
entirely different thing if my handler made a syscall and that caused
it to get interrupted, but that is not what is happening here.
...
Post by Andrew Gabriel via illumos-developer
Given the specified selection of the lowest numeric unblocked pending signal, preemptive priority signal delivery can be achieved using signal numbers and signal masks by ensuring that the sa_mask for each signal number blocks all signals with a higher numeric value.
That's the same proposed fix I gave in my original email. To me this
is clear proof that the OS is supposed to implicitly mask off lower
priority RT signals while delivering.

http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap02.html#tag_22_02_04_03

-Z



-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Ryan Zezeski via illumos-developer
2014-10-16 21:32:05 UTC
Permalink
Post by Ryan Zezeski via illumos-developer
That's the same proposed fix I gave in my original email. To me this
is clear proof that the OS is supposed to implicitly mask off lower
priority RT signals while delivering.
After thinking about it more I still think this is absolutely a bug,
but at this point no one should rely on RT signal ordering on any
system if they want to be portable. Even if the bug was fixed across
all kernels there is no easy way for the program to determine if its
runtime environment has this bug or not.

It was an interesting mental exercise for me to think about this in a
more general sense because I now believe that if an OS (or if
different OSs have different behavior) gets a POSIX spec wrong when
first implementing it you can no longer rely on the spec without
hacking up the compilation/runtime to check the environment.
Basically, because most OSs (that I've tested) got it wrong, and they
differ slightly from each other, there is no reason to ever trust RT
signal ordering. Right?

I still want to fix this bug, but as Andrew pointed out I'm not so
sure it buys anyone anything (at least in terms of portability). It
could still be useful to other OS code or non-portable applications
that want to rely on the order specified in POSIX.

Anyways, I'm going to take a stab at a fix to learn the process.

-Z



-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Ryan Zezeski via illumos-developer
2014-10-22 13:23:36 UTC
Permalink
Post by Ryan Zezeski via illumos-developer
Anyways, I'm going to take a stab at a fix to learn the process.
As promised, here is a patch for consideration. If you would like to see this patch in illumos-gate then please speak up and I will go through the real patch process. The general outline of the patch is:

* Before delivering to user handler check if RT signal and if so mask off all RT signals of lower priority. This means that regular signals can still preempt RT signals (the POSIX spec leaves priority between regular and RT undefined, I gave regular high priority, Linux does the same thing).

* Make a note in the sigaction man page that SA_NODEFER does not apply to RT signals because they guarantee FIFO order which NODEFER breaks.

* Add an os-test to verify proper delivery.

One thing I don’t like about my patch is that it loops over sigaddset to mask off the lower priority RT signals. This could be achieved more efficiently with code like the following (N.B. I did not test this particular snippet, just wrote it):

int word = sigword(sig);
int a = bitmask(sig);
int b = ~a;
b &= ((b | (b - 1)) + 1);
a |= b;
set->__sigbits[word++] |= a;
sigset_t *fillset;
sigfillset(fillset);
for(; word < SIGSETSIZE; word++)
(void) set->__sigbits[word] = fillset->__sigbits[word];

However, that code would need to exist in a file like sigsetopts or copy/paste some of its macros into sigaction.c, which seems like a bad idea. Is there a way for me to add something to sigsetopts that is only exposed to the system? Maybe just add __sigrtmask and don’t document it? This is out of my league so I’m looking for help.

Keep in mind I’m a beginner at C/systems programming. I may be doing dumb things or have bad style. Please let me know where I can improve.

-Z
Garrett D'Amore via illumos-developer
2014-10-22 17:09:39 UTC
Permalink
To add a private symbol to libc, use _ prefix, and don't add an explicit
reference for it to the mapfile. Any symbols not exposed in the mapfile
are treated as local by the linker.

On Wed, Oct 22, 2014 at 6:23 AM, Ryan Zezeski via illumos-developer <
Post by Ryan Zezeski via illumos-developer
Post by Ryan Zezeski via illumos-developer
Anyways, I'm going to take a stab at a fix to learn the process.
As promised, here is a patch for consideration. If you would like to see
this patch in illumos-gate then please speak up and I will go through the
* Before delivering to user handler check if RT signal and if so mask off
all RT signals of lower priority. This means that regular signals can
still preempt RT signals (the POSIX spec leaves priority between regular
and RT undefined, I gave regular high priority, Linux does the same thing).
* Make a note in the sigaction man page that SA_NODEFER does not apply to
RT signals because they guarantee FIFO order which NODEFER breaks.
* Add an os-test to verify proper delivery.
One thing I don’t like about my patch is that it loops over sigaddset to
mask off the lower priority RT signals. This could be achieved more
efficiently with code like the following (N.B. I did not test this
int word = sigword(sig);
int a = bitmask(sig);
int b = ~a;
b &= ((b | (b - 1)) + 1);
a |= b;
set->__sigbits[word++] |= a;
sigset_t *fillset;
sigfillset(fillset);
for(; word < SIGSETSIZE; word++)
(void) set->__sigbits[word] =
fillset->__sigbits[word];
However, that code would need to exist in a file like sigsetopts or
copy/paste some of its macros into sigaction.c, which seems like a bad
idea. Is there a way for me to add something to sigsetopts that is only
exposed to the system? Maybe just add __sigrtmask and don’t document it?
This is out of my league so I’m looking for help.
Keep in mind I’m a beginner at C/systems programming. I may be doing dumb
things or have bad style. Please let me know where I can improve.
-Z
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
https://www.listbox.com/member/archive/rss/182179/21239177-3604570e
https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Garrett D'Amore via illumos-developer
2014-10-22 17:21:41 UTC
Permalink
You can't just update the copyright to Sun Microsystems. First, Sun
Microsystems no longer exists as an entity, and second, they haven't
approved or written this work. You can (and *should*) add your own
copyright message instead.

You should check that your code is cstyle (cstyle -cPp your *.c files)
compliant. Otherwise, it looks good.

On Wed, Oct 22, 2014 at 6:23 AM, Ryan Zezeski via illumos-developer <
Post by Ryan Zezeski via illumos-developer
Post by Ryan Zezeski via illumos-developer
Anyways, I'm going to take a stab at a fix to learn the process.
As promised, here is a patch for consideration. If you would like to see
this patch in illumos-gate then please speak up and I will go through the
* Before delivering to user handler check if RT signal and if so mask off
all RT signals of lower priority. This means that regular signals can
still preempt RT signals (the POSIX spec leaves priority between regular
and RT undefined, I gave regular high priority, Linux does the same thing).
* Make a note in the sigaction man page that SA_NODEFER does not apply to
RT signals because they guarantee FIFO order which NODEFER breaks.
* Add an os-test to verify proper delivery.
One thing I don’t like about my patch is that it loops over sigaddset to
mask off the lower priority RT signals. This could be achieved more
efficiently with code like the following (N.B. I did not test this
int word = sigword(sig);
int a = bitmask(sig);
int b = ~a;
b &= ((b | (b - 1)) + 1);
a |= b;
set->__sigbits[word++] |= a;
sigset_t *fillset;
sigfillset(fillset);
for(; word < SIGSETSIZE; word++)
(void) set->__sigbits[word] =
fillset->__sigbits[word];
However, that code would need to exist in a file like sigsetopts or
copy/paste some of its macros into sigaction.c, which seems like a bad
idea. Is there a way for me to add something to sigsetopts that is only
exposed to the system? Maybe just add __sigrtmask and don’t document it?
This is out of my league so I’m looking for help.
Keep in mind I’m a beginner at C/systems programming. I may be doing dumb
things or have bad style. Please let me know where I can improve.
-Z
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
https://www.listbox.com/member/archive/rss/182179/21239177-3604570e
https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Garrett D'Amore via illumos-developer
2014-10-22 17:22:12 UTC
Permalink
And yes, I read the POSIX 2008 specification carefully, and I agree that
your improvements match the wording in the standard.
Post by Garrett D'Amore via illumos-developer
You can't just update the copyright to Sun Microsystems. First, Sun
Microsystems no longer exists as an entity, and second, they haven't
approved or written this work. You can (and *should*) add your own
copyright message instead.
You should check that your code is cstyle (cstyle -cPp your *.c files)
compliant. Otherwise, it looks good.
On Wed, Oct 22, 2014 at 6:23 AM, Ryan Zezeski via illumos-developer <
Post by Ryan Zezeski via illumos-developer
Post by Ryan Zezeski via illumos-developer
Anyways, I'm going to take a stab at a fix to learn the process.
As promised, here is a patch for consideration. If you would like to see
this patch in illumos-gate then please speak up and I will go through the
* Before delivering to user handler check if RT signal and if so mask off
all RT signals of lower priority. This means that regular signals can
still preempt RT signals (the POSIX spec leaves priority between regular
and RT undefined, I gave regular high priority, Linux does the same thing).
* Make a note in the sigaction man page that SA_NODEFER does not apply to
RT signals because they guarantee FIFO order which NODEFER breaks.
* Add an os-test to verify proper delivery.
One thing I don’t like about my patch is that it loops over sigaddset to
mask off the lower priority RT signals. This could be achieved more
efficiently with code like the following (N.B. I did not test this
int word = sigword(sig);
int a = bitmask(sig);
int b = ~a;
b &= ((b | (b - 1)) + 1);
a |= b;
set->__sigbits[word++] |= a;
sigset_t *fillset;
sigfillset(fillset);
for(; word < SIGSETSIZE; word++)
(void) set->__sigbits[word] =
fillset->__sigbits[word];
However, that code would need to exist in a file like sigsetopts or
copy/paste some of its macros into sigaction.c, which seems like a bad
idea. Is there a way for me to add something to sigsetopts that is only
exposed to the system? Maybe just add __sigrtmask and don’t document it?
This is out of my league so I’m looking for help.
Keep in mind I’m a beginner at C/systems programming. I may be doing
dumb things or have bad style. Please let me know where I can improve.
-Z
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
https://www.listbox.com/member/archive/rss/182179/21239177-3604570e
https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Ryan Zezeski via illumos-developer
2014-10-22 18:37:10 UTC
Permalink
You can't just update the copyright to Sun Microsystems. First, Sun Microsystems no longer exists as an entity, and second, they haven't approved or written this work. You can (and *should*) add your own copyright message instead.
Yes, that was a copy/paste typo that I noticed the moment after I sent the patch out :)
You should check that your code is cstyle (cstyle -cPp your *.c files) compliant. Otherwise, it looks good.
Okay.



-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Ryan Zezeski via illumos-developer
2014-10-22 20:57:43 UTC
Permalink
You can't just update the copyright to Sun Microsystems. First, Sun Microsystems no longer exists as an entity, and second, they haven't approved or written this work. You can (and *should*) add your own copyright message instead.
You should check that your code is cstyle (cstyle -cPp your *.c files) compliant. Otherwise, it looks good.
I fixed the copyright, cstyle is now clean for all .c files, and I updated the other two runfiles for os-test.

-Z
Garrett D'Amore via illumos-developer
2014-10-22 21:27:42 UTC
Permalink
Looks good. I think the next step is RTI, but lets wait 24 hours to give
any other reviewers a chance to comment.
Post by Garrett D'Amore via illumos-developer
Post by Garrett D'Amore via illumos-developer
You can't just update the copyright to Sun Microsystems. First, Sun
Microsystems no longer exists as an entity, and second, they haven't
approved or written this work. You can (and *should*) add your own
copyright message instead.
Post by Garrett D'Amore via illumos-developer
You should check that your code is cstyle (cstyle -cPp your *.c files)
compliant. Otherwise, it looks good.
I fixed the copyright, cstyle is now clean for all .c files, and I updated
the other two runfiles for os-test.
-Z
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Ryan Zezeski via illumos-developer
2014-10-22 21:30:58 UTC
Permalink
Looks good. I think the next step is RTI, but lets wait 24 hours to give any other reviewers a chance to comment.
Sounds good, I actually have a small man page fix related to POSIX queues that I’ve been meaning to make. So I’ll go ahead and learn the RTI process with that small patch and then come back to this.

-Z



-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com

Rafael Vanoni via illumos-developer
2014-10-22 18:45:25 UTC
Permalink
Post by Ryan Zezeski via illumos-developer
Post by Ryan Zezeski via illumos-developer
Anyways, I'm going to take a stab at a fix to learn the process.
* Before delivering to user handler check if RT signal and if so mask off all RT signals of lower priority. This means that regular signals can still preempt RT signals (the POSIX spec leaves priority between regular and RT undefined, I gave regular high priority, Linux does the same thing).
* Make a note in the sigaction man page that SA_NODEFER does not apply to RT signals because they guarantee FIFO order which NODEFER breaks.
* Add an os-test to verify proper delivery.
int word = sigword(sig);
int a = bitmask(sig);
int b = ~a;
b &= ((b | (b - 1)) + 1);
a |= b;
set->__sigbits[word++] |= a;
sigset_t *fillset;
sigfillset(fillset);
for(; word < SIGSETSIZE; word++)
(void) set->__sigbits[word] = fillset->__sigbits[word];
However, that code would need to exist in a file like sigsetopts or copy/paste some of its macros into sigaction.c, which seems like a bad idea. Is there a way for me to add something to sigsetopts that is only exposed to the system? Maybe just add __sigrtmask and don’t document it? This is out of my league so I’m looking for help.
Keep in mind I’m a beginner at C/systems programming. I may be doing dumb things or have bad style. Please let me know where I can improve.
-Z
Modulo the c-style nits that Garret already mentioned, this looks good
to me.

Cheers,
Rafael


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Rafael Vanoni via illumos-developer
2014-10-11 00:53:24 UTC
Permalink
Post by Ryan Zezeski via illumos-developer
For those that don't know, POSIX1.b added a new form of signals called
realtime signals. There are two major differences between RT signals and
1. RT signals can be queued and should be delivered in FIFO order for
the same signal number.
2. RT signals have an ordering (priority), smaller signal numbers have
greater priority and shall be delivered first if multiple different
types of RT signals are queued. [1]
If you who subscribe to illumos-discuss you may have noticed I brought
this up two months ago when running an example program from Steven's
Unix Network Programming Vol 2 [2, 3].
I returned to this problem a few days ago and have done a lot of
digging only to end up a bit miffed. I think there is a legitimate
bug, but as well know signals are batshit crazy and I wanted to check
with the experts before going any further.
The first thing I did was check the man pages for Illumos, FreeBSD,
and Linux and they all agree with the POSIX standard, lower RT number
means higher priority. Then I ran the Steven's program against all
three kernels and _every single one_ exhibited priority inversion. I
then did a lot of dtrace'ing to figure out that the reversal was being
caused by what I can only call "accidental recursion" between the
kernel and libc signal delivery mechanism (more on that soon).
After all that I just couldn't accept the fact that Stevens and I are
the only damn people on Earth to notice this bug. I went googling for
other reports in the field and sure enough I came across a post on
LKML from 2007 describing the exact same issue [4]! The solution
there? To make sure you set the sigaction.sa_mask to block all other
RT signals so that the handler is not interrupted. Sure enough, this
"fix" works. But I wonder if this should be implicit behavior?
Today I wrote up a new test program (rtsignal.c) which is a mix of the
Steven's example and the LKML example program. It takes one argument
of either "mask" or "dont" to determine whether you want to mask
signals while handling or not. If you run with "dont" you will notice
that priority is inverted. If you run with "mask" priority is
correct. Why is this happening? To find out I recommend using my
dtrace script fsig6.d to trace rtsignal while executing it. Both the
source files are attached to this email and can be found at these
http://zinascii.com/annex/rtsignal.c
http://zinascii.com/annex/fsig6.d
If you are too lazy to run my code then you can look at my output when
http://zinascii.com/annex/fsig6.out
http://zinascii.com/annex/fsig6-mask.out
Notice in fsig6.out that fsig (the function responsible for pulling
next signal off the queue) does indeed start in the correct order: 63,
64, then 65. But wait, notice also that the user stack keeps growing
with what looks like a recursive call of the libc signal delivery.
Also notice there is a syscall between each recursion which happens to
be lwp_sigmask. If you look at the code for that syscall you'll
notice it sets t_hold and then calls sigcheck to determine if
t_sig_check should be called.
The problem is that libc is calling a syscall to set a mask but this
syscall also has the side effect of turning on the sigcheck and then
when the syscall exits it notices a signal is pending and repeats the
entire process from the beginning. So basically the kernel and
userland and bouncing back and forth until all unique RT signals have
been seen, added to t_hold, and then at that point the callstack has
effectively LIFO'd the signal priority.
That is, the kernel gets it right but then libc messes it all up.
If you compare this to fsig6-mask.out you'll see that the user stack
doesn't grow and t_sig_check is only set when it should be.
So, at this point I'm now looking for feedback on whether this is a
bug or not. I think it's a bug because POSIX says nothing about
having to set sigaction.sa_mask to obtain the correct priority. Also,
at this point in the code you are in the middle of delivering an RT
signal, a lesser priority RT signal shouldn't be allowed to come in a
usurp it just because of an implementation detail (i.e. the fact that
a syscall was made and there is a stack discipline at play). Finally,
for Stevens and I at least, it violates principal of least surprise.
That said, I'm new to systems programming and I'm not going to lose
sleep if the majority declares "functions as designed". I did this
mostly out of fun and to learn something new. It's my gift to you, do
with this information what you wish. Besides, FreeBSD and Linux
kernels have the same problem.
Finally, if others agree this is a bug, I have an idea for a solution.
Before the handler is invoked call_user_handler() masks off the
current signal being delivered. Add a clause that checks if the
current signal is RT and if it is not only mask off itself but also
every RT signal _larger_ than it. That way it can still be usurped by
lower priority RT signals and non-RT signals (POSIX is clear that
there is no defined order between non-RT and RT signals) but not RT
signals of lesser priority than it.
*wipes brow*
*lets out a sigh*
Okay well that's about it. I'm happy to answer questions or run other
tests.
-Z (rzezeski in IRC)
[1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_02
[2]: http://www.listbox.com/member/archive/182180/2014/08/search/cmVhbHRpbWU/sort/time_rev/page/1/entry/1:2/20140812194405:89641C78-227A-11E4-8B66-E15F1438A39E/
[3]: http://www.listbox.com/member/archive/182180/2014/08/search/cmVhbHRpbWU/sort/time_rev/page/1/entry/0:2/20140813104524:69E08002-22F8-11E4-AA5C-D62A1474A68B/
[4]: https://lkml.org/lkml/2007/7/11/100
Hi Ryan,

Sorry I missed your original email, this is a pretty interesting piece
of debugging. I can't spare the time to read the Posix spec right now,
so I'll trust your reading of it. In which case I'd consider this a bug
as you described.

The fact that all other OS'es have the same implementation does raise an
eyebrow on whether there's a dragon or two hiding here somewhere.

I'm not an active contributor to illumos but my impression is that folks
are striving for stability and only taking features when well justified.
If that's accurate, then it might be the case of taking this as a
documentation bug rather than a non trivial change just for Posix
compliance.

Cheers,
Rafael


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Loading...