Ryan Zezeski via illumos-developer
2014-10-10 02:53:28 UTC
For those that don't know, POSIX.1b added a new form of signals called
realtime signals. There are two major differences between RT signals and
regular signals:
1. RT signals can be queued and should be delivered in FIFO order for
the same signal number.
2. RT signals have an ordering (priority): smaller signal numbers have
greater priority and shall be delivered first if multiple different
types of RT signals are queued. [1]
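To make the two rules concrete, here is a minimal sketch that sidesteps handlers entirely: it blocks three RT signals, queues two instances of each in reverse numeric order, and drains them with sigtimedwait(). The function name queue_and_drain and the choice of SIGRTMIN through SIGRTMIN+2 are mine, purely for illustration. POSIX says a synchronous wait should pick the lowest-numbered pending RT signal, so the expectation is lowest number first and FIFO within each number.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue RT signals to ourselves while they are
 * blocked, then pull them back synchronously. signo[] and value[]
 * record what came back and in which order. Returns the count.
 */
int queue_and_drain(int signo[], int value[], int max)
{
    sigset_t rt;
    struct timespec nowait = { 0, 0 };
    siginfo_t info;
    int n = 0;

    assert(SIGRTMIN + 2 <= SIGRTMAX);

    sigemptyset(&rt);
    for (int i = 0; i < 3; i++)
        sigaddset(&rt, SIGRTMIN + i);

    /* Block them so they stay pending instead of being delivered. */
    sigprocmask(SIG_BLOCK, &rt, NULL);

    /* Highest number first, two instances each, tagged 0 then 1. */
    for (int i = 2; i >= 0; i--) {
        for (int seq = 0; seq < 2; seq++) {
            union sigval v;
            v.sival_int = seq;
            sigqueue(getpid(), SIGRTMIN + i, v);
        }
    }

    /* Drain with a zero timeout; stops once nothing is pending. */
    while (n < max && sigtimedwait(&rt, &info, &nowait) > 0) {
        signo[n] = info.si_signo;
        value[n] = info.si_value.sival_int;
        n++;
    }
    return n;
}
```

Calling it with two arrays of eight ints should give back six entries. Note this path never runs a handler, so it does not exercise the inversion discussed in the rest of this mail.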
If you subscribe to illumos-discuss you may have noticed I brought
this up two months ago when running an example program from Stevens'
Unix Network Programming Vol 2 [2, 3].
I returned to this problem a few days ago and have done a lot of
digging only to end up a bit miffed. I think there is a legitimate
bug, but as we all know signals are batshit crazy and I wanted to check
with the experts before going any further.
The first thing I did was check the man pages for illumos, FreeBSD,
and Linux and they all agree with the POSIX standard: lower RT number
means higher priority. Then I ran the Stevens program against all
three kernels and _every single one_ exhibited priority inversion. I
then did a lot of dtrace'ing to figure out that the reversal was being
caused by what I can only call "accidental recursion" between the
kernel and libc signal delivery mechanism (more on that soon).
After all that I just couldn't accept the fact that Stevens and I are
the only damn people on Earth to notice this bug. I went googling for
other reports in the field and sure enough I came across a post on
LKML from 2007 describing the exact same issue [4]! The solution
there? To make sure you set the sigaction.sa_mask to block all other
RT signals so that the handler is not interrupted. Sure enough, this
"fix" works. But I wonder if this should be implicit behavior?
Today I wrote up a new test program (rtsignal.c) which is a mix of the
Stevens example and the LKML example program. It takes one argument
of either "mask" or "dont" to determine whether you want to mask
signals while handling or not. If you run with "dont" you will notice
that priority is inverted. If you run with "mask" priority is
correct. Why is this happening? To find out I recommend using my
dtrace script fsig6.d to trace rtsignal while executing it. Both the
source files are attached to this email and can be found at these
URLs:
http://zinascii.com/annex/rtsignal.c
http://zinascii.com/annex/fsig6.d
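For anyone who hasn't opened rtsignal.c, the difference between the two modes comes down to what goes into sa_mask before sigaction() is called. The following is a rough sketch of that idea only, not code lifted from rtsignal.c; install_rt_handler, note_signal and last_signo are names I made up for it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Last signal number seen by the handler (illustrative only). */
volatile sig_atomic_t last_signo;

/* Hypothetical handler: just remembers which signal arrived. */
void note_signal(int signo, siginfo_t *info, void *ctx)
{
    (void)info;
    (void)ctx;
    last_signo = signo;
}

/*
 * Hypothetical helper: install an SA_SIGINFO handler for one RT signal.
 * With mask_rt != 0 every RT signal is added to sa_mask, so no other RT
 * signal can interrupt the handler (the "mask" behaviour). With
 * mask_rt == 0 sa_mask stays empty (the "dont" behaviour).
 */
int install_rt_handler(int signo,
                       void (*handler)(int, siginfo_t *, void *),
                       int mask_rt)
{
    struct sigaction sa;

    assert(signo >= SIGRTMIN && signo <= SIGRTMAX);

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (mask_rt) {
        for (int s = SIGRTMIN; s <= SIGRTMAX; s++)
            sigaddset(&sa.sa_mask, s);
    }
    return sigaction(signo, &sa, NULL);
}
```

A caller would do something like install_rt_handler(SIGRTMIN, note_signal, 1) once per RT signal it cares about.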
If you are too lazy to run my code then you can look at my output when
I run with "dont" and "mask":
http://zinascii.com/annex/fsig6.out
http://zinascii.com/annex/fsig6-mask.out
Notice in fsig6.out that fsig (the function responsible for pulling
the next signal off the queue) does indeed start in the correct order: 63,
64, then 65. But wait, notice also that the user stack keeps growing
with what looks like a recursive call of the libc signal delivery.
Also notice there is a syscall between each recursion which happens to
be lwp_sigmask. If you look at the code for that syscall you'll
notice it sets t_hold and then calls sigcheck to determine if
t_sig_check should be set.
The problem is that libc is calling a syscall to set a mask but this
syscall also has the side effect of turning on the sigcheck and then
when the syscall exits it notices a signal is pending and repeats the
entire process from the beginning. So basically the kernel and
userland are bouncing back and forth until all unique RT signals have
been seen, added to t_hold, and then at that point the callstack has
effectively LIFO'd the signal priority.
That is, the kernel gets it right but then libc messes it all up.
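The bounce described above can be modelled in a few lines with no real signals involved. This is a toy simulation under my own assumptions, not illumos code: bit i stands for signal 63+i (the numbers from the trace), "hold" plays the role of t_hold, and the recursive call stands in for the kernel noticing another pending signal when the mask-setting syscall returns. The names sim_deliver and simulate are invented for the sketch.

```c
#include <assert.h>

enum { SIM_BASE = 63, SIM_COUNT = 3 };
#define SIM_ALL ((1u << SIM_COUNT) - 1)

/*
 * Take the lowest-numbered pending signal that is not held, set the
 * mask for its handler, then re-check for pending signals BEFORE the
 * handler body runs. That re-check is the nested delivery.
 */
static void sim_deliver(unsigned *pending, unsigned *hold, int mask_all,
                        int order[], int *n)
{
    for (;;) {
        unsigned ready = *pending & ~*hold;
        if (ready == 0)
            return;

        int i = 0;
        while (!(ready & (1u << i)))
            i++;                       /* lowest number wins */
        *pending &= ~(1u << i);

        unsigned saved = *hold;
        *hold |= mask_all ? SIM_ALL : (1u << i);

        /* "syscall exit": anything still deliverable nests on top */
        sim_deliver(pending, hold, mask_all, order, n);

        order[(*n)++] = SIM_BASE + i;  /* handler body finally runs */
        *hold = saved;                 /* handler returns, mask restored */
    }
}

/* Runs the model with 63, 64 and 65 all pending; fills order[]. */
int simulate(int mask_all, int order[SIM_COUNT])
{
    unsigned pending = SIM_ALL, hold = 0;
    int n = 0;

    sim_deliver(&pending, &hold, mask_all, order, &n);
    assert(n == SIM_COUNT);
    return n;
}
```

In this model simulate(0, order) leaves order as 65, 64, 63 even though signals are always picked lowest first, while simulate(1, order) gives 63, 64, 65. The picking order is right in both cases; only the stack unwinding differs.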
If you compare this to fsig6-mask.out you'll see that the user stack
doesn't grow and t_sig_check is only set when it should be.
So, at this point I'm now looking for feedback on whether this is a
bug or not. I think it's a bug because POSIX says nothing about
having to set sigaction.sa_mask to obtain the correct priority. Also,
at this point in the code you are in the middle of delivering an RT
signal; a lesser priority RT signal shouldn't be allowed to come in and
usurp it just because of an implementation detail (i.e. the fact that
a syscall was made and there is a stack discipline at play). Finally,
for Stevens and me at least, it violates the principle of least surprise.
That said, I'm new to systems programming and I'm not going to lose
sleep if the majority declares "functions as designed". I did this
mostly out of fun and to learn something new. It's my gift to you, do
with this information what you wish. Besides, FreeBSD and Linux
kernels have the same problem.
Finally, if others agree this is a bug, I have an idea for a solution.
Before the handler is invoked call_user_handler() masks off the
current signal being delivered. Add a clause that checks if the
current signal is RT and, if it is, masks off not only itself but also
every RT signal _larger_ than it. That way it can still be usurped by
higher priority (lower-numbered) RT signals and non-RT signals (POSIX
is clear that there is no defined order between non-RT and RT signals)
but not by RT signals of lesser priority than it.
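As a userland illustration of the mask that clause would compute (this is not a patch to call_user_handler(), and rt_priority_mask is a name I invented), the logic might look roughly like this:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical helper: build the set of signals to hold while the
 * handler for `sig` runs. The signal itself is always held. If it is
 * an RT signal, every higher-numbered (lower priority) RT signal is
 * held too. Lower-numbered RT signals and non-RT signals are left out,
 * so they can still interrupt the handler.
 */
void rt_priority_mask(int sig, sigset_t *set)
{
    assert(set != NULL);

    sigemptyset(set);
    sigaddset(set, sig);

    if (sig >= SIGRTMIN && sig <= SIGRTMAX) {
        for (int s = sig + 1; s <= SIGRTMAX; s++)
            sigaddset(set, s);
    }
}
```

For SIGRTMIN+1 the resulting set holds SIGRTMIN+1 through SIGRTMAX but not SIGRTMIN; for a non-RT signal such as SIGINT it holds only SIGINT. A real change would also have to combine this with whatever the application put in sa_mask.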
*wipes brow*
*lets out a sigh*
Okay well that's about it. I'm happy to answer questions or run other
tests.
-Z (rzezeski in IRC)
[1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_02
[2]: http://www.listbox.com/member/archive/182180/2014/08/search/cmVhbHRpbWU/sort/time_rev/page/1/entry/1:2/20140812194405:89641C78-227A-11E4-8B66-E15F1438A39E/
[3]: http://www.listbox.com/member/archive/182180/2014/08/search/cmVhbHRpbWU/sort/time_rev/page/1/entry/0:2/20140813104524:69E08002-22F8-11E4-AA5C-D62A1474A68B/
[4]: https://lkml.org/lkml/2007/7/11/100
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com