Discussion:
[developer] Illumos 5224 and compiler reordering function calls
Dan McDonald via illumos-developer
2014-10-22 19:19:27 UTC
Permalink
This is a long note. Please be patient, as it's fascinating. TLDR, I'm curious if the changes at http://kebe.com/~danmcd/webrevs/5224/ are sufficient? You should read on, however.

Community member Richard PALO reported a bug to us, and I told him to open an illumos bug for it. He did, and it's the aforementioned 5224. He put a lot of detail in there. I'm going to attempt to simplify it a bit to get to what I discovered, and sorta fixed with the help of Rich Lowe.

Richard P was trying to get rounding of his floating point numbers when printing. They were failing. One of his many test programs in 5224 spews out lots of failures over the course of many tests. Here's one example failure:

failure:f precision 17 for 01/03 => +0.33333333333333332 (+0.333333333333333314830)

See how the rounded version's last digit goes to 2, but the longer version should really force it to 1? Yeah, that's a problem.

Richard P did truss on libc, and discovered that newer versions of libc (anything after oi_151a9) would fail. He also discovered that there was a problem with libc's __mul_set(), which printf's floating-point-to-printable conversions end up using. A disassembly of old __mul_set() and newer __mul_set() shows disturbing things.

Here's the C source for 32-bit:

/*
* Multiplies two normal or subnormal doubles, returns result and exceptions.
*/
double
__mul_set(double x, double y, int *pe) {
extern void _putsw(), _getsw();
int sw;
double z;

_putsw(0); /* XXX KEBE SAYS WATCH */
z = x * y; /* THIS ORDERING */
_getsw(&sw); /* CAREFULLY! */
if ((sw & 0x3f) == 0) {
*pe = 0;
} else {
/* Result may not be exact. */
*pe = 1;
}
return (z);
}

Note the ordering of the _putsw(), the multiply, and the _getsw(). For this function to work, they MUST execute in that order. And they do on OI 151a9:

__mul_set+0x16: 6a 00 pushl $0x0
__mul_set+0x18: e8 cb 06 00 00 call +0x6cb <_putsw> /* XXX KEBE SAYS CALL... */
__mul_set+0x1d: 83 c4 10 addl $0x10,%esp
__mul_set+0x20: dd 45 08 fldl 0x8(%ebp)
__mul_set+0x23: dc 4d 10 fmull 0x10(%ebp) /* XXX KEBE SAYS MULTIPLY... */
__mul_set+0x26: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x29: 9b fwait
__mul_set+0x2a: 83 ec 0c subl $0xc,%esp
__mul_set+0x2d: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x30: 50 pushl %eax
__mul_set+0x31: e8 a2 06 00 00 call +0x6a2 <_getsw> /* AND CALL AGAIN */

But on ALMOST ANYTHING more recent (OmniOS of any supported release, SmartOS, OI Hipster), we get this:

__mul_set+0x25: e8 3f 05 00 00 call +0x53f <_putsw> /* CALL... */
__mul_set+0x2a: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x2d: 89 04 24 movl %eax,(%esp)
__mul_set+0x30: e8 24 05 00 00 call +0x524 <_getsw> /* ... CALL AGAIN?!?! */
__mul_set+0x35: 8b 45 f4 movl -0xc(%ebp),%eax
__mul_set+0x38: 83 e0 3f andl $0x3f,%eax
__mul_set+0x3b: 0f 95 c2 setne %dl
__mul_set+0x3e: 0f b6 d2 movzbl %dl,%edx
__mul_set+0x41: 8b 45 18 movl 0x18(%ebp),%eax
__mul_set+0x44: 89 10 movl %edx,(%eax)
__mul_set+0x46: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x49: dc 4d d8 fmull -0x28(%ebp) /* THEN MULTIPLY?!?!? */

The source hasn't changed since the OpenSolaris days. Since the compiler doesn't know z=x*y affects the results of _getsw(), it reserves the right to reorder, and apparently our gcc4.4 that builds illumos-gate exercises that right. One person who did not see this reports using a different compiler to build, and that clearly can make a difference.

Rich Lowe suggested that I declare 'z' above to be a VOLATILE, in order to nudge the compiler in the right direction. After that, I now get:

__mul_set+0x21: e8 19 05 00 00 call +0x519 <_putsw> /* CALL */
__mul_set+0x26: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x29: dc 4d d8 fmull -0x28(%ebp) /* MULTIPLY */
__mul_set+0x2c: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x2f: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x32: 89 04 24 movl %eax,(%esp)
__mul_set+0x35: e8 f5 04 00 00 call +0x4f5 <_getsw> /* CALL AGAIN */

Where the multiply actually happens in the middle.

The webrev for this change is here:

http://kebe.com/~danmcd/webrevs/5224/

It involves fixing both instances of __mul_set and __div_set. I couldn't find any other set-register calls like this, but there may be more.

Is there a better solution to this? The C code mentions that these functions should be inline assembly. I'm willing to consider that, especially in the light of compilers may change behavior out from underneath us. :(

Any suggestions are welcome!

Thanks,
Dan



-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Josef 'Jeff' Sipek via illumos-developer
2014-10-22 19:38:00 UTC
Permalink
On Wed, Oct 22, 2014 at 03:19:27PM -0400, Dan McDonald via illumos-developer wrote:
...

Fascinating bug!

...
Post by Dan McDonald via illumos-developer
Rich Lowe suggested that I declare 'z' above to be a VOLATILE, in order to
__mul_set+0x21: e8 19 05 00 00 call +0x519 <_putsw> /* CALL */
__mul_set+0x26: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x29: dc 4d d8 fmull -0x28(%ebp) /* MULTIPLY */
__mul_set+0x2c: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x2f: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x32: 89 04 24 movl %eax,(%esp)
__mul_set+0x35: e8 f5 04 00 00 call +0x4f5 <_getsw> /* CALL AGAIN */
Where the multiply actually happens in the middle.
I don't think this is the right solution. What you really want is something
to ensure ordering of statements - a barrier of sorts.

...
Post by Dan McDonald via illumos-developer
Is there a better solution to this? The C code mentions that these
functions should be inline assembly. I'm willing to consider that,
especially in the light of compilers may change behavior out from
underneath us. :(
I don't know if we have a compiler-optimization-barrier macro of some sort
(we probably should). I'm not really thrilled with switching these
functions to assembly - C is infinitely easier to read.

Jeff.
--
You measure democracy by the freedom it gives its dissidents, not the
freedom it gives its assimilated conformists.
- Abbie Hoffman


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Marcel Telka via illumos-developer
2014-10-22 19:56:03 UTC
Permalink
Post by Josef 'Jeff' Sipek via illumos-developer
...
Fascinating bug!
...
Post by Dan McDonald via illumos-developer
Rich Lowe suggested that I declare 'z' above to be a VOLATILE, in order to
__mul_set+0x21: e8 19 05 00 00 call +0x519 <_putsw> /* CALL */
__mul_set+0x26: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x29: dc 4d d8 fmull -0x28(%ebp) /* MULTIPLY */
__mul_set+0x2c: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x2f: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x32: 89 04 24 movl %eax,(%esp)
__mul_set+0x35: e8 f5 04 00 00 call +0x4f5 <_getsw> /* CALL AGAIN */
Where the multiply actually happens in the middle.
I don't think this is the right solution. What you really want is something
to ensure ordering of statements - a barrier of sorts.
I think the volatile keyword together with the function call (the sequence
point) will give us the required "barrier".
--
+-------------------------------------------+
| Marcel Telka e-mail: ***@telka.sk |
| homepage: http://telka.sk/ |
| jabber: ***@jabber.sk |
+-------------------------------------------+


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Josef 'Jeff' Sipek via illumos-developer
2014-10-23 12:00:00 UTC
Permalink
Post by Marcel Telka via illumos-developer
Post by Josef 'Jeff' Sipek via illumos-developer
...
Fascinating bug!
...
Post by Dan McDonald via illumos-developer
Rich Lowe suggested that I declare 'z' above to be a VOLATILE, in order to
__mul_set+0x21: e8 19 05 00 00 call +0x519 <_putsw> /* CALL */
__mul_set+0x26: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x29: dc 4d d8 fmull -0x28(%ebp) /* MULTIPLY */
__mul_set+0x2c: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x2f: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x32: 89 04 24 movl %eax,(%esp)
__mul_set+0x35: e8 f5 04 00 00 call +0x4f5 <_getsw> /* CALL AGAIN */
Where the multiply actually happens in the middle.
I don't think this is the right solution. What you really want is something
to ensure ordering of statements - a barrier of sorts.
I think the volatile keyword together with the function call (the sequence
point) will give us the required "barrier".
I still don't think that's the right solution. Yes, it accomplishes what we
ultimately want (prevent the compiler from reordering things), but declaring
the variable volatile tells the compiler (and the person reading the code)
that the *variable* is special. This is not the case. In this case, the
variable is not special at all - what is special is the exact ordering of
operations.

Jeff.
--
C is quirky, flawed, and an enormous success.
- Dennis M. Ritchie.


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Igor Kozhukhov via illumos-developer
2014-10-23 12:20:01 UTC
Permalink
On 23/10/14 16:00, "Josef 'Jeff' Sipek via illumos-developer"
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Dan McDonald via illumos-developer
Post by Josef 'Jeff' Sipek via illumos-developer
On Wed, Oct 22, 2014 at 03:19:27PM -0400, Dan McDonald via
...
Fascinating bug!
...
Post by Dan McDonald via illumos-developer
Rich Lowe suggested that I declare 'z' above to be a VOLATILE, in
order to
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Dan McDonald via illumos-developer
__mul_set+0x21: e8 19 05 00 00 call +0x519 <_putsw>
/* CALL */
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Dan McDonald via illumos-developer
__mul_set+0x26: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x29: dc 4d d8 fmull -0x28(%ebp) /*
MULTIPLY */
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Dan McDonald via illumos-developer
__mul_set+0x2c: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x2f: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x32: 89 04 24 movl %eax,(%esp)
__mul_set+0x35: e8 f5 04 00 00 call +0x4f5 <_getsw>
/* CALL AGAIN */
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Dan McDonald via illumos-developer
Where the multiply actually happens in the middle.
I don't think this is the right solution. What you really want is
something
Post by Josef 'Jeff' Sipek via illumos-developer
to ensure ordering of statements - a barrier of sorts.
I think the volatile keyword together with the function call (the sequence
point) will give us the required "barrier".
I still don't think that's the right solution. Yes, it accomplishes what we
ultimately want (prevent the compiler from reordering things), but declaring
the variable volatile tells the compiler (and the person reading the code)
that the *variable* is special. This is not the case. In this case, the
variable is not special at all - what is special is the exact ordering of
operations.
I have no this problem with gcc48 builds on my dilos env.
Probably - you ave found a first (but not last) problem with gcc44 builds.
Try to update code for old compiler - it¹s not good way, I think.
It will be workaround/special fix for old gcc44 compiler and have to fix
it again in another place where it will be available.

Did you try to test this issue by SunStudio build?
I think - SunStudio build has no this issue too, but I didn¹t check it.

Volatile variable in this place - it¹s a not good fix for it - it¹s hack -
and need more tests for it, because changes in libc will have impacts to
all others components and to stability.


-Igor
Post by Josef 'Jeff' Sipek via illumos-developer
Jeff.
--
C is quirky, flawed, and an enormous success.
- Dennis M. Ritchie.
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
https://www.listbox.com/member/archive/rss/182179/21175093-0a5a4566
https://www.listbox.com/member/?&
4849
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Keith Wesolowski via illumos-developer
2014-10-23 14:41:02 UTC
Permalink
Post by Igor Kozhukhov via illumos-developer
I have no this problem with gcc48 builds on my dilos env.
I know you're a troll, but I can't allow something like this to end up
in the list archives where someone later might be allowed to think it's
sensible. It isn't.

First, 4.4.4 is the ONLY supported gcc version for building ON. illumos
is open source, so you're welcome to do whatever you like, but please
don't pretend that your anecdote is policy. If and when you do all the
work required not only to port all of ON to gcc 4.8 (or whatever) and to
fix all the new bugs in 4.8 itself, but also to diligently and
methodically verify as well as can be achieved that the change
introduces no regressions, then you should submit an RTI to change the
supported compiler.
Post by Igor Kozhukhov via illumos-developer
Probably - you ave found a first (but not last) problem with gcc44 builds.
Probably not. Compiler behaviour is extremely nonlinear; it is
sensitive to seemingly meaningless change. A tiny increase or decrease
in register pressure, the addition or removal of a single instruction,
or the addition of a keyword in a header file somewhere can all cause
behaviour to change. That in fact seems to be what happened here.
There is no reason to believe the same or similar problems cannot occur
with newer compilers (or older); indeed, changing versions and flags
induces far more radical changes in generated code.

The reality is that there are bugs in ON. Compiler behaviour can and
does mask many of those bugs, often for years. That does not mean that
the problem is the compiler; on the contrary, the solution is to fix the
code so that it is strictly correct. Strictly correct code is the most
robust in the face of unexpected compiler behaviour. Correct code is
explicit about its dependencies and expectations and expresses those in
the ways defined by applicable standards.
Post by Igor Kozhukhov via illumos-developer
Try to update code for old compiler - it¹s not good way, I think.
You think wrong. The code is not being updated to accommodate a quirk
of an old compiler; it is being fixed (or replaced) to be correct so
that any standard-compliant compiler can and will generate reliably
correct assembly from it.
Post by Igor Kozhukhov via illumos-developer
It will be workaround/special fix for old gcc44 compiler and have to fix
it again in another place where it will be available.
Completely and absolutely false. This could not be more wrong if you
put it on a Pedo Bear T-shirt and wore it to Disneyland.
Post by Igor Kozhukhov via illumos-developer
Did you try to test this issue by SunStudio build?
I think - SunStudio build has no this issue too, but I didn¹t check it.
So what? As I've explained, even tiny changes in code and environment
can and will radically affect compiler output. That some piece of
not-quite-100%-correct code happens to yield the desired output from a
particular compiler under particular circumstances does not mean that
the code is correct nor that another compiler is at fault. And I bet
you any amount you care to name that eventually 4.8 will generate the
wrong assembly for this code, too; it's entirely within its rights to do
so.
Post by Igor Kozhukhov via illumos-developer
Volatile variable in this place - it¹s a not good fix for it - it¹s hack -
The problem with using volatile here is that the semantics of volatile
are relatively obscure, which hampers readability and increases the risk
that someone will later break this again. That risk can be mitigated by
a good comment. Thanks to Brad for actually reading the relevant
standard, it's quite clear that in fact this *is* a complete and correct
solution; it is not a hack. It may or may not be the one we prefer,
however.
Post by Igor Kozhukhov via illumos-developer
and need more tests for it, because changes in libc will have impacts to
all others components and to stability.
These are just words; they don't mean anything in the manner you've
assembled them. Every change requires careful testing. The larger the
change, the more testing. Changing compiler versions -- especially
without fixing a known and well-understood bug in the code! -- requires
several orders of magnitude more testing than either changing or
replacing a single function in libc. Any change here can be fully
validated by (a) reading the generated assembly and (b) verifying the
test case provided with the original bug report. It's actually very
easy to ensure that this fix is correct and complete. It is, however,
impossible for anyone to verify that using some newer gcc yields correct
code throughout the millions of lines of code in ON -- including at
least these few that we already know for sure are wrong.


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Dan McDonald via illumos-developer
2014-10-23 14:51:50 UTC
Permalink
Taking one bit of the firestorm to make sure I understand the overarching feelings...
Post by Keith Wesolowski via illumos-developer
The problem with using volatile here is that the semantics of volatile
are relatively obscure, which hampers readability and increases the risk
that someone will later break this again. That risk can be mitigated by
a good comment. Thanks to Brad for actually reading the relevant
standard, it's quite clear that in fact this *is* a complete and correct
solution; it is not a hack. It may or may not be the one we prefer,
however.
So what I'm hearing is:

* "volatile" *is* a by-the-language-spec sufficient answer, but confusing w/o a good comment.

* It still may be not the ideal solution, and that any work I have, say, in an all-assembly version of a few of these libc functions should continue.

I have two workspaces: One with "volatile" and one with assembly. I have yet to do the amd64 version(s), but the i386 ones *appear* to work. I always found the x86/x64 instruction sets to be terrible, and now I'm remembering why. :)

I'll have dual webrevs up soon, and I'll make sure the "volatile" keyword version is properly commented.

Dan



-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Keith Wesolowski via illumos-developer
2014-10-23 15:01:52 UTC
Permalink
Post by Dan McDonald via illumos-developer
* "volatile" *is* a by-the-language-spec sufficient answer, but confusing w/o a good comment.
* It still may be not the ideal solution, and that any work I have, say, in an all-assembly version of a few of these libc functions should continue.
That's my take. I agree with Jeff that the standard here is not
conducive to readability. What we want is a way to attach the
no-reorder/abstract-machine-only semantics to a function call, but we
don't really have that.
Post by Dan McDonald via illumos-developer
I have two workspaces: One with "volatile" and one with assembly. I have yet to do the amd64 version(s), but the i386 ones *appear* to work. I always found the x86/x64 instruction sets to be terrible, and now I'm remembering why. :)
I'll have dual webrevs up soon, and I'll make sure the "volatile" keyword version is properly commented.
Thanks. I personally prefer the all-assembly solution but will probably
defer to popular opinion as long as the preferred solution is correct.


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Josef 'Jeff' Sipek via illumos-developer
2014-10-23 15:09:05 UTC
Permalink
Post by Keith Wesolowski via illumos-developer
Post by Dan McDonald via illumos-developer
* "volatile" *is* a by-the-language-spec sufficient answer, but confusing w/o a good comment.
* It still may be not the ideal solution, and that any work I have, say, in an all-assembly version of a few of these libc functions should continue.
That's my take. I agree with Jeff that the standard here is not
conducive to readability. What we want is a way to attach the
no-reorder/abstract-machine-only semantics to a function call, but we
don't really have that.
FWIW, I did more reading and it turns out that studio supposedly supports
the same inline-asm trick to prevent optimizations.

Jeff.
--
Bad pun of the week: The formula 1 control computer suffered from a race
condition


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Garrett D'Amore via illumos-developer
2014-10-23 15:51:39 UTC
Permalink
Given the examples already given that demonstrate that the implementation depends on side effects in registers, it strikes me that the only correct answer is assembly. Anything less feels like a partial fix, at least on some architectures.

I'm hopeful this problem (or at least instance of it) is specific to Intel and we won't have to change or validate sparc. I have not looked though.

Sent from my iPhone
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Keith Wesolowski via illumos-developer
Post by Dan McDonald via illumos-developer
* "volatile" *is* a by-the-language-spec sufficient answer, but confusing w/o a good comment.
* It still may be not the ideal solution, and that any work I have, say, in an all-assembly version of a few of these libc functions should continue.
That's my take. I agree with Jeff that the standard here is not
conducive to readability. What we want is a way to attach the
no-reorder/abstract-machine-only semantics to a function call, but we
don't really have that.
FWIW, I did more reading and it turns out that studio supposedly supports
the same inline-asm trick to prevent optimizations.
Jeff.
--
Bad pun of the week: The formula 1 control computer suffered from a race
condition
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21239177-3604570e
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Dan McDonald via illumos-developer
2014-10-23 15:52:52 UTC
Permalink
Post by Garrett D'Amore via illumos-developer
Given the examples already given that demonstrate that the implementation depends on side effects in registers, it strikes me that the only correct answer is assembly. Anything less feels like a partial fix, at least on some architectures.
I'm hopeful this problem (or at least instance of it) is specific to Intel and we won't have to change or validate sparc. I have not looked though.
SPARC's is already in assembly:

http://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libc/inc/base_inlines.h#44

Dan



-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Marcel Telka via illumos-developer
2014-10-23 15:56:01 UTC
Permalink
Post by Garrett D'Amore via illumos-developer
Given the examples already given that demonstrate that the implementation depends on side effects in registers, it strikes me that the only correct answer is assembly. Anything less feels like a partial fix, at least on some architectures.
I'm hopeful this problem (or at least instance of it) is specific to Intel and we won't have to change or validate sparc. I have not looked though.
Yes. For SPARC it is implemented as inline assembler:

http://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libc/inc/base_inlines.h#42
--
+-------------------------------------------+
| Marcel Telka e-mail: ***@telka.sk |
| homepage: http://telka.sk/ |
| jabber: ***@jabber.sk |
+-------------------------------------------+


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Josef 'Jeff' Sipek via illumos-developer
2014-10-23 15:56:34 UTC
Permalink
Post by Garrett D'Amore via illumos-developer
Given the examples already given that demonstrate that the implementation
depends on side effects in registers, it strikes me that the only correct
answer is assembly. Anything less feels like a partial fix, at least on
some architectures.
How about asm-inlining the 3(?) relevant instructions in otherwise C
function (credit should go to richlowe)? This will make the code easier to
read (C vs. asm) and the ordering of the set/mult/get would be obvious and
fixed. It should be simple enough that even studio's rudimentary inline asm
facilities should handle it.

Jeff.
Post by Garrett D'Amore via illumos-developer
I'm hopeful this problem (or at least instance of it) is specific to Intel
and we won't have to change or validate sparc. I have not looked though.
Sent from my iPhone
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Keith Wesolowski via illumos-developer
Post by Dan McDonald via illumos-developer
* "volatile" *is* a by-the-language-spec sufficient answer, but confusing w/o a good comment.
* It still may be not the ideal solution, and that any work I have, say, in an all-assembly version of a few of these libc functions should continue.
That's my take. I agree with Jeff that the standard here is not
conducive to readability. What we want is a way to attach the
no-reorder/abstract-machine-only semantics to a function call, but we
don't really have that.
FWIW, I did more reading and it turns out that studio supposedly supports
the same inline-asm trick to prevent optimizations.
Jeff.
--
Bad pun of the week: The formula 1 control computer suffered from a race
condition
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21239177-3604570e
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21178391-8a85587e
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
--
Once you have their hardware. Never give it back.
(The First Rule of Hardware Acquisition)


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Garrett D'Amore via illumos-developer
2014-10-23 16:01:25 UTC
Permalink
Seems reasonable. I'm never sure how portable across compilers inline assembly is but assuming that it can be done in a way that is portable to studio and gcc (and hopefully by consequence also clang) then I'm good with it. (Yeah I know clang isn't a supported compiler yet. I remain hopeful of changing that someday.)

Sent from my iPhone
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Garrett D'Amore via illumos-developer
Given the examples already given that demonstrate that the implementation
depends on side effects in registers, it strikes me that the only correct
answer is assembly. Anything less feels like a partial fix, at least on
some architectures.
How about asm-inlining the 3(?) relevant instructions in otherwise C
function (credit should go to richlowe)? This will make the code easier to
read (C vs. asm) and the ordering of the set/mult/get would be obvious and
fixed. It should be simple enough that even studio's rudimentary inline asm
facilities should handle it.
Jeff.
Post by Garrett D'Amore via illumos-developer
I'm hopeful this problem (or at least instance of it) is specific to Intel
and we won't have to change or validate sparc. I have not looked though.
Sent from my iPhone
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Keith Wesolowski via illumos-developer
Post by Dan McDonald via illumos-developer
* "volatile" *is* a by-the-language-spec sufficient answer, but confusing w/o a good comment.
* It still may be not the ideal solution, and that any work I have, say, in an all-assembly version of a few of these libc functions should continue.
That's my take. I agree with Jeff that the standard here is not
conducive to readability. What we want is a way to attach the
no-reorder/abstract-machine-only semantics to a function call, but we
don't really have that.
FWIW, I did more reading and it turns out that studio supposedly supports
the same inline-asm trick to prevent optimizations.
Jeff.
--
Bad pun of the week: The formula 1 control computer suffered from a race
condition
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21239177-3604570e
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21178391-8a85587e
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
--
Once you have their hardware. Never give it back.
(The First Rule of Hardware Acquisition)
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Gordon Ross via illumos-developer
2014-10-23 18:34:49 UTC
Permalink
I'm not sure C plus asm inlines is preferable to a plain old assembly
functions here.


On Thu, Oct 23, 2014 at 11:56 AM, Josef 'Jeff' Sipek via
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Garrett D'Amore via illumos-developer
Given the examples already given that demonstrate that the implementation
depends on side effects in registers, it strikes me that the only correct
answer is assembly. Anything less feels like a partial fix, at least on
some architectures.
How about asm-inlining the 3(?) relevant instructions in otherwise C
function (credit should go to richlowe)? This will make the code easier to
read (C vs. asm) and the ordering of the set/mult/get would be obvious and
fixed. It should be simple enough that even studio's rudimentary inline asm
facilities should handle it.
Jeff.
Post by Garrett D'Amore via illumos-developer
I'm hopeful this problem (or at least instance of it) is specific to Intel
and we won't have to change or validate sparc. I have not looked though.
Sent from my iPhone
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Keith Wesolowski via illumos-developer
Post by Dan McDonald via illumos-developer
* "volatile" *is* a by-the-language-spec sufficient answer, but confusing w/o a good comment.
* It still may be not the ideal solution, and that any work I have, say, in an all-assembly version of a few of these libc functions should continue.
That's my take. I agree with Jeff that the standard here is not
conducive to readability. What we want is a way to attach the
no-reorder/abstract-machine-only semantics to a function call, but we
don't really have that.
FWIW, I did more reading and it turns out that studio supposedly supports
the same inline-asm trick to prevent optimizations.
Jeff.
--
Bad pun of the week: The formula 1 control computer suffered from a race
condition
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21239177-3604570e
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21178391-8a85587e
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
--
Once you have their hardware. Never give it back.
(The First Rule of Hardware Acquisition)
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175074-7782178a
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Josef 'Jeff' Sipek via illumos-developer
2014-10-23 18:46:56 UTC
Permalink
Post by Gordon Ross via illumos-developer
I'm not sure C plus asm inlines is preferable to a plain old assembly
functions here.
Why? There is enough additional logic that's *way* easier to read/write in
C and the compiler will do a good job generating the asembly for it.

Jeff.
Post by Gordon Ross via illumos-developer
On Thu, Oct 23, 2014 at 11:56 AM, Josef 'Jeff' Sipek via
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Garrett D'Amore via illumos-developer
Given the examples already given that demonstrate that the implementation
depends on side effects in registers, it strikes me that the only correct
answer is assembly. Anything less feels like a partial fix, at least on
some architectures.
How about asm-inlining the 3(?) relevant instructions in otherwise C
function (credit should go to richlowe)? This will make the code easier to
read (C vs. asm) and the ordering of the set/mult/get would be obvious and
fixed. It should be simple enough that even studio's rudimentary inline asm
facilities should handle it.
Jeff.
Post by Garrett D'Amore via illumos-developer
I'm hopeful this problem (or at least instance of it) is specific to Intel
and we won't have to change or validate sparc. I have not looked though.
Sent from my iPhone
Post by Josef 'Jeff' Sipek via illumos-developer
Post by Keith Wesolowski via illumos-developer
Post by Dan McDonald via illumos-developer
* "volatile" *is* a by-the-language-spec sufficient answer, but confusing w/o a good comment.
* It still may be not the ideal solution, and that any work I have, say, in an all-assembly version of a few of these libc functions should continue.
That's my take. I agree with Jeff that the standard here is not
conducive to readability. What we want is a way to attach the
no-reorder/abstract-machine-only semantics to a function call, but we
don't really have that.
FWIW, I did more reading and it turns out that studio supposedly supports
the same inline-asm trick to prevent optimizations.
Jeff.
--
Bad pun of the week: The formula 1 control computer suffered from a race
condition
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21239177-3604570e
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21178391-8a85587e
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
--
Once you have their hardware. Never give it back.
(The First Rule of Hardware Acquisition)
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175074-7782178a
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21178391-8a85587e
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
--
Research, n.:
Consider Columbus:
He didn't know where he was going.
When he got there he didn't know where he was.
When he got back he didn't know where he had been.
And he did it all on someone else's money.


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Marcel Telka via illumos-developer
2014-10-23 15:04:53 UTC
Permalink
Post by Keith Wesolowski via illumos-developer
Post by Igor Kozhukhov via illumos-developer
Volatile variable in this place - it¹s a not good fix for it - it¹s hack -
The problem with using volatile here is that the semantics of volatile
are relatively obscure, which hampers readability and increases the risk
that someone will later break this again. That risk can be mitigated by
a good comment. Thanks to Brad for actually reading the relevant
standard, it's quite clear that in fact this *is* a complete and correct
solution; it is not a hack. It may or may not be the one we prefer,
however.
Yes, the volatile keyword is enough to make sure the compiled code is in the
right order. But, as I already mentioned, we cannot consider the addition of
the volatile keyword as the good enough solution.

Here is the __mul_set() implementation for amd64:

40double
41__mul_set(double x, double y, int *pe) {
42 extern void _putmxcsr(), _getmxcsr();
43 int csr;
44 double z; <=== assuming this is volatile
45
46 _putmxcsr(CSR_DEFAULT);
47 z = x * y;
48 _getmxcsr(&csr);
49 if ((csr & 0x3f) == 0) {
50 *pe = 0;
51 } else {
52 /* Result may not be exact. */
53 *pe = 1;
54 }
55 return (z);
56}

The above piece of the code assumes the compiled multiplication/assignment at
line 47 updates the MXCSR register in the predictable way. Our assumption is
WRONG because we do not know how exactly will the compiler compile such a line.
For example, the compiler is free to clear MXCSR at the end of line 47 (and
before the _getmxcsr() is called).
--
+-------------------------------------------+
| Marcel Telka e-mail: ***@telka.sk |
| homepage: http://telka.sk/ |
| jabber: ***@jabber.sk |
+-------------------------------------------+


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Igor Kozhukhov via illumos-developer
2014-10-23 15:48:40 UTC
Permalink
Post by Keith Wesolowski via illumos-developer
Post by Igor Kozhukhov via illumos-developer
I have no this problem with gcc48 builds on my dilos env.
I know you're a troll, but I can't allow something like this to end up
in the list archives where someone later might be allowed to think it's
sensible. It isn't.
I’m not a troll - I just wanted to show: that it is working with another
solution/env, but you, as always, try to speak about: i’m wrong always and
only you can be correct here :) I think, it’s not correct, but it is my
opinion.
Post by Keith Wesolowski via illumos-developer
First, 4.4.4 is the ONLY supported gcc version for building ON.
It is supported by illumos advocates for RTI process, but it is not a way
to say to others: do not try to use another compiler :)
You will be ignored all time with your questions - I think it’s not
correct for community …
If you have no time take a look another compiler and another changes,
probably my work/comments will be interest to others ...
Post by Keith Wesolowski via illumos-developer
illumos
is open source, so you're welcome to do whatever you like, but please
don't pretend that your anecdote is policy.
I’m not pretend on policy :) I try to show that it is working with another
compiler on another env if try to spend a time for update sources for it -
nothing else.
Post by Keith Wesolowski via illumos-developer
If and when you do all the
work required not only to port all of ON to gcc 4.8 (or whatever) and to
fix all the new bugs in 4.8 itself, but also to diligently and
methodically verify as well as can be achieved that the change
introduces no regressions, then you should submit an RTI to change the
supported compiler.
You said about supported compiler again.
You want to say - all others changes tested by another compiler will be
ignored if these changes will not impact current supported gcc version?
If it is true - I’d like to see comment about it on RTI page.
Post by Keith Wesolowski via illumos-developer
Post by Igor Kozhukhov via illumos-developer
Probably - you ave found a first (but not last) problem with gcc44 builds.
Probably not. Compiler behaviour is extremely nonlinear; it is
sensitive to seemingly meaningless change. A tiny increase or decrease
in register pressure, the addition or removal of a single instruction,
or the addition of a keyword in a header file somewhere can all cause
behaviour to change. That in fact seems to be what happened here.
There is no reason to believe the same or similar problems cannot occur
with newer compilers (or older); indeed, changing versions and flags
induces far more radical changes in generated code.
One interest question here - why all compiler vendors are working on new
compiler updates and applications try to move to new versions of compilers
instead of use what they have 10 years ago? :)
The World is changing.
Post by Keith Wesolowski via illumos-developer
The reality is that there are bugs in ON. Compiler behaviour can and
does mask many of those bugs, often for years. That does not mean that
the problem is the compiler; on the contrary, the solution is to fix the
code so that it is strictly correct. Strictly correct code is the most
robust in the face of unexpected compiler behaviour. Correct code is
explicit about its dependencies and expectations and expresses those in
the ways defined by applicable standards.
Post by Igor Kozhukhov via illumos-developer
Try to update code for old compiler - it¹s not good way, I think.
You think wrong.
I can be agree here, I have mistake in description(English is not my
native language)
I wanted to say - try to ignore new compilers as a reference for try to
find a problem - it is not good way.
If another compiler is working - we can use it for analysing of problem
and identify where we are failed: in real sources or probably on compiler
generated code. Probably we missed addition build flags or something else.
To have additional tools can help - why not?
Post by Keith Wesolowski via illumos-developer
The code is not being updated to accommodate a quirk
of an old compiler; it is being fixed (or replaced) to be correct so
that any standard-compliant compiler can and will generate reliably
correct assembly from it.
Post by Igor Kozhukhov via illumos-developer
It will be workaround/special fix for old gcc44 compiler and have to fix
it again in another place where it will be available.
Completely and absolutely false. This could not be more wrong if you
put it on a Pedo Bear T-shirt and wore it to Disneyland.
Why it false? You want to say - it is only one place where we can see bug
in code and gcc44 can generate incorrect object?
I think - it is not only one place where we can find issue with reordering
It is first place, but it’s not only one, I think.
Post by Keith Wesolowski via illumos-developer
Post by Igor Kozhukhov via illumos-developer
Did you try to test this issue by SunStudio build?
I think - SunStudio build has no this issue too, but I didn¹t check it.
So what? As I've explained, even tiny changes in code and environment
can and will radically affect compiler output. That some piece of
not-quite-100%-correct code happens to yield the desired output from a
particular compiler under particular circumstances does not mean that
the code is correct nor that another compiler is at fault. And I bet
you any amount you care to name that eventually 4.8 will generate the
wrong assembly for this code, too; it's entirely within its rights to do
so.
Post by Igor Kozhukhov via illumos-developer
Volatile variable in this place - it¹s a not good fix for it - it¹s hack -
The problem with using volatile here is that the semantics of volatile
are relatively obscure, which hampers readability and increases the risk
that someone will later break this again. That risk can be mitigated by
a good comment. Thanks to Brad for actually reading the relevant
standard, it's quite clear that in fact this *is* a complete and correct
solution; it is not a hack. It may or may not be the one we prefer,
however.
Post by Igor Kozhukhov via illumos-developer
and need more tests for it, because changes in libc will have impacts to
all others components and to stability.
These are just words; they don't mean anything in the manner you've
assembled them. Every change requires careful testing. The larger the
change, the more testing. Changing compiler versions -- especially
without fixing a known and well-understood bug in the code! -- requires
several orders of magnitude more testing than either changing or
replacing a single function in libc. Any change here can be fully
validated by (a) reading the generated assembly and (b) verifying the
test case provided with the original bug report. It's actually very
easy to ensure that this fix is correct and complete. It is, however,
impossible for anyone to verify that using some newer gcc yields correct
code throughout the millions of lines of code in ON -- including at
least these few that we already know for sure are wrong.
I didn’t say about - easy to move to new compiler.
Of course - need more additional tests on my or another environment with
another compiler/tools - and I tried to ask on IRC about some
scenarios/tests code where I can check it. And if you are remember, your
test code is working on my env, but with additional build flag for another
compiler - and I think it’s OK. I’d like try to test more scenarios, but
I’m working on my env only by my free time,
I didn’t say about - my env is ideal.
I only said - it’s working well to me with my tasks.
I have my env with updates more 2 years and it is working :)
I tried only provide info/feedback about my additional work on
looking/testing new solutions/tools and result can be reused.
But with your concern it’s mean - no need try take a look new
compilers/tools because this way will be failed and ignored all time.
It’s a pity if it is true with illumos community ...

-Igor




-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Josef 'Jeff' Sipek via illumos-developer
2014-10-23 12:19:53 UTC
Permalink
Post by Josef 'Jeff' Sipek via illumos-developer
...
Fascinating bug!
...
Post by Dan McDonald via illumos-developer
Rich Lowe suggested that I declare 'z' above to be a VOLATILE, in order to
__mul_set+0x21: e8 19 05 00 00 call +0x519 <_putsw> /* CALL */
__mul_set+0x26: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x29: dc 4d d8 fmull -0x28(%ebp) /* MULTIPLY */
__mul_set+0x2c: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x2f: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x32: 89 04 24 movl %eax,(%esp)
__mul_set+0x35: e8 f5 04 00 00 call +0x4f5 <_getsw> /* CALL AGAIN */
Where the multiply actually happens in the middle.
I don't think this is the right solution. What you really want is something
to ensure ordering of statements - a barrier of sorts.
Ok, some searching online [1] revealed that we may want something like this:

#define compiler_barrier() asm volatile("" ::: "memory")

double __mul_set(double x, double y, int *pe)
{
...
double z;

_putmxcsr(CSR_DEFAULT);
compiler_barrier();
z = x * y;
compiler_barrier();
_getmxcsr(&csr);
...
return (z);
}

This will force the compiler to keep the function calls and multiplication
in order. Of course, I don't know what incantation is necessary for Studio
(maybe just an empty definition since Studio doesn't reorder as aggressively
as gcc).

Jeff.

[1] http://preshing.com/20120625/memory-ordering-at-compile-time/
--
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.
- Brian W. Kernighan


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Keith Wesolowski via illumos-developer
2014-10-23 15:05:12 UTC
Permalink
Post by Josef 'Jeff' Sipek via illumos-developer
#define compiler_barrier() asm volatile("" ::: "memory")
This really only works because the assembly fragment is declared
volatile. gcc explicitly documents that such a fragment will not be
reordered. Without that, this just causes reloading of all registers
across the fragment. If x, y, and z are all in registers only (which
they may be since nothing takes the address of them), then reloading has
no effect and the dependency on memory does not create an ordering
constraint with respect to arbitrary function calls. This is pretty
subtle, and is why I mentioned taking the address of z in my previous
mail. I do agree that this is sufficient, but it also seems needlessly
expensive: we want the restriction on reordering, but we don't want or
need to induce reloading.


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Marcel Telka via illumos-developer
2014-10-22 19:54:29 UTC
Permalink
Post by Dan McDonald via illumos-developer
This is a long note. Please be patient, as it's fascinating. TLDR, I'm
curious if the changes at http://kebe.com/~danmcd/webrevs/5224/ are
sufficient? You should read on, however.
LGTM.
--
+-------------------------------------------+
| Marcel Telka e-mail: ***@telka.sk |
| homepage: http://telka.sk/ |
| jabber: ***@jabber.sk |
+-------------------------------------------+


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Marcel Telka via illumos-developer
2014-10-22 20:09:16 UTC
Permalink
Post by Dan McDonald via illumos-developer
This is a long note. Please be patient, as it's fascinating. TLDR, I'm
curious if the changes at http://kebe.com/~danmcd/webrevs/5224/ are
sufficient? You should read on, however.
LGTM.
Hm, no. I think this solution seems to be not correct :-(. The problem is
that we cannot be sure that z = x * y; will always be compiled using the FP
instructions. If the multiplication is compiled otherwise (for any reason) the
_getsw(&sw) call will return nonsense.


Thanks.
--
+-------------------------------------------+
| Marcel Telka e-mail: ***@telka.sk |
| homepage: http://telka.sk/ |
| jabber: ***@jabber.sk |
+-------------------------------------------+


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Gary Mills via illumos-developer
2014-10-22 20:05:01 UTC
Permalink
Post by Dan McDonald via illumos-developer
/*
* Multiplies two normal or subnormal doubles, returns result and exceptions.
*/
double
__mul_set(double x, double y, int *pe) {
extern void _putsw(), _getsw();
int sw;
double z;
_putsw(0); /* XXX KEBE SAYS WATCH */
z = x * y; /* THIS ORDERING */
_getsw(&sw); /* CAREFULLY! */
if ((sw & 0x3f) == 0) {
*pe = 0;
} else {
/* Result may not be exact. */
*pe = 1;
}
return (z);
}
[...]
Post by Dan McDonald via illumos-developer
The source hasn't changed since the OpenSolaris days. Since the
compiler doesn't know z=x*y affects the results of _getsw(), it
reserves the right to reorder, and apparently our gcc4.4 that builds
illumos-gate exercises that right.
So the side effect is the real problem. Is there a way to ensure
that the compiler knows about that side effect?
--
-Gary Mills- -refurb- -Winnipeg, Manitoba, Canada-


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Dan McDonald via illumos-developer
2014-10-22 20:05:55 UTC
Permalink
Post by Gary Mills via illumos-developer
Post by Dan McDonald via illumos-developer
The source hasn't changed since the OpenSolaris days. Since the
compiler doesn't know z=x*y affects the results of _getsw(), it
reserves the right to reorder, and apparently our gcc4.4 that builds
illumos-gate exercises that right.
So the side effect is the real problem. Is there a way to ensure
that the compiler knows about that side effect?
Exactly my question. I have my doubts "volatile" alone is sufficient. I hope others with more experience here can and will speak up.

Dan



-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Igor Kozhukhov via illumos-developer
2014-10-22 20:10:25 UTC
Permalink
On 23/10/14 00:05, "Dan McDonald via illumos-developer"
Post by Dan McDonald via illumos-developer
Post by Gary Mills via illumos-developer
Post by Dan McDonald via illumos-developer
The source hasn't changed since the OpenSolaris days. Since the
compiler doesn't know z=x*y affects the results of _getsw(), it
reserves the right to reorder, and apparently our gcc4.4 that builds
illumos-gate exercises that right.
So the side effect is the real problem. Is there a way to ensure
that the compiler knows about that side effect?
Exactly my question. I have my doubts "volatile" alone is sufficient. I
hope others with more experience here can and will speak up.
Maybe you can try to check with additional flag: -fno-reorder-functions
As I remember - it was fixed some similar problems with grub builds by
gcc47/48

-Igor
Post by Dan McDonald via illumos-developer
Dan
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
https://www.listbox.com/member/archive/rss/182179/21175093-0a5a4566
https://www.listbox.com/member/?&
4849
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Hans Rosenfeld via illumos-developer
2014-10-22 20:14:56 UTC
Permalink
Post by Dan McDonald via illumos-developer
Post by Gary Mills via illumos-developer
Post by Dan McDonald via illumos-developer
The source hasn't changed since the OpenSolaris days. Since the
compiler doesn't know z=x*y affects the results of _getsw(), it
reserves the right to reorder, and apparently our gcc4.4 that builds
illumos-gate exercises that right.
So the side effect is the real problem. Is there a way to ensure
that the compiler knows about that side effect?
Exactly my question. I have my doubts "volatile" alone is sufficient. I hope others with more experience here can and will speak up.
My first guess was that this was an aliasing violation, but it doesn't
really look like one. In that case I would have suggested compiling this
stuff with -fno-strict-aliasing. Could be worth trying that anyway.


Hans
--
%SYSTEM-F-ANARCHISM, The operating system has been overthrown


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Garrett D'Amore via illumos-developer
2014-10-22 20:18:43 UTC
Permalink
If we're sensitive to the assembly output, why don't we just commit the
assembly itself. It seems perfectly reasonable to use "compiler output" as
the first stage of "hand coded assembly". :-)

On Wed, Oct 22, 2014 at 1:14 PM, Hans Rosenfeld via illumos-developer <
On Wed, Oct 22, 2014 at 04:05:55PM -0400, Dan McDonald via
Post by Dan McDonald via illumos-developer
Post by Gary Mills via illumos-developer
Post by Dan McDonald via illumos-developer
The source hasn't changed since the OpenSolaris days. Since the
compiler doesn't know z=x*y affects the results of _getsw(), it
reserves the right to reorder, and apparently our gcc4.4 that builds
illumos-gate exercises that right.
So the side effect is the real problem. Is there a way to ensure
that the compiler knows about that side effect?
Exactly my question. I have my doubts "volatile" alone is sufficient.
I hope others with more experience here can and will speak up.
My first guess was that this was an aliasing violation, but it doesn't
really look like one. In that case I would have suggested compiling this
stuff with -fno-strict-aliasing. Could be worth trying that anyway.
Hans
--
%SYSTEM-F-ANARCHISM, The operating system has been overthrown
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
https://www.listbox.com/member/archive/rss/182179/21239177-3604570e
https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Dan McDonald via illumos-developer
2014-10-22 20:27:08 UTC
Permalink
If we're sensitive to the assembly output, why don't we just commit the assembly itself. It seems perfectly reasonable to use "compiler output" as the first stage of "hand coded assembly". :-)
I'm leaning toward writing the __{mul,div}_set() as assembly routines. I'll inline the status-register queries as well.

Don't know how long it'll take me, but I'm on it.

Dan



-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Keith Wesolowski via illumos-developer
2014-10-22 20:41:44 UTC
Permalink
Post by Dan McDonald via illumos-developer
If we're sensitive to the assembly output, why don't we just commit the assembly itself. It seems perfectly reasonable to use "compiler output" as the first stage of "hand coded assembly". :-)
I'm leaning toward writing the __{mul,div}_set() as assembly routines. I'll inline the status-register queries as well.
Don't know how long it'll take me, but I'm on it.
This seems like a very reasonable solution; it offers durable clarity
here. There is another option for doing this in C, although it's not
really that great from a clarity standpoint. If _putsw() and _getsw()
took z as an (ignored, but not __attribute__((__unused__))) argument,
then the compiler would be forced to recognise the dependency.
Obviously, one could use an intermediate function for this purpose,
although there is some risk if it is in the same translation unit. I am
not 100% convinced that making z volatile is sufficient to protect
against reordering; if z were being consumed, that would be reasonable,
but it is being used solely as the target of an assignment expression
and its previous value is discarded. As such, there would be no
dependency that precludes reordering the multiplication until after the
_getsw(). This would also be safe if z were global and in a different
translation unit from _getsw().

However, a worthy opinion will require spending some quality time with
the standard -- time I cannot offer you today. If someone wants to do
that work and write up an analysis, that would be helpful, but replacing
this with inline assembly makes it moot.


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Dan McDonald via illumos-developer
2014-10-22 20:55:24 UTC
Permalink
Post by Keith Wesolowski via illumos-developer
but replacing
this with inline assembly makes it moot.
Or even as an assembly-implemented function (which was my initial thought)?

Dan



-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Keith Wesolowski via illumos-developer
2014-10-23 03:44:24 UTC
Permalink
Post by Dan McDonald via illumos-developer
Post by Keith Wesolowski via illumos-developer
but replacing
this with inline assembly makes it moot.
Or even as an assembly-implemented function (which was my initial thought)?
Sure, either way. As long as the critical sequence is all in assembly,
you're fine.


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Brad Walker via illumos-developer
2014-10-23 03:48:24 UTC
Permalink
I think volatile type qualification will do what you want here.

C11, sec. 6.7.3 (7) says: "An object that has volatile-qualified type may
be modified in ways unknown to the implementation or have other unknown
side effects. Therefore any expression referring to such an object shall be
evaluated strictly according to the rules of the abstract machine, as
described in 5.1.2.3. Furthermore, at every sequence point the value last
stored in the object shall agree with that prescribed by the abstract
machine, except as modified by the unknown factors mentioned previously.
(134) What constitutes an access to an object that has volatile-qualified
type is implementation-defined.

...

134 - A volatile declaration may be used to describe an object
corresponding to a memory-mapped input/output port or an object accessed by
an asynchronously interrupting function. Actions on objects so declared
shall not be ‘‘optimized out’’ by an implementation or reordered except as
permitted by the rules for evaluating expressions."

And I think this is the behavior of the compiler shown once you added the
type qualifier.

-brad w.



On Wed, Oct 22, 2014 at 2:05 PM, Dan McDonald via illumos-developer <
Post by Dan McDonald via illumos-developer
Post by Gary Mills via illumos-developer
Post by Dan McDonald via illumos-developer
The source hasn't changed since the OpenSolaris days. Since the
compiler doesn't know z=x*y affects the results of _getsw(), it
reserves the right to reorder, and apparently our gcc4.4 that builds
illumos-gate exercises that right.
So the side effect is the real problem. Is there a way to ensure
that the compiler knows about that side effect?
Exactly my question. I have my doubts "volatile" alone is sufficient. I
hope others with more experience here can and will speak up
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Richard Yao via illumos-developer
2014-10-23 19:01:00 UTC
Permalink
Post by Dan McDonald via illumos-developer
This is a long note. Please be patient, as it's fascinating. TLDR, I'm curious if the changes at http://kebe.com/~danmcd/webrevs/5224/ are sufficient? You should read on, however.
Community member Richard PALO reported a bug to us, and I told him to open an illumos bug for it. He did, and it's the aforementioned 5224. He put a lot of detail in there. I'm going to attempt to simplify it a bit to get to what I discovered, and sorta fixed with the help of Rich Lowe.
failure:f precision 17 for 01/03 => +0.33333333333333332 (+0.333333333333333314830)
See how the rounded version's last digit goes to 2, but the longer version should really force it to 1? Yeah, that's a problem.
Richard P did truss on libc, and discovered that newer versions of libc (anything after oi_151a9) would fail. He also discovered that there was a problem with libc's __mul_set(), which printf's floating-point-to-printable conversions end up using. A disassembly of old __mul_set() and newer __mul_set() shows disturbing things.
/*
* Multiplies two normal or subnormal doubles, returns result and exceptions.
*/
double
__mul_set(double x, double y, int *pe) {
extern void _putsw(), _getsw();
int sw;
double z;
_putsw(0); /* XXX KEBE SAYS WATCH */
z = x * y; /* THIS ORDERING */
_getsw(&sw); /* CAREFULLY! */
if ((sw & 0x3f) == 0) {
*pe = 0;
} else {
/* Result may not be exact. */
*pe = 1;
}
return (z);
}
This code relies on undefined behavior. It would be better to put sw and
z into a struct. Then pass the address of the struct to `_getsw()`. That
should avoid reordering. That should be sufficient to inform the
compiler that and `_getsw(&sw)` cannot be ordered. Making the struct
volatile for good measure would not hurt.
Post by Dan McDonald via illumos-developer
__mul_set+0x16: 6a 00 pushl $0x0
__mul_set+0x18: e8 cb 06 00 00 call +0x6cb <_putsw> /* XXX KEBE SAYS CALL... */
__mul_set+0x1d: 83 c4 10 addl $0x10,%esp
__mul_set+0x20: dd 45 08 fldl 0x8(%ebp)
__mul_set+0x23: dc 4d 10 fmull 0x10(%ebp) /* XXX KEBE SAYS MULTIPLY... */
__mul_set+0x26: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x29: 9b fwait
__mul_set+0x2a: 83 ec 0c subl $0xc,%esp
__mul_set+0x2d: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x30: 50 pushl %eax
__mul_set+0x31: e8 a2 06 00 00 call +0x6a2 <_getsw> /* AND CALL AGAIN */
__mul_set+0x25: e8 3f 05 00 00 call +0x53f <_putsw> /* CALL... */
__mul_set+0x2a: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x2d: 89 04 24 movl %eax,(%esp)
__mul_set+0x30: e8 24 05 00 00 call +0x524 <_getsw> /* ... CALL AGAIN?!?! */
__mul_set+0x35: 8b 45 f4 movl -0xc(%ebp),%eax
__mul_set+0x38: 83 e0 3f andl $0x3f,%eax
__mul_set+0x3b: 0f 95 c2 setne %dl
__mul_set+0x3e: 0f b6 d2 movzbl %dl,%edx
__mul_set+0x41: 8b 45 18 movl 0x18(%ebp),%eax
__mul_set+0x44: 89 10 movl %edx,(%eax)
__mul_set+0x46: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x49: dc 4d d8 fmull -0x28(%ebp) /* THEN MULTIPLY?!?!? */
The source hasn't changed since the OpenSolaris days. Since the compiler doesn't know z=x*y affects the results of _getsw(), it reserves the right to reorder, and apparently our gcc4.4 that builds illumos-gate exercises that right. One person who did not see this reports using a different compiler to build, and that clearly can make a difference.
__mul_set+0x21: e8 19 05 00 00 call +0x519 <_putsw> /* CALL */
__mul_set+0x26: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x29: dc 4d d8 fmull -0x28(%ebp) /* MULTIPLY */
__mul_set+0x2c: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x2f: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x32: 89 04 24 movl %eax,(%esp)
__mul_set+0x35: e8 f5 04 00 00 call +0x4f5 <_getsw> /* CALL AGAIN */
Where the multiply actually happens in the middle.
http://kebe.com/~danmcd/webrevs/5224/
It involves fixing both instances of __mul_set and __div_set. I couldn't find any other set-register calls like this, but there may be more.
Is there a better solution to this? The C code mentions that these functions should be inline assembly. I'm willing to consider that, especially in the light of compilers may change behavior out from underneath us. :(
Any suggestions are welcome!
Thanks,
Dan
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/26621013-05fb2a43
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com

Loading...