Dan McDonald via illumos-developer
2014-10-22 19:19:27 UTC
This is a long note. Please be patient, as it's fascinating. TLDR, I'm curious if the changes at http://kebe.com/~danmcd/webrevs/5224/ are sufficient? You should read on, however.
Community member Richard PALO reported a bug to us, and I told him to open an illumos bug for it. He did, and it's the aforementioned 5224. He put a lot of detail in there. I'm going to attempt to simplify it a bit to get to what I discovered, and sorta fixed with the help of Rich Lowe.
Richard P was trying to get rounding of his floating point numbers when printing. They were failing. One of his many test programs in 5224 spews out lots of failures over the course of many tests. Here's one example failure:
failure:f precision 17 for 01/03 => +0.33333333333333332 (+0.333333333333333314830)
See how the rounded version's last digit goes to 2, but the longer version should really force it to 1? Yeah, that's a problem.
Richard P did truss on libc, and discovered that newer versions of libc (anything after oi_151a9) would fail. He also discovered that there was a problem with libc's __mul_set(), which printf's floating-point-to-printable conversions end up using. A disassembly of old __mul_set() and newer __mul_set() shows disturbing things.
Here's the C source for 32-bit:
/*
* Multiplies two normal or subnormal doubles, returns result and exceptions.
*/
double
__mul_set(double x, double y, int *pe) {
extern void _putsw(), _getsw();
int sw;
double z;
_putsw(0); /* XXX KEBE SAYS WATCH */
z = x * y; /* THIS ORDERING */
_getsw(&sw); /* CAREFULLY! */
if ((sw & 0x3f) == 0) {
*pe = 0;
} else {
/* Result may not be exact. */
*pe = 1;
}
return (z);
}
Note the ordering of the _putsw(), the multiply, and the _getsw(). For this function to work, they MUST execute in that order. And they do on OI 151a9:
__mul_set+0x16: 6a 00 pushl $0x0
__mul_set+0x18: e8 cb 06 00 00 call +0x6cb <_putsw> /* XXX KEBE SAYS CALL... */
__mul_set+0x1d: 83 c4 10 addl $0x10,%esp
__mul_set+0x20: dd 45 08 fldl 0x8(%ebp)
__mul_set+0x23: dc 4d 10 fmull 0x10(%ebp) /* XXX KEBE SAYS MULTIPLY... */
__mul_set+0x26: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x29: 9b fwait
__mul_set+0x2a: 83 ec 0c subl $0xc,%esp
__mul_set+0x2d: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x30: 50 pushl %eax
__mul_set+0x31: e8 a2 06 00 00 call +0x6a2 <_getsw> /* AND CALL AGAIN */
But on ALMOST ANYTHING more recent (OmniOS of any supported release, SmartOS, OI Hipster), we get this:
__mul_set+0x25: e8 3f 05 00 00 call +0x53f <_putsw> /* CALL... */
__mul_set+0x2a: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x2d: 89 04 24 movl %eax,(%esp)
__mul_set+0x30: e8 24 05 00 00 call +0x524 <_getsw> /* ... CALL AGAIN?!?! */
__mul_set+0x35: 8b 45 f4 movl -0xc(%ebp),%eax
__mul_set+0x38: 83 e0 3f andl $0x3f,%eax
__mul_set+0x3b: 0f 95 c2 setne %dl
__mul_set+0x3e: 0f b6 d2 movzbl %dl,%edx
__mul_set+0x41: 8b 45 18 movl 0x18(%ebp),%eax
__mul_set+0x44: 89 10 movl %edx,(%eax)
__mul_set+0x46: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x49: dc 4d d8 fmull -0x28(%ebp) /* THEN MULTIPLY?!?!? */
The source hasn't changed since the OpenSolaris days. Since the compiler doesn't know z=x*y affects the results of _getsw(), it reserves the right to reorder, and apparently our gcc4.4 that builds illumos-gate exercises that right. One person who did not see this reports using a different compiler to build, and that clearly can make a difference.
Rich Lowe suggested that I declare 'z' above to be a VOLATILE, in order to nudge the compiler in the right direction. After that, I now get:
__mul_set+0x21: e8 19 05 00 00 call +0x519 <_putsw> /* CALL */
__mul_set+0x26: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x29: dc 4d d8 fmull -0x28(%ebp) /* MULTIPLY */
__mul_set+0x2c: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x2f: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x32: 89 04 24 movl %eax,(%esp)
__mul_set+0x35: e8 f5 04 00 00 call +0x4f5 <_getsw> /* CALL AGAIN */
Where the multiply actually happens in the middle.
The webrev for this change is here:
http://kebe.com/~danmcd/webrevs/5224/
It involves fixing both instances of __mul_set and __div_set. I couldn't find any other set-register calls like this, but there may be more.
Is there a better solution to this? The C code mentions that these functions should be inline assembly. I'm willing to consider that, especially in the light of compilers may change behavior out from underneath us. :(
Any suggestions are welcome!
Thanks,
Dan
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com
Community member Richard PALO reported a bug to us, and I told him to open an illumos bug for it. He did, and it's the aforementioned 5224. He put a lot of detail in there. I'm going to attempt to simplify it a bit to get to what I discovered, and sorta fixed with the help of Rich Lowe.
Richard P was trying to get rounding of his floating point numbers when printing. They were failing. One of his many test programs in 5224 spews out lots of failures over the course of many tests. Here's one example failure:
failure:f precision 17 for 01/03 => +0.33333333333333332 (+0.333333333333333314830)
See how the rounded version's last digit goes to 2, but the longer version should really force it to 1? Yeah, that's a problem.
Richard P did truss on libc, and discovered that newer versions of libc (anything after oi_151a9) would fail. He also discovered that there was a problem with libc's __mul_set(), which printf's floating-point-to-printable conversions end up using. A disassembly of old __mul_set() and newer __mul_set() shows disturbing things.
Here's the C source for 32-bit:
/*
* Multiplies two normal or subnormal doubles, returns result and exceptions.
*/
double
__mul_set(double x, double y, int *pe) {
extern void _putsw(), _getsw();
int sw;
double z;
_putsw(0); /* XXX KEBE SAYS WATCH */
z = x * y; /* THIS ORDERING */
_getsw(&sw); /* CAREFULLY! */
if ((sw & 0x3f) == 0) {
*pe = 0;
} else {
/* Result may not be exact. */
*pe = 1;
}
return (z);
}
Note the ordering of the _putsw(), the multiply, and the _getsw(). For this function to work, they MUST execute in that order. And they do on OI 151a9:
__mul_set+0x16: 6a 00 pushl $0x0
__mul_set+0x18: e8 cb 06 00 00 call +0x6cb <_putsw> /* XXX KEBE SAYS CALL... */
__mul_set+0x1d: 83 c4 10 addl $0x10,%esp
__mul_set+0x20: dd 45 08 fldl 0x8(%ebp)
__mul_set+0x23: dc 4d 10 fmull 0x10(%ebp) /* XXX KEBE SAYS MULTIPLY... */
__mul_set+0x26: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x29: 9b fwait
__mul_set+0x2a: 83 ec 0c subl $0xc,%esp
__mul_set+0x2d: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x30: 50 pushl %eax
__mul_set+0x31: e8 a2 06 00 00 call +0x6a2 <_getsw> /* AND CALL AGAIN */
But on ALMOST ANYTHING more recent (OmniOS of any supported release, SmartOS, OI Hipster), we get this:
__mul_set+0x25: e8 3f 05 00 00 call +0x53f <_putsw> /* CALL... */
__mul_set+0x2a: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x2d: 89 04 24 movl %eax,(%esp)
__mul_set+0x30: e8 24 05 00 00 call +0x524 <_getsw> /* ... CALL AGAIN?!?! */
__mul_set+0x35: 8b 45 f4 movl -0xc(%ebp),%eax
__mul_set+0x38: 83 e0 3f andl $0x3f,%eax
__mul_set+0x3b: 0f 95 c2 setne %dl
__mul_set+0x3e: 0f b6 d2 movzbl %dl,%edx
__mul_set+0x41: 8b 45 18 movl 0x18(%ebp),%eax
__mul_set+0x44: 89 10 movl %edx,(%eax)
__mul_set+0x46: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x49: dc 4d d8 fmull -0x28(%ebp) /* THEN MULTIPLY?!?!? */
The source hasn't changed since the OpenSolaris days. Since the compiler doesn't know z=x*y affects the results of _getsw(), it reserves the right to reorder, and apparently our gcc4.4 that builds illumos-gate exercises that right. One person who did not see this reports using a different compiler to build, and that clearly can make a difference.
Rich Lowe suggested that I declare 'z' above to be a VOLATILE, in order to nudge the compiler in the right direction. After that, I now get:
__mul_set+0x21: e8 19 05 00 00 call +0x519 <_putsw> /* CALL */
__mul_set+0x26: dd 45 e0 fldl -0x20(%ebp)
__mul_set+0x29: dc 4d d8 fmull -0x28(%ebp) /* MULTIPLY */
__mul_set+0x2c: dd 5d e8 fstpl -0x18(%ebp)
__mul_set+0x2f: 8d 45 f4 leal -0xc(%ebp),%eax
__mul_set+0x32: 89 04 24 movl %eax,(%esp)
__mul_set+0x35: e8 f5 04 00 00 call +0x4f5 <_getsw> /* CALL AGAIN */
Where the multiply actually happens in the middle.
The webrev for this change is here:
http://kebe.com/~danmcd/webrevs/5224/
It involves fixing both instances of __mul_set and __div_set. I couldn't find any other set-register calls like this, but there may be more.
Is there a better solution to this? The C code mentions that these functions should be inline assembly. I'm willing to consider that, especially in the light of compilers may change behavior out from underneath us. :(
Any suggestions are welcome!
Thanks,
Dan
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com