struct doesn't handle NaN values?

Discussion:

Grant Edwards

2004-05-13 21:37:50 UTC

Perhaps I'm doing something wrong: the struct module docs say
it's IEEE 754, but I can't figure out how to get it to handle
NaN values correctly (either packing or unpacking).

Of course, if you want to worry about endian issues here, I think
you're SOL with using struct.

Yes, I want to worry about endian issues. I'm communicating
via DeviceNet, and DeviceNet specifies little-endian. I don't
want to assume the host is little-endian, so I guess I'm SOL. :(

OTOH, I suppose if I assume that 32-bit integers and floats
have the same byte order, I can use 32-bit native integers as
an intermediate format:

nanString = struct.pack("f",float('nan'))
nanInt, = struct.unpack("I",nanString)
extNan = struct.pack("<I",nanInt)

That's ugly, and I'm sure will break someday, but I guess it
works for now.
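
A minimal sketch of the workaround above, written for a current Python 3 and assuming (as the post does) that the native unsigned int and float are both 32 bits wide and share a byte order. The function name `pack_float_le_via_int` is hypothetical, not part of struct.

```python
import struct

def pack_float_le_via_int(value):
    """Pack value as a 4-byte little-endian float by way of a native 32-bit int.

    Assumption: native unsigned int and float are both 4 bytes, same byte order.
    """
    native_bytes = struct.pack("f", value)         # native mode: raw platform float
    (as_int,) = struct.unpack("I", native_bytes)   # reinterpret the same bytes as an int
    return struct.pack("<I", as_int)               # integer packing has no FP special cases

wire = pack_float_le_via_int(float("nan"))
print(wire.hex())
```

The integer detour matters only because the float path in standard mode was the part that mishandled NaN; the bytes themselves are never interpreted as a number along the way.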

I suppose I should take a look at the sources for struct and
see if I can fix it...

--
Grant Edwards grante Yow! How's it going in
at those MODULAR LOVE UNITS??
visi.com

Tim Peters

2004-05-14 02:51:09 UTC

[Grant Edwards]

My question was which "native" and "standard" mode?
There appear to be two different "modes": "byte order" and "size and
alignment". Which of the two modes determines the floating
point representation to be used? My interpretation of the doc
was the latter: use native FP representation when it says
"native" in the "size and alignment" column and use IEEE when
it says "standard" in the "size and alignment" column.

Yes, that's correct.
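
A small sketch of the two "size and alignment" modes being discussed, assuming a current Python 3. The format strings are only illustrative; the native result depends on the platform's C compiler.

```python
import struct

# Native mode ("@", the default): platform sizes, alignment padding, platform FP format.
native_size = struct.calcsize("@ci")

# Standard mode ("<", ">", "!", "="): fixed sizes, no padding, IEEE 754 for f and d.
standard_size = struct.calcsize("<ci")

print(native_size, standard_size)
print(struct.calcsize("<f"), struct.calcsize("<d"))
```

Any format string that starts with a byte-order character other than "@" is in standard mode, so asking for a specific byte order also selects the IEEE representation.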

...

In order to provide robust translation between native and IEEE
floating point formats, Python is going to have to know what
the native format is.

Of course.

Recognizing and generating IEEE NaNs, infinities, 0's and
denormals is easy enough.

[etc]

There's nothing new to be said about any of this, and I don't have time to
pursue it regardless. If you want to commit to improving the story here,
please do. Others have tried, over the course of a decade, but nothing has
come of it apart from the PEP 754 reference implementation (which doesn't
address struct or pickle issues). It's a large task to give Python a *good*
x-platform 754 story, but it would indeed be easy to make large isolated
improvements in small areas on major platforms.

Grant Edwards

2004-05-13 19:44:59 UTC

Perhaps I'm doing something wrong: the struct module docs say
it's IEEE 754, but I can't figure out how to get it to handle
NaN values correctly (either packing or unpacking).

x = float('nan')
struct.pack("<f",x)

Traceback (most recent call last):
File "<stdin>", line 1, in ?
SystemError: frexp() result out of range

struct.unpack("<f",'\xff\xff\xff\xff')

(-6.8056469327705772e+38,)

I don't have my copy of 754 at hand, but I'm pretty sure that
0xffffffff is a NaN (printf on IA32 Linux agrees) and not
-6.8056469327705772e+38 as claimed by struct.unpack().
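
For reference, a minimal sketch of how a 32-bit pattern breaks down under the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits). The helper name `classify_float32_bits` is hypothetical; the point is only that 0xffffffff has an all-ones exponent and a non-zero fraction, which is the encoding of a NaN.

```python
def classify_float32_bits(bits):
    """Classify a 32-bit integer as an IEEE 754 single-precision value."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        # All-ones exponent is reserved: fraction 0 is infinity, anything else is NaN.
        if fraction:
            return "nan"
        return "-inf" if sign else "+inf"
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    return "normal"

print(classify_float32_bits(0xFFFFFFFF))
print(classify_float32_bits(0x7F800000))
```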

--
Grant Edwards grante Yow! Of course, you
at UNDERSTAND about the PLAIDS
visi.com in the SPIN CYCLE --

Grant Edwards

2004-05-14 01:59:18 UTC

Which part of the C library is broken?

Which C library?

That's what I'm asking. I was told the C library was broken.
I wanted to know which C library.

The breakage has to do with the character values they use to
denote infinities and NaN,

Right. That's got nothing to do with what I'm currently
whining about, which was struct's failure when byte ordering is
specified to convert properly between a native NaN and IEEE 754
NaN (even when native format is IEEE 754). I'm quite happy
with the way string<->native-float handles NaNs on the host I'm
using. Not that it wouldn't be nice for it to be consistent
across platforms.

not (as far as I know) with the detection of them. Whether
they are "broken" is a matter of interpretation, since the C
standard didn't specify what the library should expect and
return when converting from string to float and vice versa.
What is true is that they aren't consistent.

Though I realize there are consistency issues, they work quite
well enough for my application at the moment. Struct, however,
doesn't.

I presumed that struct was doing the conversion itself. Since the
doc specified IEEE format, it would have to do its own
conversion: it couldn't assume that the host used IEEE
format, and as you say, there's no portable library support
that can be relied upon.

I don't think so.

Eh? So you think that struct is using a C library to do the
conversion, or that there is portable C library support for
converting between native FP format and IEEE 754?

I believe that the reference to IEEE means that it generally
expects bit compatible IEEE representations.

Hmm, what would the difference be between a standard IEEE
representation and a "bit-compatible IEEE representation"?

There may be differences, but I doubt if struct is aware of
them. As usual, the source would be definitive.

It also contains a reference to a module that handles the
matter.

I read it. It only handles double-precision values, and I'm
working with single-precision values.

I expect it would be easy enough to convert. I just read it,
and it doesn't look at all complicated.

Probably not. I've done it before (in C), but having struct
convert the values properly would be the right thing to do, rather
than expecting the user to check for certain values that struct
doesn't convert correctly.

--
Grant Edwards grante Yow! My LESLIE GORE record
at is BROKEN...
visi.com

John Roth

2004-05-14 01:38:31 UTC

"Grant Edwards" <grante at visi.com> wrote in message

Which part of the C library is broken?

Which C library?

That's what I'm asking. I was told the C library was broken. I
wanted to know which C library.

The breakage has to do with the character values they
use to denote infinities and NaN, not (as far as
I know) with the detection of them. Whether they are
"broken" is a matter of interpretation, since the C standard
didn't specify what the library should expect and return
when converting from string to float and vice versa.
What is true is that they aren't consistent.

Python runs on 20 different systems, many of which have
multiple operating systems, each of which has its own C
library with its own problems. It's not even a standards
issue: the older standards didn't specify what the C library's
conversion routines should do.

I don't think so. I believe that the reference to IEEE means
that it generally expects bit compatible IEEE representations.
There may be differences, but I doubt if struct is aware of
them. As usual, the source would be definitive.

I referred you to PEP 754 for a reason. That PEP contains
a thorough discussion of the issues in the treatment of special
values in Python floating point.
http://www.python.org/peps/pep-0754.html
It also contains a reference to a module that handles
the matter.

I read it. It only handles double-precision values, and I'm
working with single-precision values.

I expect it would be easy enough to convert. I just read
it, and it doesn't look at all complicated.

John Roth

--
Grant Edwards grante Yow! I just had my entire
at INTESTINAL TRACT coated
visi.com with TEFLON!

John Roth

2004-05-13 21:59:49 UTC

"Grant Edwards" <grante at visi.com> wrote in message

Perhaps I'm doing something wrong: the struct module docs say
it's IEEE 754, but I can't figure out how to get it to handle
NaN values correctly (either packing or unpacking).

I believe that struct uses the c library as is,

That's not what the docs say. The struct docs say it converts
between native Python values and IEEE 754. If that's not the
case and it converts between Python float format and "native C
library" format, then the docs need to be changed [and I'll
need to write something that converts to/from IEEE 754 format.]

Actually, it doesn't. It does say IEEE, but it doesn't say
IEEE 754. While that's a nit, it also doesn't say it converts
anything other than byte order. When you look at the source,
I suspect that what you'll find is a straight copy from the buffer
into wherever the float object keeps the actual value, or the
reverse, of course.

like the float support in general. Unfortunately, that leaves
everyone at the mercy of the rather inconsistent and
idiosyncratic implementations of corner cases in existing c
libraries.

1) 0xffffffff is treated correctly as a NaN.
2) NaN's are detected and converted to strings as 'nan'.
Which part of the C library is broken?

Which C library? Python runs on 20 different systems,
many of which have multiple operating systems, each
of which has its own C library with its own problems.
It's not even a standards issue: the older standards didn't
specify what the C library's conversion routines should do.

I referred you to PEP 754 for a reason. That PEP contains
a thorough discussion of the issues in the treatment of special
values in Python floating point.

http://www.python.org/peps/pep-0754.html

It also contains a reference to a module that handles
the matter.

John Roth

--
Grant Edwards grante Yow! AIEEEEE! I am having
at an UNDULATING EXPERIENCE!
visi.com

Grant Edwards

2004-05-13 22:12:10 UTC

Which part of the C library is broken?

Which C library?

That's what I'm asking. I was told the C library was broken. I
wanted to know which C library.

I read it. It only handles double-precision values, and I'm
working with single-precision values.

--
Grant Edwards grante Yow! I just had my entire
at INTESTINAL TRACT coated
visi.com with TEFLON!

John Roth

2004-05-13 20:18:10 UTC

"Grant Edwards" <grante at visi.com> wrote in message

Perhaps I'm doing something wrong: the struct module docs say
it's IEEE 754, but I can't figure out how to get it to handle
NaN values correctly (either packing or unpacking).

x = float('nan')
struct.pack("<f",x)

File "<stdin>", line 1, in ?
SystemError: frexp() result out of range

struct.unpack("<f",'\xff\xff\xff\xff')

(-6.8056469327705772e+38,)
I don't have my copy of 754 at hand, but I'm pretty sure that
0xffffffff is a NaN (printf on IA32 Linux agrees) and not
-6.8056469327705772e+38 as claimed by struct.unpack().

I believe that struct uses the c library as is, like the
float support in general. Unfortunately, that leaves
everyone at the mercy of the rather inconsistent
and idiosyncratic implementations of corner cases
in existing c libraries. See PEP 754 for a discussion
of the issues.

John Roth

--
Grant Edwards grante Yow! Of course, you
at UNDERSTAND about the PLAIDS
visi.com in the SPIN CYCLE --

Grant Edwards

2004-05-13 21:28:37 UTC

Perhaps I'm doing something wrong: the struct module docs say
it's IEEE 754, but I can't figure out how to get it to handle
NaN values correctly (either packing or unpacking).

I believe that struct uses the c library as is,

like the float support in general. Unfortunately, that leaves
everyone at the mercy of the rather inconsistent and
idiosyncratic implementations of corner cases in existing c
libraries.

But, my c library seems to handle it correctly:

1) 0xffffffff is treated correctly as a NaN.
2) NaN's are detected and converted to strings as 'nan'.

Which part of the C library is broken?

--
Grant Edwards grante Yow! AIEEEEE! I am having
at an UNDULATING EXPERIENCE!
visi.com

David M. Cooke

2004-05-13 21:08:41 UTC

Perhaps I'm doing something wrong: the struct module docs say
it's IEEE 754, but I can't figure out how to get it to handle
NaN values correctly (either packing or unpacking).

x = float('nan')
struct.pack("<f",x)

File "<stdin>", line 1, in ?
SystemError: frexp() result out of range

struct.pack('f', x)

'\x00\x00\xc0\x7f'

...which is a NaN since the exponent part is all 1's and the
significand is non-zero.

struct.unpack("<f",'\xff\xff\xff\xff')

(-6.8056469327705772e+38,)

Again,

struct.unpack('f', '\xff\xff\xff\xff')

(nan,)

Of course, if you want to worry about endian issues here, I think
you're SOL with using struct.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca

Tim Peters

2004-05-13 22:58:36 UTC

[Grant Edwards]

There's a table that clearly defines when it [struct] uses "native" vs.
"standard" byte-order and size-and-alignment.
One assumes that the floating point _representation_ could
likewise be native or standard (IEEE),

Yes, this is the case.

but it never specifies which FP representation is used when.

The same as everything else: in native mode, whatever float and double
representation the platform uses is what struct uses, just as in native mode
struct uses whatever the platform uses for chars, shorts, ints and longs.
In standard mode, the representation is forced to IEEE 754 float or double
format. But it's still the case that all behavior wrt NaNs, Infs, and
signed zeroes is an accident in standard mode. Indeed, it's precisely
*because* standard mode tries to force the representation to a known format
(and Python has no idea whether the platform it's running on uses 754 format
natively or not) that these accidents occur. C89 predates 754 adoption, and
so offers no portable facilities even for recognizing whether a thing is a
NaN, Inf, or signed 0. "Standard" C tricks like

if (x != x) { /* then x is a NaN */ }

don't actually work across platforms (although many with limited x-platform
experience believe they do).
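
The self-comparison test mentioned above can be sketched in Python as well. This assumes a current Python 3 on IEEE 754 hardware, where `float("nan")` and `math.isnan` are available; neither could be relied on across platforms when this thread was written, which is the point being made. The helper name is hypothetical.

```python
import math

def is_nan_by_self_comparison(x):
    # Relies on the IEEE 754 rule that a NaN compares unequal to everything,
    # itself included. A platform or compiler that breaks this rule breaks the test.
    return x != x

nan = float("nan")
print(is_nan_by_self_comparison(nan), math.isnan(nan))
print(is_nan_by_self_comparison(1.0), math.isnan(1.0))
```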

Since "IEEE" is mentioned in the discussion of "standard size and
alignment",

And only there.

I would guess that FP native vs. standard representation
matches the native vs. standard state of "size and alignment".

I'm not sure what that sentence said, but bet it's right <wink>.

John Roth

2004-05-14 01:43:18 UTC

"Grant Edwards" <grante at visi.com> wrote in message

In article <mailman.538.1084489118.25742.python-list at python.org>, Tim
Are there architectures that support multiple floating point
representations that can only be determined at run-time?

IBM mainframes. However, I don't believe Python supports
them, and in any case, one of them is vanilla IEEE 754.

John Roth

--
Grant Edwards grante Yow! I have a TINY BOWL in
at my HEAD
visi.com

Grant Edwards

2004-05-13 21:42:00 UTC

the struct module docs say it's IEEE 754, but I can't figure
out how to get it to handle NaN values correctly (either
packing or unpacking).

All Python behavior in the presence of 754 special values
(infs, NaNs, signed zeroes) is a platform-dependent accident.

I guess the docs that claim struct supports IEEE 754 need to
have a few footnotes added.

There's a growing list of these in PEP 42 (under
"Non-accidental IEEE-754 support"), but nobody even bothers to
keep that up to date.

x = float('nan')

It's even an accident that this line didn't raise an exception
(it does, for example, under the Windows Python).

That would be fine. Either the correct answer or an exception
I can handle would be acceptable.

struct.unpack("<f",'\xff\xff\xff\xff')

(-6.8056469327705772e+38,)

The C routine that gets invoked here is _PyFloat_Unpack4(), in
floatobject.c. As the comment there says,
/* XXX This sadly ignores Inf/NaN issues */
That is, the outcome of this is also an accident.

Ah. That needs to be fixed. It should either return a correct
value or raise an exception. Silently returning wrong answers
is hardly the way of "least surprises".

--
Grant Edwards grante Yow! I'm DESPONDENT... I
at hope there's something
visi.com DEEP-FRIED under this
miniature DOMED STADIUM...

Grant Edwards

2004-05-13 22:37:47 UTC

Post by Grant Edwards

/* XXX This sadly ignores Inf/NaN issues */
That is, the outcome of this is also an accident.

Ah. That needs to be fixed. It should either return a correct
value or raise an exception. Silently returning wrong answers
is hardly the way of "least surprises".

A lot of people would like the whole mess to be fixed.
Unfortunately, nobody has stepped forward to do it,
considering that it's quite a job to do right.

Fixing struct would be a start, but I'm a little confused by
the documentation.

There's a table that clearly defines when it uses "native" vs.
"standard" byte-order and size-and-alignment.

One assumes that the floating point _representation_ could
likewise be native or standard (IEEE), but it never specifies
which FP representation is used when. Since "IEEE" is
mentioned in the discussion of "standard size and alignment", I
would guess that FP native vs. standard representation matches
the native vs. standard state of "size and alignment".

Is that how others interpret the doc?

Is that the behavior others would want?

--
Grant Edwards grante Yow! They don't hire
at PERSONAL PINHEADS,
visi.com Mr. Toad!

John Roth

2004-05-13 22:11:22 UTC

"Grant Edwards" <grante at visi.com> wrote in message

Post by Grant Edwards

the struct module docs say it's IEEE 754, but I can't figure
out how to get it to handle NaN values correctly (either
packing or unpacking).

All Python behavior in the presence of 754 special values
(infs, NaNs, signed zeroes) is a platform-dependent accident.

I guess the docs that claim struct supports IEEE 754 need to
have a few footnotes added.

There's a growing list of these in PEP 42 (under
"Non-accidental IEEE-754 support"), but nobody even bothers to
keep that up to date.

x = float('nan')

It's even an accident that this line didn't raise an exception
(it does, for example, under the Windows Python).

That would be fine. Either the correct answer or an exception
I can handle would be acceptable.

struct.unpack("<f",'\xff\xff\xff\xff')

(-6.8056469327705772e+38,)

The C routine that gets invoked here is _PyFloat_Unpack4(), in
floatobject.c. As the comment there says,
/* XXX This sadly ignores Inf/NaN issues */
That is, the outcome of this is also an accident.

Ah. That needs to be fixed. It should either return a correct
value or raise an exception. Silently returning wrong answers
is hardly the way of "least surprises".

A lot of people would like the whole mess to be fixed. Unfortunately,
nobody has stepped forward to do it, considering that it's quite
a job to do right. Given that there's probably a bunch of stuff out there
that depends on the current system-dependent idiosyncrasies,
it would have to be done with a "from __future__ import ..." multi-release
phase-in.

Now that I've found the reference module, I'll probably put it
into PyFIT in the next few days, but that simply patches the
problem for one application.

John Roth

--
Grant Edwards grante Yow! I'm DESPONDENT... I
at hope there's something
visi.com DEEP-FRIED under this
miniature DOMED STADIUM...

Jeff Epler

2004-05-14 01:34:48 UTC

In order to provide robust translation between native and IEEE
floating point formats, Python is going to have to know what
the native format is.

No, it merely needs to use ldexp() with the proper values. Did you try
reading what Python actually does? See Objects/floatobject.c's
function family _PyFloat_{Unpack,Pack}{4,8}

Jeff

Tim Peters

2004-05-13 20:09:10 UTC

[Grant Edwards]
Not really.

the struct module docs say it's IEEE 754, but I can't figure out
how to get it to handle NaN values correctly (either packing or
unpacking).

All Python behavior in the presence of 754 special values (infs, NaNs,
signed zeroes) is a platform-dependent accident. There's a growing list of
these in PEP 42 (under "Non-accidental IEEE-754 support"), but nobody even
bothers to keep that up to date.

x = float('nan')

It's even an accident that this line didn't raise an exception (it does, for
example, under the Windows Python).

struct.pack("<f",x)

File "<stdin>", line 1, in ?
SystemError: frexp() result out of range

An accident of what your platform C's frexp() happens to do with a NaN.

struct.unpack("<f",'\xff\xff\xff\xff')

(-6.8056469327705772e+38,)

The C routine that gets invoked here is _PyFloat_Unpack4(), in
floatobject.c. As the comment there says,

/* XXX This sadly ignores Inf/NaN issues */

That is, the outcome of this is also an accident.
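
To see where the odd value comes from, here is a sketch of a decoder that, like the routine described above, has no special case for the reserved all-ones exponent. This is not the CPython source, only an illustration of the arithmetic; the function name is hypothetical.

```python
import math

def naive_unpack_float32(bits):
    """Decode sign/exponent/fraction with no Inf/NaN special case."""
    sign = -1.0 if bits >> 31 else 1.0
    exponent = (bits >> 23) & 0xFF
    fraction = (bits & 0x7FFFFF) / 8388608.0    # fraction field scaled by 2**-23
    if exponent == 0:
        power = -126                            # zero and denormals: no implicit 1
    else:
        fraction += 1.0                         # implicit leading 1
        power = exponent - 127                  # 0xFF is treated as an ordinary 2**128
    return sign * math.ldexp(fraction, power)

value = naive_unpack_float32(0xFFFFFFFF)
print(value == float("-6.8056469327705772e+38"))
```

With every bit set, the decoder computes -(2 - 2**-23) * 2**128, which is the number struct.unpack() reported instead of a NaN.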

Grant Edwards

2004-05-14 02:26:58 UTC

Post by Jeff Epler

In order to provide robust translation between native and IEEE
floating point formats, Python is going to have to know what
the native format is.

No, it merely needs to use ldexp() with the proper values.

If you don't know what the native representations for NaN and
infinity are, how do you know what to pass to ldexp() when the
conversion routine has been passed an IEEE NaN/Inf and needs to
create a native one?

If you do pass ldexp() the proper exponent for a native NaN,
are you sure it will work? I would expect that doing so would
be an error.

How do you detect a native NaN using ldexp() or frexp() or some
other standard C library function?

Post by Jeff Epler
Did you try reading what Python actually does?

Yes.

Post by Jeff Epler
See Objects/floatobject.c's function family _PyFloat_{Unpack,Pack}{4,8}

I did. It states quite clearly in floatobject.h:

Bug: What this does is undefined if x is a NaN or infinity.

Bug: What this does is undefined if the string represents a NaN or infinity.

I'd like to fix those bugs so that the conversions between
native NaN/Inf and IEEE NaN/Inf work. I don't see how you can
get around the requirement for knowledge about the native
representation of NaNs and Infinities.
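
A rough sketch of what a fixed pack routine might look like, written in Python rather than C. It leans on `math.isnan` and `math.isinf` to detect native special values, which is exactly the knowledge about the native format that the C code of the time lacked. The function name and the choice of 0x7FC00000 as the NaN pattern are assumptions for illustration; NaN payloads are not preserved.

```python
import math
import struct

def pack_float32_le(x):
    """Pack a Python float as little-endian IEEE 754 single precision."""
    if math.isnan(x):
        bits = 0x7FC00000                       # one quiet-NaN pattern (assumed choice)
    elif math.isinf(x):
        bits = 0xFF800000 if x < 0 else 0x7F800000
    else:
        sign = 0x80000000 if math.copysign(1.0, x) < 0 else 0
        m, e = math.frexp(abs(x))               # abs(x) == m * 2**e, 0.5 <= m < 1, or m == 0
        if m == 0.0:
            bits = sign                         # signed zero
        elif e - 1 < -126:
            # Denormal range: value is fraction * 2**-149, no implicit leading 1.
            bits = sign | round(math.ldexp(m, e + 149))
        else:
            # Normal range: 1.f * 2**(e-1); a rounding carry bumps the exponent field.
            magnitude = ((e - 1 + 127) << 23) + round(math.ldexp(m, 24)) - 0x800000
            if magnitude >= 0x7F800000:
                raise OverflowError("float too large for single precision")
            bits = sign | magnitude
    return bits.to_bytes(4, "little")

print(pack_float32_le(1.5) == struct.pack("<f", 1.5))
print(pack_float32_le(float("nan")).hex())
```

The special values are handled before frexp() is ever called, so the "frexp() result out of range" failure from the original report cannot arise on this path.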

--
Grant Edwards grante Yow! I'm in ATLANTIC CITY
at riding in a comfortable
visi.com ROLLING CHAIR...

Grant Edwards

2004-05-14 03:58:10 UTC

Post by Tim Peters
There's nothing new to be said about any of this, and I don't
have time to pursue it regardless. If you want to commit to
improving the story here, please do. Others have tried, over
the course of a decade, but nothing has come of it apart from
the PEP 754 reference implementation (which doesn't address
struct or pickle issues). It's a large task to give Python a
*good* x-platform 754 story, but it would indeed be easy to
make large isolated improvements in small areas on major
platforms.

I'd like to work on the pack/unpack code so that pickle/struct
handle NaNs and Infinities. It should be easy enough on
platforms that use 754 as the native format, and seems like a
reasonable first step. However, I've got a gcc patch to finish
first, and ...

--
Grant Edwards grante Yow! Put FIVE DOZEN red
at GIRDLES in each CIRCULAR
visi.com OPENING!!

Grant Edwards

2004-05-14 01:07:25 UTC

but it never specifies which FP representation is used when.

My question was which "native" and "standard" mode? There
appear to be two different "modes": "byte order" and "size and
alignment". Which of the two modes determines the floating
point representation to be used? My interpretation of the doc
was the latter: use native FP representation when it says
"native" in the "size and alignment" column and use IEEE when
it says "standard" in the "size and alignment" column.

But it's still the case that all behavior wrt NaNs, Infs, and
signed zeroes is an accident in standard mode. Indeed, it's
precisely *because* standard mode tries to force the
representation to a known format (and Python has no idea
whether the platform it's running on uses 754 format natively
or not) that these accidents occur.

In order to provide robust translation between native and IEEE
floating point formats, Python is going to have to know what
the native format is.

C89 predates 754 adoption, and so offers no portable
facilities even for recognizing whether a thing is a NaN, Inf,
or signed 0. "Standard" C tricks like
if (x != x) { /* then x is a NaN */ }
don't actually work across platforms (although many with
limited x-platform experience believe they do).

Recognizing and generating IEEE NaNs, infinities, 0's and
denormals is easy enough.
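
As an illustration of the "easy enough" half, here is a sketch of an unpack routine that recognizes the IEEE special encodings before doing any arithmetic. It assumes a current Python 3 (for `math.inf` and `math.nan`); the function name is hypothetical, and all NaN payloads collapse to a single NaN.

```python
import math
import struct

def unpack_float32_le(data):
    """Decode 4 little-endian bytes as an IEEE 754 single, handling Inf and NaN."""
    bits = int.from_bytes(data, "little")
    sign = -1.0 if bits >> 31 else 1.0
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        # Reserved exponent: infinity if the fraction is zero, otherwise NaN.
        return sign * math.inf if fraction == 0 else math.nan
    if exponent == 0:
        # Zero and denormals: fraction * 2**-149, no implicit leading 1.
        return sign * math.ldexp(fraction, -149)
    # Normal numbers: (2**23 + fraction) * 2**(exponent - 127 - 23).
    return sign * math.ldexp(fraction | 0x800000, exponent - 150)

print(unpack_float32_le(b"\xff\xff\xff\xff"))
print(unpack_float32_le(b"\x00\x00\x80\x3f") == struct.unpack("<f", b"\x00\x00\x80\x3f")[0])
```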

Recognizing and generating native infinities, 0's and denormals
would require some compile-time configuration, but that's not
difficult either. All the C compilers I've used in the past
dozen or two years provide pre-processor symbols to tell you
what architecture you're compiling for. If one doesn't want to
rely on that, compiling and running some simple test programs
a-la autoconf should be able to determine pretty reliably if
the host is using IEEE representation or not.
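
The test-program idea can be sketched at run time in Python: pack a known value in native mode and compare the raw bytes with the IEEE 754 encoding in the host's byte order. This is only an illustration of the probing approach, not how any build system actually does it; the function name and the probe value are assumptions, and a host that stores IEEE doubles in a mixed byte order would be reported as non-IEEE.

```python
import struct
import sys

def native_double_looks_ieee():
    """Return True if the platform's raw double matches the IEEE 754 bit layout."""
    probe = 9006104071832581.0               # its eight IEEE bytes are all distinct
    native = struct.pack("@d", probe)        # native mode: the platform's own format
    order = "<" if sys.byteorder == "little" else ">"
    ieee = struct.pack(order + "d", probe)   # standard mode: IEEE 754 by definition
    return native == ieee

print(native_double_looks_ieee())
```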

Since the vast majority of hosts out there use IEEE
representation, and the C compiler can tell you that at
compile-time, I see no reason why struct can't be made to work
better. IIRC, the other FP representations I've worked with
(TI and DEC) were both minor variations on IEEE 754 and both
provided NaNs and infinities. Why shouldn't we expect struct
to convert an IEEE NaN into a native NaN (and the reverse)?

Are there architectures that support multiple floating point
representations that can only be determined at run-time?

I would guess that FP native vs. standard representation
matches the native vs. standard state of "size and alignment".

I'm not sure what that sentence said, but bet it's right <wink>.

I tried to state it more clearly above.

--
Grant Edwards grante Yow! I have a TINY BOWL in
at my HEAD
visi.com