05/12/2010 @09:16:02 ^10:31:11
serialisation of floating point data
Yesterday the topic came up of how to store floating point variables in a file or send them over the network so they can be reloaded later, possibly by another machine, possibly with a different architecture, and be exactly the same as when they were stored. This is a general problem, called serialisation.
Consider for example the problem of one human brain, containing a complex image or abstract concept, having to transfer that image or concept into another human brain. This is a serialisation problem, one at which I am very bad... better known as "talking to people".
For integral data types it's easy. You just write the thing as a stream of bytes. The only thing you really have to do is make sure you read the bytes back in the same order they were written (endianness). This is a solved problem: go read about network byte order and host byte order, for instance.
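As a rough sketch of what "the right order" means for a 32-bit integer, the pair of functions below always puts the most significant byte first, whatever the host does internally. The names put_u32_be and get_u32_be are made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper: store v in out[0..3], most significant byte first
 * (this is what "network byte order" means). */
void put_u32_be(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Hypothetical helper: rebuild the value from bytes stored that way. */
uint32_t get_u32_be(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because the shifts work on the value rather than on its memory layout, the same code gives the same bytes on big-endian and little-endian hosts.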
For floats and doubles it's a bit more complicated though. How do you write out a double and then read it back later, knowing you have the same thing? Here are the outlines of three approaches.
Type punning
Stick the double in a union along with an array of the same size as a double.
union { double d; uint8_t c[sizeof(double)]; } u;
Then you write your double into u.d and loop over the bytes in u.c, writing them out one at a time, just like writing out an int32_t or whatever.
Pros Fairly simple. Probably looks like the serialisation code you have already.
Cons Not particularly portable. Assumes the floating point data format is the same on both ends. While this is more or less true in modern times, it cannot be relied upon. Even if that's a risk you're willing to take, you still have to worry about endianness.
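Here is one way the union trick might look once you deal with endianness as well. It is only a sketch: it puns the double against a uint64_t instead of a byte array so the bytes can be pulled out with shifts, and it assumes a double is 64 bits wide and stored in the same byte order as a 64-bit integer on the host. The names dpun, put_double_be and get_double_be are invented for the example.

```c
#include <stdint.h>

/* Assumption: double is 64 bits and shares the host's integer byte order. */
union dpun {
    double d;
    uint64_t bits;
};

/* Hypothetical helper: emit the 8 bytes of d, most significant byte first. */
void put_double_be(uint8_t out[8], double d)
{
    union dpun u;
    int i;

    u.d = d;
    for (i = 0; i < 8; i++)
        out[i] = (uint8_t)(u.bits >> (56 - 8 * i));
}

/* Hypothetical helper: rebuild the double from those 8 bytes. */
double get_double_be(const uint8_t in[8])
{
    union dpun u;
    int i;

    u.bits = 0;
    for (i = 0; i < 8; i++)
        u.bits = (u.bits << 8) | in[i];
    return u.d;
}
```

This takes care of byte order, but not of the other con: both ends still have to agree on the floating point format itself.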
Write the double as a string
Convert the double into an array of characters with sprintf. To read it back, use strtod. C99 has a printf format specifier "%a" that seems made for this.
Pros Has the nice bonus that strings are human-readable.
Cons How do you know how much precision to use so you read back exactly the same number as you wrote? The manual page for printf suggests that exact round trips are what %a is for, but of course %a doesn't exist in worlds where C99 support doesn't exist (guess who I mean!)
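A small sketch of both string variants, assuming a C99 library and IEEE 754 doubles. The hexadecimal form uses "%a". The decimal form uses 17 significant digits, which is enough to read back the same IEEE 754 double as long as strtod rounds correctly. The function names are invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: hexadecimal floating point text (needs C99 "%a"). */
int double_to_hex_string(char *buf, size_t len, double d)
{
    return snprintf(buf, len, "%a", d);
}

/* Hypothetical helper: decimal text with 17 significant digits.
 * Assumption: double is an IEEE 754 64-bit double. */
int double_to_dec_string(char *buf, size_t len, double d)
{
    return snprintf(buf, len, "%.17g", d);
}

/* C99 strtod accepts both the decimal and the hexadecimal form. */
double string_to_double(const char *s)
{
    return strtod(s, NULL);
}
```

A 64-byte buffer is plenty for either form. Infinities and NaNs are printed as words such as "inf" and "nan", which a C99 strtod also understands, though a NaN's exact bit pattern is not guaranteed to survive.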
frexp/ldexp
This is an approach I didn't think of and had to try out myself.
These are two floating point functions (declared in math.h, link with -lm). The first decomposes a double into an integer exponent and a normalized fractional mantissa whose magnitude lies in the interval [½,1). The second puts it back together again. Now what?
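To make the decomposition concrete, here is a tiny sketch. The wrapper split_and_rebuild is made up for this example; frexp and ldexp are the real math.h functions.

```c
#include <math.h>

/* Hypothetical wrapper: split d into fraction and exponent with frexp,
 * then reassemble it with ldexp and return the result.
 * For d = 8.0 this stores 0.5 in *frac and 4 in *exp, since 8 = 0.5 * 2^4. */
double split_and_rebuild(double d, double *frac, int *exp)
{
    *frac = frexp(d, exp);
    return ldexp(*frac, *exp);
}
```

The sign stays with the fraction, so -3.0 splits into -0.75 and an exponent of 2.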
My naive approach was to multiply the fractional part by INT64_MAX to convert it into an int64_t. Now you have two integers (of sizes 4 and 8 bytes respectively, so you're using up 4 extra bytes per double stored) and these can be serialised using your existing functions for integral data types.
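#include <math.h>
#include <stdint.h>

struct fpack {
    int32_t exp;
    int64_t frac;
};

void serialize(struct fpack *pack, double d)
{
    int e; /* frexp wants an int *, which need not be an int32_t * */
    double mantissa = frexp(d, &e);

    pack->exp = e;
    pack->frac = INT64_MAX * mantissa; /* scale the fraction up to an integer */
}

double unserialize(struct fpack *pack)
{
    double mantissa = (double)(pack->frac) / INT64_MAX;

    return ldexp(mantissa, pack->exp);
}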
I've tested this a bit (get a double, convert it to a struct and back, compare it to the original) and it seems to work for most everything I threw at it: several thousand random strings of digits with a decimal point somewhere. The exceptions are infinities, "Not-A-Number" cases etc., which it doesn't handle.
Pros uses standard functions so it should be portable.
Cons not enough testing/edge case handling.
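One possible way to cover the edge cases is to carry an extra tag saying what kind of value was stored. This is a sketch, not something I've tested widely. It assumes C99 (isnan, isinf, NAN, INFINITY), and it differs from the code above in that it scales the fraction by 2^62 with ldexp instead of multiplying by INT64_MAX. All the names (fkind, fpack_ex, serialize_ex, unserialize_ex) are invented for the example.

```c
#include <math.h>
#include <stdint.h>

enum fkind { FK_FINITE = 0, FK_POS_INF = 1, FK_NEG_INF = 2, FK_NAN = 3 };

/* Hypothetical variant of struct fpack with a tag for special values. */
struct fpack_ex {
    uint8_t kind;
    int32_t exp;
    int64_t frac;
};

void serialize_ex(struct fpack_ex *pack, double d)
{
    int e = 0;

    pack->exp = 0;
    pack->frac = 0;
    if (isnan(d)) {
        pack->kind = FK_NAN;            /* the NaN payload is not kept */
    } else if (isinf(d)) {
        pack->kind = d > 0 ? FK_POS_INF : FK_NEG_INF;
    } else {
        pack->kind = FK_FINITE;
        /* |fraction| < 1, so fraction * 2^62 fits in an int64_t */
        pack->frac = (int64_t)ldexp(frexp(d, &e), 62);
        pack->exp = e;
    }
}

double unserialize_ex(const struct fpack_ex *pack)
{
    switch (pack->kind) {
    case FK_NAN:     return NAN;
    case FK_POS_INF: return INFINITY;
    case FK_NEG_INF: return -INFINITY;
    default:         return ldexp((double)pack->frac, pack->exp - 62);
    }
}
```

Two things still get lost in this sketch: the sign of negative zero (it comes back as plain zero) and whatever payload a NaN was carrying.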
I suspect printf("%a") is doing something like this, only more cleverly, and that it ultimately starts from the same kind of mantissa/exponent split that frexp gives you.
Conclusion
Strings are probably best, but I like that trick with frexp/ldexp. It just needs more testing and attention to edge cases.
And of course if you want strings but your platform doesn't have printf("%a") you could always pilfer a free implementation that's compatible with your licence.